A hands-on tool for the LSO Summer School 2026 at IIT Delhi. Edit the grid, pick an algorithm, change the settings, and watch the agent find its way from S to G.
The world (click a cell to toggle a wall, drag S or G, hover to inspect)
The blue dot is the agent. The faint line is its current path. The gold dashed line is the shortest route. S is the start, G is the goal, grey is a wall, black is a cliff.
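Every move costs one step, so a breadth-first search from S is enough to find the length of a shortest route like the gold dashed one. The sketch below is an assumption about how such a route could be computed, not the tool's own code. It uses a hypothetical text encoding of the grid ('S', 'G', '#' for walls, 'C' for cliffs, '.' for free cells), four-way moves, and treats cliff cells as impassable.

```python
from collections import deque

# Hypothetical grid encoding: 'S' start, 'G' goal, '#' wall, 'C' cliff, '.' free.
# Assumption: moves are up/down/left/right, and cliffs are avoided like walls.
BLOCKED = {"#", "C"}


def shortest_path_length(grid):
    """Number of steps on a shortest S-to-G route, or None if G is unreachable."""
    rows, cols = len(grid), len(grid[0])
    start = next((r, c) for r in range(rows) for c in range(cols) if grid[r][c] == "S")
    dist = {start: 0}  # BFS distance from S to every cell reached so far
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if grid[r][c] == "G":
            return dist[(r, c)]  # first time G is dequeued, its distance is minimal
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] not in BLOCKED and (nr, nc) not in dist:
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return None


grid = [
    "S..#",
    ".#..",
    "...G",
]
print(shortest_path_length(grid))  # 5
```

Dropping a wall only changes which cells are enqueued, which is why the route can be recomputed instantly after each click.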
Learned value of each cell
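A common way to turn learned action values into one number per cell is to take the value of the best action, V(s) = max_a Q(s, a). Whether this panel uses exactly that rule is an assumption, and the dictionary layout of the Q-table below is hypothetical.

```python
# Hypothetical Q-table layout: each cell (row, col) maps to four action values,
# ordered [up, down, left, right].
def state_values(Q):
    """One number per cell for shading: V(s) = max over actions of Q(s, a)."""
    return {cell: max(action_values) for cell, action_values in Q.items()}


Q = {(0, 0): [0.0, 0.5, -1.0, 0.9], (0, 1): [-0.2, -0.4, -0.1, -0.3]}
print(state_values(Q))  # {(0, 0): 0.9, (0, 1): -0.1}
```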
Reward per episode
Algorithm & world
Editing the grid restarts learning, because values learned for the old layout no longer apply. The Cliff layout shows the difference between Q-learning and SARSA most clearly.
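To reproduce the Cliff contrast outside the tool, the sketch below trains both algorithms on the classic 4 × 12 cliff from Sutton and Barto's textbook and records the total reward of every episode, the same quantity a reward-per-episode chart tracks. The layout, the rewards (−1 per step, −100 for falling, which sends the agent back to S) and the settings (α = 0.5, γ = 1, fixed ε = 0.1) are assumptions chosen for illustration; the tool's own Cliff layout and rewards may differ. With a fixed ε, SARSA's average over the last episodes typically comes out higher than Q-learning's, because Q-learning settles on the edge-hugging route and then occasionally falls off it while exploring.

```python
import random

# Assumed layout: Sutton and Barto's 4 x 12 cliff. S is bottom-left, G is
# bottom-right, and every cell between them on the bottom row is cliff.
ROWS, COLS = 4, 12
START, GOAL = (3, 0), (3, 11)
CLIFF = {(3, c) for c in range(1, 11)}
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """One move. Assumed rewards: -1 per step, -100 (and back to S) for the cliff."""
    # Moving off the grid leaves the agent in place.
    r = min(max(state[0] + MOVES[action][0], 0), ROWS - 1)
    c = min(max(state[1] + MOVES[action][1], 0), COLS - 1)
    if (r, c) in CLIFF:
        return START, -100.0, False
    return (r, c), -1.0, (r, c) == GOAL


def choose(Q, s, epsilon, rng):
    """Epsilon-greedy: a random move with probability epsilon, else the best one."""
    if rng.random() < epsilon:
        return rng.randrange(len(MOVES))
    return max(range(len(MOVES)), key=lambda a: Q[s][a])


def run(method, episodes=500, alpha=0.5, gamma=1.0, epsilon=0.1, seed=0):
    """Train with "q_learning" or "sarsa"; return each episode's total reward."""
    rng = random.Random(seed)
    Q = {(r, c): [0.0] * len(MOVES) for r in range(ROWS) for c in range(COLS)}
    totals = []
    for _ in range(episodes):
        s, total, done = START, 0.0, False
        a = choose(Q, s, epsilon, rng)
        while not done:
            s2, reward, done = step(s, a)
            total += reward
            a2 = choose(Q, s2, epsilon, rng)  # the move the agent will actually make
            if done:
                target = reward  # no bootstrapping past the goal
            elif method == "q_learning":
                target = reward + gamma * max(Q[s2])  # best next move
            else:
                target = reward + gamma * Q[s2][a2]  # move actually taken next
            Q[s][a] += alpha * (target - Q[s][a])
            s, a = s2, a2
        totals.append(total)
    return totals


for method in ("q_learning", "sarsa"):
    last = run(method)[-100:]
    print(method, "mean reward over last 100 episodes:", sum(last) / len(last))
```

The two methods share every line except the bootstrap target, which is exactly the difference the Cliff layout exposes.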
Hyperparameters (live)
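The sliders in this panel are not labelled in this text. Tabular tools of this kind usually expose a step size α, a discount factor γ and an exploration rate ε; assuming that is what these sliders control, α and γ enter the standard one-step updates as follows, where s′ is the next cell and r the reward for the move:

```latex
\begin{aligned}
\text{Q-learning:}\quad Q(s,a) &\leftarrow Q(s,a) + \alpha\bigl[r + \gamma \max_{a'} Q(s',a') - Q(s,a)\bigr] \\
\text{SARSA:}\quad Q(s,a) &\leftarrow Q(s,a) + \alpha\bigl[r + \gamma\, Q(s',a') - Q(s,a)\bigr],
  \qquad a' \text{ chosen } \varepsilon\text{-greedily in } s'
\end{aligned}
```

ε appears in neither update; it only sets how often the agent takes a random action. The difference lies in a′: Q-learning bootstraps from the best next action regardless of what it actually does next, while SARSA bootstraps from the action its ε-greedy policy really takes. Occasional exploratory steps into the cliff therefore lower SARSA's values near the edge and push it onto the safer route.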
How to try it. Press Play and let it wander while exploration is high. Drag the ε slider down to about 0.05 and the agent walks straight to the goal, then keeps repeating the best path. Switch to the Cliff layout to see how Q-learning hugs the dangerous edge while SARSA plays it safe. Click any cell to drop a wall and watch the route change.
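Lowering ε works because the agent most likely chooses actions ε-greedily: with probability ε it takes a uniformly random action, otherwise the action with the highest learned value. With |A| actions, the greedy action is picked with probability 1 − ε + ε/|A|, so with four moves that is 0.25 at ε = 1.0 and 0.9625 at ε = 0.05. The sketch below assumes this standard rule; the tool may break ties or decay ε differently.

```python
import random


def epsilon_greedy(action_values, epsilon, rng=random):
    """Return a random action index with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(action_values))  # explore
    # Exploit: index of the largest value (ties go to the lowest index).
    return max(range(len(action_values)), key=lambda a: action_values[a])


rng = random.Random(0)
values = [0.1, 0.7, -0.3, 0.2]  # action 1 is the greedy choice
for eps in (1.0, 0.05):
    picks = [epsilon_greedy(values, eps, rng) for _ in range(10_000)]
    # The fraction of greedy picks should land near 1 - eps + eps / 4.
    print(eps, picks.count(1) / len(picks))
```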