GridWorld RL Lab: Q-learning and SARSA

A hands-on tool for the LSO Summer School 2026 at IIT Delhi. Edit the grid, pick an algorithm, change the settings, and watch the agent find its way from S to G.

The world (click a cell to toggle a wall, drag S or G, hover to inspect)

Press Play
The blue dot is the agent. The faint line is its current path. The gold dashed line is the shortest route. S is the start, G is the goal, grey is a wall, black is a cliff.
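One way to compute a shortest route like the gold dashed line is breadth-first search over the grid. The sketch below is only an illustration and is not taken from the tool: it assumes 4-connected moves, a grid written as strings with `S`, `G`, `#` for walls and `C` for cliff cells, and that the shortest route avoids cliff cells. The function name `shortest_route` and the demo layout are hypothetical.

```python
from collections import deque

def shortest_route(grid):
    """Return the shortest S-to-G path as a list of (row, col), or None if G is unreachable."""
    rows, cols = len(grid), len(grid[0])
    start = goal = None
    for r, line in enumerate(grid):
        for c, ch in enumerate(line):
            if ch == "S":
                start = (r, c)
            elif ch == "G":
                goal = (r, c)
    if start is None or goal is None:
        return None

    parent = {start: None}          # also serves as the visited set
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk the parent links back to the start, then reverse.
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            nxt = (nr, nc)
            inside = 0 <= nr < rows and 0 <= nc < cols
            # Assumption: walls (#) and cliff cells (C) are both impassable here.
            if inside and grid[nr][nc] not in "#C" and nxt not in parent:
                parent[nxt] = cell
                queue.append(nxt)
    return None

# Hypothetical 3x4 layout: the walls force a detour along the top row.
demo = [
    "....",
    ".##.",
    "S#.G",
]
route = shortest_route(demo)
print(len(route) - 1)  # number of steps on the shortest route
```

Because every move costs one step, the first time breadth-first search reaches G it has found a route with the fewest steps, which is what an "optimal length" readout would report under these assumptions.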

Learned value of each cell

Optimal -
The update equation will appear here as it trains.
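For reference, the standard tabular update rules for the two algorithms are shown below. The exact form the tool displays during training may differ. Here $\alpha$ is the learning rate, $\gamma$ the discount factor, $r$ the reward received after taking action $a$ in state $s$, $s'$ the next state, and $a'$ a next action.

```latex
% Q-learning (off-policy): bootstraps from the best next action
Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \right]

% SARSA (on-policy): bootstraps from the next action a' actually taken
Q(s,a) \leftarrow Q(s,a) + \alpha \left[ r + \gamma\, Q(s',a') - Q(s,a) \right]
```

The only difference is the bootstrap term: Q-learning uses the maximum over next actions, while SARSA uses the value of the action the agent really takes next, including exploratory ones.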

Reward per episode

Algorithm & world

Editing the grid restarts learning, because the old values no longer apply. The Cliff layout shows the difference between Q-learning and SARSA most clearly.
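To see why a cliff separates the two algorithms, it helps to compare their updates on a single step beside the edge. The sketch below is a made-up example and is not read from the tool: the state names, action order, Q-values, step reward of -1, α = 0.5 and γ = 1.0 are all assumptions chosen to keep the arithmetic easy to follow.

```python
def q_learning_update(q, s, a, r, s_next, alpha, gamma):
    """Off-policy update: bootstrap from the best action in the next state."""
    target = r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])

def sarsa_update(q, s, a, r, s_next, a_next, alpha, gamma):
    """On-policy update: bootstrap from the action actually taken next."""
    target = r + gamma * q[s_next][a_next]
    q[s][a] += alpha * (target - q[s][a])

def make_q():
    # Hypothetical values for two neighbouring cells on the cliff edge.
    # Action order (assumed): 0 = up, 1 = right, 2 = down (into the cliff), 3 = left.
    return {
        "edge_1": [-4.0, -3.0, -100.0, -5.0],
        "edge_2": [-3.0, -2.0, -100.0, -4.0],
    }

RIGHT, DOWN = 1, 2
alpha, gamma, step_reward = 0.5, 1.0, -1.0

# The agent moves right along the edge from edge_1 to edge_2.
q_ql = make_q()
q_learning_update(q_ql, "edge_1", RIGHT, step_reward, "edge_2", alpha, gamma)

# Same move, but the next action is an exploratory step down into the cliff.
q_sa = make_q()
sarsa_update(q_sa, "edge_1", RIGHT, step_reward, "edge_2", DOWN, alpha, gamma)

print(q_ql["edge_1"][RIGHT])  # -3.0: unchanged, the exploratory step is ignored
print(q_sa["edge_1"][RIGHT])  # -52.0: the fall is charged to the edge cell
```

Q-learning evaluates the edge as if the agent always acted greedily afterwards, so the edge path keeps its high value. SARSA charges the occasional exploratory fall to the cells along the edge, which pushes its learned route away from the cliff for as long as ε stays above zero.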

Hyperparameters (live)

0.30
0.95
1.00
0.00
8

Controls

Presets & share

Episode 0
Step 0
Last steps to goal -
Phase -
Optimal length -
Episode reward 0.0
How to try it. Press Play and let the agent wander while exploration is high. Once it has reached the goal a number of times, drag the ε slider down to about 0.05: the agent then walks almost straight to the goal and keeps repeating the best path it has learned. Switch to the Cliff layout to see how Q-learning hugs the dangerous edge while SARSA plays it safe. Click any cell to drop a wall and watch the route change.
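The effect of the ε slider can be illustrated with ε-greedy action selection, the usual exploration rule for both algorithms. This is a minimal sketch under the assumption that the tool uses plain ε-greedy; the function name `epsilon_greedy` and the example Q-values are hypothetical.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, otherwise a best-valued one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))      # explore
    best = max(q_values)
    # Break ties between equally good actions at random.
    return rng.choice([i for i, v in enumerate(q_values) if v == best])

# Hypothetical action values for one cell; action 1 is the greedy choice.
q_cell = [0.1, 0.9, 0.3, 0.2]

rng = random.Random(0)
trials = 10_000
greedy_picks = sum(epsilon_greedy(q_cell, 0.05, rng) == 1 for _ in range(trials))
print(greedy_picks / trials)  # fraction of steps that follow the greedy action
```

With ε = 0.05 and four actions, the greedy action is chosen with probability 1 - 0.05 + 0.05/4 = 0.9625, which is why the agent looks almost deterministic at that setting. With ε = 1.0 every step is uniformly random.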