0. Problem redefinition: from a small game to a dynamical system
At first, I thought Serpent was a Snake variant: eat objects, avoid obstacles, survive as long as possible. That description is not wrong, but it is too shallow.
Once Greedy, A*, Lookahead, NoGrow, TailGuard, Depth-2, and MinLength strategies were tested, the game stopped looking like a simple “which AI scores higher” exercise. It exposed a structure: score grows, length grows, size perturbs geometry, walls and hazards trim the body, and timeout forces the system to eat periodically.
So the real question is not “how do we get a high score?” It is: what is the correct long-term control variable in Serpent?
The answer is not simply length, and it is not simply size. The right aggregate variable is geometric pressure, P = N·q², which Section 3 develops.
1. Mathematical abstraction of the game rules
1.1 State variables
Let the current game state be:
| Symbol | Meaning |
|---|---|
| N(t) | Snake length, measured in body segments. |
| q(t) | Snake size (thickness), taking values 1, 2, or 3. |
| S(t) | Current score. |
| τ(t) | Time since the last normal food or bonus was eaten. |
| M(t) | Map state: food, bonus, hazard, obstacles, and the body shape. |
The code constants translate directly into constraints: MAX_SIZE = 3, no-progress death at NO_PROGRESS_TIMEOUT = 30000ms, normal food gives +10, a big bonus gives +50, hazards give -20, obstacle score costs are -5/-10/-15, and obstacle segment losses are 1/1/2.
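For reference, the rule constants above can be collected in one place. A sketch only: the dictionary names here are illustrative, not the engine's actual identifiers.

```python
# Illustrative constants mirroring the rules described above;
# the real engine's identifiers may differ.
MAX_SIZE = 3                     # snake thickness q ∈ {1, 2, 3}
NO_PROGRESS_TIMEOUT_MS = 30000   # death after 30 s without eating

SCORE_DELTA = {
    "food": +10, "bonus": +50, "hazard": -20,
    "fragile_wall": -5, "brick_wall": -10, "iron_wall": -15,
}
SEGMENT_DELTA = {
    "food": +1, "bonus": +1, "hazard": -2,
    "fragile_wall": -1, "brick_wall": -1, "iron_wall": -2,
}
```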
1.2 Action variables
On each tick, the agent directly controls the movement direction (one of the four headings).
At the strategy level, however, it is choosing between action types:
| Action | Notation | Effect |
|---|---|---|
| Eat normal food | F | +10 score, +1 segment, reset τ |
| Eat big bonus | B | +50 score, +1 segment, size may increase, reset τ |
| Hit fragile wall | W1 | -5 score, -1 segment, keep at least 1 segment |
| Hit brick wall | W2 | -10 score, -1 segment, keep at least 1 segment |
| Hit iron wall | W3 | -15 score, -2 segments, keep at least 1 segment |
| Eat hazard | H | -20 score, -2 segments, keep at least 1 segment, does not reset τ |
| Wait / detour | Ø | No score or length change, but consumes timeout margin. |
Note that eating a hazard does not reset lastFoodEaten, so hazards are pressure-relief resources, not survival-clock resources.

2. Basic resource equations
2.1 Length equation
Over a time window, let F be the normal-food count, B the bonus count, W1/W2/W3 the wall-hit counts, and H the hazard count. The nominal length change is:

ΔN = F + B − W1 − W2 − 2·W3 − 2·H
But the game has a crucial lower bound:

N(t) ≥ 1
So the actual trimming is clamped per event:

N_after = max(1, N_before − loss)
This is why a one-segment snake does not get shortened further by walls or hazards. That is not a minor implementation detail; it is one of the reasons small-length steady states can exist.
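A minimal sketch of the two update rules above (function names are mine; the per-event clamp is the load-bearing detail):

```python
def nominal_delta(F, B, W1, W2, W3, H):
    """Nominal length change over a window: gains from food and bonus,
    losses from wall hits (iron removes 2) and hazards (2 each)."""
    return (F + B) - (W1 + W2 + 2 * W3 + 2 * H)

def apply_trim(length, loss):
    """A single trimming event never shrinks the snake below 1 segment."""
    return max(1, length - loss)
```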
2.2 Score equation
If the obstacle type hit is treated as a random variable, the obstacle mix consistent with the expectations used below is:

P(W1) = 0.55, P(W2) = 0.30, P(W3) = 0.15

The expected score cost of a random wall hit is:

E[cost] = 5·0.55 + 10·0.30 + 15·0.15 = 8

The expected segment loss of a random wall hit is:

E[loss] = 1·0.55 + 1·0.30 + 2·0.15 = 1.15

So the expected cost per trimmed segment is approximately:

E[cost] / E[loss] = 8 / 1.15 ≈ 6.96
Therefore the window's score change is:

ΔS = 10·F + 50·B − 5·W1 − 10·W2 − 15·W3 − 20·H

and a random wall hit buys trimming at roughly 7 points per segment, well below the +10 of a normal food.
Key conclusion: as long as the agent can keep trimming away the length it gains, both normal food and big bonuses are positive-return resources.
This is the transition from “avoid obstacles” to “manage a resource account.” A wall is not simply failure; it is paid trimming.
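The wall-economy numbers can be checked in a few lines. The obstacle mix below is the one consistent with the E[loss] = 1.15 and E[cost] ≈ 8 figures used in Section 3.1:

```python
# Obstacle mix consistent with the expectations used in this article.
P_WALL = {"fragile": 0.55, "brick": 0.30, "iron": 0.15}
COST = {"fragile": 5, "brick": 10, "iron": 15}   # score cost per hit
LOSS = {"fragile": 1, "brick": 1, "iron": 2}     # segments trimmed per hit

exp_cost = sum(P_WALL[w] * COST[w] for w in P_WALL)  # expected score cost
exp_loss = sum(P_WALL[w] * LOSS[w] for w in P_WALL)  # expected segments trimmed
cost_per_segment = exp_cost / exp_loss               # ≈ 6.96
```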
2.3 Timeout constraint
The snake dies after 30 seconds without eating, so survival requires:

τ(t) < 30 s at all times.
In long-term strategies, food is not only a score source. It is also a survival-clock reset. If the average food interval is T_food, then survival requires:

T_food < 30 s
In practice the strategy needs margin:

T_food ≤ 30 s − Δ, for a safety buffer Δ that covers detours and bad food placement.
Walls and hazards do not reset τ. The long-run loop must include food or bonus. Walls and hazards can trim pressure, but they cannot keep the game clock alive.
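One way to express the margin check as a guard. The function name, the 5-second default buffer, and the ETA parameter are illustrative choices, not values from the engine:

```python
def must_eat_soon(tau_ms: int, eta_to_food_ms: int,
                  timeout_ms: int = 30000, margin_ms: int = 5000) -> bool:
    """True when time since last food plus the travel time to the nearest
    food leaves less than `margin_ms` of slack before the 30 s timeout."""
    return tau_ms + eta_to_food_ms + margin_ms >= timeout_ms
```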
3. Geometric pressure: the real control variable
Early analysis only tracked N = length. That is necessary, but insufficient. In Serpent, each segment occupies more cells as size grows.
| size | Cells per segment |
|---|---|
| 1 | 1×1 = 1 |
| 2 | 2×2 = 4 |
| 3 | 3×3 = 9 |
Define the approximate geometric pressure:

P(t) = N(t) · q(t)²

A more exact version is the actual number of distinct grid cells covered by the body:

P_exact(t) = |cells occupied by the snake|
Because thick adjacent segments overlap, Nq² is an upper bound or proxy rather than exact occupied area. But it captures the essential fact: the same length is not the same state under different sizes.
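The proxy is cheap to track online; a one-line sketch (the function name is mine):

```python
def pressure(length: int, size: int) -> int:
    """Geometric-pressure proxy P = N * q**2, an upper bound on the
    number of cells the body occupies (thick segments overlap)."""
    return length * size ** 2
```

For example, a 3-segment snake at size 2 carries more pressure than a 5-segment snake at size 1.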
3.1 Wall trimming efficiency at different sizes
A random wall hit trims 1.15 segments in expectation. If size is q, the expected geometric area removed is:

E[area] = 1.15 · q²
The expected score cost remains about 8, so the cost per cleared geometric cell is:

8 / (1.15 · q²)
| size | Expected area trimmed | Cost per occupied cell |
|---|---|---|
| 1 | 1.15 | 8 / 1.15 ≈ 6.96 |
| 2 | 4.60 | 8 / 4.60 ≈ 1.74 |
| 3 | 10.35 | 8 / 10.35 ≈ 0.77 |
Counterintuitive conclusion: after the snake gets thicker, walls become more efficient geometric pressure valves.
But thickness also makes navigation harder, increases accidental collisions, and reduces reachable space. That is the central tension of Serpent: size improves trimming efficiency while amplifying geometric risk.
3.2 Steady-state conditions
The mature long-term target is not N ≤ constant. It is:

P = N·q² ≤ P_max
| size | Suggested length cap | Pressure cap |
|---|---|---|
| 1 | N ≤ 5 | P ≤ 5 |
| 2 | N ≤ 3 | P ≤ 12 |
| 3 | N ≤ 2 or 3 | P ≤ 18 or 27 |
The control objective becomes: keep P(t) ≤ P_max at all times while maintaining E[ΔS] > 0.
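The safe set from the table above reduces to a membership test. A sketch, using the looser N ≤ 3 option for size 3; the cap values are the ones suggested in the table:

```python
# Caps from the table above (taking the looser N <= 3 option for size 3).
PRESSURE_CAP = {1: 5, 2: 12, 3: 27}

def in_safe_set(length: int, size: int) -> bool:
    """True when N * q**2 stays within the suggested cap for this size."""
    return length * size ** 2 <= PRESSURE_CAP[size]
```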
4. L1–L17: strategy evolution and implementation details
Each strategy level is not just a version number. It is a problem redefinition. The improvement is not that the code becomes more complicated; it is that the objective function becomes closer to the real structure of the game.
Greedy and A*: shortest-path thinking
*baseline / pathfinding*

L1 Greedy simply pursues the nearest or highest-return target, with no path planning.
It optimizes immediate gain and ignores future reachability, self-collision, pressure, and timeout margin. It is a useful baseline, but not a sustainable strategy.
L2 A* upgrades the problem into real path search:
Implementation notes:

- `open_set`: priority queue ordered by f(n) = g(n) + h(n)
- `g(n)`: actual path length from the head to cell n
- `h(n)`: Manhattan distance to the target
- `blocked`: snake body + non-passable obstacles
- `neighbors`: four directions, with wrap-around if the engine allows it
A* is better than Greedy because it respects static blockers. But it still answers only one question: can I reach the target now? It does not answer: after I eat this target, will I still have an exit?
Baseline evidence. Useful for showing what pure shortest-path planning misses.
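The notes above can be made concrete. A self-contained grid A* sketch under my own assumptions: cells are (x, y) tuples, the grid has fixed bounds, and wrap-around is omitted. This is illustrative, not the engine's code:

```python
import heapq

def astar(start, goal, blocked, width, height):
    """Grid A* matching the notes above: f = g + h, Manhattan heuristic.
    Returns the cell path from start to goal, or None if unreachable."""
    def h(cell):
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_set = [(h(start), 0, start)]      # entries are (f, g, cell)
    came_from = {}
    g = {start: 0}
    while open_set:
        _, gc, cur = heapq.heappop(open_set)
        if cur == goal:                    # reconstruct the path
            path = [cur]
            while cur in came_from:
                cur = came_from[cur]
                path.append(cur)
            return path[::-1]
        if gc > g[cur]:                    # stale queue entry, skip
            continue
        x, y = cur
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if not (0 <= nxt[0] < width and 0 <= nxt[1] < height):
                continue
            if nxt in blocked:
                continue
            ng = gc + 1
            if ng < g.get(nxt, float("inf")):
                g[nxt] = ng
                came_from[nxt] = cur
                heapq.heappush(open_set, (ng + h(nxt), ng, nxt))
    return None
```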
Lookahead / Chase Tail: safety after eating
*reachability-constrained planning*

L3a adds the first important safety test: before committing to food, simulate the path to the food, then check whether the new head can still reach the new tail.
Implementation notes:

```python
def safe_to_eat(game, food):
    path = astar(game.head, food.position, game.blocked_cells)
    if not path:
        return False
    future = simulate(game, path)
    return bfs_reachable(future.head, future.tail, future.blocked_cells)
```
The tail is a moving exit. If the head can still reach the tail after eating, the agent has at least one dynamic escape route. The limitation is that it still evaluates one food at a time and does not maximize future space.
Evidence for the tail-reachability idea.
Economic Decision: a wall is not an obstacle, it is a price
*resource-cost optimization*

This is the first major conceptual turn. A wall is not an absolute blocker. It is a passable cell with a cost.
Normal food reached through a fragile wall is still profitable:

10 − 5 = +5

A big bonus can remain profitable even after hitting the most expensive wall:

50 − 15 = +35
Implementation notes:

```python
def cell_cost(cell):
    if cell.is_empty:
        return 1
    if cell.is_fragile_wall:
        return 1 + wall_weight * 5
    if cell.is_brick_wall:
        return 1 + wall_weight * 10
    if cell.is_iron_wall:
        return 1 + wall_weight * 15
    return INF
```
The failure mode is also important: the score ledger says bonuses are valuable, but the geometry ledger says size growth can destroy future reachability.
Use this to show wall-as-price behavior, not as a best-score proof.
NoGrow: why a big bonus can become a liability
*score-rush / q=1 meta*

NoGrow keeps the useful part of A*/economic planning but refuses the big bonus. The strict version treats bonus cells as forbidden.
This simplifies the geometry dramatically. In short-term score-rush mode, preserving mobility can be worth more than the bonus's +50.
Implementation notes:

```python
def food_score(food, game):
    if food.type == "big_reward":
        return -INF  # NoGrow-Strict: never take the bonus
    return 10 - distance_penalty(food)
```
The core lesson is not “bonuses are bad.” It is narrower and more precise: in the short-term score-rush meta, size growth can be a negative asset because it increases pressure too quickly.
seed0 reaches 1500. A key stage ceiling.
Space: after eating, how much room remains?
*future feasible volume*

L5 asks a better question than "can I reach the food?" It asks: after I eat the food, how much reachable space do I still have?
Implementation notes (a breadth-first flood fill; the original sketch popped from the end of a list, which is depth-first order — harmless for counting, but a `deque` matches the BFS the text describes):

```python
from collections import deque

def reachable_area(game, start):
    """BFS flood fill counting cells reachable from `start`."""
    queue = deque([start])
    visited = {start}
    while queue:
        cell = queue.popleft()
        for nxt in neighbors(cell):
            if nxt not in visited and not blocked(nxt):
                visited.add(nxt)
                queue.append(nxt)
    return len(visited)
```
This is close to preserving a controllable region. It is no longer only about immediate path length; it is about future feasible state volume.
seed0 reaches 1500. A strong space-preservation proof case.
TailGuard and Depth-2: the score-rush peak
*hard safety constraint + shallow planning*

L6 turns tail reachability from a bonus into a hard safety constraint:
L7 adds a second visible-food lookahead:
Implementation notes:

```python
def choose_food_tailguard(game):
    candidates = []
    for food in visible_foods(game):
        path = astar(game.head, food.position, game.blocked_cells)
        if not path:
            continue
        future = simulate_path(game, path)
        if can_reach_tail(future):
            candidates.append((food, path, evaluate(food, path, future)))
    return best(candidates)
```
L6/L7 represent the strongest short-term score-rush line: reject bonuses, keep size at 1, eat normal food, and verify that the post-eating ecosystem is still alive.
seed0 reaches 2190. The strongest score-rush evidence.
seed0 reaches 1660. Useful for demonstrating shallow two-step planning.
Failed branches: TailStall and CycleCruise
*negative evidence*

L11 TailStall
Hypothesis: when not eating, follow the tail to stall safely. Failure: local tail reachability is not a sufficient condition for long-term shape health. The body can still fold into a bad topology.
L12 CycleCruise / Hamiltonian
Hypothesis: a fixed Hamiltonian-style cycle prevents self-collision. Failure: Serpent has timeout, random food, dynamic obstacles, hazards, and bonuses. A fixed cruise can be too slow and die by timeout.
seed0 reaches about 1010, then self-collision. Good failure evidence.
seed0 reaches 30, then timeout. Good failure evidence.
Long-term control: from slowing a long snake to small-length steady states
*endurance-control*

The long-control line begins with a simple positive-return loop: eat a normal food (+10, +1 segment), then trim that segment away on a fragile wall (−5, −1 segment), netting +5 per cycle with no length change.

So in theory:

E[ΔN] = 0 and E[ΔS] > 0 can be sustained indefinitely, subject only to the timeout clock.
L8 tries a fixed target length. L9 relaxes strict equilibrium into small positive drift. L10/L13 hybridize this with safer eating. L14 creates the real turn: instead of trying to stabilize a dangerous long snake, keep the system inside a small-length safe set.
L16/L17 then overturn the strong NoGrow assumption: the agent does not need to stay size 1 forever. If pressure is small, even size 3 can remain stable.
Long-control state machine:

```python
if timeout_soon:
    seek_safe_food_or_bonus()
elif pressure_high:
    trim_by_wall_or_hazard()
elif pressure_low:
    take_profitable_food()
else:
    safe_wander_or_follow_tail()
```
5000-tick clipped segment. Evidence for the small-thick-snake steady mode.
Long-run steady-state evidence. This is not a full 106-minute replay.
5. Hazard correction
A hazard has this effect: −20 score, −2 segments, and no τ reset.

So its cost per trimmed segment is:

20 / 2 = 10
| Trimming resource | Cost per segment |
|---|---|
| Fragile wall | 5 |
| Random obstacle | 6.96 |
| Iron wall | 7.5 |
| Brick wall | 10 |
| Hazard | 10 |
Hazards are not high-quality trimming resources. Their value is emergency relief: they cut two segments at once, and they can substitute for walls if no safe wall is nearby.
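The ranking in the table falls out of a one-line sort over cost-per-segment ratios (the dictionary keys are illustrative labels):

```python
# Cost per trimmed segment for each trimming resource (Section 5 table).
TRIM_COST = {
    "fragile_wall": 5 / 1,
    "random_obstacle": 8 / 1.15,
    "iron_wall": 15 / 2,
    "brick_wall": 10 / 1,
    "hazard": 20 / 2,
}

ranked = sorted(TRIM_COST, key=TRIM_COST.get)  # cheapest trimming first
```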
6. L18: theoretical design for PressureControl-Profit
The next strategy should not be called MinLength-Profit. A more accurate name is PressureControl-Profit: the controlled quantity is pressure, not length.

Objective: maximize E[ΔS] subject to P = N·q² ≤ P_max and τ < 30 s.
6.1 State machine
LOW_PRESSURE:
- eat normal food freely
- take a big bonus if its path is safe and the reward is worth the pressure shock

MEDIUM_PRESSURE:
- prefer normal food
- avoid the big bonus unless timeout pressure is high

HIGH_PRESSURE:
- stop taking rewards
- seek fragile/iron walls for trimming
- use a hazard as emergency trimming

TIMEOUT_PRESSURE:
- ignore pressure temporarily; eat the nearest safe food or bonus

RECOVERY_AFTER_BONUS:
- if size just increased, immediately reduce pressure
6.2 Per-action value function
Each action is valued as V(a) = ΔS(a) − λ·ΔP(a), where ΔP(a) is the action's change in geometric pressure. The pressure weight λ should increase with current pressure, for example:

λ(P) = λ0 · P / P_max (or any monotone schedule)
The higher pressure gets, the more the controller should value trimming. When pressure is low, it can afford to eat and profit.
6.3 Action valuation
Normal food: V(F) = 10 − λ·q² (one new segment adds q² cells of pressure)

Big bonus: V(B) = 50 − λ·ΔP, where ΔP covers the new segment plus any size jump from q to q+1

Wall hit: V(W1) = −5 + λ·q², V(W2) = −10 + λ·q², V(W3) = −15 + λ·2q² (trimming removes pressure)

Hazard: V(H) = −20 + λ·2q²
A hazard is only worth eating when λ is large: when pressure is high enough that emergency trimming matters more than the score penalty.
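A sketch of the λ-weighted valuation in action. The linear λ ramp and the gain λ0 = 2.5 are illustrative choices, not tuned values; the point is only the crossover between eating and trimming as pressure rises:

```python
def action_value(delta_score, delta_pressure, lam):
    """V(a) = ΔS(a) - λ·ΔP(a): score gain minus pressure-weighted geometry cost."""
    return delta_score - lam * delta_pressure

def lam_schedule(pressure, p_max, lam0=2.5):
    """Pressure weight rising with current pressure (illustrative linear ramp)."""
    return lam0 * pressure / p_max

q = 2                                # current size
dP_eat, dP_wall = q * q, -(q * q)    # one segment gained vs. one trimmed

lam_low = lam_schedule(4, 27)        # nearly empty board
lam_high = lam_schedule(24, 27)      # close to the pressure cap

# At low pressure eating wins; near the cap a fragile-wall hit wins.
eat_low = action_value(10, dP_eat, lam_low)
wall_low = action_value(-5, dP_wall, lam_low)
eat_high = action_value(10, dP_eat, lam_high)
wall_high = action_value(-5, dP_wall, lam_high)
```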
7. Nine problem redefinitions
| Stage | Problem frame | Model | Key change |
|---|---|---|---|
| 1 | Snake AI problem | minimize distance to food | Greedy / A* |
| 2 | Safe pathfinding | Reachable(head', tail') = true | Eating must preserve an exit |
| 3 | Economic problem | Net = reward - cost | Walls are paid trimming |
| 4 | NoGrow meta | q=1 fixed | Short-term score-rush rejects bonus |
| 5 | Steady-state control | E[ΔN]=0, E[ΔS]>0 | Wall hits cancel food growth |
| 6 | Low positive drift | 0≤E[ΔN]≤ε | Strict cancellation is unnecessary |
| 7 | Small-length safe set | A={N small} | Do not stabilize a long snake |
| 8 | Pressure control | P=Nq² | Control geometric pressure |
| 9 | Absorb bonus | q∈{1,2,3}, P≤Pmax | Thickness is fine if pressure stays controlled |
The real value of this research cycle is that it did not stop at “which AI scored higher.” It kept asking: what is the long-term dynamical system inside this game?
8. What can be claimed, and what cannot
8.1 Weak proposition: Serpent admits long-run positive-return pressure-controlled strategies
Under the following conditions:

- the agent can find normal food or a bonus before τ < 30 s is violated;
- the agent can find walls or hazards as trimming resources at finite cost;
- the agent controls pressure so that P = N·q² stays near an upper bound;
- q ≤ 3, so size is bounded;
- obstacles and hazards do not create a permanently unreachable dead state.
Then simulation evidence and positive expected-value accounting support strategies with:

E[ΔS] > 0 over long horizons, with P bounded and τ never expiring.
In plain language: as long as the controller can keep bringing geometric pressure back into the safe region, score can keep drifting upward.
8.2 The stronger theorem we cannot claim yet
We cannot honestly claim a theorem of the form: for every map seed, the controller survives indefinitely with unbounded expected score.
The reasons are concrete: map generation is random; food, bonus, and hazard locations can be bad for a while; the agent is not doing perfect dynamic programming; timeout is a hard constraint; size 3 reduces reachability; true occupied cells are more complex than Nq²; and wall-collision accounting may still need calibration.
9. Replay citation review
The article does not embed all 22 recordings. It uses selected replays as evidence for specific claims.
| Use in article | Replay | Judgment |
|---|---|---|
| baseline | astar_seed0.json | Not a high-score run, but useful for showing shortest-path limitations. |
| wall economy | economic_seed0.json | Shows walls as passable priced cells; not used as best-score evidence. |
| NoGrow stage ceiling | no_grow_strict_seed0.json | 1500. Core evidence. |
| space scoring | l5_space_seed0.json | 1500. Good article evidence. |
| score-rush peak | l6_tail_guard_seed0.json | 2190. The most important score-rush replay. |
| Depth-2 planning | l7_depth2_seed0.json | 1660. Useful for showing shallow two-step planning. |
| failed branches | l11_tail_stall_seed0.json / l12_cycle_seed0.json | Accurate negative evidence. |
| pressure steady state | l16_min_length_all_size_seed0.json / l17_endurance_seed1.json | Used as steady-state segments, not full-game records. |
The remaining recordings stay in the replay library as appendices. Long-run or clipped recordings such as L8/L9/L10/L13 should not be cited as final-score proof; they are better used as supplementary strategy evidence.