Leduc DeepStack — Value Network POC

No blueprint. Just a neural value function trained to predict round-2 CFVs, the DeepStack way.

Section 1 · What this is

Replacing the blueprint with a neural value function

The other Leduc pages train a tabular CFR blueprint and use it for both range tracking and as Villain's strategy during re-solving. DeepStack does neither. Instead, it trains a deep counterfactual value network that predicts equilibrium values at a depth limit. The network replaces "recurse to terminal" with a learned approximation — a fast intuition that says "given this board and these ranges, here's what each hand is worth."

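This page demonstrates that the architecture works at Leduc scale: it generates random round-2 situations, solves each one exactly, trains a small MLP on the results, and compares the network's predictions with the exact solver.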

Why this matters. At HUNL scale, recursing to the river inside a subgame is too expensive. DeepStack's value network turns a 10^17 tree walk into a single forward pass. Leduc is small enough to solve without it, but showing that the network works here is a sanity check on the architecture.

Section 2 · Network architecture

Small MLP — features in, counterfactual values out

Input (10): board · ranges · pot → Hidden 1: 64 · ReLU → Hidden 2: 32 · ReLU → Zero-sum: enforce constraint → Output (6): CFV per (player, card)

Inputs: board card one-hot (3) · P1 range over J/Q/K (3) · P2 range over J/Q/K (3) · pot fraction (1). Outputs: cfv_P1(J/Q/K) and cfv_P2(J/Q/K). The zero-sum layer post-processes the raw output so that the range-weighted CFVs of the two players sum to zero (a property of equilibrium values in zero-sum games), as in DeepStack's outer network.
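A minimal sketch of the forward pass described above, in plain Python. The layer widths and input layout come from this section; the weight initialisation, the helper names (`init_params`, `dense`, `forward`), and the exact form of the zero-sum correction (subtracting half of the range-weighted residual from every output) are assumptions rather than this page's actual code. The correction only cancels exactly when each input range sums to 1.

```python
import math
import random

SIZES = (10, 64, 32, 6)  # layer widths from the diagram above

def init_params(rng, sizes=SIZES):
    """Random weights and zero biases; the He-style scale is an assumption."""
    params = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = math.sqrt(2.0 / n_in)
        W = [[rng.gauss(0.0, scale) for _ in range(n_out)] for _ in range(n_in)]
        params.append((W, [0.0] * n_out))
    return params

def dense(x, W, b):
    """Affine layer: x (n_in) times W (n_in x n_out) plus b (n_out)."""
    return [sum(xi * row[j] for xi, row in zip(x, W)) + b[j] for j in range(len(b))]

def forward(params, x):
    """x = board one-hot (3) + P1 range (3) + P2 range (3) + pot fraction (1)."""
    h = x
    for W, b in params[:-1]:
        h = [max(v, 0.0) for v in dense(h, W, b)]   # ReLU hidden layers
    raw = dense(h, *params[-1])                     # 6 raw CFVs: P1 J/Q/K, then P2 J/Q/K
    ranges = x[3:9]                                 # aligned index-for-index with raw
    # Zero-sum correction (assumed form): measure the range-weighted sum over
    # both players, then subtract half of it from every output.
    err = sum(r * v for r, v in zip(ranges, raw))
    return [v - 0.5 * err for v in raw]

# Example input: board = Q, two arbitrary ranges, pot fraction 0.25.
params = init_params(random.Random(0))
x = [0.0, 1.0, 0.0, 0.5, 0.0, 0.5, 0.2, 0.3, 0.5, 0.25]
cfv = forward(params, x)
```

Because the two ranges together carry total weight 2, subtracting half of the residual from each output removes it exactly, so the corrected outputs satisfy the constraint whatever the raw network produced.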

Section 3 · Training

Random round-2 situations → exact CFV → MLP regression


Phase 1 — Generate dataset (~3 sec). Generate 600 random (board, P1-range, P2-range, pot) tuples and compute ground-truth CFVs by solving each round-2 game exactly via tabular CFR. Each example is ~2 ms.
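The input side of Phase 1 might look like the sketch below. The sampling distributions, the `MAX_POT` normaliser, and the function names are all hypothetical, since the page does not say how spots are drawn or how the pot fraction is defined. The exact CFR solve that produces the training labels is not shown.

```python
import random

CARDS = ("J", "Q", "K")
MAX_POT = 26.0  # hypothetical normaliser for the pot-fraction feature

def random_range(rng):
    """Probability vector over J/Q/K, uniform on the simplex (normalised Exp(1) draws)."""
    w = [rng.expovariate(1.0) for _ in CARDS]
    total = sum(w)
    return [v / total for v in w]

def random_spot(rng):
    """One (board, P1-range, P2-range, pot) tuple; the distributions are assumptions."""
    board = rng.randrange(len(CARDS))
    pot = rng.uniform(2.0, MAX_POT)
    return board, random_range(rng), random_range(rng), pot

def encode(board, r1, r2, pot):
    """Flatten a spot into the 10 network inputs."""
    one_hot = [1.0 if i == board else 0.0 for i in range(len(CARDS))]
    return one_hot + r1 + r2 + [pot / MAX_POT]

rng = random.Random(0)
inputs = [encode(*random_spot(rng)) for _ in range(600)]
```

Each encoded row would then be paired with the six CFVs returned by the exact solver to form one training example.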
Phase 2 — Train MLP (~10 sec). Sample 32-example batches from the cached dataset and take Adam steps on the MSE loss with linear learning-rate decay. About 12,000 steps drive test MSE to ~0.3 chips² (RMSE ~0.5 chips) on fresh spots.
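The optimiser loop in Phase 2 can be illustrated on a toy problem. This sketch keeps the pieces named above (32-example minibatches, MSE loss, Adam, linear learning-rate decay) but fits a two-parameter line instead of the MLP so that it stays short. The initial learning rate, the Adam constants, the step count, and the toy dataset are assumptions, not the page's settings.

```python
import math
import random

def linear_lr(step, total_steps, lr0):
    """Learning rate that falls linearly from lr0 at step 0 to 0 at total_steps."""
    return lr0 * (1.0 - step / total_steps)

def adam_step(theta, grad, m, v, t, lr, b1=0.9, b2=0.999, eps=1e-8):
    """In-place Adam update on flat lists; t is the 1-based step count."""
    for i, g in enumerate(grad):
        m[i] = b1 * m[i] + (1.0 - b1) * g
        v[i] = b2 * v[i] + (1.0 - b2) * g * g
        m_hat = m[i] / (1.0 - b1 ** t)      # bias-corrected first moment
        v_hat = v[i] / (1.0 - b2 ** t)      # bias-corrected second moment
        theta[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)

def train(data, total_steps=2000, batch_size=32, lr0=0.05, seed=0):
    """Fit y ~ w*x + b by minibatch MSE regression; a stand-in for the MLP."""
    rng = random.Random(seed)
    theta, m, v = [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]
    for step in range(total_steps):
        batch = rng.sample(data, batch_size)
        grad = [0.0, 0.0]
        for x, y in batch:
            err = theta[0] * x + theta[1] - y
            grad[0] += 2.0 * err * x / batch_size   # d(MSE)/dw
            grad[1] += 2.0 * err / batch_size       # d(MSE)/db
        adam_step(theta, grad, m, v, step + 1, linear_lr(step, total_steps, lr0))
    return theta

# Toy dataset: 600 noiseless points on the line y = 2x + 1.
data_rng = random.Random(1)
data = [(x, 2.0 * x + 1.0) for x in (data_rng.uniform(-1.0, 1.0) for _ in range(600))]
w, b = train(data)
```

For the real network, `theta` would hold the flattened MLP weights and `grad` would come from backpropagation; the schedule and update rule stay the same.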

Section 4 · Evaluation

Network prediction vs exact ground truth at a random spot

Click "Train value net", then "New random spot" to compare the network prediction with the ground truth from the exact solver.

Section 5 · How DeepStack uses the value net

Continual re-solving with depth-limited lookahead

Once the value network is trained, DeepStack uses it as the leaf evaluator in a depth-limited subgame solver. Every time it's the bot's turn:

  1. Build a subgame tree rooted at the current public state, restricted to a few actions (fold, call, 2–3 bet sizes, all-in)
  2. Run CFR for some iterations on this tree
  3. At the depth limit (e.g., end of the current betting round), instead of recursing, query the value network for the CFV vector
  4. Use the predicted CFVs as terminal values in the regret update
  5. Read off the strategy at the root after CFR converges
  6. Carry forward own range (Bayes-updated) and opp CFVs to the next decision
The whole game without a blueprint. No precomputed strategy table. No card abstraction at decision time. No action translation. Just: "subgame solver + value network at the depth limit + range/CFV state passed between decisions." That's DeepStack.
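Steps 2 to 6 above can be sketched on a deliberately tiny case: a single decision node whose actions all lead straight to a depth-limit leaf. This is a toy regret-matching loop, not a full CFR re-solver. In a real re-solver the leaf evaluator returns a CFV vector per hand that depends on both ranges and is re-queried as the ranges change between iterations. `leaf_value` is a hypothetical stand-in for the value network, and all function names here are illustrative.

```python
def regret_matching(regrets):
    """Strategy proportional to positive regret; uniform when none is positive."""
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    if total > 0.0:
        return [p / total for p in pos]
    return [1.0 / len(regrets)] * len(regrets)

def solve_root(actions, leaf_value, iters=100):
    """Toy one-node lookahead: every action leads straight to a depth-limit leaf."""
    regrets = [0.0] * len(actions)
    strategy_sum = [0.0] * len(actions)
    for _ in range(iters):
        sigma = regret_matching(regrets)
        values = [leaf_value(a) for a in actions]       # step 3: query the leaf evaluator
        ev = sum(s * u for s, u in zip(sigma, values))
        for i, u in enumerate(values):
            regrets[i] += u - ev                        # step 4: regret update
            strategy_sum[i] += sigma[i]
    total = sum(strategy_sum)
    return [s / total for s in strategy_sum]            # step 5: average strategy

def bayes_update(own_range, strategy, action):
    """Step 6: posterior over own hands after playing `action`.
    strategy[h][action] is P(action | hand h)."""
    joint = [p * strategy[h][action] for h, p in enumerate(own_range)]
    total = sum(joint)
    if total == 0.0:
        raise ValueError("action has zero probability under this range and strategy")
    return [j / total for j in joint]

# Fixed toy leaf values; a real evaluator would be the trained network.
toy_values = {"fold": 0.0, "call": 1.0, "bet": 0.5}
root_strategy = solve_root(["fold", "call", "bet"], toy_values.get)

# J never bets, Q bets half the time, K always bets; observe a bet (action 1).
posterior = bayes_update([1 / 3, 1 / 3, 1 / 3], [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]], 1)
```

With fixed leaf values the average strategy concentrates on the best action, and the Bayes update removes the hands that never take the observed action from the range.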

This page only validates step (3) — that a small MLP can learn to predict round-2 CFVs accurately enough to substitute for tree-walk recursion. Wiring the trained net into a full continual re-solver is the natural follow-up page.