Section 1 · What this is
Replacing the blueprint with a neural value function
The other Leduc pages train a tabular CFR blueprint and use it both to track ranges and as Villain's strategy during re-solving. DeepStack does neither. Instead, it trains a deep counterfactual value network that predicts equilibrium values at a depth limit. The network replaces "recurse to terminal" with a learned approximation: a fast intuition that says "given this board and these ranges, here's what each hand is worth."
This page demonstrates that the architecture works at Leduc scale:
- Train a small MLP on 10,000+ randomly generated round-2 situations
- Compare its predictions against ground truth from an exact round-2 solver
- If predictions match, DeepStack-style depth-limited solving is feasible
Section 2 · Network architecture
Small MLP — features in, counterfactual values out
Inputs: board card one-hot (3) · P1 range over J/Q/K (3) · P2 range over J/Q/K (3) · pot fraction (1), for 10 features in total. Outputs: cfv_P1(J/Q/K) and cfv_P2(J/Q/K), for 6 values in total. The zero-sum layer post-processes the raw output so that the two players' range-weighted CFVs sum to zero (a property of the equilibrium in zero-sum games), exactly as in DeepStack's outer network.
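A minimal sketch of this forward pass in plain Python follows. The hidden width (32), the ReLU activation, the weight initialization, and every function name are assumptions for illustration; the page does not specify them. The zero-sum step assumes each input range sums to 1 and reads "sum to zero" as the range-weighted CFVs of the two players cancelling: half of the violation is subtracted from every output.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def init_layer(n_out, n_in, rng):
    # Small random weights, zero biases (assumed initialization).
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def linear(layer, x):
    weights, biases = layer
    return [dot(row, x) + b for row, b in zip(weights, biases)]

def zero_sum_layer(raw_p1, raw_p2, range_p1, range_p2):
    # Range-weighted value of each player under the raw predictions.
    value_p1 = dot(range_p1, raw_p1)
    value_p2 = dot(range_p2, raw_p2)
    # Subtract half of the violation from every hand of both players.
    err = 0.5 * (value_p1 + value_p2)
    return [c - err for c in raw_p1], [c - err for c in raw_p2]

def forward(params, board_onehot, range_p1, range_p2, pot_frac):
    # 3 + 3 + 3 + 1 = 10 input features.
    x = list(board_onehot) + list(range_p1) + list(range_p2) + [pot_frac]
    hidden = [max(0.0, h) for h in linear(params[0], x)]  # ReLU (assumed)
    raw = linear(params[1], hidden)  # 6 raw outputs: 3 per player
    return zero_sum_layer(raw[:3], raw[3:], range_p1, range_p2)

rng = random.Random(0)
params = [init_layer(32, 10, rng), init_layer(6, 32, rng)]
cfv_p1, cfv_p2 = forward(params, [0, 1, 0], [0.2, 0.3, 0.5], [0.6, 0.3, 0.1], 0.4)
```

Because the correction is applied after the last linear layer, the constraint holds for any weights, including an untrained network; training only has to learn the shape of the values.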
Section 3 · Training
Random round-2 situations → exact CFV → MLP regression
Phase 1 — Generate dataset (~3 sec). Generate 600 random (board, P1-range, P2-range, pot) tuples and compute ground-truth CFVs by solving each round-2 game exactly via tabular CFR. Each example takes ~2 ms to solve.
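The data-generation phase might look roughly like the sketch below. The sampling choices (normalized uniform draws for ranges, a pot fraction drawn from [0.1, 1.0]) and all names are assumptions, and the sketch ignores card-removal effects between the board and the ranges. The exact solver is not reproduced here: `solve_exact` is a hypothetical callable that takes a situation and returns the six target CFVs.

```python
import random

CARDS = ["J", "Q", "K"]

def random_range(rng):
    # Normalize three positive draws so the range sums to 1.
    weights = [rng.random() + 1e-9 for _ in CARDS]
    total = sum(weights)
    return [w / total for w in weights]

def random_situation(rng):
    board = rng.choice(CARDS)
    pot_frac = rng.uniform(0.1, 1.0)  # assumed sampling interval
    return board, random_range(rng), random_range(rng), pot_frac

def encode(board, range_p1, range_p2, pot_frac):
    # Board one-hot (3) + P1 range (3) + P2 range (3) + pot fraction (1).
    onehot = [1.0 if card == board else 0.0 for card in CARDS]
    return onehot + list(range_p1) + list(range_p2) + [pot_frac]

def generate_dataset(n_examples, solve_exact, seed=0):
    # solve_exact is a hypothetical stand-in for the exact round-2 solver.
    rng = random.Random(seed)
    dataset = []
    for _ in range(n_examples):
        situation = random_situation(rng)
        features = encode(*situation)
        targets = solve_exact(*situation)  # six CFVs: 3 per player
        dataset.append((features, targets))
    return dataset
```

Caching the `(features, targets)` pairs once means the solver cost is paid only during generation, and training can resample the same examples many times.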
Phase 2 — Train MLP (~10 sec). Sample 32-example batches from the cached dataset and take Adam steps on MSE loss with linear learning-rate decay. About 12,000 steps drive test MSE to ~0.3 chips² (RMSE ~0.5 chips) on fresh spots.
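The optimizer loop can be sketched as follows. To keep the gradient hand-written and the block short, a linear model with a scalar target stands in for the MLP, so this illustrates only the batch sampling, the Adam update, and the linear decay schedule, not the network itself. The base learning rate, the Adam constants, and all names are assumptions.

```python
import math
import random

def linear_lr(step, total_steps, lr0):
    # Learning rate decays linearly from lr0 toward 0.
    return lr0 * (1.0 - step / total_steps)

def mse_and_grad(theta, batch):
    # Stand-in model: prediction = theta . x, scalar target y.
    n = len(batch)
    loss, grad = 0.0, [0.0] * len(theta)
    for x, y in batch:
        err = sum(t * xi for t, xi in zip(theta, x)) - y
        loss += err * err / n
        for i, xi in enumerate(x):
            grad[i] += 2.0 * err * xi / n
    return loss, grad

def adam_step(theta, grad, m, v, t, lr, b1=0.9, b2=0.999, eps=1e-8):
    # Standard Adam update with bias correction; t starts at 1.
    for i, g in enumerate(grad):
        m[i] = b1 * m[i] + (1.0 - b1) * g
        v[i] = b2 * v[i] + (1.0 - b2) * g * g
        m_hat = m[i] / (1.0 - b1 ** t)
        v_hat = v[i] / (1.0 - b2 ** t)
        theta[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)

def train(dataset, n_params, steps=2000, batch_size=32, lr0=0.02, seed=0):
    rng = random.Random(seed)
    theta = [0.0] * n_params
    m, v = [0.0] * n_params, [0.0] * n_params
    for t in range(1, steps + 1):
        # Sample a batch (with replacement) from the cached dataset.
        batch = [rng.choice(dataset) for _ in range(batch_size)]
        _, grad = mse_and_grad(theta, batch)
        adam_step(theta, grad, m, v, t, linear_lr(t - 1, steps, lr0))
    return theta
```

For the numbers quoted above, RMSE is the square root of MSE, so a test MSE of about 0.3 chips² corresponds to an RMSE of about 0.55 chips.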
Section 4 · Evaluation
Network prediction vs exact ground truth at a random spot
Section 5 · How DeepStack uses the value net
Continual re-solving with depth-limited lookahead
Once the value network is trained, DeepStack uses it as the leaf evaluator in a depth-limited subgame solver. Every time it's the bot's turn:
1. Build a subgame tree rooted at the current public state, restricted to a few actions (fold, call, 2–3 bet sizes, all-in)
2. Run CFR for some iterations on this tree
3. At the depth limit (e.g., end of the current betting round), instead of recursing, query the value network for the CFV vector
4. Use the predicted CFVs as terminal values in the regret update
5. Read off the strategy at the root after CFR converges
6. Carry forward the bot's own range (Bayes-updated) and the opponent's CFVs to the next decision
This page validates only step (3): that a small MLP can learn to predict round-2 CFVs accurately enough to substitute for tree-walk recursion. Wiring the trained net into a full continual re-solver is the natural follow-up page.
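Two small pieces of the re-solving loop described above can be sketched in isolation: the regret-matching rule that CFR uses to turn cumulative regrets into a strategy, and the Bayes update that carries the bot's own range forward after it acts. This is an illustrative sketch, not the re-solver itself; the function names, the dictionary layout, and the fallback when an action has zero probability are assumptions.

```python
def regret_matching(regrets):
    # Play each action in proportion to its positive cumulative regret.
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0.0:
        return [p / total for p in positive]
    # No positive regret: fall back to the uniform strategy.
    return [1.0 / len(regrets)] * len(regrets)

def bayes_update_range(prior, strategy, action):
    # prior[hand]: probability of holding `hand` before acting.
    # strategy[hand][action]: probability that `hand` takes `action`.
    unnormalized = {h: p * strategy[h][action] for h, p in prior.items()}
    total = sum(unnormalized.values())
    if total == 0.0:
        # Action had zero probability under the strategy; keep the prior (assumed fallback).
        return dict(prior)
    return {h: w / total for h, w in unnormalized.items()}

prior = {"J": 1 / 3, "Q": 1 / 3, "K": 1 / 3}
strategy = {
    "J": {"bet": 0.1, "check": 0.9},
    "Q": {"bet": 0.5, "check": 0.5},
    "K": {"bet": 0.9, "check": 0.1},
}
posterior = bayes_update_range(prior, strategy, "bet")
```

In this example the posterior after a bet shifts weight toward K (0.6) and away from J (about 0.067), which is the range the next re-solve would start from.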