Section 1 · The game
Leduc-NL — same cards as Leduc, no-limit bet sizes
Same six cards (J Q K, two of each), same single public board. The change: instead of a fixed $2/$4 bet, the actor picks a bet size from a menu of pot fractions. This is the simplest possible toy version of "no-limit" betting, and it forces the algorithm to confront action abstraction for the first time.
Stack
10 chips each
+ 1 chip ante (2 in pot)
Bet menu
x · h · p · t · a
check, ½ pot, pot, 2× pot, all-in
Facing a bet
f · c
fold or call (no raise — kept simple)
Sizes that exceed stack
Collapse into all-in
If p = stack, only a is offered
Streets
Two — preflop & flop
Board card revealed between
Showdown
Pair beats high card
Pair = private rank matches board
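The showdown rule above can be sketched in a few lines. This is a minimal illustration, not the page's own code: the function name `showdown_winner` and the `-1` tie code are assumptions, and splitting the pot when both players hold the same rank follows standard Leduc rather than anything stated here.

```python
# Rank order for the three card ranks in the deck.
RANKS = {"J": 0, "Q": 1, "K": 2}


def showdown_winner(card0: str, card1: str, board: str) -> int:
    """Return 0 or 1 for the winning player, or -1 for a tie (assumed split pot)."""
    pair0 = card0 == board
    pair1 = card1 == board
    # With two cards per rank and one of them on the board,
    # at most one player can hold a pair.
    if pair0 != pair1:
        return 0 if pair0 else 1
    # No pair on either side: the higher private card wins.
    if RANKS[card0] == RANKS[card1]:
        return -1
    return 0 if RANKS[card0] > RANKS[card1] else 1
```

For example, a J on a J board beats a K, because the pair outranks any high card.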
What this teaches. Each decision now has up to 5 actions instead of 2–3. Some bet sizes become illegal as stacks shrink — your action set is state-dependent. The infoset count roughly doubles (288 → 552), but convergence per iteration is essentially identical to fixed-bet Leduc — see Section 5. What grows is wall-clock time per iter, not iteration count.
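The state-dependent action set can be sketched as a single function of pot and stack. This is a hedged sketch under stated assumptions: the name `legal_actions`, the `to_call` parameter, and the use of fractional chips (no rounding) are illustrative choices, not taken from the page's implementation.

```python
# Pot fractions for the sized bets in the menu (h, p, t); "a" is handled separately.
BET_FRACTIONS = (("h", 0.5), ("p", 1.0), ("t", 2.0))


def legal_actions(pot: float, stack: float, to_call: float = 0.0) -> dict[str, float]:
    """Map each legal action code to the chips it commits.

    Assumes `stack` is the actor's remaining chips and `to_call` is the
    outstanding bet (0 when nobody has bet yet).
    """
    if to_call > 0:
        # Facing a bet: fold or call only, no raise.
        return {"f": 0.0, "c": float(min(to_call, stack))}
    actions = {"x": 0.0}
    for code, fraction in BET_FRACTIONS:
        size = fraction * pot
        if size < stack:  # sizes at or above the stack collapse into all-in
            actions[code] = size
    if stack > 0:
        actions["a"] = float(stack)
    return actions
```

Assuming 10 chips remain behind after the ante, `legal_actions(2, 10)` offers all five codes. With a pot of 10 and a stack of 5, the half-pot bet equals the stack, so only `x` and `a` remain.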
Section 2 · What changed from Leduc
The CFR algorithm is identical — only the action set changes
Leduc (fixed bets)
Bet menu: x · b · r
Two betting rounds, $2 then $4. Max 1 raise. ~288 infosets total. Strategy at any infoset is a length-2 or length-3 vector. Converges in seconds.
Leduc-NL (custom sizes)
Bet menu: x · h · p · t · a
Same two streets, same showdown. Bet size now depends on current pot, not a fixed amount. Stacks shrink as betting grows; oversized bets collapse into all-in. ~552 infosets (roughly 2× Leduc). Same CFR, same convergence rate per iteration.
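To see why the algorithm itself does not change, here is a minimal sketch of regret matching, the per-infoset update at the core of CFR. It is keyed by action code, so a 2-action and a 5-action infoset go through the same code path. The function name and the dict-based layout are assumptions for illustration.

```python
def regret_matching(cum_regret: dict[str, float]) -> dict[str, float]:
    """Turn cumulative regrets into a strategy over whatever actions are legal."""
    # Only positive regret earns probability mass.
    positive = {a: max(r, 0.0) for a, r in cum_regret.items()}
    total = sum(positive.values())
    if total > 0:
        return {a: r / total for a, r in positive.items()}
    # No positive regret anywhere: fall back to uniform over the legal actions.
    n = len(cum_regret)
    return {a: 1.0 / n for a in cum_regret}
```

The strategy vector is as long as the legal action set at that infoset, which is the only place the bet menu enters.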
Section 3 · Convergence
Exploitability over training iterations
Same exploitability metric as fixed-bet Leduc: the average of each player's best-response value against the opponent's average strategy. It drops at the standard O(1/√T) rate.
[Plot: exploitability (mbb/g) against training iterations, with a 1/√T theory reference curve · log scale on Y · linear on X]
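The two curves in the plot can be sketched as follows. This assumes you already have each player's best-response value in chips; the function names, and the choice of 1 chip (the ante) as the "big blind" for the mbb/g conversion, are assumptions rather than details given on this page.

```python
import math


def exploitability_mbb(br_value_p0: float, br_value_p1: float,
                       big_blind: float = 1.0) -> float:
    """Average of the two best-response values, in milli-big-blinds per game.

    br_value_pX is what player X wins per game (in chips) when best-responding
    to the opponent's average strategy. big_blind=1.0 is an assumed unit.
    """
    return 1000.0 * 0.5 * (br_value_p0 + br_value_p1) / big_blind


def sqrt_reference(t: int, t_anchor: int, value_anchor: float) -> float:
    """A 1/sqrt(T) reference curve pinned to one measured point."""
    return value_anchor * math.sqrt(t_anchor / t)
```

Pinning the reference to a measured point makes it easy to read off whether the measured curve is running ahead of or behind the theoretical rate.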
Section 4 · Average strategy
What sizings the bot picks at each infoset
| Player | Card | Round 1 history | Board | Round 2 history | Pot · Stack | Strategy |
|---|---|---|---|---|---|---|
Action codes. x check · h ½ pot · p pot · t 2× pot · a all-in · c call · f fold
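The Strategy column shows the average strategy, which CFR obtains by normalizing the accumulated strategy weights at each infoset. A minimal sketch, assuming a dict of per-action sums; the names `average_strategy` and `format_strategy`, and the exact display format, are illustrative rather than the page's own:

```python
def average_strategy(strategy_sum: dict[str, float]) -> dict[str, float]:
    """Normalize accumulated strategy weights into probabilities."""
    total = sum(strategy_sum.values())
    if total <= 0:
        # Infoset never reached with positive weight: show uniform.
        n = len(strategy_sum)
        return {a: 1.0 / n for a in strategy_sum}
    return {a: s / total for a, s in strategy_sum.items()}


def format_strategy(avg: dict[str, float], min_prob: float = 0.005) -> str:
    """Render as 'code prob' pairs, hiding actions that are almost never taken."""
    parts = [f"{a} {p:.2f}" for a, p in avg.items() if p >= min_prob]
    return " · ".join(parts)
```

It is the average strategy, not the current iteration's strategy, that converges toward equilibrium, so that is the one worth tabulating.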
Section 5 · What you should see
Sanity checks for "is this converging?"
- K opens with bigger bets than J. The strongest preflop hand has the most equity to protect — average bet size with K should be heavier than with J. If J ever bets, it's a bluff at the smallest size.
- Bet sizing on the flop tracks board interaction. A K on a K board is the nuts and should mostly bet large or go all-in (this game has no raise). A J on a Q board (no pair, weak) should mostly check or fold.
- Pure bluffs use small sizes; pure value uses large sizes. Same fold equity for less risk on bluffs; max value extraction on the nuts. The interesting frequencies sit in the middle.
- Exploitability falls at essentially the same rate as fixed-bet Leduc. Both games reach ~5 mbb/g at 10K iterations and ~2.5 mbb/g at 30K. Mid-training (500–3K iterations), NL trails by 10–20% but catches up by 10K. The 1/√T rate in CFR's bound does not depend on action count; the action count enters only through the constant in front. More actions means more places to update regret, not slower per-place convergence. The cost of NL is wall-clock time per iteration: ~3× the decision nodes per pass, so each iteration takes roughly 2–3× longer in seconds.
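Two of these checks are easy to automate. The sketch below uses hypothetical helper names: `mean_bet_fraction` compares average sizing between two hands, and `loglog_slope` estimates the empirical convergence exponent from two exploitability readings. How the all-in is expressed as a pot fraction is left to the caller and is an assumption of this sketch.

```python
import math

# Pot fraction for each sized action; all-in depends on the spot.
BET_FRACTION = {"x": 0.0, "h": 0.5, "p": 1.0, "t": 2.0}


def mean_bet_fraction(strategy: dict[str, float], all_in_fraction: float) -> float:
    """Expected bet size, in pot fractions, under an opening strategy."""
    return sum(
        p * (all_in_fraction if a == "a" else BET_FRACTION[a])
        for a, p in strategy.items()
    )


def loglog_slope(t1: float, e1: float, t2: float, e2: float) -> float:
    """Slope of log(exploitability) against log(iterations) between two points."""
    return math.log(e2 / e1) / math.log(t2 / t1)


# The two readings quoted above: ~5 mbb/g at 10K and ~2.5 mbb/g at 30K.
slope = loglog_slope(10_000, 5.0, 30_000, 2.5)
```

A pure 1/√T curve has slope -0.5. The two readings quoted above give roughly -0.63, slightly steeper than the reference, which is plausible because the bound is a worst case. For the sizing check, compute `mean_bet_fraction` for the K and J opening rows of the strategy table and confirm the K value is larger.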
Why this is the right step. By solving Leduc-NL you've now seen the exact problem NLH presents (a bet menu that can be anything, with state-dependent legal sets) at toy scale where you can debug it. The next step, leduc-nl-resolver.html, takes one specific spot from this game and re-solves it on demand with a custom bet set chosen at runtime. That's the architecture Pluribus and DeepStack actually run.