Leduc-NL — Tabular CFR with Custom Bet Sizes

Section 1 · The game

Leduc-NL — same cards as Leduc, no-limit bet sizes

Same six cards (J Q K, two of each), same single public board. The change: instead of a fixed $2/$4 bet, the actor picks a bet size from a menu of pot fractions. This is the simplest possible toy version of "no-limit" betting, and it forces the algorithm to confront action abstraction for the first time.

Stack: 10 chips each, plus a 1-chip ante (2 in pot)

Bet menu: x · h · p · t · a (check, ½ pot, pot, 2× pot, all-in)

Facing a bet: f · c (fold or call; no raise, kept simple)

Sizes that exceed stack: collapse into all-in. If a size such as p equals or exceeds the remaining stack, only a is offered

Streets: two, preflop & flop, with the board card revealed between

Showdown: pair beats high card. Pair = private rank matches board
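The showdown rule above is small enough to sketch directly. This is a minimal illustration, not the page's actual implementation: the function name `showdown_winner`, the rank encoding, and the tie handling (equal unpaired ranks split the pot) are assumptions.

```python
from typing import Optional

# Assumed rank ordering for the three Leduc ranks.
RANKS = {"J": 0, "Q": 1, "K": 2}

def showdown_winner(card0: str, card1: str, board: str) -> Optional[int]:
    """Return 0 or 1 for the winning player, or None for a tie."""
    pair0 = card0 == board
    pair1 = card1 == board
    # A pair (private rank matches the board) beats any unpaired hand.
    # With two cards per rank, both players cannot pair the same board.
    if pair0 != pair1:
        return 0 if pair0 else 1
    # No pair on either side: the higher private rank wins.
    if RANKS[card0] == RANKS[card1]:
        return None  # assumed: equal ranks split the pot
    return 0 if RANKS[card0] > RANKS[card1] else 1
```

For example, `showdown_winner("J", "K", "J")` returns 0, because the paired J beats the unpaired K.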

What this teaches. Each decision now has up to 5 actions instead of 2–3. Some bet sizes become illegal as stacks shrink — your action set is state-dependent. The infoset count roughly doubles (288 → 552), but convergence per iteration is essentially identical to fixed-bet Leduc — see Section 5. What grows is wall-clock time per iter, not iteration count.
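A state-dependent action set can be sketched as a function of the current pot and the actor's remaining stack. This is a hedged sketch under stated assumptions: `legal_actions` and `POT_FRACTIONS` are hypothetical names, `stack` means chips still behind, and bet sizes are kept as exact fractions, since the page does not say how fractional chips are rounded.

```python
from fractions import Fraction

# Pot-fraction bets from the menu; check ("x") and all-in ("a") are handled separately.
POT_FRACTIONS = {"h": Fraction(1, 2), "p": Fraction(1), "t": Fraction(2)}

def legal_actions(pot, stack, facing_bet):
    """Return the action codes available at this decision point."""
    if facing_bet:
        return ["f", "c"]  # fold or call; no raise in this game
    actions = ["x"]
    if stack <= 0:
        return actions  # assumed: nothing left to bet, so only check
    for code, frac in POT_FRACTIONS.items():
        # A size that equals or exceeds the stack collapses into all-in.
        if frac * pot < stack:
            actions.append(code)
    actions.append("a")
    return actions
```

At the opening decision (pot 2, stack 10) this yields all five actions; with pot 10 and stack 6 it yields only `["x", "h", "a"]`, because the pot and 2× pot sizes would both exceed the stack.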

Section 2 · What changed from Leduc

The CFR algorithm is identical — only the action set changes

Leduc (fixed bets)

Bet menu: x · b · r

Two betting rounds, $2 then $4. Max 1 raise. ~288 infosets total. Strategy at any infoset is a length-2 or length-3 vector. Converges in seconds.

Leduc-NL (custom sizes)

Bet menu: x · h · p · t · a

Same two streets, same showdown. Bet size now depends on current pot, not a fixed amount. Stacks shrink as betting grows; oversized bets collapse into all-in. ~552 infosets (roughly 2× Leduc). Same CFR, same convergence rate per iteration.
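To make "same CFR, different action set" concrete, here is a minimal regret-matching sketch in which the strategy is keyed by whatever actions are legal at the infoset, rather than by a fixed-length vector. The names `regret_sum` and `current_strategy` and the string infoset key are assumptions for illustration, not the page's code.

```python
from collections import defaultdict

# infoset key -> {action code: cumulative counterfactual regret}
regret_sum = defaultdict(dict)

def current_strategy(infoset, actions):
    """Regret matching over the actions that are legal at this infoset."""
    regrets = regret_sum[infoset]
    # Only positive cumulative regret contributes to the strategy.
    positive = {a: max(regrets.get(a, 0.0), 0.0) for a in actions}
    total = sum(positive.values())
    if total > 0:
        return {a: r / total for a, r in positive.items()}
    # No positive regret yet: fall back to uniform over the legal actions.
    return {a: 1.0 / len(actions) for a in actions}
```

Nothing in the update depends on how many actions there are, so the same function serves a 2-action fold/call node and a 5-action opening node.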

Section 3 · Convergence

Exploitability over training iterations

Same exploitability metric — average best-response value for each player against the opponent's average strategy. Drops at the standard O(1/√T) rate.
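The metric is easiest to see on a tiny zero-sum matrix game. The sketch below uses rock-paper-scissors rather than Leduc-NL (a full best-response tree walk is out of scope here); `exploitability`, `RPS`, and the strategy vectors are illustrative names, and the result is in raw payoff units, not mbb/g.

```python
def exploitability(payoff, x, y):
    """Average best-response value against profile (x, y).

    payoff[i][j] is player 1's payoff; player 2 receives the negative.
    """
    rows, cols = range(len(x)), range(len(y))
    # Best player 1 can do against player 2's strategy y.
    br1 = max(sum(payoff[i][j] * y[j] for j in cols) for i in rows)
    # Best player 2 can do against player 1's strategy x.
    br2 = max(sum(-payoff[i][j] * x[i] for i in rows) for j in cols)
    return (br1 + br2) / 2

# Rows and columns are ordered rock, paper, scissors.
RPS = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
uniform = [1 / 3, 1 / 3, 1 / 3]
always_rock = [1.0, 0.0, 0.0]
```

Here `exploitability(RPS, uniform, uniform)` is 0 (the equilibrium), while `exploitability(RPS, always_rock, uniform)` is 0.5, because player 2 can win every hand by playing paper.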

[Chart: exploitability (mbb/g) versus training iterations, plotted against a theoretical 1/√T reference curve. Log scale on Y, linear on X.]

Section 4 · Average strategy

What sizings the bot picks at each infoset

Player	Card	R1 Hist	Board	R2 Hist	Pot · Stack	Strategy

Action codes. x check · h ½ pot · p pot · t 2× pot · a all-in · c call · f fold
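A row of this table can be read as a normalized cumulative strategy, and the action codes convert back to chip amounts given the pot and stack. The sketch below assumes hypothetical helper names (`average_strategy`, `bet_chips`, `mean_bet`) and uses made-up counts, not solver output.

```python
def average_strategy(strategy_sum):
    """Normalize cumulative strategy weights into probabilities."""
    total = sum(strategy_sum.values())
    if total <= 0:
        # Never-visited infoset: report uniform.
        return {a: 1.0 / len(strategy_sum) for a in strategy_sum}
    return {a: s / total for a, s in strategy_sum.items()}

def bet_chips(action, pot, stack):
    """Chips put in by a betting action; oversized bets are capped at the stack."""
    sizes = {"x": 0.0, "h": 0.5 * pot, "p": float(pot), "t": 2.0 * pot, "a": float(stack)}
    return min(sizes[action], float(stack))

def mean_bet(strategy, pot, stack):
    """Expected bet size in chips under a mixed strategy."""
    return sum(prob * bet_chips(a, pot, stack) for a, prob in strategy.items())

# Illustrative numbers only: an opening infoset with pot 2 and stack 10.
avg = average_strategy({"x": 50.0, "h": 25.0, "p": 25.0})
opening_mean = mean_bet(avg, pot=2, stack=10)
```

With these counts the average strategy is 50% check, 25% ½ pot, and 25% pot, for a mean bet of 0.75 chips. Comparing this number across private cards is one way to summarize the sizing preferences the table shows.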

Section 5 · What you should see

Sanity checks for "is this converging?"

K opens with bigger bets than J. The strongest preflop hand has the most equity to protect — average bet size with K should be heavier than with J. If J ever bets, it's a bluff at the smallest size.
Bet sizing on the flop tracks board interaction. K on a K board is the nuts and should mostly bet large or go all-in (there is no raise in this game). J on a Q board (no pair, weak) should mostly check, or fold when facing a bet.
Pure bluffs use small sizes; pure value uses large sizes. Same fold equity for less risk on bluffs; max value extraction on the nuts. The interesting frequencies sit in the middle.
Exploitability falls at essentially the same rate as fixed-bet Leduc. Both games reach ~5 mbb/g at 10K iterations and ~2.5 mbb/g at 30K. Mid-training (500–3K iterations), NL trails by 10–20% but catches up by 10K. The 1/√T rate in CFR's bound doesn't depend on action count; the number of actions enters only through the constant (a √|A| factor). More actions means more places to update regret, not slower per-place convergence. The cost of NL is wall-clock time per iteration: ~3× the decision nodes per pass, so each iteration takes ~2–3× longer in seconds.
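The last check can be turned into quick arithmetic: fit a power-law exponent between two logged points and compare it with the −0.5 that a pure 1/√T curve would give. The sketch uses the approximate figures quoted above; `empirical_exponent` is a hypothetical helper name.

```python
import math

def empirical_exponent(t1, e1, t2, e2):
    """Slope of log(exploitability) against log(iterations) between two points."""
    return math.log(e2 / e1) / math.log(t2 / t1)

# Approximate values quoted above: ~5 mbb/g at 10K and ~2.5 mbb/g at 30K.
slope = empirical_exponent(10_000, 5.0, 30_000, 2.5)

# What a pure 1/sqrt(T) curve through the 10K point predicts at 30K.
predicted_30k = 5.0 * math.sqrt(10_000 / 30_000)
```

With these inputs the slope is about −0.63 and the 1/√T prediction at 30K is about 2.9 mbb/g. A slope at or below −0.5 is consistent with the bound, which is an upper limit on how slowly exploitability can fall. A slope that flattens toward 0 is the signal that something is wrong.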

Why this is the right step. By solving Leduc-NL you've now seen the exact problem no-limit hold'em (NLH) presents: a bet menu that can be anything, with state-dependent legal sets, at toy scale where you can debug it. The next step, leduc-nl-resolver.html, takes one specific spot from this game and re-solves it on demand with a custom bet set chosen at runtime. That's the architecture Pluribus and DeepStack actually run.