Leduc Poker — Tabular CFR

Section 1 · The game

Leduc Hold'em — 6 cards, 2 streets, fixed bets

Leduc is the standard research toy poker game one rung above Kuhn. It has the structural pieces Kuhn lacks — a public board card, two betting rounds, real range evolution — while staying small enough to solve tabularly in a browser tab.

Deck: 6 cards · J Q K (×2 each). Suit doesn't matter.
Private cards: 1 each. Drawn without replacement.
Public board: 1 card · revealed after round 1. From the remaining deck.
Betting: 2 chips round 1 · 4 chips round 2. Max 1 raise per round (cap).
Ante: 1 chip per player. Pot starts at 2.
Showdown: Pair beats high card. Pair = private card matches board.
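The showdown rule above can be sketched in a few lines. This is a minimal illustration, not the page's engine: the names `RANK`, `showdownWinner` and `showdownPayoff` are hypothetical, and cards are assumed to be represented by rank letter only.

```javascript
// Ranks only: suits never affect who wins in Leduc.
const RANK = { J: 0, Q: 1, K: 2 };

// Returns +1 if player 0 wins, -1 if player 1 wins, 0 for a split pot.
function showdownWinner(card0, card1, board) {
  const pair0 = card0 === board;
  const pair1 = card1 === board;
  if (pair0 && !pair1) return 1;   // player 0 pairs the board
  if (pair1 && !pair0) return -1;  // player 1 pairs the board
  // No pair on either side: higher rank wins, equal ranks split.
  return Math.sign(RANK[card0] - RANK[card1]);
}

// Net chips for player 0 at showdown, where `contribEach` is what each
// player has put in (ante plus called bets are equal by the time of showdown).
function showdownPayoff(card0, card1, board, contribEach) {
  return showdownWinner(card0, card1, board) * contribEach;
}

// Ante 1 + called round-1 bet 2 + called round-2 bet 4 = 7 chips each.
console.log(showdownPayoff("J", "K", "J", 7)); // 7 (the J pairs the board)
```

Because only one copy of the board rank remains in the deck, at most one player can hold a pair, so the two pair branches never conflict.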

Why Leduc. ~936 information sets total. Tabular CFR converges to exploitability ε < 0.01 in roughly 5,000–10,000 iterations, which takes a few seconds in JavaScript. Leduc is also a widely used benchmark in the CFR research literature.
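One way to arrive at the 936 figure is to enumerate decision points directly. The sketch below assumes the information-set key distinguishes all six physical cards (an assumption; the page does not say how keys are built) and that each round allows one bet plus at most one raise. The names `DECISION_POINTS`, `ROUND1_CONTINUES` and `countInfoSets` are hypothetical.

```javascript
// Betting sequences within one round at which someone must act
// (c = check/call, b = bet, r = raise), with one bet plus at most one raise.
const DECISION_POINTS = ["", "c", "b", "cb", "br", "cbr"];

// Round-1 sequences that end without a fold, so play continues to round 2.
const ROUND1_CONTINUES = ["cc", "bc", "cbc", "brc", "cbrc"];

// The acting player is implied by the sequence, so each
// (cards, sequence) combination is exactly one information set.
function countInfoSets(privateCards, boardCardsGivenPrivate) {
  const round1 = privateCards * DECISION_POINTS.length;
  const round2 =
    privateCards *
    boardCardsGivenPrivate *
    ROUND1_CONTINUES.length *
    DECISION_POINTS.length;
  return round1 + round2;
}

console.log(countInfoSets(6, 5)); // 936 = 36 + 900, six distinct cards
console.log(countInfoSets(3, 3)); // 288 = 18 + 270, ranks only
```

Since suits never affect the outcome, keying on rank alone would shrink the table to 288 entries under the same assumptions.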

Section 2 · Convergence

Exploitability over training iterations

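Exploitability = the average, over both players, of the best-response value each could achieve against the opponent's current average strategy. At a Nash equilibrium it equals zero; CFR guarantees that the average strategy's exploitability shrinks at least as fast as O(1/√T), where T is the number of iterations.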

[Chart: exploitability (mbb/g) against training iteration, with a 1/√T theoretical reference curve. Log scale on Y, linear on X.]

Section 3 · Average strategy

What the bot believes after training

Player	Card	R1 Hist	Board	R2 Hist	Strategy	Visits
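Each row of this table corresponds to one information set, and the Strategy column is the normalized cumulative strategy. The sketch below shows one plausible shape for that lookup; the key layout, the `infoSetKey` and `averageStrategy` names, and the sample numbers are all assumptions for illustration, not the page's actual data structures.

```javascript
// Hypothetical key mirroring the table columns:
// player | card | round-1 history | board | round-2 history.
function infoSetKey(player, card, r1Hist, board = "", r2Hist = "") {
  return [player, card, r1Hist, board, r2Hist].join("|");
}

// Normalize cumulative strategy weights into the average strategy.
// In CFR these weights are accumulated as (own reach probability x current strategy).
function averageStrategy(strategySum) {
  const total = strategySum.reduce((a, b) => a + b, 0);
  if (total <= 0) {
    // Never visited: fall back to uniform rather than dividing by zero.
    return strategySum.map(() => 1 / strategySum.length);
  }
  return strategySum.map((s) => s / total);
}

// Made-up entry: player 0 holds K, round 1 went check-check, board is K.
const table = new Map();
table.set(infoSetKey(0, "K", "cc", "K"), { strategySum: [1, 3], visits: 4 });

for (const [key, node] of table) {
  console.log(key, averageStrategy(node.strategySum), node.visits);
}
```

Reporting the visit count next to each row is useful because a strategy at a rarely reached information set may still be far from converged.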

Section 4 · What you should see

Sanity checks for "is this converging?"

Exploitability falls steadily on the log-scale chart. A few thousand iterations should drop it from ~1000 mbb/g down toward single digits, roughly tracking the 1/√T reference. Small wobbles are normal; if it plateaus well above the reference or keeps rising, something is wrong.
K with K-on-board is a near-deterministic raise. Once a king is on the board and you hold the other king, you have the only possible pair, so the strategy should converge to raising almost every time.
J without a pair is a near-deterministic fold to bets in round 2. An unpaired J is the lowest possible hand: it beats nothing and can at best split the pot with the other J, so there is little reason to call.
Q is the bluff/value boundary. Q opens with mixed strategies that don't go to 0 or 1; this is where the interesting GTO frequencies live.
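The first check can be automated by fitting a slope to the logged exploitability curve on log-log axes. A slope near -0.5 or steeper is consistent with the O(1/√T) bound, while a slope near zero signals a plateau. This is a rough sketch: `logLogSlope` and the `{ iteration, exploitability }` record shape are hypothetical, and the sample data is synthetic rather than real training output.

```javascript
// Least-squares slope of log(exploitability) against log(iteration).
function logLogSlope(points) {
  const xs = points.map((p) => Math.log(p.iteration));
  const ys = points.map((p) => Math.log(p.exploitability));
  const n = points.length;
  const meanX = xs.reduce((a, b) => a + b, 0) / n;
  const meanY = ys.reduce((a, b) => a + b, 0) / n;
  let num = 0;
  let den = 0;
  for (let i = 0; i < n; i++) {
    num += (xs[i] - meanX) * (ys[i] - meanY);
    den += (xs[i] - meanX) ** 2;
  }
  return num / den;
}

// Synthetic curve that follows 1000 / sqrt(T) exactly.
const synthetic = [100, 1000, 10000].map((T) => ({
  iteration: T,
  exploitability: 1000 / Math.sqrt(T),
}));

console.log(logLogSlope(synthetic).toFixed(2)); // -0.50
```

In practice CFR often converges faster than the theoretical bound, so a fitted slope steeper than -0.5 is not a sign of a bug.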

Next step. Once exploitability sits below ~10 mbb/g, this blueprint is solid. The next page in the progression — leduc-nl-cfr.html — will replace the fixed bets with custom sizes (½ pot, pot, 2× pot, all-in), reusing the same engine and the same CFR loop.
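As a rough preview of what pot-relative sizing involves, the sketch below turns the fractions named above into chip amounts. The stack size, the rounding to whole chips, and the names `POT_FRACTIONS` and `betSizes` are assumptions made for illustration; the follow-up page may define these differently.

```javascript
// Pot fractions named in the text; all-in is added separately.
const POT_FRACTIONS = [0.5, 1, 2];

// Distinct bet amounts in whole chips, capped by the remaining stack.
function betSizes(pot, stack) {
  const sizes = POT_FRACTIONS.map((f) => Math.min(Math.round(f * pot), stack));
  sizes.push(stack); // all-in
  // Drop duplicates (e.g. when 2x pot exceeds the stack) and zero-chip bets.
  return [...new Set(sizes)].filter((s) => s > 0).sort((a, b) => a - b);
}

console.log(betSizes(2, 20));  // [ 1, 2, 4, 20 ]
console.log(betSizes(12, 20)); // [ 6, 12, 20 ]
```

Deduplicating matters because every extra action multiplies the number of information sets the table has to hold.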