Convergence Metrics — Leduc Poker

What This Page Shows

Vanilla CFR on Leduc poker, three convergence metrics tracked in parallel

Knowing when to stop training is the most practical question in CFR. Exploitability is the gold standard, but it is expensive to compute. The Convergence Indicator (CI) and the Max Positive Regret Rate are cheaper proxies that solver software uses in practice. This page runs Vanilla Counterfactual Regret Minimization (CFR) on Leduc poker (6-card deck, two betting rounds, one community card revealed between rounds) and tracks all three metrics live, so you can compare how they decay and which thresholds correspond to "essentially converged."

Under CFR, exploitability and the regret rate go to zero in any two-player zero-sum game, however many equilibria it has. CI is different: it measures drift of the average strategy itself, so it only settles if σ̄ settles. Leduc has a unique Nash equilibrium, so all three metrics are expected to converge cleanly here. Kuhn poker, by contrast, has a one-parameter family of Nash equilibria, which prevents CI from settling.
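On a one-shot zero-sum matrix game, each player has a single information set reached with probability 1, so Vanilla CFR reduces to regret matching in self-play. The sketch below uses that reduction as a toy stand-in for Leduc; it is not this page's solver, and the function names and the example payoff matrix are hypothetical. It shows where the two tables that every metric on this page reads come from: cumulative regrets R(I,a) and strategy sums.

```python
import numpy as np


def regret_matching(cum_regret: np.ndarray) -> np.ndarray:
    """Current strategy at one information set: proportional to positive regret."""
    positive = np.maximum(cum_regret, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    return np.full(cum_regret.shape, 1.0 / cum_regret.size)  # uniform fallback


def cfr_self_play(A: np.ndarray, iterations: int):
    """Vanilla CFR on a zero-sum matrix game where P1 receives x^T A y.

    Returns cumulative regrets (R1, R2) and strategy sums (S1, S2).
    Reach probabilities are all 1 here, so the strategy sums are unweighted.
    """
    m, n = A.shape
    R1, R2 = np.zeros(m), np.zeros(n)
    S1, S2 = np.zeros(m), np.zeros(n)
    for _ in range(iterations):
        x, y = regret_matching(R1), regret_matching(R2)
        u1 = A @ y          # P1's value for each pure action against y
        u2 = -(x @ A)       # P2's value for each pure action against x
        R1 += u1 - x @ u1   # simultaneous regret updates
        R2 += u2 - y @ u2
        S1 += x
        S2 += y
    return R1, R2, S1, S2


# Hypothetical example: rock-paper-scissors where rock beating scissors pays 2.
A = np.array([[0.0, -1.0, 2.0],
              [1.0, 0.0, -1.0],
              [-1.0, 1.0, 0.0]])
R1, R2, S1, S2 = cfr_self_play(A, 10_000)
x_bar, y_bar = S1 / S1.sum(), S2 / S2.sum()   # average strategies σ̄_1, σ̄_2
```

Because the game is zero-sum and σ̄ is the (here trivially) reach-weighted average, half the sum of the two players' largest cumulative regrets divided by T equals the exploitability of (x_bar, y_bar) exactly, which is the link the Max Positive Regret Rate exploits.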

Live Metrics

Current values, sparkline history, and convergence threshold

Exploitability

milli-big-blinds per game (mbb/g)

Converged: < 1 mbb/g

ε(σ̄) = ½ · (BR_value(P1, σ̄_2) + BR_value(P2, σ̄_1)) × 1000
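On Leduc the best-response values come from a dynamic-programming pass over the game tree. The arithmetic of the formula is easier to see on a one-shot matrix game, where each best-response value is simply the largest entry of a payoff vector. The sketch below is that simplification, not the page's BR code; `exploitability_mbb` and its `big_blind` unit parameter are hypothetical, and which chip amount counts as one "big blind" in Leduc is an assumption.

```python
import numpy as np


def exploitability_mbb(A, x_bar, y_bar, big_blind=1.0):
    """ε(σ̄) in milli-big-blinds for a zero-sum matrix game where P1 receives x^T A y.

    big_blind is a hypothetical unit parameter: payoffs are divided by it
    before being converted to thousandths.
    """
    A = np.asarray(A, dtype=float)
    br_p1 = float(np.max(A @ np.asarray(y_bar, dtype=float)))     # BR_value(P1, σ̄_2)
    br_p2 = float(np.max(-(np.asarray(x_bar, dtype=float) @ A)))  # BR_value(P2, σ̄_1)
    return 0.5 * (br_p1 + br_p2) / big_blind * 1000.0


RPS = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
uniform = [1 / 3] * 3
print(exploitability_mbb(RPS, uniform, uniform))    # 0.0 (equilibrium profile)
print(exploitability_mbb(RPS, [1, 0, 0], uniform))  # 500.0 (P1 always plays rock)
```

No game-value term is subtracted because, in a zero-sum game, the two players' equilibrium values cancel when the best-response values are added.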

Convergence Indicator (CI)

reach-weighted TV drift × 1000

Converged: < 5

CI = (Σ_I w(I)·TV(σ̄_T(I), σ̄_{T/2}(I)) / Σ_I w(I)) × 1000, where w(I) = total strategy sum at I and TV(p, q) = ½ Σ_a |p(a) − q(a)|
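A minimal sketch of the CI formula, assuming strategy sums are stored per information set as NumPy arrays and that a snapshot of them from iteration T/2 is available. The function names and dictionary layout are hypothetical, and treating an information set with no strategy mass at T/2 as uniform is an assumption (it matches the usual convention for the average strategy at an unreached information set).

```python
import numpy as np


def total_variation(p: np.ndarray, q: np.ndarray) -> float:
    """TV(p, q) = ½ Σ_a |p(a) − q(a)|."""
    return 0.5 * float(np.abs(p - q).sum())


def average_strategy(strategy_sum: np.ndarray) -> np.ndarray:
    """Normalize a strategy sum; uniform if the information set has no mass yet."""
    total = strategy_sum.sum()
    if total > 0:
        return strategy_sum / total
    return np.full(strategy_sum.shape, 1.0 / strategy_sum.size)


def convergence_indicator(sums_now: dict, sums_half: dict) -> float:
    """Reach-weighted TV drift between σ̄_T and σ̄_{T/2}, times 1000."""
    weighted_tv = total_weight = 0.0
    for infoset, s_now in sums_now.items():
        w = float(s_now.sum())          # w(I): total strategy sum at T
        if w <= 0:
            continue                    # zero weight contributes nothing
        s_half = sums_half.get(infoset, np.zeros_like(s_now))
        tv = total_variation(average_strategy(s_now), average_strategy(s_half))
        weighted_tv += w * tv
        total_weight += w
    # No reach-weighted data yet: report "not converged".
    return 1000.0 * weighted_tv / total_weight if total_weight > 0 else float("inf")


now = {"a": np.array([3.0, 1.0]), "b": np.array([0.0, 0.0])}
half = {"a": np.array([1.0, 1.0])}
print(convergence_indicator(now, half))  # 250.0: TV([0.75, 0.25], [0.5, 0.5]) = 0.25
```

Weighting by the strategy sum means rarely reached information sets, whose average strategy can swing a lot on little data, barely move the indicator.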

Max Positive Regret Rate

max R⁺(I,a) / iteration

Converged: < 10⁻³

max R⁺/T = max_{I,a} max(0, R(I,a)) / T
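A minimal sketch of the formula, assuming cumulative regrets are stored per information set as NumPy arrays (the dictionary layout and function name are hypothetical). It is a single pass over the regret table, which is why this metric is cheap enough to compute every iteration.

```python
import numpy as np


def max_positive_regret_rate(regrets: dict, T: int) -> float:
    """max_{I,a} max(0, R(I,a)) / T over a table of cumulative regrets."""
    if T <= 0:
        raise ValueError("T must be a positive iteration count")
    worst = 0.0
    for r in regrets.values():
        # initial=0.0 folds the max(0, ·) clamp into the reduction
        worst = max(worst, float(np.max(r, initial=0.0)))
    return worst / T


table = {"I1": np.array([-3.0, 2.0]), "I2": np.array([5.0, -1.0])}
print(max_positive_regret_rate(table, 1000))  # 0.005
```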

Combined Convergence Chart

All three metrics over iterations — log scale, normalized to their thresholds

Each line is plotted as value / threshold on a log scale. When a line drops below 1.0 (the dashed white line), that metric has reached its convergence threshold. The order in which the lines cross and the shape of each decay curve tell different but correlated stories.

Legend: Exploitability / (1 mbb/g); CI / 5; Max positive regret rate / 10⁻³; dashed line = convergence threshold (= 1).
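The normalization behind the chart can be sketched as follows. The metric keys and helper names are hypothetical, and the history values are made-up inputs for illustration, not measurements from this page.

```python
import math

# Thresholds from the legend; a normalized value below 1.0 means "converged".
THRESHOLDS = {"exploitability_mbb": 1.0, "ci": 5.0, "max_regret_rate": 1e-3}


def normalize(history: dict) -> dict:
    """Map metric -> [(iteration, value)] to metric -> [(iteration, value / threshold)]."""
    return {name: [(t, v / THRESHOLDS[name]) for t, v in points]
            for name, points in history.items()}


def first_crossing(points):
    """First iteration whose normalized value is below 1.0, or None if none yet."""
    for t, v in points:
        if v < 1.0:
            return t
    return None


history = {"ci": [(100, 40.0), (200, 12.0), (400, 4.0)]}  # hypothetical values
norm = normalize(history)
print([(t, round(math.log10(v), 3)) for t, v in norm["ci"]])  # log-scale y positions
print(first_crossing(norm["ci"]))  # 400
```

Dividing by the threshold puts all three metrics on one axis with a shared target of 1.0, even though their raw units (mbb/g, scaled TV, regret per iteration) differ.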

How Each Metric Works

The equation each metric uses

Exploitability

Equation: ε(σ̄) = ½·(BR_value(P1, σ̄_2) + BR_value(P2, σ̄_1))

Tells you: The exact distance from Nash equilibrium (the Nash gap). O(1/√T) decay, the CFR worst-case rate.

Threshold: < 1 mbb/g = essentially solved.

Convergence Indicator (CI)

Equation: CI = (Σ_I w(I)·TV(σ̄_T(I), σ̄_{T/2}(I)) / Σ_I w(I)) × 1000,
where w(I) = total strategy sum and the comparison window is half the elapsed iterations.

Tells you: How far σ̄ has drifted since the halfway point. On Leduc it decays at roughly the same O(1/√T) rate as exploitability, but that is an empirical match: no theorem ties CI to the Nash gap.

Threshold: < 5, which on Leduc corresponds roughly to exploitability ≈ 1 mbb/g rather than merely "stable."
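Computing CI needs σ̄_{T/2}, the strategy sums as they were halfway through training; the source does not say how this page keeps them. One possible scheme, offered as an assumption rather than the page's implementation, is to snapshot at power-of-two iterations so the halfway point is exact whenever T is a power of two. The class and method names are hypothetical.

```python
import copy


class HalfwaySnapshots:
    """Stores strategy-sum snapshots at power-of-two iterations (a hypothetical scheme).

    halfway(T) returns the snapshot from the largest stored iteration <= T // 2:
    the window is exactly T/2 when T is a power of two and lies between
    T/4 and T/2 otherwise. Only O(log T) snapshots are ever kept.
    """

    def __init__(self):
        self._snaps = {}  # iteration -> deep copy of the strategy sums

    def record(self, t: int, strategy_sums) -> None:
        if t > 0 and (t & (t - 1)) == 0:  # t is a power of two
            self._snaps[t] = copy.deepcopy(strategy_sums)

    def halfway(self, T: int):
        eligible = [t for t in self._snaps if t <= T // 2]
        if not eligible:
            return None, None
        t_half = max(eligible)
        return t_half, self._snaps[t_half]


snaps = HalfwaySnapshots()
state = {"I": [0.0]}
for t in range(1, 17):
    state["I"][0] = float(t)  # stand-in for a CFR iteration updating strategy sums
    snaps.record(t, state)
print(snaps.halfway(16))  # (8, {'I': [8.0]})
```

The deep copy matters because CFR mutates the strategy sums in place; storing a reference would make every "snapshot" equal to the current table and CI would always read 0. Copying every iteration would give an exact T/2 window, at the cost of one full table copy per stored iteration.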

Max Positive Regret Rate

Equation: max R⁺/T = max_{I,a} max(0, R(I,a)) / T

Tells you: An upper bound on exploitability from the CFR regret theorem: ε ≤ (|𝓘|/2) · max R⁺/T, where |𝓘| is the total number of information sets. O(1/√T) decay.

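Threshold: < 10⁻³, which on Leduc empirically corresponds to exploitability of at most a few mbb/g; the theoretical bound is far looser at that value.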
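The chain of inequalities behind that bound, written with the page's own symbols. R̄_i^T (player i's average overall regret) and |𝓘| (the number of information sets of both players combined) are introduced here for the derivation. The first equality holds because the game is zero-sum and σ̄ is the reach-weighted average strategy, so the game-value terms cancel; the next step is the CFR theorem, which bounds each player's overall regret by the sum of positive counterfactual regrets at that player's information sets.

```latex
\begin{aligned}
\varepsilon(\bar\sigma)
  &= \tfrac12\bigl(\text{BR\_value}(P_1,\bar\sigma_2) + \text{BR\_value}(P_2,\bar\sigma_1)\bigr)
   = \tfrac12\bigl(\bar R_1^T + \bar R_2^T\bigr) \\
  &\le \tfrac12 \sum_{I} \frac{\max_a R^{+}(I,a)}{T}
   \;\le\; \frac{|\mathcal{I}|}{2}\,\max_{I,a}\frac{R^{+}(I,a)}{T}
\end{aligned}
```

The last step replaces every term in the sum by the single largest one, which is why the constant grows with the size of the game and why the observed exploitability usually sits well below the bound.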

Side-by-Side Comparison

The trade-offs at a glance

	Exploitability	Convergence Indicator	Max Positive Regret Rate
What it measures	Distance from Nash, exact	Strategy drift between snapshots	Max positive cumulative regret per iteration
Theoretical link to Nash	Direct (= Nash gap)	Indirect (proxy)	Upper bound (CFR theorem)
Cost on Leduc	~10–50 ms (BR DP)	~0.1 ms	~0.1 ms
Cost on Heads-Up Limit Texas Hold'em	CPU-days with engineering	Linear scan of all stored strategy sums; far cheaper than a BR	Linear scan of all stored regrets; far cheaper than a BR
Cost on Heads-Up No-Limit Texas Hold'em	Infeasible without abstraction; use Local Best Response as lower bound	Cheap, runs every iteration	Cheap, runs every iteration
Best used for	Final validation, paper claims	Live monitoring, early stopping	Sanity check, theoretical bound
Convergence threshold	< 1 mbb/g	< 5	< 10⁻³
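The "Best used for" row suggests a division of labor: the two cheap proxies run every iteration, and exploitability is computed only once both pass, as final validation. A minimal sketch of that policy follows. The thresholds come from the table; the function and class names and the every-1000-iterations cadence are hypothetical, and `exploitability_mbb` stands in for whatever best-response routine the solver provides.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class StopRule:
    ci_threshold: float = 5.0
    regret_rate_threshold: float = 1e-3
    exploitability_threshold_mbb: float = 1.0
    exploit_every: int = 1000  # hypothetical cadence for the expensive best response


def check_stop(T: int, ci: float, max_regret_rate: float,
               exploitability_mbb: Callable[[], float],
               rule: Optional[StopRule] = None) -> Tuple[bool, Optional[float]]:
    """Cheap proxies gate every iteration; exploitability confirms occasionally.

    Returns (stop, exploitability); exploitability is None when it was not computed.
    """
    rule = rule or StopRule()
    if ci >= rule.ci_threshold or max_regret_rate >= rule.regret_rate_threshold:
        return False, None       # proxies say "not yet": skip the best response
    if T % rule.exploit_every != 0:
        return False, None       # proxies pass, but this is not a validation step
    eps = exploitability_mbb()   # expensive: full best-response pass
    return eps < rule.exploitability_threshold_mbb, eps
```

Gating the best response on the proxies keeps its cost off the hot path; on games where exploitability is infeasible to compute exactly, the callable could return a lower bound such as Local Best Response instead, in which case a pass certifies less.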