Convergence Metrics — Leduc Poker

Three ways to measure "is this strategy at Nash yet?", running side by side
Snapshot every 10 iterations

What This Page Shows

Vanilla CFR on Leduc poker, three convergence metrics tracked in parallel

Knowing when to stop training is the most practical question in CFR. Exploitability is the gold standard, but it's expensive. Convergence Indicator (CI) and Max Positive Regret Rate are cheaper proxies that solver software uses in practice. This page runs Vanilla Counterfactual Regret Minimization on Leduc poker (6-card deck, two betting rounds, community card revealed between rounds) and tracks all three metrics live, so you can compare how they decay and which thresholds correspond to "essentially converged." Leduc has a unique Nash equilibrium, so all three metrics are expected to converge cleanly — unlike Kuhn poker, which has a one-parameter family of Nash equilibria that prevents CI from settling.

Live Metrics

Current values, sparkline history, and convergence threshold

Exploitability
milli-big-blinds per game (mbb/g)
Target: < 1 mbb/g
ε(σ̄) = ½ · (BR(P1) + BR(P2)) × 1000
Convergence Indicator (CI)
reach-weighted total-variation (TV) drift × 1000
Target: < 5
CI = (Σ_I w(I)·TV(σ̄_T(I), σ̄_{T/2}(I)) / Σ_I w(I)) × 1000, where w(I) = total strategy sum
Max Positive Regret Rate
max R⁺(I,a) / iteration
Target: < 10⁻³
max R⁺/T = max_{I,a} max(0, R(I,a)) / T

Combined Convergence Chart

All three metrics over iterations — log scale, normalized to their thresholds

Each line is plotted as value / threshold on a log scale. When a line crosses below 1.0 (the dashed white line), that metric has reached its convergence threshold. Look at the order the lines cross and the shape of decay — they tell different but correlated stories.
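
The normalization described above can be sketched in a few lines of Python. This is a minimal illustration, not the page's actual plotting code: the thresholds come from this page, while the history values and the helper names `normalize` and `first_crossing` are made up for the example.

```python
import math

# Thresholds as stated on this page; everything else here is illustrative.
THRESHOLDS = {"exploitability": 1.0, "ci": 5.0, "regret_rate": 1e-3}


def normalize(history, thresholds=THRESHOLDS):
    """Divide each metric's series by its threshold, so 1.0 means 'at threshold'."""
    return {
        name: [value / thresholds[name] for value in values]
        for name, values in history.items()
    }


def first_crossing(iterations, normalized_values):
    """Return the first iteration where value/threshold drops below 1.0, else None."""
    for iteration, value in zip(iterations, normalized_values):
        if value < 1.0:
            return iteration
    return None


# Made-up snapshot history (one entry per snapshot), for illustration only.
iterations = [10, 20, 30, 40]
history = {
    "exploitability": [40.0, 8.0, 1.5, 0.9],
    "ci": [60.0, 12.0, 4.0, 2.0],
    "regret_rate": [0.05, 0.01, 0.002, 0.0008],
}

normalized = normalize(history)
crossings = {
    name: first_crossing(iterations, values) for name, values in normalized.items()
}
# On a log-scale chart, the plotted height is log10(value / threshold);
# the threshold line sits at log10(1) = 0.
log_heights = {
    name: [math.log10(value) for value in values]
    for name, values in normalized.items()
}
print(crossings)
```

With these invented numbers, CI crosses its threshold one snapshot before the other two metrics. Real runs may order the crossings differently.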

Legend: Exploitability / 1 mbb/g · CI / 5 · Max positive regret rate / 10⁻³ · Convergence threshold (= 1)

How Each Metric Works

The equation each metric uses

Exploitability

Equation: ε(σ̄) = ½·(BR_value(P1, σ̄_2) + BR_value(P2, σ̄_1)), multiplied by 1000 to report mbb/g
Tells you: Exact Nash distance. O(1/√T) decay.
Threshold: < 1 mbb/g = essentially solved.
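
To make the exploitability formula concrete, here is a minimal sketch on a zero-sum matrix game. Rock-paper-scissors stands in for Leduc, whose best response needs a full game-tree traversal. The function names and the assumption that payoffs are in big blinds are illustrative, not taken from the page's implementation.

```python
def best_response_values(payoff, sigma1, sigma2):
    """Best-response values for both players in a zero-sum matrix game.

    payoff[i][j] is player 1's payoff (player 2 receives the negative),
    assumed here to be in big blinds.
    """
    rows, cols = len(payoff), len(payoff[0])
    # P1 best response: the best pure row against P2's mixed strategy.
    br1 = max(
        sum(payoff[i][j] * sigma2[j] for j in range(cols)) for i in range(rows)
    )
    # P2 best response: the best pure column against P1's mixed strategy.
    br2 = max(
        sum(-payoff[i][j] * sigma1[i] for i in range(rows)) for j in range(cols)
    )
    return br1, br2


def exploitability_mbb(payoff, sigma1, sigma2):
    """epsilon = 1/2 * (BR(P1) + BR(P2)), scaled by 1000 to milli-big-blinds."""
    br1, br2 = best_response_values(payoff, sigma1, sigma2)
    return 0.5 * (br1 + br2) * 1000.0


# Rock-paper-scissors payoffs for player 1 (rows: R, P, S; columns: R, P, S).
RPS = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
uniform = [1 / 3, 1 / 3, 1 / 3]
rock_heavy = [0.5, 0.25, 0.25]

at_nash = exploitability_mbb(RPS, uniform, uniform)
off_nash = exploitability_mbb(RPS, rock_heavy, uniform)
print(at_nash, off_nash)
```

The uniform profile is the equilibrium, so its exploitability is zero. The rock-heavy strategy loses 0.25 per game to a paper-only opponent, which averages to 125 mbb/g across the two seats.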

Convergence Indicator (CI)

Equation: CI = (Σ_I w(I)·TV(σ̄_T(I), σ̄_{T/2}(I)) / Σ_I w(I)) × 1000
where w(I) = total strategy sum and the comparison window is half the elapsed iterations
Tells you: Drift between σ̄ now and σ̄ at the half-way point. O(1/√T) decay, the same rate as exploitability.
Threshold: < 5, which corresponds roughly to exploitability ≈ 1 mbb/g rather than merely a "stable" strategy.
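
The CI formula can be sketched as follows. This assumes each information set stores a list of cumulative strategy sums per action, and that w(I) is the total strategy sum at the current snapshot. Both the data layout and the function names are assumptions made for illustration.

```python
def total_variation(p, q):
    """Total-variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def average_strategy(strategy_sum):
    """Normalize cumulative strategy sums; fall back to uniform if empty."""
    total = sum(strategy_sum)
    if total <= 0:
        return [1.0 / len(strategy_sum)] * len(strategy_sum)
    return [s / total for s in strategy_sum]


def convergence_indicator(sums_now, sums_half):
    """Reach-weighted TV drift between the average strategy at T and at T/2.

    sums_now / sums_half map an infoset key to its cumulative strategy sums.
    The weight w(I) is taken as the total strategy sum at T (an assumption).
    """
    numerator = 0.0
    denominator = 0.0
    for key, now in sums_now.items():
        half = sums_half.get(key)
        if half is None:
            continue  # infoset not yet visited at the half-way snapshot
        weight = sum(now)
        drift = total_variation(average_strategy(now), average_strategy(half))
        numerator += weight * drift
        denominator += weight
    return 1000.0 * numerator / denominator if denominator > 0 else 0.0


# Toy snapshots with two infosets; the keys and numbers are invented.
sums_half = {"J": [2.0, 3.0], "Q": [2.5, 2.5]}
sums_now = {"J": [6.0, 4.0], "Q": [5.0, 5.0]}
ci = convergence_indicator(sums_now, sums_half)
print(ci)
```

In this toy case only infoset "J" moved (TV = 0.2), and both infosets carry equal weight, so the weighted drift is 0.1 and CI is about 100, far above the threshold of 5.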

Max Positive Regret Rate

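Equation: max R⁺/T = max_{I,a} max(0, R(I,a)) / T
Tells you: CFR-theorem upper bound on exploitability: ε ≤ const · max R⁺/T, where the constant grows with the number of information sets. O(1/√T) decay.
Threshold: < 10⁻³, which corresponds roughly to exploitability ≲ a few mbb/g.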
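
A minimal sketch of the regret-rate scan, assuming cumulative regrets are stored as a mapping from infoset key to a list of per-action values. The table layout, keys, and numbers are hypothetical.

```python
def max_positive_regret_rate(regrets, iterations):
    """max over (I, a) of max(0, R(I, a)), divided by the iteration count T."""
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    worst = max(
        (max(0.0, r) for action_regrets in regrets.values() for r in action_regrets),
        default=0.0,  # empty table: no positive regret
    )
    return worst / iterations


# Invented cumulative regrets after T = 1000 iterations.
regret_table = {
    "J:": [-3.0, 0.4],
    "K:": [0.25, -1.0],
}
rate = max_positive_regret_rate(regret_table, 1000)
converged = rate < 1e-3
print(rate, converged)
```

The scan is a single pass over the regret table, which is why this metric is cheap enough to evaluate at every snapshot.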

Side-by-Side Comparison

The trade-offs at a glance

| | Exploitability | Convergence Indicator | Max Positive Regret Rate |
|---|---|---|---|
| What it measures | Distance from Nash, exact | Strategy drift between snapshots | Max positive cumulative regret per iteration |
| Theoretical link to Nash | Direct (= Nash gap) | Indirect (proxy) | Upper bound (CFR theorem) |
| Cost on Leduc | ~10–50 ms (BR DP) | ~0.1 ms | ~0.1 ms |
| Cost on Heads-Up Limit Texas Hold'em | CPU-days with engineering | ~ms (linear scan) | ~ms (linear scan) |
| Cost on Heads-Up No-Limit Texas Hold'em | Infeasible without abstraction; use Local Best Response as lower bound | Cheap, runs every iteration | Cheap, runs every iteration |
| Best used for | Final validation, paper claims | Live monitoring, early stopping | Sanity check, theoretical bound |
| Convergence threshold | < 1 mbb/g | < 5 | < 10⁻³ |