What This Page Shows
Vanilla CFR on Leduc poker, three convergence metrics tracked in parallel
Knowing when to stop training is the most practical question in CFR. Exploitability is the gold standard, but it's expensive. Convergence Indicator (CI) and Max Positive Regret Rate are cheaper proxies that solver software uses in practice. This page runs Vanilla Counterfactual Regret Minimization on Leduc poker (6-card deck, two betting rounds, community card revealed between rounds) and tracks all three metrics live, so you can compare how they decay and which thresholds correspond to "essentially converged." Leduc has a unique Nash equilibrium, so all three metrics are expected to converge cleanly — unlike Kuhn poker, which has a one-parameter family of Nash equilibria that prevents CI from settling.
Live Metrics
Current values, sparkline history, and convergence threshold
Combined Convergence Chart
All three metrics over iterations — log scale, normalized to their thresholds
Each line is plotted as value / threshold on a log scale. When a line crosses below 1.0 (the dashed white line), that metric has reached its convergence threshold. Compare the order in which the lines cross and the shape of each decay curve: the metrics tell different but correlated stories.
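The normalization used by the chart can be sketched in a few lines of Python. This is a minimal illustration, not the page's actual plotting code: the threshold values are the ones listed in the comparison table on this page, while the names `THRESHOLDS`, `normalize`, `first_crossing`, and the sample readings are hypothetical and exist only for this example.

```python
import math

# Thresholds as listed in the comparison table on this page.
THRESHOLDS = {
    "exploitability": 1.0,     # mbb/g
    "ci": 5.0,                 # Convergence Indicator (already scaled by 1000)
    "max_regret_rate": 1e-3,   # max positive regret per iteration
}


def normalize(metric, value):
    """Return value / threshold; a result below 1.0 means the threshold is met."""
    return value / THRESHOLDS[metric]


def first_crossing(metric, history):
    """Return the first iteration whose normalized value is below 1.0, else None.

    history is a list of (iteration, value) pairs in increasing iteration order.
    """
    for iteration, value in history:
        if normalize(metric, value) < 1.0:
            return iteration
    return None


# Hypothetical CI readings, invented only to exercise the functions.
ci_history = [(100, 40.0), (400, 20.0), (1600, 10.0), (6400, 4.9)]

print(first_crossing("ci", ci_history))      # 6400
print(math.log10(normalize("ci", 40.0)))     # height of the first point on the log axis
```

Because every metric is divided by its own threshold, the three curves share one reference line at 1.0 even though their raw units differ.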
How Each Metric Works
The equation each metric uses
Exploitability
ε(σ̄) = ½·(BR_value(P1, σ̄_2) + BR_value(P2, σ̄_1))
O(1/√T) decay.
Convergence Indicator (CI)
CI = (Σ_I w(I)·TV(σ̄_T(I), σ̄_{T/2}(I)) / Σ_I w(I)) × 1000
where w(I) = total strategy sum, and the comparison window is half the elapsed iterations.
O(1/√T) decay, the same rate as exploitability.
Max Positive Regret Rate
max R⁺/T = max_{I,a} max(0, R(I,a)) / T
ε ≤ const · max R⁺/T. O(1/√T) decay.
Side-by-Side Comparison
The trade-offs at a glance
| | Exploitability | Convergence Indicator | Max Positive Regret Rate |
|---|---|---|---|
| What it measures | Distance from Nash, exact | Strategy drift between snapshots | Max positive cumulative regret per iteration |
| Theoretical link to Nash | Direct (= Nash gap) | Indirect (proxy) | Upper bound (CFR theorem) |
| Cost on Leduc | ~10–50 ms (BR DP) | ~0.1 ms | ~0.1 ms |
| Cost on Heads-Up Limit Texas Hold'em | CPU-days with engineering | ~ms (linear scan) | ~ms (linear scan) |
| Cost on Heads-Up No-Limit Texas Hold'em | Infeasible without abstraction; use Local Best Response as lower bound | Cheap, runs every iteration | Cheap, runs every iteration |
| Best used for | Final validation, paper claims | Live monitoring, early stopping | Sanity check, theoretical bound |
| Convergence threshold | < 1 mbb/g | < 5 | < 10⁻³ |
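The three formulas under "How Each Metric Works" can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions, not the solver behind this page. Exploitability is shown on a zero-sum matrix game (rock-paper-scissors) as a stand-in, because a full Leduc best-response traversal would not fit here, so its result is in game units rather than mbb/g. TV is assumed to be the usual total variation distance (half the L1 distance), and every information set is assumed to have a positive strategy sum. All function and array names are hypothetical.

```python
import numpy as np


def exploitability_matrix_game(A, s1, s2):
    """eps = 0.5 * (BR_value(P1, s2) + BR_value(P2, s1)) for a zero-sum matrix game.

    A[i, j] is the payoff to P1 when P1 plays i and P2 plays j (P2 gets -A[i, j]).
    """
    br1 = np.max(A @ s2)       # best P1 can do against s2
    br2 = np.max(-(s1 @ A))    # best P2 can do against s1
    return float(0.5 * (br1 + br2))


def convergence_indicator(sum_now, sum_half):
    """Weighted TV drift between the average strategy at T and at T/2, times 1000.

    sum_now, sum_half: cumulative strategy sums of shape (num_infosets, num_actions)
    taken at iteration T and iteration T/2. w(I) is the row total of sum_now.
    """
    w = sum_now.sum(axis=1)
    avg_now = sum_now / w[:, None]
    avg_half = sum_half / sum_half.sum(axis=1, keepdims=True)
    # Assumption: TV is half the L1 distance between the two action distributions.
    tv = 0.5 * np.abs(avg_now - avg_half).sum(axis=1)
    return 1000.0 * float((w * tv).sum() / w.sum())


def max_positive_regret_rate(cum_regret, T):
    """max over (I, a) of max(0, R(I, a)), divided by the iteration count T."""
    return float(np.maximum(cum_regret, 0.0).max() / T)


# Rock-paper-scissors payoffs for P1 (rows: P1 action, columns: P2 action).
rps = np.array([[0.0, -1.0, 1.0],
                [1.0, 0.0, -1.0],
                [-1.0, 1.0, 0.0]])
uniform = np.full(3, 1.0 / 3.0)
rock = np.array([1.0, 0.0, 0.0])

print(exploitability_matrix_game(rps, uniform, uniform))  # 0 at the equilibrium
print(exploitability_matrix_game(rps, rock, rock))        # 1.0

# Toy strategy sums and regrets, invented for illustration.
sum_now = np.array([[3.0, 1.0], [2.0, 2.0]])
sum_half = np.array([[1.0, 1.0], [1.0, 1.0]])
print(convergence_indicator(sum_now, sum_half))           # 125.0

cum_regret = np.array([[-3.0, 2.0], [0.5, -1.0]])
print(max_positive_regret_rate(cum_regret, T=100))        # 0.02
```

The two proxies are single passes over arrays the solver already stores, which matches the "linear scan" cost in the table. The exploitability function only stays this short because a matrix game has no tree to traverse.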