MCCFR Variants — Chance vs External vs Outcome Sampling

Three sampling schemes from the 2009 Monte Carlo Counterfactual Regret Minimization paper, and one clear winner.

The Choice

What does each algorithm sample, and what does it enumerate?

Lanctot, Waugh, Zinkevich, and Bowling's 2009 paper on Monte Carlo Counterfactual Regret Minimization (MCCFR) analyzed three sampling-based algorithms: Chance Sampling (already in informal use before the paper), External Sampling, and Outcome Sampling. Each iteration, the algorithm visits some part of the game tree and updates regrets there. The schemes differ in which branches get sampled (one picked at random) versus enumerated (every branch walked). This single design knob controls per-iteration cost, variance, and convergence speed.

Side-by-Side Comparison

Same game tree, three different traversal patterns

Each diagram below shows one iteration on the same toy game tree: chance node at the top with 3 outcomes, then two layers of player decisions (P1, P2). Coloring shows what gets touched: green = enumerated (walk every branch), gold = sampled (pick one branch), faded gray = skipped this iteration.

CS Chance Sampling
Algorithm 1 — most conservative
1 sampled chance · 4 enumerated player paths
Samples: chance only
Enumerates: both players
Cost / iter: high
Variance: low
When: small games where you want clean, low-variance updates and chance enumeration alone is the bottleneck. Mostly historical — it was used informally before the paper formalized it.
ES External Sampling
Algorithm 2 — the practical default
1 sampled chance · 2 enumerated own actions · 1 sampled opp per branch
Samples: chance + opponent
Enumerates: updating player only
Cost / iter: medium
Variance: medium
When: almost always. Per-iteration cost is independent of the chance and opponent action spaces, while variance stays low enough for smooth convergence. Used by Libratus and Pluribus (Cepheus's CFR+ used full-width traversals instead).
OS Outcome Sampling
Algorithm 3 — most aggressive
1 trajectory — every node sampled
Samples: everything
Enumerates: nothing
Cost / iter: low
Variance: high
When: truly massive games where even ES is too slow per iteration. You'll need many more iterations to average out the noise, but each one is essentially free.
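The sample-versus-enumerate knob in the diagrams above can be made concrete by counting the decision nodes one iteration touches in an idealized tree (alternating players, fixed branching — the uniform-tree assumption and the function name are mine, not the paper's):

```python
def nodes_visited(depth, actions, scheme, player=0, updating=0):
    """Decision nodes touched below the (already sampled) chance node.

    scheme: "CS" enumerates both players, "ES" enumerates only the
    updating player, "OS" follows a single sampled trajectory.
    """
    if depth == 0:
        return 0
    if scheme == "OS":
        branch = 1                      # sample one action
    elif scheme == "CS":
        branch = actions                # enumerate everyone
    else:                               # "ES"
        branch = actions if player == updating else 1
    return 1 + branch * nodes_visited(depth - 1, actions, scheme,
                                      1 - player, updating)

# 8 alternating decisions, 3 actions each:
for scheme in ("CS", "ES", "OS"):
    print(scheme, nodes_visited(8, 3, scheme))  # CS 3280 · ES 160 · OS 8
```

The gap between CS and ES is exactly the opponent's branching factor; the gap between ES and OS is your own.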

The Numbers

How they compare on the same game

For Big Leduc with R=10, K=3 (~150K info sets, ~10K distinct deals). Numbers are illustrative orders-of-magnitude on a typical browser-grade machine.

(CS = Chance Sampling · ES = External Sampling · OS = Outcome Sampling)

Per-iter work: CS ~player_actions^depth (full player subtree under one deal) · ES ~own_actions^(own decision depth) · OS ~tree depth (one trajectory)
Iter / second: CS ~30 · ES ~1,000 · OS ~10,000
Variance per iter: CS low (chance-only sampling) · ES moderate (opponent sampled per σ) · OS high (single trajectory)
Iterations to ε = 0.01: CS ~20K · ES ~50K · OS ~500K
Wall-clock to ε = 0.01: CS ~10 minutes · ES ~50 seconds · OS ~50 seconds
Used by: CS pre-2009 informal use · ES Libratus, Pluribus · OS Bluff(2,2) benchmarks, giant trees

Notice that ES and OS reach the convergence target in similar wall-clock time — OS runs many more iterations, each noisier; ES runs fewer, each sharper. CS lags on wall-clock because its high per-iteration cost outweighs its modest iteration count.

Why ES is the Winner

Four reasons it's the de-facto default

01
Variance vs cost — the sweet spot

OS gets the cheapest iterations but each update is so noisy that you need many more to average out. CS gets the cleanest updates but each iteration is expensive. ES sits at the optimum: variance low enough that convergence is smooth, cost low enough that you can do thousands of iterations per second.

02
Per-iter cost is game-size invariant

ES per-iteration cost is governed by the updating player's own action tree — every own action is enumerated at each of that player's decision levels — and is independent of the chance outcome count and the opponent's branching factor. Scale up the chance space (more cards, more streets) and ES iterations per second barely move. Vanilla CFR's cost grows with the full tree; OS stays cheap only by becoming noisier; ES holds steady.

03
Used by every major poker bot

Cepheus (essentially solved Heads-Up Limit Texas Hold'em, 2015) ran the CFR+ update rule over full-width traversals, while Libratus (beat top humans at Heads-Up No-Limit, 2017) and Pluribus (six-player No-Limit, 2019) built their blueprint strategies on External Sampling MCCFR and its descendants. When the production-bot teams that did sample picked a scheme, they picked External Sampling.

04
Composes cleanly with everything else

ES plays well with CFR+ regret clipping, Linear / Discounted CFR weighting, and Deep CFR's neural-network-backed regret estimation. Most "modern CFR variants" mean "ES plus update-rule X." Picking ES doesn't lock you out of any improvement.
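As a minimal sketch of how those update rules compose (function names are mine; the clamping follows Tammelin's CFR+ and the weighting follows linear averaging):

```python
def cfr_plus_regret_update(cumulative, instantaneous):
    """CFR+ accumulation: add this iteration's regrets, clamp negatives to 0
    so an action can recover quickly after a bad stretch."""
    return [max(r + d, 0.0) for r, d in zip(cumulative, instantaneous)]

def linear_strategy_accumulate(strategy_sum, sigma, t):
    """Linear averaging: weight iteration t's strategy by t, so early
    (noisy) iterations fade from the final average."""
    return [s + t * p for s, p in zip(strategy_sum, sigma)]

# A regret that would go negative is clamped at zero:
print(cfr_plus_regret_update([0.5, 2.0], [-3.0, 1.0]))  # [0.0, 3.0]
```

Either function can replace its vanilla counterpart inside an ES traversal without touching the sampling logic — which is the sense in which ES "composes cleanly."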

Decision Flow

Picking a sampling scheme by game characteristics

If your game is
Small to medium
(Kuhn, Leduc, ≤ ~5K info sets)
Don't bother with sampling at all — vanilla CFR converges in seconds. CS is for the borderline case where vanilla is just slightly painful and you want to keep variance near zero.
If your game is
Large but fits in memory
(~10K to ~10⁸ info sets — most poker games)
Use External Sampling. This is what Big Leduc, Bluff variants, Limit Texas Hold'em, and most subgames sit at. External Sampling is fast, smooth, and pairs naturally with CFR+ for further speedup.
If your game is
Massive — abstraction-required
(Heads-Up Limit Texas Hold'em scale, billions or more information sets)
Even External Sampling becomes too expensive per iteration. Switch to Outcome Sampling, or to importance-weighted OS variants. You'll pay variance for speed, but it's the only practical option.

What's Next

If you're going to focus on ES, here's the study path

1. Re-read section 3.2 of the 2009 paper (External Sampling specifically) and Theorem 5 — the convergence bound has a constant M_es specific to ES.
2. Implement Algorithm 2 from scratch on a small tabular game — Kuhn or a small Leduc variant. Each iteration: sample one chance outcome, alternate the updating player, enumerate own actions, sample opponent actions per σ.
3. Reproduce a small slice of the paper's Table 2 — run ES alongside vanilla CFR on the same game, plot iterations completed vs wall-clock time, and verify ES's per-iteration cost is independent of the size of the chance space.
4. Layer CFR+ updates on top of ES — clamp negative cumulative regret to 0 each iteration, weight strategy sums linearly. This is the update rule Cepheus used (Cepheus ran it over full traversals rather than ES; CFR+ and sampling combine delicately).
5. Read Solving Heads-Up Limit Texas Hold'em (Tammelin et al., IJCAI 2015) for how ES + CFR+ scales to a real, solved poker game.
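Step 2 above can be sketched end-to-end on Kuhn poker. This is my minimal tabular sketch of the structure of Algorithm 2, not the paper's code; the infoset keys, payoff helper, and class names are my own:

```python
import random
from collections import defaultdict

CARDS = [0, 1, 2]   # Jack, Queen, King
ACTIONS = "pb"      # p = pass/check/fold, b = bet/call

class Node:
    def __init__(self, n=2):
        self.regret_sum = [0.0] * n
        self.strategy_sum = [0.0] * n
    def strategy(self):                      # regret matching
        pos = [max(r, 0.0) for r in self.regret_sum]
        s = sum(pos)
        return [p / s for p in pos] if s > 0 else [1.0 / len(pos)] * len(pos)
    def average(self):
        s = sum(self.strategy_sum)
        return [x / s for x in self.strategy_sum] if s > 0 else [0.5, 0.5]

nodes = defaultdict(Node)

def terminal_value(history, cards):
    """Payoff to player 0, or None if the history is not terminal."""
    if history == "bp":  return 1            # P2 folded to a bet
    if history == "pbp": return -1           # P1 folded to a bet
    if history in ("pp", "bb", "pbb"):       # showdown
        stake = 1 if history == "pp" else 2
        return stake if cards[0] > cards[1] else -stake
    return None

def traverse(history, cards, traverser):
    """One ES pass: enumerate the traverser's actions, sample the opponent's."""
    v = terminal_value(history, cards)
    if v is not None:
        return v if traverser == 0 else -v
    player = len(history) % 2
    node = nodes[str(cards[player]) + history]
    sigma = node.strategy()
    if player == traverser:                  # enumerate own actions
        util = [traverse(history + a, cards, traverser) for a in ACTIONS]
        ev = sum(p * u for p, u in zip(sigma, util))
        for i in range(2):
            node.regret_sum[i] += util[i] - ev
        return ev
    # opponent node: accumulate average strategy, then sample one action
    for i in range(2):
        node.strategy_sum[i] += sigma[i]
    a = random.choices(range(2), weights=sigma)[0]
    return traverse(history + ACTIONS[a], cards, traverser)

random.seed(0)
for t in range(20000):
    for traverser in (0, 1):
        deal = random.sample(CARDS, 2)       # sample the chance node
        traverse("", deal, traverser)

# Sanity check: P2 holding the King and facing a bet should almost always call.
print(nodes["2b"].average())
```

Note the two halves of the if/else mirror the ES card above: regrets are updated only at the traverser's infosets, while the opponent's infosets get sampled on-policy and accumulate the average strategy.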