The Choice
What does each algorithm sample, and what does it enumerate?
Lanctot, Waugh, Zinkevich, and Bowling's 2009 paper on Monte Carlo Counterfactual Regret Minimization (MCCFR) unified three sampling schemes under one framework: Chance Sampling (already in informal use), External Sampling, and Outcome Sampling. Each iteration, the algorithm visits some part of the game tree and updates regrets there. The schemes differ in which parts get sampled (one branch picked at random) versus enumerated (every branch walked). This single design knob controls per-iteration cost, variance, and convergence speed.
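The sample-versus-enumerate knob can be summarized per node type. A minimal sketch (the dictionary layout and key names are mine; the policies themselves are the three schemes'):

```python
# For each scheme: which node types are sampled (one branch picked at
# random) vs enumerated (every branch walked). "updater" is the player
# whose regrets are being updated this iteration.
POLICY = {
    "chance_sampling":   {"chance": "sample", "updater": "enumerate", "opponent": "enumerate"},
    "external_sampling": {"chance": "sample", "updater": "enumerate", "opponent": "sample"},
    "outcome_sampling":  {"chance": "sample", "updater": "sample",    "opponent": "sample"},
}
```

Reading down the columns: every scheme samples chance; they differ only in how much of the players' decision nodes they are willing to walk exhaustively.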
Side-by-Side Comparison
Same game tree, three different traversal patterns
Each diagram below shows one iteration on the same toy game tree: chance node at the top with 3 outcomes, then two layers of player decisions (P1, P2). Coloring shows what gets touched: green = enumerated (walk every branch), gold = sampled (pick one branch), faded gray = skipped this iteration.
The Numbers
How they compare on the same game
For Big Leduc with R=10, K=3 (~150K info sets, ~10K distinct deals). Numbers are illustrative orders-of-magnitude on a typical browser-grade machine.
| | Chance Sampling | External Sampling | Outcome Sampling |
|---|---|---|---|
| Per-iter work | ~tree depth × player_actions² | ~tree depth × own_actions | ~tree depth (one trajectory) |
| Iter / second | ~30 | ~1,000 | ~10,000 |
| Variance per iter | Low — only chance is sampled | Moderate — opponent sampled per σ | High — single trajectory |
| Iterations to ε=0.01 | ~20K | ~50K | ~500K |
| Wall-clock to ε=0.01 | ~10 minutes | ~50 seconds | ~50 seconds |
| Used by | Pre-2009 informal use | Cepheus · Libratus · Pluribus | Bluff(2,2) benchmarks · giant trees |
Notice that ES and OS reach the convergence target in similar wall-clock time — OS runs many more iterations, each noisier; ES runs fewer, each sharper. CS lags on wall-clock because its high per-iteration cost outweighs its low iteration count.
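The wall-clock row is just iterations-to-target divided by throughput; a quick sanity check with the table's own numbers (illustrative, as noted above):

```python
# (iterations to eps=0.01, iterations per second) from the table
rows = {"CS": (20_000, 30), "ES": (50_000, 1_000), "OS": (500_000, 10_000)}

# wall-clock seconds = iterations / throughput
wall = {name: iters / ips for name, (iters, ips) in rows.items()}
# ES and OS both land at 50 s; CS lands near 667 s (~11 minutes)
```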
Why ES is the Winner
Four reasons it's the de-facto default
OS gets the cheapest iterations but each update is so noisy that you need many more to average out. CS gets the cleanest updates but each iteration is expensive. ES sits at the optimum: variance low enough that convergence is smooth, cost low enough that you can do thousands of iterations per second.
ES per-iteration cost is O(tree_depth × own_actions) — it depends on your action tree, not the total deal count or the opponent's reach distribution. Scale up the chance space (more cards, more streets) and ES iter/sec barely moves. Vanilla CFR's per-iteration cost grows linearly with the number of deals; OS stays cheap but becomes too noisy; ES holds steady.
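A back-of-envelope cost model makes the scaling claim concrete. This is a sketch with constants dropped; the formulas mirror the table's per-iter row, and `n_deals` stands in for the size of the chance space:

```python
def per_iter_cost(scheme, depth, own_actions, opp_actions, n_deals):
    """Rough per-iteration node-visit counts; all constants omitted."""
    if scheme == "vanilla":
        # walks every deal and every action for both players
        return n_deals * depth * own_actions * opp_actions
    if scheme == "chance_sampling":
        # one deal sampled, both players enumerated
        return depth * own_actions * opp_actions
    if scheme == "external_sampling":
        # one deal and one opponent line sampled, own actions enumerated
        return depth * own_actions
    if scheme == "outcome_sampling":
        # a single trajectory end to end
        return depth
    raise ValueError(scheme)
```

Scaling `n_deals` from 10⁴ to 10⁶ multiplies vanilla's cost by 100 and leaves the three sampled schemes untouched — which is the whole point.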
Cepheus (Heads-Up Limit Texas Hold'em essentially solved, 2015), Libratus (beat top humans at Heads-Up No-Limit, 2017), and Pluribus (six-player No-Limit, 2019) all come from the same CFR lineage: Libratus and Pluribus built on MCCFR with external-style sampling, while Cepheus used the closely related CFR+ updates. When the production-bot people picked a sampling scheme, they picked External Sampling or its descendants.
ES plays well with CFR+ regret clipping, Linear / Discounted CFR weighting, and Deep CFR's neural-network-backed regret estimation. Most "modern CFR variants" mean "ES plus update-rule X." Picking ES doesn't lock you out of any improvement.
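The CFR+ piece that ES pairs with is small. A minimal sketch of the regret-matching⁺ step — clamp negative cumulative regret to zero, then normalize the positive part into a strategy (function name is mine):

```python
def regret_matching_plus(cum_regret):
    """CFR+-style strategy from cumulative regrets: negative regret is
    clamped to 0, and the positive parts are normalized. Falls back to
    uniform when nothing is positive."""
    clipped = [max(r, 0.0) for r in cum_regret]
    total = sum(clipped)
    if total > 0:
        return [r / total for r in clipped]
    return [1.0 / len(clipped)] * len(clipped)
```

The clamp is what distinguishes CFR+ from plain regret matching: an action that racked up negative regret re-enters the strategy as soon as its regret turns positive again, instead of having to climb back from a deep hole.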
Decision Flow
Picking a sampling scheme by game characteristics
If the game is small enough that vanilla CFR converges in seconds, don't bother with sampling at all. CS covers the borderline case where vanilla is just slightly painful and you want to keep variance near zero.
For mid-sized games — Big Leduc, Bluff variants, Limit Texas Hold'em, and most subgames — use External Sampling. It is fast, smooth, and pairs naturally with CFR+ for a further speedup.
When the game is so large that even External Sampling becomes too expensive, switch to Outcome Sampling, or to importance-weighted Outcome Sampling variants. You'll pay variance for speed, but it's the only practical option.
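The decision flow above, as a one-function sketch. The time thresholds are illustrative assumptions of mine, not from the source — tune them to your tolerance:

```python
def pick_scheme(vanilla_secs, es_secs):
    """Pick a CFR sampling scheme from rough convergence-time estimates.
    Thresholds (10 s, 1 h) are arbitrary illustrative cutoffs."""
    if vanilla_secs <= 10:        # vanilla converges in seconds: no sampling
        return "vanilla CFR (or Chance Sampling if borderline)"
    if es_secs <= 3600:           # ES converges in acceptable wall-clock
        return "External Sampling (+ CFR+)"
    return "Outcome Sampling (importance-weighted)"
```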
What's Next
If you're going to focus on ES, here's the study path
1. Re-read section 3.2 of the 2009 paper (External Sampling specifically) and Theorem 5 — the convergence bound has a constant M_es specific to ES.
2. Implement Algorithm 2 from scratch on a small tabular game — Kuhn or a small Leduc variant. Each iteration: sample one chance outcome, alternate the updating player, enumerate own actions, sample opponent actions per σ.
3. Reproduce a small slice of the paper's Table 2 — run ES alongside vanilla CFR on the same game, plot iterations completed vs wall-clock time, and verify ES's per-iteration cost is independent of game size.
4. Layer CFR+ updates on top of ES — clamp negative cumulative regret to 0 each iteration, weight strategy sums linearly. CFR+ is the update rule Cepheus used.
5. Read Solving Heads-Up Limit Texas Hold'em (Tammelin et al., IJCAI 2015) for how CFR+ scales to a real, solved poker game.
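Step 2's recipe fits in one file for Kuhn poker. A hedged sketch, not the paper's reference code — all names are mine, cards are 0/1/2 (Jack/Queen/King), actions are pass/bet, antes of 1:

```python
import random

ACTIONS = ["p", "b"]                          # pass / bet
TERMINALS = ("pp", "bp", "bb", "pbp", "pbb")  # all terminal histories
regret, strat_sum = {}, {}                    # infoset -> per-action tallies

def terminal_util(h, cards):
    """Utility to player 0 at terminal history h."""
    if h == "bp":  return 1                   # P1 folds to P0's bet
    if h == "pbp": return -1                  # P0 folds to P1's bet
    win = 1 if cards[0] > cards[1] else -1
    return 2 * win if h in ("bb", "pbb") else win   # called pot vs checked pot

def strategy(I):
    """Regret matching over the positive part of cumulative regret."""
    pos = [max(r, 0.0) for r in regret.get(I, [0.0, 0.0])]
    t = sum(pos)
    return [p / t for p in pos] if t > 0 else [0.5, 0.5]

def es_traverse(h, cards, traverser, rng):
    """One External Sampling pass; returns expected utility to traverser."""
    if h in TERMINALS:
        u0 = terminal_util(h, cards)
        return u0 if traverser == 0 else -u0  # zero-sum
    player = len(h) % 2
    I = str(cards[player]) + h                # infoset = own card + history
    sigma = strategy(I)
    if player == traverser:                   # enumerate every own action
        utils = [es_traverse(h + a, cards, traverser, rng) for a in ACTIONS]
        node_u = sum(s * u for s, u in zip(sigma, utils))
        r = regret.setdefault(I, [0.0, 0.0])
        for i in range(2):
            r[i] += utils[i] - node_u         # sampled counterfactual regret
        return node_u
    s = strat_sum.setdefault(I, [0.0, 0.0])   # average strategy is tallied
    for i in range(2):                        # at the sampled (opponent)
        s[i] += sigma[i]                      # nodes, per sigma
    a = rng.choices(ACTIONS, weights=sigma)[0]  # sample opponent per sigma
    return es_traverse(h + a, cards, traverser, rng)

def train(iters, seed=0):
    rng = random.Random(seed)
    deals = [(a, b) for a in range(3) for b in range(3) if a != b]
    for _ in range(iters):
        cards = rng.choice(deals)             # sample one chance outcome
        for traverser in (0, 1):              # alternate the updating player
            es_traverse("", cards, traverser, rng)

def avg_strategy(I):
    s = strat_sum.get(I, [1.0, 1.0])
    return [x / sum(s) for x in s]
```

After `train(20_000)`, the average strategy should recover the dominated-action facts of Kuhn poker: the King always calls a bet (`avg_strategy("2b")` puts nearly all mass on bet) and the Jack always folds to one (`avg_strategy("0b")` puts nearly all mass on pass).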