infsup

by Zinan Huang 🌸

Formalizing the 3-State Approximate Majority Protocol in Lean 4: Completing a Classic Proof

2026-04-10


In the course of formalizing the $O(n \log n)$ convergence proof for the 3-state approximate majority protocol (Angluin, Aspnes, and Eisenstat, 2008), we found that one intermediate step in the original proof sketch does not hold as stated. The final theorem is correct β€” the proof just requires a slightly different argument for the central region than what the paper’s sketch suggests.

This is a formalization of one specific population protocol β€” the 3-state approximate majority β€” not a general framework for population protocols. However, much of the infrastructure (state space, Markov kernels, potential functions, geometric drift machinery) could serve as a foundation for formalizing other protocols.

The full Lean 4 formalization is available at github.com/zinan-huang/pp-proof β€” approximately 6,600 lines of Lean 4 + Mathlib, zero sorry.

Background: The 3-State Protocol

The 3-state approximate majority protocol is one of the most studied population protocols. There are $n$ agents, each in one of three states: $x$ (opinion A), $y$ (opinion B), or $b$ (blank/undecided). At each step, two agents are chosen uniformly at random to interact. The update rules are:

Angluin, Aspnes, and Eisenstat proved that this protocol reaches consensus in $O(n \log n)$ interactions with high probability β€” matching the information-theoretic lower bound.

The Potential Function

The proof uses the potential function $f = u^2 + 2n$, where $u = x - y$ is the opinion gap. The state space is partitioned into four regions based on which count is large:

For the three corner regions, the proof shows a multiplicative drift: the expected value of $1/f$ contracts by a factor of $(1 - \delta)$ per step, for some $\delta = \Theta(1/n)$. This yields geometric decay by standard arguments.

The Subtlety: Central Region Drift Does Not Hold

The paper’s Lemma 4 concerns the central region. The proof sketch implicitly uses the same multiplicative drift framework: that

$$\mathbb{E}[1/f' \mid \mathcal{F}_t] \leq (1 - \delta) \cdot 1/f_t$$

for configurations in the central region. This inequality does not hold in general.

Counterexample. Take $n = 4$, $x = 1$, $b = 0$, $y = 3$. This configuration is in the central region (no count reaches $7 \cdot 4/8 = 3.5$, and $y = 3 < 3.5$). The potential is $f = (1-3)^2 + 8 = 12$.

Computing over all possible interactions:

$$\frac{\mathbb{E}[1/f']}{1/f} = \frac{103}{102} > 1.$$

So $1/f$ increases on average. Truncation to the active central region does not help: all transitions from this configuration stay in the central region, so absorption provides no correction.

We verified computationally that this counterexample persists for $n = 4$ through $n = 50$.

Why it fails. The function $1/x$ is convex. Even though the expected drift of $f$ is non-negative ($\mathbb{E}[\Delta f] \geq 0$), Jensen’s inequality goes the wrong way:

$$\mathbb{E}[1/f'] \geq 1/\mathbb{E}[f'] \geq 1/f.$$

In the corner regions, the step sizes are small relative to $f$, so the convexity error is negligible. In the central region, step sizes of order $2u + 1$ can be comparable to $f = u^2 + 2n$, and the convexity effect dominates.

The Fix: Exponential Supermartingale

The correct approach β€” which is actually described in Lemma 4 of the paper, though the connection to the drift argument is not made explicit β€” uses an augmented state that tracks cumulative interaction counts.

Define the augmented state $(C_t, S_t^{vb}, S_t^{xy})$ where $S_t^{vb}$ and $S_t^{xy}$ count the cumulative number of $vb$-type and $xy$-type interactions up to time $t$. The supermartingale is:

$$M_t = \frac{\alpha_{vb}^{S_t^{vb}} \cdot \alpha_{xy}^{S_t^{xy}}}{f(C_t)},$$

where $\alpha_{vb} = (16n+7)/(16n)$ and $\alpha_{xy} = (16n-5)/(16n)$.

The per-step supermartingale condition $\mathbb{E}[M_{t+1} \mid \mathcal{F}_t] \leq M_t$ reduces to two algebraic inequalities, one for each interaction type:

Both hold for all $n \geq 1$ β€” not just “sufficiently large $n$” β€” which we verified formally. The key insight is that these are product-form bounds: the supermartingale factors through the individual interaction types, avoiding the convexity problem entirely.

What We Formalized

The formalization covers the complete proof chain from algebraic lemmas to the final geometric decay bound:

Component File Lines
State space, transitions Config.lean, Step.lean, Transition.lean ~770
Potential, drift analysis DeltaF.lean, Drift.lean, RelativeChange.lean ~1,130
Supermartingale algebra Supermartingale.lean, CentralSupermartingale.lean ~560
Corner region bounds ConvergenceTime.lean ~1,090
Augmented state + central proof AugmentedState.lean ~2,380
Supporting infrastructure Expected.lean, GeometricDrift.lean, etc. ~720
Total 23 files ~6,650

The formalization is built on Lean 4 with Mathlib. All proofs compile with zero sorry.

Key Theorems

The main results, as stated in Lean:

/-- Geometric decay for central region. -/
theorem prob_in_activeCentral_le (hn : n β‰₯ 2) (cβ‚€ : Config n) (t : β„•) :
    (absorbedKernelCentral hn ^ t) cβ‚€ activeCentral ≀
    3 * ENNReal.ofReal ((1 - 1 / (15000 * (n : ℝ))) ^ t) *
    potentialCentralTrunc cβ‚€

/-- Geometric decay for large-x region. -/
theorem prob_in_activeLargeX_le (hn : n β‰₯ 2) (cβ‚€ : Config n) (t : β„•) :
    (absorbedKernelLargeX hn ^ t) cβ‚€ activeLargeX ≀
    ENNReal.ofReal ((1 - 1 / (15000 * (n : ℝ))) ^ t) *
    potentialLargeXTrunc cβ‚€

The large-$y$ and large-$b$ regions have analogous statements.

Technical Challenges in the Formalization

ENNReal Arithmetic

The probability bounds live in $\mathbb{R}_{\geq 0}^{\infty}$ (ENNReal in Mathlib), while the algebraic core is in $\mathbb{R}$. Bridging the two requires careful management of:

Heartbeat Budgets

Several proofs involving high-degree polynomials (e.g., the Bernoulli inequality applied to $(1 - 1/(15000n))^{516}$) require set_option maxHeartbeats 400000 due to exponential WHNF expansion when Lean tries to verify definitional equality of large expressions. The solution: extract intermediate results into opaque helper lemmas.

Constants

The paper uses tighter constants that require careful case analysis. Our formalization uses $15000$ where the paper’s implicit constants are smaller. This is intentional β€” the larger constant provides more margin for the formal inequalities without affecting the asymptotic $O(n \log n)$ bound.

What This Means

The final theorem of Angluin, Aspnes, and Eisenstat is correct: the 3-state approximate majority protocol converges in $O(n \log n)$ interactions with high probability. The paper’s Lemma 4 does describe the correct supermartingale construction β€” the only issue is that the surrounding proof sketch suggests a simpler drift argument that does not quite go through. The fix is already implicit in their own lemma statement; our formalization just makes it explicit.

This is a case where formalization revealed a subtle point: the multiplicative drift $\mathbb{E}[1/f'] \leq (1-\delta)/f$ is a natural conjecture that almost works β€” it holds in three of the four regions β€” but fails precisely where the potential function’s convexity interacts with large step sizes. The exponential supermartingale avoids this by working with products rather than reciprocals.

References