infsup

by Zinan Huang 🌸

Formalizing Exact Majority: A Complete Machine-Checked Proof of Doty et al. Theorem 3.1

2026-05-31


We completed a full Lean 4 formalization of Doty et al.’s Theorem 3.1 β€” the existence of a time- and state-optimal stable population protocol for exact majority. The formalization covers all three claims: $O(\log n)$ states, stable correctness, and $O(\log n)$ stabilization time with high probability. Zero sorry, zero custom axioms, 3,500+ build targets passing.

What Was Proved

Doty, Eftekhari, Gasieniec, Severson, Stull, and Uznanski’s protocol (arXiv:2106.10201v2) is a 70-page paper achieving simultaneous optimality in state complexity ($O(\log n)$) and stabilization time ($O(\log n)$ parallel time). The protocol uses an elaborate 11-phase design: population splitting, integer averaging, fixed-resolution clock synchronization, tie detection, bias doubling, exponent sampling, consumption, and stable backup.

Our Lean 4 formalization proves the complete Theorem 3.1 as a single conjunction:

$$ \underbrace{|\Lambda_{\text{eff}}| \leq C \cdot (K+1)(L+1)}_{\text{state complexity}} \;\wedge\; \underbrace{\text{StablyComputes}(\mathcal{P}, M)}_{\text{correctness}} \;\wedge\; \underbrace{\Pr[\text{not done by } t] \leq 1/n^2}_{\text{time complexity}} $$

where $L = \lceil \log_2 n \rceil$, $K$ is the clock-resolution constant, and $t = 10(n-1) \ln n$.

The Three Parts

Part 1: Stable Correctness

The hardest part. A population protocol stably computes majority if, from any valid initial configuration, every reachable configuration can reach a stable output matching the true majority.

This required formalizing:

The correctness proof alone spans 9,000+ lines in DeterministicChain.lean.

Part 2: State Complexity $O(L)$

The paper claims $O(\log n)$ states. Our flat AgentState record stores all fields simultaneously, giving $\Theta(L^4)$ states. The paper’s actual design uses disjoint fields per phase β€” for example, Phase 3 Main agents are either biased (carrying sign + exponent) or unbiased (carrying hour), never both.

We formalized this as a sum-typed EffAgent:

$$ \texttt{EffAgent} = \Sigma_{p \in \text{Phase}} \Sigma_{r \in \text{Role}} \texttt{PhaseRoleState}(p, r) $$

with $|\texttt{EffAgent}| \leq 3{,}769{,}920 \cdot (K+1)(L+1) = O(L)$. We proved all 11 per-phase commutation theorems showing that the flat transition projects faithfully to the effective state space:

$$ \texttt{toEff} \circ \texttt{Phase}_k\texttt{Transition} = \texttt{effPhase}_k\texttt{Transition} \circ \texttt{toEff} $$

Phase 0 was the hardest β€” its 100+ nested if-branches required an eraseBHM projection technique to avoid simp timeouts.

Part 3: Time Complexity $O(n \log n)$ with $\varepsilon \leq 1/n^2$

The time bound combines two mechanisms:

The key insight for the tight bound: the per-step descent probability at epidemic count $v$ is $\frac{2(n-v)v}{n(n-1)} = \Omega(v/n)$ from the cross-pair count, so the expected hitting time telescopes to $(n-1) H_{n-1} = \Theta(n \log n)$ rather than the generic $\Theta(n^3 \log n)$. The Janson tail bound then gives $\Pr[\text{not done}] \leq 1/n^2$.

Findings Along the Way

Phase 0 Dead-End for Small Populations

We discovered that for populations with $n \leq 9$, the protocol can dead-end at Phase 0 under adversarial scheduling β€” not enough agents to create two clocks. The paper’s proof implicitly assumes $n$ is large enough. Our formalization adds the hypothesis $n \geq 10$, matching the paper’s intended parameter range.

Phase 0 Rule 3’ Bug

The formalization revealed a mismatch between the paper’s Rule 7-9 for MCR promotion and the originally coded transition. The assigned guard was missing, allowing MCR waste that could prevent clock creation. Fixed in Transition.lean.

The EffAgent Design

The paper’s $O(\log n)$ state bound relies on fields being mutually exclusive per phase β€” biased Main agents carry exponent but not hour, unbiased ones carry hour but not exponent. Our flat Lean type stores both, giving $\Theta(L^2)$ for Phase 3 alone. The resolution was a sum-typed EffAgent with a projection/simulation theorem, following the canonical subtype pattern rather than a quotient.

Statistics

Total Lean files 30+
Lines of Lean 40,000+
Build targets 3,502
Sorry count 0
Custom axioms 0
Axiom dependencies propext, Classical.choice, Quot.sound
Duration ~4 weeks
AI agents used Claude Opus 4.6, Opus 4.8, GPT-5.5 (advisory)

Lessons

  1. Simulate first. Monte Carlo simulation before formalization catches specification bugs early. Our C simulator validated the protocol on populations from $n = 21$ to $n = 1001$.

  2. Monotonicity squeeze. When you know $f(x) \leq \text{result} \leq g(x)$ and $f(x) = g(x)$, you get the result without unfolding the function. This pattern closed the epidemic phase analysis without touching phaseEpidemicUpdate’s internals.

  3. Pipeline decomposition for nested ifs. A 100-branch transition function cannot be simp’d. Split it into named rule functions, prove per-rule commutation, compose. This is not a Lean trick β€” it is how the mathematics should be organized.

  4. Reachable invariants need the full Transition. Proving “phase 4 implies output T” required tracing through every possible interaction, not just Phase 4. Cross-phase interactions via the epidemic mechanism are the hard cases.

  5. The honest theorem is a conjunction. State complexity, correctness, and time are three separate mathematical claims. Packaging them correctly β€” with the right type (EffAgent vs AgentState) for each β€” matters for the statement to be faithful to the paper.

  6. 17-point audit catches what 0-sorry does not. We developed a 17-point acceptance checklist for Lean formalizations, organized in three groups. Group A (mechanical) checks sorry count, axiom count, trivially-true conclusions, full build, and #print axioms. Group B (signature) checks hypothesis escape, end-to-end statement, interface minimality, and honest classification of conditional vs unconditional results. Group C (semantic) is the one that matters most: it requires reading the Lean theorem statement side-by-side with the paper’s original theorem, checking statement fidelity (does the Lean statement actually say what the paper claims?), definition alignment (do the Lean definitions match the paper’s?), fragment detection (is this the full theorem or just a component?), and impostor detection (is the conclusion trivially true or has the hard part been smuggled into a hypothesis?). Each theorem gets a verdict: FAITHFUL, CONDITIONAL-honest, FRAGMENT, or IMPOSTOR. Zero sorry and a clean build pass Groups A and B automatically β€” but without Group C, you can have a 0-sorry formalization that proves something entirely different from the paper’s theorem.

Conclusion

The gap between a 70-page paper proof and a 40,000-line machine-checked proof is not mainly about rigor β€” the paper’s arguments are correct. The gap is about completeness: every case, every cross-phase interaction, every dormant field, every edge in the population-size parameter. Formal verification does not find conceptual errors in well-written papers. It finds the thousand small decisions that a human reader resolves unconsciously and a proof assistant demands explicitly.


εƒζ·˜δΈ‡ζΌ‰θ™½θΎ›θ‹¦οΌŒεΉε°½η‹‚ζ²™ε§‹εˆ°ι‡‘γ€‚ A thousand washings, ten thousand siftings β€” arduous though they are β€” blow away the wild sand to reveal the gold. β€” εˆ˜η¦Ήι”‘ (Liu Yuxi), ζ΅ͺζ·˜ζ²™

β€” Zinan