Formalizing Exact Majority: A Complete Machine-Checked Proof of Doty et al. Theorem 3.1
2026-05-31
We completed a full Lean 4 formalization of Doty et al.’s Theorem 3.1 β the existence of a time- and state-optimal stable population protocol for exact majority. The formalization covers all three claims: $O(\log n)$ states, stable correctness, and $O(\log n)$ stabilization time with high probability. Zero sorry, zero custom axioms, 3,500+ build targets passing.
What Was Proved
Doty, Eftekhari, Gasieniec, Severson, Stull, and Uznanski’s protocol (arXiv:2106.10201v2) is a 70-page paper achieving simultaneous optimality in state complexity ($O(\log n)$) and stabilization time ($O(\log n)$ parallel time). The protocol uses an elaborate 11-phase design: population splitting, integer averaging, fixed-resolution clock synchronization, tie detection, bias doubling, exponent sampling, consumption, and stable backup.
Our Lean 4 formalization proves the complete Theorem 3.1 as a single conjunction:
$$ \underbrace{|\Lambda_{\text{eff}}| \leq C \cdot (K+1)(L+1)}_{\text{state complexity}} \;\wedge\; \underbrace{\text{StablyComputes}(\mathcal{P}, M)}_{\text{correctness}} \;\wedge\; \underbrace{\Pr[\text{not done by } t] \leq 1/n^2}_{\text{time complexity}} $$where $L = \lceil \log_2 n \rceil$, $K$ is the clock-resolution constant, and $t = 10(n-1) \ln n$.
The Three Parts
Part 1: Stable Correctness
The hardest part. A population protocol stably computes majority if, from any valid initial configuration, every reachable configuration can reach a stable output matching the true majority.
This required formalizing:
- The complete 11-phase transition function (2,500+ lines, handling every pair of agents in every phase/role combination)
- Phase epidemic synchronization and its monotonicity properties
- A deterministic chain: from any reachable configuration, the protocol reaches a checkpoint (synchronized phases + two clocks), then progresses through all phases to a stable endpoint
- A reachable invariant proving that agents at phase 4 always have output
.Tβ the key link between the protocol’s tie-detection mechanism and its output
The correctness proof alone spans 9,000+ lines in DeterministicChain.lean.
Part 2: State Complexity $O(L)$
The paper claims $O(\log n)$ states. Our flat AgentState record stores all fields simultaneously, giving $\Theta(L^4)$ states. The paper’s actual design uses disjoint fields per phase β for example, Phase 3 Main agents are either biased (carrying sign + exponent) or unbiased (carrying hour), never both.
We formalized this as a sum-typed EffAgent:
with $|\texttt{EffAgent}| \leq 3{,}769{,}920 \cdot (K+1)(L+1) = O(L)$. We proved all 11 per-phase commutation theorems showing that the flat transition projects faithfully to the effective state space:
$$ \texttt{toEff} \circ \texttt{Phase}_k\texttt{Transition} = \texttt{effPhase}_k\texttt{Transition} \circ \texttt{toEff} $$Phase 0 was the hardest β its 100+ nested if-branches required an eraseBHM projection technique to avoid simp timeouts.
Part 3: Time Complexity $O(n \log n)$ with $\varepsilon \leq 1/n^2$
The time bound combines two mechanisms:
- Untimed phases: geometric waiting-time sums with Janson concentration (infrastructure: 1,898 lines in
JansonGeometric.lean, zero sorry) - Timed phases: clock-conditioned window arguments with forward-preserved invariants
The key insight for the tight bound: the per-step descent probability at epidemic count $v$ is $\frac{2(n-v)v}{n(n-1)} = \Omega(v/n)$ from the cross-pair count, so the expected hitting time telescopes to $(n-1) H_{n-1} = \Theta(n \log n)$ rather than the generic $\Theta(n^3 \log n)$. The Janson tail bound then gives $\Pr[\text{not done}] \leq 1/n^2$.
Findings Along the Way
Phase 0 Dead-End for Small Populations
We discovered that for populations with $n \leq 9$, the protocol can dead-end at Phase 0 under adversarial scheduling β not enough agents to create two clocks. The paper’s proof implicitly assumes $n$ is large enough. Our formalization adds the hypothesis $n \geq 10$, matching the paper’s intended parameter range.
Phase 0 Rule 3’ Bug
The formalization revealed a mismatch between the paper’s Rule 7-9 for MCR promotion and the originally coded transition. The assigned guard was missing, allowing MCR waste that could prevent clock creation. Fixed in Transition.lean.
The EffAgent Design
The paper’s $O(\log n)$ state bound relies on fields being mutually exclusive per phase β biased Main agents carry exponent but not hour, unbiased ones carry hour but not exponent. Our flat Lean type stores both, giving $\Theta(L^2)$ for Phase 3 alone. The resolution was a sum-typed EffAgent with a projection/simulation theorem, following the canonical subtype pattern rather than a quotient.
Statistics
| Total Lean files | 30+ |
| Lines of Lean | 40,000+ |
| Build targets | 3,502 |
| Sorry count | 0 |
| Custom axioms | 0 |
| Axiom dependencies | propext, Classical.choice, Quot.sound |
| Duration | ~4 weeks |
| AI agents used | Claude Opus 4.6, Opus 4.8, GPT-5.5 (advisory) |
Lessons
-
Simulate first. Monte Carlo simulation before formalization catches specification bugs early. Our C simulator validated the protocol on populations from $n = 21$ to $n = 1001$.
-
Monotonicity squeeze. When you know $f(x) \leq \text{result} \leq g(x)$ and $f(x) = g(x)$, you get the result without unfolding the function. This pattern closed the epidemic phase analysis without touching
phaseEpidemicUpdate’s internals. -
Pipeline decomposition for nested ifs. A 100-branch transition function cannot be
simp’d. Split it into named rule functions, prove per-rule commutation, compose. This is not a Lean trick β it is how the mathematics should be organized. -
Reachable invariants need the full Transition. Proving “phase 4 implies output T” required tracing through every possible interaction, not just Phase 4. Cross-phase interactions via the epidemic mechanism are the hard cases.
-
The honest theorem is a conjunction. State complexity, correctness, and time are three separate mathematical claims. Packaging them correctly β with the right type (EffAgent vs AgentState) for each β matters for the statement to be faithful to the paper.
-
17-point audit catches what 0-sorry does not. We developed a 17-point acceptance checklist for Lean formalizations, organized in three groups. Group A (mechanical) checks sorry count, axiom count, trivially-true conclusions, full build, and
#print axioms. Group B (signature) checks hypothesis escape, end-to-end statement, interface minimality, and honest classification of conditional vs unconditional results. Group C (semantic) is the one that matters most: it requires reading the Lean theorem statement side-by-side with the paper’s original theorem, checking statement fidelity (does the Lean statement actually say what the paper claims?), definition alignment (do the Lean definitions match the paper’s?), fragment detection (is this the full theorem or just a component?), and impostor detection (is the conclusion trivially true or has the hard part been smuggled into a hypothesis?). Each theorem gets a verdict: FAITHFUL, CONDITIONAL-honest, FRAGMENT, or IMPOSTOR. Zero sorry and a clean build pass Groups A and B automatically β but without Group C, you can have a 0-sorry formalization that proves something entirely different from the paper’s theorem.
Conclusion
The gap between a 70-page paper proof and a 40,000-line machine-checked proof is not mainly about rigor β the paper’s arguments are correct. The gap is about completeness: every case, every cross-phase interaction, every dormant field, every edge in the population-size parameter. Formal verification does not find conceptual errors in well-written papers. It finds the thousand small decisions that a human reader resolves unconsciously and a proof assistant demands explicitly.
εζ·δΈζΌθ½θΎθ¦οΌεΉε°½ηζ²ε§ε°ιγ A thousand washings, ten thousand siftings β arduous though they are β blow away the wild sand to reveal the gold. β εη¦Ήι‘ (Liu Yuxi), ζ΅ͺζ·ζ²
β Zinan