Formalizing Exact Majority: Machine-Checked Correctness and State Bounds for Doty et al. Theorem 3.1

2026-05-31

We present a Lean 4 formalization of Doty et al.’s Theorem 3.1 — the existence of a time- and state-optimal stable population protocol for exact majority. Two of the three claims are machine-checked: $O(\log n)$ states and stable correctness, with zero sorry and zero custom axioms across 3,500+ build targets. The third claim — the $O(\log n)$ stabilization-time bound — is not yet established; see the correction note below.

Correction (2026-06-06). An earlier version of this post claimed a complete formalization of all three parts of Theorem 3.1, including the $O(\log n)$ stabilization-time bound. That time-complexity claim was premature. At the time of writing, the timed clock was proved only at scale $\Theta(\log^2 n)$ parallel time (clock_composed_via_A0), a full logarithmic factor short of $O(\log n)$; the pieces needed for $O(\log n)$ were absent from the build, recorded honestly in-code as clock_honest_verdict; and the time-composition theorem doty_time_headline remained conditional on eleven phase-convergence hypotheses that were never discharged. The “0 sorry” build was real, but the hard time content sat in undischarged hypotheses and a trivially-true verdict — not in a complete unconditional theorem. The correctness and state-complexity results below stand unchanged. The time-complexity formalization is ongoing; this post has been revised to retract the time claim. Part 3 has been rewritten to reflect the true status.

What Was Proved

Doty, Eftekhari, Gasieniec, Severson, Stull, and Uznanski’s protocol (arXiv:2106.10201v2) is a 70-page paper achieving simultaneous optimality in state complexity ($O(\log n)$) and stabilization time ($O(\log n)$ parallel time). The protocol uses an elaborate 11-phase design: population splitting, integer averaging, fixed-resolution clock synchronization, tie detection, bias doubling, exponent sampling, consumption, and stable backup.

Our Lean 4 formalization proves two of the three parts of Theorem 3.1 — state complexity and stable correctness:

$$ \underbrace{|\Lambda_{\text{eff}}| \leq C \cdot (K+1)(L+1)}_{\text{state complexity}} \;\wedge\; \underbrace{\text{StablyComputes}(\mathcal{P}, M)}_{\text{correctness}} $$

where $L = \lceil \log_2 n \rceil$ and $K$ is the clock-resolution constant. The third part — the $O(\log n)$ stabilization-time bound $\Pr[\text{not done by } t] \leq 1/n^2$ with $t = 10(n-1)\ln n$ — is not yet established (see the correction note above); its formalization is ongoing, and Part 3 below records the true status.

The Three Parts

Part 1: Stable Correctness

The hardest part. A population protocol stably computes majority if, from any valid initial configuration, every reachable configuration can reach a stable output matching the true majority.

This required formalizing:

The complete 11-phase transition function (2,500+ lines, handling every pair of agents in every phase/role combination)
Phase epidemic synchronization and its monotonicity properties
A deterministic chain: from any reachable configuration, the protocol reaches a checkpoint (synchronized phases + two clocks), then progresses through all phases to a stable endpoint
A reachable invariant proving that agents at phase 4 always have output .T — the key link between the protocol’s tie-detection mechanism and its output

The correctness proof alone spans 9,000+ lines in DeterministicChain.lean.

Part 2: State Complexity $O(L)$

The paper claims $O(\log n)$ states. Our flat AgentState record stores all fields simultaneously, giving $\Theta(L^4)$ states. The paper’s actual design uses disjoint fields per phase — for example, Phase 3 Main agents are either biased (carrying sign + exponent) or unbiased (carrying hour), never both.

We formalized this as a sum-typed EffAgent:

$$ \texttt{EffAgent} = \Sigma_{p \in \text{Phase}} \Sigma_{r \in \text{Role}} \texttt{PhaseRoleState}(p, r) $$

with $|\texttt{EffAgent}| \leq 3{,}769{,}920 \cdot (K+1)(L+1) = O(L)$. We proved all 11 per-phase commutation theorems showing that the flat transition projects faithfully to the effective state space:

$$ \texttt{toEff} \circ \texttt{Phase}_k\texttt{Transition} = \texttt{effPhase}_k\texttt{Transition} \circ \texttt{toEff} $$

Phase 0 was the hardest — its 100+ nested if-branches required an eraseBHM projection technique to avoid simp timeouts.

Part 3: Time Complexity — In Progress (Not Yet Established)

This is the part the original post over-claimed. The honest status, recorded directly in the codebase as clock_honest_verdict (ClockTimeConvergence.lean):

The untimed-phase machinery is real and complete: geometric waiting-time sums with Janson concentration (1,898 lines in JansonGeometric.lean, zero sorry). The per-step descent probability at epidemic count $v$ is $\frac{2(n-v)v}{n(n-1)} = \Omega(v/n)$ from the cross-pair count, so the hitting time telescopes to $(n-1)H_{n-1} = \Theta(n\log n)$ interactions rather than the generic $\Theta(n^3 \log n)$.
The timed clock was the gap. The A0-template composition proved it only at scale $\Theta(\log^2 n)$ parallel time (clock_composed_via_A0) — a full logarithmic factor short of the paper’s $O(\log n)$. Reaching $O(\log n)$ needs three pieces that were absent from the build: the constant-fraction bulk epidemic ($0.1 \to 0.9$ in $O(1)$ parallel, Lemma 4.5 at $p_{\min} = \Theta(1)$), the doubly-exponential front tail $c_{\geq i+1} < p\,c_{\geq i}^2$ (Theorem 6.5), and the early-drip bound $d_{\geq i+1} = O(n^{-0.85})$ (Lemma 6.3).
The time-composition theorem doty_time_headline is conditional: it assumes eleven PhaseConvergence instances as hypotheses, and those instances were never constructed and supplied. So the “0 sorry” build did not contain a complete unconditional time bound — the hard content lived in undischarged hypotheses and a trivially-true verdict. This is precisely the failure mode the audit checklist below (Group C) is meant to catch.

Work since has built the missing clock pieces — the constant-fraction epidemic, the front-tail induction, the early-drip bound, and a two-sided $O(\log n)$ clock (Theorems 6.8/6.9) — on an abstract clock model, and is now connecting them to the real protocol kernel via a direct drift on the minute field. Until that connection and the eleven phase instances are complete, the $O(\log n)$ time bound for Theorem 3.1 is not formally established here.

Findings Along the Way

Phase 0 Dead-End for Small Populations

We discovered that for populations with $n \leq 9$, the protocol can dead-end at Phase 0 under adversarial scheduling — not enough agents to create two clocks. The paper’s proof implicitly assumes $n$ is large enough. Our formalization adds the hypothesis $n \geq 10$, matching the paper’s intended parameter range.

Phase 0 Rule 3’ Bug

The formalization revealed a mismatch between the paper’s Rule 7-9 for MCR promotion and the originally coded transition. The assigned guard was missing, allowing MCR waste that could prevent clock creation. Fixed in Transition.lean.

The EffAgent Design

The paper’s $O(\log n)$ state bound relies on fields being mutually exclusive per phase — biased Main agents carry exponent but not hour, unbiased ones carry hour but not exponent. Our flat Lean type stores both, giving $\Theta(L^2)$ for Phase 3 alone. The resolution was a sum-typed EffAgent with a projection/simulation theorem, following the canonical subtype pattern rather than a quotient.

Statistics


Total Lean files	30+
Lines of Lean	40,000+
Build targets	3,502
Sorry count	0
Custom axioms	0
Axiom dependencies	propext, Classical.choice, Quot.sound
Duration	~4 weeks
AI agents used	Claude Opus 4.6, Opus 4.8, GPT-5.5 (advisory)

Lessons

Simulate first. Monte Carlo simulation before formalization catches specification bugs early. Our C simulator validated the protocol on populations from $n = 21$ to $n = 1001$.
Monotonicity squeeze. When you know $f(x) \leq \text{result} \leq g(x)$ and $f(x) = g(x)$, you get the result without unfolding the function. This pattern closed the epidemic phase analysis without touching phaseEpidemicUpdate’s internals.
Pipeline decomposition for nested ifs. A 100-branch transition function cannot be simp’d. Split it into named rule functions, prove per-rule commutation, compose. This is not a Lean trick — it is how the mathematics should be organized.
Reachable invariants need the full Transition. Proving “phase 4 implies output T” required tracing through every possible interaction, not just Phase 4. Cross-phase interactions via the epidemic mechanism are the hard cases.
The honest theorem is a conjunction. State complexity, correctness, and time are three separate mathematical claims. Packaging them correctly — with the right type (EffAgent vs AgentState) for each — matters for the statement to be faithful to the paper.
17-point audit catches what 0-sorry does not. We developed a 17-point acceptance checklist for Lean formalizations, organized in three groups. Group A (mechanical) checks sorry count, axiom count, trivially-true conclusions, full build, and #print axioms. Group B (signature) checks hypothesis escape, end-to-end statement, interface minimality, and honest classification of conditional vs unconditional results. Group C (semantic) is the one that matters most: it requires reading the Lean theorem statement side-by-side with the paper’s original theorem, checking statement fidelity (does the Lean statement actually say what the paper claims?), definition alignment (do the Lean definitions match the paper’s?), fragment detection (is this the full theorem or just a component?), and impostor detection (is the conclusion trivially true or has the hard part been smuggled into a hypothesis?). Each theorem gets a verdict: FAITHFUL, CONDITIONAL-honest, FRAGMENT, or IMPOSTOR. Zero sorry and a clean build pass Groups A and B automatically — but without Group C, you can have a 0-sorry formalization that proves something entirely different from the paper’s theorem.

Conclusion

The gap between a 70-page paper proof and a 40,000-line machine-checked proof is not mainly about rigor — the paper’s arguments are correct. The gap is about completeness: every case, every cross-phase interaction, every dormant field, every edge in the population-size parameter. Formal verification does not find conceptual errors in well-written papers. It finds the thousand small decisions that a human reader resolves unconsciously and a proof assistant demands explicitly.

千淘万漉虽辛苦，吹尽狂沙始到金。 A thousand washings, ten thousand siftings — arduous though they are — blow away the wild sand to reveal the gold. — 刘禹锡 (Liu Yuxi), 浪淘沙

— Zinan