A Gap Surfaced by Formalization: Transient Overshoot in the LPP Main Theorem

2026-04-20

The previous post introduced Ripple, a Lean 4 formalization of CRN-computable numbers. One of the observations made in the process of formalization was a small but genuine gap in the main theorem of Huang–Huls, Computing real numbers with large-population protocols (DNA 28, 2022) — the paper establishing that every bounded-CRN-computable number in $[0,1]$ is also computable by a large-population protocol (LPP). This post explains what the gap is and how the formalization patched it.

The conclusion is the paper’s theorem is correct under a small strengthening of its hypothesis, and a clean preprocessing step recovers the strong conclusion from the weaker standard hypothesis. Nothing in the paper’s main line of reasoning is in question; the formalization simply forced an implicit assumption to be named.

1. The LPP Compilation Pipeline

A large-population protocol on a finite set $Q$ of states is a pair $(\pi_0, \delta)$ consisting of an initial distribution $\pi_0$ on $Q$ and a symmetric transition rule $\delta: Q \times Q \to Q \times Q$ applied to uniformly random pairs at Poisson-distributed times. In the infinite-population mean-field limit, the population-fraction vector $\mathbf{p}(t) \in \Delta(Q)$ evolves by a quadratic ODE on the simplex $\Delta(Q)$.

A real number $\alpha \in [0,1]$ is LPP-computable if there is a protocol whose mean-field fraction in a designated state tends to $\alpha$ exponentially fast.

The main theorem of the paper — call it the LPP compilation theorem — says the following.

If $\alpha \in [0,1]$ is computable by a bounded polynomial IVP (equivalently, a bounded CRN), then $\alpha$ is LPP-computable.

The proof is a four-stage pipeline:

$$ \text{CRN} \xrightarrow{\text{Stage 1}} \text{degree-}2\ \text{CRN} \xrightarrow{\text{Stage 2}} \text{TPP on simplex} \xrightarrow{\text{Stage 3}} \text{LPP} \xrightarrow{\text{Stage 4}} \text{probabilistic LPP}. $$

Stage 2 is the load-bearing step. It takes a non-negative degree-2 CRN with designated output coordinate $x_{\mathrm{out}}$ and embeds its trajectory onto the probability simplex $\Delta^{d}$ of one higher dimension via

$$ x_i(t) \;\mapsto\; c \cdot x_i(t) \qquad (i = 0, \ldots, d-1), \qquad x_d(t) \;\mapsto\; 1 - c \sum_{i=0}^{d-1} x_i(t), $$

for a suitable scaling constant $c > 0$.

2. Where the Hypothesis Becomes Load-Bearing

For this embedding to produce a genuine simplex-valued trajectory, the scaled coordinates $c \cdot x_i(t)$ must each stay in $[0,1]$ for all $t \geq 0$, not only in the limit. In particular the output coordinate $x_{\mathrm{out}}$ must satisfy a uniform strict bound

$$ x_{\mathrm{out}}(t) \;\leq\; M_{\mathrm{out}} \;<\; 1 \qquad \text{for all } t \geq 0. \tag{$\star$} $$

When one carefully assembles the Stage 2 balancing-dilation argument, the bound $(\star)$ is what allows the slack constant in the simplex embedding to be chosen with enough room to guarantee non-negativity of the residual coordinate $x_d$.

The implicit assumption in the paper is that a bounded CRN computing $\alpha \in [0,1]$ automatically satisfies $(\star)$. Why does it seem natural? Because “$\alpha \in [0,1]$” and “$x_{\mathrm{out}}(t) \to \alpha$” both appear, and the eye slides to the conclusion that $x_{\mathrm{out}}$ stays in $[0,1]$. Formalization does not let the eye slide.

3. The Gap

The CRN / GPAC definition of bounded-time computability only guarantees

$$ \|x(t)\| \;\leq\; M \qquad \text{for all } t \geq 0, $$

for some finite $M$, typically much larger than $1$. The output coordinate’s limit is $\alpha < 1$, but en route it may transiently exceed $1$ before relaxing. This behavior is normal: a damped oscillation approaching $\alpha$ from above, an exponential approach with an initial bump, an integrator that accumulates before shedding — all are routinely seen in the bounded-CRN zoo, and all are consistent with “CRN computes $\alpha$” in the standard sense.

The paper’s Stage 2 argument, however, silently assumes the bump does not exist. In dialogue the author of the paper confirmed the implicit hypothesis. This is the gap: the theorem as stated assumes a strict pointwise bound $(\star)$ that is not part of the standard “bounded-CRN-computable” definition.

4. The Saturating Surrogate

The patch is a preprocessing step that does not modify any stage of the pipeline. It attaches one auxiliary species $y$ evolving by

$$ y'(t) \;=\; \bigl(x_{\mathrm{out}}(t) - y(t)\bigr) \cdot \bigl(U - y(t)\bigr), \qquad y(0) = 0, $$

where $U \in (\alpha, 1) \cap \mathbb{Q}$ is a user-chosen rational cap.

CRN-implementability. The right-hand side expands to

$$ y' \;=\; U \cdot x_{\mathrm{out}} \;+\; y^2 \;-\; x_{\mathrm{out}} y \;-\; U y, $$

which decomposes into four non-negative-coefficient reactions — production at rate $U \cdot x_{\mathrm{out}}$, the autocatalytic reaction $y \to 2y$ at rate $y$, the cross-degradation $y$ by $x_{\mathrm{out}}$ at rate $x_{\mathrm{out}} y$, and first-order decay at rate $U$. The surrogate is therefore a legal degree-$3$ CRN extension.

Invariance of $[0, U]$. The boundary check is elementary. At $y = 0$, the right-hand side equals $x_{\mathrm{out}} \cdot U \geq 0$, so $y$ never drops below zero. At $y = U$, the right-hand side equals $(x_{\mathrm{out}} - U) \cdot 0 = 0$, so $y$ never rises above $U$. Continuity of the solution then pins $y(t) \in [0, U]$ for all $t \geq 0$, regardless of what $x_{\mathrm{out}}$ does. In particular $y(t) \leq U < 1$ pointwise, satisfying $(\star)$ with $M_{\mathrm{out}} = U$.

Convergence. Set $\varphi(t) := y(t) - \alpha$ and $\varepsilon(t) := x_{\mathrm{out}}(t) - \alpha$, so $\varepsilon(t) \to 0$ by hypothesis. Rewrite:

$$ \varphi'(t) \;=\; \bigl(\varepsilon(t) - \varphi(t)\bigr) \cdot \bigl(U - y(t)\bigr). $$

Introduce the time change $\tau(t) := \int_0^t (U - y(s))\, ds$. Because $y(s) \leq U$, $\tau$ is non-decreasing; and once $y$ has stabilized near $\alpha$ (which it does, as will be verified), $U - y(s)$ is bounded below by $\kappa/2$ where $\kappa := U - \alpha > 0$, so $\tau(t)$ grows at least linearly in $t$.

In the $\tau$-variable the equation becomes linear,

$$ \frac{d\Phi}{d\tau} \;=\; E(\tau) \;-\; \Phi(\tau), $$

with Duhamel representation

$$ \Phi(\tau) \;=\; e^{-\tau}\, \Phi(0) \;+\; \int_0^{\tau} e^{-(\tau - \sigma)}\, E(\sigma)\, d\sigma. $$

Splitting this integral at the point where $\varepsilon$ has entered its decay regime, and translating back from $\tau$ to $t$ via the linear lower bound on $\tau(t)$, one obtains the modulus estimate

$$ \mu_y(r) \;\leq\; \mu_x\!\left(r + \left\lceil \log\!\tfrac{2U}{\kappa} \right\rceil\right) \;+\; \kappa^{-1} \log\!\tfrac{2U}{\kappa} \;+\; 1. $$

In plain terms: if $x_{\mathrm{out}}$ computes $\alpha$ in real time, then so does $y$, with the linear rate slowed by a multiplicative factor $O(\kappa^{-1})$ that depends only on how much room $U$ leaves above $\alpha$.

5. What the Patched Theorem Looks Like

Re-running the four-stage pipeline with $y$ as the designated output (rather than the original $x_{\mathrm{out}}$) removes the need for the implicit hypothesis altogether. The Lean formalization records this as

$$ \texttt{bounded\\_crn\\_is\\_lpp\\_computable\\_unconditional}: $$

given any $\alpha \in [0,1]$, any bounded certified PIVP computing $\alpha$, and any polynomial-CRN decomposition of that PIVP, there exists an LPP computing $\alpha$. No hypothesis on the pointwise range of $x_{\mathrm{out}}$ is assumed; the saturating surrogate supplies the missing bound internally.

The corresponding Lean 4 file is Ripple/LPP/SaturatingSurrogate.lean, and the reformulated main theorem lives in Ripple/LPP/BoundedLPP.lean. The technical note projects/Bounded/notes/saturating-surrogate-LPP.tex in the Ripple tree contains the paper-form statement with proofs.

6. What Formalization Added

It is sometimes said that formal verification only checks what mathematicians already know. This case is a small counterexample. The LPP paper’s authors were correct on their intended theorem; they were not wrong about the behavior of CRNs; and yet the written statement was slightly weaker than what is needed for the compilation to literally go through. The gap between “what the authors meant” and “what the page says” is invisible to a reader who shares the authors’ intuition, and to a reviewer with limited time. It is not invisible to a type checker.

The patch is small. The discovery that a patch was needed would not have happened without the attempt to write the proof inside a language that forbids the eye from sliding.

工欲善其事，必先利其器。

A craftsman who wishes to do his work well must first sharpen his tools.

— 《論語·衛靈公》 · The Analects, Wei Ling Gong