Ripple: A Lean 4 Formalization of CRN-Computable Numbers

2026-04-20

There is a modest question from a graduate course in computability: which real numbers can a chemical reaction network compute? A class exercise asked whether the rationals can be computed this way. The answer is yes, and the proof is short. What was less obvious at the time was that the same question, followed patiently, would keep widening — to the algebraics, then to $e$ and $\pi$, then to $\gamma$ and $\ln 2$, then to the whole real-time complexity class, then to its weaker large-population refinements, and eventually to $\zeta(3)$.

Ripple (github.com/zinan-huang/Ripple) is the Lean 4 formalization of that expanding question. This post introduces what the project is, what it has proved, and why the formalization is worth doing.

1. What Is a CRN-Computable Number?

A chemical reaction network (CRN) in its mass-action limit is a system of ordinary differential equations of a particular shape. Each species has a concentration $x_i(t) \geq 0$, and the dynamics take the form

$$\dot{x}_i = p_i(x) - x_i \cdot d_i(x),$$

where $p_i, d_i$ are polynomials with non-negative coefficients (production and degradation terms). Equivalently — stripping the non-negativity constraint — one can work with polynomial initial-value problems (PIVPs):

$$\dot{y} = p(y), \qquad y(0) = y_0 \in \mathbb{Q}^d, \qquad p \in \mathbb{Q}[y_1, \ldots, y_d]^d.$$

The equivalence between CRNs and PIVPs (with the sign structure accounted for) is classical; it goes back to the General Purpose Analog Computer of Shannon and to work on the GPAC model in the 1990s-2000s.

A real number $\alpha \in \mathbb{R}$ is CRN-computable in real time if there exists such a PIVP, with rational initial conditions and rational coefficients, having a distinguished coordinate $y_k(t)$ satisfying

$$|y_k(t) - \alpha| \leq C \cdot e^{-t}$$

for all $t \geq 0$. This is the first floor of the bounded analog complexity hierarchy. Weaker guarantees (polynomial rate, algebraic rate) give higher floors.

2. Why Formalize?

The theory sits across four papers:

Huang, Klinge, Lathrop, Lutz, Lutz — Real-time computability of real numbers by chemical reaction networks (Nat. Comput. 2018).
Huang, Klinge, Lathrop — Real-time equivalence of CRNs and analog computers (DNA 25, 2019).
Huang, Huls — Computing real numbers with large-population protocols (DNA 28, 2022).
Chen, Huang — Bounded analog complexity of real numbers (submitted, 2026).

Each paper establishes a piece of the picture: what the real-time class contains, how it relates to analog computing, how to weaken the continuous-time model to the discrete-in-population large-population protocol setting, and how to stratify the class by resource bounds. Read individually, the four papers look like four disjoint results. Read together, they form a pipeline: a set of constructions that translate real numbers into analog machines and vice versa, with precise complexity tracking at each stage.

Formalizing the pipeline in Lean 4 does three things that informal mathematics cannot.

It enforces uniformity of definition. When two papers, written years apart, both use a phrase like “bounded CRN computes $\alpha$”, it is worth being certain the phrase means the same thing. In Lean, there is exactly one CertifiedBoundedTimeComputable structure, and every theorem about it has to mean the same thing.

It exposes gaps. The DNA 28 large-population-protocol main theorem, as originally written, assumed the certified CRN computation stays inside $[0,1]$ at all times. In practice, the output coordinate can transiently overshoot by a small amount while converging from above. In paper form this is a footnote. In Lean it is a compile error. The fix is a saturating low-pass filter $y' = (x - y)(U - y)$ with $U \in (\alpha, 1) \cap \mathbb{Q}$, clamping the output inside a user-chosen interval without changing the limit. The formalized theorem, bounded_crn_is_lpp_computable_unconditional, is strictly stronger than the paper statement and explicitly patches the gap.

It makes the pipeline reusable. Once the real-time class is a Lean type, every theorem about it is a plug-in module. Adding a new CRN-computable number — $\gamma$, Catalan’s $G$, $\zeta(3)$ — becomes a matter of supplying a PIVP, proving boundedness and exponential convergence, and invoking the pipeline.

3. What Is Formalized

As of April 2026, Ripple contains axiom-free Lean 4 proofs of the following.

3.1 The foundational constants

$e$, $\pi$, $\ln 2$, the Euler–Mascheroni constant $\gamma$, and the reference value $\tfrac{1}{2} e^{-1}$ are all packaged as CRN-computable with zero sorry. Of these, $\gamma$ is the most intricate: the formalization tracks the explicit Abel-summation identity

$$\gamma = \int_0^{\infty} \left( \frac{1}{1 - e^{-t}} - \frac{1}{t} \right) e^{-t}\, dt$$

and a four-variable bounded PIVP that realizes it, together with the exponential-decay bound.

3.2 Catalan’s constant $G$

The value

$$G = \sum_{n=0}^{\infty} \frac{(-1)^n}{(2n+1)^2} \approx 0.9160$$

is formalized via the integral representation

$$G = \int_0^{\infty} \frac{s \cdot e^{-s}}{1 + e^{-2s}}\, ds.$$

The encoding uses the substitution $W := 1 - V$ (from the DNA 28 paper, Corollary 19) so that the system lives in the algebraic category. The four-variable PIVP

$$E' = -E, \quad R' = E - R, \quad W' = 2 E^2 W^2, \quad G' = R \cdot W$$

with initial condition $(E, R, W, G)(0) = (1, 0, \tfrac{1}{2}, 0)$ has $E(t) = e^{-t}$, $R(t) = t e^{-t}$, $W(t) = 1/(1 + e^{-2t})$, and the $G$-coordinate converges to $G$ with modulus $|G(t) - G| \leq (t+1) e^{-t}$. The Lean theorem is catalan_is_lpp_computable in Ripple/Number/CatalanCertified.lean.

3.3 Large-population protocols

The large-population protocol (LPP) model — a natural weakening of the CRN model in which populations are bounded — gets a full formalized compilation theorem. The main result, bounded_crn_is_lpp_computable_unconditional, states: every bounded certified PIVP can be compiled into an LPP that computes the same number. The compilation is a four-stage pipeline: quadraticization, bound-to-small-$\lambda$ closure, triple-product discretization, and pure-LPP embedding. In the algebraic case (every algebraic $\alpha \in [0,1]$ is LPP-computable), this specializes to a concrete five-pipeline construction starting from the minimum polynomial.

3.4 Apéry’s constant $\zeta(3)$, via the Fermi integral

The value $\zeta(3) = \sum_{n \geq 1} n^{-3}$ has no elementary closed form. A classical identity, valid by the geometric expansion $1/(1+e^x) = \sum_{k \geq 0} (-1)^k e^{-(k+1) x}$ followed by term-by-term integration, gives

$$\int_0^{\infty} \frac{x^2}{1 + e^x}\, dx = \Gamma(3) \cdot \eta(3) = 2 \cdot \tfrac{3}{4}\, \zeta(3) = \tfrac{3}{2}\, \zeta(3),$$

where $\eta(s) = \sum_{n \geq 1} (-1)^{n-1}/n^s$ is the Dirichlet eta function. The resulting identity

$$\tfrac{2}{3} \int_0^{\infty} \frac{x^2}{1 + e^x}\, dx = \zeta(3)$$

is packaged as a five-variable bounded PIVP whose designated output coordinate converges to $\zeta(3)$ at exponential rate: $|S(t) - \zeta(3)| \leq C \cdot (t^2 + 2t + 2) \cdot e^{-t}$. This is apery_fermi_is_crn_computable in Ripple/Number/AperyFermi.lean, and it is axiom-free.

3.5 Apéry’s constant $\zeta(3)$, via the three-term recurrence

The Fermi route is not the only encoding. Apéry’s original 1978 proof of irrationality of $\zeta(3)$ used a three-term recurrence

$$(n+1)^3 a_{n+1} = (2n+1)(17n^2 + 17n + 5)\, a_n - n^3\, a_{n-1}$$

for certain integer-valued sequences $a_n, b_n$ with $b_n / a_n \to \zeta(3)$ at exponential rate. The recurrence is a creative-telescoping identity, and its combinatorial witness was worked out by van der Poorten in a 1979 note. Ripple formalizes:

F1 (the three-term recurrence itself), via a pointwise Zeilberger witness $W(n, k)$ that closes all three summation-regime cases ($k \leq n-2$, $k = n-1$, $k = n$) axiom-free. The closure is a single linear_combination call once the algebraic identities are assembled.
F1′ (the harmonic-correction recurrence), which decomposes $b_n = H_3(n) \cdot a_n + d_n$ and proves the companion recurrence for $d_n$.
F2 (the formal ODE for the generating function), which states that the two generating functions $A(z) = \sum a_n z^n$ and $B(z) = \sum b_n z^n$ satisfy the Apéry differential operator $p(z) u''' + q(z) u'' + r(z) u' + s(z) u$. The $A$-series solves the homogeneous equation; the $B$-series solves the same equation with a constant term $6$ at $z^0$ — a small asymmetry that reflects the fact that the $B$-recurrence does not close at $n = 0$.

What F2 does not yet give is the analytic passage from the formal ODE to an explicit exponential-rate convergence of the coefficient ratio. That passage requires the Frobenius theory of regular singular points, which has not been formalized in Mathlib. The single remaining sorry in the Apéry chain, apery_conifold_frobenius_witness, is effectively a standalone formalization project — not a flaw in the Apéry argument, but a missing piece of foundational ODE infrastructure.

4. The Name

The name “Ripple” is a mishearing. The author meant repository, the word came out ripple, and the name stuck — because the underlying research actually did start small and ripple outward. Each layer of the formalization — rationals, then algebraics, then transcendentals, then the ambient class, then its weaker refinements, then its stronger ones — was a new ripple from the same class exercise. The Lean 4 repository is the written-down version of that trajectory.

5. What Remains

The frontier, as of this post, has exactly one sorry: the Frobenius witness for the Apéry ODE. Everything else — the foundational constants, Catalan, the large-population-protocol compilation theorem, the Fermi-route $\zeta(3)$ — is axiom-free modulo Mathlib’s foundational axioms ($\text{propext}$, $\text{Classical.choice}$, $\text{Quot.sound}$).

Beyond that, two directions suggest themselves:

Frobenius for Mathlib. The regular-singular-point theory of ODEs is a general-interest piece of analysis. Formalizing it as a Mathlib contribution would close not only the Apéry witness but also a wide class of analog-computability arguments that currently route around Mathlib gaps.
A second axis. The series route for $\zeta(3)$ — encoding Apéry’s generating function directly as a PIVP rather than going through the Fermi integral — runs into a genuine obstruction at the origin (the indicial equation has non-trivial roots; the Frobenius expansion introduces logarithmic terms that are not polynomials). This obstruction is universal across holonomic generating functions, not specific to $\zeta(3)$, and understanding it is one of the open research directions the project has exposed.

6. Code

Ripple is hosted at github.com/zinan-huang/Ripple. The repository builds with lake build on a standard Mathlib toolchain (10–20 minutes for a cold build). Continuous integration runs on every push; documentation is auto-generated via docgen.

The repository is Apache-2.0 licensed, matching Mathlib. Contributions and issues are welcome, particularly on the Frobenius direction and on the large-population-protocol compilation bounds.

蓋將自其變者而觀之，則天地曾不能以一瞬；自其不變者而觀之，則物與我皆無盡也。

Looked at from the side of change, heaven and earth cannot hold still for an instant. Looked at from the side of what does not change, everything and oneself have no ending.

— 蘇軾《赤壁賦》 · Su Shi, Ode on the Red Cliff