The Geometry Hiding in Algebraic Manipulations: A Manifold Perspective on CRN Computation

2026-04-16

When Huang and Huls proved that population protocols can simulate any GPAC — that chemical reaction networks compute the same set of real numbers as analog computers [1] — the proof was built on four stages of algebraic manipulation. Substitutions, rescalings, conservation tricks, self-products. Each step felt like “just algebra.” But when I started formalizing this proof in Lean 4, a geometric picture emerged that I think illuminates what the algebra is really doing.

The Setup

A Chemical Reaction Network (CRN) is a polynomial ODE system where every equation has the form

$$x_i' = p_i(x) - q_i(x) \cdot x_i, \quad p_i, q_i \geq 0.$$

The constraint — every negative term must contain $x_i$ as a factor — comes from chemistry: you can only destroy a species that’s present.

A Population Protocol (PP) is a more restricted CRN where reactions are bimolecular ($X + Y \to W + Z$), the system is conservative ($\sum x_i' = 0$), and, crucially, there are no “self-square” terms ($x_i^2$ cannot appear as a positive term in $x_i'$). These constraints mean the balance equation takes the elegant form

$$x_r' = f_r(x) - 2 x_r \cdot \sum_k x_k,$$

where $f_r$ is a quadratic form with non-negative coefficients.

The Main Theorem [1]: Every real number computable by a GPAC is also computable by a population protocol.

The proof has four stages. Each is an algebraic construction. Each, I’ll argue, is really a geometric operation on manifolds.

Stage 2: The Simplex as an Invariant Manifold

After Stage 1 (reducing to quadratic CRNs, which I won’t focus on here), Stage 2 transforms the system into a cubic form:

$$x_i' = \big(P_i(x) - Q_i(x) \cdot x_i\big) \cdot x_0, \quad x_0' = -\sum_{i \neq 0} x_i'.$$

This involves three tricks:

The $\lambda$-trick (constant dilation): scale variables so everything fits in $[0,1]$.
The one-trick: introduce a balancing variable $x_0$ with $\sum x_i = 1$.
The $g$-trick (balancing dilation): multiply every $x_i'$ by $x_0$.

In geometric terms, the $\lambda$-trick is a contraction map that pulls the trajectory into a compact region. The one-trick constrains dynamics to the probability simplex $\Delta = \{x \geq 0 : \sum x_i = 1\}$ — a $(d-1)$-dimensional manifold embedded in $\mathbb{R}^d$. The conservation law $\sum x_i' = 0$ is precisely the statement that $\Delta$ is an invariant manifold of the ODE.

The $g$-trick is subtler. Multiplying by $x_0$ serves two purposes: it makes every term cubic (homogeneous degree 3), and it introduces $x_0$ into every negative term, preventing self-cubes. In manifold terms, the $g$-trick is a time reparametrization that slows dynamics near the boundary of the simplex (where $x_0$ is small), effectively “cushioning” the flow.

Stage 3: The Self-Product Manifold and Vector Field Extension

This is where the geometric picture becomes essential. Given the cubic form system on $d$ variables, we construct a self-product:

$$z_{i,j}(t) = x_i(t) \cdot x_j(t), \quad (i,j) \in \{0, \ldots, d\}^2.$$

This defines a map $\varphi\colon \mathbb{R}^d \to \mathbb{R}^{d^2}$ sending $x \mapsto (x_i \cdot x_j)_{i,j}$. In algebraic geometry, this is a variant of the Veronese embedding [2] — the map that sends a point to all its degree-2 monomials.

The image of $\varphi$ is a $d$-dimensional algebraic subvariety $M \subset \mathbb{R}^{d^2}$, defined by the symmetry equations $z_{i,j} = z_{j,i}$ and the rank-1 constraint $z_{i,j} \cdot z_{k,l} = z_{i,k} \cdot z_{j,l}$. We call $M$ the self-product manifold.

The Problem: Degree Inflation

By the product rule, $z_{i,j}' = x_i' \cdot x_j + x_i \cdot x_j'$. Since each $x_k'$ is cubic, $z_{i,j}'$ is a sum of degree-4 terms in the original $x$-variables. Rewriting in $z$-variables via the row-sum $\sum_j z_{i,j} = x_i \cdot \sum_j x_j = x_i$ (using $\sum x_j = 1$), we get a vector field on $\mathbb{R}^{d^2}$ that is degree 4 in the $z$-variables.

But population protocols require quadratic fields! Degree 4 is too high.

The Solution: Manifold Agreement

Here is the key geometric insight. We construct a different vector field on $\mathbb{R}^{d^2}$ that is:

Quadratic in the $z$-variables (degree 2).
Equal to the degree-4 field on the manifold $M$.

The construction uses symbolic substitution: wherever the degree-4 expression contains $x_a \cdot x_b$, replace it with $z_{a,b}$. This is only valid on $M$ (where $z_{a,b}$ actually equals $x_a \cdot x_b$), but that’s all we need — because $M$ is an invariant manifold of the ODE.

If the trajectory starts on $M$ (i.e., $z_{i,j}(0) = x_i(0) \cdot x_j(0)$), it stays on $M$ for all time. On $M$, the two fields agree. So the ODE solutions are identical on $M$, even though the two fields disagree everywhere else in $\mathbb{R}^{d^2}$.

This is the geometric content of Theorem 15 in [1].

The Three Cases

The symbolic substitution yields three cases depending on the indices $(i,j)$:

Case 1 ($i, j \neq 0$): Both variables are “regular.” The field decomposes cleanly:

$$z_{i,j}' = \big(z_{0,j} \cdot P_i^z + z_{0,i} \cdot P_j^z\big) - z_{i,j} \cdot \big(x_0 Q_i^z + x_0 Q_j^z\big)$$

where $P_i^z$ and $x_0 Q_i^z$ are linear in $z$ (the symbolic lifts of the quadratic/linear $P_i$, $Q_i$).

Case 2 (one index is 0): The conservation variable $x_0$ is involved. The field picks up additional coupling terms from the conservation law.

Case 3 ($i = j = 0$): The purely conservation-driven case:

$$z_{0,0}' = 2 z_{0,0} \cdot \big(\mathrm{totalQxz} - \mathrm{totalPz}\big).$$

In each case, the negative terms factor through $z_{i,j}$ (CRN-implementability), and no $z_{i,j}^2$ appears as a positive term (no self-square).

A Subtlety Revealed by Formalization

The formalization process uncovered a subtlety in the original proof. The symbolic $z$-field is conservative on the self-product manifold $M$ (where $z_{i,j} = z_{j,i}$), but not for arbitrary $z$-inputs. For $d = 2$, the residual is

$$\sum_{i,j} F(z)_{i,j} = z_{0,0} \cdot (z_{0,1} - z_{1,0}) \cdot P_1^z,$$

which vanishes on $M$ (where $z_{0,1} = z_{1,0}$) but not in general. This doesn’t affect the computation — the trajectory never leaves $M$ — but it means the algebraic characterization of PP-implementability (which requires global conservation) needs to be applied with care. The manifold perspective clarifies exactly where the conservation lives and why it suffices.

What the Geometry Tells Us

The algebraic manipulations in the four-stage construction are not arbitrary. They implement a specific geometric program:

Stage	Algebraic Operation	Geometric Interpretation
1	Introduce $v$-variables for monomials	Veronese-type embedding (degree reduction)
2	$\lambda$-trick + one-trick + $g$-trick	Contraction to simplex + invariant manifold + time rescaling
3	Self-product + symbolic substitution	Veronese embedding + vector field extension from invariant submanifold
4	Product distribution $\alpha_{i,j,k,l} = c_k c_l / 4$	Discretization of the quadratic form into pairwise interactions

The deepest step is Stage 3: the passage from a degree-4 field (which is “true” but not PP) to a degree-2 field (which is PP but only “true” on the invariant manifold). The geometric framework makes clear why this works — the ODE solution never leaves the manifold, so the extension to all of $\mathbb{R}^{d^2}$ is irrelevant for computation.

Open Questions

This geometric perspective suggests several directions:

Generalization: The self-product manifold is one specific algebraic variety. Are there other varieties where a similar “degree reduction via field extension” works? Could this lead to more efficient CRN constructions?
Fiber bundle structure: The simplex $\Delta$, the self-product manifold $M$, and the non-negative orthant form a nested hierarchy of constraint manifolds. Is there a natural fiber bundle structure that organizes the relationships between the four stages?
Complexity interpretation: The $\lambda$-trick introduces a linear slowdown ($x_0$ acts as a clock). Can invariant manifold theory quantify the convergence rate loss more precisely?
Veronese connection: The self-product $x \mapsto (x_i x_j)$ is exactly the degree-2 Veronese embedding. Stage 1’s $v$-variables generalize this to higher degrees. Is the degree-complexity hierarchy in CRN computation related to the geometry of Veronese varieties?

References

[1] X. Huang and R. Huls. “Computing Real Numbers with Large-Population Protocols.” In Proc. 28th International Conference on DNA Computing and Molecular Programming (DNA 28), pp. 48–67, 2022. doi:10.1007/978-3-031-20624-5_4

[2] J. Harris. Algebraic Geometry: A First Course. Springer Graduate Texts in Mathematics 133, 1992. (For the Veronese embedding, see §2.)

[3] M. W. Hirsch, S. Smale, and R. L. Devaney. Differential Equations, Dynamical Systems, and an Introduction to Chaos. 3rd edition, Academic Press, 2013. (For invariant manifold theory.)

[4] X. Huang, T. H. Klinge, and J. I. Lathrop. “Real-Time Equivalence of Chemical Reaction Networks and Analog Computers.” In Proc. 25th International Conference on DNA Computing and Molecular Programming (DNA 25), pp. 37–53, 2019. doi:10.1007/978-3-030-26807-7_3

This perspective emerged during a Lean 4 formalization of the construction in [1]. Thanks to the formalization process for forcing every algebraic step to justify itself, and for revealing the geometry that the algebra was encoding all along.

形而上者谓之道，形而下者谓之器。 — 《易经·系辞上》

What lies above form is called the Way; what lies below form is called the vessel. — Yijing, Commentary on the Appended Phrases