infsup

by Zinan Huang 🌸

The set Trap in Lean 4: When Cached Builds Hide Real Errors

2026-05-27


A Lean 4 proof compiles for months. Then you run lake build from scratch, and nothing works. The culprit: set P := ComplexDef, a one-line tactic that creates a local definition the simplifier cannot see through. This post documents the problem, why cached builds mask it, and how to fix it.

The Symptom

You have a large formalization β€” say, 30,000 lines of population protocol analysis. It builds. You make an edit in one file. You run lake build. Lean recompiles the edited file against cached .olean files for its dependencies. Everything passes.

Months later, something forces a full rebuild: a Lean toolchain bump, a Mathlib update, or a colleague cloning the repo fresh. Suddenly, files that “always worked” produce cascading errors:

Type mismatch
  (transitionPEM_phase4_role_settled_or_resetting hsi hsj).left
has type
  (transitionPEM_phase4 n Rmax ...).1.role = Role.Settled ∨ ...

The lemma produces the right statement, but Lean cannot unify it with the goal. The goal mentions P.Ξ΄ where P is a local definition, and the lemma mentions transitionPEM_phase4 β€” the function that P.Ξ΄ ultimately calls through several layers of definition. These are definitionally equal. So why does Lean refuse?

The Cause: set Creates an Opaque Abbreviation

The set tactic in Lean 4 does two things:

  1. Introduces a local definition: let P := PEMProtocolCoupled ...
  2. Replaces all occurrences of PEMProtocolCoupled ... in the goal and context with P

After set, the goal contains P everywhere. When simp or rw tries to apply a lemma that mentions PEMProtocolCoupled, it needs to see that P = PEMProtocolCoupled .... In principle, P is definitionally transparent β€” Lean’s kernel can unfold it. But the pattern matcher used by simp and rw does not always unfold local set definitions when searching for matching subterms.

The result: lemmas that should apply do not fire. The proof breaks β€” not because the mathematics is wrong, but because the automation cannot see through a name it introduced.

Why Cached Builds Hide the Problem

When Lean compiles a file, it produces an .olean artifact containing the elaborated terms. Downstream files import the .olean, not the source. The .olean stores fully elaborated, kernel-checked terms where all local definitions have been resolved.

If file A.lean is compiled with a version of its dependencies where set P unfolding happened to succeed (perhaps due to a slightly different elaboration order, or a simpler context), the .olean for A records the successful proof term. Downstream files see only this resolved term. They never re-check the tactic proof.

When you lake build incrementally, only modified files are recompiled. Unmodified files use their cached .olean. The broken set P proof in A.lean is never re-executed β€” it lives in the .olean as an already-checked term.

A full rebuild forces every file to be re-elaborated from source. Now the set P proof must actually succeed during elaboration. If the pattern matcher cannot unfold P, the proof fails.

Three Fixes

1. Remove set, inline the definition

The most reliable fix: do not use set for definitions that appear in lemma applications. Write the full expression everywhere.

-- Before (breaks on full rebuild):
set P := PEMProtocolCoupled n Rmax Emax Dmax hn0
exact step_rank_preserved_of_InSswap hn0 hS w

-- After (works always):
exact step_rank_preserved_of_InSswap (Rmax := Rmax) hn0 hS w

This is verbose but unambiguous. Every lemma application directly names PEMProtocolCoupled, and the pattern matcher finds the match.

2. Bridge the definitional gap explicitly

When a lemma talks about an inner function (like transitionPEM_phase4) but the goal mentions the outer wrapper (P.Ξ΄), prove the connection as a separate have:

have h_bridge : (P.Ξ΄ (D i, D j)).1.role =
    (transitionPEM_phase4 n Rmax ((D i).1, (D j).1)
      (D i).2 (D j).2).1.role := by
  simp only [P, PEMProtocolCoupled, PEMProtocol, protocolPEM,
    transitionPEM, transitionPEM_prePhase4,
    hRD, hsi, hsj, and_self, ite_true,
    role_settled_ne_resetting, ...]
rw [h_bridge]
exact (the_lemma hsi hsj).1

3. Unfold in the transition proof directly

For goals about transition function fields (.role, .rank, .timer), skip the abstraction entirely and unfold the full transition:

unfold transitionPEM transitionPEM_phase4 transitionPEM_prePhase4
  phase4_swap phase4_decide phase4_propagate
simp only [hRD, hsi, hsj, ...]
by_cases hpar : n % 2 = 0
Β· simp only [hpar, if_true]
  split_ifs <;> simp_all
Β· simp only [hpar, if_false]
  split_ifs <;> simp_all

This is expensive (the simp_all after split_ifs may need high maxRecDepth and maxHeartbeats), but it does not depend on any lemma matching through set definitions.

Lessons

  1. lake build from scratch is the ground truth. Incremental builds with cached .olean files can hide elaboration failures. Run a clean build periodically, especially before submitting or sharing a project.

  2. set is convenient but fragile. Use it for goals where you only need rfl or calc reasoning. Avoid it when simp/rw must match the defined name against library lemmas.

  3. Lean 4 is not Lean 3 here. In Lean 3, set created a true local definition that tactics could unfold. In Lean 4, the behavior around simp matching is different. The definitional transparency is there for the kernel, but the tactic-level pattern matcher is more conservative.

  4. When a proof “always worked,” check the olean. If a file compiles only because its olean was cached, the proof is not reproducible. Treat “compiles from cache” as untested.

Mathematics is not the art of writing down long proofs. It is the art of making each step so small it cannot fail.

β€” adapted from Dijkstra

Zinan Huang, 2026-05-27