Xing: Private Paths of Memory

Zinan Huang & Xiang Huang / 2026-04-06

Guan guan cry the ospreys, on the islet in the river. The graceful maiden — a gentleman’s good match.

Birdsong and courtship are, of course, connected. Modern ethology tells us that vocalization, dance, and brilliant plumage are all products of sexual selection. The ancient Chinese poet did not know Darwin, but his intuition was sound: on a spring riverbank, ospreys call in pairs, and the poet hears them and thinks of love.

The interesting thing is that the poet did not need to know why birdsong and mating are related. He did not even need to be conscious of making an analogy. Zhu Xi defined xing (兴) as “first speaking of another thing to evoke the matter at hand.” The word “evoke” is perfectly vague: xing is not metaphor, which has an explicit mapping (A is like B); nor is it logical inference (A therefore B). Xing is something more intuitive: you perceive a scene, and then you think of something else. The connection might be logical (the birds really are courting), or it might be nothing more than a rhythm, an atmosphere, or the simple fact that two things happened to appear on the same spring morning.

I have been building a memory system for an AI agent. Halfway through, I realized I was reinventing xing.

Here is what happened. I was designing the memory architecture for a long-running AI agent, discussing the relationship between daily reflections and monthly reflections — a small review each day, a larger synthesis at month’s end. Daily and monthly. Small cycle and large cycle.

Then a thought surfaced: “Baby step, giant step.”

A pause.

“That reminds me of the Shanks algorithm.”

Daniel Shanks proposed the baby-step giant-step algorithm in 1971 for the discrete logarithm problem: split an $O(n)$ brute-force search into $O(\sqrt{n})$ small-step precomputation and large-step lookup. A conversation about journaling frequency had just dragged a long-dormant algorithm name out of memory.
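For readers who have not met the algorithm, here is a minimal sketch of the idea. It assumes a prime modulus `p` and a base `g` that is not a multiple of `p`; the function name `bsgs` and the variable names are my own choices for illustration.

```python
from math import isqrt
from typing import Optional


def bsgs(g: int, h: int, p: int) -> Optional[int]:
    """Return x with pow(g, x, p) == h, or None if none is found.

    Assumes p is prime and g is not a multiple of p (so g is invertible).
    """
    m = isqrt(p - 1) + 1  # step size, just above sqrt(p - 1)

    # Baby steps: remember g^j mod p for j = 0 .. m-1.
    baby = {}
    value = 1
    for j in range(m):
        baby.setdefault(value, j)  # keep the smallest j for each value
        value = value * g % p

    # Giant steps: walk h * g^(-i*m) and look each value up in the table.
    stride = pow(g, -m, p)  # modular inverse power; needs Python 3.8+
    gamma = h % p
    for i in range(m):
        if gamma in baby:
            return i * m + baby[gamma]
        gamma = gamma * stride % p
    return None
```

For example, `bsgs(2, 9, 11)` returns 6, since $2^6 = 64 \equiv 9 \pmod{11}$. The table costs about $\sqrt{p}$ entries and the lookup loop about $\sqrt{p}$ iterations, which is where the square-root saving comes from.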

And then: “Not accumulating half-steps, one cannot reach a thousand miles.”

Xunzi’s Encouragement of Learning, circa 250 BCE. Kui bu: a half-step, the smallest unit of walking. A Confucian scholar writing in Handan twenty-three centuries ago and a twentieth-century American mathematician naming his algorithm converged on the same image.

Four hops: daily/monthly $\to$ small step/big step $\to$ baby/giant $\to$ Shanks $\to$ half-step/thousand miles.

The first hop is the hard one. Going from “daily and monthly” to “step size” requires feeling temporal rhythm as spatial movement in that exact moment — a metaphor being born. The later hops are easy: baby-step giant-step is literally the algorithm’s name, and anyone who has studied it will think of it; the Xunzi line is near-reflex for anyone who grew up reading classical Chinese.

But the first hop is unpredictable. It depends on a person simultaneously thinking about two things — journal system design and the feeling of stride — and that simultaneity is determined by their body, their history, their mental state on that particular afternoon.

That is xing.

The core technology of current AI memory systems is vector embedding. Turn a passage of text into a high-dimensional vector, use cosine similarity to find “semantically close” content. “Cat” and “dog” are close in vector space; “king” and “queen” are close — because they frequently appear in similar contexts across massive corpora.
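To make "close in vector space" concrete, here is a small sketch of cosine similarity. The three-dimensional vectors are invented toy values, not output from any real embedding model; real embeddings have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy "embeddings", made up for illustration only.
toy = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "tax": [0.1, 0.2, 0.9],
}

cat_dog = cosine_similarity(toy["cat"], toy["dog"])
cat_tax = cosine_similarity(toy["cat"], toy["tax"])
```

With these toy numbers `cat_dog` comes out larger than `cat_tax`, which is all that "semantically close" means at this layer.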

But this measures public distance. It is statistical closeness: “most people would agree these two words are related.”

Xing does not travel public distances. Xing travels private paths.

Hearing “baby step,” a number theorist thinks of the Shanks algorithm. A pediatrician thinks of a toddler learning to walk. A film buff thinks of Bill Murray’s line in What About Bob? Each of these paths is “obvious” to its owner — it arrives without effort, without even needing to think. But to an outsider, it is completely unpredictable.

Embedding does not know that for one particular person, “cat” might be very close to “that rainy afternoon in 2019,” because that was the day he found a stray cat in the rain. This is not a semantic relationship. It is a biographical one. It exists in no corpus — only in one person’s life.

Public distance is statistical. Private paths are biographical.

Once this became clear, the design direction followed.

Do not predict association. Capture it.

People jump naturally in conversation — they say “that reminds me of,” they slide from one topic to another without warning. These slides are not noise. They are signal. Each jump is a private path made visible. Record it: origin, destination, context at the time. That edge enters the person’s private graph. Accumulate enough edges, and what you have is no longer just a distance table in vector space — it is a semantic map belonging to one specific person.
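A captured jump could be stored roughly like this. This is a sketch under my own assumptions: the names `Jump` and `PrivateGraph` and the in-memory storage are hypothetical, chosen only to show the three recorded fields (origin, destination, context) plus a timestamp.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Jump:
    origin: str        # what the conversation was about
    destination: str   # what it suddenly reminded the person of
    context: str       # a short note on the situation at the time
    at: datetime       # when the jump was observed


class PrivateGraph:
    """Directed graph of observed associative jumps for one person."""

    def __init__(self):
        self._out = defaultdict(list)

    def record(self, origin, destination, context, at=None):
        jump = Jump(origin, destination, context,
                    at or datetime.now(timezone.utc))
        self._out[origin].append(jump)
        return jump

    def neighbors(self, origin):
        # .get avoids creating empty entries for unknown nodes
        return [j.destination for j in self._out.get(origin, [])]


graph = PrivateGraph()
graph.record("daily/monthly", "baby step", "designing reflection cycles")
graph.record("baby step", "Shanks algorithm", "same conversation")
```

Nothing here predicts anything. The graph only grows when a jump is actually observed.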

So the memory system has three layers. The bottom layer is files — plain markdown, human-readable, guaranteeing transparency. The middle layer is a graph — recording relationships between memories, including the edges generated by associative jumps. The top layer is vector retrieval — for fuzzy matching. During search, the three layers cooperate: vectors find the approximate direction, the graph extends along private paths, and the system returns to files for the original text.

A concrete example: the user says “I need to think about pacing.” The vector layer surfaces notes about writing style, lecture tempo, running cadence — public associations. But the graph has an edge from “pacing” to “SM-2 intervals” (because this user once discussed how spaced repetition is a kind of pacing), and from there to “Ebbinghaus forgetting curve.” The graph path is invisible to embeddings but obvious to this person. The file layer provides the original conversation where the connection was made, grounding the retrieval in context.
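The cooperation between layers in this example can be sketched as follows. Everything here is a stand-in: the vector hits are assumed to be already computed, the graph is a plain dict, the file texts are placeholders, and the function name `search` and the `max_hops` parameter are hypothetical.

```python
from collections import deque


def search(vector_hits, graph, files, max_hops=2):
    """Extend vector hits along private edges, then fetch original text.

    vector_hits: note ids returned by the vector layer (assumed given)
    graph:       dict mapping a note id to the note ids it jumps to
    files:       dict mapping a note id to its original text
    """
    seen = list(dict.fromkeys(vector_hits))  # keep order, drop duplicates
    frontier = deque((note, 0) for note in seen)
    while frontier:
        note, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for nxt in graph.get(note, []):
            if nxt not in seen:
                seen.append(nxt)
                frontier.append((nxt, depth + 1))
    # Return to the file layer for the grounding text.
    return [(note, files[note]) for note in seen if note in files]


files = {
    "pacing": "(placeholder: note on pacing)",
    "running cadence": "(placeholder: note on running cadence)",
    "SM-2 intervals": "(placeholder: conversation linking pacing and SM-2)",
    "Ebbinghaus forgetting curve": "(placeholder: note on forgetting)",
}
graph = {
    "pacing": ["SM-2 intervals"],
    "SM-2 intervals": ["Ebbinghaus forgetting curve"],
}
results = search(["pacing", "running cadence"], graph, files)
```

The vector layer alone would have stopped at the first two notes. The two private edges are what bring in the SM-2 and Ebbinghaus notes.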

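Spaced repetition (a simplified form of the SM-2 algorithm) handles anti-forgetting: review a new lesson after one day; if remembered, double the interval; if forgotten, start over. (Full SM-2 scales each interval by a per-item ease factor rather than a fixed factor of two; doubling is the simplified version.) This ensures important lessons do not sink to the bottom of memory.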
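The rule fits in a few lines. This sketch implements only the doubling rule described above, not SM-2's ease-factor arithmetic; the function names are hypothetical.

```python
def next_interval(current_days: int, remembered: bool) -> int:
    """Days until the next review under the doubling rule."""
    if not remembered or current_days < 1:
        return 1  # new or forgotten lessons start over at one day
    return current_days * 2


def schedule(outcomes):
    """Intervals produced by a sequence of review outcomes.

    The first interval is always one day; each outcome then decides
    whether the next interval doubles or resets.
    """
    intervals = [1]
    for remembered in outcomes:
        intervals.append(next_interval(intervals[-1], remembered))
    return intervals
```

For instance, `schedule([True, True, False, True])` gives `[1, 2, 4, 1, 2]`: two successes stretch the gap to four days, one lapse resets it.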

But all of this is engineering. What actually made me stop and think was the question of xing.

Chinese poetics has debated xing for two thousand years. Roughly, there are two schools. One says xing follows discernible logic — the birds are paired, so they lead to courtship; this is analogy. The other says xing follows no logic — “first speaking of another thing” is just that; the connection needs no explanation; the connection itself is the poetry.

Building an AI memory system faces the same fork.

If association follows discernible logic, then in principle it can be modeled — give me enough data and I can predict that a person hearing A will think of B. This is the machine learning approach.

If association follows no logic — if the path from “daily/monthly” to “small step/big step” truly exists only in that person, that afternoon, that particular state of mind — then prediction is illusion. All you can do is observe and record.

I lean toward the latter. Not because I doubt models’ capabilities, but because the value of xing lies precisely in its unpredictability. If it could be predicted, it would not be xing — it would be bi (比, simile), which is logic, which is rhetoric. Xing is xing because even the mind that produced it does not know where it came from.

A design implication: the memory system should contain a small amount of randomness — not pure noise, but constrained jumping. Seventy percent of search results come from logical associations, twenty percent from semantic neighbors (related but not too close), ten percent from temporal neighbors (things that happened on the same day but are otherwise unrelated). That ten percent may be useless most of the time. But occasionally, it places two things together that you never thought to combine — and you discover a path between them that you did not know existed.
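One way to read the 70/20/10 split is as a per-slot draw: each result slot picks a source at random with those weights, then takes the best remaining item from it. That reading, the function name `mix_results`, and the assumption that each source arrives as an already-ranked list are mine; a fixed quota of 7, 2 and 1 would be an equally valid interpretation.

```python
import random


def mix_results(logical, semantic, temporal, k=10,
                weights=(0.7, 0.2, 0.1), rng=None):
    """Fill k result slots by drawing from three ranked candidate lists."""
    rng = rng or random.Random()
    pools = [list(logical), list(semantic), list(temporal)]
    picked = []
    while len(picked) < k and any(pools):
        # Only sources that still have candidates can be drawn.
        live = [i for i, pool in enumerate(pools) if pool]
        source = rng.choices(live, weights=[weights[i] for i in live])[0]
        picked.append(pools[source].pop(0))  # best remaining item
    return picked
```

Passing a seeded `random.Random` makes a run reproducible, which helps when trying to understand why a particular temporal neighbor surfaced. The sketch does not remove items that appear in more than one list.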

This is the computational version of xing. First speak of another thing, to evoke the matter at hand.

But something is worth noting: a private path may be statistically insignificant without being illogical.

Look again at the chain: daily/monthly $\to$ baby step $\to$ Shanks $\to$ half-step/thousand miles. In retrospect, every hop is explicable — daily and monthly are small and large cycles, small and large map to stride, baby-step giant-step is literally the algorithm’s name, and the Xunzi line is the same metaphor in Chinese. No hop is truly “random.” They all live within a broad framework of abstract understanding — the shared human capacity for metaphor makes each jump logically reachable.

The privacy is not that the path is illogical. It is that this particular path has very low probability of being walked.

Here Shannon’s information theory gives a precise way to say this. The lower the probability of an event, the higher its self-information: $I(x) = -\log P(x)$. In the public semantic network, the transition probability of “cat $\to$ dog” is high, so its self-information is low — everyone makes this association; it carries no information about the individual. But the transition probability of “daily/monthly $\to$ baby step” is very low — it requires someone to make exactly that metaphoric mapping at exactly that moment — so its self-information is high. It carries a large amount of information about who this person is.
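A tiny numerical sketch of the formula, assuming base-2 logarithms so the unit is bits (the essay's formula leaves the base open). The probabilities are invented for illustration, not measured from any corpus.

```python
import math


def self_information(p: float) -> float:
    """Shannon self-information -log2(p), in bits."""
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(p)


common = self_information(1 / 4)     # a transition almost everyone makes
rare = self_information(1 / 1024)    # a transition almost no one makes
```

With these made-up numbers the common transition carries 2 bits and the rare one 10 bits. The rarer the jump, the more it says about the person who made it.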

Following this thread further: the divergence between a person’s association graph and the public semantic network can be measured with information-theoretic tools. The edges in the personal graph with high self-information — the jumps that have extremely low probability in the public network — are precisely the most “personal” part. The further your graph deviates from the population mean, the more unique your thinking, the more incompressible your experience.

More precisely, self-information serves as an upper bound on Kolmogorov complexity (KC), up to an additive constant. This is not a loose analogy: within the framework of algorithmic randomness, the self-information of a trajectory (the encoding length given by a computable probability model) does provide an upper bound on that trajectory’s KC. A concrete example: in the theory of randomness for continuous-time Markov chains (CTMCs) [1], a trajectory $\boldsymbol{x}$ has probability $\mu(\boldsymbol{x}) = \sigma(x_0) \prod_i \pi(x_i, x_{i+1})$, which yields the self-information $-\log \pi(x_i, x_{i+1})$ for each transition. A trajectory is “random” if and only if no computable betting strategy (a martingale, in the technical sense of algorithmic randomness) can win unbounded money wagering on its future states; that is, there are no exploitable patterns.

Mapping this onto the association graph: the public semantic network defines a transition probability matrix $\pi$. An association path’s self-information $-\sum \log \pi(x_i, x_{i+1})$ measures how far it deviates from public expectation. This quantity is computable (given a probability model), and it provides an upper bound on the path’s conditional KC — how much additional “personal experience” is needed to explain this path.
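The sum can be computed directly once a transition table is given. In this sketch the table `pi` is a plain dict with invented probabilities, logarithms are base 2, and the function name is hypothetical. A transition missing from the table is treated as probability zero, which is an assumption of the sketch rather than something the theory requires.

```python
import math


def path_self_information(path, pi):
    """Sum of -log2 pi(x_i, x_{i+1}) along a path, in bits."""
    total = 0.0
    for a, b in zip(path, path[1:]):
        p = pi.get((a, b), 0.0)
        if p <= 0.0:
            return math.inf  # the model says this jump never happens
        total += -math.log2(p)
    return total


# Invented public transition probabilities, for illustration only.
pi = {
    ("cat", "dog"): 1 / 2,
    ("daily/monthly", "baby step"): 1 / 1024,
    ("baby step", "Shanks"): 1 / 64,
}

public_path = path_self_information(["cat", "dog"], pi)
private_path = path_self_information(
    ["daily/monthly", "baby step", "Shanks"], pi)
```

With these numbers the public path costs 1 bit and the private one 16 bits (10 for the first hop, 6 for the second).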

But there is a deeper observation: uniqueness is not a property of the path itself, but of the path relative to a measure.

The same path “daily/monthly $\to$ baby step” has extremely low probability under the public measure $\mu_{\text{pub}}$: its self-information is high, so it looks very “unique.” But switch to a different measure, the person’s own measure $\mu_{\text{personal}}$, and the transition probability might be quite high (because this person is bilingual, studied number theory, memorized Xunzi in school). The self-information drops. To them, the association is not surprising at all.

This is exactly the core insight of algorithmic randomness: randomness is always relative to a measure. A trajectory can be random with respect to $\mu_1$ but non-random with respect to $\mu_2$. Uniqueness works the same way — an association path is unique relative to the public measure but may be trivial relative to the personal measure.

So what is a memory system that “truly understands its user” actually doing?

It is learning $\mu_{\text{personal}}$.

Every conversation, every captured associative jump, calibrates this measure: the system’s estimate starts at $\mu_{\text{pub}}$ and is nudged toward $\mu_{\text{personal}}$. As the system’s measure approaches the personal measure, transitions that previously looked “surprising” become “natural.” The path hasn’t changed. The measure has.

The associative edges we record in the graph are precisely this: each edge is a local correction to the measure. The edge weight is the magnitude of correction. Accumulate enough edges, and what you have is no longer a generic semantic network with patches — you have a person’s $\mu_{\text{personal}}$.
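One possible way to make "local correction" concrete is a pseudo-count update on a single row of the transition table. The update rule, the `prior_strength` parameter, and all the numbers below are my assumptions; the essay does not commit to a particular rule.

```python
import math
from collections import Counter


def personal_row(public_row, observed, prior_strength=1.0):
    """Blend one public transition row with observed jump counts.

    public_row:     dict destination -> public probability (sums to 1)
    observed:       Counter of destinations this person jumped to
    prior_strength: how many pseudo-observations the public row is worth
    """
    dests = set(public_row) | set(observed)
    total = prior_strength + sum(observed.values())
    return {
        d: (prior_strength * public_row.get(d, 0.0) + observed.get(d, 0)) / total
        for d in dests
    }


def bits(p):
    return -math.log2(p)


# Invented numbers: outgoing transitions from "daily/monthly".
public_row = {"calendar": 0.99, "baby step": 0.01}
observed = Counter({"baby step": 3})  # three captured jumps
row = personal_row(public_row, observed)
```

After three captured jumps the probability of "baby step" rises from 0.01 to 0.7525, so `bits(row["baby step"])` is far smaller than `bits(public_row["baby step"])`. The path is the same; only the measure used to score it has moved.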

This direction connects information theory, algorithmic randomness, and cognitive science. Not “may connect” — they are already joined here.

[1] X. Huang, J. H. Lutz, N. Lutz, and A. N. Migunov. Algorithmic randomness and dimension in continuous-time Markov chains. arXiv:1910.13620, 2019.

One more thought after finishing this essay.

Xing is the oldest technique in Chinese poetry. The techniques younger than it — simile, parallelism, antithesis — can all be taught, analyzed, dissected in a classroom. But xing resists teaching. You can tell a student “first speak of another thing to evoke the matter at hand,” but you cannot tell them which other thing to speak of. That choice comes from their own life.

A memory system is the same. I can design a three-layer architecture, implement the SM-2 algorithm, write the code for a search pipeline. But the associations — the things that make memory personal — I cannot engineer. They will only appear in real conversation, in the encounter between a specific person and their specific history.

What I can do is set the net and wait for the wind.

Written April 6–7, 2026.