Areas of inquiry · measured at N = 10⁶

Structured memory

Memory that scales with structure, not with context length.

Analogy, counterfactual, compositional binding — measured at the million-entry mark.

binding: g ⊗ e → p

grounding (g) ⊗ entity (e) → bound percept (p) · the unit of recall

What we mean by structured memory

We represent knowledge as bound structures — products of grounding and entity vectors — rather than flat token streams. Concepts compose, decompose, and recombine without being re-derived from context every time. Multi-hop chains stay tractable as collections grow because retrieval is a walk over structure, not a search over a buffer.
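A minimal sketch of the binding step, assuming that "product" means the outer (tensor) product of a grounding vector and an entity vector. The names `bind`, `unbind`, and `random_unit_vector` are hypothetical, and the dimension of 8 is arbitrary. It only illustrates that a bound percept can be decomposed again once one of its factors is known.

```python
import math
import random

Vector = list[float]
Matrix = list[list[float]]


def random_unit_vector(dim: int, rng: random.Random) -> Vector:
    """Draw a random direction and scale it to unit length."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def bind(grounding: Vector, entity: Vector) -> Matrix:
    """Outer product: percept[i][j] = grounding[i] * entity[j]."""
    return [[g * e for e in entity] for g in grounding]


def unbind(percept: Matrix, grounding: Vector) -> Vector:
    """Contract the percept with the grounding to recover the entity.

    For a unit-length grounding, sum_i g[i] * g[i] * e[j] = e[j],
    so the entity comes back up to floating-point error.
    """
    rows = len(grounding)
    cols = len(percept[0])
    return [
        sum(grounding[i] * percept[i][j] for i in range(rows))
        for j in range(cols)
    ]


rng = random.Random(0)           # fixed seed so the run is repeatable
g = random_unit_vector(8, rng)   # assumed grounding vector
e = random_unit_vector(8, rng)   # assumed entity vector
p = bind(g, e)                   # bound percept, an 8 x 8 matrix
recovered = unbind(p, g)
max_error = max(abs(a - b) for a, b in zip(recovered, e))
print(f"max reconstruction error: {max_error:.2e}")
```

The design point is that composition and decomposition are both cheap local operations on the structure itself, so nothing has to be re-derived from surrounding context.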

What the binding gives us

Three behaviours we get for free

M1

Analogy

Bound structures map cleanly across domains, so analogies are first-class retrievals.

M2

Counterfactual

The same machinery that retrieves an analogy can re-role and re-run the binding to evaluate a counterfactual.

M3

Multi-hop recall

Chains of two, five, or ten hops are walks over structure. Cost grows with structure, not with token length.
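To make "a walk over structure" concrete, here is a toy sketch in which the memory is a plain dictionary keyed by (entity, relation). The store contents, the relation names, and the `walk` helper are all invented for illustration; the actual memory is vector-based, not a dictionary. The point is only that each hop is one lookup, so a chain costs as many lookups as it has hops.

```python
from typing import Optional

# Hypothetical relational store: (entity, relation) -> entity.
Store = dict[tuple[str, str], str]


def walk(store: Store, start: str, relations: list[str]) -> Optional[str]:
    """Follow one stored link per hop; return None if a link is missing."""
    current = start
    for relation in relations:
        nxt = store.get((current, relation))
        if nxt is None:
            return None
        current = nxt
    return current


store: Store = {
    ("alice", "works_at"): "acme",
    ("acme", "located_in"): "berlin",
    ("berlin", "country"): "germany",
}

# Three hops, three lookups, regardless of how many entries the store holds.
answer = walk(store, "alice", ["works_at", "located_in", "country"])
print(answer)  # germany
```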

Scaling

P@5 stays at 1.00 as N goes to one million

We measure precision-at-5 (P@5) on a held-out multi-hop retrieval task. The structured-memory line stays at 1.00 at every measured collection size (N = 50k, 200k, and 1M); the byte-level baseline drifts down from 0.94 at N = 50k to 0.71 at N = 1M.
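For readers unfamiliar with the metric, the following sketch uses the standard definition of precision-at-k: the fraction of the top k retrieved items that are relevant. The function name and the document IDs are hypothetical, and this is not the evaluation harness behind the numbers above.

```python
from collections.abc import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 5) -> float:
    """Fraction of the top-k ranked items that appear in the relevant set."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k


relevant = {"d1", "d3", "d4", "d7", "d9"}

# All five top results are relevant, so P@5 is 1.0.
perfect = precision_at_k(["d3", "d7", "d1", "d9", "d4", "d2"], relevant)

# One of the top five ("d2") is not relevant, so P@5 is 0.8.
one_miss = precision_at_k(["d3", "d2", "d1", "d9", "d4", "d7"], relevant)

print(perfect, one_miss)  # 1.0 0.8
```

A P@5 of 1.00 therefore means that every one of the five highest-ranked results was a correct answer.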

P@5 by collection size

N        structured memory   byte-level baseline (reference)
50k      1.00                0.94
200k     1.00                0.86
1M       1.00                0.71

P@5 1.00

Multi-hop precision at one million entries

Held-out evaluation, no leakage between training and retrieval.

+0.36

Slot-factored over byte-level on relational binding

Held-out role-swap test, multi-seed.

8 / 8

Scientific gates closed in v0.3.0

Every release blocker was a measured test.

Slot vs byte

Role-swap is where structured binding earns its place

“Agent A names target B” is not the same proposition as “agent B names target A”. Slot-factored binding holds the role assignment; byte-level retrieval flattens it.
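The role-swap problem can be shown with a toy contrast. In this sketch, `slot_encode` keeps each filler attached to its role, while `bag_of_bytes` is a deliberately crude, order-free byte histogram standing in for a flattened encoding. Both function names are hypothetical, and the byte-level baseline measured here is not claimed to be a bag of bytes; the histogram is an assumption chosen to make the failure mode visible.

```python
from collections import Counter


def slot_encode(agent: str, target: str) -> tuple[tuple[str, str], ...]:
    """Keep each filler attached to its role."""
    return (("agent", agent), ("target", target))


def bag_of_bytes(statement: str) -> Counter:
    """Order-free byte histogram: a crude stand-in for a flattened encoding."""
    return Counter(statement.encode("utf-8"))


original = "agent A names target B"
swapped = "agent B names target A"

# The slot encoding changes when the roles are swapped.
slots_differ = slot_encode("A", "B") != slot_encode("B", "A")

# The two statements contain exactly the same bytes, so the histogram cannot
# tell them apart.
bytes_differ = bag_of_bytes(original) != bag_of_bytes(swapped)

print(slots_differ, bytes_differ)  # True False
```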

statement                             slot-factored   byte-level   Δ
"agent A names target B"              1.00            0.63         +0.37
"agent B names target A" (swapped)    1.00            0.60         +0.40
compositional new pair                0.99            0.69         +0.30

Role-swap test, held-out, n = 5 seeds. Multi-seed report in [[beyond-transformers]].
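As a quick arithmetic check, the per-row Δ values can be recomputed from the two score columns. Reading the +0.36 headline figure as the mean of the three row deltas is our inference; the page does not state how that figure was derived.

```python
# Scores copied from the role-swap table: (slot-factored, byte-level).
rows = {
    '"agent A names target B"': (1.00, 0.63),
    '"agent B names target A" (swapped)': (1.00, 0.60),
    "compositional new pair": (0.99, 0.69),
}

# Per-row lift, rounded to two decimals to match the table.
deltas = {
    name: round(slot_factored - byte_level, 2)
    for name, (slot_factored, byte_level) in rows.items()
}

# Assumption: the headline lift is the unweighted mean of the three rows.
mean_delta = sum(deltas.values()) / len(deltas)

for name, delta in deltas.items():
    print(f"{name}: {delta:+.2f}")
print(f"mean lift: {mean_delta:+.2f}")  # mean lift: +0.36
```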

01

Scales with structure

Memory cost grows with the relational structure of what is stored, not with the length of the context window. That makes multi-hop reasoning tractable as collections grow into the millions of entries.

02

How we evaluate it

Every claim is measured with held-out tests and multi-seed error bars. Where it matters, we run the same task in a slot-factored memory and in a byte-level baseline so the lift is attributable, not assumed.
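A sketch of how a multi-seed error bar can be computed, assuming the bar is the standard error of the mean across seeds (the page does not say which statistic is used). The helper name is hypothetical and the per-seed scores below are placeholders invented for illustration, not measured results.

```python
import math
import statistics


def mean_with_stderr(scores: list[float]) -> tuple[float, float]:
    """Return the mean and the standard error of the mean across seeds."""
    if len(scores) < 2:
        raise ValueError("need at least two seeds to estimate spread")
    mean = statistics.mean(scores)
    # Sample standard deviation divided by sqrt(number of seeds).
    stderr = statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, stderr


# Placeholder per-seed scores, invented for illustration only.
placeholder_scores = [0.90, 0.92, 0.88, 0.91, 0.89]

mean, stderr = mean_with_stderr(placeholder_scores)
print(f"{mean:.3f} +/- {stderr:.3f} over {len(placeholder_scores)} seeds")
```

Running the same helper on both memory variants, seed by seed, is what lets a reported lift be compared against its own uncertainty.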