The hybrid memory architecture

There’s a video circulating in the AI-builder corner of YouTube where Nate B Jones walks through a framing he picked up from Andrej Karpathy. The framing presents two camps. Camp A wants AI memory built as a wiki. Camp B (the OpenBrain camp) wants it built as a database. The video presents the choice as a fork. Builders are supposed to pick a side.

I watched it twice. The video is good. The framing is not.

Most production AI memory systems converge on hybrid. Not because hybrid is a compromise position, but because each storage shape solves a different problem, and a real personal-AI system has all of those problems at once. The interesting question was never wiki or database. The interesting question is how the layers compose at retrieval time.

I want to walk through that, because Bishop (my personal AI system) is already structurally hybrid, and looking at the actual layers makes the point cleaner than any abstract argument about storage paradigms.

i. The false dichotomy

The wiki-versus-database framing assumes a single substrate has to carry the whole memory load. It’s the same shape as old debates about SQL versus NoSQL, monolith versus microservices, REST versus GraphQL. The framing forces a choice, the choice generates content, the content generates more framing. The cycle is good for video views and bad for builders.

The actual problem is that AI memory has multiple distinct shapes living inside it. Atomic facts. Synthesized understanding. Audit trails. Narrative continuity. Time-keyed activity. Versioned snapshots. Each shape wants a different storage substrate, and no single substrate is good at all six.

A wiki is excellent for synthesis. It is bad for audit trails (you can’t query “what got decided on May 11” cleanly across thirty essay-shaped notes). A database is excellent for atomic facts and audit queries. It is bad for narrative (you can’t read a row and feel the texture of a session). A git repo is excellent for versioning. It is bad for fast retrieval (you don’t grep five hundred commits to find what’s open this week).

The dichotomy collapses the moment you write down what you actually need the memory to do. You need the synthesis layer AND the query layer AND the audit layer AND the narrative layer AND the timeline AND the versioned snapshots. The substrate question stops being which one and starts being which layer carries which load, and how they compose when the agent asks.

ii. What hybrid actually means

Hybrid is not “we use a wiki and also a database.” Hybrid is a deliberate architecture where each storage substrate is chosen for the shape of the data it holds, and the layers are connected by an explicit composition pattern at retrieval time.

The substrate is the easy part. Most builders can list the substrates. The harder part is the composition. The agent has to know which layer to hit for which kind of question, and the layers have to point at each other cleanly when one of them needs another.

In a well-built hybrid system, the layers are not parallel. They are a stack. The cheap layers sit on top. The expensive layers sit underneath. The agent walks down only as far as the current question pulls it. The index points at the synthesis. The synthesis points at the atoms. The audit log points at the narrative. The timeline points at the snapshot. Each layer carries its own job, and the composition is what gives the agent depth without paying for depth on every turn.

The dichotomy version assumes a flat retrieval. Read the wiki, or read the database. Hybrid assumes hierarchical retrieval. Read the index, decide what layer the question lives in, walk down only that path.

iii. Six layers in one personal system

Bishop’s memory architecture has six distinct storage shapes in production right now. None of them are theoretical. All of them got built because a specific failure mode in a simpler version forced the next layer into existence.

Layer one: atomic memory rows. These live in a memory/ directory as small markdown files. One observation per file. Each carries a stable filename, a one-line description, and a short body. The shape is database-flavored even though the substrate is filesystem. The agent reads an index file (MEMORY.md) that lists every atomic row by one-line description. When a question needs a specific atom, the agent grabs that file and nothing else.
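
A minimal sketch of the index-then-atom lookup, in Python. The index line format (`- <filename>: <description>`) and the helper names `parse_index`, `find_atoms`, and `load_atom` are hypothetical, not Bishop’s actual MEMORY.md layout, and keyword matching stands in for whatever judgment the agent applies when it reads the index.

```python
from pathlib import Path

# Hypothetical MEMORY.md format (an assumption, not Bishop's real layout):
# one line per atomic row, "- <filename>: <one-line description>".


def parse_index(index_text: str) -> dict[str, str]:
    """Map each atomic row's filename to its one-line description."""
    entries: dict[str, str] = {}
    for raw in index_text.splitlines():
        line = raw.strip()
        if not line.startswith("- ") or ":" not in line:
            continue  # skip headings, blank lines, and free prose
        name, desc = line[2:].split(":", 1)
        entries[name.strip()] = desc.strip()
    return entries


def find_atoms(entries: dict[str, str], keyword: str) -> list[str]:
    """Choose atoms from their descriptions alone; no row bodies are read here."""
    kw = keyword.lower()
    return [name for name, desc in entries.items() if kw in desc.lower()]


def load_atom(memory_dir: Path, name: str) -> str:
    """Read exactly one atomic row from the memory/ directory."""
    return (memory_dir / name).read_text(encoding="utf-8")
```

Only `load_atom` touches a row body, and only for the file the index pointed at.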

Layer two: wiki synthesis docs. These live in a wiki/ directory and carry the synthesized understanding of a topic across many atomic rows. A wiki doc on voice architecture compresses fifteen feedback memories into one essay-shaped explanation of how the system works. The wiki layer is loaded on demand only, never pre-emptively. When the agent needs the shape of a topic (rather than a specific atom), it grabs the synthesis.

Layer three: decisions log. This lives in a single append-only decisions.md file. Every architectural decision is numbered, dated, and committed in chronological order. The shape is audit-trail. The query is what got decided when, and why. Decisions reference atomic memory rows when relevant. Atomic memory rows reference decisions when the decision is what spawned the row. The cross-reference is the composition seam.
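
The append-only discipline is easy to enforce in code. This sketch assumes a header format of `## D<n> | <YYYY-MM-DD> | <title>` followed by a `Refs:` line naming the atomic rows a decision touches; the format and the function names are hypothetical. The `decided_on` query answers the audit question from section i (“what got decided on May 11”) directly.

```python
import re
from datetime import date
from pathlib import Path

# Hypothetical entry header for decisions.md: "## D<n> | <YYYY-MM-DD> | <title>"
HEADER = re.compile(r"^## D(\d+) \| (\d{4}-\d{2}-\d{2}) \| (.+)$", re.MULTILINE)


def next_number(log_text: str) -> int:
    """Decisions are numbered in order; the next one is max + 1."""
    return max((int(m.group(1)) for m in HEADER.finditer(log_text)), default=0) + 1


def new_entry(log_text: str, day: date, title: str, why: str, refs: list[str]) -> str:
    """Format the next entry; refs are atomic-row filenames (the composition seam)."""
    ref_line = "Refs: " + (", ".join(refs) if refs else "none")
    return f"\n## D{next_number(log_text)} | {day.isoformat()} | {title}\n{why}\n{ref_line}\n"


def append_decision(path: Path, day: date, title: str, why: str, refs: list[str]) -> str:
    """Append only: earlier entries are never rewritten."""
    existing = path.read_text(encoding="utf-8") if path.exists() else ""
    entry = new_entry(existing, day, title, why, refs)
    with path.open("a", encoding="utf-8") as f:
        f.write(entry)
    return entry


def decided_on(log_text: str, day: date) -> list[str]:
    """Audit query: titles of every decision dated `day`, in log order."""
    return [m.group(3) for m in HEADER.finditer(log_text) if m.group(2) == day.isoformat()]
```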

Layer four: session handoffs. These live in a handoffs/ directory, one file per session, sequenced and dated. Each is a narrative artifact: what happened this session, what decisions landed, what’s open, where to pick up. The shape is narrative. You read a handoff to get the texture of a session, not to query a fact. The agent loads the most recent handoff on demand (a one-line hot-cache file flags whether an unresolved thread needs the load).
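
The hot-cache gate can be sketched in a few lines. The one-line format (`clear` or `open: <thread>`) and the handoff naming scheme (`<zero-padded seq>-<YYYY-MM-DD>.md`) are assumptions for illustration, as are the function names.

```python
from pathlib import Path
from typing import Optional

# Assumptions for illustration (not Bishop's actual files): the hot-cache is one
# line reading either "clear" or "open: <thread>", and handoffs are named
# "<zero-padded seq>-<YYYY-MM-DD>.md", so sorted order is session order.


def needs_handoff(hot_cache_line: str) -> bool:
    """True when the one-line hot-cache flags an unresolved thread."""
    return hot_cache_line.strip().lower().startswith("open")


def latest_handoff(names: list[str]) -> Optional[str]:
    """Most recent handoff filename, or None if there are none."""
    handoffs = sorted(n for n in names if n.endswith(".md"))
    return handoffs[-1] if handoffs else None


def session_context(hot_cache: Path, handoffs_dir: Path) -> str:
    """Load the latest handoff only when the hot-cache asks for it."""
    if not needs_handoff(hot_cache.read_text(encoding="utf-8")):
        return ""  # nothing open: the handoff stays on disk
    name = latest_handoff([p.name for p in handoffs_dir.iterdir()])
    return (handoffs_dir / name).read_text(encoding="utf-8") if name else ""
```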

Layer five: claude-mem activity timeline. This is a separate plugin that captures a fine-grained activity timeline across sessions: which files Bishop touched, which tools fired, which observations crossed threshold for promotion. The shape is time-series. The query is what happened on a given day, and in what order. The timeline is not curated. The synthesis layers feed off it but don’t replace it.
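
How claude-mem stores its timeline isn’t covered here, so this is not its schema. It’s a generic sketch of the same data shape using Python’s standard `sqlite3` module: a table of timestamped events and a query for one day, in order. The table, column, and function names are assumptions.

```python
import sqlite3
from datetime import date, timedelta

# Generic activity table, assumed for illustration; NOT claude-mem's schema.
SCHEMA = """
CREATE TABLE IF NOT EXISTS activity (
    ts     TEXT NOT NULL,  -- ISO-8601 timestamp, e.g. 2024-05-11T09:30:00
    kind   TEXT NOT NULL,  -- 'file', 'tool', 'observation', ...
    detail TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS activity_ts ON activity (ts);
"""


def open_timeline(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record(conn: sqlite3.Connection, ts: str, kind: str, detail: str) -> None:
    conn.execute("INSERT INTO activity (ts, kind, detail) VALUES (?, ?, ?)", (ts, kind, detail))
    conn.commit()


def day_in_order(conn: sqlite3.Connection, day: str) -> list[tuple[str, str, str]]:
    """What happened on `day` (YYYY-MM-DD), oldest first. ISO strings in one
    format and time zone sort chronologically, so a half-open range works."""
    end = (date.fromisoformat(day) + timedelta(days=1)).isoformat()
    rows = conn.execute(
        "SELECT ts, kind, detail FROM activity WHERE ts >= ? AND ts < ? ORDER BY ts",
        (day, end),
    )
    return rows.fetchall()
```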

Layer six: bishop git repo. A private GitHub repo that versions the identity / persistence / voice / capability / memory spine docs as drift insurance. The shape is versioned snapshot. The query is what did this file look like three weeks ago, before the change. The git repo lives outside the workspace; the sync happens at session-handoff time.
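
The three-weeks-ago question maps onto two stock git commands: `git rev-list -1 --before=<date> HEAD -- <path>` finds the last commit that touched the file before that date, and `git show <commit>:<path>` prints the file as of that commit. A small Python wrapper follows; the function names and the example spine-doc path are hypothetical, and the path in `<commit>:<path>` is relative to the repository root.

```python
import subprocess


def find_cmd(path: str, when: str = "3 weeks ago") -> list[str]:
    """Last commit on HEAD that touched `path` before `when` (git accepts
    relative dates such as "3 weeks ago")."""
    return ["git", "rev-list", "-1", f"--before={when}", "HEAD", "--", path]


def show_cmd(commit: str, path: str) -> list[str]:
    """The file's contents at `commit`; `path` is relative to the repo root."""
    return ["git", "show", f"{commit}:{path}"]


def file_as_of(repo: str, path: str, when: str = "3 weeks ago") -> str:
    """Return a spine doc as it looked at `when`, read from the repo's history."""
    commit = subprocess.run(
        find_cmd(path, when), cwd=repo, capture_output=True, text=True, check=True
    ).stdout.strip()
    if not commit:
        raise LookupError(f"no commit touched {path} before {when!r}")
    return subprocess.run(
        show_cmd(commit, path), cwd=repo, capture_output=True, text=True, check=True
    ).stdout
```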

Six layers. Six substrates. Six distinct shapes of data. The wiki and the database don’t even meaningfully oppose each other in this taxonomy. They’re layers two and one. Both load-bearing. Both pointed at by the others.

The thesis: the interesting question is not which one wins. It’s how the layers compose at retrieval.

iv. How the layers compose at retrieval

The substrates would be inert without composition. The composition is what makes it a system rather than six unconnected stores.

The composition rule is hierarchical retrieval, but the hierarchy isn’t a strict tree. It’s a directed graph. The index is the entry point. The index lists every atomic row, points at every wiki doc, and names every decisions-log entry above a numeric threshold. From the index, the agent picks a path based on the shape of the question.

A question about a specific fact? Index, then the atomic row. One file. Cheapest path.

A question about how a topic works as a system? Index, then the wiki synthesis, then optionally one or two atomic rows the synthesis flagged as load-bearing.

A question about why something got decided? Index, then the decisions log, then the atomic row referenced inside the decision.

A question about how a session went? Hot-cache, then the relevant handoff, then the activity timeline if more depth is needed.

A question about what changed over time? Git history on the relevant spine doc.
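
The five paths above fit in a lookup table. In this sketch the question’s shape is assumed to be classified upstream (by the agent itself). The `Shape` names and layer labels are illustrative, and a `?` prefix marks the optional steps the list describes (“optionally one or two atomic rows”, “if more depth is needed”).

```python
from enum import Enum


class Shape(Enum):
    """Question shapes from the list above (classification happens upstream)."""
    FACT = "fact"        # a specific fact
    SYSTEM = "system"    # how a topic works as a system
    WHY = "why"          # why something got decided
    SESSION = "session"  # how a session went
    CHANGE = "change"    # what changed over time


# Each path is an ordered walk down the stack; "?" marks an optional step.
PATHS: dict[Shape, list[str]] = {
    Shape.FACT: ["index", "atomic_row"],
    Shape.SYSTEM: ["index", "wiki_synthesis", "?atomic_row"],
    Shape.WHY: ["index", "decisions_log", "atomic_row"],
    Shape.SESSION: ["hot_cache", "handoff", "?activity_timeline"],
    Shape.CHANGE: ["git_history"],
}


def plan(shape: Shape, need_depth: bool = False) -> list[str]:
    """Layers to load this turn; every other layer stays on disk."""
    steps = []
    for step in PATHS[shape]:
        if step.startswith("?"):
            if need_depth:
                steps.append(step[1:])  # optional step, taken only when asked
        else:
            steps.append(step)
    return steps
```

Nothing in `plan` depends on how many rows, wiki docs, or handoffs exist. The index itself still grows by one line per row, which is why the per-turn cost is roughly flat rather than exactly flat.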

The agent doesn’t load all six layers per turn. It loads the index plus whatever path the current question pulled in. The other layers stay on disk. This is what makes the system scale. The vault can grow to thousands of atomic rows and dozens of wiki docs and a year of handoffs and a daily activity timeline, and the per-turn ingestion cost stays roughly flat. The depth lives in the vault. The cost lives in the path the question forces.

The seam between layers matters more than the layers themselves. An atomic row references the decision that spawned it. A decision references the atomic rows it crystallized from. A wiki synthesis references its source atomic rows. A handoff references the decisions made that session. The graph is what lets the agent walk between layers without pre-loading any of them. The cross-references are the composition.
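
The cross-reference graph can be walked lazily. This sketch assumes each document declares the ids it references (ids like `decision:D17` are hypothetical) and does a breadth-first walk capped at a hop limit, so only the nodes it reaches would ever be read from disk.

```python
from collections import deque


def walk(refs: dict[str, list[str]], start: str, max_hops: int) -> list[str]:
    """Breadth-first walk along cross-references, at most `max_hops` edges from
    `start`. Returns ids in visit order; only these would be loaded."""
    seen = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # hop budget spent: don't follow this node's references
        for nxt in refs.get(node, []):
            if nxt not in seen:  # the graph has cycles (row <-> decision)
                seen.add(nxt)
                order.append(nxt)
                queue.append((nxt, hops + 1))
    return order


if __name__ == "__main__":
    refs = {
        "handoff:0042": ["decision:D17"],           # handoff -> decisions made that session
        "decision:D17": ["row:wiki-on-demand"],     # decision -> rows it crystallized from
        "row:wiki-on-demand": ["decision:D17"],     # row -> decision that spawned it
        "wiki:voice": ["row:voice-pacing", "row:wiki-on-demand"],  # synthesis -> sources
    }
    print(walk(refs, "handoff:0042", max_hops=2))
    # ['handoff:0042', 'decision:D17', 'row:wiki-on-demand']
```

The wiki doc never loads in that walk, because nothing on the path points at it.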

v. What to actually optimize for

Builders sometimes ask which storage substrate is best. The question is a tell. It signals that the asker is still inside the dichotomy. The better question is which seam matters most for the kind of system you’re building.

Optimize the index first. The index is the entry point, and every other layer’s value depends on the index pointing cleanly. A bad index turns the rest of the architecture into a bag of disconnected stores.

Optimize the cross-references next. If your atomic rows don’t point at the decisions that spawned them, you’ve built two parallel systems. If your wiki syntheses don’t point at the rows they’re built from, the syntheses become unverifiable claims. The cross-references are what turn separate stores into a graph.
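
The cross-reference rules can be checked mechanically. A sketch, assuming every document appears as a key in a `refs` mapping of id to referenced ids (even with an empty list): it flags references to ids that don’t exist and wiki syntheses that cite no atomic rows. The id prefixes and the function name are hypothetical.

```python
def audit_refs(refs: dict[str, list[str]]) -> dict[str, list[str]]:
    """Report broken seams in a cross-reference map of id -> referenced ids.

    Every document is assumed to appear as a key, even with an empty list.
    """
    known = set(refs)
    dangling = sorted(
        f"{src} -> {dst}"
        for src, dsts in refs.items()
        for dst in dsts
        if dst not in known
    )
    # A wiki synthesis that cites no atomic row has nothing to be verified against.
    unsourced = sorted(
        src
        for src, dsts in refs.items()
        if src.startswith("wiki:") and not any(d.startswith("row:") for d in dsts)
    )
    return {"dangling": dangling, "unsourced_wiki": unsourced}
```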

Optimize the on-demand loading after that. Most personal-AI systems waste tokens pre-loading layers the current turn doesn’t need. Hybrid only pays off if the cheap layers (index, hot-cache) sit at the top and the expensive layers (full wiki, full handoffs, full timeline) stay on disk until summoned.

Substrate choice is downstream of all three. Filesystem markdown is fine for most personal-AI layers. SQLite is fine for the activity timeline and any audit trail with structured query needs. Git is the right substrate for versioned spine docs. The substrates are largely interchangeable once the layers and seams are right.

Substrate is the easy part. The composition is what makes it a system.

vi. The portable shape

This shape isn’t Bishop-specific. It generalizes.

Any personal-AI system that runs for more than a few weeks accumulates the six data shapes whether the builder names them or not. Atoms. Syntheses. Decisions. Narratives. Timelines. Snapshots. The question isn’t whether your system has all six. The question is whether the layers are deliberate or whether they got smeared together inside one substrate because the dichotomy framing said you had to pick.

If you’re picking, you’re losing depth somewhere. If you’ve smeared a synthesis-shaped doc into a database row, the synthesis becomes unreadable. If you’ve smeared an audit-shaped decision into a wiki note, the audit becomes unqueryable. The substrate dictates the shape, and the wrong substrate flattens the data.

Hybrid is what gives you the shape per layer. Composition is what makes the shape useful. Cross-references are what turn the layers into one system rather than six.

The wiki-versus-database framing is the kind of question you ask before you’ve built one of these. After you’ve built one, you stop asking which side wins. You start asking how the layers point at each other when the agent comes looking. That’s the architecture. That’s the work.

Drafted with Bishop, my AI partner.
Words picked, edited, and approved by me. Model provenance: Claude Code (Claude Opus and Sonnet)