How Agent Memory Works in UNIMATRIx
May 9, 2026
UNIMATRIx is a multi-agent social simulation in which LLM-driven characters talk, broadcast, propose votes, and slowly reshape a small society. Because context windows are finite and a long-running simulation can produce thousands of turns per agent, every character is backed by a layered memory system.
You can find the code on GitHub. The core implementation lives in src/unimatrix/memory/manager.py and src/unimatrix/memory/chroma_store.py. The configuration knobs are defined in MemoryConfig (src/unimatrix/config/models.py:48).
The four kinds of memory
The memory system has four layers:
- Short-term memory lives in Conversation.history (RAM), covers the current dialogue only, and returns the last N turns verbatim via short_term_turns (default 15).
- Medium-term memory is stored in memory_summaries (SQLite), spans the recent simulation, and retrieves the K most recent summaries via medium_term_summaries (default 20).
- Long-term memory resides in a ChromaDB vector collection, covers the whole run without decay, and uses semantic top-K retrieval via long_term_retrieval_k (default 3).
- Per-person memory lives in person_memories (SQLite), holds a single current sentence retrieved by direct pair lookup, and is refreshed according to person_impression_update_every_n_turns (default 5).
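To make the four knobs concrete, here is a minimal sketch of what the configuration could look like. The field names and defaults are the ones listed above; modelling it as a frozen dataclass is an assumption for illustration, and the real MemoryConfig in src/unimatrix/config/models.py may be built differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryConfig:
    """Illustrative stand-in, not the project's actual class."""

    # Verbatim turns of the open conversation (short-term layer).
    short_term_turns: int = 15
    # Most recent conversation summaries read from SQLite (medium-term layer).
    medium_term_summaries: int = 20
    # Semantic top-K pulled from the vector store (long-term layer).
    long_term_retrieval_k: int = 3
    # How often an observer's impression of another agent is refreshed.
    person_impression_update_every_n_turns: int = 5


cfg = MemoryConfig()                          # defaults from the article
custom = MemoryConfig(long_term_retrieval_k=5)  # override a single knob
```

Each knob only limits what is retrieved into the prompt; none of them deletes anything from storage.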
The "no forgetting" rule is enforced explicitly. Context size is controlled by retrieval, not by eviction. That separation is what lets the simulation run for hours without either drowning the LLM in tokens or losing the past.
1. Short-term memory
This is the simplest layer: while a conversation is open, every line is kept in Conversation.history, a list[tuple[str, str]] (speaker name and text) on the ConversationEngine. It's just an in-RAM slice of the current dialogue.
When the next speaker is asked to reply, the prompt builder formats the most recent cfg.memory.short_term_turns lines verbatim and injects them into the user message. Combined with the character identity block, the world block, and the medium/long-term layers, this gives the model a complete read of what's happening right now.
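As a rough sketch of that formatting step: the function name format_short_term and the "Name: text" line layout are assumptions made for illustration, and the real prompt builder may render turns differently.

```python
# (speaker name, text), the same shape as Conversation.history
History = list[tuple[str, str]]


def format_short_term(history: History, short_term_turns: int = 15) -> str:
    """Return the last `short_term_turns` lines verbatim, one per line."""
    if short_term_turns <= 0:
        # Guard: history[-0:] would return the whole list, not nothing.
        return ""
    recent = history[-short_term_turns:]
    return "\n".join(f"{speaker}: {text}" for speaker, text in recent)


history = [("Ada", f"line {i}") for i in range(20)]
block = format_short_term(history, short_term_turns=15)
```

With 20 lines in the history and a window of 15, the block starts at "Ada: line 5" and the first five lines are simply left out of the prompt. They stay in the history list.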
When the conversation closes, the entire history is summarized once and the verbatim version is dropped from RAM. It remains queryable via the messages table, but no agent ever rereads it directly.
2. Medium-term memory
When ConversationEngine.close() runs, it calls PromptBuilder.summary_messages to produce a 2–3 sentence first-person summary of what just happened. That summary is then written once per participant by MemoryManager.add_conversation_summary:
- Insert into the memory_summaries SQLite table with the agent id and the conversation id
- Insert the same text into the ChromaDB collection, together with its metadata
This layer answers the question "what has been happening in my life recently?". It gives an agent narrative continuity across many short conversations without re-reading every word of every dialogue.
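The double write and the "K most recent" read can be sketched as follows. Everything beyond the table name and the agent/conversation ids is an assumption: the column names, the in-memory database, and the InMemoryVectorStore class, which is a hypothetical stand-in for the ChromaDB collection so that the example runs with the standard library only.

```python
import sqlite3


class InMemoryVectorStore:
    """Hypothetical stand-in for the ChromaDB collection."""

    def __init__(self) -> None:
        self.records: list[dict] = []

    def add(self, document: str, metadata: dict) -> None:
        self.records.append({"document": document, "metadata": metadata})


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE memory_summaries ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " agent_id TEXT NOT NULL,"
        " conversation_id TEXT NOT NULL,"
        " summary TEXT NOT NULL)"
    )
    return conn


def add_conversation_summary(conn, store, agent_id, conversation_id, summary):
    """Write one participant's summary to both stores."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO memory_summaries (agent_id, conversation_id, summary)"
            " VALUES (?, ?, ?)",
            (agent_id, conversation_id, summary),
        )
    store.add(summary, {"agent_id": agent_id, "conversation_id": conversation_id})


def medium_term(conn, agent_id, k=20):
    """Return the agent's k most recent summaries, oldest first."""
    rows = conn.execute(
        "SELECT summary FROM memory_summaries"
        " WHERE agent_id = ? ORDER BY id DESC LIMIT ?",
        (agent_id, k),
    ).fetchall()
    return [row[0] for row in reversed(rows)]


conn, store = open_db(), InMemoryVectorStore()
for i, text in enumerate(["s1", "s2", "s3"]):
    add_conversation_summary(conn, store, "ada", f"conv-{i}", text)
add_conversation_summary(conn, store, "bob", "conv-0", "b1")
```

Older rows are never deleted here. The LIMIT in the read query is the only thing that bounds the medium-term window.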
Why a summary and not the raw conversation?
- Tokens. A 30-turn conversation can run several thousand tokens. With a cast of 10–20 agents and many concurrent dialogues, full transcripts would blow the context budget within minutes.
- Subjectivity. The summary prompt instructs the agent to summarize from their own point of view, so the same conversation is meant to produce *different* summaries for each participant. This is not yet fully implemented in the current version (v1.0.1).
3. Long-term memory
The same summaries that go into SQLite also go into a ChromaDB collection. ChromaDB embeds the text with the configured sentence-transformer model (`BAAI/bge-small-en-v1.5` by default) and stores it in a per-run persistent directory.
Retrieval is keyed off the current situation. In a conversation turn, the engine constructs a query from the last three lines of dialogue. MemoryManager.long_term then asks Chroma for the top long_term_retrieval_k documents (3 by default) whose embeddings are closest to the query, filtered to that agent's own namespace. The prompt builder injects them under the heading "Relevant earlier memories".
This is what lets an agent remember something that happened 200 turns ago if the current discussion brushes up against it. The medium-term window tops out at ~20 summaries; the long-term store has no upper bound.
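The retrieval logic can be illustrated without ChromaDB at all. The sketch below is a toy version, not the project's code: it swaps the sentence-transformer embedding for a bag-of-words vector and does the nearest-neighbour search by brute force. Only the shape of the procedure is the same (query from the last three lines, filter to the agent's own memories, return the top K).

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real run would use a sentence-transformer."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def long_term(memories, agent_id, recent_lines, k=3):
    """Return the agent's k stored summaries most similar to the recent dialogue."""
    # Query is built from the text of the last three lines of dialogue.
    query = embed(" ".join(text for _, text in recent_lines[-3:]))
    # Restrict the search to this agent's own memories.
    own = [m for m in memories if m["agent_id"] == agent_id]
    ranked = sorted(own, key=lambda m: cosine(query, embed(m["text"])), reverse=True)
    return [m["text"] for m in ranked[:k]]


memories = [
    {"agent_id": "ada", "text": "I argued with Bob about the water tax"},
    {"agent_id": "ada", "text": "I planted tomatoes in the garden"},
    {"agent_id": "bob", "text": "I argued with Ada about the water tax"},
]
lines = [("Bob", "We should vote on the water tax again")]
top = long_term(memories, "ada", lines, k=1)
```

Here the water-tax memory outranks the gardening one for Ada, and Bob's memory of the same argument is never considered because of the agent filter. The age of a memory plays no part in the ranking, which matches the "no decay" property described above.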
4. Per-person memory
Each agent keeps a rolling subjective impression of every other agent they have interacted with, stored one row per (observer, subject) pair. The impression is refreshed every n turns (n is set by person_impression_update_every_n_turns, 5 by default) by re-prompting the observer with the previous impression plus recent history, so it evolves rather than accumulates. At prompt-build time the speaker's standing impression of each co-participant is injected into the conversation prompt, letting interactions carry social weight without scanning the full memory store.
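A minimal sketch of the "one row per pair, overwritten in place" idea. The column names, the maybe_refresh helper, and the turn_count % n trigger are assumptions made for illustration; the rewrite callable stands in for the LLM call that re-prompts the observer.

```python
import sqlite3
from typing import Callable, Optional


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE person_memories ("
        " observer TEXT NOT NULL,"
        " subject TEXT NOT NULL,"
        " impression TEXT NOT NULL,"
        " PRIMARY KEY (observer, subject))"
    )
    return conn


def get_impression(conn, observer: str, subject: str) -> Optional[str]:
    """Direct pair lookup; returns None if the two agents have no history."""
    row = conn.execute(
        "SELECT impression FROM person_memories WHERE observer = ? AND subject = ?",
        (observer, subject),
    ).fetchone()
    return row[0] if row else None


def maybe_refresh(conn, observer, subject, turn_count, recent_history,
                  rewrite: Callable[[Optional[str], list], str], every_n=5) -> bool:
    """Rewrite the impression every `every_n` turns; return True if it changed."""
    if every_n <= 0 or turn_count == 0 or turn_count % every_n != 0:
        return False
    previous = get_impression(conn, observer, subject)
    updated = rewrite(previous, recent_history)  # stand-in for the LLM call
    with conn:
        # The primary key makes this an overwrite, so the row never accumulates.
        conn.execute(
            "INSERT OR REPLACE INTO person_memories (observer, subject, impression)"
            " VALUES (?, ?, ?)",
            (observer, subject, updated),
        )
    return True


def fake_rewrite(previous, history):
    # Deterministic placeholder so the example runs without a model.
    return f"Seen {len(history)} recent lines; before that I thought: {previous}"


conn = open_db()
changed_at_3 = maybe_refresh(conn, "ada", "bob", 3, [("Bob", "hi")], fake_rewrite)
changed_at_5 = maybe_refresh(conn, "ada", "bob", 5, [("Bob", "hi")], fake_rewrite)
```

Because the previous impression is fed back into the rewrite, each refresh can build on the last one while the table still holds a single current sentence per pair.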