Microsoft Research proposes Memora, a memory system for long-horizon AI agents
Memora decouples stored memory content from retrieval mechanisms, achieving state-of-the-art results on long-conversation benchmarks while reducing context tokens by up to 98%.
- Microsoft Research introduces Memora, a scalable memory system for AI agents that separates stored content from retrieval methods to balance abstraction and specificity.
Microsoft Research describes Memora as a scalable memory system designed to address the inefficiency of current AI agents, which must repeatedly reload or retrieve context as tasks grow longer and more complex. The system decouples what is stored (rich memory content) from how it is retrieved (lightweight abstractions and cue anchors), aiming to balance abstraction and specificity.
Memora achieves state-of-the-art performance on long-conversation benchmarks, including LoCoMo and LongMemEval, outperforming systems such as Mem0, retrieval-augmented generation (RAG), and full-context inference. It also reduces context token usage by up to 98% compared to approaches that place the full conversation history in context.
The framework introduces a harmonic organization in which each memory entry consists of a primary abstraction (a short phrase capturing the essence of the memory) and a memory value holding the rich content. Only the primary abstraction is embedded for similarity search, while cue anchors provide alternative access paths to the same memory.
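The entry layout described above can be illustrated with a minimal Python sketch. It is not Memora's implementation: the names `MemoryEntry`, `MemoryStore`, `search`, and `by_cue` are hypothetical, and `toy_embed` is a bag-of-words stand-in for whatever embedding model a real system would use. The sketch only shows the decoupling itself: the short abstraction is what gets embedded and searched, the rich value is returned but never embedded, and cue anchors act as a second lookup path to the same entry.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def toy_embed(text: str) -> Counter:
    """Assumption: a bag-of-words count vector standing in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class MemoryEntry:
    primary_abstraction: str  # short phrase; the only field that is embedded
    value: str                # rich content; stored and returned, never embedded
    cue_anchors: list[str] = field(default_factory=list)  # alternative access paths


class MemoryStore:
    """Hypothetical store: similarity search over abstractions, exact lookup over cues."""

    def __init__(self) -> None:
        self.entries: list[MemoryEntry] = []
        self.vectors: list[Counter] = []
        self.cue_index: dict[str, list[int]] = {}

    def add(self, entry: MemoryEntry) -> None:
        idx = len(self.entries)
        self.entries.append(entry)
        # Embed the primary abstraction only, not the memory value.
        self.vectors.append(toy_embed(entry.primary_abstraction))
        for cue in entry.cue_anchors:
            self.cue_index.setdefault(cue.lower(), []).append(idx)

    def search(self, query: str, k: int = 1) -> list[MemoryEntry]:
        """Return the k entries whose abstractions are most similar to the query."""
        q = toy_embed(query)
        ranked = sorted(
            range(len(self.entries)),
            key=lambda i: cosine(q, self.vectors[i]),
            reverse=True,
        )
        return [self.entries[i] for i in ranked[:k]]

    def by_cue(self, cue: str) -> list[MemoryEntry]:
        """Return every entry reachable through the given cue anchor."""
        return [self.entries[i] for i in self.cue_index.get(cue.lower(), [])]


store = MemoryStore()
store.add(MemoryEntry(
    "trip to Lisbon planning",
    "User booked flights for 12 May, prefers window seats, hotel near Alfama.",
    ["Lisbon", "flights"],
))
store.add(MemoryEntry(
    "dog vet appointment",
    "Rex has a check-up on Friday at 3pm.",
    ["Rex", "vet"],
))

print(store.search("planning the Lisbon trip")[0].value)
print(store.by_cue("rex")[0].primary_abstraction)
```

In this toy setup, the query reaches the Lisbon entry through its abstraction, while the cue `rex` reaches the vet entry even though "Rex" appears only in that entry's value and cue list, not in its abstraction. How Memora actually generates abstractions and cue anchors is not specified in this summary.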
Memora’s design contrasts with existing approaches: content-fragmentation systems like RAG and Mem0 preserve detail but lose narrative coherence, while coarse-abstraction systems sacrifice specificity for efficiency. Graph-based systems add structure but often rely on rigid ontologies that do not generalize across domains.
The system is positioned as a solution to the abstraction–specificity tension, enabling agents to consolidate related information into stable units, surface fine-grained details when needed, and navigate their own history without re-reading everything.
Memora’s paper is published at ICML 2026, and its code is available on GitHub under the Microsoft organization.