Apple proposes MemoryLLM to decouple feed-forward modules from self-attention for interpretable memory retrieval
MemoryLLM treats feed-forward networks as context-free token-wise neural retrieval memory, enabling pre-computation of token-wise lookups to improve inference efficiency.
- Apple’s Machine Learning Research team introduces MemoryLLM, a method to decouple feed-forward modules (FFNs) from self-attention in transformers.
- The approach treats FFNs as interpretable, context-free token-wise neural retrieval memory, enabling study of memory access patterns within FFN parameters.
- MemoryLLM makes FFNs context-free by training them in isolation on token embeddings, so their outputs can be pre-computed as token-wise lookups (ToLs) and moved between VRAM and storage on demand.
- A variant, Flex-MemoryLLM, bridges the performance gap between MemoryLLM and conventional transformer designs.
Apple’s Machine Learning Research team proposes MemoryLLM, a method to reinterpret feed-forward networks (FFNs) in transformers as interpretable, context-free token-wise neural retrieval memory. The work revisits challenges in FFN interpretability and decouples FFNs from self-attention, enabling study of how input tokens access memory locations within FFN parameters.
The authors investigate the importance of FFN memory across downstream tasks and demonstrate that FFNs can be trained in isolation from self-attention using token embeddings. This isolates FFNs as context-free modules, allowing them to be pre-computed as token-wise lookups (ToLs).
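The pre-computation idea can be illustrated with a minimal sketch. It is not the authors' implementation: the single toy FFN, the ReLU activation, the sizes, and the names `ffn`, `precompute_tol`, and `rand_matrix` are all assumptions made for illustration. The point it shows is that when an FFN's input depends only on the token's embedding, its output can be computed once per vocabulary entry and then fetched by token id.

```python
import random


def ffn(x, w_in, w_out):
    """Toy two-layer FFN (assumed shape): w_out @ relu(w_in @ x)."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w_in]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w_out]


def precompute_tol(embeddings, w_in, w_out):
    """Run the FFN once per vocabulary entry and keep the outputs as a table."""
    return [ffn(e, w_in, w_out) for e in embeddings]


def rand_matrix(rows, cols):
    """Hypothetical helper: random weights stand in for trained parameters."""
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
VOCAB, D_MODEL, D_HIDDEN = 8, 4, 16  # illustrative sizes only

embeddings = rand_matrix(VOCAB, D_MODEL)
w_in = rand_matrix(D_HIDDEN, D_MODEL)
w_out = rand_matrix(D_MODEL, D_HIDDEN)

# Offline step: one FFN evaluation per vocabulary token.
tol = precompute_tol(embeddings, w_in, w_out)

# Inference step: the FFN call becomes a table lookup keyed by token id.
token_ids = [3, 1, 3, 7]
looked_up = [tol[t] for t in token_ids]

# The lookup matches recomputing the FFN, because no context enters its input.
recomputed = [ffn(embeddings[t], w_in, w_out) for t in token_ids]
assert looked_up == recomputed
```

A conventional FFN cannot be tabulated this way, since its input is the attention-mixed hidden state and therefore changes with the surrounding context.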
Because ToLs can be transferred between VRAM and storage on demand, the approach aims to improve inference efficiency. The paper also introduces Flex-MemoryLLM, a variant positioned between conventional transformer designs and MemoryLLM, designed to reduce the performance gap caused by training FFNs with context-free token-wise embeddings.
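One way to picture on-demand transfer of pre-computed lookups is a small fast tier backed by a larger slow store. The sketch below is an assumption-laden illustration, not something the source describes: the `ToLCache` class, the least-recently-used eviction policy, and the dictionary standing in for storage are all hypothetical.

```python
from collections import OrderedDict


class ToLCache:
    """Bounded fast tier in front of a slower lookup store (hypothetical)."""

    def __init__(self, storage, capacity):
        self.storage = storage      # stands in for disk or host memory
        self.capacity = capacity    # max rows kept in the fast tier
        self.fast = OrderedDict()   # stands in for VRAM
        self.loads = 0              # number of transfers from storage

    def get(self, token_id):
        if token_id in self.fast:
            self.fast.move_to_end(token_id)  # mark as most recently used
            return self.fast[token_id]
        row = self.storage[token_id]         # on-demand transfer
        self.loads += 1
        self.fast[token_id] = row
        if len(self.fast) > self.capacity:
            self.fast.popitem(last=False)    # evict least recently used row
        return row


# Fake pre-computed rows: token id -> output vector.
storage = {t: [float(t), float(t) * 2.0] for t in range(100)}
cache = ToLCache(storage, capacity=2)

for t in [5, 5, 9, 5, 42, 9]:
    cache.get(t)

print(cache.loads)       # transfers triggered by the six requests
print(list(cache.fast))  # token ids currently held in the fast tier
```

LRU is used here only because it is simple to state; the source does not say which transfer or eviction policy, if any, MemoryLLM uses.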
The paper is authored by Ajay Jaiswal, Lauren Hannah, Han-Byul Kim, Duc Hoang, Arnav Kundu, Mehrdad Farajtabar, and Minsik Cho, and is associated with the ICML conference. Its publication date is listed as July 2026.