MemOS: A Paradigm Shift to Memory as a First-Class Citizen for LLMs
MemOS proposes a recent paradigm shift towards an LLM-native memory operating system.
LLMs have parameter-based memory stored in their weights and biases (Parameter memory), and can also access external memory in the form of databases or knowledge graphs (Plaintext memory).
Perhaps more interesting, LLMs can also store Key-Value (KV) activations so they do not have to be recomputed from the input tokens at every decoding step (Activation memory).
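To make the KV caching idea concrete, here is a minimal single-head attention sketch in NumPy (my own toy code, not from the paper), showing how cached Key/Value activations avoid recomputing the projections for past tokens at each decoding step:

```python
import numpy as np

def attention(q, K, V):
    # q: (d,), K/V: (t, d); softmax-weighted sum over all cached timesteps
    scores = K @ q / np.sqrt(q.shape[0])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V

d = 8
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

K_cache, V_cache = [], []          # this is the "Activation memory"
for step in range(5):
    x = rng.normal(size=d)         # embedding of the newest token
    K_cache.append(x @ Wk)         # K/V for the new token computed ONCE
    V_cache.append(x @ Wv)
    out = attention(x @ Wq, np.array(K_cache), np.array(V_cache))
    # without the cache, each step would redo the K/V projections
    # for every previous token as well

print(out.shape)                   # (8,): attention output for the newest token
```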
What if we could store memory across all three spaces - parameter, activation, and plaintext - and shuffle it between them as needed based on frequency of access? This is the idea behind the Memory Cube (MemCube).
Furthermore, a Memory Cube is not static: more frequently accessed memories get promoted towards the parameter level, while less frequently accessed ones get demoted towards plaintext.
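Here is a hypothetical Python sketch of such frequency-driven migration; the names (MemCube, promote_at, decay) and the promotion policy are my own illustration, not the actual MemOS API:

```python
from dataclasses import dataclass, field

TIERS = ["plaintext", "activation", "parameter"]  # cheap-to-write -> fast-to-read

@dataclass
class MemoryItem:
    content: str
    tier: int = 0        # index into TIERS; new memories start as plaintext
    hits: int = 0        # access counter since the last migration

@dataclass
class MemCube:
    items: dict = field(default_factory=dict)
    promote_at: int = 5  # accesses before moving up a tier (assumed policy)

    def write(self, key, content):
        self.items[key] = MemoryItem(content)

    def read(self, key):
        item = self.items[key]
        item.hits += 1
        # hot memories migrate towards parameter space...
        if item.hits >= self.promote_at and item.tier < len(TIERS) - 1:
            item.tier += 1
            item.hits = 0
        return item.content

    def decay(self):
        # ...while cold ones fall back towards plaintext
        for item in self.items.values():
            if item.hits == 0 and item.tier > 0:
                item.tier -= 1

cube = MemCube()
cube.write("user_pref", "prefers concise answers")
for _ in range(5):
    cube.read("user_pref")
print(TIERS[cube.items["user_pref"].tier])  # promoted to "activation"
```

In the paper itself, promotion to parameter memory means distilling the content into model weights (e.g. via LoRA-style adapters, see the related reading below), which is far costlier than the counter flip shown here.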
What if we also had memory processes that regulate access to and consolidation of these Memory Cubes? In MemOS, these form the Interface Layer, Operation Layer and Infrastructure Layer, which access and modify the contents of a Memory Cube.
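A rough sketch of how three such layers could be wired together (the class structure below is my own simplification, not the paper's code): the Interface Layer parses requests, the Operation Layer schedules and consolidates, and the Infrastructure Layer handles storage.

```python
class InfrastructureLayer:          # storage, governance, migration
    def __init__(self):
        self.store = {}
    def load(self, key):
        return self.store.get(key)
    def persist(self, key, value):
        self.store[key] = value

class OperationLayer:               # scheduling and lifecycle over MemCubes
    def __init__(self, infra):
        self.infra = infra
    def schedule_read(self, key):
        return self.infra.load(key)
    def consolidate(self, key, value):
        self.infra.persist(key, value)

class InterfaceLayer:               # turns user/LLM requests into memory calls
    def __init__(self, ops):
        self.ops = ops
    def handle(self, request):
        if request["op"] == "remember":
            self.ops.consolidate(request["key"], request["value"])
        elif request["op"] == "recall":
            return self.ops.schedule_read(request["key"])

api = InterfaceLayer(OperationLayer(InfrastructureLayer()))
api.handle({"op": "remember", "key": "fact", "value": "MemOS has 3 layers"})
print(api.handle({"op": "recall", "key": "fact"}))
```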
Overall, a complicated paper, but one that brings forth some interesting ideas for memory consolidation and retrieval.
Memory is no longer a second-class citizen, but a core concept of the MemOS formulation.
Slides: https://github.com/tanchongmin/john-youtube/blob/main/Discussion_Sessions/MemOS.pdf
Paper: https://arxiv.org/pdf/2507.03724
Code: https://github.com/MemTensor/MemOS
~~~
Related reading:
Memory3 (a precursor to the MemOS paper, which discusses activation memory): https://arxiv.org/html/2407.01178v1
MemGPT (using agents with memory tools): https://arxiv.org/pdf/2310.08560
LoRA (learning via low-rank parameter adapters trained in parallel with frozen weights): https://arxiv.org/abs/2106.09685
Learning, Fast and Slow (learning with both a neural network and an external database memory): https://arxiv.org/pdf/2301.13758
~~~
0:00 Introduction
2:37 Conventional Retrieval Augmented Generation
6:17 Implicit Memory vs Explicit Memory
15:55 Explicit Memory and how humans learn
20:24 Cost of read + write vs frequency of access
30:31 Math of KV caching for Attention Memory
38:18 Why memory as an OS?
43:38 Overview of MemOS
45:37 Good benchmark results on LOCOMO
47:57 How does MemCube work?
55:09 How to convert between abstraction spaces?
58:49 Memory Development: From static to dynamic
1:02:09 Memory Consolidation along abstraction spaces
1:07:48 MemCube Contents
1:11:01 Processing Components across Memory Layers
1:14:24 3-layer architecture for MemOS
1:16:47 Memory Lifecycle
1:18:20 Future Plans
1:19:11 My thoughts: Curse of Memory
1:21:20 Discussion
~~~
AI and ML enthusiast. I like to think about the essence behind AI breakthroughs and explain it in a simple and relatable way. I am also an avid game creator.
Discord: https://discord.gg/bzp87AHJy5
LinkedIn: https://www.linkedin.com/in/chong-min-tan-94652288/
Online AI blog: https://delvingintotech.wordpress.com/
Twitter: https://twitter.com/johntanchongmin
Try out my games here: https://simmer.io/@chongmin