Instead of complex modern algorithms, a new AI memory system (MemPalace*) was built using the ancient Greek "Method of Loci." It proves a Talebian point: 2,000 years ago, people actually knew how to think. Today, we live in an era of information gluttony—publishing 600,000 books a year with barely a memorable thought, while the handful of texts written in antiquity are quoted eternally. We know how to publish; they knew how to think.
Upd: sir @rohithzr examined this repo using the BEAM 100K benchmark, and the results are not very satisfying: https://github.com/milla-jovovich/mempalace/issues/125
Honestly, I love that we can take the same concepts humans use to optimize themselves and apply them to AI. I'm very excited about further developments in this area. What would drugs or lucid dreaming look like?
I saw an article that attributed that repo to the actress. Is that true?
Yes, actress Milla Jovovich and her friend Ben Sigman spent a few months creating MemPalace — a long-term memory system for AI — using Claude Code.
They didn't try to invent yet another complex neural graph or RAG.
Instead, they took the ancient Greek "Method of Loci" technique and turned it into a virtual architecture where all your conversations with the AI are organized.
Agree, it's a beautiful way!
I ran benchmarks on this repo and found some revealing results. I posted the findings in detail here: https://github.com/milla-jovovich/mempalace/issues/125
I felt something was sus about the repo, and I was right.
Yeah, my openclaw picked up on that too, most likely thanks to your findings.
I find it really hard to choose the right benchmark between the one they used and LoCoMo.
I read your issue. Very deep evaluation, thanks for pointing me to it!
Will update the post
I am not one for benchmarks in general, but during development they serve as a baseline to build on. In my opinion, BEAM is the most relevant benchmark because it tests end-to-end answer quality, not just retrieval. LongMemEval is solid for retrieval evaluation, but it only measures whether the right document is in the top-K, not whether the system answers correctly. LoCoMo tests useful abilities (multi-hop, temporal), but its recall metric is trivially gameable when top-k exceeds the number of sessions per conversation.
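To make the top-k point concrete, here is a toy sketch. It is not LoCoMo's actual scoring code; the session IDs, the `recall_at_k` helper, and the numbers are all made up for illustration. It only shows that once k is at least the number of sessions, even the worst possible ranking gets perfect recall.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of relevant items that appear among the top-k retrieved items."""
    top_k = set(ranked_ids[:k])
    return len(top_k & set(relevant_ids)) / len(relevant_ids)


# Hypothetical conversation with 5 sessions; only "s3" holds the answer.
relevant = ["s3"]

# A deliberately bad retriever: it ranks the relevant session dead last.
worst_ranking = ["s1", "s2", "s4", "s5", "s3"]

# k smaller than the number of sessions: the bad ranking is penalized.
recall_small_k = recall_at_k(worst_ranking, relevant, k=3)

# k larger than the number of sessions: every session is returned,
# so recall is perfect no matter how the retriever ordered them.
recall_large_k = recall_at_k(worst_ranking, relevant, k=10)

print(recall_small_k, recall_large_k)
```

So a recall number reported at a large k says little about retrieval quality unless you also know how many sessions each conversation has.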