Emil Wu

Milla Jovovich's AI Memory Tool: The Perfect Score That Didn't Survive the Week


Last week a colleague shared an AI news item: “Milla Jovovich releases open-source AI memory system.”

I asked a follow-up: “The beautiful lead actress from The Fifth Element?” Another colleague immediately corrected me: “You mean Resident Evil, right?” Then the youngest person on the team asked, completely innocently: “What’s The Fifth Element?”

Okay, the generation gap is real. But whether you're old enough to remember The Fifth Element or young enough to have never heard of it, MemPalace is worth paying attention to: in the span of a few days it went from stunning debut to full-blown controversy.

What Is MemPalace?

MemPalace is an open-source AI memory system built by Milla Jovovich and engineer Ben Sigman using Claude Code over several months. The problem it tackles: AI has no memory. Every time you open a new conversation window, the model forgets everything you've ever told it, and MemPalace aims to fix exactly that.

The concept draws directly from the ancient Greek “Method of Loci,” where you mentally walk through a familiar building and place things you want to remember in specific rooms. MemPalace applies this to AI:

  • Wings: one per person or project
  • Halls: categories of memory type
  • Rooms: specific ideas and details
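The wing/hall/room hierarchy can be sketched as a simple nested data structure. This is my own illustrative model, not MemPalace's actual API; all the names (`Wing`, `Hall`, `Room`, `remember`) are invented for the example.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the memory-palace hierarchy described above.
# Names are invented for illustration, not MemPalace's real code.

@dataclass
class Room:
    name: str
    memories: list[str] = field(default_factory=list)  # kept verbatim

@dataclass
class Hall:
    category: str
    rooms: dict[str, Room] = field(default_factory=dict)

@dataclass
class Wing:
    owner: str  # one wing per person or project
    halls: dict[str, Hall] = field(default_factory=dict)

def remember(wing: Wing, hall: str, room: str, text: str) -> None:
    """File a verbatim memory under wing -> hall -> room."""
    h = wing.halls.setdefault(hall, Hall(hall))
    r = h.rooms.setdefault(room, Room(room))
    r.memories.append(text)  # stored as-is, never summarized

wing = Wing("project-atlas")
remember(wing, "decisions", "database", "We chose SQLite for local storage.")
```

The nesting mirrors the mental walk of the Method of Loci: to recall something, you navigate owner, then category, then specific idea.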

If you’ve watched the BBC’s Sherlock (yes, the very handsome Benedict Cumberbatch, who later played Doctor Strange), the Season 3 villain who blackmails people with scandals uses exactly this technique to remember everything. The whole season you assume he has some underground vault, and then it turns out it’s a memory palace. Anyway, I’m getting sidetracked. The point is the key architectural decision in this memory-palace-inspired system: store everything verbatim, with no AI summarization. Rather than letting AI decide what’s worth remembering, MemPalace keeps the full text of every exchange and uses vector search to retrieve it when needed. On top of that, a custom compression dialect called AAAK shrinks six months of conversation history from roughly 19.5 million tokens down to about 650,000.
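Here is a toy stand-in for the "store verbatim, retrieve with vector search" idea. A real system embeds text with a model and queries a vector store such as ChromaDB; this sketch substitutes a bag-of-words cosine similarity, which is enough to show the shape of the pipeline. Everything here (`remember`, `retrieve`, the similarity function) is my own illustration, not MemPalace's implementation.

```python
import math
from collections import Counter

def vectorize(text: str) -> Counter:
    # Toy embedding: word counts. A real system would use a neural model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

store: list[str] = []  # every exchange, kept verbatim

def remember(text: str) -> None:
    store.append(text)

def retrieve(query: str, top_k: int = 5) -> list[str]:
    # Rank all stored memories by similarity to the query, return the top k.
    q = vectorize(query)
    ranked = sorted(store, key=lambda m: cosine(q, vectorize(m)), reverse=True)
    return ranked[:top_k]

remember("User prefers dark mode in every editor.")
remember("Project deadline moved to March 14.")
print(retrieve("when is the deadline", top_k=1))
```

The point of the design: nothing is lost at write time, and all the intelligence lives in the retrieval step.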

The whole system runs locally on ChromaDB and SQLite with zero API costs (the same stack as mem0). On launch day the GitHub repo racked up 5,400 stars in 24 hours and the announcement reached over 1.5 million people, with Ben Sigman claiming a 100% score on the LongMemEval benchmark. That claim is why the tech world took notice.

What Is LongMemEval, and What Does 100% Mean?

LongMemEval is an academic benchmark from ICLR 2025: 500 questions designed to test five long-term memory abilities, namely information extraction, multi-session reasoning, temporal reasoning, knowledge updates, and knowing when to say nothing. The point isn’t that the test is famous; it’s how hard it is. GPT-4o and similar commercial systems currently score between 30% and 70%. In short, it measures whether an AI actually remembers you and can update its understanding of you over time.

Ben Sigman announced on X:

“LongMemEval 100% perfect score — first ever. Every question type at 100%. 500/500.”

That claim exploded immediately.

The Community Firestorm

“Milla Jovovich just released an AI memory system. None of the benchmark scores are real.” — Penfield Labs

What followed wasn’t a handful of skeptics; it was systematic technical critique, leading directly to GitHub Issue #29.

Problem 1: Wrong metric entirely

The official LongMemEval evaluation requires two steps: retrieve the relevant information, then have the AI answer the question while a judge scores the response. MemPalace only ran step one (retrieval), reporting recall_any@5: whether the correct information appeared anywhere in the top 5 retrieved results. That isn’t a LongMemEval score at all; it’s a much easier metric.
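The gap between the two metrics is easy to see in code. This is an illustrative sketch, not MemPalace's actual harness: recall_any@5 only checks whether the gold evidence appears in the top-5 list, while the full protocol would continue on to answering and judging.

```python
# Illustrative sketch of the metric MemPalace actually reported.
# recall_any@5 asks one question: did any gold document land in the top k?

def recall_any_at_k(retrieved: list[str], gold: set[str], k: int = 5) -> bool:
    return any(doc in gold for doc in retrieved[:k])

retrieved = ["s3", "s7", "s1", "s9", "s4"]  # top-5 retrieved session ids
gold = {"s1"}                               # the session that holds the answer
hit = recall_any_at_k(retrieved, gold)      # True: a retrieval "hit"

# The full LongMemEval protocol would then continue (hypothetical calls):
#   answer = llm_answer(question, retrieved)    # model must use the evidence
#   score  = judge(answer, reference_answer)    # judge must accept the answer
# Reporting 100% at the recall step skips both of those harder stages.
```

A retrieval hit does not mean the model would have answered correctly, which is exactly why the two numbers are not interchangeable.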

Problem 2: Teaching to the test

How did they get to 100%? They identified the 3 questions that failed, engineered specific fixes for those questions, then re-ran on the same test set. In academic contexts, a claim like this requires a held-out evaluation to be valid. What they did instead was essentially get their exam back, correct their wrong answers, and then claim a perfect score.
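The held-out safeguard the critics pointed to can be sketched in a few lines. The failure pattern here (every 83rd question fails) and the split are invented purely for illustration; the point is that fixes tuned on one half must not touch the number reported on the other half.

```python
# Sketch of a held-out evaluation. The failure pattern (multiples of 83)
# is invented for illustration; it is not MemPalace's real error set.

questions = list(range(500))
dev      = [q for q in questions if q % 2 == 0]  # tune on this half
held_out = [q for q in questions if q % 2 == 1]  # untouched until reporting

def passes(q: int, fixes: set) -> bool:
    return q % 83 != 0 or q in fixes  # pretend every 83rd question fails

# Engineering fixes for *dev* failures is legitimate...
fixes = {q for q in dev if q % 83 == 0}

dev_acc      = sum(passes(q, fixes) for q in dev) / len(dev)
held_out_acc = sum(passes(q, set()) for q in held_out) / len(held_out)

print(f"dev after fixes: {dev_acc:.1%}")       # looks perfect
print(f"held-out:        {held_out_acc:.1%}")  # the honest number
```

Fixing dev failures and re-running on the same questions will always trend toward 100%; only the untouched split tells you whether the fixes generalize.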

Problem 3: The LoCoMo 100% is even more fundamental

The LoCoMo benchmark score was run with top_k=50, but each conversation in the dataset has only 19–32 sessions. With that few sessions, setting top_k=50 means the ground-truth session is always in the candidate pool; the correct answer is guaranteed to be there no matter what. That isn’t testing memory. That’s walking into an exam with the textbook open.
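The arithmetic behind this criticism fits in a few lines. A minimal sketch, assuming a retriever that simply truncates a ranked list: when `top_k` is at least the number of sessions, any ranking at all, even a random one, returns every session, so the gold session is included by construction.

```python
# Why top_k=50 trivializes LoCoMo retrieval: each conversation has only
# 19-32 sessions, so asking for the top 50 returns every session there is.

def retrieve_top_k(ranked_sessions: list[str], top_k: int) -> list[str]:
    # Whatever order the retriever produced, truncation can't drop anything
    # when top_k >= len(ranked_sessions).
    return ranked_sessions[:top_k]

sessions = [f"session-{i}" for i in range(32)]  # the largest LoCoMo case
candidates = retrieve_top_k(sessions, top_k=50)

assert len(candidates) == len(sessions)  # everything came back
assert "session-7" in candidates         # any "gold" session is guaranteed in
```

With the candidate pool guaranteed to contain the answer, a 100% retrieval score says nothing about the retriever.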

Side controversy: Jovovich’s actual involvement

Community Notes on X flagged her participation as “conceptual or promotional,” her GitHub account showed 7 commits over 2 days, and an early contributor account (aya-thekeeper) was deleted after launch.

On top of that, Sigman simultaneously launched a MemPalace cryptocurrency with a 50/50 creator-reward split with Jovovich, which pumped and dumped within 24 hours. Kotaku ran with: “Resident Evil Movie Star Promotes Crypto Bro’s AI-Coded ‘MemPalace’ Accused Of Being Snake Oil.” (For the uninitiated, “snake oil” means a miracle cure that doesn’t actually work: overhyped and under-delivered.)

The Response

Sigman acknowledged MemPalace had been “ripped to shreds” within 24 hours. He updated the scores to 96.6% in raw mode and 98.4% on a held-out split, and added a /benchmarks page to the official site explaining the methodology. The code didn’t change; the framing did.

My Take

Maybe the more interesting question isn’t whether MemPalace works, but what game AI marketing is really playing in 2026. To be fair: the code is real, the architecture is real, and the “store verbatim, retrieve with vector search” design has received genuine praise from developers who looked at it carefully. This isn’t some overhyped piece of vaporware.

But it’s equally fair to say: calling recall@5 a “100% perfect LongMemEval score, first ever,” and pairing it with a Hollywood actress and a cryptocurrency launch, whether driven by malice or opportunism, crosses a line that technical communities tend to enforce quickly.

In Mindset — Unprofessional Arrogance, I wrote about mistaking AI’s capabilities for your own, and MemPalace feels like a different version of the same pattern: taking a real but bounded technical result and marketing it as a perfect, unverifiable myth.

As I’ve said before, AI can solve almost everything, and that “almost” is where the danger lives. The same applies to benchmarks: 96.6% is a genuinely strong result, but a claim of 100% perfection is something we still need to verify for ourselves.


Later, a colleague asked me again about that movie, so I explained: The Fifth Element, 1997, directed by Luc Besson. Gary Oldman as the villain (he used to play suave bad guys, like the corrupt DEA agent in Léon: The Professional, but in recent years he’s been playing the good guys: Sirius Black in Harry Potter, Commissioner Gordon in The Dark Knight, Winston Churchill in Darkest Hour). Milla Jovovich in an outfit held together by tape, with red hair and fight scenes. And the lead is Bruce Willis, who’s now sadly dealing with aphasia. By the time I finished I felt incredibly old… He said the only thing he remembered was the blue opera-singing Diva, and by then the youngest colleague had already put their headphones back on.



Support This Series

If these articles have been helpful, consider buying me a coffee