LLMs forget: how I try to keep 'em honest, lightning-fast, and productive
June 08, 2026
by a searcher from University of California, Berkeley - Haas School of Business in SF, CA, USA
LLM providers aren't likely to build us a big red warning for when their assistant's memory starts to break. A few months ago I kept typing "Wait, didn't we just talk about this?" And the longer the chat got, the worse the memory leaks got.
Two things are going on. First, the model's context window is finite, so the oldest messages eventually drop out (a moving target, since the windows keep getting bigger). Second, and subtler but more important: even instructions that are still inside the window lose weight as the chat goes on.
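To make the first effect concrete, here is a minimal sketch of a chat history trimmed to a fixed budget, oldest messages first. It assumes a crude whitespace word count as the "token" measure, and `count_tokens`, `trim_to_window` and the sample `history` are hypothetical. Real providers tokenize differently and may truncate, summarize, or reject instead, so treat this as an illustration of the idea, not of any vendor's behavior.

```python
from collections import deque


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer (assumption): one token per whitespace-separated word.
    return len(text.split())


def trim_to_window(messages: list[str], budget: int) -> list[str]:
    """Drop the oldest messages until the remaining history fits in `budget` tokens."""
    window = deque(messages)
    total = sum(count_tokens(m) for m in window)
    while window and total > budget:
        dropped = window.popleft()  # the oldest message falls out first
        total -= count_tokens(dropped)
    return list(window)


history = [
    "Rule: always answer in British English",  # the instruction you set at the start
    "Let's draft the pricing page",
    "Now tighten the headline",
    "Wait, didn't we just talk about this?",
]
print(trim_to_window(history, budget=16))
```

With a 16-word budget, the opening rule is the first thing dropped, which is exactly the "didn't we just talk about this?" moment. The sketch only models the first effect; the second (instructions fading while still inside the window) happens inside the model and isn't something a few lines of code can reproduce.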
So what to do about it? Summarize the old chat and then start a new one? Build one big file and make the assistant read it first? Stand up "skills" that pull the right knowledge in on demand? Each feels partly right.
What actually seems to work is a three-part list, in order:
1) Build the brain. This is the foundation, and you've probably heard it called a "second brain": your business written down as files the AI can read, so you're not re-explaining yourself every session. For the best practical walkthrough, Stephen Browne@redacted wrote a genuinely excellent, free teardown: https://sbrowne-claude-for-small-business.netlify.app. It also has a really cool breakdown of permission settings.
2) On startup, load a short "guide" but never the whole brain or even large parts of it.
Once your brain gets big, the temptation is to load all of it into every chat. That's expensive to load every time, and the rules that matter can get drowned out anyway. Call it a "startup tax" and measure it yourself: it can quickly slow you down and eat up your tokens.
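One way to measure the startup tax yourself: total a rough token estimate for everything in the brain folder and compare it with the header alone. This sketch assumes the brain is a folder of Markdown files and uses the common rule of thumb of about four characters per token; the `startup_tax` helper and the file names are hypothetical, and your assistant's real tokenizer will give different numbers.

```python
import tempfile
from pathlib import Path

CHARS_PER_TOKEN = 4  # rough rule of thumb (assumption); real tokenizers vary


def estimate_tokens(path: Path) -> int:
    """Approximate token cost of reading one file in full."""
    return len(path.read_text(encoding="utf-8")) // CHARS_PER_TOKEN


def startup_tax(brain_dir: Path, header_name: str = "HEADER.md") -> dict[str, int]:
    """Compare loading every Markdown file in the brain against loading only the header."""
    files = sorted(brain_dir.rglob("*.md"))  # includes the header itself
    return {
        "whole_brain": sum(estimate_tokens(f) for f in files),
        "header_only": estimate_tokens(brain_dir / header_name),
    }


# Usage with a throwaway, hypothetical brain folder
with tempfile.TemporaryDirectory() as tmp:
    brain = Path(tmp)
    (brain / "HEADER.md").write_text("x" * 2_000, encoding="utf-8")
    (brain / "clients.md").write_text("x" * 40_000, encoding="utf-8")
    (brain / "pricing.md").write_text("x" * 18_000, encoding="utf-8")
    print(startup_tax(brain))  # {'whole_brain': 15000, 'header_only': 500}
```

On this toy folder the header costs about 3% of the whole brain; pointing it at your own folder gives your real ratio.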
The fix is boring, but it works:
One tiny, dated header is the only thing the chat LLM reads in full. At startup, I write "Read this {header file} and then let's work on XYZ". The header file is half a page: what's true today, what's active, what's blocked, and where the depth lives. Everything else is pulled on demand. The header points to the one file that matters for the task in that specific chat.
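A minimal sketch of the header idea, assuming the half-page header is a Markdown file you regenerate from a few short lists. The sections mirror the four things listed above (what's true today, what's active, what's blocked, where the depth lives); the `render_header` and `startup_prompt` helpers, the paths, and the sample entries are all hypothetical.

```python
from datetime import date


def render_header(today: date, true_today: list[str], active: list[str],
                  blocked: list[str], depth: dict[str, str]) -> str:
    """Build the short, dated header: the only file the assistant reads in full."""
    lines = [f"# Header (updated {today.isoformat()})", ""]
    for title, items in (("What's true today", true_today),
                         ("Active", active),
                         ("Blocked", blocked)):
        lines.append(f"## {title}")
        lines.extend(f"- {item}" for item in items)
        lines.append("")
    lines.append("## Where the depth lives")
    # Pointers only: the assistant opens these files on demand, never at startup.
    lines.extend(f"- {topic}: {path}" for topic, path in depth.items())
    return "\n".join(lines)


def startup_prompt(header_path: str, task: str) -> str:
    """The one-line opener for a new chat."""
    return f"Read {header_path} and then let's work on {task}"


header = render_header(
    date(2026, 6, 8),
    true_today=["Pricing v2 is live"],
    active=["Rewrite onboarding emails"],
    blocked=["Payments webhook waiting on API keys"],
    depth={"pricing": "brain/pricing.md", "onboarding": "brain/onboarding.md"},
)
print(header)
print(startup_prompt("brain/HEADER.md", "the onboarding emails"))
```

Keeping the depth section as file pointers rather than content is what keeps the header at half a page as the brain grows.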
3) Respect "context drift" and aim for one task, one chat (unless you want a strategic discussion, in which case say so at the start). Even with the brain and the header, a single chat still degrades over time, so I try to compartmentalize. When I sit down to build something in code, I open a fresh chat scoped to exactly that one ask, point it to the relevant files, and go. The payoff: any fresh chat now cold-starts in seconds with enough understanding to be productive.
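The one-task, one-chat habit can be sketched as a small helper that builds the opening message for a fresh, scoped chat: the header, the few files relevant to this one ask, and the ask itself. The `scoped_opener` name, the three-file cap, and the paths are assumptions for illustration; the point is that each chat starts narrow instead of inheriting a long, drifting history.

```python
def scoped_opener(header_path: str, task: str, relevant_files: list[str],
                  max_files: int = 3) -> str:
    """Compose the first message of a fresh chat scoped to exactly one task."""
    if not task.strip():
        raise ValueError("a scoped chat needs exactly one concrete task")
    if len(relevant_files) > max_files:
        # Many pointers usually means the task should be split into separate chats.
        raise ValueError(f"{len(relevant_files)} files: split the task or trim the list")
    pointers = "\n".join(f"- {path}" for path in relevant_files)
    return (
        f"Read {header_path} first.\n"
        f"Task for this chat (and only this): {task}\n"
        f"Relevant files:\n{pointers}"
    )


print(scoped_opener("brain/HEADER.md", "add CSV export to the invoices page",
                    ["brain/invoices.md", "src/export.py"]))
```

The file cap is a judgment call, not a rule; it simply forces the "is this really one task?" question before the chat starts.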
Bonus: you can always add "pull the transcripts from last week's chat about {subject}" as an extra lever to ensure understanding. This works great in Claude.
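In Claude the assistant does the lookup for you, and the sketch below is not how it does that. For tools that can't, here is a local stand-in, assuming you export chats as text files named by date (a hypothetical `YYYY-MM-DD-<slug>.txt` convention). The hypothetical `recent_transcripts` helper finds last week's transcripts that mention a subject, so you can point a fresh chat at them.

```python
import tempfile
from datetime import date, timedelta
from pathlib import Path


def recent_transcripts(folder: Path, subject: str, today: date, days: int = 7) -> list[Path]:
    """Return transcripts from the last `days` days that mention `subject` (case-insensitive)."""
    cutoff = today - timedelta(days=days)
    hits = []
    for path in sorted(folder.glob("*.txt")):
        try:
            stamp = date.fromisoformat(path.name[:10])  # assumes a YYYY-MM-DD prefix
        except ValueError:
            continue  # skip files that don't follow the naming convention
        text = path.read_text(encoding="utf-8").lower()
        if cutoff <= stamp <= today and subject.lower() in text:
            hits.append(path)
    return hits


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "2026-06-03-pricing.txt").write_text("We settled the Pricing tiers.", encoding="utf-8")
    (folder / "2026-05-01-pricing.txt").write_text("Old pricing notes.", encoding="utf-8")
    (folder / "notes.txt").write_text("pricing scratchpad", encoding="utf-8")
    found = recent_transcripts(folder, "pricing", today=date(2026, 6, 8))
    print([p.name for p in found])  # ['2026-06-03-pricing.txt']
```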
Hope this helps!
And a real h/t to Stephen Browne, whose guide is well-considered and spot-on.
Most of this advice applies to Claude and Codex, as these are the tools that seem best at working with "2nd brains" at the moment.
And lastly, regardless of my opinions, now's the time to set this up! AI assistants are being subsidized today, but won't always be. The 2nd brain lives outside any LLM, so in the future it can be plugged into any AI assistant. No lab lock-in.
from University of California, Santa Barbara in Los Gatos, CA, USA