Apr 29, 2026 · 10 min read · By Rethread

Why ChatGPT forgets everything (and what to do about it)

You've probably had this conversation. You spend ten minutes carefully describing your codebase to ChatGPT — the framework, the database, the deployment target, the team conventions — and it gives you exactly the answer you needed. Brilliant.

Two days later you open a fresh chat to ask a related question, and the model has no idea who you are. It cheerfully suggests JavaScript when you only write TypeScript. It recommends Postgres when your project is on Cloudflare D1. It apologizes for the "long initial setup" you have to go through, again, every single time.

Why? And — more importantly — what's the actual fix?

Each LLM call is, by design, stateless

Under the hood, every "conversation" with ChatGPT (or Claude, or Gemini) is just a single HTTP request with the entire conversation history attached as input. The model itself has no memory between requests. Whatever the model "knows" in turn 17 of a chat is whatever was in the prompt at turn 17.

When you start a new conversation, the prompt is empty. The previous chat's context is gone — not because the model "forgot," but because nothing carried it over.
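To make the statelessness concrete, here's a minimal sketch. `call_llm` is a stand-in for any chat-completions-style HTTP API (the real call would be a POST with the `messages` array in the body) — the point is that the model only ever sees what the client chooses to resend:

```python
# Sketch of why "memory" lives in the client, not the model.
# `call_llm` is a placeholder for a chat-completions-style HTTP API:
# it only sees what is in `messages` for that one request.

def call_llm(messages: list[dict]) -> str:
    # A real implementation would POST the full `messages` array
    # in the request body, every single time.
    return f"(model saw {len(messages)} messages)"

history = [{"role": "system", "content": "You are a helpful assistant."}]

# Turn 1: the request carries the whole (short) history.
history.append({"role": "user", "content": "My stack is TypeScript + D1."})
print(call_llm(history))   # model sees 2 messages

# Turn 2: the client appends and resends EVERYTHING.
history.append({"role": "assistant", "content": "Noted!"})
history.append({"role": "user", "content": "Suggest an ORM."})
print(call_llm(history))   # model sees 4 messages

# New conversation: a fresh list, so the stack info is simply gone.
new_chat = [{"role": "user", "content": "Suggest an ORM."}]
print(call_llm(new_chat))  # model sees 1 message -- no memory of your stack
```

The "memory" in turn 2 exists only because the client code appended to `history` and sent it again. Start a new list and it's gone.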

Memory is not a property of the model. It's a property of the system around the model.

This isn't a bug. It's actually what makes LLMs scalable: every request is independent, every server can serve any user, and there's no shared mutable state to corrupt. But it does mean that "memory" is something you have to add, externally, on top.

What ChatGPT's built-in "Memory" feature actually is

In 2024, OpenAI shipped a feature literally called Memory. On paper this should solve the problem. In practice it's a partial fix that introduces its own set of problems.

1. It's basically a hidden append to your system prompt

ChatGPT's Memory works by extracting "salient" facts from your conversations and storing them as short bullet points in a server-side store. On each new conversation, those bullets are silently prepended to your prompt as system instructions.
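In pseudocode terms, the design looks roughly like this (an illustrative sketch, not OpenAI's actual implementation — the prompt wording and function names are invented):

```python
# Rough sketch of the "hidden prepend" design: saved memory bullets are
# quietly stitched into the system prompt before your message reaches
# the model. (Illustrative only -- not OpenAI's actual code.)

saved_memories = [
    "User writes TypeScript, never plain JavaScript.",
    "User deploys to Cloudflare (D1 for the database).",
]

def build_prompt(user_message: str) -> list[dict]:
    system = (
        "You are ChatGPT.\n\nWhat you know about the user:\n"
        + "\n".join(f"- {m}" for m in saved_memories)
    )
    return [
        {"role": "system", "content": system},  # memories hide in here
        {"role": "user", "content": user_message},
    ]

prompt = build_prompt("Which ORM should I use?")
assert "Cloudflare" in prompt[0]["content"]  # injected without you seeing it
```

Note what this implies: every stored bullet costs prompt tokens on every conversation, which is exactly why a hard cap exists.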

That's a fine architecture, but it has consequences:

2. The cap fills silently and ChatGPT just stops remembering

ChatGPT's Memory has a finite server-side budget. Once you hit it, the model simply stops storing new memories. There's no popup, no cleanup wizard, no "would you like to merge or delete some?" prompt. It just goes quiet and stops capturing.

For heavy users, this happens within a few weeks. After that, every "important" thing you tell ChatGPT may or may not be retained — and you have no way to tell which.

3. It's locked to ChatGPT

The minute you switch to Claude for a long-context summary, or Gemini for a Google Workspace task, or Grok for a quick X-aware question, you start completely from scratch. None of your ChatGPT memory carries over. A month of carefully training your assistant evaporates the moment you change tabs.

4. There's no structure

Memories are unstructured natural language. There are no buckets, no folders, no tags, no timestamps you can filter on. Want to see "everything ChatGPT remembers about my side project"? You can't — it's just one giant flat list.

5. Your data lives on OpenAI's servers

Whether or not that bothers you depends on your threat model. But it does mean: you can't export your memories, you can't encrypt them, you can't take them with you to another tool, and they're subject to OpenAI's retention and training policies.

None of this is unique to ChatGPT. Claude has Projects (with manual context). Gemini integrates with Google profile data. Grok ties into your X account. None of them have a shared, structured, exportable memory layer that you own.

What "good" AI memory looks like

Stripping the problem down, a working memory system for AI conversations needs five things:

  1. Externalized. Lives outside any one AI provider, so it survives provider changes, bans, and outages.
  2. Structured. Categorized as facts vs. preferences vs. decisions vs. context, with tags, buckets, and timestamps so you can query and curate.
  3. Curated. You decide what stays, what gets edited, what gets deleted. A good UI for editing memories matters more than a clever extractor.
  4. Selective. You inject only the relevant subset per conversation — not the entire library every time. Token budgets are real.
  5. Portable and private. You can export it, you can encrypt it, you can use it across platforms. Ideally it never leaves your device unless you want it to.
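Requirement 4 is worth a sketch, because it's the one people skip. A minimal selective-recall routine — filter by topic, then stop before the token budget runs out — might look like this (the data, field names, and the crude 4-characters-per-token estimate are all illustrative assumptions):

```python
# Sketch of "selective" recall: inject only memories that match the
# current topic, and stop before blowing the token budget.
# Data shape and the 4-chars-per-token heuristic are illustrative.

MEMORIES = [
    {"text": "Prefers TypeScript over JavaScript.", "tags": {"coding"}},
    {"text": "Side project uses Cloudflare D1.",    "tags": {"coding", "sideproject"}},
    {"text": "Is vegetarian.",                      "tags": {"food"}},
]

def select_memories(topic_tags: set[str], token_budget: int) -> list[str]:
    chosen, used = [], 0
    for m in MEMORIES:
        if not (m["tags"] & topic_tags):
            continue                # irrelevant to this conversation
        cost = len(m["text"]) // 4  # crude token estimate
        if used + cost > token_budget:
            break                   # budget exhausted, stop injecting
        chosen.append(m["text"])
        used += cost
    return chosen

print(select_memories({"coding"}, token_budget=50))
# ['Prefers TypeScript over JavaScript.', 'Side project uses Cloudflare D1.']
```

A coding question pulls in the two coding memories and leaves "Is vegetarian." at home — which is the whole point: relevance plus budget, not everything every time.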

Three patterns that actually work

Pattern 1: The Persistent Profile

Maintain a single document — call it profile.md — that describes who you are, what you work on, what your stack is, and what your preferences are. Paste it into the system prompt or the first user message of each new conversation.

Pros: works in every AI, takes 30 seconds. Cons: doesn't capture in-progress decisions, doesn't compose well with project-specific context, and you have to remember to paste it.
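As a concrete sketch, such a profile.md might look like this (all details invented for illustration):

```markdown
# profile.md — pasted at the top of each new conversation

## Who I am
Solo developer; day job in fintech, side project on evenings.

## Stack
- TypeScript only (no plain JavaScript suggestions, please)
- Cloudflare Workers + D1

## Preferences
- Terse answers, code first, explanations second
- No emoji
```

Keep it under a few hundred words — it rides along on every conversation, so it pays token rent every time.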

Pattern 2: Per-Project Briefs

Have a brief per project, tracked in your favorite notes app. Paste the relevant brief at the start of any project-specific chat.

Pros: scales better than one giant profile. Cons: still manual; still doesn't capture decisions made in the AI conversation itself.

Pattern 3: An external memory layer that captures and recalls automatically

This is the pattern Rethread implements: a Chrome extension that watches your AI conversations on the page, distills them into structured memories (Facts, Preferences, Decisions, Context), stores them locally in your browser, and lets you selectively inject them into any new conversation — across six different AI platforms.
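To make "structured memories" concrete, here's what a record in that style could look like — the field names and schema are an illustrative assumption, not Rethread's actual data model:

```python
# Illustrative shape for a structured, local-first memory record.
# Field names are assumptions for the sake of example, not Rethread's schema.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Memory:
    text: str
    category: str          # "fact" | "preference" | "decision" | "context"
    bucket: str            # e.g. a project name, so you can filter per-chat
    tags: list[str] = field(default_factory=list)
    created_at: str = ""

    def __post_init__(self):
        if not self.created_at:
            self.created_at = datetime.now(timezone.utc).isoformat()

m = Memory(
    text="We chose Drizzle over Prisma for D1 compatibility.",
    category="decision",
    bucket="side-project",
    tags=["orm", "database"],
)
# Unlike a flat bullet list, this can be queried:
# "everything about my side project" is just a filter on `bucket`.
```

The contrast with ChatGPT's flat list of bullets is the `category`, `bucket`, and `tags` fields: they're what make "show me every decision from my side project" a query instead of a scroll.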

Crucially, this pattern handles the case the other two don't: memories that come out of conversations, not just ones you authored ahead of time.

Then, the next time you start a fresh conversation in any of the supported AIs, you press Alt+Shift+R, pick the relevant memories, and inject them. No retyping, no provider lock-in, no opaque server-side cap.

What about MemGPT, Letta, Mem0, and friends?

There's a small ecosystem of LLM-memory projects worth knowing about: MemGPT / Letta introduce hierarchical memory architectures for agents, Mem0 ships a memory layer as a hosted API, LangChain has memory abstractions for app developers. These are great for building your own AI apps.

But they're not quite the same problem as "I want my normal ChatGPT and Claude conversations to remember me." They're libraries for building agents — not extensions for using existing AIs.

We covered this in detail in Best AI memory extensions in 2026, including a side-by-side comparison.

The bottom line

ChatGPT forgets because:

  1. The model is fundamentally stateless.
  2. The built-in Memory feature is a partial server-side patch with hard caps and no structure.
  3. It's locked to ChatGPT and lives on OpenAI's servers.

The fix isn't to fight the model architecture — it's to add a proper external memory layer you control. Local. Structured. Selective. Cross-platform. Portable.

Stop re-explaining yourself to AI.

Rethread is the privacy-first AI memory extension that fixes all five problems.

Add to Chrome — Free
