A practical guide to AI context engineering
In 2024 we all argued about prompt engineering — magical incantations to nudge GPT-3.5 into doing the right thing. By 2026 the models have gotten so good that a clever prompt rarely matters. What matters is context — the surrounding information you give the model about who you are, what you're working on, and what counts as a good answer.
This is what people now call context engineering. And done well, it's the single biggest lever you have for getting consistently great work out of an AI.
What context engineering is (and isn't)
A typical question to an AI looks something like:
"Help me refactor this function."
The model fills in the blanks: language, framework, performance vs. readability tradeoff, naming conventions, how strict to be about types, whether comments are wanted. Sometimes the guesses match what you wanted. Often they don't.
Context engineering is the discipline of not leaving those blanks for the model to guess. Instead you provide:
- Who you are — your role, experience, and constraints.
- What you're working on — the project, the stack, the goal.
- What you've already decided — past choices the model should respect.
- What "good" looks like — output format, voice, level of detail.
Done thoughtfully, that context turns the same question into a dramatically better answer, because the model spends its effort on your problem instead of guessing your situation.
Prompt engineering is a microscope. Context engineering is the slide under the microscope.
Pattern 1: The Persistent Profile
One short, high-density document that describes who you are. Loaded into every conversation, regardless of topic.
Profiles compress identity into ~150–300 words and stay roughly constant for months. The goal isn't to be exhaustive; it's to give the model enough scaffolding that it stops asking obvious questions.
A good Persistent Profile might look like:
# About me
- Senior backend engineer, 9 years experience
- Currently building a Chrome extension at a 4-person startup
- Stack: TypeScript strict mode, Vue 3 + Pinia, WXT (build), Cloudflare Workers (backend)
- Conventions: functional over OOP, no `any`, no default exports, prefer composition over inheritance
- I review my own code aggressively and want push-back, not flattery
# How I want answers
- Be direct. Skip preamble. No "Great question!"
- Code first, then 2-3 lines of why
- Cite trade-offs explicitly when relevant
- Never invent APIs — if you're not sure, say so
Where to put it: ChatGPT's Custom Instructions, a Claude Project's custom instructions, or an extension like Rethread that auto-injects it into every new conversation.
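If part of your AI usage happens through an API or a script rather than a chat UI, the same document can travel as a system message. A minimal sketch using the official `openai` npm package; the `profile.md` path and the model name are placeholders, not a prescription:

```ts
import { readFile } from "node:fs/promises";
import OpenAI from "openai";

const client = new OpenAI(); // expects OPENAI_API_KEY in the environment
const profile = await readFile("profile.md", "utf8"); // your Persistent Profile

export async function ask(question: string): Promise<string> {
  const response = await client.chat.completions.create({
    model: "gpt-4o", // placeholder; use whatever model you normally would
    messages: [
      { role: "system", content: profile }, // loaded into every conversation
      { role: "user", content: question },
    ],
  });
  return response.choices[0].message.content ?? "";
}
```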
Common mistakes: profiles that drift into a 1500-word memoir, or profiles that mix identity with project context (project context belongs in its own document; see Pattern 2).
Pattern 2: The Project Brief
A focused brief per project (or per major task). Loaded only into conversations about that project.
Profiles are about you. Briefs are about the work. A brief is the document version of the elevator pitch you'd give a senior colleague joining the project mid-sprint.
# Project: Rethread (AI memory Chrome extension)
## Goal
Capture, organize, and recall AI conversation context across 6 platforms.
## Stack
- WXT (Manifest V3 build tooling) + Vue 3 + Pinia
- IndexedDB (Dexie) for local storage
- Cloudflare Workers + D1 + R2 for cloud sync
- Web Crypto API (PBKDF2 + HKDF + AES-256-GCM) for E2EE
## Constraints
- Must remain local-first; cloud sync is opt-in
- All encryption keys derive on-device — server only sees ciphertext
- No telemetry, no analytics, no third-party trackers
## Currently working on
- Reducing first-paint latency of the side panel from 220ms to <100ms
- Polishing the bulk operations UX
Combining Profile + Brief is where things start to feel magical. The model now knows both who you are and what you're working on, and stops asking either set of questions.
Pattern 3: The Decision Log
A growing list of choices you've made — architectural, stylistic, organizational. The model should respect these by default.
One of the failure modes of long-running AI conversations is that the model "forgets" decisions. You agreed in turn 4 to use Cloudflare Workers; in turn 31 it suggests AWS Lambda. Annoying.
A Decision Log fixes this by making decisions explicit and persistent:
# Decisions
## Backend
- D1 over Postgres (cost + latency for our scale)
- R2 over S3 (free egress fits our usage)
- Workers over Lambda (cold starts + DX)
## Frontend
- Vue 3 over React (smaller bundle for an extension)
- Pinia over Redux (idiomatic for Vue 3)
## Testing
- Vitest for unit tests, Playwright for e2e
- We do NOT enforce coverage thresholds — explicit decision
Treat this list like a rolling commitment: when something changes, update the log. When the AI "forgets," paste the relevant section back in.
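When the log lives in a markdown file, pulling out just the relevant slice is easy to script. A small sketch, assuming a `decisions.md` shaped like the example above (the file name and section names are illustrative):

```ts
import { readFile } from "node:fs/promises";

// Return one "## Section" of a markdown decision log (e.g. "Backend"),
// so you can paste just that slice back into the conversation.
async function decisionsFor(section: string): Promise<string> {
  const log = await readFile("decisions.md", "utf8");
  const sections = log.split(/^(?=## )/m); // split at each "## " heading
  const hit = sections.find((s) => s.startsWith(`## ${section}`));
  return hit ? hit.trim() : "";
}

console.log(await decisionsFor("Backend"));
```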
Pattern 4: Just-In-Time Recall
Pull in only the memories relevant to this conversation, at the moment you start it. Skip the rest.
Once your Profile + Brief + Decision Log have grown beyond a handful of paragraphs, the strategy of "paste everything every time" stops working. Context windows are large but not infinite, and irrelevant context is actively harmful: it dilutes attention and confuses the model.
The fix is to recall just-in-time: at the moment you start a conversation, decide which memories are likely to matter, and inject only those.
This is exactly what Rethread does for you. Press Alt+Shift+R in any AI conversation, search/filter your library, pick the relevant memories, watch a live token estimate, and inject the right subset.
A worked example: you're starting a fresh ChatGPT conversation about a Cloudflare Workers performance bug. From your library of 480 memories, the relevant ones are:
- Your role and stack (Profile)
- This project's brief (Brief)
- Decisions about the Workers / D1 architecture (Decision Log)
- The current performance budget for first-paint (Context)
- One or two past conversations where you debugged a similar issue (Quote)
The 470+ other memories — about your side project, your blog draft, last month's Postgres migration — are excluded. The AI sees only what matters, and you keep your tokens for the actual question and answer.
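Mechanically, that selection step is small enough to sketch. The version below scores memories by keyword overlap with the question and packs the highest-scoring ones under a token budget. The `Memory` shape, the characters-divided-by-4 token estimate, and the scoring are illustrative stand-ins, not how Rethread actually ranks things:

```ts
interface Memory {
  title: string;
  body: string;
  tags: string[]; // e.g. ["profile"], ["rethread", "decisions"]
}

// Rough token estimate: roughly 4 characters per token for English text.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Score a memory by how many of the question's meaningful words appear in it.
function relevance(memory: Memory, question: string): number {
  const words = question.toLowerCase().split(/\W+/).filter((w) => w.length > 3);
  const haystack = [memory.title, memory.body, memory.tags.join(" ")]
    .join(" ")
    .toLowerCase();
  return words.filter((w) => haystack.includes(w)).length;
}

// Pick the most relevant memories that fit under a token budget.
export function recall(memories: Memory[], question: string, budget = 2000): Memory[] {
  const ranked = memories
    .map((memory) => ({ memory, score: relevance(memory, question) }))
    .filter((entry) => entry.score > 0)
    .sort((a, b) => b.score - a.score);

  const selected: Memory[] = [];
  let used = 0;
  for (const { memory } of ranked) {
    const cost = estimateTokens(memory.body);
    if (used + cost > budget) continue; // skip anything that would blow the budget
    selected.push(memory);
    used += cost;
  }
  return selected;
}
```

A real ranking would be smarter than keyword overlap, but the shape stays the same: score, rank, pack under a budget.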
Anti-patterns to avoid
Anti-pattern: stuffing
Some people interpret "context engineering" as "paste everything you have." This is worse than no context. Models have attention budgets; long irrelevant context dilutes them and produces worse answers than a focused short prompt would.
Anti-pattern: drift
Profiles, briefs, and decisions become outdated. A profile from 2024 that mentions Python isn't useful for a 2026 TypeScript project. Plan to edit your context regularly — every couple of months minimum.
Anti-pattern: secret leakage
Whatever you put in context, the AI sees. Whatever the AI sees might be logged by the AI provider. Don't put credentials, customer PII, internal financials, or unreleased product names into context unless you've explicitly thought through that risk and your provider's data handling.
Putting it all together
The idealized stack:
- Persistent Profile — small, stable, always loaded.
- Project Briefs — one per major project.
- Decision Logs — append-only, structured, project-scoped.
- Just-In-Time Recall — pull conversation-specific memories per session.
Layers 1–3 you can do today with notes apps and copy-paste. Layer 4 is harder to do manually because it requires a searchable, structured store of past decisions and exchanges, which is exactly the gap a tool like Rethread fills.
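If you would rather roll your own, here is a rough sketch of what "searchable and structured" can mean in practice, using Dexie (the IndexedDB wrapper mentioned in the example brief). The table name, fields, and indexes are illustrative, not Rethread's actual schema:

```ts
import Dexie, { type Table } from "dexie";

interface Memory {
  id?: number;
  type: "profile" | "brief" | "decision" | "context" | "quote";
  project: string;   // e.g. "rethread"
  title: string;
  body: string;
  tags: string[];
  updatedAt: number; // epoch ms, handy for spotting stale context
}

class MemoryDB extends Dexie {
  memories!: Table<Memory, number>;

  constructor() {
    super("memories");
    // "++id" auto-increments; "*tags" is a multi-entry index so tag search works.
    this.version(1).stores({
      memories: "++id, type, project, *tags, updatedAt",
    });
  }
}

export const db = new MemoryDB();

// Example query: every memory stored for one project, ordered by last update.
export const forProject = (project: string) =>
  db.memories.where("project").equals(project).sortBy("updatedAt");
```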
Run all four patterns automatically.
Rethread captures, organizes, and recalls context across ChatGPT, Claude, Gemini, Grok, Perplexity, and DeepSeek — without you copying anything.
Add to Chrome — Free