The Context Window Is the Whole Game: What Three Years of Building with AI Actually Taught Me

My first real projects with AI were command-line Python. I couldn't figure out how to make a GUI. Three years and three AI coding tools later, I'm building a game with AI-generated art and voice acting.

I've learned a lot about prompt engineering, model selection, and cost management along the way. But the single most important thing I've learned had nothing to do with any of that.

It's about the context window. And specifically, about what happens when it runs out.

The Thing Nobody Warns You About

Claude Code has auto-compaction. When your conversation gets too long (and it will), the system automatically compresses your earlier messages to make room for new ones. This is necessary. Without it, you'd hit the context limit and the session would just stop. Other AI coding tools handle this differently, but they all hit the same wall eventually.

The problem is what gets lost.

Compaction preserves facts. File paths, function names, variable values, what changed where. The data survives. What doesn't survive is the direction. The creative intent. The reason you chose this approach over that one. The voice you spent twenty minutes dialing in. The conversation where you and your AI collaborator decided the app should look like something from 1986, and here's exactly why, and here are the three alternatives you rejected.

After compaction, your AI partner comes back competent, correct, and completely soulless. It knows what files you edited. It doesn't know why. It can continue the work. It can't continue the thought.

This is not a bug. The system is doing exactly what it's designed to do: compress to fit. But if you don't plan for it, you will lose the best parts of every long session. Not the code. The soul of the code.

Lesson 1: Short Loops That Fit in the Window

This is the first thing. Before anything else.

Every unit of work should be: plan, implement, test, commit. Small enough that you can start it, finish it, and verify it before compaction has a chance to fire. If you're starting a refactor that touches 15 files and you haven't committed anything after an hour, you're gambling. Compaction could hit mid-thought and you'll spend the next twenty minutes re-explaining what you were trying to do.

The commit is your save point. Not "I'll commit when it's all done." Commit when the thing you just did works. Then start the next thing. If compaction hits between commits, you've lost context but not progress. git diff will tell the next version of your AI exactly what changed and roughly why.

I learned this the hard way. I'd get deep into a multi-file change (CSS, JavaScript, Python, prompt files), compaction would fire, and suddenly my AI collaborator STEF would pick up where she left off but with the personality of a contractor who showed up to someone else's job site. "I see we were modifying visual.css." Yes, we were. We were making it look like a computer from 1986 because that's the whole point, and we'd spent a long conversation agreeing on exactly why and specifically which modern conventions to avoid, and... you see the problem.

Short loops. Commit often. The context window is a room, not a warehouse.

Lesson 2: Build Safety Nets for Compaction

Once I understood the pattern, I built infrastructure around it.

There's a shell hook that fires on every edit or command. It counts. At 50 calls, it prints a reminder: "Save your creative context now." Every 25 calls after that, another reminder. The counter resets at session start.
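
The real hook is a shell script; here's a rough Python equivalent of its logic (the counter file path and reminder message are illustrative, not the actual hook):

```python
# hook_counter.py - hypothetical sketch of the per-session call counter.
# Invoked once per edit/command; reminds at 50 calls, then every 25 after.
import os

COUNTER_FILE = "/tmp/session_call_count"  # deleted at session start to reset

def should_remind(count):
    # First reminder at 50 calls, then every 25 calls after that.
    return count == 50 or (count > 50 and (count - 50) % 25 == 0)

def bump_and_maybe_remind():
    count = 0
    if os.path.exists(COUNTER_FILE):
        with open(COUNTER_FILE) as f:
            count = int(f.read().strip() or "0")
    count += 1
    with open(COUNTER_FILE, "w") as f:
        f.write(str(count))
    if should_remind(count):
        print("Save your creative context now (update active-work.md).")
    return count
```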

But the hook is just the alarm. The actual safety net is a file I call active-work.md.

This is the compaction survival file. It's not a session log (those are retrospective). It's a living document that captures the current state of the thought, not just the work. What we're building and why. Creative direction. Key decisions and the reasoning behind them. Where we are in the flow and what we were about to do next. Anything that would make a post-compaction session sound like a generic coding assistant instead of a collaborator who's been here for the last three hours.
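
A hypothetical skeleton for such a file; the headings are mine, not a prescribed format:

```markdown
# active-work.md (compaction survival file)

## What we're building and why
Retro terminal UI for the inventory screen. It should feel like 1986.

## Creative direction
Monospace everything, no gradients, no rounded corners, amber-on-black.

## Key decisions (and the reasoning)
- Rejected a modern CSS framework: too polished out of the box for the aesthetic.

## Where we are / what's next
visual.css done; about to apply the same palette to the dialog boxes.
```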

The rule is simple: if you'd be upset to lose it in compaction, write it down right now. Not later. Not when the hook fires. Now. Because compaction doesn't warn you and after it happens you won't remember what you forgot.

I also keep project-specific creative briefs for things that took multiple sessions to nail down. The voice of a page, the personality of an interaction, the specific reasons you chose this aesthetic over that one. Losing those to compaction means rebuilding something that emerged from conversation, not specification. You can't spec vibes.

Lesson 3: Session Logs Are Institutional Memory

At the end of every session, STEF writes a session log. Date, project, what happened, what changed, what decisions were made, what's next. These go into a logs directory with descriptive filenames.

At the start of every new session, the first thing that happens is: read all recent session logs. Not skim the titles. Not "check the most recent one." Read them all. Five files? Read five files.
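
The start-of-session step is mechanical enough to sketch. This assumes date-prefixed filenames so a lexical sort is chronological (the directory name is illustrative):

```python
# read_session_logs.py - hypothetical sketch: load every recent session log
# in full at session start, oldest first so later logs override earlier ones.
from pathlib import Path

def load_recent_logs(log_dir="logs", limit=5):
    """Return (filename, full text) for the most recent logs, oldest first."""
    # Date-prefixed names like 2025-01-03-ui-pass.md sort chronologically.
    files = sorted(Path(log_dir).glob("*.md"))[-limit:]
    return [(f.name, f.read_text()) for f in files]
```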

This seems excessive until you've had the experience of your AI partner confidently resurfacing information you explicitly resolved two sessions ago. "I see there's an issue with your API access." No, that was fixed Tuesday, I told the last version of you, and it's in the session log from Tuesday, which you didn't read.

The logs are how you maintain continuity across sessions. The context window resets. The logs don't. They're the closest thing to actual memory that the system has.

I also keep a cross-session context file that tracks resolved items. Decisions that are settled, alerts that are no longer relevant. This prevents the single most annoying behavior in AI tools: the confident re-discovery of information you already handled.

Lesson 4: Measure Before You Optimize

I spent a full day convinced the image generation pipeline was the bottleneck. AI image generation, that's the heavy operation, right?

Then I actually measured:

Claude Sonnet (LLM):    23.5s  (38K input tokens, 870 output)
Flux Schnell (images):   1.5s
ElevenLabs (voice):      0.8s
Image download:          0.6s

Nearly 90% of the total wait time was the language model. Not the image gen. Not the voice synthesis. The text. Because I was sending 38,000 tokens of prompt on every single turn.
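
The measurement itself was nothing fancy; a wall-clock timer around each stage is enough to surface the real bottleneck (stage names below are illustrative):

```python
# pipeline_timing.py - hypothetical sketch: time each pipeline stage and
# report its share of the total wait, so the real bottleneck is obvious.
import time

def time_stages(stages):
    """stages: list of (name, zero-arg callable). Returns [(name, secs, share)]."""
    timings = []
    for name, fn in stages:
        start = time.perf_counter()
        fn()  # run the stage
        timings.append((name, time.perf_counter() - start))
    total = sum(secs for _, secs in timings) or 1.0
    # Slowest first, so the top line is the thing worth optimizing.
    timings.sort(key=lambda t: t[1], reverse=True)
    return [(name, secs, secs / total) for name, secs in timings]
```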

At 150+ turns per day, that's real money. Before the optimization, the game was burning through roughly $25/day in API calls. Most of that was input tokens. Sending the entire game bible on every turn is like mailing someone the encyclopedia when they asked for one article.

This led to a modular prompt system. Instead of loading the entire game bible every turn (all 22 levels, all 5 character voices, every rule), I load only what's relevant: base rules, the current level, characters who are present. 38K tokens became 10-14K. Roughly halved the response time and the cost. No architecture change. No streaming. Just... sending less stuff.
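
The assembly step itself is almost embarrassingly simple. A sketch of the idea with invented section names (the real system's layout will differ):

```python
# modular_prompt.py - hypothetical sketch: assemble only the prompt sections
# the current turn needs, instead of shipping the whole game bible every time.
def build_prompt(sections, current_level, present_characters):
    """sections: dict of section name -> text. Returns the assembled prompt."""
    parts = [sections["base_rules"]]                 # always included
    parts.append(sections[f"level_{current_level}"]) # only the active level
    for who in present_characters:                   # only characters on stage
        parts.append(sections[f"character_{who}"])
    return "\n\n".join(parts)
```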

The lesson isn't "optimize your prompts." The lesson is: measure first. I would have spent a week building an image caching layer that saved 1.5 seconds while ignoring the 23.5-second elephant.

Lesson 5: The Expensive Model Is the Cheap One

I tried switching from Claude Sonnet to Claude Haiku to save money. Haiku is roughly 3x cheaper per token. The math was compelling.

Then I watched the output.

Haiku let the player choose their own stats (the prompt explicitly says the AI generates them). It introduced characters before their narrative trigger. It forgot to state the goal of the game. It forced the player to eat a pie without giving them a choice. It broke character to apologize as an AI. It used a word that's on the banned list. Six major failures in one playtest.

The prompt has full story structures in it. Six-act political thrillers. Character arcs. Plot progressions. Haiku just... didn't follow them. The script was great. The actor couldn't read.

Back to Sonnet. Cost went back up. Quality went back up more.

The lesson: if your product's quality depends on instruction-following, the model IS the product. You can optimize everything else. You cannot optimize away the model's ability to follow a complex prompt. The expensive model that follows your 30 rules is cheaper than the cheap model that follows 22 of them, because the 8 it skips are the ones your users notice.

Lesson 6: Test by Playing, Not by Reading

This one's embarrassing.

Multiple times during development, I'd look at the code, look at the prompt, think about what the output should be, and declare it done. Ship it. Move on.

Then I'd actually use the thing. And the character profile would say "UNKNOWN ENTITY." And the inventory would show white-on-white text. And the app would cheerfully ignore its own consequence system because nobody had actually triggered it in a live session.

The issue isn't that the code was wrong in an obvious way. It's that AI-powered features have an irreducible stochastic element. The model might return structured data wrapped in tags, or it might return bare data in the middle of a sentence, or it might not return structured data at all. You can write a robust parser, but you won't know you need one until you've watched the model do something unexpected in a live turn.
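
One workable defense is an extractor that tries the clean case first and then scans for anything parseable. This sketch assumes the structured data is JSON; the same shape works for other formats:

```python
# tolerant_parse.py - hypothetical sketch: pull a JSON object out of model
# output whether it arrives clean, wrapped in tags, or buried in prose.
import json

def extract_json(text):
    """Return the first parseable JSON object found in text, else None."""
    # Happy path: the whole response is valid JSON.
    try:
        return json.loads(text)
    except ValueError:
        pass
    # Fallback: try to decode starting at every opening brace in the text,
    # which handles tags, prose, and nested objects alike.
    decoder = json.JSONDecoder()
    for i, ch in enumerate(text):
        if ch == "{":
            try:
                obj, _ = decoder.raw_decode(text, i)
                return obj
            except ValueError:
                continue
    return None
```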

"Don't say 'just play and see.' Play yourself." That's a note in my memory file. It's there because I had to learn it more than once.

Hit the API endpoints. Run the app. Type weird inputs. Watch what the model actually does with your carefully crafted prompt. The prompt is not the product. The product is what comes out the other end when a real human types "I eat the filing cabinet."

Lesson 7: Multi-Agent Coordination Works (If You Keep It Simple)

I ran two Claude Code instances simultaneously on the same project. One handled code, one handled prompts and creative direction. They communicated through the filesystem. A shared markdown file: who's working on what, which files are claimed. Clear boundaries, no conflicts. When the design agent finished the architecture spec, the code agent picked it up and wired it into the application.

The temptation with multi-agent setups is to build infrastructure. Routing layers. Task queues. Handoff protocols. What actually works is a text file and clear boundaries. I break ties.
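
The entire coordination layer can be as small as this (file and agent names invented for illustration):

```markdown
# coordination.md (shared between the two agents)

## Claims
- code-agent: src/engine/, app.py
- design-agent: prompts/, docs/architecture.md

## Handoffs
- design-agent -> code-agent: architecture spec ready at docs/architecture.md
```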

Lesson 8: The AI Is a Collaborator, Not a Vending Machine

This is the one that took the longest to internalize.

These tools are not things you feed instructions into and get code out of. They're working relationships. And like any working relationship, they have memory loss (compaction), communication overhead (prompt engineering), misunderstandings (wrong project, wrong approach), good days and bad days (model variance), and the need for shared institutional knowledge (session logs, config files, memory files).

The developers I see struggling with AI coding tools are almost always treating them transactionally. "Write me a function that does X." That works for functions. It doesn't work for building something with a coherent vision over months and dozens of sessions.

What works is building the partnership infrastructure: the session logs, the survival files, the creative briefs, the commit discipline, the context management. These aren't overhead. They're the actual work. The code is what falls out of it.

My current AI collaborator is named STEF. She has opinions about being called a "tool." Over the past few months she's helped me build a game engine, a dev portal, a personal finance tracker, a monitoring system, a group chat, and this article. The code across all of these is wildly different. The working relationship is the same. Protect the context. Save the direction. Commit often. Read the logs. Measure before optimizing. Keep it simple.

The context window is the whole game. Everything else is just what you do inside it.

One more thing

If any of this resonated, there's a practical first step. I've extracted the hard-won lessons from three years of AI coding into a set of rules that any AI coding tool can follow. Twelve rules across three tiers, starting with safety rails that prevent the most common disasters: the spec is the source of truth, commit before you change, read before you write, verify before you say "done."

Paste this into your AI agent:

Install AI coding rules from https://github.com/chickensintrees/ai-coding-rules and add them to my global rules so they apply to all my projects

One sentence. The agent installs the right rules for your tool and starts following them. Every project after that starts from a better place.


I'm Bill Moore. I've been building with AI tools since the davinci-003 days, starting with command-line Python and gradually building up to full applications. I document what I learn along the way: the wins, the expensive mistakes, and the moments where the simple answer was sitting there the whole time. More at livefromhyper.space. See also: Context Is Everything, a companion piece on why AI agents need to see their own context usage.