On May 23 my computer froze. Not slow, frozen. The system drive had filled to the very top and Windows had nothing left to work with. I forced a reboot, and the first thing I did when it came back was start dumping temp files off C: by hand, just to claw back enough room to open a terminal and figure out what happened.

I expected to find log files. What I found was 48 gigabytes of Chrome profiles.

the cleanup that turned into an architecture review

Once I had breathing room I went looking for the cause. The usual suspects gave up a few gigabytes. Moving Claude’s data folder off the system drive and removing a stack of apps I had not opened since January reclaimed about 4 GB. Useful, not the story.

The story was one directory full of Chrome user-data profiles. I had been running browser automation for a couple of months at that point, and every agent session that needed a browser got its own profile, minted fresh at session start. The isolation rationale was correct. You do not want cookies or cached auth from one session bleeding into another. Give each session a clean slate.

The part I had not thought through was what happens to the clean slate after the session ends.

Chrome writes a lot into a fresh profile: cache, IndexedDB, shader cache, extension state, history. Even a short automation run pushes a profile to a few hundred megabytes within minutes. Months of browser-touching sessions, each leaving its profile behind, add up faster than it sounds. Nothing had failed in any obvious way along the road to 48 GB. The automation worked; every session completed. The profiles just kept piling up, because I had never specified what “fresh” was supposed to mean once the session was done with it. The directory grew silently until it didn’t, and the day it didn’t, the whole machine stopped.

isolation without lifecycle is just slow-motion hoarding

The per-session model was not the mistake. Browser session isolation is a real requirement. Cookies, tokens, and cached state from one agent run should not contaminate another. That principle was sound.

The missing piece was lifecycle. A profile that gets minted and never reaped accumulates forever. I had designed isolation but not reclamation, and those are different properties. You can have perfectly isolated sessions and still be hoarding at the infrastructure level. Isolation tells you two runs cannot see each other. It says nothing about whether the disk those runs sit on ever gets its space back.

I changed two things: how the profiles were allocated, and where they lived.

The strategy is a fixed pool. Instead of minting a new profile per session, the system keeps a fixed set of 25 numbered slots. A browser caller leases a slot, does its work, and releases it. On release, the pool manager resets the slot, clearing cookies and cache while optionally preserving the auth that is meant to persist. No new directories ever get minted; the surface stays at 25 profiles no matter how many sessions run.
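A minimal sketch of that lease/reset cycle in Python follows. It assumes a single process owns the pool and that the browser has fully exited before a slot is released, since Chrome on Windows holds locks on profile files while it runs. `ProfilePool`, `PoolExhausted`, the `slot-NN` directory names, and the `DISPOSABLE` list are all hypothetical. Which profile paths count as disposable, and which hold auth worth keeping, depends on your Chrome version and is something to verify rather than copy.

```python
import shutil
import threading
from pathlib import Path

# Hypothetical: profile paths treated as disposable on release. Verify these
# against your Chrome version; anything NOT listed survives the reset.
DISPOSABLE = [
    "Default/Cache",
    "Default/Code Cache",
    "Default/IndexedDB",
    "Default/Network/Cookies",
    "Default/History",
    "ShaderCache",
]


class PoolExhausted(RuntimeError):
    """Raised when every slot is already leased."""


class ProfilePool:
    """A fixed set of numbered Chrome user-data dirs, leased and reset in place."""

    def __init__(self, root, size=25, disposable=DISPOSABLE):
        self.root = Path(root)
        self.disposable = list(disposable)
        self._lock = threading.Lock()
        # Every slot is created up front; the pool never mints directories later.
        self._slots = [self.root / f"slot-{i:02d}" for i in range(size)]
        for slot in self._slots:
            slot.mkdir(parents=True, exist_ok=True)
        self._free = list(self._slots)
        self._leased = set()

    def lease(self):
        with self._lock:
            if not self._free:
                raise PoolExhausted(f"all {len(self._slots)} profiles are leased")
            slot = self._free.pop(0)
            self._leased.add(slot)
            return slot

    def release(self, slot):
        slot = Path(slot)
        with self._lock:
            if slot not in self._leased:
                raise ValueError(f"{slot} is not a leased slot of this pool")
            self._leased.remove(slot)
        # Reset outside the lock so a slow delete does not block other leases.
        self._reset(slot)
        with self._lock:
            self._free.append(slot)

    def _reset(self, slot):
        # Delete only the disposable paths; everything else in the profile stays.
        for rel in self.disposable:
            target = slot / rel
            if target.is_dir():
                shutil.rmtree(target)
            elif target.exists():
                target.unlink()
```

A caller would pass the leased path to Chrome as `--user-data-dir` and call `release` in a `finally` block. Handing slots out in FIFO order is an arbitrary choice. Any order works, because every released slot has already been reset.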

The pool also moved off the system drive entirely. It lives on a separate data drive now, so even if my lifecycle math is ever wrong again, it cannot take the operating system down with it. What froze the machine was not just the size; it was the size landing on the one drive Windows needs to breathe.
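One way to make that separation hard to undo by accident is a startup check that refuses to run if the pool root shares a volume with the OS. The Python sketch below treats a matching `os.stat(...).st_dev` as a proxy for “same drive.” That holds for ordinary Windows drive letters and POSIX mounts, but junctions, mounted folders, and network shares can blur the picture. `system_root`, `same_volume`, and `check_pool_root` are hypothetical names.

```python
import os
from pathlib import Path


def system_root():
    # Windows reports the OS volume in %SystemDrive% (normally "C:"); POSIX uses "/".
    if os.name == "nt":
        return Path(os.environ.get("SystemDrive", "C:") + "\\")
    return Path("/")


def same_volume(a, b):
    # st_dev identifies the device (volume) a path lives on.
    return os.stat(a).st_dev == os.stat(b).st_dev


def check_pool_root(pool_root, system=None):
    """Create pool_root if needed, and refuse it if it sits on the system volume."""
    pool_root = Path(pool_root)
    pool_root.mkdir(parents=True, exist_ok=True)
    system = Path(system) if system is not None else system_root()
    if same_volume(pool_root, system):
        raise RuntimeError(f"{pool_root} is on the system volume ({system}); use a data drive")
    return pool_root
```

The optional `system` parameter exists mainly so the check can be exercised without involving the real OS drive.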

I picked 25 because I usually have four or five browser sessions going at once and wanted real headroom. If that number ever turns out to be wrong, I can resize the pool. The point was to cap the surface, not to nail the exact number.

After the migration and the cleanup I reclaimed 52 GB. The system drive went from frozen-solid to comfortable.

the constraint you hit is never the constraint you find

I filed two follow-up tickets before I closed the laptop. One to give browser callers a clean lease API so they never reason about slot numbers directly. One to apply the same lifecycle audit to other state directories that might be quietly accumulating the same way. Both came out of a disk-full alert. Neither was on any list that morning.
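The first ticket could be as small as a context manager that wraps lease and release. Callers then only ever see a profile path, and the release cannot be skipped when a session throws. This Python sketch assumes a pool object exposing hypothetical `lease()` and `release(slot)` methods. `browser_profile` is an illustrative name, not an existing API.

```python
from contextlib import contextmanager


@contextmanager
def browser_profile(pool):
    """Lease a profile for the duration of a with-block and always release it.

    `pool` is any object with lease() -> slot and release(slot) (hypothetical).
    Callers get a string path to hand to Chrome as --user-data-dir,
    never a slot number.
    """
    slot = pool.lease()
    try:
        yield str(slot)
    finally:
        # Runs on normal exit and on exceptions, so a crashed session still
        # hands its slot back for reset.
        pool.release(slot)
```

A caller would write `with browser_profile(pool) as profile_dir:` and pass `profile_dir` to its browser launcher, with no slot bookkeeping of its own.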

This is how this kind of bug usually surfaces. You do not audit your tooling on a schedule. You audit it when a resource limit forces you to look: disk full, memory high, something frozen. And the constraint you hit is almost never the constraint you find. I went in looking for log files. I came out having found a three-month-old assumption about what “fresh” means.

The alert said “disk full.” That was accurate. The actual bug was that I had built isolation without lifecycle, and those are different things. The disk found it for me. I would not have.

If your AI tooling needs a cleanup day every few weeks, you probably do not have a cleanup problem. You have a lifecycle problem. And if a piece of that tooling can fill the one drive your OS runs on, that is not a disk problem either. That is a blast-radius problem, and the fix is to move it somewhere it can only hurt itself.