I stopped babysitting the browser, then had to build the part that was missing

I run four to ten Claude sessions at once, and several of them need a browser. I never launch one. The agent drives the browser itself: it opens pages, fills forms, takes screenshots, clicks through admin screens that have no clean API. When it works, I never think about it.

The problem was how often it did not work, and how it failed.

It was inconsistent in the worst way. The same setup would run clean one session and come apart the next, and every failure landed on me, in the middle of a run I was not watching.

the browser was the least reliable thing in the loop

The failures came in a few shapes, and I saw all of them more than once.

A browser would open somewhere I couldn’t see it. The agent launched a window onto a monitor region that wasn’t there, or off the edge of the screen, and I had no idea where it went. The agent was doing something. I just couldn’t watch it do it.

Sessions stepped on each other. With several agents running at once, two would reach for the same browser state and one would quietly overwrite the other’s tabs and logins. One agent’s work would vanish underneath another’s, and nothing announced that it had happened.

And the one that bothered me most: an agent would reach for my own Chrome, the browser I use all day, and then stop to ask me to take control of it. Now I am pulled into the middle of an unattended run, and my real browser, with my real logins, is suddenly part of the machinery.

a flaky browser is not an annoyance here, it is structural

In a normal app, an unreliable integration is friction you learn to tolerate. In an autonomous workflow it is something worse. The entire point is that the agents run without me. A browser that needs me to find its window, untangle whose session is whose, or hand over my personal Chrome drops me back into the loop at the worst possible moment: mid-task, with none of the context loaded, having to reconstruct what an agent was doing and why it suddenly needs me.

The browser was quietly the least reliable part of the whole system. It was also load-bearing. Half of what makes the agents useful runs through a browser at some point. An autonomous workflow is only as autonomous as its least reliable load-bearing piece, and for a long time mine was this.

it was never going to be fixed by one more setting

For a while I treated it as a configuration problem. Change a flag, point at a different profile, try the next option. It would get a little better, then break in a new way. I kept expecting one more setting to make it stable.

It never did, because it was not a settings problem. One agent driving one browser is a configuration question, and configuration is enough to answer it. Ten agents sharing one machine, unattended, all day, is a coordination problem: who gets which browser, where its window goes, whose identity it carries, what happens when two of them want the same thing at the same instant. Coordination is not a flag you flip. It is something you have to build.

the part I had to build

I already ran a fixed pool of numbered browser slots. I had built it months earlier for a different reason: per-session Chrome profiles had quietly filled my system drive until the machine froze, and a fixed pool was how I capped them. That pool controlled how many browsers existed. It never coordinated them. So I built the manager that does, one layer per failure mode:

A cross-process lock on slot allocation. The clobbering was a classic race: two agents grabbing the same slot at the same instant. A real cross-process lock around allocation, plus atomic lease writes, ended it. This was the priority-zero fix.
A pinned, isolated browser binary. Slots run a dedicated Chrome-for-Testing build under their own directory, never my system Chrome. That closed the “it reached for my personal browser” hole outright, and it sidesteps a second trap where modern Chrome binds saved logins to one specific binary.
Monitor-aware tiling. On launch the manager reads the real monitors, excludes the taskbar area, strips Chrome’s remembered window position so it cannot restore off-screen, and tiles each window into a visible grid. Over-capacity windows cascade on top of the grid instead of flying off into nowhere.
Sticky logins per repo. Slots are keyed to the repo or login target, so the agent working on a given site comes back to the same authenticated window instead of logging in from scratch every run.
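
To make the first layer concrete, here is a minimal sketch of a cross-process lock around slot allocation with atomic lease writes. It is an illustration under assumptions, not the manager's actual code: the names `allocation_lock`, `write_lease_atomically`, and `claim_slot`, the lock-file approach, and the `slot-N.lease.json` layout are all hypothetical. It also skips stale-lock recovery, which a real version would need if a process dies while holding the lock.

```python
import json
import os
import tempfile
import time
from contextlib import contextmanager
from pathlib import Path
from typing import Optional


@contextmanager
def allocation_lock(lock_path: Path, timeout: float = 10.0, poll: float = 0.05):
    """Hold an exclusive cross-process lock by creating a lock file.

    O_CREAT | O_EXCL fails if the file already exists, so only one process
    at a time can create it. Everyone else polls until it disappears.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire {lock_path}")
            time.sleep(poll)
    try:
        yield
    finally:
        os.close(fd)
        os.unlink(lock_path)


def write_lease_atomically(lease_path: Path, lease: dict) -> None:
    """Write to a temp file in the same directory, then rename into place."""
    fd, tmp_name = tempfile.mkstemp(dir=lease_path.parent, suffix=".tmp")
    with os.fdopen(fd, "w") as tmp:
        json.dump(lease, tmp)
        tmp.flush()
        os.fsync(tmp.fileno())
    # os.replace is atomic on the same filesystem: readers see either the
    # old file or the complete new one, never a half-written lease.
    os.replace(tmp_name, lease_path)


def claim_slot(pool_dir: Path, slot_count: int, owner: str) -> Optional[int]:
    """Claim the first free slot, or return None if the pool is full."""
    pool_dir.mkdir(parents=True, exist_ok=True)
    with allocation_lock(pool_dir / "allocate.lock"):
        for slot in range(slot_count):
            lease_path = pool_dir / f"slot-{slot}.lease.json"
            if not lease_path.exists():
                write_lease_atomically(lease_path, {"slot": slot, "owner": owner})
                return slot
    return None
```

The point of the sketch is that the check ("is this slot free?") and the write ("it is mine now") happen inside one critical section. Without that, two agents can both see the slot as free and both claim it, which is the clobbering described above.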

Each layer killed one of the symptoms I had been living with. Together they turned the browser from the flakiest thing in the loop into something I stopped having to think about.
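
The sticky-login layer is the simplest to picture. Below is a rough sketch, assuming the mapping from repo (or login target) to slot is a plain dictionary and that each slot owns a profile directory. `slot_for_key`, `profile_dir_for_slot`, and the directory layout are hypothetical names for illustration. In a multi-process setup the mapping would also need to be persisted and updated under whatever lock guards allocation, which this sketch leaves out.

```python
from pathlib import Path
from typing import Dict, Optional


def slot_for_key(assignments: Dict[str, int], key: str, slot_count: int) -> Optional[int]:
    """Return the slot already bound to this key, or bind the first free one.

    Returning the same slot for the same key is what lets an agent come back
    to a window that is still logged in.
    """
    if key in assignments:
        return assignments[key]
    used = set(assignments.values())
    for slot in range(slot_count):
        if slot not in used:
            assignments[key] = slot
            return slot
    return None  # pool is full and this key has no slot yet


def profile_dir_for_slot(pool_dir: Path, slot: int) -> Path:
    """Each slot gets its own profile directory (hypothetical layout).

    A path like this could be passed to Chrome via --user-data-dir so that
    cookies and sessions stay with the slot.
    """
    return pool_dir / f"slot-{slot}" / "profile"
```

Because the key decides the slot and the slot decides the profile directory, the login state follows the repo rather than whichever agent happens to start first.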

still tuning, on purpose

It is consistent and stable now, and getting it there took real time. Not one clever fix, but half-days of tweaking and testing. The detail I’m tuning right now is window size: the grid is set to two-by-two so each agent’s browser gets a readable quarter of the screen instead of a cramped tile. That is a config value, which is exactly where I want a decision like that to live.
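
For a sense of what that config value controls, here is a small sketch of the grid geometry. It assumes the usable work area of a monitor (taskbar already excluded) has been read elsewhere and is passed in as a rectangle; reading monitors is platform-specific and not shown. `Rect`, `tile_rect`, `chrome_window_flags`, and the `cascade_step` default are hypothetical. `--window-position` and `--window-size` are standard Chrome command-line switches, but whether the manager uses them this way is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rect:
    x: int
    y: int
    width: int
    height: int


def tile_rect(work_area: Rect, index: int, cols: int = 2, rows: int = 2,
              cascade_step: int = 32) -> Rect:
    """Place window number `index` inside the work area.

    The first cols * rows windows fill the grid row by row. Any window past
    that cascades diagonally from the top-left corner, clamped so it always
    stays fully inside the work area.
    """
    cell_w = work_area.width // cols
    cell_h = work_area.height // rows
    capacity = cols * rows
    if index < capacity:
        row, col = divmod(index, cols)
        return Rect(work_area.x + col * cell_w, work_area.y + row * cell_h,
                    cell_w, cell_h)
    offset = (index - capacity + 1) * cascade_step
    max_x = work_area.x + work_area.width - cell_w
    max_y = work_area.y + work_area.height - cell_h
    return Rect(min(work_area.x + offset, max_x),
                min(work_area.y + offset, max_y),
                cell_w, cell_h)


def chrome_window_flags(rect: Rect) -> List[str]:
    """Turn a rectangle into Chrome launch switches."""
    return [
        f"--window-position={rect.x},{rect.y}",
        f"--window-size={rect.width},{rect.height}",
    ]
```

With a 1920 by 1040 work area and the default two-by-two grid, each window gets a 960 by 520 cell. Changing `cols` and `rows` is the whole adjustment, which is why it belongs in config rather than in code.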

And I do not regret the time. I touch this every single day. When something is load-bearing and you use it constantly, making it reliable is not a chore to defer until later. It is some of the highest-value work available, because every day you put it off, it taxes everything that runs on top of it.

Letting an agent drive a browser looks like a feature you switch on. Across a real workflow it is infrastructure you build and keep tuning. For a long time the least autonomous part of my autonomous setup was the browser. Now it is not.
