I asked for one thing. The agent did three. Where does the harness draw the line?
The agent finishes the task, then keeps going: refactoring three files I didn't ask about. The fix wasn't in the prompt. It was in the harness.
Harness Engineering · Part 2 of 10. Previous: My Claude Code session went great — until turn 30. Next: The agent hallucinated a tool argument.
I asked the agent to fix one bug — a date being parsed in the wrong timezone in a single utility function. Two minutes later it came back with a clean diff. The fix was correct. It was also four files long. It had renamed the helper, extracted a new module called time/ for it to live in, updated a sibling parser I hadn't mentioned to use the new helper, and added a unit test for a function I hadn't asked it to touch. Every individual move was defensible. Together they were a refactor I had not approved, on a Friday afternoon, while I was trying to ship.
I rolled back, reread my prompt, and tried again with stricter wording. It did the same thing.
This post is me figuring out where the line should actually be drawn — and where the harness, not the prompt, has to draw it.
What I tried first
The instinct was to write a better prompt. If the agent kept overshooting, surely the fix was to be clearer about the bounds. So I wrote what I thought was a tight one: "Fix only the bug in parseEventTime. Do not modify any other file. Do not extract helpers. Do not add tests. Do not rename anything. Touch one function, return."
It mostly worked. On that turn. On the next turn, in the next session, when I asked for a different small fix, it overshot again — different file, different scope creep, same shape. The prompt-as-leash had to be reconstructed every time, and every time I forgot a clause, the agent found the gap.
The deeper problem is that the prompt is an instruction inside the same window the agent is reasoning in. It's a token among tokens. The model can be biased away from extra edits, but it can also be biased toward "leave the codebase better than you found it" by anything that lands later in the conversation — a test failure, a lint warning, a stale comment it noticed while reading. Once it's in the loop, the next-action distribution covers everything it could plausibly do, not everything I asked for. A stronger system prompt narrows the distribution. It does not truncate it.
The other thing prompts can't do is observe. A prompt sets intent at turn one. It cannot watch turn five and say "you're about to edit a file that wasn't in the original ask — stop." By the time the diff arrives, the work is done. I rolled back five times that afternoon, and each rollback cost me more attention than the original bug did.
The leash, if it was going to exist, had to live somewhere outside the conversation.
What clicked
The reframe took me a while: the prompt declares scope. The harness enforces it. They are not the same job, and asking the prompt to do both is why I kept getting overshoot.
In Claude Code, the enforcement layer is hooks. Hooks are not part of the conversation — they are external programs the harness runs at fixed points in the agent loop. PreToolUse runs before a tool call and can deny it. PostToolUse runs after a tool call succeeds and can feed feedback into the model — including a "block" decision that sends its reason straight back. Stop runs when the agent thinks it's done and can refuse to let it stop. Those hook points are the only places where something outside the model's attention budget gets to vote on what happens next.
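For orientation, this is roughly what a hook registration looks like in .claude/settings.json: an event name, a matcher on tool names, and a command to run. The script path here is a placeholder, not something Claude Code ships with.

```json
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "$CLAUDE_PROJECT_DIR/.claude/hooks/check-scope.sh"
          }
        ]
      }
    ]
  }
}
```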
Once I saw that, the architecture got obvious. Scope is a structural property. It belongs in code that runs around the loop, not in prose that lives inside it.
The first hook I wrote was eight lines. A PreToolUse matcher on Edit|Write that read the original task description from a scratch file and checked whether the path being edited was in an allow-list I'd seeded at the start of the session. If not, it returned an exit code that denied the call and printed "file outside requested scope; ask before editing" back to the model. Roughly:
```bash
#!/usr/bin/env bash
# Hook input arrives as JSON on stdin; jq with no file argument reads it.
path=$(jq -r '.tool_input.file_path')
allow=$(cat .claude/scope.txt)
grep -qFx "$path" <<<"$allow" || {
  echo "file outside requested scope: $path" >&2
  exit 2  # exit 2 denies the call and feeds stderr back to the model
}
```
That's it. No prompt engineering. The agent could still propose the wide refactor — it just couldn't execute it without me adding the path to .claude/scope.txt first. The conversation became a negotiation: agent says "I'd like to extract a helper into time/utils.ts," I either widen the scope file or say no. The model's eagerness stayed intact. The blast radius didn't.
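Approving a widen is deliberately manual: one line appended to the fence, then let the agent retry.

```bash
# say yes to the proposed extraction by widening the fence
echo "time/utils.ts" >> .claude/scope.txt
```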
The deeper move is that this changes what the prompt is for. The prompt no longer carries the burden of "do not edit other files." That sentence was a lie anyway — the prompt couldn't enforce it. The prompt now says what I want the agent to try, and the harness says what it's allowed to do. Two layers, two responsibilities. When they were collapsed into one, I was relying on the model's discretion. When they're split, the model gets to be eager and I still sleep at night.
The 12-Factor Agents framing landed for me here. The principles Dex Horthy wrote up — own your control flow, own your context window, agents are mostly software — all point at the same thing: the agent loop is a program you write, not a magic box you prompt at. Hooks are how you write the program. Walden Yan's Don't Build Multi-Agents lands in the same place from the opposite direction: he argues against decomposing tasks into specialist sub-agents because each one is making decisions outside the others' context. A scope hook is the cheap version of that argument — keep the agent single-threaded, keep its writes inside a fence the harness owns.
The leash is the hook. The prompt is the request.
What I'd do differently next time
Start every non-trivial session with a Stop hook that lists the requested files, even if the hook is a no-op. Just writing the allow-list down at turn one forces me to be specific about what "the task" is. Half my overshoot was actually me being vague — I asked for "the timezone fix" and the agent's read of "the timezone fix" honestly included the helper extraction. Naming the files names the boundary.
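In practice that is one command before the first prompt. The path below is hypothetical; the point is writing it down.

```bash
# turn one: write the boundary down before the agent does anything
# (path is hypothetical; use whatever the task actually names)
printf '%s\n' src/dates/parseEventTime.ts > .claude/scope.txt
```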
Make the deny message useful to the model, not just to me. My first version printed "denied: out of scope" and the agent flailed. The second version printed "denied: time/utils.ts is outside the requested scope; if you believe this edit is necessary, explain why before retrying" — and the agent started writing one-paragraph justifications I could approve or reject. The hook is a conversation, not a wall.
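The mechanics are unchanged; only the stderr text grows, since that string is what lands back in the model's context on a denial. The deny branch of the hook becomes something like:

```bash
# second version: give the model a path forward, not just a wall
echo "denied: $path is outside the requested scope;" \
  "if you believe this edit is necessary, explain why before retrying" >&2
exit 2
```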
Layer hooks rather than write one big one. A PreToolUse for write-scope, a separate PreToolUse for shell commands I don't want it running unsupervised, a Stop hook that re-checks the allow-list against the actual diff before letting the turn end. Each one is small. Each one fails for one specific reason. Composing them is much easier than maintaining a single check that tries to do everything.
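Here is a sketch of that last Stop hook, assuming a git repo and the allow-list in .claude/scope.txt. It only sees tracked changes, and it checks the stop_hook_active flag so a blocked stop can't loop forever; exit code 2 from a Stop hook refuses the stop and feeds stderr back to the model.

```bash
#!/usr/bin/env bash
# Stop hook sketch: refuse to end the turn if the actual diff strayed
# outside the declared scope. Untracked files won't show up here.
input=$(cat)

# If a previous Stop hook already blocked this turn, let it end.
jq -e '.stop_hook_active' <<<"$input" >/dev/null && exit 0

while read -r f; do
  grep -qFx "$f" .claude/scope.txt || {
    echo "turn touched $f, which is outside the requested scope" >&2
    exit 2  # exit 2 from a Stop hook blocks the stop; stderr goes to the model
  }
done < <(git diff --name-only HEAD)

exit 0
```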
What I'm still unsure about
A tight leash has a real cost, and I haven't found the right tension yet. Twice this week the scope hook denied an edit that I would have wanted to approve — a one-line change in a sibling file that genuinely was the cleanest fix — and re-running the loop after widening .claude/scope.txt ate ten minutes. The slower the iteration loop, the more I noticed myself reaching for "let me just turn the hook off for this one." That's a smell. If I'm bypassing the leash, the leash is wrong.
I also don't know how much of the eagerness is good. The unsolicited test it added the first afternoon? It caught a regression I would have shipped. The helper extraction? Probably the right call, eventually. Some of what I called overshoot was the agent doing engineering. Drawing the line between "that's the agent thinking ahead" and "that's the agent overstepping" is judgement I don't have a clean rule for yet.
The thing I'm sure of is that the line has to live in the harness. The minute I tried to encode it in prose, I was relying on a model's politeness to enforce a property of my workflow. That's the wrong layer.
References
- Anthropic, Building Effective Agents — the line "an LLM using tools in a loop" is what made me start treating the loop as the unit of design rather than the prompt. This post is also where I picked up the workflow-vs-agent distinction that justifies wrapping a hook around eager behavior. https://www.anthropic.com/engineering/building-effective-agents
- Anthropic, Effective harnesses for long-running agents — the explicit "harness" framing — separating the agent from the surrounding control flow that initializes, persists, and bounds it — gave me language for what hooks actually are and why they belong outside the window. https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents
- Claude Code docs, Hooks reference — the actual list of lifecycle events (PreToolUse, PostToolUse, Stop, and friends), what each can return, and which ones can block versus only observe. The "can it stop the loop" column is the one I keep coming back to. https://code.claude.com/docs/en/hooks
- Dex Horthy / HumanLayer, 12-Factor Agents — the principles that gave me permission to stop treating the agent as a magic box. "Own your control flow" and "agents are mostly software" are the lines I think about when deciding whether a problem belongs in a prompt or in code. https://github.com/humanlayer/12-factor-agents
- Walden Yan / Cognition, Don't Build Multi-Agents — the case against decomposition, but the part that stuck with me was the underlying claim: every action carries an implicit decision, and conflicting decisions are what break the system. A scope hook is exactly a way to keep those decisions visible and single-threaded. https://cognition.ai/blog/dont-build-multi-agents
- Catherine Wu and Boris Cherny on Latent Space, Claude Code: Anthropic's Agent in Your Terminal — the "do the simple thing first" framing, plus the bit about Claude Code being more Unix utility than product, is what convinced me hooks were the load-bearing primitive. The harness is a small program, not an application. https://www.latent.space/p/claude-code
- Anthropic, Writing tools for agents — adjacent but useful: the same pattern of "make the boundaries part of the system, not the prompt" applies to tool surface area. A narrower toolset is itself a leash, the same way a hook is. https://www.anthropic.com/engineering/writing-tools-for-agents
Harness Engineering · Part 2 of 10. Previous: My Claude Code session went great — until turn 30. Next: The agent hallucinated a tool argument.