← /writing/learning · 2026-05-06 · 9 min read

Why does Claude Code feel different from Cursor when they use the same model?

Same model, different feeling. Same task, different brain. The harness isn't infrastructure. It's the product.

Harness Engineering · Part 8 of 10. Previous: My agent remembers me across sessions. Next: I built my own agent harness over a weekend.

I had a small refactor on the bench last Tuesday — rename a hook, update three call sites, leave the test file alone because it was already asserting against the new name. I ran it twice. Same prompt, same model (Sonnet 4.5), same repo. Once in Claude Code in my terminal. Once in Cursor's agent panel.

They didn't feel the same.

Claude Code paused before the first edit and asked which of two files I meant — there was a useUser in hooks/ and one in legacy/auth/. Cursor just made the edit on the legacy one and showed me a clean diff. Both were technically reasonable. But the second one had already touched my code before I could course-correct, and the first one had me typing hooks/ and feeling, irrationally, like the agent had done something thoughtful.

Same engine. Different brain.

What I tried first

My working assumption — and I think most people's — was that the model is the agent. Pick Sonnet 4.5, get Sonnet 4.5 behavior. The wrapper is plumbing: a UI for the chat, a way to read files, a diff viewer. Surface decoration on top of the same underlying intelligence.

So the first time I ran the same task in both, I expected near-identical traces. Maybe slightly different prose. Maybe one would phrase a clarifying question I'd skip past. But the shape of the interaction — when it paused, when it acted, what it showed me — would be the same, because the thing doing the work is the same.

That assumption survived about ninety seconds.

The differences weren't in the diff. They were in the cadence. When did the agent stop and ask. How much of its own reasoning did it surface before acting. Where did the tool output go — inline in the conversation, in a side panel, collapsed by default, expanded by default. How were file references rendered — as clickable chips, as plain prose, as fenced blocks. Did it pre-summarize what it was about to do, or just do it. When it finished, did it stay quiet or did it give me a recap.

None of those are model decisions. The model can't decide where its output renders. The model doesn't pick whether to ask for confirmation before the first write — that's a harness policy. The model doesn't choose whether tool calls collapse or expand by default. The model doesn't render a diff.

I had been giving the model credit for choices the harness was making.

What clicked

The reframe is that the model is the engine. The harness is the product.

I'd been treating "harness" as a synonym for "the chat UI." It's not. The harness is the entire developer experience the model is wrapped in: when it asks for permission, what it shows, what it hides, how state is presented, what the defaults are, where the friction lives, where the friction has been deliberately removed. The model emits tokens. Everything between those tokens and my eyes is a product decision.

Three concrete UX choices, in order of how surprised I was when I noticed them:

When does it ask for confirmation. Claude Code's default is: read freely, but pause before the first write in a session and confirm the file. It gets out of the way once you've trusted it. Cursor's agent default is: act, then show the diff with accept/reject affordances baked into the editor gutter. Both are coherent positions. The first treats writes as load-bearing and asks you to opt in. The second treats writes as cheap because the editor's diff view makes them trivially reversible. Neither is "more correct." But the resulting feel is wildly different — Claude Code feels like a junior engineer asking before committing to a path; Cursor feels like a senior pair-programmer who just goes. Same model. Opposite vibes.
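
The two write policies can be made concrete with a toy harness dispatcher. This is a minimal sketch, not either tool's actual implementation — `ToolCall`, `Session`, and `handle_write` are all hypothetical names:

```python
from dataclasses import dataclass, field

@dataclass
class ToolCall:
    name: str
    args: dict

@dataclass
class Session:
    trusted_writes: bool = False
    log: list = field(default_factory=list)

def handle_write(call: ToolCall, session: Session,
                 confirm_first_write: bool = True,
                 ask_user=lambda path: True) -> dict:
    """Apply a write_file tool call under one of two harness policies.

    confirm_first_write=True  -> Claude Code-style: pause before the
                                 session's first write, then trust.
    confirm_first_write=False -> Cursor-style: act immediately and rely
                                 on the diff view for accept/reject.
    """
    if confirm_first_write and not session.trusted_writes:
        if not ask_user(call.args["path"]):
            return {"status": "declined"}
        # Get out of the way once the user has trusted the session.
        session.trusted_writes = True
    session.log.append(("write", call.args["path"]))
    return {"status": "applied"}
```

Note that the model never sees this branch: both policies send it the same tool result on success. The entire "asks first" vs "just goes" feel lives in one harness-side conditional.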

How tool output surfaces. When Claude Code runs a test, the output streams inline, full-width, in the same scroll as the conversation. When something fails, the failure is right there in the transcript, and the model's next message references it directly. Cursor pushes tool runs into a side surface that you can drill into; the conversation stays clean. The Cursor approach is calmer to read; the Claude Code approach makes the model's reasoning loop more legible. I noticed the model in Claude Code "noticing" failures faster — but the model isn't noticing anything different. The harness is putting the failure inside its attention budget instead of behind a tab.
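
The "attention budget" point is mechanical: inline output becomes part of the transcript the model conditions on next turn; side-panel output does not. A sketch of that routing decision, with invented names (`place_tool_output` is not a real API in either tool):

```python
def place_tool_output(messages: list, output: str,
                      inline: bool = True, max_chars: int = 4000):
    """Route a tool run's output relative to the model's context.

    inline=True  -> Claude Code-style: the output lands in the transcript,
                    so the model's next turn conditions on the failure.
    inline=False -> Cursor-style: only a one-line summary enters the
                    transcript; the full output lives in a side surface.
    """
    if inline:
        # Truncate rather than flood the context window.
        messages.append({"role": "tool", "content": output[:max_chars]})
        side_panel = None
    else:
        summary = output.splitlines()[0] if output else "(no output)"
        messages.append({"role": "tool", "content": f"[ran tool: {summary}]"})
        side_panel = output  # rendered outside the conversation scroll
    return messages, side_panel
```

Same model, but only in the first branch does the traceback exist inside the context it reasons over — which is why it appears to "notice" failures faster.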

How files are presented. Cursor's editor-native model means the agent's understanding of "the current file" is anchored to the tab you have open. The harness gives the model strong implicit context about what you're looking at. Claude Code has no such anchor — every file reference is explicit, by path. That sounds like a downside, and sometimes it is. But it also means Claude Code's behavior is dramatically more reproducible: the same prompt with the same files produces the same trace, because there's no hidden "what's in your editor right now" channel. Cursor optimizes for the typical case; Claude Code optimizes for legibility. Same model, different information topology.
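
The reproducibility claim follows directly from whether editor state is an input. A deliberately tiny sketch (names are illustrative, not either product's internals):

```python
def resolve_file_context(named_paths: list, open_tab: str = None,
                         editor_anchored: bool = False) -> list:
    """Build the file context for a turn.

    editor_anchored=True  -> Cursor-style: the currently open tab is an
                             implicit, hidden input to the model.
    editor_anchored=False -> Claude Code-style: only explicitly named
                             paths exist, so the same prompt always
                             yields the same context.
    """
    context = list(named_paths)
    if editor_anchored and open_tab and open_tab not in context:
        context.append(open_tab)  # the hidden "what you're looking at" channel
    return context
```

With `editor_anchored=False`, the output is a pure function of the prompt; with it on, two runs of the same prompt can diverge purely because a different tab was focused.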

That's the through-line. The harness decides what the model sees, when the model gets to act, and what I see of what the model did. Three small policy decisions — confirmation defaults, tool-output placement, file context resolution — and the same model produces what feels like two different products.

The trap I'd been falling into is the trap most teams fall into when they ship an agent: they ship the engine and call it a product. They expose the model's full output, every tool call, every intermediate thought. They surface raw reasoning. They make the user the harness. And then they wonder why it feels like a tech demo instead of a tool. The model is not the product. The harness is. Or as the Claude Code team put it on Latent Space — Claude Code is "a Unix utility, not a chat product." That's a UX position, not a model position.

What I'd do differently next time

When I'm building my own agent UX — even something tiny, like a CLI loop for a one-off task — I now copy the small behaviors first, before the big ones.

Big behaviors are model choice, system prompts, tool selection. Those get most of the attention because they're easy to reason about. Small behaviors are: when does it ask, when does it stay quiet, what does it show by default, what does it hide, where does tool output land, how is state surfaced. Those are the ones that determine whether the thing feels like a tool or a tech demo.

So my new starter checklist for any agent UX I'm building:

  • What's the policy on the first write of a session? Ask, or act?
  • Where does tool output go — inside the model's attention budget, or outside it?
  • What does the user see when nothing's happening — a spinner, a plan, a summary, silence?
  • When the agent finishes, does it recap, or does it stay quiet?

Each of those answers is small. Each one bends the perceived intelligence of the agent more than I would have guessed before I ran the same task in two harnesses.
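
One way to force those answers into the open is to write them down as an explicit config before building anything. A sketch under my own assumptions — the field names and the two example presets are my caricatures, not real settings from either tool:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class HarnessPolicy:
    """The checklist's small behaviors, made explicit and non-optional."""
    confirm_first_write: bool  # ask before the session's first write, or act?
    tool_output_inline: bool   # inside the model's attention budget, or a side panel?
    idle_display: str          # "spinner" | "plan" | "summary" | "silence"
    recap_on_finish: bool      # recap when done, or stay quiet?

# Two coherent philosophies on the same engine:
claude_code_like = HarnessPolicy(
    confirm_first_write=True, tool_output_inline=True,
    idle_display="plan", recap_on_finish=False)
cursor_like = HarnessPolicy(
    confirm_first_write=False, tool_output_inline=False,
    idle_display="spinner", recap_on_finish=True)
```

The point of `frozen=True` is the discipline: every default is a decision you made on purpose, not one you inherited from whatever the chat loop happened to do.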

What I'm still unsure about

The open question for me is whether harness UX commoditizes or stays durable.

The optimistic case: the moves I described — when to ask, where output lands, how files resolve — are imitable. Cursor can ship a confirm-before-first-write toggle in an afternoon. Claude Code can ship implicit-current-file context behind a flag. If the differentiation is just policy defaults, the gap closes fast.

The pessimistic case: harness UX is a thousand small decisions that compound into a feel, and feel is genuinely hard to copy because it's not one decision, it's the coherence across all of them. Claude Code's terseness, its file-by-path discipline, its in-stream tool output, and its first-write confirmation aren't four independent choices — they're one philosophy, repeated. That's harder to clone than any single feature.

I don't know which case wins. My instinct is the second, but I'm watching closely.

References

  • Latent Space — Claude Code: Anthropic's Agent in Your Terminal (Boris Cherny, Cat Wu) — the "Unix utility, not a chat product" framing came from this episode, and it's the cleanest articulation I've found of harness UX as an explicit product position rather than a side effect of building a chat box. https://www.latent.space/p/claude-code
  • Gergely Orosz, How Claude Code is Built — the team-process view: 5 releases per engineer per day, 10+ prototypes per feature, harness changes shipped at the cadence of model changes. Convinced me that "the harness is the product" isn't just rhetoric at Anthropic — it's how they actually staff and ship. https://newsletter.pragmaticengineer.com/p/how-claude-code-is-built
  • Cursor, Best Practices for Coding with Agents — the explicit Cursor-side view of agentic UX. Reading this side-by-side with Anthropic's best-practices doc is the cleanest way I know to see two different harness philosophies for the same underlying capability. https://cursor.com/blog/agent-best-practices
  • Cursor, New Coding Model and Agent Interface (Cursor 2.0) — the moment Cursor made "the harness is the product" explicit by shipping a new interface centered on agents instead of files. The "agent-centric, not file-centric" reframe is the kind of small-but-everything UX shift this post is about. https://cursor.com/changelog/2-0
  • Anthropic, Claude Code Best Practices — the canonical writeup of how Anthropic thinks the harness should be driven. Re-read with the "harness is the product" lens, the doc reads less like a tutorial and more like a UX manifesto: every recommendation is a policy decision about when the agent acts, asks, or stays quiet. https://www.anthropic.com/engineering/claude-code-best-practices
  • Sean Grove (OpenAI), The New Code: Specifications as the Fundamental Unit of AI-Era Programming — Grove's "the best coder is the best communicator" line is upstream of this post: if specs are the artifact, then the harness's job is to make spec-writing fast and legible. The harness becomes the IDE for English. [link TBD — Shreyansh to confirm]
  • Andrej Karpathy, Software 3.0 / AI Startup School 2025 — Karpathy's "your lever over the system is what you put in the context window" is the deepest version of the harness-as-product thesis. The harness is the lever. https://www.latent.space/p/s3

