2026-05-06 · 8 min read

Cognition says no multi-agent. Anthropic ships sub-agents in Claude Code. Who's right?

Two of the sharpest teams in the field disagree out loud. After trying both, here's what I'd actually code on Monday morning.

Harness Engineering · Part 4 of 10. Previous: The agent hallucinated a tool argument. Next: Skill, slash command, sub-agent, or just a better prompt?

Last week I sat down to do something I'd been putting off: split a four-file refactor across three parallel agents and watch them race. I'd just re-read Walden Yan's Don't Build Multi-Agents on Cognition's blog. The day I finished it, I scrolled past Anthropic's How we built our multi-agent research system in the same browser tab. Two of the most opinionated harness teams in the world had published essays a day apart that read like rebuttals. And here I was, about to type Task(...) three times in a row, with no idea which side I was on. So I tried it.

What I tried first

The naive setup. I had a feature that touched four files: a route handler, a service, a migration, and a test. I wrote three sub-agent dispatches — one to update the route + service, one to write the migration, one to update the tests — and fired them off in parallel from a single Claude Code turn. The plan was that the parent agent would collect three diffs, stitch them together, and I'd review one PR.
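For concreteness, here is roughly the shape of that turn, paraphrased as Python. The real Task tool is invoked by the model inside Claude Code, not from a script, so dispatch below is a hypothetical stand-in and the three prompts are reconstructions, not my exact wording.

```python
# Hypothetical stand-in for Claude Code's Task tool: each child gets a
# fresh context window and returns a summary string. Illustrative only;
# the real tool is invoked by the model, not called from Python.
from concurrent.futures import ThreadPoolExecutor

def dispatch(name: str, prompt: str) -> str:
    # A real sub-agent would make the edits and report back; this stub
    # returns a placeholder so the sketch runs.
    return f"<confident summary of the {name} diff>"

jobs = {
    "route+service": "Update the route handler and service for the new field.",
    "migration": "Write the migration adding the new column.",
    "tests": "Update the tests to cover the new field.",
}

with ThreadPoolExecutor() as pool:
    futures = {name: pool.submit(dispatch, name, p) for name, p in jobs.items()}
    diffs = {name: f.result() for name, f in futures.items()}

# Note what is missing: there is no channel between the children. Each
# one makes its naming and typing choices blind to the other two.
```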

It looked good for about ninety seconds. Each child agent came back with a confident summary. Then I read the diffs.

The route agent had renamed a function. The test agent didn't know about the rename — its sibling's edits weren't in its context — so it had patched the old function name in the test file and now the suite wouldn't even import. The migration agent had picked a column name (created_at) that disagreed with what the service agent had typed (createdAt) two seconds earlier in a different sub-window. The parent agent, presented with three plausible diffs, did the worst thing it could have done: it merged them, declared victory, and moved on. I had to back the whole thing out.

What broke wasn't the model. The model was fine in each sub-window. What broke was the seam. Three children, each making implicit decisions — naming, error shape, return type, which library version to pin — that the others couldn't see. Cognition has a name for this: dispersed decision-making with insufficient shared context. I'd just lived it.

What clicked

Reading the two essays back to back after that experience, the apparent contradiction collapsed.

Cognition's argument, distilled: when sub-agents have to make implicit decisions that the other sub-agents need to be consistent with — code style, naming, edge-case handling, which abstraction to reach for — sharing only summaries between them is not enough. Summaries are lossy, and the lost bits are exactly the implicit choices that make a codebase hang together. So multi-agent systems get fragile. The fragility isn't a failure of intelligence; it's the limited bandwidth of inter-agent communication. Their conclusion is to default to single-threaded linear agents, with very narrow exceptions for sub-agents that only answer well-defined questions and never write code.

Anthropic's argument, distilled: their Research feature decomposes a query, fans it out to parallel sub-agents that each search a slice of the web, and stitches their findings into a synthesis. They report a multi-agent system with Opus 4 as lead and Sonnet 4 as workers beating single-agent Opus 4 by 90.2% on their internal research eval. They're explicit that this only works for breadth-first tasks, that token usage is roughly 15× a chat, and that they had to invest enormous effort in the lead agent's prompt to make sub-agent dispatches behave.

Two essays. Two opposite-sounding conclusions. But look at what each is actually doing.

Cognition is building Devin. Devin writes code. Code-writing is an exercise in maintaining a single coherent set of implicit choices across files: naming, errors, types, conventions. Two children writing code in parallel must converge on those choices or the thing won't compile. The shared mutable state — the code itself — is the work product, and it has to be globally consistent.

Anthropic's Research feature is doing fan-out reads. Each sub-agent goes off, searches a slice of the web, and returns findings. There is no shared mutable state between sub-agents. Sub-agent A finding "the CFO is Jane Doe" doesn't conflict with sub-agent B finding "the CTO is John Smith." The lead agent stitches facts, not decisions. And summaries — the lossy thing Cognition is worried about — are perfectly adequate for facts.

The reframe is short: the question isn't multi-agent yes or no. It's: does this work parallelize without shared mutable state?

That heuristic falls out cleanly. Code-write tasks have shared mutable state — the codebase — and dense implicit-decision surface. Don't fan them out. Read-only research, grep across many files, exploration of a large directory, "find every place we call this deprecated API," "summarize what each of these twelve services does" — no shared mutable state, no implicit decisions to coordinate, just facts to gather. Fan them out and reap the wall-clock speedup.
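If it helps to see the checklist as code, here is a minimal sketch. The two predicates are judgment calls you answer by hand; the function just fixes the order of the questions.

```python
from dataclasses import dataclass

@dataclass
class Work:
    # Do the subtasks write to the same artifact (codebase, schema, doc)?
    writes_shared_artifact: bool
    # Must the subtasks agree on implicit choices (naming, types, error shapes)?
    subtasks_must_share_decisions: bool

def should_fan_out(w: Work) -> bool:
    if w.writes_shared_artifact:
        return False  # Cognition's case: choices must converge in one thread
    if w.subtasks_must_share_decisions:
        return False  # summaries are too lossy to carry implicit decisions
    return True       # Anthropic's case: fan out the reads, stitch facts on top

# The four-file refactor: one codebase, dense implicit-decision surface.
assert not should_fan_out(Work(True, True))
# "Summarize what each of twelve services does": independent reads.
assert should_fan_out(Work(False, False))
```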

Anthropic's own Building Effective Agents essay (the earlier, calmer one) frames the same distinction in different language: workflows vs. agents, and within workflows, a specific pattern called parallelization with two flavors — sectioning (split a task into independent subtasks) and voting (run the same task multiple times for consensus). The fan-out-research case is sectioning over a read-only world. The code-write case is, in this framing, just not a sectioning candidate at all — the subtasks aren't independent, they only look independent.
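In code, the two flavors are a few lines apart. This is a sketch of the pattern, not Anthropic's implementation; ask stands in for a single model call.

```python
from collections import Counter

def ask(prompt: str) -> str:
    # Hypothetical stand-in for a single model call.
    return "stub answer"

def sectioning(sections: list[str]) -> list[str]:
    # Independent subtasks run separately; the caller stitches the results.
    return [ask(f"Analyze this section on its own:\n{s}") for s in sections]

def voting(prompt: str, n: int = 3) -> str:
    # The same task run n times; the majority answer wins.
    answers = [ask(prompt) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]
```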

Both teams are right for the work they're doing. Cognition is making a strong claim about a domain (autonomous code generation) where the heuristic correctly says no fan-out. Anthropic is making a strong claim about a domain (open-ended research) where the heuristic correctly says yes fan-out. Neither is the universal answer. The heuristic is.

What I'd do differently next time

Default single-loop. The boring, default, almost-always-right setup is one main agent, one context window, one mental thread. Most of my work is code-write work, and code-write work has shared mutable state by construction.

Sub-agent dispatch only for read-only fan-outs. Concretely, the cases where I now reach for Task (a sketch of the first one follows the list):

  • "Find every place in this repo that calls oldThing and tell me what shape the call sites have." Three sub-agents, each grepping a third of the directory, return findings. No edits.
  • "Read these five competing libraries' READMEs and summarize the API surface for each." Five sub-agents, one library each. No edits.
  • "We have twelve services. For each one, find the entry point and tell me what it does." Fan out, summarize back.
  • "Write me a one-paragraph briefing on what changed in the last fifty commits." One sub-agent doing the read so the noise never enters the parent context.

The pattern in all four: the sub-agents return information, the parent makes the decisions, and no two sub-agents touch the same artifact. This is the same context-isolation move I rely on for context budget — fresh window, summary back — just used for parallelism instead of pruning.

The thing I will not do again: fan out a code-write task. Even when it feels like the files are independent. They aren't.

What I'm still unsure about

The honest open question is whether this collapses or sharpens as models get better. Cognition's last paragraph hints that single-threaded agents will eventually become good enough at expressing their reasoning that multi-agent collaboration becomes cheap — and at that point fan-out for code-write becomes viable. I can imagine that. I can also imagine the opposite: as models get smarter, the implicit-decision surface gets richer (better taste, more conventions, deeper consistency), and the bandwidth required to share it across agents goes up faster than the agents' communication ability.

I also don't know where the line is for partially shared state. What about a code-write task where the sub-agents share a strict typed interface up front — does that pin enough of the implicit decisions that the rest can fan out? I have a hunch that's the next frontier, and I haven't tested it.
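To make the hunch concrete, this is roughly what pinning might look like: the parent writes the contract once and pastes it verbatim into every child's prompt. All the names are invented and, again, I haven't tested this.

```python
# Untested. The parent fixes the contract once, before any fan-out, and
# every child gets it verbatim. All names here are invented for the sketch.
INTERFACE_CONTRACT = '''
Conform to exactly this interface; do not rename or add anything:

class OrderStatus(str, Enum):
    PENDING = "pending"
    SHIPPED = "shipped"

def set_order_status(order_id: int, status: OrderStatus) -> None:
    """Raises OrderNotFoundError (exactly this name) if order_id is unknown."""
'''

route_prompt = "Implement the route against this contract:\n" + INTERFACE_CONTRACT
test_prompt = "Write tests against this contract:\n" + INTERFACE_CONTRACT
```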

For now, the heuristic — does this parallelize without shared mutable state — is the thing I'd code on Monday. Both essays make sense once you see they're answering different questions.

References

  • Walden Yan / Cognition, Don't Build Multi-Agents — the essay that made the shared-context argument crisp for me. The line about decisions being "too dispersed" is the one I now recite before reaching for Task on a code-write job. https://cognition.ai/blog/dont-build-multi-agents
  • Anthropic, How we built our multi-agent research system — the counterweight: when fan-out does work, and what it cost them to make it work (15× tokens, lots of prompt iteration on the lead agent). The 90.2% number is real but the framing matters more — this is breadth-first reads, not code writes. https://www.anthropic.com/engineering/multi-agent-research-system
  • Anthropic, Building Effective Agents — older, calmer, and the source of the workflows-vs-agents distinction plus the parallelization (sectioning / voting) pattern that, once I had the vocabulary, let me see Cognition and the Research-system post as answering different questions. https://www.anthropic.com/research/building-effective-agents
  • Claude Code docs, Create custom subagents — the official mechanics of how sub-agents actually work in the harness I use every day: separate context window, scoped tool access, summary back to parent. Re-reading this with the fan-out heuristic in mind made the design choices land. https://code.claude.com/docs/en/sub-agents
  • Cognition, Multi-Agents: What's Actually Working — the follow-up where Cognition softens slightly: there are multi-agent patterns they now ship, but writes stay single-threaded. Worth reading right after Don't Build Multi-Agents — it's the same authors holding the same line while admitting where fan-out earned its keep. https://cognition.ai/blog/multi-agents-working
  • Sholto Douglas & Trenton Bricken on the Dwarkesh Podcast, How Does Claude 4 Think? — the bit about how much of current capability comes from scaffolding rather than the raw model is what made me stop treating "multi-agent vs single-agent" as a model question and start treating it as a harness question. https://www.dwarkesh.com/p/sholto-trenton-2

Harness Engineering · Part 4 of 10. Previous: The agent hallucinated a tool argument. Next: Skill, slash command, sub-agent, or just a better prompt?
