← /writing/learning · 2026-05-06 · 8 min read

Skill, slash command, sub-agent, or just a better prompt?

Four ways to codify the same recurring thing. I picked wrong, twice. Here's the decision tree I wish I'd had.

Harness Engineering · Part 5 of 10. Previous: Cognition vs. Anthropic on multi-agents. Next: What does it mean for the harness to "trust" the model?


Twice this month I caught myself pasting the same six-step refactor prompt into a fresh Claude Code session. "Read the module entrypoint. List exported functions. For each, find call sites. Propose a rename. Update tests. Run the suite." By the third paste I knew it should be codified. But codified as what? I made it a sub-agent. Then I rewrote it as a slash command. Then I rewrote it again as a skill. Each time I picked, I was answering the wrong question. This post is me figuring out what question each of the four primitives actually answers — so I stop picking by vibes.

What I tried first

The first instinct was: this is a multi-step procedure, sub-agents are the "advanced" thing, make it a sub-agent. So I dropped the prompt into a custom agent definition under .claude/agents/ and called it from the main session.

It worked, sort of. The agent got the rename right. But every invocation spun up a fresh context window, which meant it didn't know the project's conventions, didn't have CLAUDE.md, didn't know the rename it had done in the previous module. Every call paid a cold-start tax: re-reading the same files, re-deriving the same conventions, occasionally re-introducing the bug a previous call had fixed. I was using context isolation as a feature when I actually needed continuity.

So I rewrote it as a slash command — a markdown file under .claude/commands/refactor-module.md. That fixed the cold-start problem. The command expanded inline into the parent session, kept access to CLAUDE.md, and ran in the same window. Better. But the slash command was still just a prompt template. The "six steps" were narrative bullets the model would skim and reorder. There was no checklist, no gate between steps three and four, no way to attach the test-runner output back to the rename plan. When the module was small the command was fine. When the module had thirty call sites and a circular import, the model would skip ahead, miss two of them, and confidently report success.

That was the second wrong call. A slash command is a reusable prompt. What I actually had was a procedure: distinct steps, each with its own expected effect, with a check between them. A procedure with files and gates is a skill. So I rewrote it a third time.

What clicked

The reframe is short: the four primitives are not a hierarchy. They answer four different questions. Once I had the question for each, the picking got easy.

Prompt — "Will I do this again?" If no, type it. A prompt is for one-off work. Anything I'd never repeat verbatim — a specific bug investigation, a one-time data migration, a question about a single file — stays a prompt. The cost of codifying is real. Codify too early and you end up maintaining a slash command for something you used twice in eight months.

Slash command — "Will I want to type this exact-ish thing repeatedly?" If yes, template it. A slash command is a reusable prompt template, parameterized with $ARGUMENTS. It expands inline into the parent context. Right home: a commit-message generator, a "summarize this PR" template, a "scaffold a new component named X" prompt. Things where the prompt is the artifact and you just want to stop retyping it. As of late 2025 Claude Code merged custom commands into skills — the file at .claude/commands/foo.md and the skill at .claude/skills/foo/SKILL.md both create /foo. Skills are a strict superset; reach for a slash-command-shaped skill when the only thing you need is the template.
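A minimal sketch of what mine looked like at this stage. The filename and six steps are from this post; the `description` frontmatter field is how I understand the command format, so check the slash-commands docs before copying:

```markdown
---
description: Rename exported functions in a module and update call sites
---

Refactor the module at $ARGUMENTS:

1. Read the module entrypoint.
2. List exported functions.
3. For each, find call sites.
4. Propose a rename.
5. Update tests.
6. Run the suite.
```

Saved as .claude/commands/refactor-module.md, this creates /refactor-module, with whatever follows the command substituted for $ARGUMENTS. Note that the steps are still narrative: nothing here forces the model to finish step 3 before starting step 4.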

Skill — "Is there a procedure here? Steps, files, gates, expected effects?" If yes, build a skill. A skill is a directory with a SKILL.md plus optional supporting files: scripts, templates, reference docs the model loads on demand. The frontmatter description is what Claude reads to decide when to load it; the body is what runs when it does. The right home for a skill is anything that has state between steps: my refactor procedure, with a pre-check that fails fast, a rename phase, a verification phase that runs the test suite, and a post-check that reads the suite output and decides whether to proceed. Skills also handle the discovery problem — Claude can pick the right skill based on the description, so I don't have to remember which slash command applies. Anthropic launched these as Agent Skills in October 2025 and made the spec an open standard in December.
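Here is roughly what the third rewrite looked like as a skill, sketched from memory. The `name` and `description` frontmatter fields are the ones I believe the spec requires; the gate wording is the point:

```markdown
---
name: refactor-module
description: >
  Rename a module's exported functions, update call sites and tests,
  and verify with the test suite. Use when asked to rename or refactor
  a module's public API.
---

## Procedure

1. Pre-check: confirm the entrypoint exists and the suite currently
   passes. If either fails, stop and report; do not proceed.
2. List exported functions and every call site of each.
3. Propose renames. Do not apply any until the full list is written out.
4. Apply renames and update tests.
5. Verification gate: run the test suite and read its output. Report
   success only if it passes; otherwise return to step 4.
```

The difference from the slash command is not the content of the steps. It is that steps 1 and 5 are written as gates with explicit failure behavior, which is exactly what the thirty-call-site module needed.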

Sub-agent — "Does this need its own context window? Either for isolation or for fan-out?" If yes, sub-agent. A sub-agent runs in a separate context with its own system prompt and tools. Right home: read-only research that would flood the parent ("find every call site of renderInvoice across this monorepo and summarize"), or messy intermediate work whose tool outputs the parent should never see ("debug this flaky test until it's deterministic"). The verbose work stays in the sub-agent's window, only the summary returns. The wrong home for a sub-agent is a recurring procedure that needs project conventions in scope — that's a skill.
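The call-site research example might look like this as an agent definition. The `name`, `description`, and `tools` frontmatter fields are my understanding of the format, and the agent name is hypothetical:

```markdown
---
name: call-site-researcher
description: >
  Read-only research agent. Finds every call site of a given symbol
  across the repo and returns a compact summary, keeping the noisy
  search output out of the parent context.
tools: Read, Grep, Glob
---

You are a read-only researcher. Given a symbol name, find every call
site, note the file and approximate line, and return a short
structured summary grouped by file. Do not edit any files.
```

Dropped into .claude/agents/, this is the isolation case done right: the grep noise stays in the sub-agent's window, and restricting tools to read-only ones means it cannot "helpfully" start the refactor itself.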

The decision tree, as I now run it:

  1. One-off? → prompt.
  2. Reusable prompt template, no procedure? → slash command (or a skill that's just a template).
  3. Procedure with steps and gates? → skill.
  4. Needs context isolation or fan-out? → sub-agent (and you can let a skill invoke one with context: fork).

The last two compose. A skill can run inside a forked sub-agent context when its procedure would otherwise pollute the parent — research-heavy skills do this. The four primitives stack; they don't compete.
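The tree is small enough to state as code. A toy sketch, nothing more; the predicate names are mine, not part of any harness API:

```python
# Walk the four questions in order; the first "yes" wins.
def pick_primitive(one_off: bool, reusable_template: bool,
                   has_procedure: bool, needs_isolation: bool) -> str:
    if one_off:
        return "prompt"
    if reusable_template and not has_procedure:
        return "slash command"
    if has_procedure:
        return "skill"
    if needs_isolation:
        return "sub-agent"
    return "prompt"  # not worth codifying yet

# The refactor procedure from this post: repeated, with steps and
# gates, and it needs project conventions in scope (no isolation).
print(pick_primitive(one_off=False, reusable_template=True,
                     has_procedure=True, needs_isolation=False))  # skill
```

Running it on my refactor task returns "skill" on the first branch that matches, which is the answer it took me three rewrites to reach by hand.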

What I'd do differently next time

Start from the question, not from "which feels powerful." The reason I picked sub-agent first is that sub-agents felt like the most sophisticated primitive, and I subconsciously equated sophistication with goodness. That is exactly backwards. The right primitive is the one that answers the question your recurring task is actually asking.

Concrete first move when I notice I'm pasting something twice: write down what would have to be true for each primitive to be the right answer. "This would be a slash command if the prompt itself is the only repeated artifact." "This would be a skill if there are gates between the steps." "This would be a sub-agent if I want the work to happen out of my main context." Whichever sentence becomes true first, that's the primitive.

Second move: if the answer is skill, start with a thin SKILL.md — three lines of frontmatter, the procedure as numbered steps, no supporting files yet. Add the script, the template, the reference doc only when the bare skill misses on a real run. The same conciseness rule that applies to CLAUDE.md applies double to skills: every line in a skill body is a recurring token cost once invoked.

What I'm still unsure about

I don't know whether the four-primitive map collapses into one as harnesses converge, or whether we end up with more. The merge of custom commands into skills is one collapse already — what was two primitives is now one with a flag. I could see "skill" and "sub-agent" merging too: a skill with context: fork is already halfway there. On the other hand, hooks, output styles, plugins, agent teams — there are clearly more primitives appearing, not fewer. I don't have a confident prediction.

I'm also unsure how transferable this map is off Claude Code. Cursor has rules. Aider has its own conventions. The shape of the question — "is this a one-off, a template, a procedure, or context-isolated work?" — feels universal. But the handles for answering it differ enough between harnesses that copy-pasting the decision tree probably won't work. That's a future post.

The thing I'm most sure of: stop picking the primitive by feel. Pick by which question your recurring task is asking.

References

  • Anthropic, Introducing Agent Skills — the launch post that named skills as a primitive and made the spec open. The framing "create a SKILL.md when you keep pasting the same instructions" is what convinced me my refactor prompt belonged in a skill rather than a slash command. https://www.anthropic.com/news/skills
  • Claude Code docs, Extend Claude with skills — the canonical reference for SKILL.md frontmatter, supporting files, and the lifecycle of skill content in context. The line that custom commands have been merged into skills was the one that finally stopped me from second-guessing slash command vs. skill. https://code.claude.com/docs/en/skills
  • Claude Code docs, Slash commands — the reference for parameterized prompt templates, including $ARGUMENTS. Reading this side-by-side with the skills page is the cleanest way to see what skills add on top. https://code.claude.com/docs/en/slash-commands
  • Claude Code docs, Create custom subagents — the official description of context isolation, tool restriction, and delegation. The bullet "preserve context by keeping exploration and implementation out of your main conversation" is the one-line definition I now use to decide whether something belongs in a sub-agent. https://code.claude.com/docs/en/sub-agents
  • Anthropic, Building Effective AI Agents — the workflow vs. agent distinction maps onto my prompt-vs-sub-agent split. Workflows are predictable templated paths; agents are flexible delegation. Knowing which one your task wants is the first version of the decision tree. https://www.anthropic.com/research/building-effective-agents
  • Sean Grove (OpenAI), The New Code — the AI Engineer talk arguing specifications are the durable artifact and code is the disposable output. Reframed how I think about what a skill is: not "automation," but a checked-in spec for a recurring task that happens to compile to model behavior. https://www.youtube.com/watch?v=8rABwKRsec4
  • garrytan/gstack — Garry Tan's open-source skills directory, a working example of skills used as the unit of organization for an entire workflow (CEO, design, eng review, ship, QA, retro). Reading other people's SKILL.md files is the fastest way to calibrate what belongs in a skill vs. a prompt. https://github.com/garrytan/gstack
  • obra/superpowers — Jesse Vincent's skills collection, with a different shape than gstack: more about engineering discipline (TDD, systematic debugging, verification before completion). Useful as a contrast — same primitive, different opinion about what's worth codifying. https://github.com/obra/superpowers

