2026-05-06 · 10 min read

Are skills the new programming primitive — or just better prompts in a folder?

The hopeful read: we just invented a new abstraction. The cynical read: we just invented imports. The honest read is in between.

Harness Engineering · Part 10 of 10. Previous: I built my own agent harness over a weekend.


Late on a Sunday I opened my ~/.claude/skills/ folder to clean it up. Twenty-something directories, each with a SKILL.md and a couple of supporting files. Some I'd written. Some I'd cloned from obra/superpowers and forgotten about. I sat there scrolling and the question I'd been avoiding for nine posts finally landed in plain text: what is this thing? Is "skill" a new programming primitive — a unit I'll still be writing in 2030 — or is it just a better filing cabinet for prompts? I closed the folder without deleting anything and started writing this post instead.

What I tried first

When skills launched I treated them as glorified prompts. The mental model was: a skill is a markdown file the model reads when triggered, the same way it reads a system prompt, just on demand instead of eagerly. So I wrote them like prompts. Long preambles. "You are an expert at X." Persona framing. Tone instructions. The procedure as a vibes-y narrative.

They half-worked. They worked enough that I kept writing them. They half-failed in a specific way: the model would load the skill, do the first step roughly right, and then drift. The second step would skip a check. The third step would invent a fourth step that wasn't in the procedure. The skill was loaded — I could see it in /context — but the model wasn't following it. It was reading it.

The other half-fail was the trigger problem. My early skill descriptions were aspirational ("use this when refactoring"), so Claude would either invoke them when I didn't want it to ("you said rename this variable, that's a refactor, loading the skill") or fail to invoke them when I did ("you said rework the module, that's not a refactor"). The frontmatter description field, I came to realize, was not for me. It was the entire interface for an automated dispatcher choosing whether to load 2,000 tokens of procedure into a finite attention budget. I'd been writing it as a label.

So I had a directory full of files that looked like skills, named like skills, and behaved like prompts.

What clicked

The reframe that pulled this out: a skill is a contract — name, trigger, procedure, expected effect. Not a prompt. Not a function. Something in between, and the in-between is the whole point.

Walk through the four parts:

Name. A skill has a stable invocation name (the directory) and that name is part of the public surface. Renaming a skill breaks any reference to it the same way renaming a function breaks its callers. This is more than a prompt template gives you — prompts don't have callers — and it's the first hint that we've crossed a line.

Trigger. The frontmatter description is a predicate the dispatcher evaluates against the current task. "Use this when the user is renaming a function across modules in a TypeScript codebase" is a predicate. "Use this for refactors" is a vibe. Writing triggers like predicates is the single change that took my skills from half-working to working. It is also the part that has no analogue in regular code — functions don't get chosen, they get called — and the part that has no analogue in prompts either, because prompts don't get chosen, they get pasted.
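The difference is easiest to see in the frontmatter itself. A sketch — the `name` and `description` fields follow the SKILL.md convention, but both descriptions here are invented examples, not skills from my folder:

```yaml
# Before — a label. The dispatcher has to guess, so it misfires both ways:
#   description: Use this for refactors.

# After — a predicate it can evaluate against the current task:
name: cross-module-rename
description: >-
  Use when the user asks to rename a TypeScript symbol (function, type,
  or variable) across more than one module. Not for single-file renames.
```

The "Not for" clause matters as much as the "Use when" clause: a predicate has a boundary on both sides.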

Procedure. The body of the skill: numbered steps, files to read, scripts to run, gates between steps. This is the most function-like part. It's a sequence with intermediate state, expected inputs, expected outputs. But unlike a function, the procedure is interpreted by a model, not executed by a runtime. It can be skipped. It can be reordered. It can be partially followed. The procedure is advisory, and that "less than a function" property is exactly what lets a skill compose with judgement the way a function can't.

Expected effect. What the skill claims it leaves behind: the file edited, the test passing, the artifact written. This is the part most of my early skills lacked, and adding it was load-bearing. Without an expected effect, the skill has no way to know it's done. With one, the model can self-check: did the test actually pass, did the file actually exist, did the artifact actually land. The expected effect is the closest thing a skill has to a return type.

Stack those four and you get the working definition: a skill is a named, triggered procedure with a declared effect. That's more than a prompt (prompts have none of name, trigger, or effect). It is less than a function (a function executes deterministically; a skill is interpreted). And the gap on each side is the gap that makes it useful.
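Stacked into one file, the contract reads like this — a minimal sketch, where the skill name, commands, and steps are all invented for illustration (only the frontmatter field names follow the SKILL.md convention):

```markdown
---
name: cross-module-rename
description: >-
  Use when the user asks to rename a TypeScript symbol across more
  than one module. Not for single-file renames.
---

# Cross-module rename

## Procedure
1. Locate the definition site of the symbol.
2. Run `tsc --noEmit` to verify a clean baseline. If it fails, stop and report.
3. Rename the symbol at the definition and at every call site.
4. Run `pnpm test`. If it fails, stop and report; do not retry blindly.

## Expected effect
The symbol is renamed at all call sites, `tsc --noEmit` is clean,
and `pnpm test` passes.
```

Name at the top, trigger in the frontmatter, procedure with gates in the body, effect as a checkable post-condition at the end. All four parts, one file.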

The test for whether something deserves to be a skill follows from the contract:

  1. Does it have a stable name worth referring to? If not — if the work is one-off — it's a prompt.
  2. Does it have a trigger you can write as a predicate? If you can't say in one sentence when it should fire, it shouldn't fire automatically — keep it as a slash command and invoke it explicitly.
  3. Does the procedure have steps with intermediate state? If it's one shot of text, it's a prompt template. A skill earns its keep in the in-between — between steps, between files, between gates.
  4. Can you state the expected effect? If you can't, you can't tell when it's done, and the model can't either.
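The four questions chain in order, each one a gate before the next, so the test can be sketched as a toy classifier. This is purely illustrative — the `Candidate` fields and the labels are mine, not any real Claude Code API:

```python
# Toy classifier for the four-question test: each question is a gate,
# and failing a gate tells you which lesser artifact the thing really is.
from dataclasses import dataclass


@dataclass
class Candidate:
    has_stable_name: bool        # 1. worth referring to by name?
    trigger_predicate: str       # 2. one-sentence firing condition ("" if none)
    has_intermediate_state: bool  # 3. multi-step, with state between steps?
    expected_effect: str         # 4. checkable post-condition ("" if none)


def classify(c: Candidate) -> str:
    if not c.has_stable_name:
        return "prompt"           # one-off work: just prompt it
    if not c.trigger_predicate:
        return "slash command"    # invoke explicitly, never auto-fire
    if not c.has_intermediate_state:
        return "prompt template"  # one shot of text, no in-between
    if not c.expected_effect:
        return "skill missing an effect"  # can't tell when it's done
    return "skill"
```

The point of the ordering is that the questions get harder as you go: plenty of things clear the name gate; far fewer clear the effect gate.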

Compare against neighbours. A function has all four parts but is executed, not interpreted; you reach for a function when the procedure has no judgement in it. A playbook (in the ops sense) has name and procedure but no machine-evaluated trigger and no automated effect-check; that's what skills replace for AI work. A subroutine is a function-shaped chunk of a larger function; skills aren't that, because skills are dispatched, not called.

The cleanest way to think about it: a skill is a unit of delegated procedure, where the delegation is to a model and the procedure is for a recurring task with judgement in it. That's a real primitive. It just isn't the primitive a lot of people imagined.

What I'd do differently next time

Design skills as if writing API documentation for the model, not notes for myself. That's the move. My early skills read like notes-to-self ("when refactoring, remember to update the tests"). My current skills read like API docs ("Trigger: when the user requests a cross-module rename of a TypeScript symbol. Procedure: 1. Locate definition site. 2. Run tsc to verify clean baseline. 3. ... . Effect: the symbol is renamed at all call sites and pnpm test passes."). The model is the consumer; the doc is the contract.

Concrete edits I now make on every skill:

  • The frontmatter description is a predicate, not a label. It contains the trigger conditions. If I can't write it in one tight sentence, the skill is doing too much.
  • Steps are numbered and have gates: "if this check fails, stop and report" rather than "if this check fails, try again." The procedure should be interruptible at every step.
  • The expected effect is a concrete, checkable post-condition. "Tests pass" is checkable. "Code is improved" is not.
  • Supporting files (scripts, templates) live alongside SKILL.md and are loaded only when the procedure references them. Default off.
  • Length budget per skill: under 200 lines. If it's longer, it's two skills.

What I'm still unsure about

Do skills compose? I don't know. A skill can invoke a sub-agent, and a skill can reference another skill in prose ("after this, run the verification skill"), but there's no first-class call between skills the way there is between functions. Maybe that's a missing primitive. Maybe it's a bad idea on purpose — composing skills inside a single context window's finite attention budget might be exactly the kind of thing that breaks. I genuinely don't know which.

What's the right size of a skill collection? Twenty? A hundred? Garry Tan's gstack has dozens; Jesse Vincent's superpowers has dozens of a different shape. I have twenty-three and the dispatcher already misfires sometimes. There's a packing problem here — the more skills, the harder the trigger-predicate space is to keep disjoint — and I haven't seen anyone solve it well.

And the question this post opened with: primitive, or filing cabinet? The hopeful read is that skills are to AI engineering what functions were to imperative programming — the smallest unit you can name, ship, and reuse. The cynical read is that they're scaffolding the model will make obsolete: as models get better at unstructured judgement, the contracts we're carefully writing now will look like the kind of laborious boilerplate later models route around. Both reads are taken seriously by people I respect. I think the honest read is in between: skills are the right primitive for now — for the gap between models that need procedural scaffolding and models that won't — and "for now" is a longer window than the cynics think and a shorter one than the hopefuls do.

After ten posts I'm not sure I've answered the question. What I have is a sharper toolkit: context as attention budget, scope as a leash, plans as artifacts, primitives as questions, harness as something I can build over a weekend, skills as contracts. None of those are verdicts. All of them are handles. That's what harness engineering looks like in May 2026 — not certainty, but better grip.

References

  • Anthropic, Introducing Agent Skills — the launch post that named skills as a primitive and made the frontmatter spec public. Reading it the second time, with my "skill is a contract" frame, the design choices snapped into focus: the description-as-predicate, the load-on-demand body, the supporting-files convention. https://www.anthropic.com/news/skills
  • Anthropic, Equipping agents for the real world with Agent Skills — the engineering follow-up, six weeks later, with worked examples and the spec-as-open-standard framing. The line that landed for me: skills "outsource the hard parts to the harness" — exactly the boundary I'd been groping for between prompt and function. https://www.anthropic.com/engineering/equipping-agents-for-the-real-world-with-agent-skills
  • Claude Code docs, Extend Claude with skills — the canonical reference for SKILL.md frontmatter, supporting files, and skill discovery. The section on the dispatcher's view of description is where I finally accepted that the trigger is for the model, not the human. https://code.claude.com/docs/en/skills
  • Simon Willison, Claude Skills are awesome, maybe a bigger deal than MCP — Simon's October 2025 post, which is the cleanest external articulation of the contract framing I've seen. The phrase "tools in a loop" he used elsewhere maps onto skills as the named procedural unit within the loop. https://simonwillison.net/2025/Oct/16/claude-skills/
  • Sean Grove (OpenAI), The New Code — the AI Engineer talk arguing the durable artifact is the spec, not the code. Reframed how I read my own skills: not "automation," but checked-in specs for recurring tasks. The contract framing in this post is downstream of his. https://www.youtube.com/watch?v=8rABwKRsec4
  • Andrej Karpathy, Software 2.0 — the 2017 Medium post that first argued the artifact would shift from code to weights. Eight years later, the artifact has shifted again — from weights to specs — and re-reading this post is what helped me see skills as continuous with that arc, not a one-off Anthropic feature. https://karpathy.medium.com/software-2-0-a64152b37c35
  • Andrej Karpathy, Software is Changing (Again) — Karpathy's June 2025 YC AI Startup School talk, the "Software 3.0" framing where prompts are the new source code. The bit on partial autonomy and "anterograde amnesia" is the better framing for why skills exist: they're the durable memory for a system that has none. https://www.ycombinator.com/library/MW-andrej-karpathy-software-is-changing-again
  • Boris Cherny & Cat Wu (Anthropic) on Latent Space — the Claude Code episode where the harness-as-Unix-utility framing originated. Hearing the team describe their own product as "the smallest building block that's useful" is what made me stop treating skills as a heavyweight abstraction and start treating them as a small one. https://www.latent.space/p/claude-code
  • obra/superpowers — Jesse Vincent's skills repository. The closest thing I've found to a public corpus of well-shaped skills written as contracts — descriptions that read as predicates, procedures with gates, expected effects stated. The fastest calibration I have for what a good skill looks like. https://github.com/obra/superpowers

