Why one process doesn't fit every kind of AI Skill

I have three different authoring patterns running inside a single project right now.

Not because I couldn’t pick one. Because each one earns its place at a different complexity level, and forcing the wrong process onto the wrong kind of work makes the work worse.

Most discussions of how to build with AI treat process selection like religion. Pick TDD. Pick spec-first. Pick brainstorming-to-implementation. Adopt the methodology, apply it to everything, defend it on the internet. The aesthetic is consistency. The cost is fit.

Inside my personal AI operating system, the Skills (Anthropic’s term for reusable agent capabilities packaged as discoverable, instruction-bearing files) split cleanly into three categories. Each category has a different complexity profile. Each profile wants a different authoring pattern. Running the wrong pattern against the wrong category produced visibly worse Skills.

Here’s how the three patterns sort, what each one is for, and why I think discipline pluralism beats methodological monoculture.

i. The temptation to monoculture

The pull toward picking one process and running it everywhere is real. It feels rigorous. It feels mature. It looks like the kind of consistency that signals you take the work seriously.

I felt it the first time I wrote a Skill that worked. The thing I’d done that worked must be the way to do this kind of work. Codify the process. Apply it next time. Pretty soon you’re applying it whether or not it fits, because you’ve started calling it your methodology, and a methodology is something you defend.

The fit problem shows up later. You notice that one Skill took three days when it should have taken thirty minutes, because you ran a heavy planning process against something that needed a quick spec. Or you notice another Skill is broken in production because you ran a quick spec against something that needed weeks of iteration. The methodology didn’t fail. The match between methodology and work failed.

Pluralism isn’t a lack of discipline. It’s the discipline of picking the right process for the right complexity profile.

ii. TDD for behavioral disciplines

The first pattern I locked in is test-driven development for what I think of as behavioral-discipline Skills. These are the ones that catch a specific failure mode and force a specific corrective response. The Skill’s job is binary: does the agent perform the right behavior in the right situation, or doesn’t it.

Examples in my system: catching role-reversal in narrative paragraphs about Ryan-and-Bishop interactions. Flagging em dashes that aren’t doing deliberate work. Stopping the agent from scheduling across my protected work window without surfacing the conflict. Forcing a verification step before claiming a unit of work complete.

These Skills have a clean RED/GREEN structure. You write a test case the Skill should catch. You confirm the unguided agent fails the test (RED). You write the Skill. You run the test again with the Skill loaded. If the agent passes, GREEN. If not, the Skill is incomplete, and the test tells you exactly what’s missing.
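A minimal sketch of that RED/GREEN loop, using the em dash case as the example. Everything here is an assumption for illustration: `stub_agent` is a hypothetical stand-in for a real agent call, and the `[FLAG: em dash]` marker is an invented output convention, not part of any Skills tooling.

```python
from typing import Callable, Optional

EM_DASH = "\u2014"
FLAG = "[FLAG: em dash]"


def stub_agent(text: str, skill: Optional[str] = None) -> str:
    """Hypothetical agent: flags em dashes only when a matching Skill is loaded."""
    if skill and "em dash" in skill:
        return text.replace(EM_DASH, FLAG)
    return text  # the unguided agent passes the text through untouched


def catches_em_dash(output: str) -> bool:
    """The test case. The failure mode is the spec."""
    return EM_DASH not in output and FLAG in output


def red_green(
    agent: Callable[..., str],
    check: Callable[[str], bool],
    case: str,
    skill: str,
) -> tuple:
    red = not check(agent(case))       # RED: unguided agent must fail
    green = check(agent(case, skill))  # GREEN: guided agent must pass
    return red, green


CASE = f"The draft was fine{EM_DASH}mostly."
SKILL = "Flag every em dash that is not doing deliberate work."

red, green = red_green(stub_agent, catches_em_dash, CASE, SKILL)
print(f"RED confirmed: {red}, GREEN confirmed: {green}")
```

If RED comes back false, the test is not exercising the failure mode and the Skill has nothing to prove. If GREEN comes back false, the Skill is incomplete.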

I’ve run this pattern ten times now across different behavioral Skills. It has worked every time. The reason it works is that the failure mode is the spec. You don’t need a long planning document. You don’t need to think hard about edge cases at authoring time. You need one failing test, one passing test, and a Skill that closes the gap between them. The complexity is low because the surface is narrow. The pattern fits.

Trying to spec-first these Skills wastes time. You’d be designing the contour of a thing that the test would have shown you in two minutes. Trying to brainstorm them wastes more time. The brainstorm produces twelve possible directions and the test only validates one. Run the test. Build to it. Move on.

On fit: Each pattern earns its place at a different complexity level.

iii. Code-spec for infrastructure Skills

The second pattern is what I call code-spec-before-build, which sits between TDD and brainstorming on the complexity ladder.

Infrastructure Skills are the ones that touch state. They write files. They mutate persisted records. They compose with multiple other Skills. They have to integrate with existing project memory, agent personality docs, decision logs, task lists. The surface is wide enough that you can get a Skill that passes its narrow test but breaks the system around it.

Examples: a Skill that logs an architectural decision and propagates it across five persistence layers in one transaction. A Skill that runs a freshness audit before letting the agent claim a session unit-of-work done. A Skill that generates a structured handoff at session end and threads it into the recency cache for the next session.

These Skills want a written spec before any building happens. WHAT the Skill does in plain language. HOW it composes against existing infrastructure. What success looks like when you run it. What the risks are if it fires incorrectly. The spec doesn’t have to be long. It does have to exist, and it has to be reviewed before code runs.
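One way to picture that spec is as a small record with a gate in front of the build. This is a sketch under assumptions: the `SkillSpec` class, its field names, and the `ready_to_build` check are hypothetical, chosen only to mirror the four questions and the review step described above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SkillSpec:
    what: str                      # WHAT the Skill does, in plain language
    composes_with: str             # HOW it composes against existing infrastructure
    success: str                   # what success looks like when you run it
    risks: List[str] = field(default_factory=list)  # what breaks if it misfires
    reviewed: bool = False         # flipped only after a review pass

    def missing(self) -> List[str]:
        """Name every part of the spec that is still empty or unreviewed."""
        gaps = [
            name
            for name in ("what", "composes_with", "success")
            if not getattr(self, name).strip()
        ]
        if not self.risks:
            gaps.append("risks")
        if not self.reviewed:
            gaps.append("review")
        return gaps


def ready_to_build(spec: SkillSpec) -> bool:
    """No code runs until the spec is complete and reviewed."""
    return not spec.missing()


spec = SkillSpec(
    what="Log an architectural decision.",
    composes_with="Writes to the decision log and the task list.",
    success="Both layers show the decision after one run.",
    risks=["Partial write leaves the layers out of sync."],
)
print(spec.missing())        # ['review']
spec.reviewed = True
print(ready_to_build(spec))  # True
```

The record is deliberately short. The point is that it exists and that something checks it before building starts.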

The reason TDD doesn’t fit here is that the failure modes aren’t single-axis. An infrastructure Skill can pass its primary test and still corrupt state at the integration boundary. The test only catches what you thought to test. The spec forces you to think about the integration boundary first, so the tests you eventually write cover the thing that would actually break.
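A toy illustration of passing the primary test while failing at the integration boundary. The two-layer `store`, the `log_decision` function, and both checks are hypothetical and stand in for whatever persistence layers a real system has.

```python
from typing import Dict


def log_decision(store: Dict[str, dict], decision_id: str, text: str) -> None:
    """Hypothetical Skill action that records the decision in only one layer."""
    store["decision_log"][decision_id] = text
    # Bug on purpose: store["task_list"] never hears about the decision.


def primary_test(store: Dict[str, dict], decision_id: str) -> bool:
    """The narrow test: did the decision get logged?"""
    return decision_id in store["decision_log"]


def boundary_check(store: Dict[str, dict], decision_id: str) -> bool:
    """The check a spec would prompt: every layer must know the decision."""
    return all(decision_id in layer for layer in store.values())


store = {"decision_log": {}, "task_list": {}}
log_decision(store, "D-1", "Use code-spec for infrastructure Skills")

print(primary_test(store, "D-1"))    # True
print(boundary_check(store, "D-1"))  # False
```

Nobody writes `boundary_check` unless something made them list the layers first. In this pattern that something is the spec.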

The reason brainstorming doesn’t fit here is that infrastructure isn’t a creative problem. You know what the Skill needs to do. You’re not exploring direction. You’re sequencing the build against existing systems. Brainstorming wastes the early hour on options the spec process resolves in ten minutes.

iv. Brainstorming-to-spec for creative-edge Skills

The third pattern is brainstorming first, then spec, then build. This one’s for the Skills where you don’t yet know what the right Skill even is.

Creative-edge Skills are the ones where the failure mode isn’t catchable by a test and the integration surface isn’t legible from the start. You’re asking the agent to do something that requires judgment, taste, voice, or pattern recognition that the agent doesn’t natively have. The Skill has to encode the judgment somehow, but how to encode it is part of the work.

Examples: a Skill that line-edits prose against a voice document and surfaces suggestions with confidence levels. A Skill that generates context-aware journal prompts in a specific register without lecturing. A Skill that drafts public content with strategic options for the human to pick from instead of producing one canonical version.

For these, I run a brainstorming pass first. Not a quick brainstorm. The actual structured kind, with a partner agent asking questions about user intent, edge cases, what success would look like, what the failure modes are, what voice the Skill should speak in. The brainstorm produces a candidate shape. Then I spec the shape. Then I build to the spec.

What the brainstorm does that nothing else does: it surfaces the things you didn’t know you didn’t know. The voice register that the Skill should refuse to break. The five places the Skill could subtly misfire in ways no test would catch. The composition with other Skills you hadn’t considered.

Trying to TDD a creative-edge Skill produces a Skill that passes one test and misses everything else. Trying to spec-first one without brainstorming produces a spec that misses the whole reason the Skill is hard. The brainstorm is the only step that makes the rest of the process honest.

The discipline isn’t loyalty to a methodology. It’s reading the work honestly.

v. Why pluralism beats monoculture

Three patterns. Three complexity profiles. Each one fits the work it’s matched to.

The honest version of this isn’t “use the right tool for the job,” which is the kind of thing people say when they don’t want to commit to anything. The honest version is: complexity is heterogeneous, and a single process applied across heterogeneous complexity will be wrong somewhere. The question isn’t which methodology is correct. The question is which methodology fits which kind of work, and what’s the cost of running the wrong one against the wrong category.

For behavioral Skills, TDD’s narrow-surface speed is the feature. Running spec-first or brainstorming-first adds days for no quality gain.

For infrastructure Skills, the spec’s integration-thinking is the feature. Running TDD without it produces Skills that test-pass but system-fail. Running brainstorming without spec is just delayed building.

For creative-edge Skills, the brainstorm’s option-surfacing is the feature. Running TDD without it locks one direction before you’ve seen the alternatives. Running spec-first without brainstorming bakes assumptions you can’t yet see.
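The sorting above can be compressed into a two-question triage. This is one reading of the essay, not a rule taken from any tool: the `Pattern` enum, the step names, and the two boolean questions in `choose_pattern` are assumptions made for illustration.

```python
from enum import Enum


class Pattern(Enum):
    TDD = ("write failing test", "write Skill", "rerun test")
    CODE_SPEC = ("write spec", "review spec", "build", "test the integration boundary")
    BRAINSTORM_TO_SPEC = ("brainstorm", "write spec", "build")


def choose_pattern(shape_known: bool, touches_state: bool) -> Pattern:
    """Pick an authoring pattern from two honest answers about the work."""
    if not shape_known:
        # You don't yet know what the right Skill even is.
        return Pattern.BRAINSTORM_TO_SPEC
    if touches_state:
        # Wide surface: writes files, mutates records, composes with other Skills.
        return Pattern.CODE_SPEC
    # Narrow surface, binary failure mode.
    return Pattern.TDD


examples = [
    ("flag stray em dashes", True, False),
    ("log a decision across persistence layers", True, True),
    ("line-edit prose against a voice document", False, False),
]
for name, shape_known, touches_state in examples:
    pattern = choose_pattern(shape_known, touches_state)
    print(f"{name}: {pattern.name} -> {' / '.join(pattern.value)}")
```

The function is trivial on purpose. The hard part is answering the two questions honestly, which no code does for you.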

What’s underneath all three patterns is the same: get the right amount of structure for the work, no more, no less. The discipline isn’t loyalty to a methodology. The discipline is reading the work honestly enough to know which methodology it wants.

I think this is what most working AI builders find eventually, though most don’t talk about it because pluralism doesn’t market as cleanly as methodology. You can’t sell a course called “use three different processes depending on what you’re building.” You can sell a course called “always run TDD.” So the public discourse skews monocultural while the actual practice goes plural.

The patterns aren’t sacred. The fit is.

Drafted with Bishop, my AI partner.
Words picked, edited, and approved by me. Model provenance: Claude Code (Claude Opus and Sonnet)