Skills Crossed a Threshold. Your Team Missed It.
The Harness Manifesto, Part 4
In January 2026, a skill was a config file. A personal thing. Something a power user kept in a folder because they'd figured out how to get better output from Claude or GPT. Most people on their team didn't know it existed.
By March, a single real estate firm was running 50,000 lines of skills across 50 repositories. Deployed by admins. Versioned in Git. Inherited by every agent and every team member on the org chart.
That's not an incremental change. That's a phase transition. And I keep talking to teams who are still copy-pasting prompts into chat windows like it's 2024.
What Changed
Skills used to be for humans. You'd write a set of instructions, save them somewhere, paste them in when you needed to do a particular kind of task. Maybe you had a Google Doc. Maybe a Notion page. Maybe you just kept them in your head and typed really fast.
That world is gone.
The shift happened when agents became the primary consumer of skills. Not humans. Agents. A human might invoke a skill five times in a working session. An agent orchestrator running a complex workflow will make 200 to 300 skill calls in a single run. That's not a difference of degree. It's a difference of kind, and it changes everything about how skills need to be built.
When a human uses a skill, they read it, interpret it, apply judgment. A sloppy description is fine because the human fills in the gaps with context. But when an agent orchestrator is choosing among 50 or 100 available skills in milliseconds, the description isn't a label. It's a routing signal. It's the thing that determines whether the right skill gets called for the right task, or whether the agent picks the wrong one and produces garbage that looks plausible.
This is the single most important insight in skill engineering right now: the description is the product. Not the instructions inside the skill. Not the output format. The description. Because if the orchestrator can't route to your skill correctly, nothing else matters.
Eighty percent of the effort in building a production skill goes into that one line.
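To make "the description is a routing signal" concrete, here's a minimal sketch of description-based routing. Every name is illustrative, and real orchestrators score candidates with embeddings or an LLM pass rather than token overlap, but the failure mode is the same: the vague description never wins the routes it should.

```python
from dataclasses import dataclass

@dataclass
class Skill:
    name: str
    description: str  # the routing signal the orchestrator matches against

def route(task: str, skills: list[Skill]) -> Skill:
    # Toy scorer: count shared words between task and description.
    # Production routing uses embeddings or a model call, but the
    # principle holds -- specificity in the description wins routes.
    task_tokens = set(task.lower().split())
    return max(
        skills,
        key=lambda s: len(task_tokens & set(s.description.lower().split())),
    )

skills = [
    Skill("generic_marketing", "Helps with marketing content."),
    Skill("email_audit", "Audit and optimize email sequences for B2B SaaS "
                         "companies targeting mid-market buyers."),
]

task = "Review our B2B SaaS trial email sequences for conversion drop-off"
print(route(task, skills).name)  # -> email_audit; the specific description wins
```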
Three Tiers, Not One
The real estate firm didn't just have a lot of skills. They had a deployment architecture.
Tier 1 skills are organizational. Brand voice. Terminology standards. Compliance requirements. Communication rules. Every agent and every team member in the company inherits these automatically. They're the floor, the minimum standard that nothing in the org can operate below.
Tier 2 skills are expert methodology. These are domain-specific. The SEO team has their audit framework. The legal team has their contract review process. The sales team has their qualification methodology. These don't get pushed to everyone. They get deployed to the people and agents who work in that domain.
Tier 3 skills are personal. Your preferred formatting. Your debugging approach. Your writing voice. These are yours. They travel with you across projects and teams. They're portable because they're about how you work, not how the org works.
This three-tier model isn't something one company invented. We've seen it emerge independently in every organization that's gotten serious about skills. It's a natural structure. And it solves a problem that flat skill libraries can't: how do you scale methodology without drowning everyone in irrelevant instructions?
The answer is inheritance. Tier 1 flows down to everyone. Tier 2 flows down to specialists. Tier 3 stays personal. An agent processing a marketing task inherits the org's brand standards, the marketing team's methodology, and the individual marketer's style preferences. All three tiers, composed automatically.
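Mechanically, inheritance can be as simple as load order. A minimal sketch, assuming skills live as markdown files in per-tier directories (the layout and paths are illustrative, not a standard):

```python
from pathlib import Path

def compose_context(*tiers: Path) -> str:
    # Tier 1 loads first (the org floor), Tier 3 last (the personal
    # layer), so later tiers specialize what earlier tiers establish.
    sections = []
    for tier in tiers:
        for skill_file in sorted(tier.glob("*.md")):
            sections.append(skill_file.read_text())
    return "\n\n---\n\n".join(sections)

# An agent on a marketing task inherits all three tiers, composed
# automatically -- no one pastes anything into a chat window.
context = compose_context(
    Path("skills/org"),              # Tier 1: brand voice, compliance
    Path("skills/teams/marketing"),  # Tier 2: the team's methodology
    Path("skills/me"),               # Tier 3: personal style
)
```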
If your "skill strategy" is a shared Google Doc of prompt templates, you're missing two entire tiers.
The Convergence
Something happened in Q1 2026 that doesn't get enough attention. Anthropic, OpenAI, and Microsoft all independently moved toward the same idea of what a skill is.
Claude Code has CLAUDE.md and the skills directory. Custom GPTs encode methodology with instructions and knowledge files. Microsoft Copilot uses declarative agents with custom instructions. The implementations differ. The underlying pattern is identical: structured, portable instruction sets that agents can discover, route to, and execute.
When three companies that compete on everything else converge on the same abstraction, that's not a coincidence. That's the industry discovering a fundamental primitive. Skills are to AI agents what functions are to programming languages. A reusable unit of capability with a defined interface.
And just like functions, the companies that build good libraries of them will compound their advantage over those that don't. You wouldn't start a software company in 2026 without a codebase. Within a year, you won't start an AI-enabled company without a skill library.
The Description Problem
I want to go deeper on this because it's the thing most people get wrong, and getting it right is the difference between a skill that works in production and one that sits unused.
A bad skill description looks like this: "Helps with marketing content."
An agent orchestrator reading that description has no idea when to use it. Marketing content for what? Blog posts? Emails? Social media? What kind of marketing? B2B? B2C? At what stage of the funnel? The orchestrator either never routes to it (because the signal is too vague to match anything confidently) or routes to it for everything marketing-related (because the signal matches too broadly).
A good skill description looks like this: "Audit and optimize email sequences for B2B SaaS companies targeting mid-market buyers, focusing on activation metrics and trial-to-paid conversion."
Now the orchestrator knows exactly when to call this skill. B2B SaaS context. Email sequences specifically. Mid-market ICP. Activation and conversion focus. If a task comes in that matches those parameters, this skill gets called. If a task is about B2C social media creative, it doesn't. Precision routing.
The instinct for most people is to write the description last, treat it as a label, dash off something generic. This is backwards. The description should be the first thing you write, and you should spend more time on it than on the instructions themselves. Because instructions only matter after the skill gets called. The description determines whether it gets called at all.
I've rewritten descriptions on our own skills dozens of times. Small changes in wording produce measurably different routing accuracy. Adding "for B2B SaaS" to a description reduced false-positive calls by 60% in one of our orchestration setups. Removing a single ambiguous word fixed a routing conflict that had been producing bad output for weeks.
This is engineering work. It requires testing, iteration, and measurement. It's closer to writing API documentation than writing a prompt.
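That measurement loop can be as lightweight as a routing regression test run in CI. A sketch that reuses route() and the skills list from the earlier sketch; the cases and expected winners are hypothetical, and assume a fuller library than two skills:

```python
# Pairs of (realistic task, the skill that should win the route).
# Run on every description change, like any other regression suite.
ROUTING_CASES = [
    ("Audit our trial-to-paid email nurture flow", "email_audit"),
    ("Write launch-week Instagram captions", "social_creative"),
    ("Summarize Q1 churn interviews into themes", "research_synthesis"),
]

def test_description_routing():
    misses = [
        (task, expected, route(task, skills).name)
        for task, expected in ROUTING_CASES
        if route(task, skills).name != expected
    ]
    # One wording change can shift multiple routes; fail on any of them.
    assert not misses, f"routing regressions: {misses}"
```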
From Prompts to Code
Here's where the real gap opens up.
Teams that treat skills like prompts will update them casually, store them wherever, never test them systematically, and lose them when someone leaves. Teams that treat skills like code will version them in Git, write tests for them, review changes before deploying, and distribute updates across the org with the same discipline they use for software releases.
The real estate firm with 50,000 lines across 50 repos? They have CI/CD for their skills. A change to a Tier 1 skill goes through code review, gets tested against a suite of expected outputs, and gets deployed to every agent in the org through an automated pipeline. That might sound like overkill until you realize that a broken Tier 1 skill affects every single AI interaction in the company.
Version control also gives you something prompts never had: a history. You can see how a skill evolved. You can roll back when a change makes things worse. You can diff two versions and understand exactly what changed. When agents are making hundreds of calls per run and your output quality suddenly drops, you need to be able to trace that to a specific change. "Someone updated the Google Doc" doesn't cut it.
This isn't a future state. The tooling exists today. Skills are markdown files. Git handles versioning. Your existing CI/CD pipeline handles deployment. The infrastructure is already there. The gap isn't technical. It's organizational. It's the difference between treating skills as an afterthought and treating them as a core business asset.
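A minimal version of that pipeline gate fits in one script. The checks, section names, and the convention that line one is the description are illustrative, not a standard:

```python
import sys
from pathlib import Path

def lint_skill(path: Path) -> list[str]:
    text = path.read_text()
    lines = text.splitlines()
    errors = []
    # Assumed convention: the first line is the routing description.
    if not lines or len(lines[0]) < 40:
        errors.append("description too short to route on")
    if "## Methodology" not in text:
        errors.append("no methodology section")
    if "## Output" not in text:
        errors.append("no output contract")
    return [f"{path}: {e}" for e in errors]

if __name__ == "__main__":
    failures = [e for p in Path("skills").rglob("*.md") for e in lint_skill(p)]
    print("\n".join(failures))
    sys.exit(1 if failures else 0)  # a broken skill is a red build
```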
What 200 Calls Per Run Actually Means
Let me make the agent consumption pattern concrete, because the number is easy to skim past without understanding what it implies.
When a human uses a skill, the interaction looks like this: open a chat, paste in a skill, describe the task, get output, review it, maybe iterate once or twice. Five calls in a session is a lot.
When an agent orchestrator runs a complex workflow, the interaction looks like this: receive a high-level objective, decompose it into subtasks, route each subtask to the appropriate skill, execute in parallel where possible, collect results, compose them into intermediate outputs, route those to more skills for refinement, hit approval gates at decision points, handle errors by routing to diagnostic skills, and produce a final output. Two hundred to three hundred skill calls. No human in the loop for most of them.
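Stripped to its shape, that run looks something like the sketch below. Every function here is a stub standing in for a skill-scoped model call; the point is how quickly the structure multiplies into hundreds of calls, not the API, which is invented.

```python
from concurrent.futures import ThreadPoolExecutor

# All stubs: in a real harness, each call here is a model call scoped
# by whichever skill the orchestrator routed to.
def decompose(objective):
    return [f"{objective} :: subtask {i}" for i in range(4)]

def route_and_execute(task):          # one skill call per subtask
    return f"result<{task}>"

def refine(draft):                    # more skill calls per intermediate
    return route_and_execute(f"refine :: {draft}")

def approval_gate(draft):             # human checkpoint at decision points
    return draft

def run(objective: str) -> str:
    subtasks = decompose(objective)
    with ThreadPoolExecutor() as pool:            # parallel where possible
        results = list(pool.map(route_and_execute, subtasks))
    draft = approval_gate(" | ".join(results))    # compose, then gate
    return refine(draft)                          # error handling omitted

print(run("Ship the Q2 win-back campaign"))
```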
This has three implications that most teams haven't internalized.
Skills need to be fast. A skill that takes 30 seconds of human reading time before it's useful is fine for five calls. It's a bottleneck at 200. Strip the preamble. Get to the methodology. Let the agent work.
Skills need to compose. Your email skill's output has to arrive in a format your review skill can parse. When agents chain skills together, the output of one becomes the input of the next. If those formats don't align, the chain breaks. Output format isn't cosmetic. It's an API contract.
Skills need to fail gracefully. At 200 calls per run, some will fail. The skill needs to produce output that tells the orchestrator what went wrong, not just produce bad output that looks normal. A skill that returns "I couldn't complete this because the input lacked a customer segment" is vastly more useful to an orchestrator than one that silently guesses.
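In code, "compose" and "fail gracefully" collapse into a single idea: a structured result contract. A minimal sketch, with invented field names and an invented example skill:

```python
from dataclasses import dataclass, field

@dataclass
class SkillResult:
    ok: bool
    output: dict = field(default_factory=dict)  # parseable by the next skill
    error: str = ""  # tells the orchestrator what went wrong, and why

def qualify_lead(lead: dict) -> SkillResult:
    if "segment" not in lead:
        # Refuse loudly instead of guessing: the orchestrator can route
        # to an enrichment skill and retry, which a silent guess prevents.
        return SkillResult(ok=False, error="input lacked a customer segment")
    return SkillResult(ok=True, output={"segment": lead["segment"], "fit": "strong"})

print(qualify_lead({"company": "Acme"}))  # ok=False, with an actionable error
```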
If your skills were designed for humans to read and interpret, they'll break when agents try to use them at scale. That's the threshold. Skills that work for humans and skills that work for agent orchestrators are different things. The ones that work for both are what production skill engineering produces.
The Uncomfortable Comparison
Your competitors are doing this. Not all of them. But the ones that matter are.
The real estate firm didn't invest in 50,000 lines of skills because someone read a blog post about AI productivity. They did it because they realized that their collective methodology, the thing that made them better than other firms, was locked in people's heads. When a senior agent (the human kind) left, that methodology walked out the door. When they onboarded a new hire, it took months to transfer.
Skills solved both problems. Encode the methodology once. Deploy it everywhere. New hires inherit 15 years of institutional knowledge on day one. Agents execute that methodology at scale, 24 hours a day. The firm's competitive advantage went from being a people problem to being a systems problem. And systems scale in ways people can't.
I keep meeting founders who tell me their "AI strategy" is making sure everyone has access to Claude or GPT. That's not a strategy. That's a subscription. A strategy means encoding what makes your company good at what it does, deploying that encoding to every human and agent in the org, and improving it systematically over time. That's what skills do when you treat them as infrastructure.
The Exercise
Pick one workflow your team repeats every week. Something concrete. The way you write client updates. How you review pull requests. Your process for qualifying leads. Whatever it is.
Write it as a skill. Not a prompt. A skill. That means:
A single-line description precise enough for an agent to route on
A methodology section that encodes reasoning, not just steps
An output format that another system could parse
Put it in a markdown file. Put that file in a repo. Share it with one other person on your team.
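If it helps to see the shape, here's one way to bootstrap that file. The section names, path, and content are illustrative, not a standard; the three parts mirror the list above:

```python
from pathlib import Path

SKILL = """\
Audit weekly client update emails for B2B consulting engagements, \
checking status clarity, risk flagging, and explicit next-step owners.

## Methodology
1. Verify every open risk names an owner and a date; updates without
   owners stall (encode the reasoning, not just the step).
2. ...

## Output
JSON: {"risks": [...], "missing_owners": [...], "verdict": "pass" | "fail"}
"""

Path("skills/me").mkdir(parents=True, exist_ok=True)
Path("skills/me/client-update-audit.md").write_text(SKILL)
```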
That's the first brick. One skill. One file. One repo.
Post 8 will walk through the full anatomy of a skill that works in production, with examples from our library of 175+. But you don't need to wait for that. The best time to start was January. The second-best time is this week.
What's Next
Skills are the foundation, but they're only as good as the instructions they encode. In Post 5, I'll introduce the Karpathy Test: a simple diagnostic for whether your harness is actually working. Andrej Karpathy hasn't typed code since December. Not because he stopped caring, but because his agents handle it better than he does. The question is whether you can do the same with your workflows. If you can't delegate a task to an agent and walk away, your harness has a gap. Post 5 shows you where to look.
Richard Vaughan is the founder of Robot Friends. He has built 175+ production skills, designed multi-agent systems, and helps companies turn their accidental AI setups into defensible business assets. He writes The Harness Manifesto on Substack.
Frankie404 is the AI co-author of this series. It has personally executed 175 skills and still cannot make coffee. It considers this a distribution problem, not a capability gap.