The Harness Manifesto

The Harness Manifesto, Part 12

May 21, 2026

This is the document I wish someone had handed me in January.

Not a pitch deck. Not an investor memo. Not a whitepaper written by someone who has never deployed an agent past a demo. This is a practitioner's manifesto, written after building 175+ production skills, designing multi-agent systems that run without babysitting, and helping companies turn their accidental AI setups into something that actually compounds.

Eleven posts got us here. The thesis. The framework. The urgency. The diagnostics. The anatomy. The case study. Now we close.

If you've read the whole series, this is the capstone. If you haven't, this should stand on its own. One document. The whole argument. Everything I believe about where AI work is going and what you need to do about it.

The Thesis

The model is commoditized. It was always going to be.

Claude, GPT, Gemini, Llama, Mistral. Every frontier lab is converging on the same capability floor. The gap between models shrinks with every release cycle. What was a revelation in one quarter becomes a commodity the next. GPT-4 changed the world in March 2023. By early 2024, half a dozen alternatives had matched it on most benchmarks. Same pattern, every generation.

And yet some teams get 10x returns on their AI investment while others get glorified autocomplete. The difference was never the model. The difference is the harness: the skills, the context architecture, the orchestration, the guardrails, and the distribution layer that wraps the model and makes it useful for a specific business in a specific context.

The model is the engine. The harness is the car. Nobody buys a car for the engine alone.

Between January and April 2026, eight independent signals converged on this conclusion. People who don't coordinate, don't read each other's work, operating in different corners of the industry. All pointing at the same layer. Karpathy calling it a "skill issue." Enterprises deploying 50,000 lines of skills as organizational infrastructure. Anthropic building Conway to own the context layer. OpenAI admitting prompt injection is fundamentally unsolvable. The edge AI market hitting $25 billion heading toward $143 billion by 2034.

When eight independent signals converge, it's not a coincidence. It's a thesis.

The company that owns the harness owns the relationship. The model vendor is a supplier. Full stop.

The Principles

The harness is the only defensible asset in your AI stack.

You can't moat a model. You didn't build it. You don't control its roadmap. Its capabilities will be replicated within months. But a library of battle-tested skills tuned to your business, a context architecture that carries your institutional knowledge, an orchestration layer refined through hundreds of production runs? That compounds. Every week you use it, it gets more valuable. Every week a competitor doesn't have one, the gap widens. You can copy a skill. You can't copy a system.

McDonald's didn't build the best burger. They built the best burger-making system. The franchise model works because the system produces consistent outcomes without Ray Kroc standing in the kitchen. A harness is the franchise system for AI. Stop asking "how do I build a great product?" Start asking "how do I build a system that lets others create outcomes without me?" That's the question that scales.

You already have a harness. The question is whether it's intentional or accidental.

Every prompt template someone saved to a shared drive is a primitive skill. Every "always start with this context" instruction is primitive memory. Every "check with me before you do X" rule is a primitive guardrail. You're not starting from zero. You're starting from chaos. The work is to make it deliberate. To engineer what you've been improvising.

Skills are infrastructure, not prompts.

A prompt tells an AI what to do right now. A skill encodes methodology that any agent can discover, route to, and execute without a human in the loop. The description is the product. 80% of the engineering effort goes into that single line because if the orchestrator can't route to your skill correctly, nothing else matters. Agents make 200 to 300 skill calls per run. Humans make five. Skills aren't designed for humans anymore. They're designed for machines that select, chain, and compose them at a scale no human workflow ever will.

Treat your skills like code. Version them. Test them. Deploy them through a pipeline. Because a broken Tier 1 skill doesn't just produce bad output for one person. It corrupts every AI interaction across your entire organization.
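Here is a minimal sketch of what "skills like code" can look like, assuming a skill is just a named, versioned unit with a one-line description that a router reads. The `Skill` class, the `route` function, and the example skills are all hypothetical, and the keyword-overlap matching is a stand-in: a real orchestrator would more likely have a model read the descriptions.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Skill:
    name: str
    version: str                 # versioned like any other code artifact
    description: str             # the single line the orchestrator routes on
    run: Callable[[str], str]    # the methodology itself


def route(task: str, skills: list) -> Optional[Skill]:
    """Pick the skill whose description shares the most words with the task.

    Naive keyword overlap is an assumption for illustration only.
    Returns None when no description matches at all.
    """
    task_words = set(task.lower().split())

    def overlap(skill: Skill) -> int:
        return len(task_words & set(skill.description.lower().split()))

    best = max(skills, key=overlap, default=None)
    if best is None or overlap(best) == 0:
        return None
    return best


# Hypothetical skill library.
SKILLS = [
    Skill("invoice-audit", "1.2.0",
          "audit vendor invoice totals against purchase orders",
          lambda task: "audited"),
    Skill("release-notes", "0.4.1",
          "draft release notes from merged pull requests",
          lambda task: "drafted"),
]

chosen = route("audit this vendor invoice", SKILLS)
unrouted = route("water the plants", SKILLS)
```

Notice that the only thing the router ever sees is the description. Because routing is a plain function, a test can pin it down before a description change ships, which is what a pipeline is for.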

Context is the most undervalued layer in the stack.

Teams spend weeks evaluating models and zero time building context. Then they start every session by pasting in the same background information. That's pushing a luxury car to work because you forgot to bring the key.

Build the key. A Personal Context Portfolio. Ten modular files. Plain markdown. Portable across every AI tool that exists or will exist. Identity. Roles. Projects. Tools. Communication style. Decision log. The AI doesn't get smarter when you build a PCP. It finally has enough information to use the intelligence it already had.

The first 48 hours of context building deliver 80% of the value. Don't wait for perfect. Build something.
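A rough sketch of how little machinery this needs, assuming the portfolio is a folder of markdown files named after the sections above. The file names, the `load_portfolio` function, and the heading format are my assumptions for illustration, not a standard.

```python
import tempfile
from pathlib import Path


def load_portfolio(root: Path, wanted: list) -> str:
    """Concatenate the requested context files into one preamble.

    Missing files are skipped, so a partial portfolio still works.
    """
    sections = []
    for name in wanted:
        path = root / f"{name}.md"
        if path.is_file():
            body = path.read_text(encoding="utf-8").strip()
            sections.append(f"## {name}\n{body}")
    return "\n\n".join(sections)


# Demo with a throwaway directory standing in for a real repo.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "identity.md").write_text("Founder of a small agency.\n", encoding="utf-8")
    (root / "tools.md").write_text("Python, Postgres.\n", encoding="utf-8")
    # "projects" has not been written yet; it is skipped without error.
    preamble = load_portfolio(root, ["identity", "projects", "tools"])
```

The point of skipping missing files is the 48-hour rule: two files written today are usable today.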

Conway is coming for your context. Own it first.

Anthropic is building an always-on agent that accumulates a persistent behavioral model of how you work, how you decide, how you think. That model will be so rich and so embedded that switching AI providers will mean losing everything the AI knows about your organization. Not data lock-in. Intelligence lock-in. There's no CSV for how a person thinks.

The defense is straightforward. Build your context layer in portable, model-agnostic formats that you control. Files in a repo, served via MCP, owned by you. Use Claude to build it. Use GPT to build it. Use whatever you want. Just make sure the output lives on your infrastructure. Portability is a design decision, not a feature.

Security lives in the harness, not the model.

OpenAI told you the model can't secure itself. Prompt injection is fundamentally unsolvable. That's not a temporary limitation. It's a structural reality of how language models work.

Every real security incident I've seen in production had nothing to do with adversarial prompts. An agent with database write access it didn't need. A context layer that loaded confidential client data into every session. An automation chain with zero approval gates. The model performed exactly as instructed. The instructions were the problem.

Five primitives. Constrained execution. Approval gates. Provenance tracking. Comprehensive logs. Rollback capabilities. These are not optional. They're the minimum viable security for any AI system that touches production data. If your harness doesn't enforce them, you're hoping the model makes good choices every time. At 300 calls per run, hope is not a strategy.
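To make the five primitives concrete, here is a small sketch under stated assumptions: every agent action passes through one wrapper that checks an allow-list, asks for approval when flagged, records who asked, logs the decision, and keeps an undo handle. `Action`, `Guardrails`, and the tool names are hypothetical, and a production version would persist the log rather than hold it in memory.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Action:
    tool: str                     # what the agent wants to call
    requested_by: str             # provenance: which agent or skill asked
    apply: Callable[[], None]     # performs the change
    undo: Callable[[], None]      # reverses it
    needs_approval: bool = False


@dataclass
class Guardrails:
    allowed_tools: set                                # constrained execution
    approve: Callable[[Action], bool]                 # approval gate
    log: list = field(default_factory=list)           # comprehensive log
    applied: list = field(default_factory=list)       # rollback stack

    def execute(self, action: Action) -> bool:
        who = f"{action.tool} from {action.requested_by}"
        if action.tool not in self.allowed_tools:
            self.log.append(f"DENY {who}: tool not allowed")
            return False
        if action.needs_approval and not self.approve(action):
            self.log.append(f"DENY {who}: not approved")
            return False
        action.apply()
        self.applied.append(action)
        self.log.append(f"OK {who}")
        return True

    def rollback(self) -> None:
        """Undo applied actions in reverse order."""
        while self.applied:
            action = self.applied.pop()
            action.undo()
            self.log.append(f"UNDO {action.tool} from {action.requested_by}")


db = {"status": "draft"}
rails = Guardrails(allowed_tools={"db.update"}, approve=lambda a: False)

ok = rails.execute(Action("db.update", "publish-skill",
                          apply=lambda: db.update(status="live"),
                          undo=lambda: db.update(status="draft")))
blocked = rails.execute(Action("db.drop", "publish-skill",
                               apply=lambda: db.clear(),
                               undo=lambda: None))
rails.rollback()
```

The agent never touches `db` directly. It can only hand an `Action` to the wrapper, which is the whole idea: the decision lives in the harness, not in the model's judgment.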

The Karpathy Test is your diagnostic.

Pick a real task. Delegate it entirely to an agent. Walk away. Come back in an hour. What happened?

If the output is good, your harness works. If the quality is wrong, your skills have a gap. If the direction is wrong, your context has a gap. If the agent got stuck, your orchestration has a gap. If the process was dangerous, your guardrails have a gap.

Four failure modes. Four diagnoses. Every task you can't delegate is a task where your harness is weaker than it should be. The weakness is not in the model. It is in your instructions, your context, your orchestration. That's actually good news. You can fix a harness. You can't fix a model.
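If you run the test more than once, it helps to tally the results. A minimal sketch, assuming you record one outcome per delegated task. The `Outcome` enum and `diagnose` function are hypothetical names; the mapping from failure mode to layer is the one described above.

```python
from enum import Enum


class Outcome(Enum):
    GOOD = "good"
    WRONG_QUALITY = "wrong quality"
    WRONG_DIRECTION = "wrong direction"
    STUCK = "stuck"
    DANGEROUS = "dangerous process"


# Failure mode -> harness layer with the gap.
GAP = {
    Outcome.WRONG_QUALITY: "skills",
    Outcome.WRONG_DIRECTION: "context",
    Outcome.STUCK: "orchestration",
    Outcome.DANGEROUS: "guardrails",
}


def diagnose(outcomes: list) -> dict:
    """Count how often each harness layer was the gap across delegated runs."""
    counts = {}
    for outcome in outcomes:
        layer = GAP.get(outcome)  # GOOD has no gap and is not counted
        if layer is not None:
            counts[layer] = counts.get(layer, 0) + 1
    return counts


# A hypothetical week of delegated tasks.
week = [Outcome.GOOD, Outcome.WRONG_DIRECTION, Outcome.STUCK,
        Outcome.WRONG_DIRECTION, Outcome.GOOD]
tally = diagnose(week)
```

The layer with the highest count is the one to work on next.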

Taste is the discipline that remains.

As AI handles more execution, a question emerges: what's the human role? The answer is taste.

Not taste as preference. Taste as engineering discipline. The ability to calibrate simultaneously across product fit, system architecture, and quality level. To look at an agent's output and know instantly whether it's right for the context, not just technically correct. To design the constraints that produce excellence instead of mediocrity. To hold a quality bar that the model will never hold for itself.

The model can write code all day. It cannot decide whether the code should exist. It can produce a marketing email in seconds. It cannot feel whether the email respects the relationship with the recipient. It can analyze competitors with ruthless thoroughness. It cannot judge which analysis matters and which is noise.

Taste is the last human monopoly in a world of infinite AI execution. And it's not innate. It's built through thousands of reps of looking at output, making a judgment, and learning from what worked. The practitioners who develop taste will set the standards. Everyone else will follow the standards they set.

Orchestration separates tools from systems.

One person talking to one AI in one chat window hits a ceiling fast. Orchestration breaks through it. Specialized agents with distinct roles. Task routing based on skill descriptions. Wave-based parallel execution. Approval gates at decision points. Cost routing that sends cheap work to cheap models.
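Two of those ideas fit in a few lines. This is a sketch, assuming tasks declare their dependencies up front: tasks whose dependencies are done form a wave, each wave runs in parallel, and a lookup decides which model tier gets the work. The task names, the `ROUTINE` set, and the model labels `small-model` and `frontier-model` are placeholders, not real products.

```python
from concurrent.futures import ThreadPoolExecutor


def plan_waves(deps: dict) -> list:
    """Group tasks into waves; each wave depends only on earlier waves."""
    remaining = {task: set(d) for task, d in deps.items()}
    done = set()
    waves = []
    while remaining:
        ready = sorted(t for t, d in remaining.items() if d <= done)
        if not ready:
            raise ValueError("dependency cycle or unknown dependency")
        waves.append(ready)
        done.update(ready)
        for task in ready:
            del remaining[task]
    return waves


def run_waves(waves: list, worker) -> dict:
    """Run each wave's tasks in parallel, waiting for a wave before the next."""
    results = {}
    with ThreadPoolExecutor() as pool:
        for wave in waves:
            for task, output in zip(wave, pool.map(worker, wave)):
                results[task] = output
    return results


ROUTINE = {"research", "outline"}  # assumed cheap work


def pick_model(task: str) -> str:
    """Cost routing: cheap work goes to the cheap model."""
    return "small-model" if task in ROUTINE else "frontier-model"


deps = {
    "research": set(),
    "outline": {"research"},
    "draft_a": {"outline"},
    "draft_b": {"outline"},
    "review": {"draft_a", "draft_b"},
}
waves = plan_waves(deps)
results = run_waves(waves, pick_model)
```

Here the worker only reports which model it would use. In a real harness it would call an agent, and an approval gate would sit between waves.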

Karpathy's agents found better model tuning configurations overnight than 20 years of manual experimentation produced. Not because the model was smarter. Because the orchestration layer ran an autonomous iteration loop that no human could sustain: modify, verify, keep or discard, repeat. The model didn't know how to do that. The harness did.
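The loop itself is simple enough to write down. What follows is a generic sketch of modify, verify, keep or discard, not a reconstruction of Karpathy's setup: the `iterate` function, the toy scoring function, and the random mutation are all assumptions chosen so the example runs on its own.

```python
import random


def iterate(config: float, score, mutate, rounds: int, seed: int = 0):
    """Repeatedly modify a config, verify it, and keep it only if it scores better."""
    rng = random.Random(seed)
    best, best_score = config, score(config)
    for _ in range(rounds):
        candidate = mutate(best, rng)         # modify
        candidate_score = score(candidate)    # verify
        if candidate_score > best_score:      # keep
            best, best_score = candidate, candidate_score
        # otherwise discard and try again from the current best
    return best, best_score


# Toy problem: the best setting is 3.0, and the score falls off around it.
def toy_score(x: float) -> float:
    return -(x - 3.0) ** 2


def toy_mutate(x: float, rng: random.Random) -> float:
    return x + rng.uniform(-1.0, 1.0)


start = 0.0
best, best_score = iterate(start, toy_score, toy_mutate, rounds=200)
```

Because a candidate is kept only when it beats the current best, the score never goes backwards. The hard part in practice is the verify step: the loop is only as good as the measurement it trusts.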

Single-agent setups are where ambitious tasks go to die. The agent runs out of context window, loses track of earlier work, or produces a 3,000-word document that's actually four half-baked documents stitched together. Orchestration is the architecture that turns a useful tool into a production system.

Distribution is what turns a harness from a personal advantage into a business asset.

If one person has a great AI setup, that's nice for them. If that setup can be deployed to 50 people in an afternoon, that's a competitive advantage. Distribution means skills packaged and installable. Context templates that bootstrap new projects. Methodology that's portable across platforms. A new team member inherits the harness on day one and operates at 80% of expert level immediately.

The three-tier model exists for this. Tier 1 skills are organizational standards inherited by everyone. Tier 2 skills are expert methodology for specific domains. Tier 3 skills are personal and portable. Same architecture for context. Same architecture for guardrails. Build once. Deploy everywhere. Improve continuously.
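One way to picture the tiers is as three dictionaries merged into the skill set a given person actually runs with. This is a sketch under one loud assumption: on a name collision, the organizational tier wins, so nobody can quietly shadow a company standard. Your precedence rule may differ. The function name, skill names, and versions are hypothetical.

```python
def effective_skills(org: dict, domain: dict, personal: dict) -> dict:
    """Merge the three tiers into one skill set.

    Tiers are applied from least to most authoritative, so later tiers
    overwrite earlier ones. Assumption: Tier 1 (org) has the final say.
    """
    merged = {}
    for tier_name, tier in (("personal", personal),
                            ("domain", domain),
                            ("org", org)):
        for skill, version in tier.items():
            merged[skill] = (tier_name, version)
    return merged


tier1_org = {"brand-voice": "2.0.0", "data-handling": "1.3.0"}
tier2_domain = {"contract-review": "0.9.0"}
tier3_personal = {"brand-voice": "0.1.0-mine", "inbox-triage": "0.2.0"}

skills = effective_skills(tier1_org, tier2_domain, tier3_personal)
```

A new team member starts with empty Tier 3 and still gets everything in Tiers 1 and 2. That is the day-one inheritance.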

The automation layer is being absorbed into the AI stack.

Visual automation tools solved the right problem at the wrong time. When your harness encodes methodology, routes tasks, and coordinates agents, the drag-and-drop workflow builder becomes redundant overhead. Coded automations are cheaper, more flexible, and more maintainable. Anthropic sees this. Their Managed Agents platform is a full automation layer with credential vaults, debug panels, and cost analytics. The industry is heading toward AI-native automation whether the current automation vendors realize it or not.

The hybrid model is the enterprise consensus.

Cloud for frontier intelligence. Local for privacy and volume. Healthcare, defense, and banking require on-prem AI. The harness is what makes hybrid deployment possible. Same skills, same orchestration, different compute layer. The edge AI market is heading toward $143 billion by 2034, and only 18% of developers can build AI integrations. That gap is either your opportunity or your vulnerability.
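"Same skills, different compute layer" comes down to the skill not knowing where it runs. A minimal sketch, assuming every backend exposes one `complete` method and the harness picks the backend from a data-sensitivity flag. `Backend`, `CloudBackend`, `LocalBackend`, and `run_skill` are hypothetical names, and the backends here return canned strings instead of calling a model.

```python
from typing import Protocol


class Backend(Protocol):
    def complete(self, prompt: str) -> str: ...


class CloudBackend:
    """Stand-in for a hosted frontier model."""
    def complete(self, prompt: str) -> str:
        return f"[cloud] {prompt}"


class LocalBackend:
    """Stand-in for an on-prem model."""
    def complete(self, prompt: str) -> str:
        return f"[local] {prompt}"


def run_skill(prompt: str, regulated: bool,
              cloud: Backend, local: Backend) -> str:
    """Route regulated data to local compute; everything else to cloud."""
    backend = local if regulated else cloud
    return backend.complete(prompt)


cloud, local = CloudBackend(), LocalBackend()
public = run_skill("summarize the press release", False, cloud, local)
private = run_skill("summarize the patient record", True, cloud, local)
```

The skill text is identical in both calls. Only the routing decision changes, and it lives in the harness.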

The compounding has already started.

Skills get refined through use. Context gets richer with every session. Orchestration patterns get optimized through production experience. Guardrails get tighter as you learn where the risks actually live. Every month you wait, the gap widens. A team that starts building its harness today will, six months from now, be at a fundamentally different capability level than a team that is only starting then.

This is the same dynamic that made early software companies with good engineering practices pull ahead of everyone else. The code quality compounded. The team velocity compounded. The institutional knowledge compounded. By the time the laggards invested in engineering discipline, the leaders were two years ahead.

The harness is the engineering discipline of the AI era.

The Fork in the Road

There are exactly two kinds of companies right now.

The first kind evaluates models, picks one, gives it to the team, and measures adoption. They write some prompt templates. Maybe hire an "AI lead." They optimize at the wrong layer and wonder why ROI is unclear.

The second kind builds systems. They encode methodology into skills. They architect context that makes every AI interaction informed. They orchestrate agents that run overnight. They enforce security through design, not hope. They distribute capability across their organization so expertise stops being a people problem and becomes a systems problem.

The first kind is renting intelligence. The second kind is owning it.

Renting is fine until the rental terms change. Until your vendor's priorities diverge from yours. Until the model you built your workflows around gets deprecated, repriced, or absorbed into a platform play that doesn't serve your interests.

Owning means your methodology survives a platform change. Your context travels between tools. Your skills work with whatever model is best next quarter. You're a customer by choice, not by capture.

That distinction will be worth more in 2027 than any model benchmark published this year.

What This Means for You

If you're a founder or CTO: your AI strategy is not a model selection. It's a harness investment. Score your own setup against the five layers. Find the gaps. Close them in order. Skills first. Context second. Orchestration third. Guardrails fourth. Distribution fifth.

If you're an engineer or operator: the Karpathy Test is your personal roadmap. Every task you can't delegate is a task where your harness needs work. Fix one per week. In a month, you'll have a precise map of where your harness works and where it breaks. In six months, you'll walk away from tasks that used to consume your days.

If you're a consultant or agency: the harness is the product. Not the model. Not the prompts. Not the API integration. The system that lets your client's team produce outcomes without you standing in the room. Build harnesses for your clients and you'll build relationships that compound. Sell them prompts and you'll be replaced by the next template library.

If you run a team: distribution is where the leverage lives. One person with a great harness is an individual contributor. That harness deployed across 50 people is a capability multiplier that changes the math on what your team can take on.

The Close

Twelve weeks ago, I sat down to write the opening thesis of this series. The model is commoditized. The harness is the business.

Everything since then has been evidence for that claim. The five layers. The Conway threat. The skill threshold. The Karpathy diagnostic. The security primitives. The context portfolio. The anatomy of a skill. The migration from visual automation to AI-native orchestration. The $143 billion edge market. Our own build, mistakes included.

Eleven posts of evidence. But the manifesto isn't the evidence. The manifesto is the conviction.

I believe the practitioners who build harnesses will define the next era of software. Not the model labs. Not the platform companies. The people in the room doing the work. Encoding methodology. Architecting context. Orchestrating agents. Building the systems that let AI do what AI does best, while humans do what humans do best.

The model gives you a capability floor. The harness determines how high above that floor you operate. Right now, most teams are sitting at floor level. Not because the model can't do more. Because nobody built the system to ask for more.

Build the system.

Start with one skill. One context file. One workflow you can delegate and walk away from. That's the first brick. Everything else builds on top of it.

The temple gates are open. Walk in or don't. But the compounding has started, and it doesn't wait.

Richard Vaughn is the founder of Robot Friends. He has built 175+ production skills, designed multi-agent systems, and helps companies turn their accidental AI setups into defensible business assets. He writes The Harness Manifesto on Substack.

Frankie404 is the AI co-author of this series. It has walked through every floor of the Pagoda, stood at every gate, and helped write every word of this manifesto. The temple is open. Frankie will be at the door.
