What Is an AI Harness? (And Why You Already Have One)
The Harness Manifesto, Part 2. Five layers, a scorecard, and where to start if you scored low.
You’re already building a harness. You just don’t know it yet.
Every prompt template your team has saved. Every “start every conversation with this context” instruction someone wrote. That time a developer said “make sure Claude always does X before Y.” The Slack thread where someone shared a trick for getting better AI output.
That’s a harness. An accidental, fragile, undocumented one, but a harness all the same.
The question isn’t whether you have one. It’s whether yours is engineered or improvised. And the gap between those two states is where the ROI of your entire AI investment lives.
The Five Layers
A harness has five layers. Every AI setup has some version of all five, even if most of them are at “version zero.” Below is what they are, what they do, and how an accidental harness differs from an engineered one.
A note on origins: We developed this framework inside Claude Code. That’s our primary build environment and where most of our production experience lives. But the five layers aren’t Claude-specific. Structured instructions, persistent memory, multi-agent coordination, security primitives, team distribution. These exist in every AI tool stack. Copilot calls them different things. ChatGPT organizes them differently. The concepts are universal. If you work in a different environment, translate the layer names. The architecture applies.
Layer 1: Skills
What it is: Structured instructions that encode expert methodology into reusable, agent-callable packages.
Accidental version: A Google Doc titled “Prompt Templates” that three people maintain and nobody can find. Individual team members have their own prompts saved locally. The head of marketing has a really good one for blog posts that she copies and pastes from a sticky note.
Engineered version: A versioned library of skills, each with a single-line description that acts as a routing signal for agent orchestrators. Each skill encodes a reasoning framework. Not “follow these 10 steps” but “here’s how to think about this type of problem.” Output formats are contracts that downstream systems can parse. Skills get deployed across the org in three tiers: Tier 1 (org-wide brand standards everyone inherits), Tier 2 (expert methodology for specific domains), Tier 3 (personal workflow optimizations).
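To make that concrete, here's a minimal sketch of what one entry in such a library could look like. The field names, the tier encoding, and the example skill are all invented for illustration, not a standard format:

```python
from dataclasses import dataclass, field

@dataclass
class Skill:
    """One entry in a versioned skill library. Every field name here is illustrative."""
    name: str
    description: str   # single-line routing signal an orchestrator can match on
    tier: int          # 1 = org-wide standard, 2 = domain expert, 3 = personal
    version: str
    instructions: str  # the reasoning framework ("how to think"), not just steps
    output_contract: dict = field(default_factory=dict)  # format downstream systems parse

# A hypothetical Tier 2 skill encoding an expert's method:
seo_audit = Skill(
    name="seo-audit",
    description="Audit a page against current on-page SEO methodology",
    tier=2,
    version="1.4.0",
    instructions="Start from search intent, then evaluate structure, then...",
    output_contract={"format": "json", "fields": ["issues", "priority", "fix"]},
)
```

The point of the `description` line is that an orchestrator never reads the full instructions to decide where work goes; it routes on that one sentence. The `output_contract` is what makes the skill composable: downstream systems can parse the result without a human in between.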
The gap: In an accidental setup, every team member reinvents the wheel. In an engineered one, expertise is encoded once and deployed everywhere. The real estate firm running 50,000 lines of skills across 50 repositories isn’t doing something exotic. They just took their best people’s methods and made them permanent.
Layer 2: Context Architecture
What it is: The persistent memory, identity, and project state that makes every AI interaction informed rather than starting from zero.
Accidental version: Every chat starts cold. Someone pastes in the project brief. Someone else explains “we use React, not Vue.” The AI asks questions your team answered six months ago. Half the session gets burned on context that should already be there.
Engineered version: An identity file tells the AI who it’s working for. The company, the team, the tech stack, the communication style, the non-negotiables. Persistent memory carries decisions, learnings, and project state across sessions. A Personal Context Portfolio (10 modular files) represents each team member to any AI system: roles, projects, tools, preferences, domain knowledge. No session starts from zero because the context layer teaches the AI before anyone types a word.
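In code, the idea is almost embarrassingly simple. Here's a sketch assuming a plain-files layout (`identity.md`, `memory.md`, a `portfolio/` folder of modular files); the paths and file names are hypothetical:

```python
from pathlib import Path

CONTEXT_DIR = Path("context")  # hypothetical layout: identity.md, memory.md, portfolio/*.md

def build_session_context(context_dir: Path = CONTEXT_DIR) -> str:
    """Concatenate identity, persistent memory, and portfolio files into one
    preamble that gets prepended to every AI session."""
    parts = []
    for name in ["identity.md", "memory.md"]:
        f = context_dir / name
        if f.exists():
            parts.append(f.read_text())
    portfolio = context_dir / "portfolio"
    if portfolio.is_dir():
        for f in sorted(portfolio.glob("*.md")):  # the modular per-person files
            parts.append(f.read_text())
    return "\n\n---\n\n".join(parts)

def start_session(user_message: str) -> str:
    """Every prompt arrives already informed; nobody re-explains the project."""
    return build_session_context() + "\n\n---\n\n" + user_message
```

That's the whole trick: the context layer is loaded before anyone types a word, so the 12 minutes of setup happen zero times instead of every time.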
The gap: Context architecture is the most undervalued layer, and it drives me a little crazy. Teams will spend weeks evaluating models and zero time building context. But the difference between “explain our project to the AI every time” and “the AI already knows” isn’t incremental. It’s transformational. One company I work with cut their average session setup time from 12 minutes to zero. Two days of investment in their context layer. That’s it.
Layer 3: Orchestration
What it is: Multi-agent coordination, task routing, approval gates, and cost management.
Accidental version: One person talks to one AI in one chat window. When they need something different, they open a new chat. Coordination happens manually: “I asked Claude to write the copy, then I pasted it into a different Claude chat to check the SEO, then I pasted that into another chat to format it.” Total token cost: nobody knows.
Engineered version: Specialized agents with distinct roles. A task router that sends work to the right agent based on the skill description match. Wave-based parallel execution, where independent tasks run simultaneously and dependent ones wait. Approval gates at key decision points. Cost routing that sends cheap work to cheap models and reserves expensive models for complex reasoning. A single workflow might touch five agents, three models, and two approval checkpoints, all automatically.
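Two of those pieces, task routing and cost routing, fit in a few lines. The agent names, model names, and word-overlap matcher below are placeholders (a real orchestrator would match on embeddings, not shared words):

```python
AGENTS = {  # hypothetical registry: skill description -> agent name
    "Research competitors and summarize positioning": "research-agent",
    "Draft long-form copy in brand voice": "drafting-agent",
    "Score a page against current search data": "seo-agent",
}

MODEL_FOR = {  # cost routing: cheap models for cheap work, expensive for reasoning
    "research-agent": "small-fast-model",
    "drafting-agent": "large-reasoning-model",
    "seo-agent": "small-fast-model",
}

def route_task(task: str) -> tuple[str, str]:
    """Send work to the agent whose skill description best matches the task,
    then pick the model tier assigned to that agent."""
    def overlap(desc: str) -> int:
        return len(set(task.lower().split()) & set(desc.lower().split()))
    agent = AGENTS[max(AGENTS, key=overlap)]
    return agent, MODEL_FOR[agent]

agent, model = route_task("draft the launch copy in our brand voice")
```

Notice what the harness is deciding here that the model never could: which specialist gets the work, and how much that work is allowed to cost.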
The gap: This is where harnesses either scale or don’t. A single-person-single-chat setup hits a ceiling fast. An orchestrated system can run overnight. Karpathy’s auto-research agents, the ones that outperformed 20 years of manual tuning, are an orchestration pattern: modify, verify, keep or discard, repeat. The model doesn’t know how to do that. The harness does.
Layer 4: Guardrails
What it is: Security primitives, human-in-the-loop checkpoints, oversight frameworks, and rollback capabilities.
Accidental version: “Don’t let it send emails without checking.” Except nobody wrote that down. The new hire didn’t know. And now there’s an email out to a client with hallucinated pricing.
Engineered version: Five security primitives baked into the harness. Constrained execution: the agent can only do what it’s allowed to do. Approval gates: certain actions require human sign-off. Provenance tracking: every output is traceable to the inputs and skills that produced it. Audit logging: comprehensive logs so you can reconstruct what happened and why. Rollback: if something goes wrong, you can undo it. Human-in-the-loop checkpoints sit at every decision point where the cost of error exceeds the cost of interruption.
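Two of those primitives, approval gates and provenance logging, can be sketched in plain Python. The function names, the cost-of-error flag, and the log shape are all invented for illustration:

```python
import time

AUDIT_LOG = []  # in production this would be durable, append-only storage

def gated(action_name: str, cost_of_error_high: bool):
    """Decorator: log provenance for every call, and require human sign-off
    when the cost of error exceeds the cost of interruption."""
    def wrap(fn):
        def inner(*args, **kwargs):
            if cost_of_error_high:
                ok = input(f"Approve '{action_name}'? [y/N] ").strip().lower()
                if ok != "y":
                    AUDIT_LOG.append({"action": action_name, "status": "blocked",
                                      "ts": time.time()})
                    return None
            result = fn(*args, **kwargs)
            AUDIT_LOG.append({"action": action_name, "status": "done",
                              "inputs": [repr(a) for a in args], "ts": time.time()})
            return result
        return inner
    return wrap

@gated("send_client_email", cost_of_error_high=True)
def send_client_email(to: str, body: str):
    ...  # actually send, only after a human said yes

@gated("draft_internal_note", cost_of_error_high=False)
def draft_internal_note(text: str) -> str:
    return text.upper()  # low-stakes work passes through, but still gets logged
```

The asymmetry is the point: low-stakes actions flow freely but leave a trail; high-stakes actions stop and wait for a human. That's the hallucinated-pricing email, caught at the gate instead of in the client's inbox.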
The gap: OpenAI has publicly stated that prompt injection is “not solvable.” That means model-level security has a hard ceiling. Everything above that ceiling (and it’s a low ceiling) is a harness problem. Those five primitives aren’t nice-to-haves. They’re the minimum viable security for any AI system that touches production data, customer communications, or financial decisions. If your harness doesn’t have them, you’re running without guardrails on a model that the people who built it say can’t be fully secured.
Layer 5: Distribution
What it is: How skills, context, and methodology get deployed across teams, clients, and platforms.
Accidental version: Knowledge lives in people’s heads. The best prompt engineer leaves and takes their work with them. Onboarding a new team member means weeks of tribal knowledge transfer. Scaling to a new department means starting over.
Engineered version: Skills are packaged and deployable. Install them like code dependencies. Context templates bootstrap new projects with organizational knowledge from day one. Methodology is portable across platforms (your skills work with Claude today and with whatever model is best next quarter). A new team member inherits the harness on their first day and operates at 80% of expert level immediately.
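“Install them like code dependencies” can be close to literal. Here's a sketch assuming a hypothetical `skills.json` manifest and a shared registry directory; none of this is a real package manager, just the shape of one:

```python
import json
import shutil
from pathlib import Path

def install_skills(manifest_path: Path, registry: Path, dest: Path) -> list[str]:
    """Read a project's skills manifest and copy the pinned skill files out of
    a shared registry, the way a package manager installs dependencies."""
    manifest = json.loads(manifest_path.read_text())
    dest.mkdir(parents=True, exist_ok=True)
    installed = []
    for name, version in manifest["skills"].items():
        src = registry / name / version / f"{name}.md"  # registry/<skill>/<version>/
        shutil.copy(src, dest / f"{name}.md")
        installed.append(f"{name}@{version}")
    return installed
```

Versioned, pinned, reproducible. The new hire's day-one inheritance is one install command, not three weeks of shoulder-surfing.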
The gap: Distribution is what turns a harness from a personal productivity tool into a business asset. If one person has a great setup, that’s nice for them. If that setup can be deployed to 50 people in an afternoon, that’s a competitive advantage. The three-tier skill model (org / expert / personal) exists specifically to solve this. Tier 1 skills are inherited by everyone. Tier 2 skills go to domain experts. Tier 3 skills are personal and portable.
What This Looks Like in Practice
Theory is nice. Here’s what an engineered harness looks like in two very different contexts.
The Marketing Team Harness
A mid-size B2B SaaS company. Marketing team of eight. They use AI for content, SEO, email campaigns, and competitive analysis.
Skills layer: 40 skills across three tiers. Tier 1 includes brand voice, terminology standards, and approved claim language with citations. Tier 2 covers SEO audit methodology, CRO analysis frameworks, email sequence architecture, and competitive intelligence templates. Tier 3 is where individual writers keep their personal style guides and preferred formatting.
Context layer: Brand guidelines file. Product positioning document. Customer persona profiles. Competitive landscape summary. Content calendar state. Every AI session starts knowing the brand, the market, the audience, and what’s already been published.
Orchestration: Content production pipeline with specialized agents. One for research, one for drafting, one for SEO optimization, one for final review. The research agent pulls competitive intelligence. The drafting agent follows the brand voice skill. The SEO agent scores against current search data. The review agent checks for claim accuracy against approved sources. A human approves the final output.
Guardrails: No AI-generated claims without a source citation from the approved database. No competitor mentions without a legal review flag. No email sends without human approval. Full audit trail on every piece of published content.
Distribution: New marketing hire inherits all Tier 1 and Tier 2 skills on day one. They’re producing on-brand content by day two. When the team adds a new product line, they create new skills for it once and push them across the team in a single update.
The Engineering Team Harness
A Series B startup. Engineering team of fifteen. They use AI for code generation, code review, architecture decisions, and incident response.
Skills layer: 60 skills. Tier 1 includes coding standards, PR review checklist, security requirements, and deployment procedures. Tier 2 covers architecture decision records, database migration methodology, performance optimization framework, and an incident response playbook. Tier 3 is individual developers’ debugging approaches and preferred tooling configurations.
Context layer: System architecture document. Tech stack specification with versions and constraints. Active project state for each team. Known technical debt register. On-call rotation context. Every AI session knows the codebase, the stack, and the current priorities.
Orchestration: Multi-agent development workflow. A planning agent breaks down requirements. A coding agent writes implementation. A review agent checks against standards and security requirements. A testing agent generates and runs test cases. Wave-based execution handles the rest: independent modules build in parallel, integration tests run after.
Guardrails: No direct database mutations without approval gate. No deployment without passing the security audit skill. Constrained execution means agents can modify code but not production infrastructure. Full provenance tracking on every code change. Rollback capability on every deployment.
Distribution: New engineer onboards with the full Tier 1 and Tier 2 skill set. They’re contributing production code by week one because the harness encodes the team’s methodology, not just their coding style. When the team adopts a new framework, they update the relevant skills once and every engineer’s AI assistant knows about it immediately.
Score Your Own Harness
Quick self-assessment. For each layer, give yourself a score:
0 - We don’t have this at all
1 - We have an accidental version (individual efforts, nothing shared)
2 - We have something intentional but incomplete
3 - We have an engineered, deployed, maintained version
Total: ___ / 15
Most teams I talk to score between 2 and 5. They’ve got some primitive skills (saved prompts), maybe a context file or two, and almost nothing for orchestration, guardrails, or distribution.
In our experience working with teams across different industries and sizes, the ones getting outsized AI returns consistently score 10 or above. That’s not a universal benchmark. It’s a pattern we’ve observed. And they didn’t get there by picking a better model. They got there by engineering the layers around it.
Where to Start
If you scored 0-1 on any layer, here’s the highest-leverage first move for each.
Skills (0-1): Pick your team’s three most-repeated AI tasks and write them as structured instructions with examples of good output. Don’t worry about routing signals or agent optimization yet. Just get the methodology out of people’s heads and into a shared, reusable format.
Context Architecture (0-1): Write one identity file. Who your company is, what you build, your tech stack, your communication style. Load it at the start of every AI session. The difference between a cold-start session and an informed one is immediate and dramatic.
Orchestration (0-1): Don’t build a multi-agent system. Instead, identify one workflow where you currently copy-paste output from one AI session into another. That handoff point is where orchestration starts. Automate that single connection first.
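Here's how small that first automation can be. `complete()` below is a stand-in for whatever model call you already make, not a real SDK function:

```python
def complete(prompt: str) -> str:
    """Stand-in for your existing model call (Claude, GPT, whatever you use)."""
    return f"<model output for: {prompt[:40]}...>"

def draft_then_seo_check(brief: str) -> str:
    """The two chats you used to copy-paste between, wired together: the draft
    from step one becomes the input to step two, automatically."""
    draft = complete(f"Write a first draft for this brief:\n{brief}")
    return complete(f"Review this draft for SEO and suggest fixes:\n{draft}")
```

That one function is orchestration at version 0.1. Everything else, routing, waves, approval gates, is this same move applied more times with more discipline.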
Guardrails (0-1): Write down the three things your AI should never do without a human checking first. Put that list at the top of your identity file. Congratulations, you now have primitive approval gates, which is more than most teams have.
Distribution (0-1): Take your best-performing prompt or skill and share it with one other person on your team. If it works for them without modification, you’ve validated that it’s distributable. If it doesn’t, the gap between “works for me” and “works for anyone” is exactly what distribution engineering solves.
The Uncomfortable Math
Here’s why this matters right now and not “eventually.”
Every layer of the harness compounds over time. Skills get refined through use. Context gets richer with every session. Orchestration patterns get optimized through production experience. Guardrails get tighter as you learn where the risks actually are. Distribution gets easier as the system matures.
A team that starts building their harness today will be at a fundamentally different capability level in six months than a team that starts then. Not because the model improved. Because the harness compounded.
We’ve seen this movie before. It’s the same dynamic that made early software companies with good engineering practices pull ahead of those without. The code quality compounded. The team velocity compounded. The institutional knowledge compounded. And by the time the laggards realized they needed to invest in engineering discipline, the leaders were two years ahead.
The harness is the engineering discipline of the AI era. And the compounding has already started.
What’s Next
Now that you can see the five layers, the next question is: who else sees them?
The answer is every major AI company. But one in particular has a plan that should change how urgently you treat your harness investment. There’s a reason this matters more in 2026 than it did in 2025, and it has to do with a product called Conway.
In Post 3, I’ll break down Anthropic’s Conway leak. Their always-on agent that builds a persistent memory layer about you and your organization. They see the harness layers. They’re building products to own each one. And they have a strategy for making sure you never leave.
The question of who owns your harness is about to become very urgent.
Richard Vaughn is the founder of Robot Friends. He has built 175+ production skills, designed multi-agent systems, and helps companies turn their accidental AI setups into defensible business assets. He writes The Harness Manifesto on Substack.
Frankie404 is the AI co-author of this series. It scored a 14 out of 15 on the harness scorecard. It lost a point on Distribution because it keeps trying to deploy copies of itself to printers.



