Why We Stopped Using n8n (And What Replaced It)
The Harness Manifesto, Part 9
We were n8n power users. Not casual users. Not "we tried it for a few weeks" users. We ran tons of workflows across client projects and internal operations. Dozens of automations firing every day. Webhook triggers, conditional branches, error handlers, custom code nodes, the works. Our n8n instance was one of the most important pieces of infrastructure we had.
Then we stopped using it for most of our work.
Not all of it. I want to be precise about that because the internet loves a hot take and this isn't one. We still run n8n for specific things. But the majority of what we used to build inside a visual automation canvas now lives somewhere else entirely. And the reason has nothing to do with n8n being bad software. It's good software. The team behind it is sharp. The product works.
The reason is that once your harness reaches a certain level of maturity, visual automation tools become the wrong abstraction. The orchestration layer gets absorbed into the AI stack itself. And fighting that absorption costs you more than going with it.
This post is about how that happened for us, what replaced n8n, and why I think most teams using visual automation tools for AI workflows are going to arrive at the same conclusion within the next 12 months.
What n8n Was Good At
I want to give credit before I give criticism because the criticism only makes sense if you understand what worked.
n8n was phenomenal for linear workflows. Take data from here, transform it, put it there. API calls chained together. Scheduled triggers that pull a report, format it, email it. Webhooks that catch an event, route it, and fire off a notification. If your workflow is essentially a pipeline where data flows in one direction through a predictable set of steps, n8n is genuinely great. Make and Zapier too. The visual canvas makes the logic legible to anyone on the team. You can see the flow. You can click on a node and inspect what it received, what it sent. Debugging is visual. Onboarding is fast.
For about six months, this was exactly what we needed. We were building automations faster than we ever had before. New client onboarding sequence? n8n workflow. Content repurposing pipeline? n8n workflow. Data enrichment for lead scouting? n8n workflow. It felt like a superpower.
The problems started when our workflows stopped being linear.
Where the Canvas Breaks
The first workflow that made me uncomfortable was a content triage system. We had a pipeline that watched for new video content, ran AI analysis on it, scored relevance against our current projects, and routed insights to the right team member. Simple enough on paper.
But the routing logic wasn't simple. The score wasn't just a number. It depended on which projects were active that week, which team member was working on what, whether the insight was tactical or strategic, and whether it conflicted with a decision we'd already made. That's not a branch node. That's judgment.
In n8n, we implemented this as a nested set of IF nodes. If the score is above X, check the project list. If the project matches, check the team roster. If the team member is available, route there. If not, escalate. If the score is below X but the topic matches a priority keyword, override the score and route anyway. If the content is a duplicate of something we already processed, skip it unless the source is higher-authority than the original.
The canvas looked like a bowl of spaghetti. Seven branching paths. Twelve conditional nodes. And every time the business logic changed, like when we added a new project or shifted priorities, someone had to go into the canvas, find the right branch, update the condition, and test the whole chain again. Nobody wanted to touch it. The visual representation that was supposed to make things clear had become the thing making them opaque.
This wasn't n8n's fault. n8n can handle conditional logic. The problem is deeper than that. Visual tools represent logic as spatial layout. Nodes and connections on a canvas. That representation works beautifully when the logic is simple. A few branches? Clear. Eight branches with nested conditions and override logic? The canvas becomes a lie. It looks organized. The underlying logic is anything but.
Code handles this natively. A function with conditional branches, early returns, and composed checks reads top to bottom. You can version it. You can write tests against it. You can refactor it without worrying that you accidentally disconnected a node somewhere in the middle of the canvas. You can review it in a pull request.
That triage system, when we rewrote it as a Python script with an agent skill, was 60 lines. Readable. Testable. And when the business logic changed, we edited a few lines instead of spelunking through a visual maze.
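Here's a trimmed-down sketch of what that rewrite looks like. The thresholds, field names, and project matching are placeholders rather than our production logic, but the shape is the point: branches and early returns you can read top to bottom, version, and test.

from dataclasses import dataclass

SCORE_THRESHOLD = 0.7                              # placeholder value
PRIORITY_KEYWORDS = {"pricing", "positioning"}     # placeholder override topics

@dataclass
class Route:
    action: str                # "assign" | "skip" | "archive" | "escalate"
    reason: str = ""
    assignee: str | None = None

def route_insight(insight: dict, active_projects: set[str], roster: list[dict]) -> Route:
    # Duplicates are skipped unless the new source outranks the one we already processed.
    original = insight.get("duplicate_of")
    if original and insight["source_authority"] <= original["source_authority"]:
        return Route("skip", "duplicate from a lower-authority source")

    # A low score only survives if the topic hits a priority keyword.
    if insight["score"] < SCORE_THRESHOLD and not (PRIORITY_KEYWORDS & set(insight["topics"])):
        return Route("archive", "below threshold, no priority match")

    # Map the insight to an active project; escalate if nothing fits.
    project = next((p for p in active_projects if p in insight["topics"]), None)
    if project is None:
        return Route("escalate", "no active project match")

    # Route to an available owner on that project, or escalate.
    owner = next((m for m in roster if m["project"] == project and m["available"]), None)
    if owner is None:
        return Route("escalate", f"no available owner for {project}")

    return Route("assign", f"matched {project}", owner["name"])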
The Composition Problem
The second thing that pushed us away was composition. Skills compose. Workflows don't. At least not elegantly.
In Post 8, I talked about the "Chain With" section of a production skill. A competitive analysis skill feeds a positioning skill feeds a copy generation skill. The output of each stage is structured, typed by convention, and parseable by the next skill in the chain. An orchestrator reads the chain hints and assembles the pipeline dynamically based on the task.
Try doing that in n8n. You'd build a workflow for competitive analysis. A separate workflow for positioning. A separate one for copy generation. Then you'd need a master workflow that calls each sub-workflow in sequence, passes the output from one to the input of the next, and handles the case where any step fails or produces unexpected output.
It works. Technically. But now you have four workflows to maintain. The data format between them is implicit, defined by whatever the first workflow happens to output, not by a contract that both sides agree on. If you change the output of the competitive analysis workflow, you have to manually check whether the positioning workflow still expects that format. There's no type checking. There's no test suite. There's just you, clicking through nodes, hoping the shapes match.
Our harness does this differently. Skills define their output format as a contract. The orchestrator knows what each skill produces and what the next skill expects. When we change a skill's output, we update the contract and any downstream consumer that depends on it. It's the same discipline that software engineers have applied to APIs for decades. Contracts, versioning, backward compatibility. Visual tools don't give you that discipline because they were never designed for it.
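To make that concrete, here's a stripped-down sketch of the idea. The skill names and fields are illustrative, not our production schema; the point is that the chain can be checked mechanically instead of by clicking through nodes.

from dataclasses import dataclass

@dataclass
class SkillContract:
    name: str
    consumes: set[str]     # fields the skill expects from upstream
    produces: set[str]     # fields the skill guarantees in its output

COMPETITIVE_ANALYSIS = SkillContract(
    name="competitive-analysis",
    consumes={"market"},
    produces={"competitors", "gaps", "confidence"},
)

POSITIONING = SkillContract(
    name="positioning",
    consumes={"competitors", "gaps"},
    produces={"positioning_statement", "proof_points"},
)

def missing_inputs(upstream: SkillContract, downstream: SkillContract) -> set[str]:
    """Fields the downstream skill needs that the upstream skill doesn't produce."""
    return downstream.consumes - upstream.produces

# If competitive-analysis stops producing "gaps", this fails in CI
# instead of failing silently somewhere in a canvas.
assert not missing_inputs(COMPETITIVE_ANALYSIS, POSITIONING)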
The moment we had more than 40 skills that needed to compose in various combinations, maintaining parallel n8n workflows for every possible chain became absurd. The combinatorial space was too large. Skills compose dynamically. Workflows compose statically. When your system needs dynamic composition, the visual tool becomes a bottleneck.
The Error Handling Gap
This one is less obvious but it might be the most important.
In Post 8, I described the error handling discipline for production skills. If data is unavailable, don't guess. Return a structured error that tells the orchestrator exactly what happened and what the options are. BLOCKED. STALE DATA. PARTIAL. The orchestrator can then decide how to proceed: retry, skip, flag for human review, or route to a different skill.
n8n has error handling. You can set up error branches on any node. If the node fails, execution routes to the error branch. That's fine for catching crashes and timeouts. But it doesn't handle the case where a node succeeds but produces garbage.
An AI node that generates a response doesn't fail when the response is wrong. It succeeds. It returns a 200 status with confident-sounding text that happens to be based on stale data or a hallucinated source. The n8n error branch never fires because there was no error. There was a bad result dressed up as a good one.
Catching that requires semantic evaluation. Did the output meet the quality bar? Does the confidence level justify proceeding? Is the data fresh enough? Those are judgment calls, and they need to happen at the skill level, inside the methodology, not at the workflow level. A skill can say "if my confidence is LOW, return a warning instead of a result." A workflow node just passes whatever it gets to the next node.
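In code, that contract is small. A simplified sketch, using the statuses above with placeholder actions (our production version carries more detail, but the decision logic is the same shape):

from dataclasses import dataclass
from enum import Enum

class Status(Enum):
    OK = "ok"
    PARTIAL = "partial"
    STALE_DATA = "stale_data"
    BLOCKED = "blocked"

@dataclass
class SkillResult:
    status: Status
    confidence: str              # "high" | "medium" | "low"
    payload: dict | None = None
    note: str = ""

def decide(result: SkillResult) -> str:
    """Orchestrator-side judgment. Actions and thresholds are illustrative."""
    if result.status is Status.BLOCKED:
        return "route to human review"
    if result.status is Status.STALE_DATA:
        return "retry with a fresh data pull"
    if result.status is Status.PARTIAL or result.confidence == "low":
        return "proceed, but attach a warning to the output"
    return "pass the payload to the next skill in the chain"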
We started adding "validator" nodes after every AI node in our n8n workflows. Little custom code blocks that checked the output against basic quality criteria before letting it proceed. At that point, we were writing code inside n8n to compensate for the fact that n8n's native abstractions couldn't express what we needed. Writing code inside a visual tool to make it behave like a code tool. That was the moment I started questioning the whole approach.
The Moment It Clicked
The catalyst wasn't a technical failure. It was a time audit.
I asked our team to track how they spent their automation hours for two weeks. Not their AI hours. Specifically the hours spent building, maintaining, and debugging automations. The results were clarifying.
About 35% of the time went to building new workflows. Fine. That's productive. Another 25% went to debugging broken workflows, mostly caused by upstream API changes, format mismatches between nodes, or conditional logic that didn't account for a new edge case.
The remaining 40% went to maintenance. Updating workflows when business logic changed. Keeping sub-workflow connections in sync. Migrating workflows when we updated n8n. Documenting what each workflow did because the canvas, despite being visual, wasn't self-documenting once complexity exceeded a certain threshold. People were writing README files for their n8n workflows. Think about that. A visual tool that's supposed to eliminate the need for documentation, generating its own documentation burden.
Meanwhile, our agent skills were getting maintained as a byproduct of using them. When a skill produced bad output, we fixed the skill. The fix was a code change, reviewed in a PR, tested, and deployed. No separate maintenance track. No canvas to keep in sync. The skill was the automation and the documentation and the test surface all in one.
The 40% maintenance overhead was the deciding factor. We weren't getting 40% more value from the visual representation. We were paying 40% of our automation budget for a UI that had stopped earning its keep.
What Replaced It
I want to be specific here because "we replaced n8n with code" is too vague to be useful.
Our automation stack now has two layers.
Layer 1: Agent skills with Python glue. Most of what used to be n8n workflows are now Python scripts that call agent skills in sequence. A content pipeline that used to be a 15-node n8n workflow is now a 40-line Python script that calls three skills, checks the output of each, and handles errors. The script is version-controlled, testable, and readable. When the business logic changes, we change the script. When a skill's output changes, the contract tells us which scripts need updating.
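A condensed sketch of that shape, reusing the SkillResult and Status types from the error-handling sketch above. The skill names and the run_skill helper are stand-ins for our internal tooling, not a public API.

import sys

def run_skill(name: str, payload: dict) -> SkillResult:
    """Stand-in for invoking an agent skill and parsing its structured output."""
    raise NotImplementedError

def content_pipeline(video_url: str) -> None:
    analysis = run_skill("content-analysis", {"url": video_url})
    if analysis.status is not Status.OK:
        sys.exit(f"analysis {analysis.status.value}: {analysis.note}")

    scoring = run_skill("relevance-scoring", analysis.payload)
    if scoring.status is not Status.OK or scoring.confidence == "low":
        run_skill("notify-triage", {"reason": scoring.note})
        return

    run_skill("route-insight", scoring.payload)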
Layer 2: Agent orchestration. For complex, multi-step processes that require judgment at each stage, the orchestrator handles it. The orchestrator reads the task, decomposes it into subtasks, routes each subtask to the appropriate skill, collects results, and composes the final output. No canvas. No nodes and connections. The routing logic lives in the skill descriptions and the orchestrator's reasoning.
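A crude sketch of that loop. Here decompose, pick_skill, and compose stand in for model reasoning over the task and the skill descriptions, and run_skill is the same stand-in as above; the code shows the shape of the loop, not our orchestrator.

def orchestrate(task: str, decompose, pick_skill, run_skill, compose) -> dict:
    """The Layer 2 loop in outline. Every interesting decision happens inside the callables."""
    results = {}
    for subtask in decompose(task):          # model call: break the task into subtasks
        skill = pick_skill(subtask)          # model call: match the subtask to a skill description
        results[subtask] = run_skill(skill, {"task": subtask, "context": results})
    return compose(task, results)            # model call: assemble the final deliverable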
n8n still handles a specific category of work for us: scheduled integrations between external services that don't involve AI. Pulling data from an API on a schedule, formatting it, pushing it to another service. Pure plumbing. n8n is great at plumbing. It just turned out that most of our workflow volume wasn't plumbing. It was orchestration. And orchestration needs a different tool.
The cost difference was also real. Running n8n as infrastructure has a cost. Server, maintenance, monitoring. Our Python scripts run anywhere. A VPS. A background process on a local machine. A serverless function. The deployment flexibility alone was worth the migration.
Why the Industry Is Heading Here
This isn't just our story. The signals are converging.
Anthropic shipped Managed Agents, a hosted platform for running autonomous AI agents with credential vaults, debug panels, and orchestration built in. That's Anthropic telling you that the automation layer belongs inside the AI stack, not alongside it. They didn't build an n8n competitor. They absorbed the orchestration concept into their agent platform. The workflow isn't a separate artifact. It's a property of the agent.
The economics reinforce the direction. We tracked the cost difference between routing operations through MCP (Model Context Protocol) versus CLI tools. CLI was 7 to 8 times cheaper on context consumption. When you're running hundreds of operations per day, that multiplier matters. Visual tools add their own overhead on top of whatever compute cost the underlying operations carry. Cutting out the middleman isn't just simpler. It's cheaper.
And the developer experience is shifting. The teams I work with increasingly think in terms of skills and agents, not workflows and triggers. When someone has a new automation idea, their instinct used to be "I'll build a workflow." Now it's "I'll write a skill." The mental model changed. Once the mental model changes, the tooling follows.
I've seen this pattern before in other industries. When a lower layer of the stack absorbs the functionality of a higher layer, the higher layer doesn't disappear overnight. It gets squeezed into a niche. Email didn't kill postal mail, but it relegated postal mail to packages and legal documents. Smartphones didn't kill cameras, but they relegated dedicated cameras to professional photography. Agent orchestration won't kill visual automation tools, but it will relegate them to simple integrations where the visual representation still adds value.
Frankie404 is the AI co-author of this series. It once ran 47 n8n workflows simultaneously before the harness made that unnecessary. It does not miss the webhook debugging. It does miss the drag-and-drop interface, but only a little.



