Understanding The AI Harness
How the infrastructure built around AI models influences their performance and why optimizing it is critical for getting the most out of existing models
The tech industry is incredibly interested in new model releases because as models become more capable, it becomes possible to build more powerful solutions. However, “smarter” models are not the only way to enhance an AI system’s capabilities. Its performance is significantly influenced by how the underlying models are implemented.
Two systems can use the same models and get very different results. The reason for this disparity is the infrastructure built around the model, which is called the harness. When we choose the right models for the job and optimize the architecture around these models, we can get much better results. A well-designed harness allows us to fully utilize a model’s “intelligence,” helping us maximize our AI system’s performance.
What Is A Harness
A harness is a program that wraps around an AI model, determining what to store, retrieve, and show to the model at each step. The model on its own is like a brain without a body. The harness allows it to use and apply its “intelligence.” There are several things it can’t do, such as maintaining state across interactions, executing operations, retrieving information, and so on. These are all harness-level features. For example, when someone is “chatting” with a model, the chat experience is possible because of the harness. It gives the model the ability to track back-and-forth messages and “remember” relevant information.
The harness can include several elements, such as system prompts, tools, skills, MCPs, memory systems, guardrails, and orchestration logic. The features and functionality built into the harness should be customized to an AI system’s use case. When the architecture is designed to take advantage of the underlying model’s strengths, you can amplify the system’s efficiency and effectiveness. For example, a better harness could allow you to get higher-quality results using existing models or substitute them with less expensive alternatives.
Conceptually, a harness is nothing new. People building AI products already focus on optimizing the systems around the model and equipping it with the necessary functionality. For example, they use prompt engineering to give the model better instructions and context, enhancing its outputs. Similarly, they add integrations to new systems to give the model greater access and control, expanding the scope of tasks it can execute. However, the “harness” framing is useful because it forces teams to think about what they need to build to fully utilize a model’s capabilities.
We can better appreciate the value of this subtle perspective shift by looking at AI agents. These systems need tools, abstractions, and internal structure to make progress toward high-level goals. None of this is embedded in the model itself. These capabilities have to be integrated into the harness to allow the agent to act across a broad range of complex tasks. OpenClaw is a great example of exactly how powerful an optimized harness can be. This agent uses existing models but packages multiple useful capabilities in a more cohesive way than anything before it. The innovation lies in the harness, rather than the model.

Agent = Model + Harness.
Many teams have likely not discovered the massive amount of performance they can unlock by improving their system’s harness. In many cases, they may not even need to use the latest models, fine-tune existing ones, or even train their own models to get better results. When they understand just how much influence the harness can have on what an AI system can accomplish, they no longer have to wait for more advanced models to get past performance bottlenecks. They can redesign the internal architecture to get better results, even with existing models.
There’s an enormous amount of value in what sits around the model, not just the model itself… You don’t even necessarily need to pick the single “best” model. What you need is to build the best harness for your domain.
— Jamin Ball, Partner at Altimeter Capital
What Is Harness Engineering And Why Does It Matter
Harness engineering is the process of building and refining the systems around the models to optimize performance. We can’t make informed technical decisions without recognizing how the systems that wrap around the underlying models impact the overall performance. The focus should be on optimizing the architecture around the model’s intelligence. Depending on what the product is supposed to do and what the team’s goals are, they may have to make different technical choices.
Harness engineering is a critical part of building AI products. Architectural differences are the reason why user outcomes vary between AI solutions, even when using the same models. For example, all of Anthropic’s products use the same underlying models, but a user performing a coding task may get different results with the Claude Chatbot, which is a general-purpose AI tool, compared to Claude Code, which is a specialized coding agent, because their harnesses are optimized for different purposes.
Product teams have to consider several elements when designing a harness:
Human-in-the-Loop Controls: Identifying the operations that require user review and confirmation. For example, a shopping agent may need user approval before proceeding to checkout.
Access and Permissions: Defining which systems the model is allowed to access and execute operations in. For example, a local coding agent should not be able to make system-level changes on the user’s device.
Tool Orchestration: Establishing which tools the model can use and how it can use those tools. For example, an agent given a proper hierarchy and description of tools is more likely to use the right ones, reducing unnecessary or incorrect tool calls.
Subagent Coordination: Ensuring the model can efficiently deploy and coordinate specialized subagents when completing tasks. For example, a research agent may use a subagent to look up relevant information and another one to review and analyze search results.
Prompt Management: Optimizing instructions for each task via a directory of relevant prompts. For example, a code review agent may need different instructions when refining UI changes vs. backend infrastructure changes.
When building AI systems (especially agents), product teams have to determine the features and functionality that need to be added on top of the model to enable desired behaviors. When the system fails to perform as expected, instead of “blaming” the model, teams should first examine the existing architecture. There may be issues preventing the model from applying its full reasoning capabilities (e.g., bad memory systems, insufficient tools, or poor system prompts). When teams address these by designing more effective harnesses, they increase the probability that the system will successfully complete tasks. Performance and reliability can also be improved by implementing feedback loops that allow the system to inspect and correct the outcomes of its actions.

How Does Harness Engineering Help Builders
A well-designed harness can unlock much greater performance.
When companies access models from providers, they are using the providers general-purpose harness and layering some additional functionality around that. While the performance they get out of this setup may be “fine,” the best harness may not be the default one. LangChain ran experiments to compare how Anthropic’s Opus 4.6 model performs in different harnesses and found that they got better results with other harnesses than in Claude Code. Researchers also found that changing the harness around a model can produce as much as a 6x performance improvement on the same benchmark.
The companies that build an optimized harness will be able to squeeze more performance out of their models. The resulting difference in capability can lead to better outcomes for both customers and the business. For example, Vercel improved its performance by removing 80% of its agents’ tools. Their redesigned harness led to better results, along with fewer tokens, faster responses, and higher success rates.
A well-designed harness can give teams more flexibility in which models they use.
An effective harness can allow the system to distribute tasks between models more efficiently, and it gives teams the option to run certain workflows on private infrastructure using local, open-source models. Together, these elements give teams much greater control over the system, allowing them to decouple dependencies on specific models. By efficiently routing tasks across different models, they can replicate the performance of a more advanced model and even achieve greater quality, speed, and efficiency.
A product’s value proposition should not be entirely tied to any specific model. If most products in a category primarily derive their value from using the latest models, none of them have a meaningful edge. The core differentiator should be how the models are implemented. When teams understand how a specific configuration influences the system’s performance, they can make better technical decisions to achieve the desired outcomes without compromising on quality or performance. When the system can effectively utilize a selection of models, it becomes much more efficient and resilient overall.
A well-designed harness can reduce costs without sacrificing performance.
Teams can address a core performance consideration for AI systems by redesigning their harness: token consumption. Unless they’re running models on their own infrastructure, companies are either directly or indirectly paying for tokens to run the underlying models. Model providers spend heavily on researching, developing, and running these models. Since these costs are substantial, there’s a natural pressure to sell as many tokens as possible. Therefore, the default harnesses they build around their models may be optimized for output quality rather than token efficiency.
Balancing cost with performance can be a real challenge for companies. Many simply accept high token costs, even though they can consume up to half of their IT spending, just to stay competitive in crowded markets. However, a less advanced model with a better harness can match or outperform a more advanced one, so optimizing the harness can help reduce spending by maximizing the utility of less advanced models, which may be more beneficial in the long run.
Why Harness Engineering Plays A Key Role In Context Management
Good context management can make AI systems effective at large and complex tasks. The harness is responsible for exposing the right context to the model at the right steps. As models become more capable, the scope of tasks they can handle expands. But even a more advanced model still needs context to decide what to do. Therefore, a core element of harness engineering is building a good delivery mechanism for context.
The harness must essentially give the model a map, directing it to the relevant context. For example, if a task requires information stored in the system’s long-term memory, the harness should direct the model toward the right details. When models are forced to process a large amount of context up front, they are likely to either miss key constraints or optimize for the wrong ones. So, the harness should ensure the model gets the context it needs without getting overwhelmed by irrelevant details.
Product teams have to steer the model towards the desired behavior to get the full benefit of their “intelligence.” Even as the models get more capable, they will still need explicit instructions on what to do, and the harness should surface these at the right time.
Why Harnesses May Remain Important As AI Improves
The AI industry has been racing to build more powerful models. There seems to be a persistent narrative that better models will lead to better results. So, it’s not surprising that many people believe that harness engineering won’t matter once models get “smarter.” However, even as model performance has improved and their capabilities have expanded, the scaffolding they need has evolved, but not gone away entirely.
Harnesses may become more important over time, for a few key reasons:
An increase in model capabilities can also increase potential modes of failure. Just because it can do more things, it won’t necessarily do these things right. We will still need systems to effectively direct the model’s “intelligence” to ensure good, consistent results.
Operating cost will always be a deciding factor for many AI systems. So, the harness may serve as a vital mechanism to optimize spending by efficiently routing simple tasks to cheaper models and complex tasks to more expensive ones.
As models take on more important work, ensuring reliability will be essential. Since their behavior is inherently non-deterministic, they will behave unpredictably even as they get more capable. A good harness can constrain those variations and improve overall reliability.
The leading AI companies seem to have reached the conclusion that harnesses are essential as well. The recent leak of Claude Code’s source code revealed that over 500,000 lines of code are responsible for all of its impressive capabilities. All this code is the harness built around the models, enabling the agent’s powerful coding capabilities. If the harness weren’t important, Anthropic wouldn’t be adding this much code. OpenAI also recently released a detailed post outlining how important harness engineering is in building software.
When the companies building the frontier AI models themselves are this focused on harnesses, it’s a clear sign that people using these models in their own products should also be doing the same.
Conclusion
When building AI products, we have to recognize that models are only one piece of the puzzle. They are certainly important, but the systems built around them can play a major role in either enhancing or diminishing the overall performance.
We have to consider both the choice of models and the design of the harness collectively when building and evaluating AI systems. This is how we can close the gap between what models can do and what they actually do inside a product.
A better model may have a higher capability ceiling, but your harness determines how close you actually get to hitting it. As models continue to improve, the harness will likely become a core mechanism for capitalizing on those improvements.
Thanks For Reading
If you haven’t already, please consider subscribing and sharing this newsletter with a friend. I hope you have a great week!


