Harnesses in AI: A Deep Dive — Tejas Kumar, IBM

Tejas Kumar from IBM presents the concept of AI harnesses: the engineering that surrounds and controls an AI model so that agent behavior is reliable, stable, and cost-effective. He demonstrates the idea by live-coding a browser-based agent. He emphasizes that harnesses make simpler models usable by adding guardrails, verification, and context management, and he predicts that 2026 will be the year of harnesses as they evolve toward more trustworthy and adaptable AI systems.

In this talk, Tejas Kumar from IBM introduces the concept of AI harnesses and explains why they matter for building reliable and stable AI agents. He begins with the motivation: many AI models are rented as black boxes, so users have limited control over them and pay for every call. A harness grounds such a model in a stable environment so that its behavior is consistent and trustworthy. He distinguishes between two meanings of the term. In machine learning, a harness is essentially a test suite. In AI engineering, an agent harness is everything that surrounds and supports the model: tools, context management, guardrails, loops, and verification steps.

Tejas uses relatable analogies, such as mountain-climbing harnesses and dog harnesses, to illustrate the purpose of a harness: to anchor and control, preventing the AI agent from “going off the rails” or incurring excessive costs. He then dives into the components of an agent harness: a tool registry, model selection, context management, guardrails such as a maximum number of steps or tool calls, the agent loop, and verification mechanisms that check whether the agent’s outputs are valid and reliable. Together, these components help turn a black-box AI model into a dependable agent capable of performing complex tasks.
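The components listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code from the talk: the `Harness` class, its field names, the action dictionary format, and the stub model are all hypothetical, and a real harness would call an actual LLM where the stub is used.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical types: a tool maps a string input to a string output,
# and a "model" maps the visible context to an action dictionary.
Tool = Callable[[str], str]
Model = Callable[[list[str]], dict]


@dataclass
class Harness:
    model: Model                       # model selection: any callable works
    verify: Callable[[str], bool]      # verification of the final answer
    tools: dict[str, Tool] = field(default_factory=dict)  # tool registry
    max_steps: int = 5                 # guardrail: cap on loop iterations
    max_tool_calls: int = 3            # guardrail: cap on tool invocations
    max_context: int = 10              # context management: recent messages only

    def run(self, task: str) -> dict:
        context = [f"task: {task}"]
        tool_calls = 0
        # The agent loop: ask the model, act, record the result, repeat.
        for step in range(1, self.max_steps + 1):
            action = self.model(context[-self.max_context:])
            if action["type"] == "final":
                ok = self.verify(action["content"])
                status = "ok" if ok else "unverified"
                return {"status": status, "answer": action["content"], "steps": step}
            if tool_calls >= self.max_tool_calls:
                return {"status": "tool_limit", "answer": None, "steps": step}
            tool = self.tools.get(action["name"])
            if tool is None:
                context.append(f"error: unknown tool {action['name']}")
                continue
            tool_calls += 1
            context.append(f"{action['name']} -> {tool(action['input'])}")
        return {"status": "step_limit", "answer": None, "steps": self.max_steps}


def stub_model(context: list[str]) -> dict:
    """Stand-in for an LLM: call the tool once, then answer with its output."""
    if not any("->" in message for message in context):
        return {"type": "tool", "name": "add", "input": "2 3"}
    return {"type": "final", "content": context[-1].split("-> ")[1]}


def add(arg: str) -> str:
    a, b = arg.split()
    return str(int(a) + int(b))


harness = Harness(model=stub_model, verify=lambda ans: ans.isdigit(), tools={"add": add})
result = harness.run("add 2 and 3")
print(result)
```

Note that none of the control logic lives in a prompt. The step cap, the tool-call cap, and the verification check are ordinary code, so they hold regardless of what the model returns.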

The core of the presentation is a live coding demonstration in which Tejas builds a simple AI harness from scratch. The task is to create a browser-based agent that upvotes the first post on Hacker News using an older model, GPT-3.5 Turbo. Initially, the agent fails because of login issues, yet it reports success. Tejas then improves the harness incrementally: he adds guardrails that limit iterations and context size, implements verification steps that detect failures accurately, and writes a login handler that manages authentication programmatically. With each change, the harness stabilizes the agent’s behavior and ensures that outcomes are reported truthfully.
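The failure mode described above, and the two fixes that address it, can be simulated without a real browser. The sketch below is an assumption-heavy illustration rather than the demo code: `FakeBrowser`, `naive_agent`, and `run_with_harness` are hypothetical names, the "browser" is an in-memory object, and the agent is a stub that always claims success, as the model did in the talk's first attempt.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeBrowser:
    """Hypothetical in-memory stand-in for a real browser session."""
    logged_in: bool = False
    upvoted: bool = False

    def login(self, user: str, password: str) -> None:
        # Assumption: any non-empty credentials succeed in this simulation.
        self.logged_in = bool(user and password)

    def click_upvote(self) -> None:
        # The click only takes effect for an authenticated session.
        if self.logged_in:
            self.upvoted = True


def naive_agent(browser: FakeBrowser) -> str:
    """Stub for the model-driven agent: it clicks, then always claims success."""
    browser.click_upvote()
    return "Upvoted the first post."


def run_with_harness(browser: FakeBrowser,
                     credentials: Optional[tuple[str, str]] = None) -> dict:
    # Login handler: authenticate in code instead of asking the model to do it.
    if credentials is not None and not browser.logged_in:
        browser.login(*credentials)
    claim = naive_agent(browser)
    # Verification: trust the observed page state, not the model's claim.
    return {"claim": claim, "verified": browser.upvoted}


without_login = run_with_harness(FakeBrowser())
with_login = run_with_harness(FakeBrowser(), credentials=("user", "secret"))
print(without_login)
print(with_login)
```

In both runs the agent's claim is identical. Only the verification step, which inspects state the model cannot influence by wording, separates the failed run from the successful one.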

Tejas highlights that a harness makes cheaper, less capable models effective, because the engineering controls around the model compensate for its weaknesses. He shares IBM’s enterprise-level open-source project, Open RAG, which uses harnesses to securely manage queries over sensitive data, as an example of harness engineering applied at scale in a real-world setting. The talk underscores that a harness is not about changing prompts but about building robust infrastructure around the AI to improve reliability and control.

In conclusion, Tejas predicts that while 2025 was the year of agents, 2026 will be the year of harnesses, with the future potentially seeing dynamic, on-the-fly generated harnesses that adapt to tasks autonomously. Such advancements could represent a significant step toward more general and trustworthy AI systems. He encourages the audience to appreciate the power of harnesses in AI development and invites further discussion, providing resources for deeper exploration.