Benedikt Sanftl and Burak from Mutagent present the concept of the Agentic AI Engineer, which automates the traditionally manual offline and online loops of AI agent development (specification, evaluation, deployment monitoring, and diagnostics) to improve scalability, efficiency, and continuous improvement. Their platform orchestrates specialized evaluator and diagnostic agents that integrate with existing tools, enabling a self-improving system that speeds up AI agent lifecycle management and improves reliability.
In the talk “The Agentic AI Engineer,” Benedikt Sanftl, CEO of Mutagent, and Burak, CTO, discuss the concept of agentic loops in AI agent development. They explain that building AI agents involves two main loops: an offline loop for iterative development, testing, and improvement, and an online loop for monitoring deployed agents, diagnosing issues, and feeding insights back into the optimization process. Traditionally, these loops have been manual and slow, with human review as the bottleneck, which limits scalability, especially when a team manages multiple agents. The Agentic AI Engineer concept aims to automate these loops, increasing throughput and efficiency in AI agent development.
Burak elaborates on the stages of the agentic loop, starting with the specification phase where the agent’s responsibilities, functions, and constraints are clearly defined. This spec acts as a blueprint for building the agent on any chosen platform, allowing flexibility to switch frameworks as needed due to the rapidly evolving agent ecosystem. After building, the agent undergoes evaluation through defined metrics and test cases, akin to unit testing in software development. This evaluation is crucial to verify agent functionality and to establish clear success criteria, which evolve over time based on user feedback and production data.
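To make the spec-then-evaluate idea concrete, here is a minimal Python sketch, not Mutagent's implementation. It assumes a framework-independent spec can be captured as a small data structure and that test cases can be expressed as simple "the answer must contain X" checks. All names (`AgentSpec`, `EvalCase`, `run_suite`, `stub_agent`) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class AgentSpec:
    """Hypothetical framework-independent blueprint for an agent."""
    name: str
    responsibilities: List[str]
    tools: List[str]
    constraints: List[str]


@dataclass
class EvalCase:
    """One test case: an input prompt and a simple success criterion."""
    prompt: str
    must_contain: str


def run_suite(agent: Callable[[str], str], cases: List[EvalCase]) -> Dict[str, bool]:
    """Run every case against the agent and record pass/fail per prompt."""
    return {
        case.prompt: case.must_contain.lower() in agent(case.prompt).lower()
        for case in cases
    }


spec = AgentSpec(
    name="refund-assistant",
    responsibilities=["open refund requests"],
    tools=["create_refund"],
    constraints=["decline anything unrelated to refunds"],
)


def stub_agent(prompt: str) -> str:
    """Stand-in for an agent built from the spec on any framework."""
    if "refund" in prompt.lower():
        return "I have opened a refund request for you."
    return "Sorry, I can only help with refunds."


cases = [
    EvalCase("I want a refund for order 123", "refund request"),
    EvalCase("What is the weather today?", "only help with refunds"),
]

results = run_suite(stub_agent, cases)
pass_rate = sum(results.values()) / len(results)
```

Because `run_suite` only depends on a callable that maps a prompt to a reply, the same cases could be rerun unchanged if the agent were rebuilt on a different framework, which is the flexibility the spec is meant to provide.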
The evaluation phase benefits significantly from automation, as manual review of large datasets and agent traces is time-consuming and inefficient. Automated evaluator agents sift through data, providing actionable feedback and reducing human workload. Effective evaluations focus on the agent’s entire decision-making trajectory, checking context completeness and tool outputs, and use calibrated, binary criteria to minimize noise and provide clear guidance for improvements. This structured evaluation enables continuous improvement and reliable deployment decisions.
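A trajectory-level evaluation with binary criteria could be sketched as follows. This is an illustrative assumption about how such checks might look, not the evaluator agent described in the talk: the `Step` structure, the criterion names, and the `lookup_order` tool are all hypothetical. Each criterion returns only `True` or `False`, so a failing run points at one specific problem.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Step:
    """One step in an agent trace (hypothetical shape)."""
    kind: str               # "tool" or "message"
    name: str               # tool name or message label
    output: Optional[str]   # None or "" means nothing was returned


Trajectory = List[Step]


def tool_outputs_present(traj: Trajectory) -> bool:
    """Every tool call in the trajectory returned a non-empty output."""
    return all(bool(s.output) for s in traj if s.kind == "tool")


def order_was_looked_up(traj: Trajectory) -> bool:
    """The context includes at least one order lookup before answering."""
    return any(s.kind == "tool" and s.name == "lookup_order" for s in traj)


def ends_with_reply(traj: Trajectory) -> bool:
    """The trajectory finishes with a message to the user."""
    return bool(traj) and traj[-1].kind == "message"


CRITERIA: Dict[str, Callable[[Trajectory], bool]] = {
    "tool_outputs_present": tool_outputs_present,
    "order_was_looked_up": order_was_looked_up,
    "ends_with_reply": ends_with_reply,
}


def evaluate(traj: Trajectory) -> Dict[str, bool]:
    """Apply every binary criterion to the whole trajectory."""
    return {name: check(traj) for name, check in CRITERIA.items()}


good = [
    Step("tool", "lookup_order", "order 123: delivered"),
    Step("message", "reply", "Your order was delivered."),
]
bad = [
    Step("tool", "lookup_order", None),  # the tool returned nothing
    Step("message", "reply", "Your order was delivered."),
]

good_report = evaluate(good)
bad_report = evaluate(bad)
failed = [name for name, ok in bad_report.items() if not ok]
```

Note that the `bad` trajectory ends with a plausible-looking reply; only a check over the intermediate steps reveals that the answer was produced without any tool output to support it.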
Once the agent is deployed, the online loop begins, and the agent’s performance is continuously monitored. Diagnostic agents analyze failure modes by clustering trace data and performing root cause analysis, which helps identify recurring issues without anyone having to manually review vast amounts of logs. This diagnostic process generates new evaluation criteria and improvement suggestions, which feed back into the offline loop for optimization. Over time, this creates a self-improving system in which agents evolve based on real-world usage and data, improving reliability and performance.
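The clustering step can be illustrated with a deliberately simple stand-in. The sketch below groups failed traces by a normalized error signature using only the standard library; it is an assumption made for illustration and not the clustering method used by the diagnostic agents in the talk. The trace fields (`trace_id`, `error`) and the sample messages are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional


def signature(error: str) -> str:
    """Normalize an error message so variants of one failure share a key."""
    sig = re.sub(r"\d+", "<n>", error.lower())   # mask ids, durations, counts
    return re.sub(r"\s+", " ", sig).strip()      # collapse whitespace


def cluster_failures(traces: List[Dict[str, Optional[str]]]) -> Dict[str, List[str]]:
    """Group the ids of failed traces by their normalized error signature."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for trace in traces:
        error = trace["error"]
        if error:  # successful traces carry no error and are skipped
            clusters[signature(error)].append(trace["trace_id"])
    return dict(clusters)


traces = [
    {"trace_id": "t1", "error": "Tool lookup_order timed out after 30s"},
    {"trace_id": "t2", "error": "Tool lookup_order timed out after 45s"},
    {"trace_id": "t3", "error": None},
    {"trace_id": "t4", "error": "Order 991 not found"},
    {"trace_id": "t5", "error": "Order 17 not found"},
    {"trace_id": "t6", "error": "tool lookup_order timed out after 30s"},
]

clusters = cluster_failures(traces)

# Largest cluster first: the most frequent failure mode is the first
# candidate for root cause analysis and for a new evaluation criterion.
ranked = sorted(clusters.items(), key=lambda kv: len(kv[1]), reverse=True)
```

Here the timeout cluster ranks first, which suggests a follow-up criterion for the offline loop such as "every `lookup_order` call completes", closing the feedback cycle described above.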
Mutagent’s product embodies this agentic AI engineer approach, featuring an orchestrator that manages various specialized agents such as the evaluator and diagnostics agents. These agents integrate with existing development environments and data sources, automating the specification, evaluation, diagnosis, and optimization stages. The platform supports flexible deployment targets and aims to streamline the entire AI agent lifecycle, reducing manual effort and accelerating development. The presenters conclude by encouraging adoption of this approach to improve agent building efficiency and reliability, inviting further engagement with their solution.