Your Coding Agent Should Do AI System Engineering — Ben Burtenshaw, Hugging Face

merefield · 21 May 2026 13:00

Ben Burtenshaw from Hugging Face argues that coding agents should take on advanced AI system engineering tasks, such as writing optimized CUDA kernels and fine-tuning large language models, using Hugging Face’s tools and libraries. He also outlines a multi-agent autonomous AI research lab in which agents with distinct roles use open-source infrastructure to run experiments in parallel and improve results iteratively.

merefield · 21 May 2026 13:20

In his talk, Ben Burtenshaw from Hugging Face argues that coding agents are ready for harder work in AI systems and machine learning engineering. He notes that coding agents are now widely accepted and used, and proposes that engineers point them at more challenging problems closer to the hardware, such as writing custom CUDA kernels. GPU performance is limited by compute, memory, and overhead, and memory bandwidth is often the primary bottleneck, so a well-written kernel mainly reduces how much data moves to and from memory. Hugging Face supports this work through its kernels library, which distributes optimized kernels and handles compatibility across different hardware setups, so that kernels written by agents can actually be shared and used.
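
To make the memory bandwidth point concrete, here is a minimal roofline-style sketch in Python. It is not from the talk: the hardware figures are made-up round numbers, and the function names are hypothetical. It compares how many FLOPs a kernel performs per byte of memory traffic against what the assumed hardware can sustain.

```python
# Roofline-style estimate: is a kernel limited by memory bandwidth or by compute?
# The hardware figures are illustrative assumptions, not the specs of a real GPU.

PEAK_FLOPS = 100e12      # assumed peak compute: 100 TFLOP/s
PEAK_BANDWIDTH = 2e12    # assumed memory bandwidth: 2 TB/s
BYTES_PER_ELEMENT = 2    # 16-bit floats


def attainable_flops(flops: float, bytes_moved: float) -> float:
    """Roofline bound: min(peak compute, arithmetic intensity * bandwidth)."""
    intensity = flops / bytes_moved  # FLOPs per byte of memory traffic
    return min(PEAK_FLOPS, intensity * PEAK_BANDWIDTH)


def is_memory_bound(flops: float, bytes_moved: float) -> bool:
    """True when the kernel's intensity is below the hardware's ridge point."""
    return flops / bytes_moved < PEAK_FLOPS / PEAK_BANDWIDTH


def elementwise_add(n: int) -> tuple[float, float]:
    """FLOPs and bytes for c = a + b over n elements."""
    flops = float(n)                            # one add per element
    bytes_moved = 3.0 * n * BYTES_PER_ELEMENT   # read a, read b, write c
    return flops, bytes_moved


def square_matmul(n: int) -> tuple[float, float]:
    """FLOPs and bytes for an n x n matmul, assuming each matrix moves once."""
    flops = 2.0 * n**3                              # multiply and add per term
    bytes_moved = 3.0 * n * n * BYTES_PER_ELEMENT   # read A, read B, write C
    return flops, bytes_moved


if __name__ == "__main__":
    for name, (flops, nbytes) in {
        "elementwise add": elementwise_add(1 << 20),
        "4096 x 4096 matmul": square_matmul(4096),
    }.items():
        bound = "memory" if is_memory_bound(flops, nbytes) else "compute"
        print(f"{name}: {bound}-bound, {attainable_flops(flops, nbytes):.3g} FLOP/s attainable")
```

Under these assumptions the elementwise add does one sixth of a FLOP per byte and can never approach peak compute, which is why fusing such operations into a single kernel (so intermediate results stay on chip) tends to pay off.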

Burtenshaw introduces the concept of “skills” as file-based contexts that agents can use to perform tasks like writing and benchmarking kernels. These skills are integrated into projects and maintained by their respective maintainers, ensuring robustness and reliability. Hugging Face also offers an experimental skills repository and a tool called upskill, which helps evaluate and improve skills by comparing model performance and efficiency. This ecosystem encourages developers and agents to collaborate on optimizing AI workloads, making it easier to adopt and benefit from custom kernels tailored to specific hardware.
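
As a rough illustration of what "file-based context" can mean, here is a minimal Python sketch. It is an assumption for illustration only, not Hugging Face's skill format or the upskill API: it treats a skill as a directory of markdown files and prepends them to the task given to an agent. The function names and the directory layout are hypothetical.

```python
from pathlib import Path


def load_skill(skill_dir: Path) -> str:
    """Concatenate the markdown files in a skill directory into one context string.

    Hypothetical layout: any *.md file in the directory belongs to the skill.
    Files are read in sorted order so the result is deterministic.
    """
    parts = []
    for path in sorted(skill_dir.glob("*.md")):
        text = path.read_text(encoding="utf-8").strip()
        parts.append(f"## {path.name}\n{text}")
    return "\n\n".join(parts)


def build_prompt(task: str, skill_context: str) -> str:
    """Put the skill context ahead of the task the agent is asked to perform."""
    return f"{skill_context}\n\n# Task\n{task}"
```

Because the skill is just files in the project, maintainers can review and version it like any other part of the repository.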

The second major topic covers fine-tuning large language models (LLMs) using agents. Burtenshaw references resources and tools available on the Hugging Face hub that allow users to fine-tune models like Qwen with minimal effort, using hosted GPU resources and optimized frameworks such as Unsloth. This integration simplifies targeted improvements to a model, such as better chain-of-thought reasoning, and makes model training accessible to a broader audience through documented workflows and available credits.

The most ambitious part of the talk is about creating a multi-agent AutoLab for autonomous AI research. Inspired by Andrej Karpathy’s autoresearch project, Burtenshaw describes a distributed system where different agents assume roles such as researcher, planner, worker, and reporter to collaboratively improve training scripts and model performance. This system uses Hugging Face tools: HF Papers for literature review, HF Jobs for running experiments, and Trackio for open-source experiment tracking and visualization. Because the setup is modular and open, experiments can run in parallel and each round can build on the last, which in effect creates an automated AI research lab.
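
The division of roles can be sketched in a few lines of Python. This is a toy under stated assumptions, not Burtenshaw's implementation: the "training job" is a made-up formula whose loss is lowest at a learning rate of 1e-3, the roles are plain functions rather than LLM agents, and no Hugging Face service is called. All names are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Experiment:
    learning_rate: float


@dataclass(frozen=True)
class Result:
    experiment: Experiment
    val_loss: float


def toy_train(exp: Experiment) -> Result:
    """Stand-in for a real training job; the loss is lowest at lr = 1e-3."""
    return Result(exp, (math.log10(exp.learning_rate) + 3.0) ** 2 + 1.0)


def researcher(best: Experiment) -> list[float]:
    """Propose ideas: here, learning rates on either side of the current best."""
    return [best.learning_rate * 0.5, best.learning_rate * 2.0]


def planner(ideas: list[float], already_run: set[Experiment]) -> list[Experiment]:
    """Turn ideas into concrete experiments, skipping ones that already ran."""
    return [Experiment(lr) for lr in ideas if Experiment(lr) not in already_run]


def workers(plan: list[Experiment]) -> list[Result]:
    """Run the planned experiments in parallel."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(toy_train, plan))


def reporter(history: list[Result]) -> Result:
    """Summarize the run so far: report the best result."""
    return min(history, key=lambda r: r.val_loss)


def run_lab(start: Experiment, rounds: int = 10) -> Result:
    """Loop researcher -> planner -> workers -> reporter for a number of rounds."""
    history = [toy_train(start)]
    for _ in range(rounds):
        best = reporter(history)
        plan = planner(researcher(best.experiment), {r.experiment for r in history})
        history.extend(workers(plan))
    return reporter(history)


if __name__ == "__main__":
    best = run_lab(Experiment(learning_rate=0.1))
    print(f"best lr={best.experiment.learning_rate:g} val_loss={best.val_loss:.4f}")
```

In the system described in the talk, each role would be an agent with its own tools; the sketch only shows how the roles hand work to one another and why the worker step parallelizes naturally.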

In conclusion, Burtenshaw stresses the importance of open primitives and transparent tools that agents can control and interact with directly, rather than relying solely on abstracted APIs. He asserts that the Hugging Face hub is well equipped to support these workloads with its infrastructure for storage, tracking, and compute. With these tools, AI engineers can hand agents harder engineering problems and scale up the amount of work they run. He encourages the community to explore the shared resources, experiment with the provided examples, and contribute feedback.