I designed an AI engineer interview. Most developers failed

merefield · 17 March 2026 19:00

The speaker describes designing an AI engineer interview and being surprised that most candidates lacked practical experience with real AI systems, often failing basic tasks like generating structured outputs or handling errors effectively. They highlight a significant gap between AI industry hype and the hands-on skills needed for building robust AI applications, urging aspiring engineers to focus on practical areas like structured outputs, evaluation, RAG design, and observability.

merefield · 17 March 2026 19:21

Here is a five-paragraph summary of the video transcript:

The speaker, who previously worked at an AI startup as a full stack developer, data engineer, and product manager, describes their experience designing an interview process to hire their replacement. They reflect on the ambiguity and hype surrounding the “AI engineer” title, noting that most roles labeled as such are really about integrating AI into full stack applications rather than training models from scratch. The speaker prefers more precise titles like “LLM developer” or “applied AI engineer,” but acknowledges that job titles are often driven by marketing and industry trends.

During the interview process, the speaker was surprised by how few candidates had hands-on experience with real AI systems, despite many claiming expertise. The technical interview focused on practical skills, such as generating structured outputs from large language models (LLMs) instead of unstructured text, which is essential for building reliable AI-powered applications. Many candidates failed to demonstrate this knowledge, often suggesting less robust solutions like parsing raw text with regex instead of leveraging built-in structured output features.
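To make the structured-output point concrete, here is a minimal sketch of the application side: treat the model reply as JSON and validate it against an expected shape instead of picking fields out of free text with regex. The `Ticket` schema, the allowed categories, and `fake_model_reply` are all hypothetical, invented for illustration; a real system would call a provider with its structured output feature enabled rather than return a canned string.

```python
import json
from dataclasses import dataclass

# Hypothetical schema for illustration; not taken from the video.
ALLOWED_CATEGORIES = {"billing", "bug", "feature_request"}


@dataclass
class Ticket:
    category: str
    priority: int
    summary: str


def parse_ticket(raw: str) -> Ticket:
    """Parse a model reply that is supposed to be a JSON ticket and validate it."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on free text
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = {"category", "priority", "summary"} - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if data["category"] not in ALLOWED_CATEGORIES:
        raise ValueError(f"unknown category: {data['category']!r}")
    priority = data["priority"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(priority, bool) or not isinstance(priority, int):
        raise ValueError("priority must be an integer")
    if not 1 <= priority <= 5:
        raise ValueError("priority must be between 1 and 5")
    if not isinstance(data["summary"], str):
        raise ValueError("summary must be a string")
    return Ticket(data["category"], priority, data["summary"])


def fake_model_reply() -> str:
    """Hypothetical stand-in for a model call made with structured output enabled."""
    return '{"category": "bug", "priority": 2, "summary": "Login page returns 500"}'


if __name__ == "__main__":
    print(parse_ticket(fake_model_reply()))
```

Validation failures surface as a `ValueError`, so the caller can retry or log the bad reply instead of silently passing malformed data downstream.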

Another key area of assessment was how candidates would detect and handle errors in AI systems. The best responses involved evaluation datasets, curated prompts, and automated techniques like “LLM as judge” to systematically test and monitor system performance. Observability tools such as LangSmith and Helicone were also mentioned as important for tracing conversations and monitoring token usage, but few candidates demonstrated familiarity with these practices.
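As a rough illustration of the evaluation approach, the sketch below runs a small curated eval set through a system and reports a pass rate plus the failing prompts. Everything here is assumed for the example: the eval cases, the canned `system_under_test`, and `keyword_judge`, which is a deterministic stand-in for the place where an “LLM as judge” call would go.

```python
from typing import Callable

# Hypothetical curated eval set: each case pairs a prompt with the fact the answer must contain.
EVAL_SET = [
    {"prompt": "What is 2 + 2?", "expected": "4"},
    {"prompt": "Capital of France?", "expected": "Paris"},
    {"prompt": "Largest planet?", "expected": "Jupiter"},
]


def system_under_test(prompt: str) -> str:
    """Hypothetical stand-in for the AI application being evaluated."""
    canned = {
        "What is 2 + 2?": "The answer is 4.",
        "Capital of France?": "Paris is the capital.",
    }
    return canned.get(prompt, "I am not sure.")


def keyword_judge(answer: str, expected: str) -> bool:
    """Deterministic stand-in for a judge; a real setup might ask a second model to grade."""
    return expected.lower() in answer.lower()


def run_eval(
    system: Callable[[str], str],
    judge: Callable[[str, str], bool],
    cases: list[dict],
) -> dict:
    """Run every case through the system and collect the prompts the judge rejects."""
    failures = [
        case["prompt"]
        for case in cases
        if not judge(system(case["prompt"]), case["expected"])
    ]
    return {"pass_rate": (len(cases) - len(failures)) / len(cases), "failures": failures}


if __name__ == "__main__":
    print(run_eval(system_under_test, keyword_judge, EVAL_SET))
```

Because the judge is passed in as a function, the same harness can run on every prompt or model change, and the list of failing prompts shows where to look first.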

The interview process also revealed candidates cheating by using AI tools to generate answers during interviews, which became apparent when they couldn’t explain their responses or answer follow-up questions in depth. The speaker emphasizes the importance of genuine, hands-on experience over theoretical knowledge or textbook answers, especially when it comes to complex topics like retrieval-augmented generation (RAG) pipelines, chunking strategies, and metadata management.

A particularly surprising finding was that none of the candidates mentioned re-ranking retrieved documents in RAG systems, a simple but crucial technique for improving relevance in AI applications. The speaker concludes that there is a significant gap between AI hype and the practical skills needed to build robust AI products. They encourage aspiring AI engineers to focus on mastering structured outputs, evaluation, RAG design, and observability, as these skills are rare and highly valuable in the current job market.
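For anyone unfamiliar with re-ranking, here is a toy sketch of the two-stage idea: a cheap first pass pulls candidates, then a second scorer reorders them before they reach the prompt. The documents, the query, and both scoring functions are assumptions made up for this example. Production systems typically use a cross-encoder or a hosted rerank model for the second stage; Jaccard overlap stands in for that here.

```python
def tokenize(text: str) -> list[str]:
    return text.lower().split()


def first_stage(query: str, docs: list[str], k: int) -> list[str]:
    """Cheap retrieval: rank by how many times query terms occur in each document."""
    query_terms = set(tokenize(query))

    def term_hits(doc: str) -> int:
        return sum(1 for token in tokenize(doc) if token in query_terms)

    return sorted(docs, key=term_hits, reverse=True)[:k]


def jaccard(query: str, doc: str) -> float:
    """Stand-in relevance scorer: overlap of the two token sets divided by their union."""
    q, d = set(tokenize(query)), set(tokenize(doc))
    return len(q & d) / len(q | d) if q | d else 0.0


def rerank(query: str, candidates: list[str], top_k: int) -> list[str]:
    """Second stage: reorder the retrieved candidates with the more careful scorer."""
    return sorted(candidates, key=lambda doc: jaccard(query, doc), reverse=True)[:top_k]


# Hypothetical corpus and query.
DOCS = [
    "password password password policy for administrators and auditors",
    "how to reset your password",
    "office opening hours",
]
QUERY = "reset password"

if __name__ == "__main__":
    candidates = first_stage(QUERY, DOCS, k=2)
    print("retrieved:", candidates)
    print("re-ranked:", rerank(QUERY, candidates, top_k=2))
```

In this toy data the first stage puts the keyword-stuffed policy document on top because it repeats "password", while the re-ranker moves the document that actually covers both query terms to first place.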