How A Team Of 7 Keeps Breaking AI Benchmark Records

This video is an interview with Ian Fischer, co-founder of Poetic, whose seven-person team built a meta system that automatically generates and optimizes AI reasoning harnesses. Using only existing language models, the team has repeatedly broken AI benchmark records at a fraction of the usual cost and time. Their approach lets organizations adopt new models rapidly and reach state-of-the-art results without expensive retraining, making advanced AI accessible and scalable for startups and researchers.

Fischer is co-founder and co-CEO of Poetic, a startup focused on building recursively self-improving AI reasoning systems that sit atop large language models (LLMs). Unlike traditional approaches that require expensive, time-consuming fine-tuning or retraining of LLMs, Poetic’s system—called the Poetic Meta System—automates the process of generating and optimizing agentic “harnesses” that consistently outperform the underlying models. This lets startups and companies leverage the latest frontier models without being left behind or incurring massive costs every time a new model is released.

Poetic’s breakthrough is its ability to achieve state-of-the-art results on challenging AI benchmarks at a fraction of the cost and time. For example, when Google’s Gemini 3 DeepThink led the ARC AGI V2 leaderboard at 45%, Poetic quickly surpassed it with a 54% score using a cheaper model at half the cost per problem. More recently, Poetic achieved a 55% score on “Humanity’s Last Exam,” a notoriously difficult set of questions designed to challenge even PhDs, beating the previous record held by Anthropic’s Claude Opus 4.6. Remarkably, Poetic accomplished this with a team of just seven researchers and an optimization budget under $100,000, compared to the hundreds of millions typically spent on foundation model training.

The core of Poetic’s technology is recursive self-improvement, where their meta system automatically analyzes data, generates prompts, devises reasoning strategies, and optimizes agentic systems for specific tasks. This process is largely automated, reducing the need for manual prompt engineering or context stuffing. The system can adapt to new models as they are released, ensuring that users always have access to the best possible performance without having to rebuild or fine-tune from scratch. This paradigm shift moves away from reinforcement learning (RL) and manual optimization, offering a more scalable and cost-effective solution for building advanced AI agents.
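The loop described above—proposing reasoning strategies, evaluating them, and keeping the best—can be sketched in miniature. This is a hypothetical illustration only, not Poetic's actual system: the toy "model," the evaluation set, and every function name here are assumptions for demonstration.

```python
# Toy sketch of automated harness optimization: propose candidate
# reasoning strategies (prompt prefixes), score each against a small
# evaluation set, and keep the best performer. Purely illustrative.

EVAL_SET = [("2+2", "4"), ("3*3", "9"), ("10-7", "3")]

def call_model(prompt: str, question: str) -> str:
    """Stand-in for an LLM call: this toy 'model' only answers
    correctly when the harness asks for step-by-step reasoning,
    so the choice of strategy actually matters."""
    if "step by step" in prompt.lower():
        return str(eval(question))  # safe here: fixed arithmetic strings
    return "?"

def score(prompt: str) -> float:
    """Fraction of the evaluation set the harness gets right."""
    hits = sum(call_model(prompt, q) == a for q, a in EVAL_SET)
    return hits / len(EVAL_SET)

def optimize(base: str, strategies: list[str]) -> tuple[str, float]:
    """Try each candidate strategy and return the best-scoring prompt."""
    candidates = [s + base for s in strategies]
    best = max(candidates, key=score)
    return best, score(best)

strategies = ["", "Answer briefly. ", "Think step by step. "]
best_prompt, best_score = optimize("Q: {q}\nA:", strategies)
```

A real system would search a far richer space (tool use, decomposition, multi-step agent graphs) and call actual models, but the select-and-iterate structure is the same.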

Poetic’s approach is particularly valuable for startups and companies facing hard AI problems that require robust, reliable reasoning. By treating the underlying LLMs as interchangeable components and focusing on optimizing the harnesses that orchestrate them, Poetic enables users to “stand on stilts”—always staying ahead of the latest model releases. The company is currently offering early access to organizations with challenging problems, inviting them to sign up and collaborate on pushing the boundaries of what AI agents can achieve.
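Treating the LLM as an interchangeable component means the harness depends only on a minimal model interface, so a newly released model can be dropped in without rewriting the orchestration logic. A minimal sketch of that design, with all class and method names invented for illustration:

```python
from typing import Protocol

class Model(Protocol):
    """Minimal interface a harness needs from any backing LLM."""
    def complete(self, prompt: str) -> str: ...

class StubModelA:
    def complete(self, prompt: str) -> str:
        return "answer from model A"

class StubModelB:  # stands in for a newer, stronger release
    def complete(self, prompt: str) -> str:
        return "answer from model B"

class Harness:
    """Orchestration layer: owns the prompting strategy, not the model."""
    def __init__(self, model: Model, template: str = "Q: {q}\nA:"):
        self.model = model
        self.template = template

    def solve(self, question: str) -> str:
        return self.model.complete(self.template.format(q=question))

harness = Harness(StubModelA())
harness.model = StubModelB()  # swap in the new model; harness unchanged
```

Because the harness encodes the optimized reasoning strategy independently of any one model, upgrading to a new frontier model is a one-line swap rather than a rebuild.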

Ian Fischer also shares his personal journey from founding a mobile devtools company to joining Google and DeepMind, and eventually focusing on AI and machine learning research. He encourages engineers and founders to experiment with AI daily, emphasizing how rapidly the field is evolving and how accessible powerful tools have become. His advice: don't limit yourself; try new things, leverage AI to build ambitious projects, and contribute to making the world better. The interview concludes with enthusiasm for Poetic’s potential to empower startups and researchers to achieve state-of-the-art results without being constrained by the “bitter lesson” of ever-escalating compute and data requirements.