What Lies Beneath the API — Benjamin Cowen, Modal

Benjamin Cowen from Modal discusses the growing need for fine-tuning AI models as companies scale, highlighting how modern serverless platforms simplify this process by reducing infrastructure challenges and enabling faster, more accessible customization beyond basic API usage. He advises businesses to prepare by collecting quality data and developing evaluation metrics, noting that the transition to custom model training is becoming increasingly practical and imminent.

In this talk, Benjamin Cowen, a machine learning engineer at Modal, discusses an emerging trend in AI application development: a shift toward fine-tuning models as companies and their products mature. Frontier APIs have revolutionized AI development by making broad capabilities available quickly, but they offer little customization beyond prompt engineering. This limitation becomes apparent as startups scale or secure enterprise contracts with specific performance requirements, which creates a need for more tailored AI solutions.

Cowen explains that fine-tuning has traditionally involved significant infrastructure challenges, such as managing large clusters and needing specialized engineering resources. A middle ground is now emerging: cloud providers like Modal offer serverless compute platforms that let developers fine-tune models without the overhead of managing infrastructure. This approach provides both algorithmic control and fast iteration cycles, making fine-tuning more accessible and practical for businesses that want to optimize their AI models for specific tasks.

He emphasizes that many companies are already collecting the necessary data and developing evaluation metrics, which are critical prerequisites for successful fine-tuning. Cowen points out that if a product is differentiated and domain-specific, it is likely approaching the point where fine-tuning becomes beneficial. He advises companies to monitor signals such as high API costs relative to revenue, latency or throughput bottlenecks, and plateauing evaluation scores as indicators that it might be time to consider training custom models.
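The three signals above can be turned into a simple periodic check. The following is a minimal sketch, not something presented in the talk: the `ProductMetrics` fields, the function name, and every threshold (a 30% cost-to-revenue ratio, a three-run plateau window, a 0.01 score tolerance) are hypothetical values chosen for illustration and would need tuning for a real product.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProductMetrics:
    """Hypothetical snapshot of the numbers a team might already track."""
    monthly_api_cost: float    # spend on frontier API calls
    monthly_revenue: float     # revenue attributed to the product
    p95_latency_ms: float      # observed 95th-percentile latency
    latency_budget_ms: float   # latency the product can tolerate
    eval_scores: List[float]   # evaluation scores over time, most recent last


def fine_tuning_signals(
    m: ProductMetrics,
    cost_ratio_threshold: float = 0.3,  # assumed threshold, not from the talk
    plateau_window: int = 3,            # assumed number of recent eval runs
    plateau_epsilon: float = 0.01,      # assumed "no real progress" tolerance
) -> List[str]:
    """Return the names of the signals that currently fire."""
    signals = []

    # Signal 1: API costs are high relative to revenue.
    if m.monthly_revenue > 0:
        if m.monthly_api_cost / m.monthly_revenue > cost_ratio_threshold:
            signals.append("api_cost")

    # Signal 2: latency exceeds what the product can tolerate.
    if m.p95_latency_ms > m.latency_budget_ms:
        signals.append("latency")

    # Signal 3: evaluation scores have stopped moving over the recent window.
    recent = m.eval_scores[-plateau_window:]
    if len(recent) == plateau_window:
        if max(recent) - min(recent) < plateau_epsilon:
            signals.append("eval_plateau")

    return signals
```

A team could run a check like this alongside its regular evaluation jobs and treat two or more firing signals as a prompt to scope a fine-tuning experiment.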

The talk also touches on the practical aspects of fine-tuning and reinforcement learning today, noting that modern open-source libraries and serverless platforms have drastically lowered the barrier to entry. Cowen shares that training algorithms can now be implemented in just a few hundred lines of code, and serverless infrastructure enables scalable, parallelized training workflows like hyperparameter tuning and reinforcement learning rollouts. This democratization of training technology allows teams without deep infrastructure expertise to experiment and iterate quickly.
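To make the parallel hyperparameter tuning pattern concrete, here is a minimal local sketch of the fan-out shape. It is not Modal's API and not code from the talk: it uses Python's standard `concurrent.futures` thread pool as a stand-in for serverless workers, and `train_and_evaluate` is a hypothetical placeholder with a toy loss surface in place of a real fine-tuning run.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def train_and_evaluate(config):
    """Hypothetical stand-in for one fine-tuning run.

    A real version would train a model with this config and return its
    validation loss. Here a toy loss surface with its minimum at
    lr=1e-4, batch_size=32 keeps the example self-contained.
    """
    lr = config["lr"]
    batch_size = config["batch_size"]
    val_loss = (lr - 1e-4) ** 2 * 1e8 + (batch_size - 32) ** 2 / 1024
    return {"config": config, "val_loss": val_loss}


def sweep(learning_rates, batch_sizes, max_workers=4):
    """Run every config combination in parallel and return the best result."""
    configs = [
        {"lr": lr, "batch_size": bs}
        for lr, bs in product(learning_rates, batch_sizes)
    ]
    # Each config is independent, so the runs can be dispatched concurrently.
    # On a serverless platform each call would typically run in its own
    # container; here a local thread pool plays that role.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(train_and_evaluate, configs))
    return min(results, key=lambda r: r["val_loss"])


if __name__ == "__main__":
    best = sweep([1e-5, 1e-4, 1e-3], [16, 32, 64])
    print(best["config"], best["val_loss"])
```

The same map-over-independent-jobs shape applies to reinforcement learning rollouts, where each worker would generate and score trajectories instead of training a model.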

Finally, Cowen stresses the importance of preparing for the transition to fine-tuning by collecting quality data and developing robust evaluation frameworks. He encourages developers to start thinking about when and how they might train their own models, suggesting that this moment could come sooner than expected: within months or a year rather than several years. He invites interested parties to engage with Modal for support and notes that serving custom-trained models at scale is also feasible with modern serverless solutions, which completes the end-to-end AI development cycle.