Dominik Kundel from OpenAI introduces GPT-OSS, a new family of open, locally runnable models with advanced reasoning and tool-calling capabilities, designed to address privacy, latency, and offline use cases while matching or exceeding comparable proprietary models. He demonstrates their flexibility through offline applications, fine-tuning examples, and seamless integration with the OpenAI ecosystem, and points developers to resources for customizing and deploying these models on local hardware.
In this presentation, Dominik Kundel from OpenAI introduces GPT-OSS, a new family of open models released in August. GPT-OSS consists of two models: GPT-OSS 20B, a medium-sized model that runs on high-end consumer hardware with at least 16 GB of VRAM, and GPT-OSS 120B, a larger model designed to run on a single 80 GB GPU or on powerful laptops such as a 128 GB MacBook Pro. The models were developed in response to user demand for open models that can run locally or on-premise, addressing data privacy, hardware constraints, latency, and offline use cases.
The GPT-OSS models are reasoning models with advanced capabilities such as adjustable levels of chain-of-thought reasoning and tool calling, including web browsing and Python code execution. According to Kundel, they are the only open models currently shipping with such built-in tool integrations, enabling complex tasks to be completed through a series of tool calls. Licensed under the permissive Apache 2.0 license, the models can be used commercially or fine-tuned for specific applications. On benchmarks, GPT-OSS matches or exceeds comparable proprietary models such as o4-mini, while offering the unique advantage of running fully locally.
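The tool-calling loop described above can be sketched as an OpenAI-style function tool definition: the model returns `tool_calls` when it wants a tool run, the host executes them, and the results are fed back as tool messages. The `run_python` tool name and its schema below are illustrative assumptions, not the exact built-in tools from the talk:

```python
# A minimal, OpenAI-style tool schema for a Python-execution tool.
# The tool name and parameters are illustrative; gpt-oss's built-in
# browsing/Python tools are exposed through its own chat format.
PYTHON_TOOL = {
    "type": "function",
    "function": {
        "name": "run_python",
        "description": "Execute a Python snippet and return its stdout.",
        "parameters": {
            "type": "object",
            "properties": {
                "code": {"type": "string", "description": "Python source to run."},
            },
            "required": ["code"],
        },
    },
}

def build_chat_request(prompt: str, model: str = "gpt-oss-20b") -> dict:
    # Assemble a chat-completion payload that advertises the tool.
    # The model answers either with text or with tool_calls entries
    # that the host is expected to execute and feed back.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "tools": [PYTHON_TOOL],
    }
```

A host loop would repeat this call, appending each tool result as a `{"role": "tool", ...}` message, until the model produces a final answer.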
Dominik demonstrates how GPT-OSS can be integrated into applications using the Agents SDK and various inference frameworks such as Ollama, llama.cpp, and LM Studio. He showcases a finance agent running entirely offline on a MacBook that can securely access local files and perform complex tasks such as portfolio summarization via Python tool calls. He highlights the model's ability to self-correct and reason through multi-step processes, as well as its seamless integration with other OpenAI models such as GPT-5 for specialized tasks like generating visualizations, all while preserving data privacy through local execution and input filtering.
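The offline setup can be sketched against Ollama's OpenAI-compatible endpoint, which listens on `localhost:11434` by default. The model tag and the helper functions are illustrative assumptions, not code from the talk; nothing leaves the machine because the request targets localhost only:

```python
import json
import urllib.request

# Ollama exposes an OpenAI-compatible API on this local endpoint by default.
OLLAMA_URL = "http://localhost:11434/v1/chat/completions"

def build_request(prompt: str, model: str = "gpt-oss:20b") -> dict:
    """Assemble an OpenAI-style chat-completion payload for a local server."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def ask_local(prompt: str) -> str:
    """Send the prompt to the local server and return the reply text."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:   # stays on-machine
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Because the endpoint speaks the OpenAI wire format, the same payload works unchanged if the base URL is later swapped for a hosted model.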
The talk also covers fine-tuning GPT-OSS models to improve performance on specific tasks. Dominik shares an example in which GPT-OSS 20B was fine-tuned to play the game 2048 using reinforcement learning, yielding a model that significantly outperforms the base version. The fine-tuning code is available on GitHub, so developers can customize the models for their own use cases. The demonstration underscores how readily GPT-OSS can be adapted to specialized domains or internal data while running on local hardware.
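To give a flavour of the reward signal such reinforcement-learning training needs, here is a minimal 2048 merge step whose score gain could serve as a per-move reward. This is an illustrative sketch under standard 2048 rules, not the training code from the talk's repository:

```python
def merge_left(row: list[int]) -> tuple[list[int], int]:
    """Slide one 2048 row left, merging equal tiles once each.

    Returns (new_row, score_gained); the score gained per move is a
    natural per-step reward for an RL fine-tuning loop.
    """
    tiles = [t for t in row if t]          # drop empty cells (zeros)
    merged, score, i = [], 0, 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            merged.append(tiles[i] * 2)    # equal neighbours merge once
            score += tiles[i] * 2          # 2048 scores the merged value
            i += 2
        else:
            merged.append(tiles[i])
            i += 1
    return merged + [0] * (len(row) - len(merged)), score
```

A full board step would apply this to each row (rotating the board for the other directions) and sum the per-row scores into the move's reward.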
Finally, Dominik reveals that while the initial demos ran locally on his laptop, the more intensive fine-tuning and model runs were performed on a pre-production NVIDIA DGX Station, a powerful workstation provided by NVIDIA. He concludes by summarizing GPT-OSS as a versatile, high-performance open model family that integrates smoothly with the OpenAI ecosystem, supports offline and on-premise use, and can be fine-tuned for custom needs. Developers interested in exploring GPT-OSS further are encouraged to visit openai.com/openmodels for resources and code.