Dominic from OpenAI introduces GPT-OSS, an open-source family of agentic reasoning models optimized for efficient local execution on AMD hardware and consumer devices, featuring advanced capabilities like chain-of-thought reasoning and tool calling while ensuring privacy and offline functionality. The collaboration between OpenAI and AMD has led to significant performance enhancements on AMD platforms, making GPT-OSS a powerful solution for privacy-conscious, latency-sensitive applications with broad compatibility and customizable deployment options.
In this presentation, Dominic from OpenAI introduces GPT-OSS, an open-source family of agentic reasoning models designed to run efficiently on a variety of hardware, including AMD GPUs and consumer devices. GPT-OSS consists of two models: the 120B model, optimized for powerful data center GPUs like the AMD MI300X, and the 20B model, designed to run on consumer hardware with 16GB of VRAM or unified memory. Both models are permissively licensed under Apache 2.0, allowing commercial use and fine-tuning. They support advanced features such as chain-of-thought reasoning, tool calling (including web browsing and Python execution), and configurable reasoning levels, making them highly capable on complex tasks while running locally without reliance on cloud services.
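As a concrete illustration of the workflow described above, the sketch below builds a chat request for a locally served GPT-OSS model. The endpoint URL, the model tag, and the "Reasoning: ..." system-prompt convention are assumptions based on how common local runners expose OpenAI-compatible APIs; adapt them to your inference stack.

```python
"""Minimal sketch: querying a locally served GPT-OSS model through an
OpenAI-compatible chat endpoint. Model tag, URL, and the system-prompt
reasoning convention are illustrative assumptions, not exact spec."""
import json
import urllib.request


def build_chat_request(user_prompt: str, reasoning: str = "medium") -> dict:
    """Build a chat-completion payload; GPT-OSS takes its reasoning
    level ("low" / "medium" / "high") from the system prompt."""
    return {
        "model": "gpt-oss-20b",  # assumed local model tag
        "messages": [
            {"role": "system", "content": f"Reasoning: {reasoning}"},
            {"role": "user", "content": user_prompt},
        ],
    }


def chat(url: str, payload: dict) -> dict:
    """POST the payload to a local OpenAI-compatible server."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Build (but don't send) a request with high reasoning effort.
payload = build_chat_request("Summarise my Q3 spending.", reasoning="high")
```

Because the server speaks the standard chat-completions shape, the same payload works whether the model is served by Ollama, LM Studio, or vLLM.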
Dominic demonstrates the capabilities of GPT-OSS by running the 120B model locally on his laptop with 128GB of unified memory. He showcases how the model can perform reasoning and tool calling seamlessly, including Python code execution for complex calculations. He builds a rudimentary financial agent that processes local files without uploading data to the cloud, emphasizing privacy and offline functionality. The agent uses a local MCP server to access files and combines Python execution with chain-of-thought reasoning to answer queries about the user's financial data. Additionally, Dominic illustrates how GPT-OSS can be integrated with other specialized models like GPT-5 to generate HTML dashboards, with input guardrails ensuring sensitive data remains local.
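The tool-dispatch core of such a local agent can be sketched as follows. The wire format for tool calls (a JSON object with "name" and "arguments" keys) and the `sum_category` tool are illustrative assumptions; the point is that every tool runs as a local Python function, so no data leaves the machine.

```python
"""Sketch of the tool-dispatch loop of a local financial agent.
The tool-call shape ({"name": ..., "arguments": ...}) and the sample
data are hypothetical; the exact format depends on the inference stack."""
import csv
import io

# Stand-in for a local file the agent is allowed to read.
TRANSACTIONS = """date,category,amount
2024-01-03,groceries,82.10
2024-01-09,rent,1400.00
2024-01-15,groceries,64.35
"""


def sum_category(category: str) -> float:
    """Local 'tool': total spending for one category, computed on-device."""
    rows = csv.DictReader(io.StringIO(TRANSACTIONS))
    return round(sum(float(r["amount"]) for r in rows if r["category"] == category), 2)


# Registry of tools the agent exposes to the model.
TOOLS = {"sum_category": sum_category}


def dispatch(tool_call: dict):
    """Execute one model-issued tool call against the local registry."""
    fn = TOOLS[tool_call["name"]]
    return fn(**tool_call["arguments"])


# Simulate the model asking for grocery spending: 82.10 + 64.35 = 146.45
result = dispatch({"name": "sum_category", "arguments": {"category": "groceries"}})
```

In a full agent, the model's chain-of-thought decides when to emit such a call, and the returned value is fed back as a tool message for the next reasoning step.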
A key technical innovation discussed is the OpenAI Harmony format, a structured prompt format that enhances model performance by organizing messages with clear roles and an instruction hierarchy. This format helps reduce prompt-injection risks and supports complex interactions like chain-of-thought tool calling. OpenAI collaborated with inference providers such as Ollama, llama.cpp, LM Studio, vLLM, and Hugging Face to ensure broad compatibility and optimized performance of GPT-OSS models across different platforms. The models ship in a quantized MXFP4 format to balance efficiency and intelligence, and variable reasoning levels let users tailor performance to their needs.
Chungrug from AMD then discusses the close collaboration between AMD and OpenAI to optimize GPT-OSS performance on AMD hardware. From day one, AMD provided optimized Docker images for multiple platforms, including Instinct, Radeon, and Ryzen products, so users can easily deploy GPT-OSS models. AMD's ROCm team continuously optimizes key components such as MoE (mixture-of-experts) kernels, inter-GPU communication, and attention modules tailored to the GPT-OSS architecture. These efforts have yielded 2-3x performance improvements since the initial launch, with flagship devices like the MI355X delivering significantly higher tokens per second and better responsiveness than older GPUs.
In conclusion, GPT-OSS represents a significant step forward in open-source agentic reasoning models that combine advanced capabilities with efficient local execution. The collaboration between OpenAI and AMD ensures that these models run optimally on AMD hardware, providing users with powerful tools for privacy-conscious, offline, and latency-sensitive applications. Interested users are encouraged to explore GPT-OSS through various inference providers or by engaging with the OpenAI Harmony format for custom deployments. Resources and further information are available at openai.com/models, inviting developers to experiment and build innovative applications with these versatile models.