The Build Hour session introduced the GPT-Realtime-2 models and showcased their voice capabilities through live demos, including a voice-enabled shopping assistant and a product analytics dashboard. Highlighted improvements include an expanded context window, parallel tool calls, and enhanced multilingual support. The Sierra team also shared the real-world challenges and best practices of deploying reliable, scalable voice AI in customer service, and the session closed with developer resources and a request for community feedback.
The Build Hour session, hosted by Sarah Urbonus from OpenAI, introduced the newly released GPT-Realtime-2 models, focusing on empowering developers and companies to integrate advanced voice-powered AI capabilities into their products. Joined by Terry, a multimodal API product manager, and Erica, a solutions engineer, the session aimed to showcase practical applications of these models through live demos, including voice-powered search agents and product analytics dashboards. The event also featured insights from the Sierra team, who shared their experience building scalable, reliable voice AI agents for enterprise customer service.
The new GPT-Realtime-2 release includes three key models: a low-latency real-time translation model supporting over 70 input languages and 13 output languages; the GPT realtime whisper model for fast, accurate transcription across 80 languages; and the GPT-Realtime-2 model itself, which brings advanced reasoning and tool-calling capabilities to voice applications. Notable improvements include a fourfold increase in context window size to 128k tokens, parallel tool calls, better understanding of domain vocabulary, and controllable expressiveness, enabling more natural and intelligent voice interactions.
Erica demonstrated a voice-powered e-commerce shopping assistant that understands natural language queries and also operates the user interface, calling multiple tools in parallel, such as weather checks and product reviews. The demo showed the model reasoning across several data sources and maintaining conversational context without asking for verbal confirmation at every step. A second demo featured a product analytics dashboard in which the AI assistant helped a product manager investigate regional issues by filtering data, comparing metrics, and summarizing the findings aloud, illustrating the model’s potential as an intelligent analyst in the loop.
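To make the parallel tool-calling idea concrete, here is a minimal sketch of how an application might run several tool calls from a single model turn concurrently. It is not the session's demo code and does not use the Realtime API itself: the tool names `get_weather` and `get_reviews`, their return values, and the `call_id`/`name`/`arguments` shape of each call are all assumptions chosen for illustration.

```python
import asyncio

# Hypothetical tools. Real ones would call a weather service and a reviews store.
async def get_weather(city: str) -> dict:
    await asyncio.sleep(0.01)  # stand-in for network latency
    return {"city": city, "forecast": "rain"}

async def get_reviews(product_id: str) -> dict:
    await asyncio.sleep(0.01)  # stand-in for network latency
    return {"product_id": product_id, "rating": 4.5}

# Registry mapping the tool name the model asks for to its implementation.
TOOLS = {"get_weather": get_weather, "get_reviews": get_reviews}

async def run_tool_calls(calls: list[dict]) -> list[dict]:
    """Run every requested tool concurrently; results keep the request order."""
    tasks = [TOOLS[call["name"]](**call["arguments"]) for call in calls]
    outputs = await asyncio.gather(*tasks)
    return [
        {"call_id": call["call_id"], "output": output}
        for call, output in zip(calls, outputs)
    ]

def dispatch(calls: list[dict]) -> list[dict]:
    """Synchronous entry point for callers that are not already in an event loop."""
    return asyncio.run(run_tool_calls(calls))
```

Because `asyncio.gather` awaits all tasks together, the total wait is roughly that of the slowest tool rather than the sum of all of them, which matters for keeping a voice conversation responsive.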
The Sierra team provided a customer spotlight, explaining the challenges of deploying voice AI in real-world customer service environments. They emphasized the importance of reliability, policy adherence, and handling noisy, interrupted, and accented speech. Their production system layers additional infrastructure on top of GPT-Realtime-2, including custom voice activity detection, workflow orchestration, and compliance features, to ensure agents perform accurately and safely at scale. They also discussed evaluation methods focusing on end-to-end task success rather than just voice quality or transcription accuracy.
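The difference between end-to-end task success and component-level metrics can be sketched in a few lines. This is an illustrative scoring function, not Sierra's evaluation harness: the `CallResult` fields and the rule that a call passes only if the task completed with zero policy violations are assumptions.

```python
from dataclasses import dataclass

@dataclass
class CallResult:
    task_completed: bool    # was the caller's goal resolved end to end?
    policy_violations: int  # policy breaches flagged when the call was reviewed
    word_error_rate: float  # transcription quality, kept for diagnostics only

def task_success_rate(results: list[CallResult]) -> float:
    """Fraction of calls that completed the task without any policy violation."""
    if not results:
        return 0.0
    passed = sum(
        1 for r in results if r.task_completed and r.policy_violations == 0
    )
    return passed / len(results)

def mean_word_error_rate(results: list[CallResult]) -> float:
    """Average transcription error, reported alongside but not instead of success."""
    if not results:
        return 0.0
    return sum(r.word_error_rate for r in results) / len(results)
```

Under this scoring, a call with a flawless transcript still counts as a failure if the task was not completed or a policy was broken, which is the distinction the team drew.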
During the Q&A, the panel addressed common concerns such as handling interruptions, running sessions longer than one hour, and escalation strategies for complex reasoning tasks. They highlighted best practices like customizing voice activity detection, managing session state across calls, and dynamically injecting context to keep decisions consistent. The session concluded with resources for developers, announcements of upcoming Build Hours, and a request for community feedback to shape future content, reinforcing OpenAI’s commitment to advancing accessible, intelligent voice AI technologies.
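One way to picture managing session state across calls and injecting context is a small state object that the application keeps outside the voice session and renders into text when a new session starts. The sketch below rests on assumptions: `SessionState`, `build_handoff_context`, and the wording of the injected text are hypothetical, and how that text is delivered to the model depends on the API in use.

```python
from dataclasses import dataclass, field

@dataclass
class SessionState:
    """State the application stores outside the voice session."""
    summary: str = ""
    decisions: list[str] = field(default_factory=list)

    def record_decision(self, text: str) -> None:
        self.decisions.append(text)

def build_handoff_context(state: SessionState, max_decisions: int = 5) -> str:
    """Render stored state as text to inject at the start of a new session."""
    # Keep only the most recent decisions so the injected context stays short.
    recent = state.decisions[-max_decisions:] if max_decisions > 0 else []
    lines = ["Context carried over from the previous session:"]
    if state.summary:
        lines.append(f"Summary: {state.summary}")
    lines.extend(f"- Decision: {decision}" for decision in recent)
    return "\n".join(lines)
```

Replaying earlier decisions in this way gives a fresh session the same commitments the previous one made, so the agent is less likely to contradict itself after a reconnect.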