The video explains OpenAI’s major shift from stateless REST APIs to stateful websockets, enabling persistent connections that drastically reduce bandwidth and speed up agentic AI workflows by maintaining session context server-side. The creator highlights how this technical change, though less flashy than new model releases, represents a foundational improvement that will significantly impact the efficiency and evolution of AI infrastructure.
The video discusses a significant recent change made by OpenAI to their APIs, shifting from traditional REST calls to websockets. The creator explains that while most of their videos are influenced by both personal interest and potential reach, this topic is covered purely out of passion for the technical details, even if it might not attract a large audience. The move to websockets is described as a massive shift, promising over 90% reduction in bandwidth and 20-30% speed improvements, which is particularly impactful for applications involving AI agents and complex tool calls.
To help viewers understand why this matters, the creator breaks down how context management and requests work with AI models. In current systems, every interaction with an AI agent, especially one involving tool calls, requires the entire conversation history (the context) to be sent back to the API with each request. This stateless approach is highly inefficient: large amounts of data are retransmitted even when only a small update, such as a tool call result, has changed. The video also clears up common misconceptions about caching and compaction, emphasizing that caching only reduces compute time, not the amount of data sent over the wire, and that compaction is a separate process that summarizes history, which in turn breaks cache efficiency.
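The growth described above can be made concrete with a short simulation. This is only an illustrative sketch, not OpenAI's actual API: the function name, the placeholder model string, and the message sizes are all invented for the example. It shows how, in a stateless loop, each request body carries the whole history even though only the newest message changed.

```python
import json

def stateless_request_sizes(turns):
    """Simulate a stateless REST chat loop: every request must carry
    the full conversation history, so payload size grows each turn."""
    history = []
    sizes = []
    for i in range(turns):
        # The newest tool result is only a few hundred bytes...
        history.append({"role": "user", "content": f"tool result {i}" + "x" * 200})
        # ...but the request body contains the entire history so far.
        payload = json.dumps({"model": "some-model", "messages": history})
        sizes.append(len(payload.encode("utf-8")))
        history.append({"role": "assistant", "content": "ok" + "y" * 200})
    return sizes

sizes = stateless_request_sizes(5)
print(sizes)  # strictly increasing: per-request bytes grow with every turn
```

Since per-request bytes grow linearly with the number of turns, the total traffic for a long agentic session grows roughly quadratically, which is the inefficiency the video is pointing at. Note that server-side prompt caching does nothing to shrink these payloads; it only saves recomputation once they arrive.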
The inefficiency of the stateless REST approach is further highlighted by the architecture behind OpenAI’s APIs. Requests are routed across multiple API servers and GPUs, so no single server can maintain session state between calls. As a result, every request must include the full context, leading to unnecessary bandwidth usage and processing overhead, especially in agentic workflows where many rapid tool calls each require the entire history to be resent.
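A tiny simulation can show why this routing makes server-side session state impractical. Everything here is hypothetical scaffolding for illustration (the `ApiServer` class and round-robin balancer are invented, not OpenAI's infrastructure): consecutive requests land on different machines, so each machine can only work with the context that rides along in the request itself.

```python
import itertools

class ApiServer:
    """A stateless API server: it keeps no memory between requests and
    can only see whatever context a single request carries."""
    def __init__(self, name):
        self.name = name

    def handle(self, messages):
        return {"server": self.name, "context_len": len(messages)}

servers = [ApiServer(f"api-{i}") for i in range(3)]
round_robin = itertools.cycle(servers)  # typical load-balancer behavior

history = []
hits = []
for turn in range(4):
    history.append({"role": "user", "content": f"turn {turn}"})
    server = next(round_robin)
    hits.append(server.handle(list(history)))  # must resend everything
    history.append({"role": "assistant", "content": "..."})

# Consecutive requests hit different servers, so the full (and growing)
# history has to be included in every single request.
print([(h["server"], h["context_len"]) for h in hits])
```

The design consequence is the one the video draws: without some way to pin a session to one connection, statelessness is forced on you by the load balancer, regardless of how the model itself works.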
Websockets offer a solution by enabling a persistent connection to the same API server throughout a session. This allows the server to maintain in-memory state, so only new information (such as the latest tool call result) needs to be sent, rather than the entire conversation history. This stateful approach dramatically reduces data transfer, processing time, and the complexity of managing cache and authentication checks for every request. The creator notes that while this change may not be as beneficial for simple chat apps, it is transformative for agentic use cases with frequent tool calls.
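The stateful alternative can be sketched the same way. This is a simulation of the idea, not OpenAI's actual websocket protocol; the `StatefulSession` class and its methods are invented for illustration. Because the persistent connection pins the session to one server, that server can accumulate the context in memory while the client sends only each new delta, and the comparison against a stateless loop makes the bandwidth saving visible.

```python
import json

class StatefulSession:
    """Sketch of a websocket-style session: the server keeps the
    conversation in memory, so the client sends only the delta."""
    def __init__(self):
        self.history = []        # lives server-side for the session
        self.bytes_received = 0

    def send(self, message):
        frame = json.dumps(message)  # only the new message crosses the wire
        self.bytes_received += len(frame.encode("utf-8"))
        self.history.append(message)
        return {"context_len": len(self.history)}

session = StatefulSession()
stateless_bytes = 0
stateful_deltas = []
for i in range(5):
    msg = {"role": "tool", "content": f"tool call result {i}" + "x" * 200}
    reply = session.send(msg)                  # stateful: delta only
    stateful_deltas.append(reply["context_len"])
    # Stateless equivalent: resend the whole history every time.
    stateless_bytes += len(json.dumps(session.history).encode("utf-8"))

print(session.bytes_received, "bytes stateful vs", stateless_bytes, "stateless")
```

Even over just five tool calls, the stateless loop transmits several times more data than the delta-based session, and the gap widens with every additional turn, which is why the change matters most for agent workloads with many rapid tool calls.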
The video concludes by emphasizing the broader significance of this shift, noting that OpenAI’s adoption and open-sourcing of these standards will likely influence the entire industry. The creator expresses excitement about the opportunity to rethink and improve foundational aspects of AI infrastructure, highlighting how early and rapidly evolving the field still is. They encourage viewers to appreciate the importance of deep technical work and to stay engaged with these kinds of fundamental improvements, which can have a much greater impact than new model releases alone.