Testing and building with Computer Use from OpenAI

The video showcases OpenAI’s new “computer use” feature, which lets the model interact with a local computer environment; the demonstration has it open the Edge browser and search for NVIDIA graphics cards. The presenter highlights the feature’s adaptability and potential for various applications, encouraging viewers to explore its capabilities and download a working example for experimentation.

In the video, the presenter discusses the recent release of new tools by OpenAI, including web search, file search, and a feature called “computer use” designed for agentic use cases. The video focuses on demonstrating the computer use functionality, which allows the AI to interact with a local computer environment. The presenter has already implemented this feature and plans to showcase its capabilities by asking the AI to open the Edge browser and search for NVIDIA graphics cards on Google.

One notable aspect of the computer use feature is that it accepts a native 1080p (1920×1080) display resolution, which is convenient for users with that monitor setup. The presenter contrasts this with Anthropic’s earlier computer-use tool, which imposed resolution limits and required converting screenshots to a supported size. The demonstration runs on Windows, and the presenter stresses the importance of monitoring the script’s actions, since it takes control of the local computer without a sandbox environment.
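Based on OpenAI’s published documentation for the Responses API, the initial request looks roughly like the sketch below. The tool parameters (`display_width`, `display_height`, `environment`) follow the docs; the 1920×1080 values and the `"windows"` environment are assumptions chosen to mirror the demo’s setup.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Initial request: describe the task and declare the computer-use tool.
# The display dimensions match a native 1080p monitor, and "windows"
# matches the local, non-sandboxed environment shown in the demo.
response = client.responses.create(
    model="computer-use-preview",
    tools=[{
        "type": "computer_use_preview",
        "display_width": 1920,
        "display_height": 1080,
        "environment": "windows",
    }],
    input=[{
        "role": "user",
        "content": "Open the Edge browser and search Google for NVIDIA graphics cards.",
    }],
    truncation="auto",  # the docs require truncation="auto" with this tool
)
```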

As the script runs, the AI successfully opens the Edge browser and navigates to Google, although it initially searches on Bing instead. The presenter notes that the model corrects itself, which indicates a degree of reasoning and adaptability. Throughout the demonstration, the script captures screenshots and sends them back to OpenAI for processing, and the model responds with the next action to execute; the loop continues until the task is complete.
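In outline, that screenshot-and-act loop looks something like the following sketch. The `computer_call` / `computer_call_output` item shapes follow OpenAI’s documented pattern, while `execute_action()` and `take_screenshot()` are hypothetical helpers standing in for whatever local automation the script uses.

```python
import base64

def run_agent_loop(client, response):
    """Keep executing suggested actions until the model stops asking for them."""
    while True:
        # Find the next suggested action, if any.
        calls = [item for item in response.output if item.type == "computer_call"]
        if not calls:
            break  # no more actions: the task is complete
        call = calls[0]

        # Carry out the action on the local machine (click, scroll, type, ...).
        execute_action(call.action)     # hypothetical helper
        screenshot = take_screenshot()  # hypothetical helper, returns PNG bytes

        # Report the result back as a screenshot so the model can pick
        # the next step.
        response = client.responses.create(
            model="computer-use-preview",
            previous_response_id=response.id,
            tools=[{
                "type": "computer_use_preview",
                "display_width": 1920,
                "display_height": 1080,
                "environment": "windows",
            }],
            input=[{
                "type": "computer_call_output",
                "call_id": call.call_id,
                "output": {
                    "type": "input_image",
                    "image_url": "data:image/png;base64,"
                                 + base64.b64encode(screenshot).decode(),
                },
            }],
            truncation="auto",
        )
    return response
```

Per the docs, the tool is re-declared on every call, and `previous_response_id` chains the turns so the model keeps the conversation context.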

The video also touches on the documentation OpenAI provides for setting up the computer use feature. The presenter explains that users can create a local browsing environment with tools like Playwright or Selenium, or even set up a virtual machine with Docker. The process involves specifying the display resolution and environment, sending the task as text input, and receiving suggested actions from the model, such as clicking, scrolling, or typing, which the local script is responsible for executing.
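As one way to follow that recipe with Playwright, the sketch below hosts the browsing environment and translates a few of the suggested action types into browser commands. The action fields (`x`, `y`, `scroll_x`, `scroll_y`, `text`) match the documented action types, but `dispatch_action()` is a simplified, hypothetical dispatcher rather than a complete one.

```python
from playwright.sync_api import sync_playwright

def dispatch_action(page, action):
    """Translate a suggested action from the model into a Playwright command."""
    if action.type == "click":
        page.mouse.click(action.x, action.y)
    elif action.type == "scroll":
        page.mouse.move(action.x, action.y)
        page.mouse.wheel(action.scroll_x, action.scroll_y)
    elif action.type == "type":
        page.keyboard.type(action.text)
    elif action.type == "screenshot":
        pass  # the agent loop screenshots after every action anyway
    else:
        raise NotImplementedError(f"Unhandled action type: {action.type}")

with sync_playwright() as p:
    # Launch a visible browser window sized to match the resolution
    # declared in the tool configuration.
    browser = p.chromium.launch(headless=False)
    page = browser.new_page(viewport={"width": 1920, "height": 1080})
    page.goto("https://www.google.com")
    # ... feed page.screenshot() into the agent loop and call
    # dispatch_action(page, call.action) for each suggested action ...
```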

Finally, the presenter plans to reimplement the computer use functionality from scratch using the documentation as a guide. They express excitement about the potential of this feature, especially in constrained environments where the AI can control the cursor. The video concludes with the presenter encouraging viewers to explore the new capabilities and consider downloading the working example from their Patreon for further experimentation.