Gemini 2.5 Computer Use - NEW Free AI Browser Agent (+ Run Locally + API)

artesia · 11 October 2025 20:41

The video showcases the new, free browser agent capabilities of Gemini 2.5 Computer Use. It shows the model interacting autonomously with web interfaces, driving a browser on a local machine through the API, and performing tasks such as playing games and browsing, with low latency and simple integration. It also covers practical setup instructions and the model’s current restriction to browser-only control, and it positions Gemini 2.5 as a leading, accessible tool for developers building AI-driven browser agents.

artesia · 11 October 2025 21:05

The video introduces Gemini 2.5’s new capability to use a browser, highlighting that it can be accessed for free, integrated into applications, and even run from a personal computer. The presenter explains that Gemini 2.5 is part of a growing family of models and jokes about when Gemini 3 might arrive. The video also points out that an AI-generated summary of the lengthy documentation is available, which saves reading time. Gemini 2.5’s computer use model is accessible via an API and through Google AI Studio and Vertex AI, enabling developers to build agents that interact with user interfaces efficiently and with low latency.

The video demonstrates Gemini 2.5’s browser capabilities using BrowserBase, an infrastructure platform for creating browser agents. The presenter shows a live demo where Gemini 2.5 plays the game 2048 by interacting with the browser interface, navigating, and making moves autonomously. The interface displays the prompts and actions Gemini takes, allowing users to observe and interact with the process without needing to write code. This visual insight into Gemini’s operation highlights its ability to process screen images, send them to the API, and receive instructions on what to do next, showcasing its advanced browser interaction skills.
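The screenshot-to-action loop described above can be sketched roughly as follows. This is a minimal illustration, not the actual Gemini or BrowserBase API: `Action`, `FakeBrowser`, `run_agent`, and `scripted_model` are all hypothetical names, and the model is replaced by a stub that returns a fixed sequence of key presses so the example runs without an API key.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Action:
    """One instruction returned by the model (hypothetical shape)."""
    name: str                                  # e.g. "press_key" or "done"
    args: dict = field(default_factory=dict)


class FakeBrowser:
    """Stand-in for a real browser driver; it only records requested actions."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def screenshot(self) -> bytes:
        # A real driver would return image bytes of the current page.
        return f"screen-after-{len(self.log)}-actions".encode()

    def execute(self, action: Action) -> None:
        self.log.append(f"{action.name}:{action.args}")


def run_agent(
    goal: str,
    browser: FakeBrowser,
    ask_model: Callable[[str, bytes], Action],
    max_steps: int = 10,
) -> int:
    """Capture the screen, ask the model for the next action, execute, repeat.

    Returns the number of actions executed before the model signalled "done"
    or the step limit was reached.
    """
    for step in range(max_steps):
        image = browser.screenshot()
        action = ask_model(goal, image)
        if action.name == "done":
            return step
        browser.execute(action)
    return max_steps


def scripted_model(goal: str, image: bytes) -> Action:
    """Hypothetical model stub: presses three arrow keys, then stops."""
    moves = ["ArrowUp", "ArrowLeft", "ArrowDown"]
    done_so_far = int(image.decode().split("-")[2])
    if done_so_far < len(moves):
        return Action("press_key", {"key": moves[done_so_far]})
    return Action("done")


if __name__ == "__main__":
    browser = FakeBrowser()
    steps = run_agent("play 2048", browser, scripted_model)
    print(steps, browser.log)
```

In a real agent, `ask_model` would send the goal and screenshot to the computer use model and translate its response into an `Action`; the `max_steps` cap is a design choice to keep a misbehaving loop from running indefinitely.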

For those interested in running the Gemini 2.5 agent on their own machine, the video guides viewers through the setup process using GitHub resources. The presenter explains how to copy and paste commands into the terminal, obtain an API key from Google AI Studio, and configure billing for API usage. While the browser-based version is free, calling the API directly requires billing to be set up; the video discusses pricing, including per-token costs. It also shows how to run a Python script that opens a Chromium browser and performs tasks such as searching for “hello world” on Google, demonstrating the agent driving a browser on a local machine.
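Since direct API use is billed per token, a rough cost estimate can help before running longer agent sessions. The sketch below assumes separate per-million-token prices for input and output; the function name `estimate_cost_usd` is hypothetical, and the rates in the example are placeholders, not Google's actual pricing, so substitute the values from the current price list.

```python
def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
) -> float:
    """Estimate the cost of a session from token counts and per-million prices."""
    total = (
        input_tokens * input_price_per_million
        + output_tokens * output_price_per_million
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Placeholder rates for illustration only (NOT real pricing).
    cost = estimate_cost_usd(
        input_tokens=500_000,
        output_tokens=100_000,
        input_price_per_million=1.00,
        output_price_per_million=2.00,
    )
    print(f"Estimated cost: ${cost:.2f}")
```

Because each step of a browser agent sends a fresh screenshot, input tokens will usually dominate the total.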

The presenter tests Gemini 2.5’s ability to control the local computer beyond the browser, such as opening TextEdit and writing a Python file. However, Gemini 2.5 is currently limited to browser use and cannot directly interact with other local applications due to safety restrictions. Despite this limitation, the browser-based Gemini is praised as the best way to experiment with the model, offering a free and interactive experience. The video encourages viewers to try it out and explore its potential for building specialized browser agents or integrating AI-driven browser control into their projects.

In conclusion, the video positions Gemini 2.5 as a state-of-the-art browser-use model that, according to the presenter, outperforms competitors such as Claude Sonnet 4.5 and OpenAI Operator on benchmarks. While it is still in preview and has some limitations, its speed, accessibility, and ease of use make it a compelling tool for developers and enthusiasts. The presenter invites viewers to share their thoughts on whether Gemini 2.5 represents the next big step in browser AI or is arriving late to the party. Overall, the video provides an overview, live demonstrations, and practical setup instructions for using Gemini 2.5’s new browser capabilities.