Gemini 3 just got *scary* good

Gemini 3 is a major advancement in AI, outperforming competitors across diverse benchmarks including business simulations, academic exams, coding, and multimodal tasks, while offering a Deep Think variant that trades higher cost for higher accuracy. It is integrated into several Google platforms and launches alongside Google Antigravity, a new agentic development platform aimed at AI-driven applications and developer workflows.

Gemini 3 has been released, and it is a significant leap in AI capabilities rather than a minor update. It is available now in the Gemini app, AI Studio, and Vertex AI, and it also powers AI Mode in Google Search, with access levels depending on subscription tiers such as Google AI Pro and Ultra. Google also announced a new agentic development platform, Google Antigravity, for building AI-driven applications. Gemini 3 Deep Think, a more advanced version, is currently limited to safety testers and Google AI Ultra subscribers.

One of the standout benchmarks for Gemini 3 is Vending-Bench 2, which tests how well AI models can autonomously run a simulated vending machine business over a long period. Gemini 3 Pro dramatically outperforms previous leaders like Claude and Grok 4, growing its net worth from $500 to over $5,000 in the simulation (more than a tenfold increase), which the benchmark attributes to stronger negotiation and supplier management. In the competitive multi-agent Arena version of the benchmark, Gemini 3 Pro again dominates, pushing other models into negative returns.

Gemini 3 also excels in a variety of challenging academic and professional benchmarks. It leads in Humanity’s Last Exam with a 45.8% score, significantly ahead of competitors, and performs strongly in ARC-AGI-2, GPQA Diamond (graduate-level science questions), and the AIME 2025 math exam. It sets a new standard for intelligence per dollar spent, offering top-tier accuracy at a lower cost than rivals. The Deep Think variant achieves even higher accuracy at a much greater expense, so users can trade cost for accuracy as the task demands.

In multimodal and specialized tasks, Gemini 3 continues to impress. It scores highest in MMMU-Pro, a university-level multimodal benchmark, excels in graphical user interface understanding on ScreenSpot-Pro, and leads in video-based learning assessments. It also outperforms others in competitive programming (LiveCodeBench) and terminal command evaluations. It accepts inputs of up to one million tokens and can produce outputs of up to 64K tokens, a significant advance in context handling and information retrieval over large documents.
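To make those context limits concrete, here is a minimal Python sketch of a pre-flight check that estimates whether a document and a requested output length fit within the one-million-token input and 64K-token output figures quoted above. The function names are hypothetical, "64K" is read as 64,000, and the four-characters-per-token ratio is only a rough assumption for English text; a real application would count tokens with the model's own tokenizer.

```python
# Limits taken from the figures quoted in the text; "64K" is read as 64,000
# here, which is an assumption about the exact value.
MAX_INPUT_TOKENS = 1_000_000
MAX_OUTPUT_TOKENS = 64_000

# Assumption for illustration only: roughly 4 characters per token.
# This is a heuristic, not a property of any real tokenizer.
ASSUMED_CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Rough token estimate: ceiling of len(text) / chars-per-token."""
    return -(-len(text) // ASSUMED_CHARS_PER_TOKEN)


def fits_budget(document: str, requested_output_tokens: int) -> bool:
    """Return True if the estimated input and requested output fit the limits."""
    input_ok = estimate_tokens(document) <= MAX_INPUT_TOKENS
    output_ok = 0 < requested_output_tokens <= MAX_OUTPUT_TOKENS
    return input_ok and output_ok


if __name__ == "__main__":
    sample = "word " * 10_000  # 50,000 characters
    print(estimate_tokens(sample), fits_budget(sample, 8_000))
```

Under these assumptions, a document of about four million characters sits right at the input limit, so anything larger would need to be split or summarized first.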

Overall, Gemini 3 is a powerful, versatile AI model that sets new performance standards across a wide range of tasks, from business simulations and academic exams to coding and multimodal understanding. Combined with Google’s new agentic tools and platforms like Antigravity, it opens exciting possibilities for developers and users alike. Early tests show exceptional coding capabilities, far surpassing previous versions, and the AI community is eager to see what applications emerge from it.