This week in AI saw major advancements, including Google’s Gemini 3 dominating multiple AI benchmarks, Meta’s SAM 3 excelling at object segmentation (with its companion model SAM 3D handling 3D reconstruction), and new open-source video models like Tencent’s HunyuanVideo 1.5 pushing video generation quality. Additionally, innovative tools such as OpenAI’s GPT-5.1-Codex-Max for coding, Google DeepMind’s WeatherNext 2 for weather forecasting, and several powerful multimodal and 3D AI models were released, highlighting rapid progress across diverse AI domains.
This week in AI has been packed with notable releases and updates across various domains. One of the standout innovations is Depth Anything 3, a model capable of generating detailed 3D maps from just a few images or videos, even in chaotic, high-action scenes. It outperforms comparable models in speed, accuracy, and coverage, and is accessible via an open-source GitHub repository. Alongside this, Meta introduced Segment Anything Model 3 (SAM 3), which detects, segments, and tracks objects in images and videos with remarkable speed and precision. Meta also released a companion 3D model, SAM 3D, which accurately reconstructs objects and human bodies even in complex poses, outperforming competitors across multiple benchmarks.
In the realm of video generation, Tencent released HunyuanVideo 1.5, a lightweight yet high-quality open-source model capable of generating videos up to 10 seconds long at 720p resolution, with an optional super-resolution enhancement to 1080p. It surpasses previous leading open models in instruction following, visual quality, and motion stability. Another open-source video model, Kandinsky 5, offers both video and image generation in multiple model sizes, though it has received less attention due to quality inconsistencies in high-motion scenes. Both models ship with comprehensive instructions and support integration with popular tools like ComfyUI.
Google made significant strides with the launch of Gemini 3, a state-of-the-art AI model that tops multiple leaderboards in text, vision, and coding tasks. Gemini 3 Pro notably outperforms human experts at guessing locations from images and at medical image analysis. Complementing this, Google released Nano Banana Pro, currently the strongest image generator and editor available, capable of producing and editing images with exceptional detail and versatility. OpenAI also quietly introduced GPT-5.1-Codex-Max, its most advanced agentic coding model, designed for complex, multi-step coding tasks and outperforming previous versions, and even Gemini 3 Pro, on certain benchmarks.
Other notable AI advancements include a proactive hearing assistant, an AI system that improves conversational clarity by isolating desired voices in noisy environments; its dataset and training code are publicly available. Google DeepMind unveiled WeatherNext 2, an efficient and highly accurate weather forecasting model that produces forecasts at hourly resolution, running significantly faster than traditional physics-based models. Additionally, Google introduced Antigravity, an agent-first IDE that lets teams of AI agents work autonomously on codebases, featuring a built-in browser for live testing and debugging; it is currently available in public preview.
Finally, several open-source multimodal and 3D AI models were released. DR Tulu, an 8-billion-parameter deep research agent from the Allen Institute for AI, rivals proprietary models in multi-step reasoning and evidence synthesis. Part-X-MLLM offers part-aware 3D model generation and editing through natural language interaction, though its code is not yet publicly available. Uni-MoE 2.0 Omni is a powerful open-source omnimodal model capable of understanding and generating text, images, audio, and video, though it requires substantial computational resources. These developments collectively showcase the rapid pace and diversity of AI innovation, with many tools already accessible for public use and experimentation.