New open Nano Banana, AI plays any video game, new top open source models, long videos: AI NEWS

This week’s AI news highlights major breakthroughs in open-source models and tools, including state-of-the-art language models, advanced video and image editing, 3D scene manipulation, and robotics, many of which now rival or surpass leading commercial solutions. Notable releases include powerful coding and reasoning models, infinite-length video generation, natural language image editing, 3D scene reconstruction, and Nvidia’s AI agent that can autonomously play thousands of video games.

This week in AI has seen a surge of groundbreaking open-source models and tools, many of which rival or even surpass leading closed models like Gemini 3 and GPT-5.2. Notably, two new state-of-the-art open-source models, MiniMax M2.1 and GLM 4.7, have demonstrated exceptional abilities in agentic coding, multi-step reasoning, and data analysis, outperforming top commercial models on several benchmarks. Both are freely available for enterprise and research use, with MiniMax M2.1 excelling at coding complex applications and GLM 4.7 showing strong results on competitive math and science tasks.

Video generation and editing have also made significant strides. Tools like FlashPortrait from Alibaba enable infinite-length, highly consistent portrait animations, outperforming previous methods in both speed and quality. New AI models such as RICO allow for micro-editing of videos using text prompts, letting users seamlessly replace, add, or stylize elements within a video. Meanwhile, ByteDance's StoryMem and Dream Montage introduce advanced memory and keyframe control, enabling longer, more coherent, and more customizable cinematic videos.

Image editing has become more accessible and powerful with the release of Qwen Image Edit 2511, an offline, open-source image editor that supports natural language prompts for complex edits. It integrates popular features like relighting and novel view synthesis, making it easier to adjust lighting, change perspectives, and combine elements from multiple images. Generative refocusing tools now allow users to fix out-of-focus photos and adjust depth of field after the fact, further enhancing post-processing capabilities.

3D scene understanding and manipulation have also advanced. AI models like 3D Regen can reconstruct editable 3D scenes from a single photo, while MV Inverse predicts detailed physical properties of objects in a scene from one or more images. Nvidia’s Carry 4D reconstructs dynamic 3D scenes from video, tracking human-object interactions, which is particularly useful for robotics training. Additionally, Animate Any Character in Any World enables users to insert and animate 3D characters in various environments using simple text prompts.

Other notable developments include Nvidia's Nitrogen, an AI agent capable of autonomously playing thousands of video games by mimicking human vision and actions, and IMC Cam, which can alter camera perspectives in existing videos while maintaining consistency. Robotics also saw progress with Unitree's teleoperation demo, in which a humanoid robot mirrors human movements in real time without bulky equipment. Collectively, these innovations highlight the rapid pace of AI development, with open-source tools increasingly matching or exceeding proprietary solutions in both capability and accessibility.