Google’s open-source AI, Claude Code leaked, new Wan, new Qwen, image gen on phone: AI NEWS

This week in AI saw major advancements, including Google’s open-source Gemma 4 model supporting multimodal inputs on consumer devices, Netflix’s Void for seamless video object removal, and innovative video and image generation tools like the Generative World Renderer and Dreamlight. Significant progress was also made in voice synthesis with Meituan’s LongCat Audio DiT, and in coding AI with the leaked Anthropic Claude Code and Alibaba’s Qwen models, alongside breakthroughs in 3D scene reconstruction, graphic design automation, and improved video consistency.

This week in AI has been packed with groundbreaking releases and developments. Google unveiled Gemma 4, a powerful open-source AI model family that can run efficiently on consumer hardware, including phones and Raspberry Pi. Gemma 4 supports multimodal inputs—text, images, and audio—with large context windows and over 140 languages, making it highly versatile for real-time applications. Netflix also released Void, an open-source video editing model that can remove objects from videos seamlessly, filling in the background naturally, though it requires a high-end GPU to run locally.

In video and image generation, several innovative tools emerged. The Generative World Renderer lets users restyle AAA game graphics by manipulating detailed scene data such as depth and lighting. Another model uses web search to improve image generation accuracy, referencing real-world images before creating new ones. Token Dial offers fine-grained control over video attributes such as emotion, style, and motion through intuitive sliders. Additionally, Dreamlight by ByteDance is a tiny image generator and editor that runs offline on phones, producing images in seconds in a variety of artistic styles, albeit with less detail than larger models.
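Slider-based control of this kind is often built by nudging a model’s conditioning embedding along learned attribute directions. The sketch below is a hypothetical illustration of that idea, not Token Dial’s published method; the embeddings and the `blend_condition` helper are made up for the example:

```python
import numpy as np

def blend_condition(base_emb, attr_emb, slider):
    """Linearly interpolate from a neutral conditioning embedding
    toward an attribute embedding; slider in [0, 1] sets strength."""
    slider = float(np.clip(slider, 0.0, 1.0))
    return (1.0 - slider) * base_emb + slider * attr_emb

# Toy 4-dim embeddings standing in for learned attribute directions.
neutral = np.zeros(4)
happy = np.array([1.0, 0.0, 0.0, 0.0])

# A slider at 0.5 moves the conditioning halfway toward "happy".
cond = blend_condition(neutral, happy, 0.5)
print(cond)
```

In a real system the blended vector would then condition the video diffusion model at each denoising step; here it just demonstrates how a single scalar slider can smoothly dial an attribute up or down.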

Voice synthesis and coding AI also saw significant advancements. Meituan released LongCat Audio DiT, a top-tier text-to-speech and voice-cloning tool that can replicate a voice from just seconds of audio, supports multiple languages, and runs efficiently on consumer devices. Anthropic’s Claude Code, an AI coding assistant framework, was accidentally leaked, revealing internal features like a virtual pet and an undercover mode and offering insight into how agentic coding tools are engineered. Meanwhile, Alibaba introduced Qwen 3.5 Omni and Qwen 3.6 Plus, state-of-the-art multimodal models that excel at video, audio, and text understanding, can code from video inputs, and support extensive context windows.

Other notable AI innovations include ZAI’s GLM 5V Turbo, a vision coding model that analyzes images and videos to generate functional apps and websites, outperforming many competitors. Alibaba also released Wan 2.7, a video generator with audio, along with an image generator/editor that excels at realistic face generation and precise color control. Google introduced VGGPO, a system that improves video consistency by teaching diffusion models to understand 3D scene geometry, reducing jitter and warping in generated videos. LGTM offers high-resolution 3D scene reconstruction from only a few images, and Han X provides a detailed dataset for training humanoid robots on complex hand movements.

Finally, tools like PS Designer automate graphic design by generating layered Photoshop files from prompts, and hybrid memory models improve video world consistency by remembering objects even when they move out of view. Platforms like Hugging Face continue to integrate and support these models, making them accessible for developers and creators. With so many advancements across multimodal AI, coding assistants, voice synthesis, and video generation, this week highlights the rapid pace and breadth of innovation in AI technology.
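The hybrid-memory idea can be pictured as a key-value store that updates an object’s state only while it is visible, yet still answers queries about it after it leaves the frame. This is an illustrative toy, not the actual architecture; the `ObjectMemory` class and its method names are invented for the example:

```python
class ObjectMemory:
    """Toy world-model memory: keeps each object's last observed
    state so it stays consistent even after leaving the frame."""

    def __init__(self):
        self.store = {}  # object id -> last observed state

    def observe(self, visible_objects):
        # Update memory only for objects visible in this frame;
        # everything else keeps its previously stored state.
        for obj_id, state in visible_objects.items():
            self.store[obj_id] = state

    def recall(self, obj_id):
        # Out-of-view objects can still be recalled from memory.
        return self.store.get(obj_id)

memory = ObjectMemory()
memory.observe({"cup": (3, 4)})   # frame 1: cup visible at (3, 4)
memory.observe({"ball": (0, 1)})  # frame 2: cup now out of view
print(memory.recall("cup"))       # → (3, 4)
```

Real hybrid-memory video models store learned latent features rather than coordinates, but the principle is the same: persistent per-object state is what keeps a generated scene from forgetting objects the camera is not currently looking at.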