This week in AI saw major breakthroughs, including real-time video and text-to-speech generation on consumer devices, the release of powerful new models like Alibaba’s Qwen 3.5 and Google’s Gemini 3.1 Pro, and advances in open-source tools for 3D worlds and brain-computer interfaces. Creative and robotics tools also improved, with high-resolution video and music generators, advanced image editing, and impressive robot demonstrations, reflecting rapid progress and wider accessibility in AI technology.
This week brought a run of notable AI releases across several domains. Real-time video generation is now possible on consumer GPUs with Monarch RT, which generates video at 16 frames per second on an RTX 5090. Similarly, Kitten TTS, a lightweight text-to-speech model, runs in real time on almost any device, including mobile phones, thanks to its tiny size (as small as 14 million parameters). These developments make high-quality AI capabilities available to everyday users without data-center hardware.
Major AI model releases have also made headlines. Alibaba introduced Qwen 3.5, a multimodal model with 397 billion parameters (17 billion active at a time), boasting a million-token context window and strong performance in reasoning, coding, and multimodal understanding. Google released Gemini 3.1 Pro, an incremental but powerful upgrade that tops independent leaderboards and is highly cost-effective. ByteDance continued its momentum with Seed 2.0, a large language model excelling in visual reasoning and agentic tasks, ranking highly on independent benchmarks.
Open-source innovation is thriving. Anchorwave, based on Genie 3, generates interactive 3D world videos in which users navigate scenes with keyboard controls while the scene stays consistent and realistic. Zuna Thought-to-Text is an open-source brain-computer interface model that denoises and reconstructs EEG data, laying the groundwork for future thought-to-text translation. Tiny Aya by Cohere Labs is another open-weight model, supporting over 70 languages and running efficiently on consumer hardware.
AI-generated content continues to improve in quality and versatility. Louv can generate ultra-high-resolution videos (2K and 4K) with impressive detail, outperforming other open-source alternatives. Google’s Lyria 3 is a free music generator integrated into the Gemini platform, capable of creating songs in various styles from text prompts or images. Audio X is a unified model that generates audio and music from text, images, or video, supporting tasks like inpainting and extension, and is available as a small, downloadable model.
Robotics and creative tools are also advancing rapidly. Unitree's G1 robots performed acrobatic feats at China’s Spring Festival Gala, demonstrating significant progress in balance, coordination, and swarm intelligence. VetoPix takes a novel approach to image editing by converting images into editable vector shapes, allowing precise manipulation of objects and colors. Meanwhile, platforms like Higgsfield aggregate leading image and video generators, giving creators access to cutting-edge tools in one place. Overall, this week’s developments highlight the accelerating pace and democratization of AI technology across hardware, software, and creative applications.