This week in AI news features major advancements, including a new technique for generating video from 3D models, powerful open-source tools from ByteDance and Tencent for video animation and 3D segmentation, and Nvidia's AI for reconstructing coherent 3D and 4D scenes. Alibaba leads with multiple cutting-edge multimodal AI models and tools, while other highlights include Google's Gemini 2.5 updates, DeepSeek's improved language model, and releases like ByteDance's OmniHuman-1.5 deepfake lipsync animator and Suno's enhanced music AI.
This week in AI news, several groundbreaking advancements and releases took center stage. A new video generation method called VideoFrom3D creates videos by applying the texture and style of a reference image onto a moving 3D model, offering precise control over camera and object movement, and it outperforms similar tools by keeping details consistent and accurate. Meanwhile, ByteDance introduced Lynx, an open-source video generator that can animate any reference photo into high-quality, expressive video, including subtle facial expressions and complex actions, making it a powerful tool for personalized video content.
Tencent Hunyuan unveiled Hunyuan3D-Part, a tool that uses AI to break complex 3D models down into meaningful parts. It consists of two components: P3-SAM, which segments the model into parts, and X-Part, which reconstructs each segmented part as a complete shape for further editing. Nvidia released Lyra, an AI capable of reconstructing coherent 3D scenes from images or videos, including 4D scenes that add the time dimension, which is useful for applications like autonomous driving. Both tools come with GitHub repositories for local use, though Nvidia's Lyra requires high-end GPUs.
DeepSeek upgraded its AI model to version 3.1 Terminus, showing significant improvements in language consistency and benchmark performance that put it on par with other leading open-source models. ByteDance also launched OmniInsert, a tool for seamlessly inserting characters or objects into existing videos, which outperforms previous tools of its kind in accuracy and consistency. Google quietly updated its Gemini 2.5 Flash and Flash-Lite models, improving instruction following, multimodal capabilities, and efficiency; both are now accessible for free on Google's AI Studio platform.
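For readers who want to try the updated Gemini 2.5 Flash themselves, here is a minimal sketch using Google's `google-genai` Python SDK with an API key created in AI Studio; the prompt is just an illustration.

```python
# Minimal sketch: querying Gemini 2.5 Flash through Google AI Studio.
# Assumes the google-genai SDK is installed (pip install google-genai)
# and an AI Studio API key is set in the GEMINI_API_KEY environment variable.
from google import genai

client = genai.Client()  # picks up GEMINI_API_KEY from the environment

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Summarize this week's AI news in three bullet points.",
)
print(response.text)
```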
Alibaba dominated the AI scene with multiple major releases. Its Wan 2.5 video generator supports native audio synchronization and is free to try on Alibaba's platform. The company also launched Qwen3-Max, a powerful text-based AI rivaling top models like GPT-5, and Qwen3-Omni, a versatile multimodal model that handles text, images, audio, and video with impressive speed and accuracy. Alibaba's Qwen3-VL is currently the best vision-language model, excelling at image analysis, visual reasoning, and complex tasks like coding from sketches. It also released an updated Qwen-Image-Edit, a free open-source image editor comparable to Nano Banana, and an upgraded Qwen3-Coder, a coding AI that outperforms competitors like Claude.
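Since Qwen-Image-Edit is open source, it can be run locally. Here is a minimal sketch assuming a recent Hugging Face `diffusers` release with Qwen-Image-Edit support, the `Qwen/Qwen-Image-Edit` checkpoint on the Hub, and a large CUDA GPU; the input image and prompt are hypothetical.

```python
# Minimal sketch: editing an image locally with Qwen-Image-Edit via diffusers.
# Assumes: recent diffusers with QwenImageEditPipeline, the Qwen/Qwen-Image-Edit
# checkpoint, and enough GPU memory to hold the model in bfloat16.
import torch
from diffusers import QwenImageEditPipeline
from diffusers.utils import load_image

pipeline = QwenImageEditPipeline.from_pretrained(
    "Qwen/Qwen-Image-Edit", torch_dtype=torch.bfloat16
).to("cuda")

image = load_image("photo.png")  # hypothetical input image
edited = pipeline(
    image=image,
    prompt="Replace the background with a sunny beach.",  # example edit
).images[0]
edited.save("edited.png")
```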
Other notable updates include Kimi's new "OK Computer" agent mode, which autonomously executes multi-step tasks like building websites or reports, and OpenAI's ChatGPT Pulse, a proactive, personalized update feature for Pro users on mobile. Suno released version 5 of its music generation AI, offering more coherent and expressive vocals. Finally, OmniHuman-1.5, a state-of-the-art deepfake lipsync animator, is now commercially available, capable of animating images with audio-driven facial expressions and gestures. These developments highlight the rapid pace and diversity of AI innovation across video, language, and creative tools.