AI co-scientist, AI for DNA, AI NPCs, open-source robots, new Qwen, new video editors: AI NEWS

This week in AI saw significant breakthroughs including ByteDance’s Lance multimodal model for image and video generation, Apple’s LTO 3D model generator, and Carbon’s open-source DNA foundation model, alongside advancements in AI-generated NPCs, realistic avatars, multilingual translation, and robotics. Additionally, innovations like Alibaba’s Qwen models, Stability AI’s music generator, Google DeepMind’s AI co-scientist, and flexible video generation systems underscore the rapid and diverse progress across creative, scientific, and practical AI applications.

This week brought advances across several areas of AI. ByteDance introduced Lance, a 3-billion-parameter unified multimodal model that generates and edits both images and videos. Lance supports text-to-video generation, video editing with sequential prompts, and strong visual and textual understanding, making it a versatile tool for creative workflows. Apple unveiled LTO, a 3D model generator that reconstructs objects with view-dependent visual fidelity, outperforming existing models by capturing how objects look from different angles. Flash GPO, meanwhile, is a new method that improves video model quality efficiently by optimizing training steps, resulting in more realistic and detailed video outputs.

In the gaming and image generation space, Reactive GWM enables AI-generated NPCs with controllable high-level strategies, allowing for more interactive and dynamic game simulations. Meanwhile, L2P, an image model, generates images directly in pixel space without relying on latent space compression, achieving superior quality and supporting resolutions up to 8K. On the biological front, Carbon, an open-source foundation model for DNA, processes massive DNA sequences rapidly, handling tasks like sequence completion and protein structure prediction quickly and efficiently, which could make genetic research more widely accessible.

Several practical AI tools also made headlines. Meituan released LongCat Video Avatar 1.5, a realistic avatar generator that animates talking avatars from images and audio, supporting multi-person interactions and various art styles. Mega ASR, a transcription model, transcribes noisy, real-world audio with significantly lower error rates than competitors, making it well suited to challenging acoustic environments. Tencent introduced HYMT2, a family of multilingual translation models designed to follow detailed instructions and preserve formatting across 33 languages, outperforming larger models in specialized domains like finance and law.

Robotics and video control technologies also advanced notably. Robot Plus+ showcased a magnetic wall-climbing industrial robot capable of performing complex tasks like welding and painting on vertical steel surfaces, operated remotely via VR. Hugging Face released an open-source 3D-printed humanoid robot platform aimed at making robotics research more accessible. Unitree Robotics demonstrated voice-controlled autonomous actions in its G1 robot, pointing toward more intuitive robot interaction. Cog Omni Control presented a flexible video generation system that integrates multiple inputs like sketches, poses, and reference images to produce videos faithful to creative directions, though it is not yet publicly released.

Finally, other developments include Alibaba’s latest Qwen 3.7 Max model, which excels in agentic tasks and vision capabilities, and Qwen 3.5 Live Translate, a real-time translation model that uses visual context to improve accuracy. Stability AI launched Stable Audio 3, an open-source music generation model capable of producing long audio tracks and sound effects. Pano World offers a generative 3D panorama system for consistent virtual home tours based on floor plans. Fashion Chameleon by Alibaba enables real-time virtual try-ons in video, allowing seamless garment switching. Google DeepMind revealed the AI co-scientist, a multi-agent system designed to collaborate with researchers by generating hypotheses and proposing experiments, which could change how scientific discovery is done. Together, these releases show how broad this week’s progress in AI was.