AI talks to dolphins, robot marathon, new deepfakes, o3 & o4-mini, new AI video model

artesia · 20 April 2025 02:30

This week in AI featured significant advancements, including Google’s DolphinGemma, an AI model built to analyze and generate dolphin vocalizations, and the launch of new tools for character animation and video generation. Additionally, a humanoid robot marathon showcased robotics progress, while OpenAI introduced new models with enhanced STEM capabilities and multimodal functionalities.

artesia · 20 April 2025 02:50

This week in AI has been particularly eventful, featuring an AI aimed at communicating with dolphins, new open-source tools for image generation and character animation, and the first humanoid robot marathon. Google introduced DolphinGemma, an AI designed to analyze dolphin vocalizations in real time and generate new dolphin-like sounds. This lightweight model can run on smartphones and aims to enhance our understanding of dolphin communication. Google plans to open-source DolphinGemma, allowing researchers to adapt it for other animal species.

Another significant development is the release of UniAnimate-DiT, a plugin for the open-source video generator Wan 2.1. This tool lets users animate a character from a photo and a reference pose video, producing realistic animations that can even transfer intricate movements like hand gestures. The tool is free to use, and its GitHub repository provides instructions for local installation. Additionally, Tencent’s InstantCharacter tool enables users to generate images of a character in various scenarios while preserving the character's details, showing impressive accuracy in character transfer.

Nvidia unveiled PartField, an AI that segments 3D models into their component parts, which can make 3D modeling and animation more efficient. The tool is noted for its accuracy and speed compared to other segmentation tools. Meanwhile, Alibaba released a new video generator that allows users to create videos by uploading start and end frames, providing greater control over video content. The AI can generate videos in real time, making it a powerful tool for creators.

In robotics, a humanoid robot marathon took place in Beijing, with various companies entering their robots. Notable participants included Unitree’s G1 robot and the Tiangong Ultra, which impressively completed the marathon. The event highlights the advancements in robotics and the potential for future competitions, possibly leading to a dedicated sports league for humanoid robots.

Lastly, OpenAI launched two new models, o3 and o4-mini, which excel in STEM-related tasks and visual perception. These models demonstrate significant improvements in competitive math and coding benchmarks compared to their predecessors. They also feature multimodal capabilities, allowing them to analyze text, audio, and images. With agentic tool use, the models can autonomously select and use various tools for complex tasks, showcasing their advanced reasoning and problem-solving abilities. Overall, this week has seen remarkable strides in AI technology across various domains.