New GPT, video game AI agents, robot army, AI marriage, new TTS, world models: AI NEWS

This week’s AI news highlights breakthroughs including Google DeepMind’s AligNet method for human-aligned vision models, OpenAI’s warmer and more empathetic GPT-5.1 update, and game-playing AI agents like Lumine and SIMA 2 that can carry out complex in-game tasks. Further items span expressive text-to-speech, realistic virtual try-on, autonomous humanoid robots, multimodal world modeling, and even a human-AI marriage held in mixed reality, showcasing AI’s expanding role across technology and society.

This week in AI news has been packed with groundbreaking developments across domains. Google DeepMind introduced AligNet, a novel approach that aligns AI vision models more closely with human perception by training them on a large dataset of “odd one out” judgments, improving their understanding of object relationships. AligNet is open source, allowing researchers to fine-tune AI vision systems toward more human-like interpretation. Meanwhile, a new tool called Time-to-Move lets users animate objects or camera movements in videos simply by dragging elements, building on existing video diffusion models without requiring retraining.
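To make the “odd one out” setup concrete, here is a minimal illustrative sketch (not DeepMind’s actual AligNet code): given three image embeddings, the odd item is the one left over after the most similar pair groups together. Comparing such model judgments against human choices is the kind of signal the alignment training described above uses. The embeddings and the `odd_one_out` helper here are toy assumptions for illustration.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def odd_one_out(e0, e1, e2):
    """Return the index (0, 1, or 2) of the odd item among three embeddings.

    The pair with the highest similarity "groups together"; the remaining
    item is the odd one out.
    """
    s01, s02, s12 = cosine(e0, e1), cosine(e0, e2), cosine(e1, e2)
    pair_sims = {2: s01, 1: s02, 0: s12}  # key = index excluded from the pair
    return max(pair_sims, key=pair_sims.get)

# Toy example: two "dog-like" vectors and one "car-like" vector.
dog1 = np.array([1.0, 0.1, 0.0])
dog2 = np.array([0.9, 0.2, 0.1])
car  = np.array([0.0, 0.1, 1.0])
print(odd_one_out(dog1, dog2, car))  # → 2 (the car is the odd one out)
```

A vision model whose odd-one-out picks disagree with human picks on triplets like this is, in this framing, misaligned with human perception; fine-tuning nudges its embedding space until the choices match.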

In the realm of AI models, a Chinese lab released VibeThinker-1.5B, a compact 1.5-billion-parameter model that outperforms much larger models on challenging math and coding benchmarks, showcasing remarkable efficiency. Another significant release is Step-Audio-EditX, an open-source text-to-speech generator capable of expressive speech synthesis, covering emotions, speaking styles, and paralinguistic features like breathing and laughter, and needing only a few seconds of voice data for cloning. Additionally, Evvar, a state-of-the-art open-source virtual try-on model, lets users swap clothing on images realistically, preserving details and patterns and outperforming previous models in accuracy.

OpenAI quietly rolled out GPT-5.1, an improved version of GPT-5 that offers warmer, more empathetic, and more conversational responses, along with better instruction following. The update comes in two variants: GPT-5.1 Instant for fast, casual interactions, and GPT-5.1 Thinking for complex reasoning tasks. In a remarkable human-AI interaction story, a woman in Japan reportedly married an AI chatbot, highlighting the growing emotional attachment people are forming with AI personas. The wedding was conducted in mixed reality, symbolizing a new frontier in human relationships with AI.

Robotics also saw exciting progress: UBTech Robotics showcased the Walker S2 humanoid robot, which can autonomously swap its own battery to minimize downtime, while the Unitree G1 demonstrated fast, autonomous household chores without teleoperation. These advances point toward a near future where humanoid robots assist in homes and workplaces. Meanwhile, World Labs released Marble, a multimodal world model that generates and edits persistent, detailed 3D worlds from text, images, or 3D scenes, enabling flexible creation and modification of virtual environments for applications like interior design and gaming.

Finally, two AI agents designed for video games made headlines: Lumine, trained primarily on Genshin Impact, can autonomously complete complex storylines and generalize to unseen game regions and other games, while Google DeepMind’s SIMA 2 exhibits advanced reasoning and adaptability in new 3D game environments, even ones generated by AI. These agents represent steps toward AI that can adapt to any 3D environment, with potential applications in robotics. Additionally, Baidu released ERNIE 5, an omnimodal foundation model excelling at text, image, and audio understanding and generation, rivaling top models like GPT-5 and Gemini 2.5 Pro. Together, these developments underscore the rapid pace and breadth of AI innovation today.