This week brought major advances in AI, including Alibaba's Mimo body-swapping tool, ByteDance's new video-generation models, and the introduction of Meta's Llama 3.2 and Orion AR glasses. Researchers at Harvard Medical School also developed an AI model to repurpose drugs for rare diseases, while OpenAI faces significant organizational changes.
Alibaba introduced an AI body-swapping tool called Mimo, which lets users replace any person in a video using just a single reference photo. Body swapping previously required multiple cameras and motion-capture equipment; Mimo collapses that workflow into a single image. The tool handles high-action scenes and complex backgrounds effectively, pointing to creative applications in video production.
In addition to Mimo, ByteDance, the parent company of TikTok, unveiled two new AI video-generation models, Seaweed and Pixel Dance 1.4. Both use a diffusion-Transformer architecture and support camera controls for zooming, panning, and rotation while maintaining character consistency across clips. Pixel Dance can generate 10-second videos, while Seaweed can produce up to 30 seconds, though neither is yet available for public use.
The AI image-generation space also saw new competition with the emergence of two models, Blueberry 0 and Blueberry 1, which reportedly outperform the current leader, Flux 1 Pro. The models were tested in a blind head-to-head format, making it difficult to game the results, and both scored higher than Flux in user evaluations. Speculation surrounds their origins, with some suggesting a link to OpenAI's upcoming releases.
Meta made headlines with its Connect 2024 conference, where it introduced the Orion AR glasses and the Llama 3.2 model. The Orion glasses feature advanced controls, including eye tracking and hand gestures, and are designed to be more user-friendly than competing devices. Llama 3.2, while not a significant upgrade in language processing, adds vision capabilities, allowing users to upload images for analysis. Meta also announced a voice feature that lets users choose from various AI voices, including celebrity voices, though its capabilities may be more limited than OpenAI's advanced voice mode.
Finally, researchers at Harvard Medical School developed an AI model called TxGNN, aimed at repurposing existing drugs to treat rare and neglected diseases. The model has shown promising results, identifying potential drug candidates from a large database and outperforming other leading AI tools in accuracy. Its distinctive approach connects well-understood diseases with rare conditions, offering hope for patients with limited treatment options.

Meanwhile, OpenAI is undergoing significant change, with key personnel departures and a potential shift from a nonprofit to a for-profit structure, raising questions about the organization's future direction.