This new AI video tool is so powerful! Free & offline

The video introduces Ditto, a free and open-source AI tool from Ant Group that lets users edit videos with simple text prompts, and showcases capabilities such as style transfer and anime-to-realistic conversion. It also walks through installing and running Ditto locally in ComfyUI, covers its features and limitations, and encourages viewers to experiment and engage with the community.

The video introduces Ditto, a powerful new free and open-source AI tool developed by Ant Group, an Alibaba affiliate, that lets users edit videos with simple text prompts. Ditto builds on Wan, Alibaba's previously released open-source video model, and extends it so that existing videos can be edited directly from text descriptions. The presenter shows several examples: changing a character's clothing, adding objects such as a vintage street lamp, altering backgrounds, and even turning anime scenes into realistic video. The result is a quick, user-friendly way to manipulate videos creatively.

The video then gives a step-by-step tutorial on installing and running Ditto locally with ComfyUI, a popular node-based interface for running open-source image, video, and audio models. Installation involves downloading several large model files (diffusion models, LoRA accelerators, VAE files, and text encoders) and placing each one in the matching ComfyUI model directory. The presenter then loads a pre-made workflow JSON file into ComfyUI, which provides a ready-to-use Ditto setup and avoids building the node graph from scratch.
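Misplaced model files are a common reason a loaded workflow fails to find its models. The sketch below checks that each downloaded file sits in the expected ComfyUI subfolder. The subfolder names (`diffusion_models`, `loras`, `vae`, `text_encoders`) are ComfyUI's standard model folders; the file names and the `missing_model_files` helper are hypothetical placeholders, so substitute the actual file names listed for the Ditto workflow.

```python
from pathlib import Path

# HYPOTHETICAL file names: replace them with the files you actually downloaded.
# The keys are standard ComfyUI subfolders under ComfyUI/models/.
EXPECTED_MODELS = {
    "diffusion_models": ["ditto_global_model.safetensors"],
    "loras": ["speedup_lora.safetensors"],
    "vae": ["video_vae.safetensors"],
    "text_encoders": ["text_encoder.safetensors"],
}


def missing_model_files(comfy_root, expected=EXPECTED_MODELS):
    """Return the expected model paths that do not exist under <comfy_root>/models."""
    models_dir = Path(comfy_root) / "models"
    missing = []
    for subfolder, filenames in expected.items():
        for name in filenames:
            path = models_dir / subfolder / name
            if not path.is_file():
                missing.append(path)
    return missing


if __name__ == "__main__":
    import sys

    # Path to the ComfyUI install; defaults to a folder named "ComfyUI" in the current directory.
    root = sys.argv[1] if len(sys.argv) > 1 else "ComfyUI"
    gaps = missing_model_files(root)
    if gaps:
        print("Missing model files:")
        for path in gaps:
            print(f"  {path}")
    else:
        print("All expected model files are in place.")
```

Run it as `python check_models.py /path/to/ComfyUI` (the script name is arbitrary); an empty report means every listed file is where ComfyUI will look for it.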

Once everything is set up, users choose a Ditto model to match their goal: a global model for broad edits, a style-transfer model for changing a video's visual style, and a sim-to-real model for converting animated videos into realistic ones. The presenter demonstrates entering a prompt and uploading a video, and explains the key settings: frame rate, output length, and resolution, plus generation parameters such as step count, CFG (classifier-free guidance) scale, and seed. The LoRA accelerator speeds up generation, though it may slightly reduce output quality.
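The length, resolution, and CFG settings interact in ways that are easy to get wrong. The sketch below assumes Ditto inherits the conventions of the Wan 2.1 models it builds on: 16 fps native output, frame counts of the form 4k + 1 (for example 81 frames for about 5 seconds), and width and height divisible by 16. These are assumptions, not confirmed Ditto requirements, so check them against the defaults in the workflow's nodes. The helper names are hypothetical. The `apply_cfg` function shows the standard classifier-free guidance combination that the CFG scale controls.

```python
import math


def wan_frame_count(seconds, fps=16):
    """Nearest frame count of the form 4k + 1 for a clip of `seconds` at `fps`.

    ASSUMPTION: Wan-style video VAEs compress time by 4 and keep the first frame
    separately, so valid lengths are 1, 5, 9, ..., 81, ...
    """
    raw = seconds * fps
    k = max(0, math.floor((raw - 1) / 4 + 0.5))  # round half up to the nearest k
    return 4 * k + 1


def snap_resolution(width, height, multiple=16):
    """Round each side down to a multiple of `multiple` (ASSUMED to be 16)."""
    return (
        max(multiple, width // multiple * multiple),
        max(multiple, height // multiple * multiple),
    )


def apply_cfg(cond, uncond, scale):
    """Classifier-free guidance: uncond + scale * (cond - uncond), element-wise.

    A scale of 1.0 reduces to the conditional prediction alone; larger values
    push the output harder toward the text prompt.
    """
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


if __name__ == "__main__":
    print(wan_frame_count(5))           # frames for a ~5 s clip at 16 fps
    print(snap_resolution(1000, 570))   # nearest valid size at or below the request
    print(apply_cfg([1.0, 2.0], [0.0, 1.0], 5.0))
```

Because the seed fixes the initial noise, rerunning with the same seed, prompt, and settings should reproduce a result, which makes it easier to compare the effect of changing only the step count or CFG scale.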

The presenter also points out Ditto's limitations: it cannot yet transfer facial expressions accurately, nor perform precise local edits such as changing only a character's clothing while leaving the rest of the scene untouched. For facial expression transfer, the presenter recommends a different tool, OneVAE. The local-editing models are not yet publicly available but are expected in a future release. Despite these limitations, Ditto is praised for its ease of use, relatively modest hardware requirements (a minimum of 11 GB of VRAM), and impressive results, especially in style transfer and anime-to-realistic video conversion.

In conclusion, the video encourages viewers to try Ditto by following the installation steps and experimenting with different prompts and models. The presenter invites feedback and offers troubleshooting help in the comments. They also promote Epidemic Sound as a sponsor, highlighting its AI-powered music adaptation tool for creators. Finally, viewers are encouraged to subscribe to the presenter’s newsletter for ongoing updates on AI tools and news, emphasizing the rapid pace of innovation in the AI video editing space.