The video introduces OVI, the first open-source AI video generator with built-in audio, which produces videos with dialogue, background sounds, and sound effects from a single text prompt. Unlike closed-source models such as Veo 3 or Sora 2, OVI is completely free and can run entirely offline on your own computer. Its prompt format lets users specify dialogue, voice characteristics, background audio, character actions, and emotions, making the generated videos more dynamic and expressive, and it supports multiple characters, multiple languages, and even singing, showcasing its versatility.
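The structured prompt format can be sketched as a small helper. The tag names below (`<S>`/`<E>` wrapping each spoken line and `<AUDCAP>`/`<ENDAUDCAP>` wrapping the audio description) follow OVI's published prompt convention, but treat this as an illustration and verify the tags against the model card you download; the helper itself is not part of OVI.

```python
def build_ovi_prompt(scene: str, lines: list[str], audio_caption: str) -> str:
    """Assemble an OVI-style prompt: scene description, each spoken line
    wrapped in <S>...<E>, then an <AUDCAP>...<ENDAUDCAP> block describing
    background audio and sound effects. Tag names per OVI's repo; verify
    against the version you install."""
    speech = " ".join(f"<S>{line}<E>" for line in lines)
    return f"{scene} {speech} <AUDCAP>{audio_caption}<ENDAUDCAP>"

prompt = build_ovi_prompt(
    "Two hikers shelter under a tree as a storm rolls in.",
    ["Did you hear that thunder?", "We should head back down."],
    "Heavy rain, distant thunder, wind rustling leaves.",
)
print(prompt)
```

Keeping the speech and audio caption in separate arguments makes it easy to vary one while holding the other fixed across generations.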
The presenter demonstrates how OVI can animate not only facial expressions but also full-body movements, including hand gestures, to match the context of the dialogue. Users can input sequences of actions and dialogue for characters, allowing for complex scene generation. Additionally, OVI supports image-to-video generation, where users can upload a reference image as the starting frame, a feature not available in some other models when using realistic human photos. The model can also generate ambient sounds like rain and various sound effects based on the video context, enhancing the overall audiovisual experience.
For those without powerful GPUs, the video lists several online platforms where OVI can be tested, such as waves.ai, fal.ai, Replicate, and Hugging Face, though these services charge per generation. The main focus, however, is running OVI locally through ComfyUI, a popular interface for open-source AI generators. The tutorial walks through installation: cloning the necessary repositories, installing dependencies, and downloading large model files such as the text encoders and VAE. The presenter stresses the need for a CUDA-capable GPU with at least 16 GB of VRAM (24 GB recommended) and explains how ComfyUI can offload model weights to system RAM to accommodate GPUs with less VRAM.
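The VRAM guidance above can be captured as a simple rule of thumb. The function and its keys (`model`, `cpu_offload`, and the variant names) are hypothetical stand-ins for whatever options the ComfyUI workflow actually exposes, not real OVI flags; only the thresholds come from the video.

```python
def pick_ovi_settings(vram_gb: float) -> dict:
    """Rule of thumb from the video: 16 GB VRAM minimum, 24 GB recommended;
    below 24 GB, enable CPU offload so ComfyUI can spill weights into
    system RAM. Keys and variant names are illustrative placeholders."""
    if vram_gb < 16:
        # Below the stated minimum: offloading may still work, but slowly.
        return {"model": "ovi-fp8", "cpu_offload": True, "supported": False}
    if vram_gb < 24:
        return {"model": "ovi-fp8", "cpu_offload": True, "supported": True}
    return {"model": "ovi-bf16", "cpu_offload": False, "supported": True}

print(pick_ovi_settings(16))
```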
The video then details how to load and configure the OVI workflow in ComfyUI: selecting the model version appropriate to your VRAM, enabling CPU offload if needed, and entering prompts with the correct tags for dialogue and audio. Users can set the video resolution, frame rate, seed (for reproducibility), and number of sampling steps to trade quality against speed, and negative prompts can exclude unwanted elements from the video or the audio. The presenter demonstrates both text-to-video and image-to-video generation, showing OVI's capabilities and current limitations, and notes that while the quality is not yet on par with closed-source alternatives, the open-source nature allows for customization and future improvements.
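The knobs mentioned above (resolution, frame rate, seed, sampling steps, negative prompt) can be grouped into one settings object. The field names and defaults here are invented for illustration and will not map one-to-one onto the actual ComfyUI node inputs.

```python
from dataclasses import dataclass, asdict

@dataclass
class OviRunConfig:
    """Hypothetical grouping of the workflow settings discussed in the
    video; map these onto the real ComfyUI node inputs yourself."""
    width: int = 720
    height: int = 720
    fps: int = 24
    seed: int = 42          # fix the seed to reproduce a generation
    steps: int = 30         # fewer steps = faster but lower quality
    negative_prompt: str = "blurry, distorted audio, extra limbs"

cfg = OviRunConfig(seed=123, steps=20)
print(asdict(cfg))
```

A fixed seed plus identical settings is what makes a generation reproducible; change only the seed to explore variations of the same prompt.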
In conclusion, the video highlights OVI as a groundbreaking open-source tool for AI video generation with audio, offering a free and flexible alternative to proprietary models. Although the current quality may lag behind commercial options, the community-driven development promises rapid enhancements, including potential model compression and customization through LoRAs. The presenter encourages viewers to try OVI, share feedback, and seek help with installation issues in the comments. They also promote their newsletter for ongoing AI updates and invite viewers to like, share, and subscribe for more content on emerging AI technologies.