The video reviews LTX 2.3, a fast, efficient open-source AI video generator with improved motion consistency, better audio quality, and new features such as vertical video and first/last-frame uploads, making it suitable for creative projects even on low-VRAM systems. It demonstrates LTX 2.3's advances over its predecessor, walks through local installation, and highlights both its strengths and its remaining limitations, such as text rendering and transition smoothness.
LTX 2.3 is presented as the latest and most capable open-source AI video generator with built-in audio generation. It stands out for its speed, its ability to run on low-VRAM hardware, and its support for clips up to 20 seconds long at up to 4K resolution. The presenter compares LTX 2.3 to its predecessor, highlighting significant improvements in motion consistency, prompt understanding, and audio quality. New features include support for first- and last-frame uploads and vertical video formats, making it more versatile for a range of creative needs.
Through a series of side-by-side demonstrations, the presenter shows that LTX 2.3 produces much more coherent and consistent results than the previous version, especially in high-action scenes. Examples include action sequences, sword fights, and complex character interactions, where the new version reduces warping and anatomical errors. Audio generation is also improved, with clearer dialogue and sound effects, though some issues like static noise in dramatic effects persist. The model also demonstrates better lip-syncing and pronunciation, even in non-English languages like Japanese.
The video further explores LTX 2.3’s capabilities with creative prompts, such as generating K-pop performances, opera singing, and dynamic camera movements. While both versions perform well in some scenarios, LTX 2.3 consistently delivers more accurate and expressive results, especially in challenging tasks like rendering group choreography or following complex camera instructions. However, text rendering within videos remains imperfect, with the model still struggling to generate accurate overlay text from prompts.
A key highlight is the walkthrough of LTX 2.3’s new features, such as native support for first and last frame uploads and vertical video generation. The presenter notes that seamless transitions between very different start and end frames remain challenging, often resulting in abrupt cuts rather than smooth transitions. The video also covers the “control video” feature, which allows users to guide video generation using reference videos for pose or composition, though this feature is less robust compared to some other tools.
Finally, the presenter provides a detailed guide for installing LTX 2.3 locally via the WanGP (Wan2GP) platform, which simplifies setup and is optimized for low-VRAM systems. The step-by-step instructions cover setting up a virtual environment, installing the necessary dependencies, and launching the user interface. The presenter emphasizes WanGP's advantages over more complex platforms like ComfyUI, especially for users with limited hardware. The video concludes with an invitation for viewers to share their experiences, troubleshoot installation issues in the comments, and subscribe for more AI tool updates.
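The install steps described above follow the usual pattern for Python-based local AI tools. A minimal sketch is below; the repository URL, requirements file, and `wgp.py` entry script are assumptions based on common Wan2GP-style layouts, so defer to the project's own README for the authoritative commands:

```shell
# Clone the project (assumed repo URL -- check the video description / README)
git clone https://github.com/deepbeepmeep/Wan2GP.git
cd Wan2GP

# Create and activate an isolated virtual environment
python3 -m venv venv
source venv/bin/activate        # on Windows: venv\Scripts\activate

# Install the project's dependencies into the venv
pip install -r requirements.txt

# Launch the web UI (assumed entry point)
python wgp.py
```

Running everything inside a virtual environment keeps the tool's pinned dependencies from conflicting with other Python projects on the same machine, which matters for ML tooling with strict torch/CUDA version requirements.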