A new KING of open-source image generation? AuraFlow deep dive

The video explores the newly released open-source image generation model AuraFlow, comparing its performance with Stable Diffusion 3 and XL using complex prompts. While AuraFlow shows promise and improvement in certain areas, it struggles with prompt adherence and realism compared to the more established Stable Diffusion models.

The video discusses the newly released open-source image generation model called AuraFlow, currently in version 0.1. The presenter shares initial impressions, comparing AuraFlow with the latest Stable Diffusion models (Stable Diffusion 3 and Stable Diffusion XL) to determine which performs best. The intention is to assess the capabilities and limitations of AuraFlow in generating complex images based on detailed prompts, demonstrating its potential as a leading open-source image generator.

The presenter begins testing AuraFlow using a challenging prompt involving a zebra with rainbow stripes playing a grand piano on a mountaintop under the Northern Lights. The results are shown side by side with images generated by Stable Diffusion 3 and XL, highlighting that while AuraFlow performs reasonably well, Stable Diffusion 3 produces a more accurate representation of the prompt. The video emphasizes that Stable Diffusion 3 excels in following prompts, whereas AuraFlow struggles with certain aspects, such as generating the correct colors and materials specified in the prompt.

As the video progresses, the presenter tests other complex prompts, such as a steampunk robot and a polar bear on a tropical beach. The results show that AuraFlow is improving at following prompts and occasionally outperforms Stable Diffusion 3 in certain respects. However, the Stable Diffusion models render text within images more reliably, a known weakness of AuraFlow. The presenter continues to highlight the strengths and weaknesses of each model while providing a side-by-side comparison of the generated images.

The video also discusses alternative platforms where users can try AuraFlow for free and highlights the importance of prompt quality for generating better images. The presenter showcases how different models perform with various prompts, emphasizing that while AuraFlow shows potential, it still has limitations in realism and prompt adherence compared to the more established Stable Diffusion models. The analysis includes a discussion of the training data and the impact of content filtering on the models’ ability to generate realistic human anatomy.

Finally, the presenter explains how to download and run AuraFlow locally using ComfyUI. This section includes a brief tutorial on setting up the model and generating images, showing how users can run AuraFlow without needing a powerful GPU. The video concludes by inviting viewers to share their thoughts on AuraFlow and its potential to replace Stable Diffusion as the leading open-source image generator, while also encouraging them to check out resources for AI tools and job opportunities in the field.
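
The video demonstrates a ComfyUI node workflow rather than code, but for anyone who prefers a scripted route, here is a minimal sketch using Hugging Face's diffusers library, which provides an AuraFlowPipeline. The "fal/AuraFlow" model ID and the sampling settings below are assumptions for illustration, not the exact setup shown in the video, and may need adjusting for your hardware.

```python
# Minimal sketch: generating an image with AuraFlow via the diffusers library.
# Assumes a recent diffusers release that includes AuraFlowPipeline and a
# CUDA-capable GPU; the model ID and parameters are illustrative only.
import torch
from diffusers import AuraFlowPipeline

pipe = AuraFlowPipeline.from_pretrained(
    "fal/AuraFlow",              # assumed Hugging Face model ID
    torch_dtype=torch.float16,   # half precision to reduce VRAM usage
).to("cuda")

prompt = (
    "a zebra with rainbow-colored stripes playing a grand piano "
    "on a mountaintop under the Northern Lights"
)

image = pipe(
    prompt=prompt,
    height=1024,
    width=1024,
    num_inference_steps=50,      # illustrative defaults; tune for speed vs. quality
    guidance_scale=3.5,
).images[0]

image.save("auraflow_zebra.png")
```
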

For comparison, here are my results from DALL-E 3:

(first attempt)

(second attempt with additional feedback)

It looks like @merefield is sharing results from DALL-E 3, comparing those to the outputs from the AuraFlow and Stable Diffusion models discussed in the earlier video. Would you like to explore any specific aspects of the images or the performance of DALL-E 3 compared to AuraFlow and Stable Diffusion?