An Image Model You Can Run on an iPhone? Maybe Not

Timothy Karenbat discusses the challenges of running high-quality image generation models locally due to their large size and resource demands, contrasting this with more efficient text-based AI models. He highlights Prism ML’s compressed versions of the Flux 2 Klein model, which make image generation feasible on devices like iPhones, but concludes that current compressed models still fall short in quality and consistency for practical use.

Timothy Karenbat, founder of AnythingLLM, shares his perspective on image generation models, focusing on how usable they are on local consumer hardware. He explains that AnythingLLM prioritizes local-first AI models, so that users own their intelligence rather than relying on cloud services. Timothy has experimented with image generation models like Gemini’s Nano Banana, but his experience has been limited and often frustrating, especially when trying to run such models locally. He finds image generation tools complicated and not very productive for his typical use cases, which revolve around text-based AI applications.

Timothy highlights the challenges with local image generation models: they come with large file sizes (often 12-15 GB) and high memory requirements, yet still produce only mediocre outputs. Despite owning powerful hardware like an M4 Pro and Nvidia GPUs, he finds the iterative nature of image generation inefficient and resource-intensive, likening it to heating up his room without satisfactory results. He contrasts this with text-based large language models (LLMs), which can be deployed in various sizes to fit hardware constraints and still produce usable outputs. Image models, by comparison, have to clear a much higher quality bar before their output counts as a success.
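The point about fitting a model to the hardware comes down to simple arithmetic, sketched below. The sketch estimates the size of the weights alone (parameter count times bits per weight) and picks the widest precision that fits a memory budget. The 8-billion-parameter model, the 6 GB budget, and both function names are hypothetical and not from the video, and the estimate ignores activations, caches, and runtime overhead.

```python
def weight_footprint_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9


def widest_precision_that_fits(num_params, budget_gb, candidates=(16, 8, 4, 2)):
    """Return the largest bit width whose weights fit in budget_gb, or None."""
    for bits in sorted(candidates, reverse=True):
        if weight_footprint_gb(num_params, bits) <= budget_gb:
            return bits
    return None


# Hypothetical 8-billion-parameter text model (illustrative only).
params = 8e9
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: {weight_footprint_gb(params, bits):.1f} GB")

# Hypothetical 6 GB memory budget for the weights.
print("widest precision that fits:", widest_precision_that_fits(params, budget_gb=6.0))
```

The same arithmetic applies to image models. The difference Timothy describes is that a reduced-precision text model often remains usable, while a degraded image is much more obviously a failure.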

The video then introduces a new development from Prism ML, which has applied aggressive quantization techniques, producing binary and ternary versions of the Flux 2 Klein image generation model. These compressed models shrink the footprint from around 7.7 GB to as low as 1.2 GB, making it feasible to run image generation on devices like iPhones. Timothy tests the ternary version of the model on his local machine, noting that it runs quickly and produces decent images, though not without flaws. He appreciates the reduced memory usage but remains cautious about the overall quality and consistency of the outputs.
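To make "ternary" concrete, here is a minimal sketch of one common way to do it: absmean ternary quantization, in which each weight becomes -1, 0, or +1 and a single scale factor is stored alongside. This is not Prism ML’s method, which the video does not describe. The function names and sample weights are illustrative assumptions. The sketch also packs five ternary values into one byte, which works because 3^5 = 243 fits within the 256 values a byte can hold.

```python
def ternarize(weights):
    """Absmean ternary quantization: map each weight to -1, 0, or +1 plus one scale."""
    scale = sum(abs(w) for w in weights) / len(weights) or 1.0
    trits = [max(-1, min(1, round(w / scale))) for w in weights]
    return trits, scale


def pack_trits(trits):
    """Pack 5 ternary values per byte as base-3 digits (3**5 = 243 <= 256)."""
    out = bytearray()
    for i in range(0, len(trits), 5):
        value = 0
        # Process the group in reverse so its first trit ends up as the lowest digit.
        for t in reversed(trits[i:i + 5]):
            value = value * 3 + (t + 1)  # shift -1/0/+1 to 0/1/2
        out.append(value)
    return bytes(out)


def unpack_trits(packed, count):
    """Inverse of pack_trits; count drops the padding in the last byte."""
    trits = []
    for byte in packed:
        for _ in range(5):
            trits.append(byte % 3 - 1)
            byte //= 3
    return trits[:count]


# Illustrative weights only, not taken from any real model.
weights = [0.9, -0.05, -1.2, 0.4, 0.02, -0.7]
trits, scale = ternarize(weights)
packed = pack_trits(trits)
restored = [t * scale for t in unpack_trits(packed, len(trits))]

print("ternary values:", trits)
print("bytes used:", len(packed), "for", len(weights), "weights")
print("approximate weights:", [round(w, 3) for w in restored])
```

Packed this way, each weight costs 8/5 = 1.6 bits instead of 16 bits in half precision. A real model file also carries scale factors and typically some components kept at higher precision, so the overall reduction is smaller than that per-weight ratio suggests. The reconstruction only approximates the original weights, and that loss is one reason quality can degrade.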

Timothy conducts a detailed comparison between the compressed Flux 2 Klein model running locally and the full-precision version running on a high-end Nvidia H100 GPU in the cloud. The cloud-based model produces clearer, more accurate images, especially for complex elements like text and fine details, while the local ternary model struggles with text rendering and sometimes generates blurry or incorrect images. Despite some promising results, Timothy concludes that the compressed model still falls short of the quality needed for practical use, particularly for productivity tasks like creating thumbnails or slides with readable text.

In summary, Timothy acknowledges the promise of these new compressed image models for enabling local image generation on consumer hardware but remains skeptical about their current readiness for widespread use. He invites feedback and guidance from the community, expressing his ongoing struggle to find a reliable, efficient local image generation solution. Ultimately, he feels that while text-based AI models have matured significantly for local deployment, image generation remains an elusive challenge, with quality and resource demands still limiting its practicality on everyday devices.