Lemonade: Fast GenAI on Ryzen AI and Radeon

AMD is advancing its ROCm platform to provide seamless, high-performance vLLM support on AMD Instinct GPUs, featuring significant improvements in quantization, stability, and hardware compatibility, including RDNA GPUs. AMD also supports multimodal models through vLLM Omni integration and is fostering community engagement with developer programs, aiming for broad adoption and continuous enhancement in step with the evolving AI ecosystem.

The presentation, led by Maddie, an AI solution architect at AMD, alongside Andy, focuses on AMD’s efforts to establish ROCm as a first-class platform within the vLLM ecosystem. They emphasize the importance of making ROCm fully integrated and supported in vLLM’s main branch, eliminating the need for special knowledge, unofficial forks, or complicated builds. Their goal is to make deploying and running high-performance inference on AMD Instinct GPUs seamless and reliable, matching the experience on any other platform.

A key achievement highlighted is the significant performance improvement in the latest ROCm release, particularly around quantization techniques such as FP8 and FP4, which enhance efficiency and speed. The update includes new high-performance kernels, optimized memory handling, and faster tensor loading, all contributing to higher throughput and better model execution. Additionally, vLLM on ROCm now supports advanced model architectures and hardware features, including RDNA GPUs such as the RX 7900 XTX, which broadens the range of AMD hardware compatible with vLLM.
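To make the FP8 idea concrete, here is a small, self-contained sketch (not AMD's or vLLM's implementation) that emulates rounding a value to the FP8 E4M3 format vLLM uses for quantization. Each weight fits in one byte instead of two or four, which is where the memory-bandwidth and throughput gains come from:

```python
import math

def quantize_e4m3(x: float) -> float:
    """Round x to the nearest FP8 E4M3-representable normal value
    (1 sign bit, 4 exponent bits, 3 mantissa bits; max normal 448).
    Subnormal handling is omitted -- this is a conceptual sketch only."""
    if x == 0.0:
        return 0.0
    sign = math.copysign(1.0, x)
    m, e = math.frexp(abs(x))        # abs(x) == m * 2**e with 0.5 <= m < 1
    m = round(m * 16) / 16           # keep 4 significant bits (1 implicit + 3 stored)
    y = sign * math.ldexp(m, e)
    return max(-448.0, min(448.0, y))  # clamp to the E4M3 dynamic range

print(quantize_e4m3(3.1))     # rounds to a nearby representable value
print(quantize_e4m3(1000.0))  # saturates at the E4M3 maximum
```

The trade-off is visible immediately: values carry only about two decimal digits of precision and saturate at ±448, which is why FP8 is paired with per-tensor scaling factors in practice.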

Stability and usability have also been major focuses, with AMD increasing test coverage dramatically—from 37% to 93% of vLLM test groups passing on AMD’s continuous integration system within a few months. This progress means users can now confidently use pip install to get the latest vLLM versions with ROCm support, without needing to build from source or rely on specialized containers. Official ROCm Docker containers are also available, providing a safe and standardized deployment environment that aligns with common vLLM usage patterns.
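Once vLLM with ROCm support is installed, offline inference looks the same as on any other platform. The sketch below assumes a ROCm-enabled vLLM build and uses a placeholder model name; it is illustrative, not taken from the talk:

```python
def run_inference(prompts, model_id="facebook/opt-125m"):
    """Minimal vLLM offline-inference sketch.

    The import is deferred so the code can be read without vLLM installed;
    on AMD hardware the engine transparently uses the ROCm backend.
    The model_id above is a placeholder -- substitute your own.
    """
    from vllm import LLM, SamplingParams

    llm = LLM(model=model_id)
    params = SamplingParams(temperature=0.7, max_tokens=128)
    outputs = llm.generate(prompts, params)
    return [o.outputs[0].text for o in outputs]

if __name__ == "__main__":
    for text in run_inference(["Explain FP8 quantization in one sentence."]):
        print(text)
```

The point of the "first-class platform" work is precisely that this script requires no AMD-specific flags, forks, or environment tweaks.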

The presentation also covers AMD’s support for vLLM Omni, an extension of vLLM that handles multimodal models involving text, images, audio, and video. AMD has integrated ROCm support into vLLM Omni, including pre-configured hardware YAML files and Docker images, simplifying deployment and enabling developers to focus on building applications rather than configuring environments. This reflects AMD’s commitment to keeping pace with the rapidly evolving AI ecosystem and supporting new model types from day zero.
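For context on what "multimodal" means at the API level, vLLM accepts extra modalities alongside the text prompt via a `multi_modal_data` field. The sketch below is a generic illustration of that input path, not the talk's configuration; the model name and prompt template are placeholders for a vision-language model:

```python
def describe_image(image_path, model_id="llava-hf/llava-1.5-7b-hf"):
    """Sketch of vLLM's multimodal input path (illustrative only).

    Imports are deferred so the sketch reads without the libraries
    installed; model_id and the prompt template are assumptions.
    """
    from PIL import Image
    from vllm import LLM, SamplingParams

    llm = LLM(model=model_id)
    prompt = "USER: <image>\nDescribe this image. ASSISTANT:"
    outputs = llm.generate(
        {
            "prompt": prompt,
            # Non-text inputs ride alongside the prompt:
            "multi_modal_data": {"image": Image.open(image_path)},
        },
        SamplingParams(max_tokens=64),
    )
    return outputs[0].outputs[0].text
```

vLLM Omni's pre-built YAML files and Docker images are what spare developers from wiring up this environment by hand on AMD hardware.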

Looking ahead, AMD plans to continue enhancing ROCm with nightly builds, improved Python installation experiences, and further kernel optimizations. They aim to expand support for consumer Radeon GPUs and maintain day-zero support for new models like GPT-3.5 and Minimax 2.5. To foster community engagement, AMD has launched a developer program offering free GPU access, educational resources, and hardware giveaways, encouraging developers to adopt AMD’s AI platform and stay connected through events and training.