The video compares Nvidia’s RTX 5090 and AMD’s RX 7900 XTX GPUs running large language models locally, finding that while Nvidia offers faster token generation, AMD excels in prompt evaluation, power efficiency, and cost-effectiveness. Despite Nvidia’s superior raw speed, AMD’s larger VRAM and lower power consumption make it a strong choice for running large AI models within a more affordable budget.
In this video, the creator compares two flagship consumer GPUs: Nvidia’s RTX 5090 and AMD’s RX 7900 XTX, focusing on their performance running large language models locally. The RTX 5090 boasts 32 GB of GDDR7 memory and a high memory bandwidth of 1,700 GB/s, while the AMD RX 7900 XTX has 24 GB of RDNA3 memory with 960 GB/s bandwidth. Despite the Nvidia card being significantly more expensive (over $3,000) compared to AMD’s sub-$1,000 price, the video aims to determine if the price difference translates into meaningful performance gains for AI workloads.
The creator sets up two identical systems differing only in the GPU to run the same 4-billion parameter Gemma 3 model using LM Studio. Initial tests show the Nvidia RTX 5090 is much faster in token generation, producing roughly twice the tokens per second compared to the AMD card. However, AMD excels in prompt evaluation speed, consistently outperforming Nvidia in processing input prompts. Power consumption measurements reveal the AMD GPU draws about 440 watts, while the Nvidia card consumes around 550 watts, highlighting AMD’s efficiency advantage despite slower generation speeds.
Further testing with different backends like Vulkan and CUDA on Linux and Windows confirms Nvidia’s superior token generation speed, reaching up to 248 tokens per second on Linux, while AMD lags behind. Interestingly, AMD maintains faster prompt evaluation rates across platforms. The video also explores running a larger 20 GB Quen 2.5 coder model, where AMD again shows better prompt processing but slower token generation. Nvidia’s power draw spikes dramatically to over 770 watts during this test, emphasizing the trade-off between speed and energy consumption.
The creator then compares the AMD RX 7900 XTX with Nvidia’s RTX 5080, which shares the same 960 GB/s memory bandwidth but only has 16 GB of VRAM. The limited VRAM on the 5080 causes model spillover to CPU memory, drastically reducing performance for larger models like Quen 2.5 coder. AMD’s larger VRAM allows it to run these models more efficiently, resulting in significantly higher token generation speeds and better overall performance despite the lower memory bandwidth.
Finally, the video demonstrates connecting VS Code to the local LLM instance running on the AMD machine, showcasing real-time code generation and story writing with impressive responsiveness. The creator concludes that while Nvidia’s RTX 5090 offers superior raw speed, AMD’s RX 7900 XTX provides a compelling balance of performance, power efficiency, and cost-effectiveness, especially for users running large models that fit within its 24 GB VRAM. The video ends with a teaser for upcoming experiments and a recommendation to check out a related comparison between AMD’s 7900 XTX and 970 XT GPUs.