Near silent LLM Monster... NVIDIA, take notes

The video reviews the Framework Desktop featuring AMD’s Ryzen AI Max+ 395 and 128GB of RAM, highlighting its near-silent operation and strong performance for local AI workloads, including large language models, with flexible GPU memory configurations and competitive CPU benchmarks. Despite limitations such as soldered RAM and network bottlenecks when clustering, it offers an affordable, highly compatible x86 platform with Linux and Windows support, making it a compelling choice for quiet, powerful AI development.

The video reviews the Framework Desktop, a new machine featuring AMD’s Ryzen AI Max+ 395 chip and 128GB of RAM, designed for local AI workloads such as large language models (LLMs) and image/video generation. While not the first or even the second such machine on the market, the Framework Desktop impresses with its near-silent cooling system, comparable in quietness to a Mac Studio. The presenter compares it to other powerful machines like the ASUS ROG Flow Z13 and GMKtec EVO-X2, noting that the Framework Desktop stands out for its whisper-quiet operation even while running demanding AI tasks.

The presenter conducts performance tests using the Qwen3 Coder 30B model at Q8 quantization, running on four identical Framework Desktop boards configured with different GPU memory modes (dynamic and fixed). Results show that the dynamic memory setting often yields better token-generation speeds, especially for medium-length prompts, while the fixed memory setting performs better on very large prompts. The integrated GPU (iGPU) on the AMD chip delivers impressive performance, rivaling Apple’s M4-series chips in some scenarios, with the Framework Desktop excelling at longer prompts thanks to its larger RAM capacity.
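Why an iGPU can post token-generation numbers in the same league as Apple silicon comes down to memory bandwidth: generating each token must stream the model's active weights from RAM. The sketch below is a back-of-envelope estimate, not from the video; the ~3B active parameters (Qwen3 Coder 30B is a mixture-of-experts model), ~1 byte/parameter at Q8, and ~256 GB/s bandwidth figures are all assumptions for illustration.

```python
# Back-of-envelope: token generation is typically memory-bandwidth bound --
# each token requires streaming the active weights from RAM.
# All figures here are assumptions for illustration, not measurements.

def max_tokens_per_second(active_params_b: float,
                          bytes_per_param: float,
                          bandwidth_gb_s: float) -> float:
    """Upper bound on tokens/s if weight streaming were the only cost."""
    gb_read_per_token = active_params_b * bytes_per_param
    return bandwidth_gb_s / gb_read_per_token

# Assumed: ~3B active parameters per token (MoE), Q8 ~= 1 byte/param,
# and ~256 GB/s LPDDR5X bandwidth for the Ryzen AI Max+ platform.
bound = max_tokens_per_second(active_params_b=3.0,
                              bytes_per_param=1.0,
                              bandwidth_gb_s=256.0)
print(f"theoretical ceiling: ~{bound:.0f} tokens/s")
```

Real throughput lands well below this ceiling (compute, KV-cache reads, and scheduling all cost time), but the ratio explains why bandwidth, not raw GPU grunt, dominates these comparisons.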

Comparisons with Apple’s M4 Pro and M4 Max chips reveal that while Apple’s hardware generally outperforms the Framework Desktop in speed and memory bandwidth, the Framework Desktop offers a more affordable and highly compatible x86 platform with Linux and Windows support. The presenter also highlights the flexibility of running various LLMs via different runtimes such as ROCm and Vulkan, with Vulkan generally providing faster performance on this hardware. Despite some quirks and crashes with certain large models, the Framework Desktop proves capable of running very large models locally, which is notable for an iGPU-based system.
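For readers who want to try both runtimes, a common way to do so is building llama.cpp twice, once per backend. A minimal sketch, assuming a recent llama.cpp checkout; the CMake flag names have changed across versions, so treat them as assumptions to verify against the project's build docs:

```shell
# Sketch: build llama.cpp with each GPU backend to compare runtimes.
# Flag names are assumptions and may vary by llama.cpp version.

# Vulkan backend (the video found this generally faster on this hardware):
cmake -B build-vulkan -DGGML_VULKAN=ON
cmake --build build-vulkan --config Release

# ROCm/HIP backend:
cmake -B build-rocm -DGGML_HIP=ON
cmake --build build-rocm --config Release
```

Running the same model and prompt through both builds makes the Vulkan-vs-ROCm comparison in the video easy to reproduce on your own hardware.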

The video also discusses the CPU performance of the Ryzen AI Max+ 395, which scores competitively against Apple’s M4 Max in multi-core benchmarks. The presenter praises Fedora Linux for its stability on this hardware and explains the importance of GPU acceleration APIs like ROCm and Vulkan for efficient LLM inference. The Framework Desktop’s modular design allows users to customize cases, fans, and storage, although the RAM is soldered onto the board due to technical limitations, a trade-off that Framework justifies for improved performance and reliability.

Finally, the presenter explores the potential and limitations of clustering multiple Framework Desktop boards for larger AI workloads. While clustering increases total memory capacity, network bottlenecks limit performance gains, making single-node setups more practical for now. The video concludes by recommending the Framework Desktop as a powerful, quiet, and flexible machine for local AI development, with links to related cluster-setup videos and a plug for ChatLLM Teams, a platform integrating multiple LLMs for various tasks.
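The network bottleneck the presenter describes can be sketched with simple arithmetic. If a model is split across nodes with tensor parallelism, the nodes must synchronize roughly once per transformer layer per token, so even sub-millisecond Ethernet latency adds up. All numbers below (layer count, round-trip time, single-node speed) are illustrative assumptions, not measurements from the video:

```python
# Back-of-envelope: why clustering over Ethernet bottlenecks LLM inference.
# All numbers are illustrative assumptions, not measurements from the video.

def tensor_parallel_sync_ms(layers: int, rtt_ms: float) -> float:
    """Per-token network cost if every layer needs one synchronization round-trip."""
    return layers * rtt_ms

def tokens_per_second(compute_ms_per_token: float,
                      network_ms_per_token: float) -> float:
    return 1000.0 / (compute_ms_per_token + network_ms_per_token)

LAYERS = 48      # assumed transformer layer count
RTT_MS = 0.2     # assumed per-sync round-trip over the LAN
LOCAL_MS = 25.0  # assumed local compute per token (~40 tok/s single node)

net_ms = tensor_parallel_sync_ms(LAYERS, RTT_MS)
print(f"single node: {tokens_per_second(LOCAL_MS, 0):.1f} tok/s")
print(f"clustered:   {tokens_per_second(LOCAL_MS, net_ms):.1f} tok/s "
      f"({net_ms:.1f} ms/token spent on network sync)")
```

Under these assumptions the cluster is slower per token than a single node despite having more total memory, which matches the video's conclusion: clustering buys capacity (fitting bigger models), not speed.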