I Plugged an RTX 5090 Into a Mac... and Didn’t Expect This

An open-source macOS driver called Tiny GPU now enables native Nvidia GPU support on Apple Silicon Macs over Thunderbolt, letting users run cards like the RTX 5090 without virtual machines or hacks. Initial performance is underwhelming next to optimized Metal-based solutions because the software is not yet tuned, but the driver is a significant step toward using Nvidia hardware on Macs and is expected to improve with further development.

For the first time since 2019, Nvidia GPUs can run natively on Macs thanks to an open-source driver developed by Tiny Corp. The driver, called Tiny GPU, is a macOS kernel extension that communicates directly with Nvidia and AMD GPUs over Thunderbolt, with no virtual machines or hacks involved. Setup is straightforward: plug the GPU into one of the Mac’s Thunderbolt ports, enable the system extension, and install Docker Desktop, which provides the necessary compilers. Apple dropped Nvidia support in 2018, leaving Mac users without native Nvidia GPU compute for seven years.

The presenter tested the setup on a Mac Mini with an Apple M4 Pro chip and 64GB of memory, using three Nvidia GPUs: the RTX 5060 Ti, 5070 Ti, and 5090. Initial matrix multiplication benchmarks showed respectable performance, with the 5060 Ti reaching 22.7 teraflops, though the M4 Pro’s internal GPU still slightly outperformed it. Power delivery was a problem for the more powerful cards, so the presenter bought a Razer Thunderbolt 5 enclosure with an external power supply, which allowed full use of the 12-pin Nvidia power connector and gave stable operation.
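As a rough illustration of how a matrix multiplication benchmark produces a teraflops figure, here is a minimal Python sketch. It assumes the usual convention of counting 2·n³ floating-point operations for an n×n by n×n multiply. The function names and the matrix size are hypothetical and are not taken from the presenter’s benchmark.

```python
import time


def matmul_tflops(n: int, seconds: float) -> float:
    """Achieved TFLOPS for one n x n by n x n matmul that took `seconds`."""
    flops = 2 * n**3  # one multiply and one add per inner-product term
    return flops / seconds / 1e12


def best_time(fn, repeats: int = 5) -> float:
    """Best wall-clock time of `fn` over several runs.

    For a GPU workload, `fn` must block until the device has finished,
    otherwise only the kernel launch is timed.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


# Hypothetical example: a 4096 x 4096 multiply that takes 10 ms.
print(f"{matmul_tflops(4096, 0.010):.1f} TFLOPS")
```

Read the other way around, a card sustaining 22.7 teraflops would finish a hypothetical 8192×8192 multiply in roughly 48 ms under this counting convention.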

Performance testing with language models gave mixed results. Despite its high theoretical compute and 32GB of VRAM, the RTX 5090 generated tokens only modestly faster than the 5070 Ti and far slower than its specs suggest. The external Nvidia GPUs outperformed the Mac’s internal Metal GPU in token generation speed, but overall throughput and achieved memory bandwidth were far below what the cards are capable of. The 5090’s larger VRAM let it run bigger models, though they were no closer to the card’s potential.
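One way to see how far token generation falls short of a card’s potential is to turn tokens per second into an effective memory bandwidth. The sketch below uses a common rule of thumb: generating one token with a dense model reads roughly all of the weights once, so the ceiling is VRAM bandwidth divided by model size. The model size and measured speed are illustrative placeholders, not numbers from the video, and the 1792 GB/s figure is the RTX 5090’s published peak bandwidth, treated here as an assumption.

```python
def decode_ceiling_tok_s(model_gb: float, vram_gb_s: float) -> float:
    """Upper bound on tokens/s if every token reads all weights once."""
    return vram_gb_s / model_gb


def effective_bandwidth_gb_s(model_gb: float, measured_tok_s: float) -> float:
    """Memory bandwidth implied by a measured token generation speed."""
    return model_gb * measured_tok_s


# Illustrative values only (assumptions, not measurements from the video).
MODEL_GB = 16.0          # hypothetical size of the weights held in VRAM
VRAM_GB_S = 1792.0       # assumed peak bandwidth (RTX 5090 published spec)
MEASURED_TOK_S = 20.0    # hypothetical measured generation speed

ceiling = decode_ceiling_tok_s(MODEL_GB, VRAM_GB_S)
effective = effective_bandwidth_gb_s(MODEL_GB, MEASURED_TOK_S)
utilization = effective / VRAM_GB_S

print(f"ceiling: {ceiling:.0f} tok/s")
print(f"effective bandwidth: {effective:.0f} GB/s ({utilization:.0%} of peak)")
```

With real measurements substituted, a utilization far below 100% points at the kernels rather than the hardware.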

A direct comparison with llama.cpp, a highly optimized inference engine with a Metal backend, showed that llama.cpp was 10 to 18 times faster than Tiny’s Nvidia implementation, with a much shorter time to first token. The gap is attributed to years of optimization and hand-tuned kernels in llama.cpp, whereas Tiny’s kernels are autogenerated and not yet tuned for performance. Even so, Tiny’s achievement is significant: a fully functional Nvidia GPU driver for macOS written from scratch, enabling Nvidia GPU compute on Apple Silicon Macs over Thunderbolt for the first time in years.

The presenter concludes that Thunderbolt itself is not a bottleneck for LLM inference since model weights are loaded once into GPU VRAM, and token generation happens entirely on the GPU. The current performance limitations stem from software inefficiencies in Tiny’s kernel implementations, which are expected to improve over time. Ultimately, while Nvidia GPUs on Macs are not yet competitive with the best Metal-based solutions, the breakthrough driver opens new possibilities for Mac users wanting to leverage Nvidia hardware, marking a significant milestone in Mac GPU support.
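The Thunderbolt argument can be checked with back-of-the-envelope arithmetic: the weights cross the link once, while each generated token moves very little data. The sketch below assumes a usable link throughput of 6 GB/s, a 16 GB model, and a 128,000-entry vocabulary. All three values are hypothetical and chosen only to show the orders of magnitude; link latency is ignored.

```python
def link_seconds(num_bytes: float, link_bytes_per_s: float) -> float:
    """Time to move `num_bytes` over a link, ignoring latency and overhead."""
    return num_bytes / link_bytes_per_s


# Hypothetical values (assumptions, not figures from the video).
LINK_BYTES_PER_S = 6e9     # assumed usable Thunderbolt throughput
MODEL_BYTES = 16e9         # assumed size of the model weights
VOCAB_SIZE = 128_000       # assumed vocabulary size
LOGIT_BYTES = 4            # float32 logits

# One-time cost: copy the weights into VRAM.
load_s = link_seconds(MODEL_BYTES, LINK_BYTES_PER_S)

# Pessimistic per-token cost: copy a full row of logits back to the host.
# If sampling runs on the GPU, only a few bytes per token cross the link.
per_token_s = link_seconds(VOCAB_SIZE * LOGIT_BYTES, LINK_BYTES_PER_S)

print(f"one-time weight load: {load_s:.2f} s")
print(f"per-token link time:  {per_token_s * 1e6:.0f} microseconds")
```

Under these assumptions the link costs a few seconds at load time and well under a millisecond per token, which is consistent with the presenter’s conclusion that the slowdown lies in the kernels.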