An endless demand for compute | Jonathan Ross, founder of Groq

Jonathan Ross, founder of Groq, explains how Groq’s unique kernel-free LPUs complement GPUs to efficiently handle low-latency AI inference tasks, enabling flexible, cost-effective hardware solutions that adapt to evolving AI architectures. He also highlights the exponential growth in AI compute demand driven by new applications, emphasizing the need for innovation, collaboration, and education focused on critical questioning to fully harness AI’s transformative potential.

Jonathan Ross, founder of Groq, discusses the evolving landscape of AI compute demand and hardware architecture. He highlights the kernel-free design of Groq’s LPUs (Language Processing Units), which, unlike traditional GPUs, enables faster, more cost-effective inference without relying on external memory. This design lets Groq chips efficiently handle small batch sizes and mixture-of-experts models, making them particularly well suited to low-latency AI inference. Ross emphasizes the importance of hardware flexibility for adapting quickly to new AI architectures, noting that Groq’s architecture is designed to be easily programmable so it can accommodate future innovations beyond today’s transformer models.
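To see why memory placement matters for small-batch decode, here is a rough back-of-envelope sketch. It assumes that single-stream decode is memory-bandwidth bound, so each generated token must read every active weight from memory once. The function name `min_ms_per_token`, the model sizes, and both bandwidth figures are hypothetical placeholders, not Groq or GPU specifications, and the model ignores compute time, interconnect, and KV-cache traffic.

```python
def min_ms_per_token(active_params: float, bytes_per_param: float,
                     bandwidth_gb_s: float) -> float:
    """Lower bound on per-token decode latency in milliseconds.

    Assumes decode is memory-bandwidth bound: every active weight is read
    from memory once per generated token.
    """
    bytes_read = active_params * bytes_per_param
    seconds = bytes_read / (bandwidth_gb_s * 1e9)
    return seconds * 1e3


# Hypothetical model sizes, for illustration only.
TOTAL_PARAMS = 100e9   # all weights of a mixture-of-experts model
ACTIVE_PARAMS = 10e9   # weights actually used per token (routed experts)
BYTES_PER_PARAM = 1.0  # e.g. 8-bit weights

# Hypothetical aggregate bandwidths in GB/s; only the ratio matters here.
BANDWIDTHS = {"external memory": 4_000, "on-chip memory": 40_000}

for label, bw in BANDWIDTHS.items():
    all_weights = min_ms_per_token(TOTAL_PARAMS, BYTES_PER_PARAM, bw)
    experts_only = min_ms_per_token(ACTIVE_PARAMS, BYTES_PER_PARAM, bw)
    print(f"{label}: all weights {all_weights:.2f} ms/token, "
          f"active experts only {experts_only:.2f} ms/token")
```

Under these assumptions the bound scales with active weight bytes divided by memory bandwidth, which is why both a mixture-of-experts model (fewer weights touched per token) and faster memory shorten per-token latency.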

Ross explains the synergy between Groq’s LPUs and GPUs by likening it to a logistics network that combines 18-wheelers and delivery vans. GPUs handle large-scale parallel work such as the prefill phase of language models, in which the entire prompt is processed at once, while LPUs excel at the latency-sensitive decode phase, in which tokens are generated one at a time, especially for mixture-of-experts models. This hybrid approach optimizes both cost and performance, with Groq’s chips complementing GPUs to deliver better overall efficiency. He also touches on the Vera Rubin supercomputer, a system designed for inference workloads that integrates both Groq and Nvidia technologies to support agentic AI: AI systems that break tasks into parallel subtasks, a pattern that drives exponential growth in AI usage.
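The division of labor can be sketched as a toy router that sends each request’s prefill to a throughput-oriented pool and its decode to a latency-oriented pool. This is a minimal illustration, not Groq’s or Nvidia’s serving stack: `Phase`, `Pool`, and `Router` are hypothetical names, and a real disaggregated system would also have to move the prompt’s KV cache from the prefill devices to the decode devices, which this sketch omits.

```python
from dataclasses import dataclass, field
from enum import Enum


class Phase(Enum):
    PREFILL = "prefill"  # read the whole prompt at once: large, parallel work
    DECODE = "decode"    # generate one token at a time: small, latency-bound work


@dataclass
class Pool:
    name: str
    queue: list[str] = field(default_factory=list)  # pending request ids


@dataclass
class Router:
    throughput_pool: Pool  # the "18-wheelers" (GPUs in the analogy)
    latency_pool: Pool     # the "delivery vans" (LPUs in the analogy)

    def submit(self, request_id: str, phase: Phase) -> str:
        """Queue one phase of a request on the matching pool; return the pool name."""
        pool = self.throughput_pool if phase is Phase.PREFILL else self.latency_pool
        pool.queue.append(request_id)
        return pool.name


router = Router(throughput_pool=Pool("gpu"), latency_pool=Pool("lpu"))
for rid in ["r1", "r2"]:
    router.submit(rid, Phase.PREFILL)  # prompt ingestion goes to the GPU pool
    router.submit(rid, Phase.DECODE)   # token generation goes to the LPU pool

print("gpu queue:", router.throughput_pool.queue)
print("lpu queue:", router.latency_pool.queue)
```

Routing by phase rather than by request lets each pool run the workload shape it suits: batched, compute-heavy prompt processing on one side and small, latency-sensitive token generation on the other.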

The conversation delves into the impact of AI on hardware design and software development. Ross notes that AI is already capable of generating efficient code, including kernels, and that Groq’s simpler hardware architecture makes it easier for AI to program the chips effectively. This lowers the barrier to hardware innovation and could lead more players to design chips. However, he cautions that manufacturing chips and bringing them to market remain challenging and costly, requiring significant investment and proven reliability, which will likely limit the number of companies that can successfully produce chips at scale.

Ross also discusses the broader implications of AI-driven compute demand, referencing Jevons paradox: as AI becomes cheaper and more accessible, overall consumption of compute resources will increase rather than decrease, and in his view exponentially. This growth is driven by AI’s ability to enable new applications and reach new users, creating an endless demand for more compute. He underscores that as long as society has unsolved problems, such as curing cancer or addressing aging, there will be a continuous need for more advanced AI and compute infrastructure, fueling ongoing innovation and investment in hardware.
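Jevons paradox can be made concrete with a textbook constant-elasticity demand curve. The sketch below assumes that the price of an AI task is proportional to the compute it needs, so total compute consumed is proportional to total spend. The function name `total_compute`, the elasticity values, and the tenfold price drop are hypothetical, not figures from the talk.

```python
def total_compute(price_per_task: float, elasticity: float, k: float = 1.0) -> float:
    """Total compute consumed under constant-elasticity demand.

    Demand for tasks: Q = k * P**(-elasticity). Assuming price per task P is
    proportional to the compute each task needs, total compute is
    proportional to P * Q.
    """
    tasks = k * price_per_task ** (-elasticity)
    return price_per_task * tasks


# Hypothetical: efficiency gains cut the price (and compute) per task tenfold.
for e in (0.5, 1.0, 1.5):
    ratio = total_compute(0.1, e) / total_compute(1.0, e)
    print(f"elasticity {e}: total compute changes by a factor of {ratio:.2f}")
```

When demand elasticity is above 1, a drop in price per task raises total compute consumption; at exactly 1 it stays flat, and below 1 it falls. Ross’s argument implies that demand for AI sits in the first regime.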

Finally, Ross offers advice for the future, particularly in education. He suggests that the traditional focus on finding answers should shift toward teaching how to ask better questions, as AI excels at answering queries but relies on well-formed questions to be effective. Preparing the next generation to think critically and formulate insightful questions will be crucial in leveraging AI’s capabilities. The discussion closes with an acknowledgment of the transformative potential of AI in both hardware and software domains, emphasizing the importance of adaptability, collaboration, and continuous learning in this rapidly evolving field.