The AI Hardware Podcast S2E4 // Groq, Etched, Taalas, SambaNova

In this episode of the AI Hardware Podcast, hosts Ian Cutress and Sally Ward-Foxton explore the competitive and rapidly evolving landscape of data center inference hardware for large language models, highlighting companies such as Groq, Etched, Neuchips, SambaNova, Taalas, and Positron, each pursuing a distinct architecture and strategy for optimizing AI inference performance and efficiency. They discuss innovations ranging from Groq's upcoming 4nm chip with stacked DRAM to Etched's transformer-specific SoC, as well as the challenges and opportunities that startups and established players face in balancing specialization, scalability, and adaptability in a fast-changing market.

The episode opens with Groq, a pioneering company known for its deterministic, batch-one-latency chip built on an older GlobalFoundries 14nm process. Despite the mature silicon, Groq has remained competitive on performance and recently pivoted from selling chips to offering cloud-based AI inference services. The upcoming Groq Chip 2, expected to feature stacked DRAM and to be built on Samsung's 4nm process, is highly anticipated, as it may address the current chip's memory limitations and improve efficiency.
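To see why memory is the bottleneck a new memory system would attack: at batch size one, generating each token requires streaming essentially all of the model's weights from memory, so memory bandwidth sets a hard ceiling on decode speed. A back-of-the-envelope sketch (all figures below are illustrative assumptions, not Groq specifications):

```python
# Batch-1 decode ceiling from bandwidth arithmetic. All numbers are
# illustrative assumptions, not published specs of any vendor.

def max_tokens_per_sec(params_billion, bytes_per_param, mem_bw_gb_s):
    """Upper bound on batch-1 decode throughput: each generated token
    must stream (roughly) all model weights from memory once."""
    model_gb = params_billion * bytes_per_param
    return mem_bw_gb_s / model_gb

# A hypothetical 70B-parameter model stored as 8-bit weights:
for bw in (300, 2000, 8000):  # GB/s: DDR-class, HBM-class, SRAM-class
    print(f"{bw:>5} GB/s peak -> {max_tokens_per_sec(70, 1.0, bw):6.1f} tok/s max")
```

The three bandwidth tiers show why architectures chase on-package or stacked memory: the model is re-read on every token, so decode speed scales almost linearly with bandwidth.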

Next, the conversation shifts to Etched, a stealthy startup betting heavily on transformer-specific inference hardware. Etched's Sohu chip claims order-of-magnitude higher token throughput than Nvidia and AMD GPUs, though the claims have drawn skepticism across the industry. While some critics have gone as far as labeling the company a "scam," the hosts stress that Etched has real hardware and a serious team, even if concrete performance data has yet to appear. Its highly specialized bet on transformer inference is a high-risk, high-reward strategy in a rapidly evolving market.
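For context on what a transformer-only chip commits to: a decoder block reduces to a small, fixed set of operations (matrix multiplies, softmax, layernorm, an elementwise activation, residual adds), and that fixed op set is what specialization exploits. The NumPy sketch below is a single-head, single-block simplification for illustration, not a description of Etched's design:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def transformer_block(x, Wq, Wk, Wv, Wo, W1, W2):
    """One pre-norm decoder block. The entire op set a transformer-only
    chip must cover: matmul, softmax, layernorm, activation, add."""
    h = (x - x.mean(-1, keepdims=True)) / x.std(-1, keepdims=True)  # layernorm (params omitted)
    q, k, v = h @ Wq, h @ Wk, h @ Wv                                # attention projections
    mask = np.triu(np.full((len(x), len(x)), -np.inf), k=1)         # causal mask
    att = softmax(q @ k.T / np.sqrt(q.shape[-1]) + mask) @ v        # single-head attention
    x = x + att @ Wo                                                # residual
    h = (x - x.mean(-1, keepdims=True)) / x.std(-1, keepdims=True)
    return x + np.maximum(h @ W1, 0) @ W2                           # ReLU MLP + residual

T, d = 8, 16
rng = np.random.default_rng(0)
x = rng.standard_normal((T, d))
Ws = [rng.standard_normal(s) * 0.1 for s in [(d, d)] * 4 + [(d, 4 * d), (4 * d, d)]]
print(transformer_block(x, *Ws).shape)  # (8, 16)
```

Hardwiring only these operations is the source of both the claimed efficiency and the risk: any architectural shift away from this block leaves the silicon stranded.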

The discussion then covers Neuchips, a Taiwanese startup that initially focused on accelerating recommendation engines but has pivoted to LLM inference with its Raptor chip. The chip supports flexible 8-bit floating-point formats and targets low power consumption (sub-75 watts), making it suitable for dense, scalable deployment in data centers. While the silicon itself is somewhat older, Neuchips' experience with memory bandwidth and quantization formats positions it well for future generations of inference hardware.
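As a rough picture of what 8-bit floating-point quantization involves (a generic per-tensor FP8 E4M3 simulation, not Neuchips' actual implementation): pick a scale so the tensor fits the format's range, then round each value to the format's 3-bit mantissa.

```python
import numpy as np

E4M3_MAX = 448.0  # largest normal value representable in FP8 E4M3

def quantize_e4m3(x: np.ndarray):
    """Simulated per-tensor FP8 E4M3 quantization: scale into range,
    then round significands to the format's 3 mantissa bits.
    (Exponent-range clipping of tiny values is ignored for brevity.)"""
    scale = np.max(np.abs(x)) / E4M3_MAX
    m, e = np.frexp(x / scale)    # value = m * 2**e with 0.5 <= |m| < 1
    m = np.round(m * 16) / 16     # keep 1 implicit + 3 explicit mantissa bits
    return np.ldexp(m, e), scale  # dequantize later as q * scale

w = np.random.default_rng(0).standard_normal((4, 4)).astype(np.float32)
q, s = quantize_e4m3(w)
print("max abs error:", float(np.max(np.abs(w - q * s))))
```

"Flexible" formats typically mean the hardware can trade mantissa bits for exponent bits (E4M3 vs. E5M2) depending on whether a tensor needs precision or dynamic range.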

SambaNova and Taalas are also highlighted as significant players. SambaNova's SN40L is a high-power, chiplet-based design with HBM, aimed primarily at enterprise AI workloads with a focus on on-premise deployment and managed services rather than a full cloud offering. Taalas, founded by the former Tenstorrent CEO, is a stealthy startup rumored to be developing structured ASICs tailored to specific LLM models, aiming for extreme efficiency gains by baking model-specific data flows into silicon. Questions remain, however, about the feasibility and agility of such highly specialized hardware given how quickly models evolve.
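A software analogy may help with the "baking a model into silicon" idea (purely illustrative, not Taalas' actual method): it is akin to specializing code around fixed weights, so the generic load-weights-then-compute path disappears, but so does the ability to change the model.

```python
import numpy as np

def specialize(W: np.ndarray):
    """Return a function with W folded in as a constant -- the software
    analogue of hardwiring one model's weights into the datapath."""
    W_fixed = W.copy()  # frozen at "tape-out"; changing it means a new chip

    def layer(x):
        return np.maximum(x @ W_fixed, 0.0)

    return layer

layer = specialize(np.random.default_rng(0).standard_normal((16, 16)))
print(layer(np.ones((1, 16))).shape)  # (1, 16)
```

The hosts' feasibility question maps directly onto this trade-off: every model revision would require the hardware equivalent of re-running `specialize`, i.e., a new tape-out.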

Finally, the hosts introduce Positron, a small startup founded by former Groq engineers, which currently offers FPGA-based appliances optimized for LLM inference with very high memory bandwidth utilization. Positron plans to transition to ASICs, but faces the challenge of scaling from a small engineering team into a full-fledged hardware company. The episode concludes by emphasizing the intense competition and innovation in the data center inference space, with many companies exploring diverse architectures and business models to capture the growing AI workload market.