The video highlights the inefficiencies in AI inference workloads caused by data orchestration bottlenecks, emphasizing the need for specialized hardware like NeuReality’s NR1 and NR2 chips, which manage data flow and networking to maximize GPU utilization. NeuReality’s solutions, including an AI hypervisor and modular chiplets, aim to optimize data movement and computation, offering scalable, low-latency performance improvements for large-scale AI systems.
The video discusses the current state and challenges of AI inference workloads, emphasizing that while GPUs and AI accelerators receive most of the attention for their computational power, they spend a large share of their time (up to 70 to 80 percent, per the video) waiting on data. This waiting stems primarily from inefficiencies in where data lives and how it is orchestrated, which puts CPUs and networking at the center of managing data flow. CPUs, though versatile and essential for general computing tasks, are not optimized for the high-speed demands of large-scale AI inference, leaving expensive hardware underutilized in data centers.
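To make the stakes concrete, here is a back-of-the-envelope sketch of how data-wait time caps delivered throughput. The 70 to 80 percent wait figure comes from the video; the peak-throughput number is an illustrative assumption, not a quoted spec.

```python
# Back-of-the-envelope model of how data-wait time caps delivered throughput.
# The 70-80% wait figure comes from the video; PEAK is an assumed value.

def effective_throughput(peak_tflops: float, wait_fraction: float) -> float:
    """Throughput actually delivered when the accelerator idles on data."""
    return peak_tflops * (1.0 - wait_fraction)

PEAK = 1000.0  # hypothetical accelerator peak, in TFLOPS
for wait in (0.0, 0.5, 0.7, 0.8):
    print(f"wait {wait:4.0%}: {effective_throughput(PEAK, wait):7.1f} TFLOPS delivered")
```

At an 80 percent wait fraction, a nominal 1,000-TFLOPS accelerator delivers only 200 TFLOPS of useful work, which is the underutilization the video describes.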
To address this imbalance, the concept of a dedicated chip designed specifically for data orchestration has emerged. This “AI CPU” would handle data preparation, networking, and flow control, allowing GPUs to focus solely on computation. NeuReality has developed such a chip, the NR1, which acts like a conductor in an orchestra, coordinating data flow so that accelerators can operate at full capacity. The approach integrates embedded ARM CPUs and advanced networking to manage data movement and processing efficiently, reducing latency and improving overall system performance.
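The conductor analogy maps onto a familiar software pattern. Below is a minimal producer/consumer sketch, purely illustrative and not NeuReality’s implementation, in which a dedicated orchestration thread stages prepared batches in a bounded queue so the compute worker never blocks on fetching.

```python
# Toy software analogy of the "conductor" pattern: one thread plays the
# AI-CPU role (fetch, prepare, stage data); another plays the accelerator
# role (consume only ready batches). Timings are arbitrary placeholders.
import queue
import threading
import time

def orchestrator(out_q: queue.Queue, n_batches: int) -> None:
    """Stands in for the AI CPU: fetch, prepare, and stage data ahead of compute."""
    for i in range(n_batches):
        time.sleep(0.01)          # simulated fetch + preprocessing latency
        out_q.put(f"batch-{i}")   # staged batch, ready for compute
    out_q.put(None)               # sentinel: no more work

def accelerator(in_q: queue.Queue) -> None:
    """Stands in for the GPU: consumes ready batches, never fetches."""
    while (batch := in_q.get()) is not None:
        time.sleep(0.01)          # simulated compute on the batch

q: queue.Queue = queue.Queue(maxsize=4)   # bounded staging buffer
t = threading.Thread(target=orchestrator, args=(q, 16))
t.start()
accelerator(q)
t.join()
```

Because staging and compute overlap, the consumer only stalls if the staging buffer runs dry, which is the condition a dedicated orchestration chip is meant to prevent at data-center scale.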
NeuReality’s CEO, Moshe Tanach, explains that their technology targets the “AI head node,” a critical component that manages data fetching, processing, and communication between GPUs and storage. The company recently announced a second-generation product, the NR2 AI SuperNIC, which addresses the challenges of east-west communication in large training pods by providing ultra-low-latency networking at speeds up to 1.6 terabits per second. The product is designed to be modular and flexible, supporting multiple transport protocols and in-network compute that offloads some processing from GPUs, reducing data transfer volumes and improving efficiency.
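As a rough illustration of why in-network compute shrinks transfer volume, the sketch below compares a naive gather with an in-network reduction at the quoted 1.6 Tb/s line rate. The tensor size and node count are assumptions chosen for the example, not figures from the video.

```python
# Rough arithmetic on the quoted 1.6 Tb/s line rate, with assumed workload
# numbers: how in-network reduction cuts what each node must receive.

LINK_TBPS = 1.6                        # NR2 line rate quoted in the video
LINK_BYTES_PER_S = LINK_TBPS * 1e12 / 8

grad_bytes = 10e9                      # assumed 10 GB gradient tensor
nodes = 8                              # assumed pod size

# Naive gather: every node receives every peer's full tensor.
naive_rx = grad_bytes * (nodes - 1)
# In-network reduce: the fabric sums contributions in flight, so each
# node receives only the single reduced tensor.
reduced_rx = grad_bytes

for label, rx in (("naive gather", naive_rx), ("in-network reduce", reduced_rx)):
    print(f"{label:18s}: {rx / 1e9:6.1f} GB received, "
          f"{rx / LINK_BYTES_PER_S * 1e3:6.2f} ms at line rate")
```

Under these assumptions, per-node receive volume drops from 70 GB to 10 GB, which is the kind of reduction that lets east-west links keep accelerators fed.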
The company also introduced the concept of an AI hypervisor, a hardware-based scheduler and dispatcher that manages the control flow of AI workloads, offloading these tasks from the CPU to specialized hardware. This allows data movement and compute tasks to be parallelized and optimized across different engines, including GPUs and DSPs. NeuReality emphasizes ease of integration and compatibility with existing software stacks, aiming for a plug-and-play solution that works with a variety of AI accelerators and networking protocols.
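To illustrate the kind of control flow being offloaded, here is a toy dependency-graph dispatcher. The stage names and engine assignments are hypothetical, and the real hypervisor does this in hardware rather than software; the sketch only shows the scheduling idea of releasing each task to its engine once its inputs are ready.

```python
# Toy model (an illustration, not NeuReality's design) of a scheduler/
# dispatcher: walk a dependency graph of pipeline stages and release each
# task to its engine as soon as all of its inputs are ready.
from collections import deque

# task -> (engine it runs on, tasks it depends on); names are made up
GRAPH = {
    "decode":     ("dsp", []),
    "preprocess": ("dsp", ["decode"]),
    "infer":      ("gpu", ["preprocess"]),
    "postproc":   ("cpu", ["infer"]),
}

def dispatch(graph: dict[str, tuple[str, list[str]]]) -> list[tuple[str, str]]:
    """Kahn-style topological dispatch: release tasks in dependency order."""
    pending = {t: set(deps) for t, (_, deps) in graph.items()}
    ready = deque(t for t, deps in pending.items() if not deps)
    plan = []
    while ready:
        task = ready.popleft()
        plan.append((task, graph[task][0]))
        for t, deps in pending.items():   # unblock dependents of this task
            if task in deps:
                deps.remove(task)
                if not deps:
                    ready.append(t)
    return plan

for task, engine in dispatch(GRAPH):
    print(f"{task:10s} -> {engine}")
```

Doing this bookkeeping in dedicated hardware rather than on a general-purpose CPU is the claimed win: dispatch decisions happen without burning host cycles or adding software latency on the critical path.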
Looking ahead, NeuReality plans to expand its product line with modular chiplets and to sell both chips and complete systems to meet diverse customer needs, from hyperscalers to enterprise data centers. The company acknowledges a competitive landscape dominated by Nvidia but believes its specialized networking and orchestration solutions offer significant value, especially as AI workloads grow in complexity and scale. NeuReality is actively engaging with partners and customers to refine the technology and expects to have the NR2 in customers’ hands by the second half of next year, aiming to improve AI inference efficiency and performance across the industry.