It's Hard To Make Bigger Chips

The video explains how the semiconductor industry is moving from large monolithic chips to modular chiplets connected by a coherent interconnect in order to overcome scaling limits in power, cost, and yield. It highlights Saverite's Omnicluster chiplet architecture and Multiplexus fabric, which the company says enable scalable, efficient AI compute platforms spanning edge devices to data centers, delivering significant power and cost savings while maintaining software compatibility.

The central challenge in modern computing is scaling: scaling up, scaling out, and doing both efficiently. Traditionally, this was achieved by building larger monolithic chips on the latest manufacturing nodes. However, as chip sizes grow, power consumption, yield, and cost become significant barriers. To overcome these, the industry is shifting toward chiplets: smaller, specialized dies that are easier to manufacture and can be combined into larger systems. When connected through a fast, coherent, and efficient interconnect, these chiplets function as a unified system, enabling the new architectural approaches seen in AMD, Intel, and Nvidia products. This shift is paving the way for fully composable architectures that scale from die to system to rack.

A new semiconductor startup called Saverite is pioneering this modular approach with a coherent network of chiplets called Omniclusters, connected via a novel interconnect named Multiplexus. Saverite's architecture integrates CPUs, memory, and AI engines into a single coherent fabric, designed to scale from edge devices to large data centers. Its chiplets, Omniflex and Skylex, are optimized for compute density and memory capacity, respectively, allowing flexible configurations tailored to different AI workloads. Saverite claims this approach delivers up to 90% lower power and cost at the system level compared to conventional GPU clusters, while maintaining full software compatibility through its runtime stack.

The economics and physics of chip manufacturing highlight the advantages of chiplets. As process nodes advance, wafer costs and defect rates increase, making large monolithic chips expensive and yield-limited. Chiplets, being smaller, have higher yields and can be reused across multiple products, optimizing cost and performance. Different chiplets can be manufactured on process nodes best suited to their function, such as compute on leading-edge nodes and IO on mature nodes. The key challenge is building a coherent interconnect fabric that allows multiple chiplets to behave as a single system, overcoming limitations of traditional fabrics like PCIe and UPI.
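The yield argument above can be made concrete with the classic Poisson yield model, Y = exp(-D·A), where D is defect density and A is die area: a die four times larger does not just cost four times more silicon, it also yields worse. The sketch below compares one large monolithic die against four smaller chiplets covering the same total area. All numbers (defect density, wafer cost, usable area) are illustrative assumptions, not figures from the video.

```python
import math

# Illustrative assumptions for a leading-edge process, not video figures.
DEFECT_DENSITY = 0.1   # defects per cm^2
WAFER_COST = 17000.0   # USD per 300 mm wafer
WAFER_AREA = 70000.0   # usable mm^2 per wafer (rough, ignores edge loss)

def cost_per_good_die(die_area_mm2):
    """Cost of one *good* die: wafer cost spread over yielding dies."""
    area_cm2 = die_area_mm2 / 100.0
    yield_rate = math.exp(-DEFECT_DENSITY * area_cm2)  # Poisson yield model
    dies_per_wafer = WAFER_AREA / die_area_mm2
    return WAFER_COST / (dies_per_wafer * yield_rate)

# One 800 mm^2 monolithic die vs. four 200 mm^2 chiplets (same total area).
mono = cost_per_good_die(800)
chiplets = 4 * cost_per_good_die(200)
print(f"monolithic 800 mm^2: ${mono:.0f} per good die")
print(f"4 x 200 mm^2 chiplets: ${chiplets:.0f} combined")
```

Under these assumptions the four chiplets come out substantially cheaper than the single large die, and the gap widens as defect density or die size grows; this is the economic pressure driving the disaggregation described above.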

Multiplexus, Saverite’s distributed coherent fabric, addresses this challenge by enabling a unified memory space across chiplets, packages, and even racks. It maintains coherence, low latency, and high bandwidth, allowing workloads to move seamlessly between compute and memory tiles without costly data copying. This architecture supports large-scale AI workloads, enabling models with hundreds of billions of parameters to run within a single address space. Saverite’s chiplets and fabric are integrated into their Helix product line, which scales from small edge devices (Helix M) to desktop workstations (Helix D) and rack-scale systems (Helix R), all sharing the same architecture and software stack.
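The claim that models with hundreds of billions of parameters fit in a single address space is ultimately a memory-capacity budget across memory tiles. A quick back-of-envelope, using an assumed per-tile capacity (the video gives no per-chiplet numbers), shows the scale involved:

```python
# Back-of-envelope: memory tiles needed to hold one large model's weights
# in a single coherent address space. Per-tile capacity is an assumption;
# the video does not specify Skylex capacity.
params = 400e9            # 400B-parameter model (illustrative)
bytes_per_param = 2       # FP16/BF16 weights
tile_capacity_gb = 64     # assumed capacity of one memory chiplet

model_gb = params * bytes_per_param / 1e9
tiles_needed = -(-model_gb // tile_capacity_gb)  # ceiling division
print(f"{model_gb:.0f} GB of weights -> {tiles_needed:.0f} memory tiles")
```

The point of a coherent fabric is that those tiles present as one address space, so compute tiles reference the weights in place rather than copying shards between device memories, which is where conventional multi-GPU setups spend bandwidth and power.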

Saverite has already demonstrated their architecture on FPGA prototypes running live AI workloads and plans silicon production in early 2026. Their software stack, compatible with CUDA and Triton, allows existing AI frameworks to run without modification, simplifying developer adoption. With over $100 million in pre-orders and numerous design opportunities, Saverite aims to deliver a unified, modular AI compute platform that spans edge to data center. While challenges remain in scaling coherence and building ecosystem support, Saverite’s approach represents a pragmatic and innovative step towards solving the complex demands of AI hardware scalability.