In this interview, Ilan Tayari, VP of Architecture at NextSilicon, explains how their Maverick 2 processor uses a novel dataflow-based architecture that directly executes intermediate representation graphs, enabling greater parallelism and efficiency compared to traditional CPUs and GPUs. He highlights the processor’s adaptability, strong performance in high-performance computing (HPC) workloads, and NextSilicon’s focus on HPC markets while planning for future AI and scalable chiplet-based designs.
The conversation begins by highlighting the stagnation in traditional computer architectures, which have largely followed the same von Neumann principles for decades. Tayari explains that while GPUs and specialized processors like ASICs and DSPs have introduced some diversity, most architectures still serialize instructions into a linear stream and then reconstruct the dependency graph at runtime, which is inherently inefficient. NextSilicon's approach is fundamentally different: instead of lowering software into a linear instruction stream, their hardware executes the intermediate representation (IR) graph directly, eliminating the instruction fetch, decode, and reordering stages entirely.
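To make the contrast concrete, here is a minimal toy sketch of direct graph execution: nodes fire as soon as their inputs are ready, with no instruction fetch, decode, or reorder step. The graph encoding and node names are illustrative assumptions, not NextSilicon's actual IR format.

```python
# Hypothetical toy IR: execute a dependency graph directly, firing each
# node as soon as its inputs are available, instead of walking a
# serialized instruction stream and rediscovering dependencies at runtime.
import operator

# IR graph for r = (a + b) * (a - b): each node names its op and inputs.
GRAPH = {
    "a":   ("const", 7),
    "b":   ("const", 3),
    "sum": ("add", "a", "b"),
    "dif": ("sub", "a", "b"),
    "r":   ("mul", "sum", "dif"),
}
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def run_dataflow(graph):
    values = {n: spec[1] for n, spec in graph.items() if spec[0] == "const"}
    pending = {n: spec for n, spec in graph.items() if spec[0] != "const"}
    while pending:
        # Every node whose inputs are ready fires in the same "cycle";
        # this wavefront is where the extra parallelism comes from.
        ready = [n for n, (_, x, y) in pending.items()
                 if x in values and y in values]
        for n in ready:
            op, x, y = pending.pop(n)
            values[n] = OPS[op](values[x], values[y])
    return values

print(run_dataflow(GRAPH)["r"])  # (7 + 3) * (7 - 3) = 40
```

Note that `sum` and `dif` become ready in the same wavefront and can execute side by side, something a serialized stream only recovers after fetch, decode, and dependency analysis.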
Tayari elaborates on how this dataflow-based architecture allows for much greater parallelism and efficiency. By mapping the IR graph directly onto hardware, each arithmetic logic unit (ALU) can be kept busy every cycle, maximizing throughput. This contrasts with traditional CPUs and GPUs, which, despite out-of-order execution and wide pipelines, are limited in the number of instructions they can execute simultaneously. The architecture also distributes memory management units (MMUs) across the chip, allowing each to handle only a small subset of memory accesses, which improves memory bandwidth utilization and reduces bottlenecks. The system can dynamically adapt memory allocations at runtime to avoid issues like false sharing, further optimizing performance.
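The distributed-MMU idea can be sketched with a simple interleaving scheme: addresses are spread across many small MMUs so each one sees only a slice of the traffic. The hash and parameters below are assumptions for illustration, not NextSilicon's actual placement policy.

```python
# Sketch: interleave memory accesses across many small MMUs so each
# handles only a subset of requests. The mapping below (modulo over
# cache-line index) is a hypothetical policy chosen for clarity.
from collections import Counter

N_MMUS = 8
LINE = 64  # bytes per cache line (assumed)

def mmu_for(addr):
    # Interleave at cache-line granularity so consecutive lines land on
    # different MMUs, spreading bandwidth demand evenly.
    return (addr // LINE) % N_MMUS

# A streaming access pattern over 64 KiB: every MMU sees ~1/8 of the
# requests rather than one MMU becoming the bottleneck.
load = Counter(mmu_for(a) for a in range(0, 64 * 1024, LINE))
print(dict(load))  # each of the 8 MMUs handles 128 lines
```

A runtime that can also remap allocations (as Tayari describes) would adjust such a policy on the fly, for example to keep two threads' hot data from landing in the same line and causing false sharing.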
A key innovation in Maverick 2 is its ability to reconfigure the dataflow graph while the application is running, based on real-time telemetry. This enables the processor to adapt to different computational kernels and workloads efficiently, with minimal overhead. Tayari notes that this flexibility is crucial for high-performance computing (HPC) applications, which often involve a wide variety of computational patterns and require rapid switching between different configurations. The hardware acceleration for configuration switching was a major lesson learned from the first-generation Maverick chip, ensuring that overheads are kept low enough not to impact performance.
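The telemetry-driven reconfiguration loop might look roughly like the following sketch. All names, the windowing scheme, and the threshold are hypothetical; the point is only that configuration switches happen when telemetry shows a different kernel has become hot, and that the switch itself must be cheap.

```python
# Illustrative sketch (all names hypothetical): watch kernel-invocation
# telemetry in windows and swap the dataflow configuration when a
# different kernel clearly dominates.
from collections import Counter

def reconfigure_on_hot_kernel(kernel_trace, threshold=0.5, window=4):
    """Yield the configuration in effect after each telemetry window."""
    current = None
    for i in range(0, len(kernel_trace), window):
        counts = Counter(kernel_trace[i:i + window])
        kernel, hits = counts.most_common(1)[0]
        # Only pay the switch cost (hardware-accelerated in Maverick 2)
        # when one kernel clearly dominates the window.
        if hits / window >= threshold and kernel != current:
            current = kernel
        yield current

trace = ["stencil"] * 4 + ["spmv"] * 4
print(list(reconfigure_on_hot_kernel(trace)))  # ['stencil', 'spmv']
```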
NextSilicon’s initial focus is on the HPC market rather than AI, despite the latter’s much larger commercial potential. Tayari explains that HPC customers, such as national laboratories like Sandia, are sophisticated and willing to collaborate closely, providing valuable feedback for refining the technology. The Maverick 2 processor has demonstrated exceptional performance on benchmarks like STREAM and GUPS, showing its ability to fully utilize memory bandwidth even in irregular and sparse workloads. The architecture is designed to require zero code changes for existing applications, but NextSilicon also provides profiling tools to help users further optimize their code, often yielding benefits on other platforms as well.
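The two benchmarks mentioned stress opposite ends of the memory system: STREAM's triad kernel is sequential and bandwidth-bound, while GUPS-style random updates have no locality at all. The toy versions below reproduce only the access patterns (real runs use large tables and report GB/s or giga-updates/s):

```python
# Toy access-pattern versions of STREAM triad (regular, bandwidth-bound)
# and GUPS-style random updates (irregular, latency-bound).
import random

def stream_triad(a, b, c, scalar):
    # a[i] = b[i] + scalar * c[i]  -- sequential, perfectly prefetchable.
    for i in range(len(a)):
        a[i] = b[i] + scalar * c[i]

def gups_updates(table, n_updates, seed=0):
    # table[random index] ^= random value -- no locality; every update
    # can land on a different page, stressing address translation.
    rng = random.Random(seed)
    mask = len(table) - 1  # table length must be a power of two
    for _ in range(n_updates):
        v = rng.getrandbits(64)
        table[v & mask] ^= v

n = 1024
a, b, c = [0.0] * n, [1.0] * n, [2.0] * n
stream_triad(a, b, c, 3.0)
print(a[0])  # 1.0 + 3.0 * 2.0 = 7.0

table = list(range(n))
gups_updates(table, 4096)
```

Doing well on both at once is the notable result: triad rewards wide sequential bandwidth, while GUPS rewards exactly the many-small-MMUs design described earlier.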
Looking ahead, Tayari acknowledges that while dataflow architectures are inherently more efficient than traditional CPUs and GPUs, highly specialized AI chips will always have an edge in their specific domains. NextSilicon’s roadmap includes exploring AI-focused products, but their current priority remains HPC and hybrid HPC+AI workloads. The company is also investing in chiplet-based designs for greater scalability and density, with plans to enhance inter-chip communication for large-scale distributed computing. Overall, the interview underscores NextSilicon’s commitment to breaking away from legacy architectures and delivering significant performance gains through a fundamentally new approach to processor design.