Nvidia Backs DeepInfra in $107 Million Raise

Nvidia has invested in DeepInfra, a company specializing in an AI inference cloud optimized around Nvidia's GPU hardware, to broaden efficient access to open-source AI models and lower inference costs. With $107 million in new funding, DeepInfra plans to expand its infrastructure across the US and globally, working through supply chain constraints with strategic partners to meet growing demand for scalable AI inference services.

Nvidia has invested in DeepInfra, a company focused on building a specialized inference cloud that provides efficient access to open-source AI models. DeepInfra's infrastructure is purpose-built to optimize AI inference, the phase in which trained models serve requests in production. Nvidia's participation in the funding round reflects its interest in backing technologies that make fuller use of its GPU hardware for AI workloads.

Unlike competitors such as Cerebras, which position themselves as alternatives to Nvidia, DeepInfra aligns closely with Nvidia’s hardware ecosystem. The company believes Nvidia’s GPUs remain the most efficient solution for AI inference and has heavily invested in leveraging this hardware. DeepInfra collaborates with Nvidia to improve inference efficiency and reduce the cost per token processed, anticipating that inference will account for the majority of AI compute demand in the future.

DeepInfra currently processes around five trillion tokens per week, achieving cost efficiencies through a combination of hardware deployment and software optimization. Key to its approach are effective caching mechanisms, such as key-value (KV) caches, which avoid recomputing attention states when AI agents make repeated requests that share context. This whole-stack focus, from data center selection to software design, has allowed DeepInfra to build a scalable and efficient inference cloud over the past four years.
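To illustrate the general idea behind prefix-style KV caching, the sketch below shows a toy cache keyed by a shared prompt prefix, so a second request with the same context reuses earlier work instead of repeating the expensive prefill pass. This is a minimal, hypothetical example; the names (`PrefixCache`, `compute_kv_state`) and the placeholder "KV state" are assumptions for illustration, not DeepInfra's actual implementation.

```python
# Toy prefix cache: repeated requests sharing a prompt prefix reuse its KV state.
# Illustrative only; not DeepInfra's production design.
import hashlib
from typing import Dict, List, Tuple

def compute_kv_state(prefix_tokens: List[int]) -> List[Tuple[float, float]]:
    """Stand-in for the expensive prefill pass that builds key/value tensors."""
    # A real serving stack would run a model forward pass over the prefix;
    # here we just return a placeholder pair per token.
    return [(float(t), float(t) * 0.5) for t in prefix_tokens]

class PrefixCache:
    """Caches KV state for shared prompt prefixes so repeated agent
    requests with the same context skip redundant prefill compute."""
    def __init__(self) -> None:
        self._store: Dict[str, List[Tuple[float, float]]] = {}

    def _key(self, prefix_tokens: List[int]) -> str:
        return hashlib.sha256(str(prefix_tokens).encode("utf-8")).hexdigest()

    def get_or_compute(self, prefix_tokens: List[int]) -> List[Tuple[float, float]]:
        key = self._key(prefix_tokens)
        if key not in self._store:            # cache miss: pay the prefill cost once
            self._store[key] = compute_kv_state(prefix_tokens)
        return self._store[key]               # cache hit: reuse prior computation

# Usage: two requests sharing the same system prompt reuse one KV state.
cache = PrefixCache()
system_prompt = [101, 2023, 2003, 1037, 2291]   # toy token IDs
kv_a = cache.get_or_compute(system_prompt)      # computed on first request
kv_b = cache.get_or_compute(system_prompt)      # served from cache
assert kv_a is kv_b
```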

With the recent $107 million funding, DeepInfra plans to expand its platform by deploying the latest Nvidia chips and scaling operations across the United States, with future plans to enter European and Asian markets. The company operates out of eight data centers and aims to increase its infrastructure footprint to meet growing demand for AI inference services globally.

Supply chain challenges, particularly shortages of memory and storage components, have impacted DeepInfra’s ability to build inference clusters. Strategic investors like Samsung and Super Micro play a crucial role in securing necessary hardware supplies. Given the immense compute requirements for AI inference—far exceeding traditional computing needs—DeepInfra emphasizes the importance of strong partnerships to navigate chip shortages and ensure steady growth in its AI infrastructure capabilities.