The video highlights the collaboration between Crusoe and Fireworks AI in building scalable, cost-effective AI infrastructure powered by AMD Instinct GPUs: Crusoe provides managed AI clusters and energy-efficient data centers, while Fireworks delivers optimized large language model inference and training. Together they describe overcoming challenges such as multi-tenancy, hardware reliability, and network redundancy through innovative techniques and agile infrastructure management, and they encourage businesses to leverage advanced hardware and expert-managed platforms to accelerate AI development.
The video features a discussion between representatives from Crusoe and Fireworks AI, highlighting their collaboration and innovations in AI infrastructure powered by AMD Instinct GPUs. Crusoe is introduced as a NeoCloud provider specializing in managed AI clusters and data center construction, including energy sourcing. They are actively building the Stargate data center in Texas, showcasing their commitment to scalable and efficient AI infrastructure. Fireworks AI, represented by Chenu, focuses on delivering high-performance large language model (LLM) inference with an emphasis on low latency and cost-effective throughput, serving a diverse clientele from startups to Fortune 500 companies.
Fireworks AI’s core strength lies in owning their inference engine, which they built from scratch before generative AI models like ChatGPT became widely popular. This ownership allows them to optimize performance across varied use cases, such as long-context processing and models of different sizes. Beyond inference, Fireworks supports customers in tuning and training their own models using techniques like supervised fine-tuning (SFT) and reinforcement learning, enabling businesses to differentiate themselves by leveraging proprietary data and customized AI models rather than relying solely on generic frontier models.
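To make the tuning workflow concrete, here is a minimal sketch of LoRA-based supervised fine-tuning setup using the open-source Hugging Face PEFT library. The talk does not reveal Fireworks' internal stack, and the base model name and hyperparameters below are illustrative assumptions, not details from the video.

```python
# Minimal LoRA SFT setup sketch (illustrative; not Fireworks' actual pipeline).
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-3.1-8B"  # hypothetical base model choice

model = AutoModelForCausalLM.from_pretrained(base)

# LoRA trains small low-rank matrices on top of frozen base weights,
# so the trainable parameter count is a tiny fraction of the full model.
lora = LoraConfig(
    r=16,                                  # rank of the low-rank update
    lora_alpha=32,                         # scaling factor for the update
    target_modules=["q_proj", "v_proj"],   # which layers receive adapters
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # typically well under 1% of base weights
```

The resulting adapter, rather than a full model copy, is what gets stored and served per customer, which is what makes the multi-tenant serving approach described next economical.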
A significant challenge discussed is managing multi-tenancy and cost-performance optimization across diverse hardware and cloud environments. Fireworks employs techniques such as efficient serving of low-rank adaptation (LoRA) models, allowing hundreds or even thousands of fine-tuned adapters to share a single GPU node, as sketched below. They also emphasize the importance of minimizing data ingress and egress costs, especially when weights are frequently synchronized at scale between training and inference. Crusoe’s role as a cloud provider helps Fireworks achieve competitive GPU pricing and infrastructure reliability, which is critical given the scale and complexity of their operations.
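A back-of-the-envelope sketch shows why so many adapters fit on one node: each LoRA adapter adds only rank-r matrices per layer, while the base weights are loaded once and shared. The dimensions and adapter count below are illustrative assumptions, not figures from the talk.

```python
# Why multi-LoRA serving scales: adapters are tiny relative to base weights.
# All numbers here are illustrative, not from the talk.
import numpy as np

d_model, rank, n_adapters = 4096, 16, 1000

W = np.zeros((d_model, d_model), dtype=np.float16)      # shared base weight
adapters = [
    (np.zeros((rank, d_model), dtype=np.float16),       # A: down-projection
     np.zeros((d_model, rank), dtype=np.float16))       # B: up-projection
    for _ in range(n_adapters)
]

def lora_forward(x, adapter_id, alpha=2.0):
    """y = W x + alpha * B (A x): shared base path plus the tenant's delta."""
    A, B = adapters[adapter_id]
    return W @ x + alpha * (B @ (A @ x))

y = lora_forward(np.zeros(d_model, dtype=np.float16), adapter_id=42)

base_bytes = W.nbytes
adapter_bytes = sum(A.nbytes + B.nbytes for A, B in adapters)
print(f"base layer: {base_bytes / 2**20:.0f} MiB; "
      f"{n_adapters} adapters: {adapter_bytes / 2**20:.0f} MiB total")
# For this layer, 1000 rank-16 adapters total ~250 MiB versus 32 MiB for the
# base matrix -- under 1% of the cost of a dedicated full copy per tenant.
```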
The conversation also touches on the operational realities of running large-scale AI infrastructure, including hardware reliability and network redundancy. Fireworks has experienced unexpected failures such as simultaneous fiber cuts affecting data center connectivity, underscoring the need for robust multi-region failover strategies and close collaboration with cloud providers. They highlight the heterogeneity of hardware and cloud environments, which introduces unique challenges that require continuous adaptation and proactive infrastructure management to maintain high availability and performance.
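The failover strategy they describe can be pictured as a client that probes regional endpoints in preference order and routes around the one that just lost connectivity. The endpoint URLs, probe policy, and retry counts below are assumptions for illustration; the video does not describe Fireworks' actual implementation.

```python
# Sketch of a simple multi-region failover probe (illustrative assumptions).
import time
import urllib.request

REGION_ENDPOINTS = [  # ordered by preference; hypothetical URLs
    "https://inference.us-east.example.com/v1/health",
    "https://inference.us-west.example.com/v1/health",
    "https://inference.eu-central.example.com/v1/health",
]

def first_healthy_endpoint(timeout_s: float = 2.0, attempts: int = 3) -> str:
    """Probe regions in order and return the first one that answers."""
    for _ in range(attempts):
        for url in REGION_ENDPOINTS:
            try:
                with urllib.request.urlopen(url, timeout=timeout_s) as resp:
                    if resp.status == 200:
                        return url
            except OSError:
                continue  # fiber cut, DNS failure, or timeout: try next region
        time.sleep(1.0)   # brief backoff before re-probing all regions
    raise RuntimeError("no healthy region found")
```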
Finally, both speakers stress the rapid pace of innovation in the AI industry and the importance of agility. They encourage developers and businesses to stay informed, experiment with tuning their own models, and leverage emerging hardware like AMD’s Instinct MI300X, MI325X, and MI355X GPUs. Fireworks advocates a “virtual cloud” approach in which customers specify workload requirements and let the platform optimize hardware configurations transparently, as sketched below. The discussion concludes with a call to embrace the fast-moving AI ecosystem, collaborate with partners who share this pace, and focus on building products while relying on expert-managed AI infrastructure to accelerate development and deployment.
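The “virtual cloud” idea amounts to a declarative workload specification: the customer states latency, throughput, and cost targets, and the platform resolves them to concrete hardware. The field names and values below are hypothetical illustrations, not Fireworks' actual API.

```python
# Sketch of a declarative workload spec for a "virtual cloud" (hypothetical
# field names and values; not Fireworks' real interface).
from dataclasses import dataclass

@dataclass
class WorkloadSpec:
    model: str                    # e.g. a customer's fine-tuned model id
    max_latency_ms: int           # p95 latency target the platform must meet
    min_throughput_tps: int       # sustained tokens/sec across the deployment
    max_cost_per_m_tokens: float  # budget ceiling per million tokens
    regions: tuple[str, ...] = ("any",)

spec = WorkloadSpec(
    model="my-org/llama-3.1-8b-sft",  # hypothetical model id
    max_latency_ms=300,
    min_throughput_tps=5000,
    max_cost_per_m_tokens=0.25,
)
# The platform, not the customer, then resolves this spec to concrete choices:
# GPU type (e.g. MI300X vs MI325X), node count, and regional placement.
```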