Under 5 minutes to a deployed LLM endpoint — Audry Hsu, RunPod

Audry Hsu from RunPod presents the company as a cloud AI infrastructure provider that simplifies deploying machine learning models by offering flexible GPU resources and user-friendly products like Serverless and the Hub, enabling developers to launch production-ready LLM endpoints in under five minutes. Rooted in a community-driven approach, RunPod supports over 500,000 developers worldwide with scalable, cost-transparent solutions and active engagement through platforms like Reddit and Discord.

In the talk, Audry describes RunPod's role as supplying GPU hardware and managing it on the user's behalf. RunPod supports both private and open-source models, so developers can focus on building applications instead of operating complex infrastructure. The platform targets three problems: GPU infrastructure is hard to manage, GPUs are scarce because of the global supply crunch, and teams need compute that is both flexible and reliable.

RunPod began with a community-driven approach. After the founders' crypto mining efforts failed, they were left with idle GPU rigs, so they offered free GPU access in exchange for user feedback, and that feedback shaped the platform. Today, RunPod serves over 500,000 developers worldwide across more than 30 data centers and has achieved significant revenue milestones. The company stays engaged with its user community through platforms like Reddit and Discord.

The platform offers several products tailored to different workloads: Pods for sandboxed container environments, Serverless for auto-scaling real-time inference, Clusters for heavy-duty multi-node training, and the Hub, a catalog of preconfigured AI repositories. The Hub lets users quickly deploy vetted open-source models with customizable settings, so they can get started without deep infrastructure knowledge.

Audry demonstrates deploying a large language model (LLM) with RunPod's Serverless product through the web console. She highlights how simple it is to select a model from the Hub, configure parameters such as the context window size, and launch an endpoint that auto-scales with demand. Along the way, she chooses the GPU type (H100s or A100s), sees the associated cost, and then uses the endpoint's observability features, such as request telemetry and execution metrics.
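Once the endpoint is live, clients call it over HTTPS. The following is a minimal Python sketch using only the standard library. It assumes RunPod's v2 Serverless route `https://api.runpod.ai/v2/<endpoint_id>/runsync` with bearer-token authentication, and it assumes the Hub template is a vLLM-based worker that reads `prompt` and `sampling_params` from the `input` object. The helper names (`build_runsync_request`, `run_sync`) and the environment variable names are hypothetical; check the payload schema against the template you actually deploy.

```python
import json
import os
import urllib.request

# Base URL of RunPod's Serverless REST API (assumed v2 route).
API_BASE = "https://api.runpod.ai/v2"


def build_runsync_request(endpoint_id: str, api_key: str, prompt: str,
                          max_tokens: int = 256) -> urllib.request.Request:
    """Build a synchronous inference request for a Serverless endpoint.

    Assumption: the worker is vLLM-based and reads `prompt` and
    `sampling_params` from `input`; other templates may expect other keys.
    """
    payload = {
        "input": {
            "prompt": prompt,
            "sampling_params": {"max_tokens": max_tokens},
        }
    }
    return urllib.request.Request(
        url=f"{API_BASE}/{endpoint_id}/runsync",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_sync(endpoint_id: str, api_key: str, prompt: str,
             timeout: float = 120.0) -> dict:
    """Send the request and return the decoded JSON response."""
    req = build_runsync_request(endpoint_id, api_key, prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical environment variable names; nothing is sent unless both are set.
    endpoint_id = os.environ.get("RUNPOD_ENDPOINT_ID")
    api_key = os.environ.get("RUNPOD_API_KEY")
    if endpoint_id and api_key:
        result = run_sync(endpoint_id, api_key,
                          "Explain GPU autoscaling in one sentence.")
        print(result.get("status"), result.get("output"))
```

Keeping request construction separate from the network call makes the payload easy to inspect and test. For long generations, RunPod's asynchronous `/run` route, followed by polling the job status, is usually a better fit than blocking on `/runsync`.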

In conclusion, Audry emphasizes that deploying a production-ready LLM endpoint on RunPod can take under five minutes, so developers can quickly launch AI-powered APIs. She also mentions an upcoming session on using RunPod's Python SDK for terminal-based deployment, reinforcing the company's commitment to supporting developers with flexible tools and community engagement.