What Is an AI Stack? LLMs, RAG, & AI Hardware

The video explains that building effective AI systems requires a well-designed AI stack comprising infrastructure, models, data, orchestration, and application layers, each playing a crucial role beyond just the AI model itself. It highlights how components like specialized hardware, diverse models, data augmentation through retrieval-augmented generation, complex workflow orchestration, and user-friendly interfaces collectively enable AI solutions that are practical, accurate, and aligned with real-world needs.

The video then walks through the AI stack layer by layer, emphasizing that getting each layer right is what lets an AI system solve meaningful problems rather than just generate answers. Using the example of an AI-powered application for drug discovery researchers, it notes that while the model is a crucial part of the stack, it is only one piece. The infrastructure layer matters just as much: large language models (LLMs) require specific hardware, such as GPUs, that standard enterprise servers or laptops may not provide. The choice of infrastructure, whether on-premises, cloud, or local, affects how the AI system is deployed and how it performs.
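As a minimal illustration of why infrastructure matters, the sketch below checks at runtime whether an NVIDIA driver utility is available before committing to GPU inference. It uses only the Python standard library; the function name and the fallback logic are illustrative, not something prescribed by the video.

```python
# Hedged sketch: detect whether a CUDA-capable GPU driver appears to be
# present on this host, using only the standard library. Real deployments
# would query the ML framework (e.g. torch.cuda.is_available()) instead.
import shutil

def pick_device() -> str:
    """Return 'cuda' if the nvidia-smi CLI is on PATH, else fall back to 'cpu'."""
    return "cuda" if shutil.which("nvidia-smi") else "cpu"

print(f"Running inference on: {pick_device()}")
```

A check like this is the kind of decision the infrastructure layer makes once, so the layers above it never need to care where they are running.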

Next, the video discusses the model layer, where AI builders have a variety of options. Models differ in licensing (open-source versus proprietary), size, and specialization. Large language models offer broad capabilities but may require more powerful hardware, while smaller models can run on lighter devices but might be specialized for specific tasks. There are thousands of models available on platforms like Hugging Face, catering to different needs such as reasoning, tool calling, or code generation.
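One way to make these trade-offs concrete is a small model registry that records each candidate's size and hardware needs. The registry below is purely illustrative: the model names are hypothetical placeholders, not real Hugging Face identifiers, and the selection rule is a simplification.

```python
# Illustrative model registry; names and parameter counts are made up.
MODEL_REGISTRY = {
    "reasoning":    {"name": "example/large-reasoner", "params_b": 70, "needs_gpu": True},
    "tool_calling": {"name": "example/tool-caller",    "params_b": 8,  "needs_gpu": True},
    "code":         {"name": "example/code-gen-small", "params_b": 3,  "needs_gpu": False},
}

def choose_model(task: str, gpu_available: bool) -> str:
    """Pick the registered model for a task, rejecting GPU-only models on CPU hosts."""
    entry = MODEL_REGISTRY.get(task)
    if entry is None:
        raise ValueError(f"no model registered for task: {task}")
    if entry["needs_gpu"] and not gpu_available:
        raise RuntimeError(f"{entry['name']} requires a GPU")
    return entry["name"]

print(choose_model("code", gpu_available=False))  # → example/code-gen-small
```

The point is that model choice depends on the infrastructure layer beneath it: the same task may map to different models on different hardware.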

The data layer is critical for supplementing the model’s knowledge, especially since models have knowledge cutoffs and may not be up-to-date with the latest information. This layer includes data sources, pipelines for processing data, and vector databases used in retrieval-augmented generation (RAG). Vector databases store embeddings of external data, enabling the system to quickly retrieve relevant context and augment the model’s responses with information beyond its original training data.
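The retrieval step of RAG can be sketched in a few lines. The toy below stands in for both pieces: bag-of-words term counts play the role of learned embeddings, and a sorted list plays the role of a vector database with cosine-similarity search. All documents and queries are made-up examples.

```python
# Toy RAG retrieval: bag-of-words vectors + cosine similarity stand in
# for real embeddings and a vector database. Data is illustrative only.
import math
from collections import Counter

DOCUMENTS = [
    "The model's knowledge cutoff is early 2024.",
    "Compound X showed strong binding affinity in the latest assay.",
    "GPUs accelerate transformer inference.",
]

def embed(text: str) -> Counter:
    """Stand-in embedding: lowercase bag-of-words term counts."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, k: int = 1) -> list:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(DOCUMENTS, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

# Retrieved context is prepended to the prompt, augmenting the model's
# knowledge with information it was never trained on.
context = retrieve("binding affinity of compound X")[0]
print(f"Context: {context}")
```

A production system would swap in a real embedding model and a vector database, but the shape of the pipeline, embed, search, then augment the prompt, is the same.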

Orchestration is another essential layer that manages complex AI workflows. Instead of simply inputting a prompt and receiving an output, orchestration breaks down user queries into smaller tasks such as planning, execution (including tool or function calling), and reviewing. This layer allows the AI system to think through problems, perform multiple steps, and even critique its own outputs to improve accuracy. The orchestration layer is rapidly evolving with new protocols and architectures designed to handle increasingly sophisticated AI tasks.
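The plan → execute → review loop described above can be sketched as follows. The "tools" here are stubs and the review step is a trivial self-check; real orchestrators dispatch steps to LLMs and external APIs and use much richer critique logic. Everything named in this sketch is an illustrative assumption.

```python
# Hedged sketch of a plan -> execute -> review orchestration loop.
# Tool implementations and the review rule are stand-ins.

def tool_lookup_compound(name: str) -> str:
    return f"record for {name}: status=candidate"  # stub tool call

def tool_summarize(text: str) -> str:
    return text[:60]                               # stub tool call

TOOLS = {"lookup": tool_lookup_compound, "summarize": tool_summarize}

def plan(query: str) -> list:
    """Break a user query into (tool, argument) steps; fixed plan for the sketch."""
    return [("lookup", query), ("summarize", None)]

def execute(steps: list) -> str:
    """Run each step, feeding the previous result forward when no argument is given."""
    result = ""
    for tool_name, arg in steps:
        result = TOOLS[tool_name](arg if arg is not None else result)
    return result

def review(output: str) -> bool:
    """Trivial self-critique: reject empty outputs."""
    return bool(output.strip())

output = execute(plan("compound X"))
assert review(output), "output failed self-review"
print(output)
```

Even this toy version shows why orchestration is its own layer: the planning, tool dispatch, and review logic live outside any single model call.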

Finally, the application layer focuses on the user interface and integration aspects of AI systems. While many AI tools use simple text input and output, real-world applications often require richer interfaces that support multiple data modalities such as images, audio, or numerical data. Features such as revision capabilities and citation tracking enhance usability. Integration with other tools is also vital, allowing AI outputs to be incorporated seamlessly into users’ workflows. Understanding and optimizing all of these layers, from hardware to application, enables the creation of AI systems that are reliable, efficient, cost-effective, and aligned with real-world needs.
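Citation tracking, one of the usability features mentioned above, can be sketched simply: retrieved passages keep their source labels so the interface can render numbered citations next to the answer. The function and data below are illustrative assumptions, not an API from the video.

```python
# Illustrative citation tracking in the application layer: the answer
# text carries numbered markers that map to a source list. Data is made up.

def answer_with_citations(answer: str, sources: list) -> str:
    """Append numbered citation markers and a source list to an answer."""
    markers = "".join(f"[{i}]" for i in range(1, len(sources) + 1))
    refs = "\n".join(f"[{i}] {s}" for i, s in enumerate(sources, 1))
    return f"{answer} {markers}\n\nSources:\n{refs}"

print(answer_with_citations(
    "Compound X is a promising candidate.",
    ["Assay report, 2024", "Internal screening database"],
))
```

Keeping source identifiers attached to retrieved context from the data layer onward is what makes this kind of traceability possible at the application layer.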