LoRA Fine-tuning Tiny LLMs as Expert Agents

The video explains how to fine-tune small, 1-billion-parameter language models using NVIDIA's NeMo microservices to enhance their ability to perform function-calling tasks, making them suitable for complex agent applications. It covers data preparation, training, and deployment, demonstrating that even tiny models can reliably execute function calls after fine-tuning within an accessible, scalable microservice architecture.

The video provides a comprehensive overview of how to fine-tune small language models, specifically 1-billion-parameter models, to enhance their capabilities as expert agents, particularly for function-calling tasks. The presenter emphasizes the importance of function calling for enabling LLMs to perform complex workflows such as code review, web searches, and email management. Despite the limitations of off-the-shelf models in this area, the video demonstrates that with fine-tuning, even tiny models can be significantly improved to reliably perform function calls, making them viable for a range of agentic applications.

The process begins with deploying NVIDIA's NeMo microservices, which facilitate the training, hosting, and management of custom LLMs. The presenter walks through deploying these microservices using Helm charts, setting up necessary components like data stores, entity stores, and model deployment services. They highlight the importance of configuring the environment correctly, including handling storage issues, setting API keys for NVIDIA's container registry, and ensuring all components are properly connected. This setup allows for scalable and efficient fine-tuning of models within a microservice architecture, making the process more accessible and manageable.
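After the Helm deployment, it helps to confirm that each microservice is actually reachable before moving on. The sketch below is a minimal health-check loop; the service hostnames, ports, and health paths are assumptions about a typical in-cluster setup, not official defaults, so adjust them to match your release names and namespace.

```python
import urllib.request

# Hypothetical in-cluster service URLs -- adjust to your Helm release
# names and namespace; these are assumptions, not official defaults.
SERVICES = {
    "data-store": "http://nemo-data-store:3000",
    "entity-store": "http://nemo-entity-store:8000",
    "customizer": "http://nemo-customizer:8000",
}


def health_url(base: str, path: str = "/health") -> str:
    """Join a service base URL with its health-check path."""
    return base.rstrip("/") + path


def check_service(url: str, timeout: float = 5.0) -> bool:
    """Return True if the service answers its health endpoint with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False


if __name__ == "__main__":
    for name, base in SERVICES.items():
        status = "up" if check_service(health_url(base)) else "unreachable"
        print(f"{name}: {status}")
```

Running this after `helm install` gives a quick sanity check that the data store, entity store, and customizer are all answering before any training data is uploaded.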

Next, the video discusses data preparation, focusing on a Salesforce dataset designed for training function-calling models. The dataset, available on Hugging Face, contains user queries and assistant responses that include function-call instructions. The presenter explains the need to convert this dataset into the OpenAI-compatible format, which involves restructuring the data and normalizing function schemas. They detail the process of converting Python-based function schemas into the OpenAI standard, filtering for single function calls, and formatting messages appropriately. This step ensures the data is suitable for fine-tuning with NVIDIA's NeMo Customizer.

The fine-tuning process itself is described in detail, including uploading the prepared data to the NeMo Data Store, registering it with the microservices, and initiating training jobs. The presenter recommends using Weights & Biases for monitoring training progress and performance metrics such as validation loss. They demonstrate how to track training, evaluate results, and even cancel ongoing jobs if needed. Once training completes, the custom model is registered and integrated into the deployment pipeline, allowing it to be used for inference through NVIDIA's NIM containers, which expose OpenAI-compatible APIs.
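Kicking off a training job comes down to POSTing a job description to the customizer. The sketch below assembles a LoRA fine-tuning request; the endpoint path, field names, and the model/dataset identifiers are illustrative assumptions about the general shape of such an API, not the exact NeMo Customizer schema.

```python
import json
import urllib.request

# Assumed endpoint -- adjust to your deployment.
CUSTOMIZER_URL = "http://nemo-customizer:8000/v1/customization/jobs"


def build_job_payload(base_model: str, dataset: str,
                      epochs: int = 2, lora_rank: int = 8) -> dict:
    """Assemble a LoRA fine-tuning job request.

    The field names sketch the general shape of a customization API;
    they are illustrative, not the exact NeMo Customizer schema.
    """
    return {
        "config": base_model,
        "dataset": {"name": dataset},
        "hyperparameters": {
            "training_type": "sft",
            "finetuning_type": "lora",
            "epochs": epochs,
            "lora": {"adapter_dim": lora_rank},
        },
    }


if __name__ == "__main__":
    # Hypothetical model and dataset names for illustration only.
    payload = build_job_payload("example-1b-instruct", "function-calling-train")
    req = urllib.request.Request(
        CUSTOMIZER_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp))  # response includes a job id to poll for status
```

The returned job id is what you would poll (or cancel) while watching the validation-loss curves in Weights & Biases.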

Finally, the video showcases testing the fine-tuned model in a real-world scenario, where the model performs function calling within an OpenAI-compatible environment. The presenter demonstrates how to send prompts, interpret function call responses, and verify that the model correctly utilizes the trained function schemas. They emphasize the impressive performance of a small model after fine-tuning, noting its ability to reliably perform function calls, which is typically challenging for models of this size. The overall message highlights the power and accessibility of Nvidia’s microservice ecosystem for building custom, cost-effective LLM agents tailored to specific tasks and industries.
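The testing step described above can be sketched against the OpenAI-compatible NIM endpoint as follows. The endpoint URL, the registered model name, and the `get_weather` tool schema are all hypothetical placeholders; only the request/response shape follows the OpenAI chat-completions convention.

```python
import json
import urllib.request

NIM_URL = "http://nim:8000/v1/chat/completions"  # assumed OpenAI-compatible endpoint
MODEL = "my-custom-1b-lora"                      # hypothetical registered model name

TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # illustrative schema, not from the video
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]


def extract_tool_call(completion: dict):
    """Pull (name, parsed arguments) out of an OpenAI-style chat completion,
    or return None if the model answered with plain text instead."""
    message = completion["choices"][0]["message"]
    calls = message.get("tool_calls") or []
    if not calls:
        return None
    fn = calls[0]["function"]
    return fn["name"], json.loads(fn["arguments"])


if __name__ == "__main__":
    body = {
        "model": MODEL,
        "messages": [{"role": "user", "content": "What's the weather in Berlin?"}],
        "tools": TOOLS,
    }
    req = urllib.request.Request(
        NIM_URL,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        print(extract_tool_call(json.load(resp)))
```

A correctly fine-tuned model should answer with a `tool_calls` entry whose name and arguments match the trained schema, which is exactly what the presenter verifies.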