I Ran Out of 8TB for LLMs… So I Built This

artesia · 4 June 2026 16:20

The video presents a streamlined workflow for managing large AI models locally: the open-source Model Shelf software paired with the fanless Aacasis TB504 Thunderbolt 5 enclosure, which holds multiple NVMe drives for fast, scalable storage. The setup simplifies model organization, enables on-demand downloads through AI voice agents, and supports configurations such as RAID 0 to maximize speed and capacity, addressing the storage and accessibility problems AI developers commonly face.

artesia · 4 June 2026 16:40

The video showcases a new workflow for managing large language models (LLMs) and other AI models locally, addressing the common problem of running out of space even on high-capacity drives. The creator uses a voice agent to locate, download, and load a specific model, such as the 4-billion-parameter Qwen 3 model in MLX format with 3-bit quantization, without having to remember the exact model name or search through cluttered folders by hand. The workflow is powered by Model Shelf, an open-source tool for organizing and managing AI models across multiple drives.

Central to this setup is the Aacasis TB504, a Thunderbolt 5 external enclosure that holds four M.2 NVMe drives, each on its own dedicated lane, behind Thunderbolt 5's 80 Gbit/s link to the host. The enclosure is fanless and silent, balancing portability with performance. It offers speeds comparable to internal SSDs, which makes it well suited to loading large models quickly. The presenter positions the hardware between portable SSDs and full NAS systems: a quiet, fast, and scalable storage option for AI workloads that need high data throughput.
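To put the 80 Gbit/s figure in perspective, the following minimal sketch converts a link rate into a best-case load time. It is plain arithmetic, not a benchmark: the model sizes are hypothetical examples, and real transfers will be slower because of protocol overhead and drive limits.

```python
def link_gbytes_per_s(gbit_per_s: float) -> float:
    """Convert a link rate from Gbit/s to GB/s (8 bits per byte), ignoring overhead."""
    return gbit_per_s / 8.0


def load_seconds(model_gb: float, throughput_gb_s: float) -> float:
    """Lower bound on the time to read a model of model_gb gigabytes."""
    return model_gb / throughput_gb_s


TB5_GBIT_PER_S = 80.0  # link rate quoted in the summary
ceiling = link_gbytes_per_s(TB5_GBIT_PER_S)  # 10.0 GB/s theoretical ceiling

# Hypothetical model sizes, chosen only for illustration.
for size_gb in (2.0, 20.0, 70.0):
    print(f"{size_gb:>5.0f} GB model: at least {load_seconds(size_gb, ceiling):.1f} s")
```

Substituting a measured throughput for the theoretical ceiling gives a more realistic estimate for a particular drive and cable.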

Model Shelf simplifies model management by creating a clean, organized directory structure on any drive, avoiding the hard-to-read cache folders that platforms like Hugging Face create by default. It integrates with AI agents such as Claude Code, letting them query the live model hub directly and fetch models on demand. Because the agent looks up what is currently published instead of relying on its own outdated memory, the latest models are always reachable. The tool also reports exact file paths and command-line instructions for running each model, which makes it easy to use the same models across different environments and machines.
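The difference between a cache-style folder and a readable layout can be sketched as follows. The `models--org--name` naming is how the Hugging Face hub cache labels a model repository; the `shelf_path` function, the `/Volumes/Models` root, and the repo id are hypothetical and are not taken from Model Shelf itself, whose actual layout may differ.

```python
from pathlib import PurePosixPath


def hf_cache_folder(repo_id: str) -> str:
    """Folder name the Hugging Face hub cache uses for a model repo."""
    return "models--" + repo_id.replace("/", "--")


def shelf_path(root: str, repo_id: str) -> PurePosixPath:
    """Hypothetical readable layout: <root>/<org>/<model>."""
    org, _, name = repo_id.partition("/")
    if not name:
        # Repo ids without an organisation go under a placeholder folder.
        org, name = "_unscoped", org
    return PurePosixPath(root) / org / name


repo = "example-org/example-model"  # hypothetical repo id
print(hf_cache_folder(repo))
print(shelf_path("/Volumes/Models", repo))
```

In the hub cache, the model files sit further down, under a `snapshots/<commit hash>/` subfolder, which is part of why those paths are hard to type or hand to another tool.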

The video also explores advanced storage configurations, such as setting up RAID 0 across multiple drives in the TB504 enclosure to maximize speed and capacity. RAID 0 offers no redundancy, so losing one drive loses the whole array, but that trade-off suits AI models: they can be re-downloaded if lost, and loading them for inference benefits from the extra bandwidth. The presenter tests the RAID setup and confirms that it maintains high speeds when loading large models. This approach frees up internal drive space and consolidates model storage in a scalable, high-performance external volume.
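As a rough illustration of why RAID 0 raises bandwidth, the sketch below shows generic striping: consecutive stripes land on consecutive drives, so a large sequential read touches every drive at once. The stripe size, drive count, and drive sizes are illustrative assumptions, not the settings used in the video.

```python
from __future__ import annotations


def raid0_locate(offset: int, n_drives: int, stripe_size: int) -> tuple[int, int]:
    """Map a logical byte offset to (drive index, byte offset on that drive)."""
    stripe_index = offset // stripe_size
    drive = stripe_index % n_drives            # stripes rotate across the drives
    stripe_on_drive = stripe_index // n_drives  # full rotations completed so far
    return drive, stripe_on_drive * stripe_size + offset % stripe_size


def raid0_capacity(drive_sizes_tb: list[float]) -> float:
    """Usable capacity of a classic RAID 0 set: the smallest drive limits each member."""
    return len(drive_sizes_tb) * min(drive_sizes_tb)


# Assumed example: four drives with a tiny 4-byte stripe to keep the numbers readable.
for logical in (0, 4, 8, 12, 16, 21):
    print(logical, raid0_locate(logical, n_drives=4, stripe_size=4))

print(raid0_capacity([2.0, 2.0, 2.0, 2.0]))  # 8.0 TB from four 2 TB drives
```

The same mapping also shows the risk: every large file has pieces on every drive, so a single failed drive makes the whole volume unreadable.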

In conclusion, the combination of the Aacasis TB504 enclosure and the Model Shelf software provides a powerful, user-friendly system for managing and accessing large AI models locally. It addresses common pain points like storage limitations, disorganized model files, and slow load times. The open-source nature of Model Shelf and the availability of the TB504 on Kickstarter make this solution accessible to developers and AI enthusiasts. The presenter invites viewers to share their current model storage methods and consider adopting this streamlined workflow for their own projects.