DeepSeek 3.1 FULL Just Launched!

The video announces the launch of DeepSeek 3.1, a major upgrade featuring a 128K context window, enhanced agentic tool calling, and significant performance improvements that enthusiasts can run locally without data-center-scale GPU clusters. It also provides practical advice for users, highlights ongoing development efforts, and expresses optimism that this is the final V3 release before the move to V4.

The video opens by emphasizing that this is the full instruct version of DeepSeek 3.1, not just a base model. The speaker clarifies the difference between base and instruct models, noting that the instruct version is the one that makes running locally practical without requiring massive hardware like H100 GPUs. The new DeepSeek 3.1 release is described as a significant upgrade rather than a minor point release, with thinking and non-thinking modes integrated into a single hybrid model. Users are advised to watch out for chat template updates, as initial versions may not provide the best experience immediately after release.
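As a concrete way to keep an eye on those template changes, the snippet below renders a single turn with the Hugging Face tokenizers API so the formatted prompt can be diffed before and after an update; the repository name and message are illustrative assumptions rather than anything stated in the video.

```python
# Sketch: inspect the chat template shipped with a checkpoint.
# The repo ID below is an assumption; the exact DeepSeek 3.1 name may differ.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V3.1", trust_remote_code=True)

messages = [{"role": "user", "content": "Summarize the 3.1 release notes."}]

# Render the prompt exactly as the current chat template formats it,
# so it can be compared against a newer template after an update.
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```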

DeepSeek 3.1 supports an impressive 128K context window, a substantial increase over previous versions, with 64K recommended as a practical setting. The model has undergone extensive additional training, including a 32K long-context extension phase covering hundreds of billions of tokens. Total parameters stand at 671 billion, with 37 billion active per token, and there are ongoing efforts to create a single merged file for easier use with tools like Ollama and llama.cpp. The speaker mentions plans to experiment with quantization and highlights the excitement around the model’s enhanced agentic tool-calling capabilities.
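For anyone experimenting once a merged quantized file does exist, a minimal local-run sketch via llama-cpp-python might look like the following; the GGUF file name, quantization level, offload count, and thread count are all placeholder assumptions, not values given in the video.

```python
# Sketch: loading a (hypothetical) merged GGUF quantization with llama-cpp-python,
# using the 64K context the video suggests as a practical setting.
from llama_cpp import Llama

llm = Llama(
    model_path="DeepSeek-V3.1-Q4_K_M.gguf",  # placeholder: whatever merged quant you end up with
    n_ctx=65536,        # 64K context; the model itself supports up to 128K
    n_gpu_layers=20,    # offload what fits on your GPUs, keep the rest in RAM
    n_threads=16,
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello from a local DeepSeek 3.1 test."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```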

Performance benchmarks show that DeepSeek 3.1 significantly outperforms previous versions, with major improvements in thinking efficiency, multilingual benchmarks, and terminal-based agentic tasks. The model’s training included an updated tokenizer configuration and a large volume of additional pre-training tokens, contributing to its enhanced capabilities. The video also touches on the geopolitical implications of the model being trained with the UE8M0 FP8 scale format, which is designed for upcoming domestic chips, possibly including Huawei Ascend 920s or a dedicated DeepSeek ASIC, hinting at strategic technological developments.
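For context on what UE8M0 FP8 implies, the toy encoder/decoder below follows the common microscaling convention of an 8-bit, exponent-only scale (every scale is a power of two, bias 127); the exact conventions DeepSeek and the target chips use are an assumption here, not something the video confirms.

```python
# Sketch of the idea behind a UE8M0 scale: 8 exponent bits, 0 mantissa bits,
# so every representable scale is a power of two. Bias 127 is assumed,
# following the usual microscaling convention; actual hardware may differ.
import math

def encode_ue8m0(scale: float) -> int:
    """Round a positive scale to the nearest power of two and store its biased exponent."""
    exp = round(math.log2(scale))
    return max(0, min(254, exp + 127))  # 255 is typically reserved

def decode_ue8m0(bits: int) -> float:
    """Recover the power-of-two scale from the stored 8-bit exponent."""
    return 2.0 ** (bits - 127)

s = 0.0078125                      # e.g. a block scale of 2**-7
bits = encode_ue8m0(s)
print(bits, decode_ue8m0(bits))    # -> 120 0.0078125
```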

The speaker provides practical advice for users interested in running DeepSeek 3.1 locally, recommending at least 128GB of RAM plus multiple GPUs for decent performance, with 256GB being ideal. While llama.cpp can run the model by paging weights from disk, that approach is noted to be very slow. The video also references a recent three-part series and accompanying guides that walk users through setting up runners like Ollama, Open WebUI, and llama.cpp, making it easier for enthusiasts to get started. The speaker cautions against downloading incomplete files from unofficial sources and encourages waiting for official releases.
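A rough back-of-the-envelope calculation shows why those RAM figures come up: weight memory is roughly parameter count times bits per weight divided by eight. The bits-per-weight averages below are illustrative assumptions, not measured quant sizes.

```python
# Back-of-the-envelope memory estimate for a quantized 671B-parameter model.
# Bits-per-weight figures are illustrative averages and ignore KV cache
# and runtime overhead.
PARAMS = 671e9

def weight_gb(bits_per_weight: float) -> float:
    return PARAMS * bits_per_weight / 8 / 1e9

for label, bpw in [("FP8", 8.0), ("~Q4 average", 4.5), ("aggressive ~2-bit", 2.0)]:
    print(f"{label:>18}: ~{weight_gb(bpw):,.0f} GB of weights")

# Even aggressive quants land in roughly the 150-400 GB range, which is why the
# video talks about 128-256 GB of RAM plus GPU offload, and why paging weights
# from disk is so slow.
```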

In conclusion, the video expresses optimism that DeepSeek 3.1 might be the final release in the V3 line before moving on to V4. The speaker plans to produce a Q4 quantized version if possible and suggests relying on trusted sources like Unsloth for stable quantizations. The overall tone is enthusiastic about the model’s advancements and potential, with a promise of further reviews and updates as the speaker continues to explore DeepSeek 3.1’s capabilities. Viewers are encouraged to share their thoughts and stay tuned for more content.