DeepSeek v3.1 Update is Better than I Expected... BUT?

DeepSeek v3.1 introduces significant improvements over its predecessor, particularly in tool calling, code generation, and hybrid reasoning, making it a more powerful and versatile open-source AI model despite some quirks and slow performance. While not yet ideal for production use, the update rekindles hope for future versions and encourages experimentation, highlighting its potential in research and coding workflows.

The video reflects on the evolution and improvements of DeepSeek, particularly focusing on the transition from version 3 to version 3.1. The creator reminisces about the initial excitement when DeepSeek v3 was first released, highlighting its fast APIs and potential to compete with major players in the AI space. However, as the model gained popularity, performance issues arose, notably slow API responses that made it less usable. Despite these challenges, DeepSeek v3 marked a significant milestone as an open-source model that disrupted the market and even influenced the stock market.

DeepSeek v3.1 brings notable upgrades, especially in tool calling, which was a weak point in the previous version. The update supports structured tool calling, code agents, and search agents, making it better suited to research, coding, and agentic workflows. The model also introduces hybrid reasoning, letting users dial the reasoning effort up or down. Additionally, it supports a 128k context window and the Anthropic API format, enabling smoother integration with Claude Code environments. Together, these enhancements make DeepSeek v3.1 a more powerful and versatile tool.
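Because DeepSeek exposes an OpenAI-compatible chat-completions API, structured tool calling can be exercised with a standard tool schema. Below is a minimal sketch of assembling such a request; the `deepseek-chat` model name follows DeepSeek's public docs, while the `validate_date` tool is a hypothetical example for illustration, not part of any API.

```python
# Sketch: building a structured tool-calling request in the
# OpenAI-compatible format DeepSeek accepts. The `validate_date`
# tool is hypothetical; only the payload shape matters here.
import json


def build_tool_call_request(user_prompt: str) -> dict:
    """Assemble a chat-completions payload with one tool definition."""
    tools = [{
        "type": "function",
        "function": {
            "name": "validate_date",  # hypothetical tool for illustration
            "description": "Check whether a string is a valid ISO-8601 date.",
            "parameters": {
                "type": "object",
                "properties": {
                    "date": {
                        "type": "string",
                        "description": "Candidate date, e.g. 2024-02-30",
                    },
                },
                "required": ["date"],
            },
        },
    }]
    return {
        "model": "deepseek-chat",  # assumed model name from DeepSeek's docs
        "messages": [{"role": "user", "content": user_prompt}],
        "tools": tools,
        "tool_choice": "auto",  # let the model decide when to call the tool
    }


payload = build_tool_call_request("Is 2024-02-30 a valid date?")
print(json.dumps(payload, indent=2))
```

A model with solid structured tool calling should respond to this payload with a `tool_calls` entry naming `validate_date` and JSON arguments matching the declared schema, rather than free-form text.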

In practical use cases, DeepSeek v3.1 demonstrates improved performance in navigating and modifying codebases compared to its predecessor. The video showcases examples where the model successfully updates code, creates UI elements, and handles complex tasks like validating dates. Interestingly, the model even attempts to optimize prompts dynamically, a behavior not observed in other models like GPT-5. However, the model still exhibits some quirks, such as occasionally inserting Chinese text into code, which remains an unresolved issue. Despite these flaws, the improvements in tool calling and code generation are significant.

Performance-wise, DeepSeek v3.1 shows substantial gains on benchmarks like SWE-bench and Terminal-Bench, with nearly a two-and-a-half-times improvement in some areas. However, the model is notably slow, especially when used directly via the DeepSeek API, with generation rates around 15-18 tokens per second. This sluggishness makes it challenging for daily coding tasks, though it could be manageable for queued background work. Comparisons with other models and environments, such as Roo Code and Claude Code, reveal that while DeepSeek v3.1 excels in accuracy and structured tool calling, it still lags in speed and some interactive capabilities.
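To put the quoted 15-18 tokens per second in perspective, a quick back-of-the-envelope calculation shows why that rate hurts interactive coding but is tolerable for queued jobs (the 1,500-token response size below is an illustrative assumption, not a figure from the video):

```python
# Rough throughput math for the quoted 15-18 tokens/s API speed.
# The 1,500-token response size is an assumed, illustrative figure.
def generation_time_s(tokens: int, tokens_per_second: float) -> float:
    """Seconds needed to generate `tokens` at a given rate."""
    return tokens / tokens_per_second


for rate in (15, 18):
    seconds = generation_time_s(1500, rate)
    print(f"{rate} tok/s -> {seconds:.0f} s for a 1,500-token response")
# 15 tok/s -> 100 s; 18 tok/s -> ~83 s
```

A wait of 80-100 seconds per mid-sized response is painful in an interactive editor loop, but largely irrelevant when tasks are batched and run in the background.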

In conclusion, while DeepSeek v3.1 is a meaningful upgrade that rekindles hope for future versions like v4, it is not yet ready for production use due to its speed and occasional language issues. The creator expresses nostalgic fondness for DeepSeek v3 but acknowledges that newer models like Qwen3 Coder and GPT-5 outperform it in several respects. The video encourages viewers to experiment with DeepSeek v3.1 themselves, especially in Claude Code setups, to appreciate its strengths and limitations. Overall, DeepSeek v3.1 represents a promising step forward for open-source AI coding models, with room for further refinement and community involvement.