DeepSeek R1 Local AI Server LLM Testing on Ollama

The video reviews the DeepSeek R1 local AI server model, highlighting its impressive 128k context window and performance in reasoning tasks, while also noting its shortcomings in handling simpler tasks and coding challenges. The host emphasizes the need for ongoing improvements and invites viewer feedback on their experiences with the model.

In the video, the host discusses the DeepSeek R1, a newly released model for local AI servers that has garnered attention for its impressive capabilities, particularly in reasoning tasks. The model, which offers a 128k-token context window, is said to rival hosted models like Claude while running entirely on local hardware. The host encourages viewers to subscribe to the channel and highlights the various machines and GPUs tested in previous videos, emphasizing the importance of having an always-on AI server for maximizing value.
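For readers who want to try something similar, here is a minimal sketch of how a reasoning prompt could be sent to a locally hosted DeepSeek R1 through Ollama's HTTP API, with an enlarged context window requested via the num_ctx option. The model tag, prompt, and context size below are illustrative assumptions, not details taken from the video.

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

payload = {
    "model": "deepseek-r1:70b",   # model tag is an assumption; use whichever size you pulled
    "prompt": "A farmer must cross a river with a wolf, a goat, and a cabbage. How?",
    "stream": False,              # return a single JSON object instead of a token stream
    "options": {
        "num_ctx": 131072,        # request the full 128k context window (assumed value)
    },
}

response = requests.post(OLLAMA_URL, json=payload, timeout=600)
response.raise_for_status()
print(response.json()["response"])
```

Requesting the full context window this way will noticeably increase VRAM and RAM usage, so smaller values may be more practical on modest hardware.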

The host provides insights into the setup process for running the DeepSeek R1 model, mentioning the use of Proxmox, LXC, and Docker. They also discuss the power requirements of the rig, noting an unexpected spike in power consumption that may necessitate adding a second power supply. The video aims to evaluate the model’s performance by running a series of reasoning tests and measuring tokens per second, while also addressing the importance of context window size for effective reasoning.
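As a rough illustration of how tokens per second can be derived from the timing metadata that Ollama's /api/generate endpoint returns (eval_count and eval_duration), a helper along these lines could be used; the model tag and prompt are placeholders rather than the host's actual test harness.

```python
import requests

def tokens_per_second(model: str, prompt: str, host: str = "http://localhost:11434") -> float:
    """Send one prompt and compute generation speed from Ollama's timing fields."""
    resp = requests.post(
        f"{host}/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=600,
    )
    resp.raise_for_status()
    data = resp.json()
    # eval_count = generated tokens; eval_duration = generation time in nanoseconds
    return data["eval_count"] / (data["eval_duration"] / 1e9)

if __name__ == "__main__":
    # Placeholder model tag; substitute whichever DeepSeek R1 size you pulled.
    speed = tokens_per_second("deepseek-r1:32b", "Explain the Monty Hall problem briefly.")
    print(f"{speed:.1f} tokens/s")
```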

As the testing begins, the host poses a variety of questions to the model, including coding challenges and ethical dilemmas. While some responses are satisfactory, others reveal shortcomings, such as the model’s failure to accurately rewrite code or reference external assets. The host expresses disappointment in the model’s performance on simpler tasks, suggesting that it may not yet be ready to replace other leading models like Llama 3.3 or QwQ.

Throughout the video, the host emphasizes the need for precision in the model’s responses, particularly when dealing with straightforward questions. They note that the model struggles with basic tasks, such as counting letters in a word or providing accurate mathematical calculations. This inconsistency raises concerns about the model’s reliability for practical applications, especially in conversational or home assistant scenarios.

In conclusion, the host reflects on the overall performance of the DeepSeek R1 model, describing it as “good but not great.” They highlight the importance of ongoing testing and updates to improve the model’s capabilities. The video serves as a candid exploration of the model’s strengths and weaknesses, encouraging viewers to share their thoughts and experiences in the comments. The host remains committed to providing unfiltered insights into AI developments, aiming to cut through the hype surrounding new technologies.