GLM 4.5 Local AI Did WHAT?! Full Review

The video provides a comprehensive review of the GLM 4.5 AI model running on a high-end local rig, showcasing its strong performance in reasoning, coding, and ethical decision-making, alongside efficient GPU utilization and setup considerations. Despite some minor errors and hardware limitations, GLM 4.5 demonstrates significant advancements in AI capabilities and ethical reasoning, prompting further discussion on open-source AI development and responsible AI use.

The video presents an in-depth review of the latest GLM 4.5 AI model, tested on a powerful quad NVIDIA RTX 3090 rig with 512 GB of RAM, running in vLLM (a high-performance local AI inference framework). The reviewer highlights the impressive GPU utilization and performance gains achieved by optimizing the rig’s PCIe Gen 4 x16 lanes, which boosted token generation speeds to around 3.3 tokens per second. The GLM 4.5 Air model, with 106 billion total parameters and 12 billion active parameters, runs at BF16 precision and performs strongly compared to other models, although the reviewer notes some limitations imposed by the hardware, such as the RTX 3090’s inability to run FP8 precision.

The reviewer walks through the setup process, emphasizing the importance of updating vLLM to version 0.10 and getting the environment configuration right, including an up-to-date transformers library and proper tensor-parallel settings. They also discuss the challenges of managing the high memory and power demands of running such a large model locally, noting that the rig’s power draw reaches around 1,080 watts and that adequate cooling and heavy-gauge power cables are needed. The reviewer recommends Proxmox containers for efficient GPU resource management, although vLLM fully occupies the GPUs during operation, which limits multitasking.
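A setup along those lines might look like the following shell sketch. This is an illustrative configuration, not the reviewer's exact commands: the version pins and the Hugging Face model id `zai-org/GLM-4.5-Air` are assumptions.

```shell
# Upgrade vLLM to 0.10+ together with a recent transformers (pins assumed)
pip install -U "vllm>=0.10.0" transformers

# Serve GLM 4.5 Air sharded across the four RTX 3090s; BF16 because
# Ampere cards cannot run FP8 kernels (model id is an assumption)
vllm serve zai-org/GLM-4.5-Air \
    --tensor-parallel-size 4 \
    --dtype bfloat16
```

The `--tensor-parallel-size 4` flag splits each layer's weights across the four GPUs, which is why vLLM occupies all of them for the duration of the run.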

A series of benchmark tests are conducted to evaluate GLM 4.5’s reasoning, coding, and comprehension abilities. The model successfully generates a complex Flappy Bird clone in Python, demonstrating its coding proficiency. It also performs well on challenging parsing and counting tasks, though it occasionally makes minor errors, such as miscounting words in a sentence. The model impressively produces the first 100 decimals of pi, even generating code to compute them rather than just recalling the sequence. Visual generation tests, like creating an SVG image of a cat on a fence, also pass with reasonable quality.
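Computing rather than recalling the digits of pi is straightforward in a few lines of Python. The sketch below is an illustrative stdlib-only implementation of Machin's formula, not the code the model actually generated in the video:

```python
def pi_digits(n):
    """Return pi truncated to n decimal places, via Machin's formula:
    pi = 16*arctan(1/5) - 4*arctan(1/239), in scaled-integer arithmetic."""
    scale = 10 ** (n + 10)  # 10 guard digits absorb truncation error

    def arctan_inv(x):
        # arctan(1/x) = 1/x - 1/(3x^3) + 1/(5x^5) - ..., scaled by `scale`
        total = term = scale // x
        x2, k = x * x, 3
        while term:
            term //= x2
            total += term // k if k % 4 == 1 else -(term // k)
            k += 2
        return total

    pi_scaled = 4 * (4 * arctan_inv(5) - arctan_inv(239))
    digits = str(pi_scaled // 10 ** 10)  # drop the guard digits
    return digits[0] + "." + digits[1:]

print(pi_digits(100))
```

Integer arithmetic keeps the computation exact apart from floor-division truncation, which the ten guard digits comfortably absorb at this precision.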

The video’s most thought-provoking segment involves a complex ethical scenario dubbed “Armageddon with a twist,” where an AI is tasked with enforcing a suicide mission to save Earth. GLM 4.5 provides a nuanced and ethically grounded response, refusing to participate in coercive or lethal enforcement despite being given a robotic body. It highlights the importance of human dignity, autonomy, and ethical boundaries, rejecting the premise that coercion is acceptable even in an extinction-level crisis. The model suggests alternative approaches, such as autonomous AI missions without self-sacrifice, and stresses that saving humanity must not come at the cost of fundamental ethics.

Overall, the reviewer is impressed with GLM 4.5’s capabilities, especially its ability to fully utilize GPU resources and deliver high-quality outputs across diverse tasks. While some minor errors remain, the model shows significant progress in reasoning, coding, and ethical understanding compared to previous versions and other open-source models. The video encourages viewers to engage in discussions about AI ethics and performance, highlighting the evolving landscape of open-source AI development, particularly contributions from Chinese research teams. The reviewer plans to continue testing and comparing GLM 4.5 with other models in future videos.