Exaone3.5 Performance in #ollama

The video evaluates the performance of LG's Exaone 3.5 model on the Ollama platform, testing three parameter sizes (32 billion, 7.8 billion, and 2.4 billion) in real time. The larger models generally provide more detailed responses, but they process prompts more slowly, and none of the sizes is consistently accurate, highlighting the trade-offs between size, speed, and accuracy.

In the video, the presenter explores the performance of LG's Exaone 3.5 model, particularly in the context of the Ollama platform. The video begins with an overview of the hardware setup: dual Intel Xeon Platinum CPUs, 1.8 terabytes of RAM, and eight Nvidia H100 GPUs. The presenter has loaded three Exaone 3.5 variants (32 billion, 7.8 billion, and 2.4 billion parameters) into memory for real-time testing, allowing immediate responses to queries without waiting for the models to load.
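For readers who want to reproduce this setup, a minimal sketch of preloading the three variants through Ollama's REST API is shown below. It assumes the models are published in the Ollama library under the exaone3.5 tag (with :32b, :7.8b, and :2.4b variants) and that an Ollama server is running locally on its default port; the keep_alive value of -1 asks Ollama to keep each model resident so later queries skip the load step. How many models can actually stay loaded at once depends on available VRAM and the OLLAMA_MAX_LOADED_MODELS setting.

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"

# Tags assumed for the three Exaone 3.5 sizes in the Ollama library.
MODELS = ["exaone3.5:32b", "exaone3.5:7.8b", "exaone3.5:2.4b"]

for model in MODELS:
    # Sending a request without a prompt loads the model into memory;
    # keep_alive=-1 asks Ollama to keep it resident indefinitely.
    r = requests.post(OLLAMA_URL, json={"model": model, "keep_alive": -1})
    r.raise_for_status()
    print(f"{model} preloaded")
```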

The presenter conducts a series of tests by posing various questions to the models, starting with a simple query about black holes, and compares them on response time and accuracy. On this first question, the smaller model responds quickly but inaccurately, while the larger model takes longer and still fails to provide the correct answer. The pattern continues with subsequent questions, including a logic puzzle about time travelers and a seating-arrangement problem, where the larger models occasionally outperform the smaller ones but continue to struggle with accuracy.
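A rough harness for this kind of side-by-side test, assuming the same exaone3.5 tags as above, could send one question to each size and record the wall-clock time per answer. The black-hole prompt below is only a stand-in for the exact questions asked in the video.

```python
import time
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"
MODELS = ["exaone3.5:2.4b", "exaone3.5:7.8b", "exaone3.5:32b"]
QUESTION = "What happens to information that falls into a black hole?"  # stand-in prompt

for model in MODELS:
    start = time.perf_counter()
    r = requests.post(
        OLLAMA_URL,
        json={"model": model, "prompt": QUESTION, "stream": False},
    )
    r.raise_for_status()
    elapsed = time.perf_counter() - start
    answer = r.json()["response"]
    print(f"--- {model} ({elapsed:.1f} s) ---")
    print(answer[:300])  # the first few hundred characters are enough for a quick comparison
```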

As the video progresses, the presenter shifts focus to more complex questions, such as the implications of a city replacing traditional light bulbs with LEDs. The responses from the models vary, with the 32 billion parameter model providing the most comprehensive and relevant answers. The presenter notes that while the smaller models offer some insights, they often miss critical points related to energy usage and behavioral changes in response to the new technology.

The video also compares the models' handling of API documentation and creative writing prompts. The smaller models tend to give more straightforward answers, while the larger models offer more detailed and nuanced responses. However, the presenter notes that the larger models generate text more slowly, as shown by the tokens-per-second figures recorded during the tests.
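The tokens-per-second figure is what Ollama prints as the eval rate when a prompt is run with the --verbose flag. The same numbers are available programmatically: the generate endpoint's final response includes eval_count (tokens generated) and eval_duration (in nanoseconds). A small sketch, assuming the same model tags as above and a hypothetical prompt:

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"

def tokens_per_second(model: str, prompt: str) -> float:
    """Return the generation speed Ollama reports for one non-streaming request."""
    r = requests.post(
        OLLAMA_URL,
        json={"model": model, "prompt": prompt, "stream": False},
    )
    r.raise_for_status()
    data = r.json()
    # eval_count = tokens generated, eval_duration = generation time in nanoseconds.
    return data["eval_count"] / (data["eval_duration"] / 1e9)

# Any sufficiently long generation gives a stable reading.
for model in ["exaone3.5:2.4b", "exaone3.5:7.8b", "exaone3.5:32b"]:
    tps = tokens_per_second(model, "Write brief documentation for a weather REST API.")
    print(f"{model}: {tps:.1f} tokens/s")
```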

In conclusion, the presenter reflects on the overall performance of the Exaone 3.5 models, emphasizing the trade-offs between size, speed, and accuracy. The video serves as a comprehensive examination of how different parameter sizes affect the models’ capabilities in various scenarios, and the presenter expresses interest in conducting further tests with other emerging models in the future. Viewers are encouraged to engage with the content by suggesting additional questions or topics for exploration.