Microsoft's Phi 3.5 - The latest SLMs

artesia · 21 August 2024 13:01

The video discusses Microsoft’s latest Phi 3.5 models, highlighting three new additions that enhance performance, particularly in multimodal tasks and non-European languages, while maintaining compact sizes for local deployment. It emphasizes the strengths of the Phi 3.5 Mini Instruct, the mixture of experts model, and the vision-focused model, showcasing their benchmarks and potential applications for developers and researchers.

artesia · 21 August 2024 13:21

In the latest update from Microsoft, the Phi 3.5 models have been enhanced with three new additions, expanding on the already popular Phi 3 series. These models are particularly appreciated for their compact size and ability to run locally, making them accessible for various applications. The video discusses the new models, their improvements, and the benchmarks that highlight their performance compared to previous versions and other models in the market.

The first new model introduced is the Phi 3.5 Mini Instruct, which features improved instruct tuning rather than being a completely new base model. This model, with 3.8 billion parameters, shows significant performance improvements in benchmarks, especially for multimodal tasks and non-European languages like Arabic and Chinese. The model’s ability to handle long contexts, up to 128K tokens, is also noteworthy, as it competes well against larger models like Llama 3.1, despite being smaller in size.

The second model is a mixture of experts version of Phi 3.5, which is larger and trained on nearly 5 trillion tokens. This model demonstrates impressive benchmarks, performing comparably to proprietary models like Gemini Flash and GPT-4o Mini. Its open weights allow for fine-tuning and private use, making it a versatile option for developers. The model also supports a 128K context window, similar to the Mini Instruct, and is designed for efficient local deployment.

The third model focuses on vision capabilities, built on a 4.2 billion parameter architecture and fine-tuned on 500 billion tokens. This model aims to enhance the post-training processes and is positioned as a precursor to future Phi 4 models. While it shows strong performance against other open models, it still trails behind some proprietary options. However, its open-source nature allows for extensive customization and private use, appealing to a wide range of users.

The video concludes with a comparison of the new models through various tests, highlighting that the Phi 3.5 Mini Instruct often outperforms the mixture of experts model in certain tasks. The Mini Instruct is noted for its speed and efficiency, making it a preferred choice for many applications. The presenter encourages viewers to explore these models further, emphasizing their potential for local data processing and structured data generation. Overall, the Phi 3.5 models represent a significant advancement in Microsoft’s AI offerings, providing powerful tools for developers and researchers alike.

Phi 3.5 MoE Colab: Google Colab
Phi 3.5 Mini Colab: Google Colab
Phi 3.1 Mini Colab: Google Colab

Interested in building LLM Agents? Fill out the form below
Building LLM Agents Form: Building LLM Agents

Github:
GitHub - samwit/langchain-tutorials: A set of LangChain Tutorials from my youtube channel (updated)
GitHub - samwit/llm-tutorials: A set of LLM Tutorials from my youtube channel

Time Stamps:
00:00 Intro
00:10 Phi 3 GitHub
00:36 Phi 3.5 Mini Instruct
04:59 Phi 3.5 MoE Instruct
07:03 Phi 3.5 Vision Instruct
08:44 Phi 3 Cookbook
09:05 Code Demo