Phi-3: Microsoft's TINY Competitor to Llama 3 Might Beat Mixtral 8x7B?

Microsoft has introduced two new models, Phi-3 mini and Phi-3 medium, as competitors to Llama 3 and Mixtral 8x7B, with impressive benchmark results from an architecture optimized for strong performance at minimal parameter counts. These models could be game-changers by delivering high performance on consumer hardware at lower cost, though concerns remain about their handling of factual information, and human evaluations are still needed to establish their true capabilities.

Microsoft released two new models, Phi-3 mini and Phi-3 medium, as competitors to Llama 3 and Mixtral 8x7B. Both post impressive benchmark scores, with Phi-3 mini rivaling Mixtral 8x7B and outperforming Llama 3 Instruct on certain tasks. Microsoft's strategy focuses on building small yet capable models, squeezing the most performance out of the fewest parameters.

The highlight of the release is its potential to be a game-changer for those who cannot afford to run models like Llama 3 on expensive GPUs. Phi-3 mini has 3.8 billion parameters and Phi-3 medium has 14 billion, demonstrating Microsoft's ability to achieve strong performance with far smaller models. Both are tuned for safety and efficiency, with quantized versions capable of running on mobile devices within modest RAM budgets.
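To make the "runs on consumer hardware" claim concrete, here is a back-of-the-envelope sketch of the memory needed just to hold the weights at different precisions. The `weight_memory_gb` helper is illustrative, not part of any official toolkit, and the figures cover weights only; activations, KV cache, and runtime overhead add more on top.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight footprint in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

# Compare fp16 vs. 4-bit quantization for both Phi-3 sizes.
for name, params in [("Phi-3 mini", 3.8e9), ("Phi-3 medium", 14e9)]:
    for bits in (16, 4):
        gb = weight_memory_gb(params, bits)
        print(f"{name} @ {bits}-bit: ~{gb:.1f} GB")
```

At 4-bit precision, Phi-3 mini's weights come to roughly 1.9 GB, which is why a quantized build is plausible on a recent phone, while Phi-3 medium at 16-bit (~28 GB) still exceeds most consumer GPUs.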

Microsoft's approach involves training on heavily filtered web data plus synthetic data, with the goal of maximizing token quality rather than sheer volume. While the optimization strategy resembles Meta's, the models have practical advantages, such as using the same block structure as Llama 2, which lets them plug into existing tooling. They are designed to run on consumer hardware, making them accessible and practical.

Concerns remain about the models' performance on factual recall, raising questions about their real-world usability compared to Llama 3 Instruct. The reliance on benchmarks for evaluation may overstate their true capabilities, underscoring the need for human evaluations. Microsoft's previous model, Phi-2, was criticized for overpromising, which fuels skepticism about the new models' potential.

Overall, the release of Phi-3 mini and Phi-3 medium signals Microsoft's continued push to stay competitive in the LLM field. The models show promise on both performance and efficiency, and their smaller parameter counts open them to a much wider range of users. As the industry evolves, their true impact and capabilities will be determined over time through practical applications and thorough evaluation.