iPhone Edge AI: What Apple didn't tell you about their new Foundation LLM (WWDC 2024)

The video discusses Apple’s development of a 3-billion-parameter large language model (LLM) designed for deployment on edge devices, particularly iPhones, delivering text and image understanding at roughly 0.6 milliseconds per prompt token. Apple’s approach, which includes grouped-query attention along with activation and embedding quantization, outperforms existing state-of-the-art models of comparable size and highlights the company’s commitment to high-performance on-device AI.

In the video, the speaker discusses Apple’s work on developing bespoke large language models (LLMs) for deployment on edge devices, particularly iPhones. Apple’s new foundation model family spans an on-device model and a server model; the on-device model has roughly 3 billion parameters yet rivals the performance of existing 7- and 8-billion-parameter models. It handles text and image understanding and runs at a remarkable 0.6 milliseconds per prompt token on the latest iPhone. The model uses grouped-query attention together with activation and embedding quantization, techniques not commonly seen in models targeting edge devices.
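To make the memory argument concrete, here is a minimal sketch of grouped-query attention, in which several query heads share a single key/value head so the KV cache that must fit in device memory shrinks. The head counts and dimensions are illustrative assumptions, not Apple’s published configuration, and a causal mask is omitted for brevity.

```python
# Minimal grouped-query attention (GQA) sketch; all shapes are illustrative.
import numpy as np

def grouped_query_attention(x, wq, wk, wv, n_q_heads=8, n_kv_heads=2):
    """Single-sequence GQA: several query heads share one key/value head,
    shrinking the KV projections and cache relative to full multi-head attention."""
    seq_len, d_model = x.shape
    head_dim = d_model // n_q_heads
    group = n_q_heads // n_kv_heads          # query heads per shared KV head

    q = (x @ wq).reshape(seq_len, n_q_heads, head_dim)
    k = (x @ wk).reshape(seq_len, n_kv_heads, head_dim)
    v = (x @ wv).reshape(seq_len, n_kv_heads, head_dim)

    outputs = []
    for h in range(n_q_heads):
        kv = h // group                      # which shared KV head this query head uses
        scores = q[:, h] @ k[:, kv].T / np.sqrt(head_dim)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        outputs.append(weights @ v[:, kv])
    return np.concatenate(outputs, axis=-1)  # (seq_len, d_model)

rng = np.random.default_rng(0)
d = 64
x = rng.standard_normal((10, d))
wq = rng.standard_normal((d, d))
wk = rng.standard_normal((d, d // 4))        # KV projections are 4x smaller here
wv = rng.standard_normal((d, d // 4))
print(grouped_query_attention(x, wq, wk, wv).shape)  # (10, 64)
```

With 8 query heads sharing 2 key/value heads, the KV cache is a quarter of the multi-head-attention size, which matters most on a memory-constrained phone.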

Apple’s approach also includes adapters that are dynamically loaded and cached on top of the base model, specializing it for different languages or applications, which keeps it both performant and versatile. Their model outperforms previous state-of-the-art models like GPT-3.5 Turbo and GPT-4 Turbo in Apple’s evaluations, particularly excelling in on-device performance. By leveraging synthetic data and ablation studies, Apple’s model achieves impressive benchmarks, with the server variant matching larger models like GPT-4 Turbo. The speaker highlights Apple’s product-first mindset in developing AI solutions, leading to better user satisfaction and performance.
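As a rough illustration of the adapter idea, the sketch below applies a small low-rank update on top of a frozen base weight and swaps adapters in and out of an in-memory cache per task. The registry class, task names, rank, and cache policy are illustrative assumptions rather than Apple’s actual implementation.

```python
# Illustrative sketch of dynamically loaded, cached low-rank adapters.
import numpy as np

RANK = 16  # low-rank adapter dimension (assumed for illustration)

class AdapterRegistry:
    """Keeps one frozen base weight and swaps small per-task adapters."""
    def __init__(self, base_weight):
        self.base = base_weight          # shared, frozen base model weight
        self.cache = {}                  # task name -> (A, B) low-rank factors

    def load(self, task, A, B):
        """Map an adapter's factors into the in-memory cache."""
        self.cache[task] = (A, B)

    def effective_weight(self, task):
        """Base weight plus the low-rank update A @ B for the requested task."""
        A, B = self.cache[task]
        return self.base + A @ B

rng = np.random.default_rng(0)
d_in, d_out = 64, 64
registry = AdapterRegistry(rng.standard_normal((d_in, d_out)))

# Two hypothetical adapters, e.g. one per feature or language.
registry.load("summarize", rng.standard_normal((d_in, RANK)) * 0.01,
              rng.standard_normal((RANK, d_out)) * 0.01)
registry.load("mail_reply", rng.standard_normal((d_in, RANK)) * 0.01,
              rng.standard_normal((RANK, d_out)) * 0.01)

x = rng.standard_normal((1, d_in))
print((x @ registry.effective_weight("summarize")).shape)  # (1, 64)
```

The point of the pattern is that only the small adapter factors need to be stored and swapped per task; the multi-gigabyte base model stays resident once.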

The video discusses Apple’s use of the AXLearn framework for pre-training, TPUs and GPUs for training and optimization, and post-training techniques such as rejection sampling. Apple’s benchmarks show that its 3-billion-parameter model with task-specific adapters outperforms existing models on common LLM tasks and human-preference evaluations. Apple’s focus on edge compute signals a shift in how tasks are processed, letting developers achieve more with less resource consumption. The speaker emphasizes the potential of Apple’s AI ecosystem and the impact of Apple silicon on local LLM advancements.
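To show what rejection sampling looks like as a post-training step, here is a minimal sketch that samples several candidate responses per prompt, scores them with a reward model, and keeps only the highest-scoring one as new fine-tuning data. The generator and reward model below are stand-in functions, not Apple’s pipeline.

```python
# Minimal rejection-sampling sketch for post-training data selection.
import random

def generate_candidates(prompt, n=4):
    """Stand-in for sampling n responses from the current policy model."""
    return [f"{prompt} -> draft response {i}" for i in range(n)]

def reward_model(prompt, response):
    """Stand-in scorer; in practice a learned model of human preference."""
    random.seed(hash((prompt, response)) % (2**32))
    return random.random()

def rejection_sample(prompts, n=4):
    """Keep the highest-reward response per prompt for the next fine-tuning round."""
    kept = []
    for prompt in prompts:
        candidates = generate_candidates(prompt, n)
        best = max(candidates, key=lambda r: reward_model(prompt, r))
        kept.append((prompt, best))
    return kept

if __name__ == "__main__":
    for prompt, best in rejection_sample(["Summarize this email", "Plan a trip"]):
        print(prompt, "=>", best)
```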

Overall, the video presents Apple’s significant advances in developing on-device and server-based LLMs, with a particular focus on edge devices such as the iPhone. Apple’s model, with its distinctive features and strong performance metrics, reflects a commitment to pushing the boundaries of AI deployment. The speaker praises Apple’s product-first approach and anticipates continued progress in the local LLM space as Apple silicon becomes more widespread. The video concludes by inviting viewers to share their thoughts on Apple’s progress in AI and the implications of this technology for the future of edge computing.