Muse Spark: Meta's Latest Model Might Be A Flop

Meta’s new AI model, Muse Spark, released amid a crowd of other major announcements, underperforms leading models and ships with confusing features, including an inefficient multi-agent “contemplating mode” and vaguely defined health-related capabilities, raising questions about its practicality and Meta’s intentions. Critics also point to misleading benchmark presentation, privacy concerns, and a leadership contrast with a recently departed AI pioneer, suggesting that Muse Spark may fall short of expectations and that Meta’s approach to AI development lags behind industry leaders.

Meta released a new AI model called Muse Spark on April 8th, but the announcement was largely overshadowed by other developments, such as Project Glass Wing and Anthropic’s Mythos preview. Muse Spark advertises features like multimodal reasoning and agent orchestration, but some of the terminology in its description, such as “visual chain of thought” and “multi-agent orchestration,” is confusing and poorly explained. The model also emphasizes health-related capabilities, which is unusual given that Meta sells no dedicated health hardware, raising questions about the company’s intentions.

Benchmark comparisons show that Muse Spark underperforms other leading models such as GPT-4.6, Gemini 3.1, and Grok 4.2 across most tests: it wins on only three benchmarks and does not lead even in every health-related category. The presentation of these results was somewhat misleading, with all numbers rendered in blue to create an impression of across-the-board superiority. Despite this, the model scored reasonably on SWE-bench Pro, a benchmark the video attributes to Alexandr Wang, the head of Meta’s Superintelligence Lab and the person behind Muse Spark, which lends its performance some credibility.

A notable feature of Muse Spark is its “contemplating mode,” which involves orchestrating multiple agents to process queries. However, this mode often leads to inefficiencies, such as unnecessarily spinning up many agents for simple questions, resulting in longer response times and higher token usage. This issue highlights a broader challenge in AI development: balancing complex reasoning capabilities with practical usability and efficiency, especially for everyday users who may not distinguish between “contemplating” and “reasoning” modes.
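The inefficiency described above can be sketched in a few lines. This is a hypothetical illustration, not Muse Spark’s actual implementation: the agent-count heuristic, per-agent token overhead, and function names (`run_agent`, `contemplate`, `answer`) are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class AgentResult:
    answer: str
    tokens_used: int

def run_agent(query: str) -> AgentResult:
    # Stand-in for a model call: assume a fixed per-agent overhead
    # plus a cost proportional to query length.
    return AgentResult(answer=f"draft for: {query}", tokens_used=200 + 4 * len(query))

def contemplate(query: str, num_agents: int) -> tuple[str, int]:
    """Naive 'contemplating mode': fan out to several agents and take one draft."""
    results = [run_agent(query) for _ in range(num_agents)]
    total_tokens = sum(r.tokens_used for r in results)
    return results[0].answer, total_tokens

def answer(query: str, contemplating: bool) -> tuple[str, int]:
    # The usability problem: without any complexity check, even a trivial
    # question pays the full multi-agent cost in contemplating mode.
    return contemplate(query, num_agents=5 if contemplating else 1)

trivial = "What is 2 + 2?"
_, fast_tokens = answer(trivial, contemplating=False)
_, slow_tokens = answer(trivial, contemplating=True)
# Same trivial question, five times the token budget when "contemplating".
```

A practical orchestrator would gate the fan-out on an estimate of query difficulty; the point of the sketch is that, absent such a gate, multi-agent modes trade latency and tokens for no gain on simple prompts.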

The video also critiques a recent Wired article that tested Muse Spark’s health advice capabilities. The article’s author manipulated the model into providing an extreme, unhealthy weight loss plan and then criticized Meta for the response. The video points out that the model responded accurately to the prompt given and that such misuse of AI outputs is misleading. It also raises privacy concerns about Meta collecting sensitive health data without clear hardware integration or safeguards.

Finally, the video provides background on the leadership behind Muse Spark, contrasting Alexandr Wang’s relatively limited academic credentials with those of Yann LeCun, a pioneering AI researcher who recently left Meta. LeCun, best known for pioneering convolutional neural networks, is skeptical that the current AI paradigm can reach artificial general intelligence, a view the video creator shares. The video concludes that Muse Spark may not live up to expectations and that Meta’s approach to AI development may be lacking compared to other industry leaders.