Muse Spark Review: 262K Context & 1491 Arena Elo (2026)
Meta's first closed frontier model: 262K context, 89.5% GPQA Diamond, 1491 Chatbot Arena Elo. Free on meta.ai. API in private preview. Full benchmark breakdown.
Muse Spark is Meta Superintelligence Labs' first closed frontier model, released April 8, 2026, with a 262K-token context, 89.5% GPQA Diamond, and 1491 Elo on Chatbot Arena at rank 5 in May 2026. It is free to use on meta.ai and the Meta AI app; API pricing is unannounced with access in private preview only as of June 2026.
Muse Spark, released April 8, 2026 by Meta Superintelligence Labs, is Meta's first closed proprietary frontier model with a 262,144-token context window and 131K max output. It scores 89.5% on GPQA Diamond, 56.6% on SWE-bench Verified, and reached 1491 Elo on Chatbot Arena (rank 5, May 2026). Consumer access is free via meta.ai; a public API has not launched as of mid-2026.
Provider: Meta AI · Family: Muse
Context window: 262,144 tokens · Max output: 131,072
Input modalities: text, image, audio, tool-calls · Output: text, tool-calls
About Muse Spark
Muse Spark is the first model in Meta's Muse family, released on April 8, 2026, by Meta Superintelligence Labs (MSL). It marks Meta's most significant strategic pivot in AI since the Llama 1 release: for the first time, Meta shipped a frontier-class model with closed weights and no open download. Built from the ground up by MSL under Chief AI Officer Alexandr Wang and Chief Scientist Shengjia Zhao, Muse Spark was designed to compete directly with GPT-5 and Claude Opus 4.6 on frontier tasks while also powering the Meta AI consumer assistant across Facebook, Instagram, WhatsApp, and Messenger. Meta reports that Muse Spark matched Llama 4 Maverick's benchmark performance using over 10x less pretraining compute, which suggests a gain in training efficiency rather than pure parameter scaling.
Benchmark performance places Muse Spark in the second tier of the frontier. On GPQA Diamond, it scores 89.5%, one of the strongest results among models released through mid-2026. On SWE-bench Verified (software engineering), it scores 56.6%, ahead of some competitors in the 50-55% range but below the 70%+ scores achieved by the top agentic coding models. MMLU-Pro sits at 78.2%, and the Artificial Analysis Intelligence Index puts Muse Spark at 52, fourth among the models tracked in mid-2026 behind Gemini 3.1 Pro (57), GPT-5.4 (57), and Claude Opus 4.6 (53). On Chatbot Arena, human preference voting placed Muse Spark at #5 with 1491 Elo as of May 2026, up from an initial #10 at 1441 Elo in April, indicating strong performance in head-to-head human evaluations once vote counts accumulated.
The context window is 262,144 tokens, with a maximum output limit of 131,072 tokens per response. These figures make Muse Spark competitive with other long-context models such as Claude's 200K window and Gemini's 1M offering at the pro tier. Meta has not published long-context recall accuracy above 100K tokens, and independent needle-in-haystack results remain unavailable as of mid-2026. The large output cap (131K) is notable: most frontier models cap output at 32K-64K, and the higher limit benefits long-form generation tasks such as drafting detailed reports or generating extended code.
Muse Spark is natively multimodal. Unlike prior-generation models that bolt a vision encoder onto a language backbone, Muse Spark processes text, images, and voice in a unified architecture, with visual information integrated at the architectural level and processed synchronously. The model supports visual chain-of-thought, enabling it to work through image-based problems step by step rather than producing a flat one-pass answer. Tool use and multi-agent orchestration are native capabilities, not add-ons. Three reasoning modes are offered: Instant (fast single-pass response), Thinking (explicit chain-of-thought), and Contemplating (the headline feature, which orchestrates multiple reasoning agents running in parallel for the hardest tasks).
Pricing for the Muse Spark API has not been announced. As of mid-2026, API access is in private preview for select Meta partners only. Meta's consumer meta.ai site and mobile app provide free access for general users. Analysts estimate that when a public API launches, pricing will likely fall in the range of $3-6 per million input tokens and $20-30 per million output tokens, based on the model's benchmark positioning relative to GPT-5.4 and Claude Opus 4.6. No AWS Bedrock, Google Vertex, or Azure deployment has been confirmed, in contrast with the Llama 4 family, which is broadly available on cloud provider marketplaces.
Access to Muse Spark is currently limited to the meta.ai web interface, the Meta AI mobile app, and a private API preview. Meta integrated Muse Spark into the Meta AI assistant, replacing the previous Llama-based backend. The assistant runs across Facebook, Instagram, WhatsApp, Messenger, and Ray-Ban AI smart glasses. Meta has not published SDK bindings specific to Muse Spark, though any future public API is expected to follow an OpenAI-compatible format based on Meta's prior developer tooling patterns.
Safety documentation for Muse Spark is more complete than the model's API access. Meta published a Safety and Preparedness Report covering catastrophic risk domains including chemical and biological threats, cybersecurity, and loss-of-control scenarios, as well as a separate Contemplating Mode Safety Report for the multi-agent reasoning feature. The evaluation process follows Meta's Advanced AI Scaling Framework, which defines tiered deployment thresholds analogous to Anthropic's RSP and OpenAI's Preparedness Framework. Muse Spark demonstrates strong refusal behavior across high-risk domains, enabled by pretraining data filtering, safety-focused post-training, and system-level guardrails. Specific HarmBench or jailbreak resistance rates have not been published. The training data cutoff is undisclosed, as are the parameter count and the full architecture specification.
Muse Spark is best for multimodal reasoning tasks where a single model must handle text, images, and voice in the same session; for users who already live in the Meta AI app ecosystem and want the highest-capability model available without an API; and for research teams who want access to the Contemplating Mode parallel reasoning architecture. It is not the right choice for teams that need a stable public API today, for enterprise buyers needing SOC 2 and zero-retention guarantees, or for developers who want to build production applications without depending on Meta's private preview program. For software engineering tasks specifically, models with higher SWE-bench scores, such as those in the 70-80% range, will outperform Muse Spark in autonomous coding loops.
Muse Spark represents Meta's entry into the proprietary frontier race, backed by a company spending up to $145 billion on AI infrastructure in 2026. Its architecture-level multimodality and Contemplating Mode are genuine differentiators, but the locked-down API status limits developer adoption. For users inside Meta's consumer ecosystem it is the best model Meta has shipped. For developers building applications, the Llama 4 family through third-party API providers remains the practical choice until Meta opens the Muse API publicly.
Pricing
API pricing is unannounced as of June 2026. Access is in private preview for select partners. Analyst estimates suggest $3-6 per 1M input tokens and $20-30 per 1M output tokens when public API launches.
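To make the estimated range concrete, here is a minimal sketch of a per-request cost calculator. The rates are the analyst estimates quoted above, not official Meta pricing, and the names `PriceRange`, `EST_INPUT`, `EST_OUTPUT`, and `estimate_cost` are hypothetical helpers introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PriceRange:
    """USD per 1M tokens, as a (low, high) analyst estimate."""
    low_per_million: float
    high_per_million: float


# Assumption: analyst estimates from the review, NOT published Meta rates.
EST_INPUT = PriceRange(3.0, 6.0)
EST_OUTPUT = PriceRange(20.0, 30.0)


def estimate_cost(input_tokens: int, output_tokens: int) -> tuple[float, float]:
    """Return the (low, high) estimated USD cost of a single request."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    low = (input_tokens * EST_INPUT.low_per_million
           + output_tokens * EST_OUTPUT.low_per_million) / 1_000_000
    high = (input_tokens * EST_INPUT.high_per_million
            + output_tokens * EST_OUTPUT.high_per_million) / 1_000_000
    return low, high


# Example: a 100K-token prompt with a 5K-token answer.
low, high = estimate_cost(100_000, 5_000)
print(f"estimated cost: ${low:.2f} to ${high:.2f}")
```

Because output tokens are estimated at several times the input rate, long generations would dominate the bill under these assumptions; the figures should be replaced with the real rate card if and when Meta publishes one.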
Key Features
- Contemplating Mode: Orchestrates multiple reasoning agents running in parallel, enabling more thorough analysis on complex tasks without proportionally higher latency.
- Native Multimodal Architecture: Text, image, and voice are processed in a unified architecture from the ground up, not as separate modules bolted onto a language backbone.
- 262K Token Context Window: Handles inputs of up to 262,144 tokens with a maximum output of 131,072 tokens per response — among the highest output caps of any frontier model.
- Visual Chain-of-Thought: Steps through image-based problems iteratively rather than generating a flat one-pass answer, enabling more accurate visual reasoning.
- Meta Ecosystem Integration: Powers Meta AI across Facebook, Instagram, WhatsApp, Messenger, and Ray-Ban AI glasses, reaching over 700 million monthly active users.
Pros
- 89.5% GPQA Diamond is one of the strongest published science and reasoning scores among models available through mid-2026.
- Contemplating Mode's parallel multi-agent architecture is a genuine differentiator: no other widely deployed frontier model orchestrates parallel agent reasoning at the inference level.
- Free consumer access via meta.ai provides the most capable Meta AI experience to date with no API key required.
Cons
- No public API as of mid-2026 — private preview only, with no confirmed launch date despite multiple reported delays.
- 56.6% SWE-bench Verified trails the top agentic coding models (70-80%) by roughly 13-23 points, limiting its usefulness in autonomous software engineering workflows.
- Parameter count, training cutoff, and architecture details are fully undisclosed, making independent capability assessment and safety auditing difficult.
Benchmarks
- MMLU-Pro: 78.2%
- Chatbot Arena (LMArena) Elo: 1491
- GPQA Diamond: 89.5%
- Chatbot Arena (LMArena) rank: 5
- SWE-bench Verified: 56.6%
- Artificial Analysis Intelligence Index: 52
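To give a feel for what a 50-point Arena gap means (Muse Spark moved from 1441 in April to 1491 in May), the sketch below applies the standard Elo expected-score formula. It is an interpretation aid only: the leaderboard's actual rating fit may differ from classic Elo, and `expected_score` is a hypothetical helper name.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# Ratings quoted in the review: May 2026 vs. the initial April 2026 figure.
may_rating, april_rating = 1491, 1441

p = expected_score(may_rating, april_rating)
print(f"expected win rate at a 50-point gap: {p:.3f}")  # about 0.571
```

Under this formula, a 50-point lead corresponds to winning roughly 57% of head-to-head votes, so positions separated by a few dozen points are close in practice.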
Frequently Asked Questions
What is Muse Spark and who built it?
Muse Spark is the first model in Meta's Muse family, released on April 8, 2026, by Meta Superintelligence Labs (MSL) — a research division established by Meta CEO Mark Zuckerberg in June 2025 and led by Chief AI Officer Alexandr Wang (founder of Scale AI) and Chief Scientist Shengjia Zhao (ex-OpenAI). It is a natively multimodal frontier model that processes text, images, and voice in a unified architecture, rather than attaching a vision module to a language backbone. Muse Spark is the first Meta AI model released without open weights — a deliberate break from the company's Llama open-weights strategy. It scores 89.5% on GPQA Diamond and 56.6% on SWE-bench Verified, and reached 1491 Elo on Chatbot Arena in May 2026 (rank 5). The model powers the Meta AI assistant across Facebook, Instagram, WhatsApp, Messenger, and Ray-Ban AI glasses, reaching over 700 million monthly active users. Meta reports that Muse Spark achieves equivalent performance to Llama 4 Maverick with more than 10x less pretraining compute, suggesting a significant leap in training efficiency.
How much does Muse Spark cost per 1M tokens?
Meta has not announced public API pricing for Muse Spark as of June 2026. API access is in private preview for select Meta partners, with no application portal or public waitlist. Consumer access via meta.ai and the Meta AI mobile app is free. Analysts estimate the public API, when it launches, will likely be priced at $3-6 per million input tokens and $20-30 per million output tokens, based on comparable benchmark positioning to GPT-5.4 and Claude Opus 4.6. For teams that need a stable cost model today, OpenAI's GPT-5.4 and Anthropic's Claude Opus 4.6 both have published pricing at similar capability tiers. If cost is a primary driver and API access is needed now, Llama 4 Maverick through Together AI or Fireworks AI may offer the best Meta-architecture option at a fraction of what Muse Spark is estimated to cost. Monitor ai.meta.com/blog/ for any official pricing announcements.
What is Muse Spark's context window and max output?
Muse Spark has a context window of 262,144 tokens (approximately 262K), which positions it between Claude's 200K context and Gemini's 1M extended context. The maximum output per response is 131,072 tokens, 2-4x higher than most frontier models, which typically cap output at 32K-64K. This high output cap makes Muse Spark particularly useful for long-form generation tasks such as extended reports, large code files, or multi-section analysis. Meta has not published needle-in-haystack or long-context recall benchmarks showing how well the model performs above 100K input tokens, so independent verification of long-context quality is not available as of mid-2026. For context window comparison: Gemini 3.1 Pro offers 1M tokens, Claude Opus 4.6 offers 200K, and GPT-5.4 offers 128K as its standard context size. If your workload involves documents above 100K tokens and you need verified long-context recall accuracy, Claude Opus 4.6 is currently the better-documented option, since Anthropic has published internal needle-in-haystack data.
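The two limits above can be turned into a simple pre-flight check. This is a sketch under a stated assumption: Meta has not said whether generated tokens count against the 262,144-token window, so the `output_shares_window` flag (a hypothetical parameter, like the function names) lets you model either case.

```python
CONTEXT_WINDOW = 262_144  # tokens, from the published spec
MAX_OUTPUT = 131_072      # tokens per response, from the published spec


def max_output_for(prompt_tokens: int, output_shares_window: bool = True) -> int:
    """Largest output budget available for a prompt of the given size.

    Assumption: if output_shares_window is True, prompt and output tokens
    together must fit inside CONTEXT_WINDOW. Meta has not confirmed this.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    if prompt_tokens > CONTEXT_WINDOW:
        return 0  # the prompt alone does not fit
    if output_shares_window:
        return min(MAX_OUTPUT, CONTEXT_WINDOW - prompt_tokens)
    return MAX_OUTPUT


def fits(prompt_tokens: int, requested_output: int,
         output_shares_window: bool = True) -> bool:
    """True if the request stays within both published limits."""
    if requested_output < 0:
        raise ValueError("requested_output must be non-negative")
    return requested_output <= max_output_for(prompt_tokens, output_shares_window)


print(max_output_for(100_000))  # 131072: the output cap is the binding limit
print(max_output_for(200_000))  # 62144: the shared window is the binding limit
print(fits(200_000, 100_000))   # False under the shared-window assumption
```

Token counts depend on the model's tokenizer, which is not public, so any count produced with another tokenizer should be treated as approximate and given some headroom.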
How does Muse Spark compare to GPT-5.4 and Claude Opus 4.6 on benchmarks?
Muse Spark scores 89.5% on GPQA Diamond, 56.6% on SWE-bench Verified, 78.2% on MMLU-Pro, and 1491 Elo on Chatbot Arena (rank 5, May 2026). The Artificial Analysis Intelligence Index places Muse Spark at 52, versus 57 for Gemini 3.1 Pro and GPT-5.4, and 53 for Claude Opus 4.6. On GPQA Diamond specifically, Muse Spark's 89.5% is a strong result competitive with or ahead of those models. On SWE-bench Verified (software engineering), Muse Spark's 56.6% trails the top performers significantly — models in the 70-80% range handle autonomous coding loops more reliably. Chatbot Arena human preference voting at 1491 Elo places Muse Spark fifth globally, which reflects genuine multimodal strength in human evaluations even where coding-specific benchmarks are weaker. The benchmark gap translates practically: Muse Spark is a strong choice for science reasoning and multimodal tasks, but not the right model for autonomous coding agents where higher SWE-bench scores directly correlate with task completion rates.
Is Muse Spark open source or proprietary?
Muse Spark is fully proprietary. Meta has not released the model weights and has not committed to any timeline for doing so. This is a significant departure from Meta's prior strategy: every Llama generation (Llama 1 through Llama 4) shipped with downloadable weights, and from Llama 2 onward the license permitted most commercial use, fine-tuning, and self-hosting (Llama 1 was limited to research use). Muse Spark marks Meta's first closed frontier model. The shift was announced without prior indication and reflects Meta's belief that frontier-level capability now requires keeping weights private to maintain competitive advantage. Consumer access is available for free via meta.ai and the Meta AI app. API access is in private preview for select partners, with no public endpoint as of mid-2026. Developers who need an open-weights Meta-architecture model today should use Llama 4 Maverick or Llama 4 Scout, available via Together AI, Fireworks AI, AWS Bedrock, and Google Vertex AI. Meta has stated it may release open weights in the future but has given no timeline.
What modalities does Muse Spark support?
Muse Spark supports text, image, and audio as inputs, with text and tool-calls as outputs. Unlike prior-generation multimodal models that attach a separate vision encoder to a language backbone, Muse Spark processes text, images, and voice in a unified architecture where visual information is integrated synchronously at the model level. Visual chain-of-thought is supported, meaning the model can step through image-based reasoning problems iteratively rather than producing a single-pass answer. Tool use and function calling are native capabilities. The model also supports multi-agent orchestration through Contemplating Mode, which runs multiple reasoning agents in parallel on complex tasks. Video input is not supported in the initial release. Audio output is not supported either: Muse Spark produces text responses, not synthesized speech, so applications that need spoken output must pair it with a separate text-to-speech model. Compared to GPT-5.4 (which supports audio-in and audio-out natively) and Gemini 3.1 Pro (which supports video-in), Muse Spark's modality coverage is strong on image and voice input but lacks audio output and video input.
Does Muse Spark train on user data?
Meta has not published a specific zero-retention or data handling policy for the Muse Spark API, as the API is in private preview and no public rate card or terms exist. Consumer usage via meta.ai is subject to Meta's standard privacy policy, which allows Meta to use interactions to improve products and services unless the user has opted out through Meta's data controls. API partners under the private preview program operate under separate terms that are not publicly disclosed. There is no confirmed SOC 2 Type II, ISO 27001, or HIPAA-eligible option for Muse Spark as of mid-2026. GDPR compliance applies for EU users under Meta's general data processing policies. For enterprise teams with strict data governance requirements — including zero-retention, SOC 2, or HIPAA — Anthropic's Claude API or OpenAI's enterprise tiers are better-documented options today. Check ai.meta.com for updates as the public API matures.
Who should use Muse Spark and who should avoid it?
Muse Spark is best for consumer users already in the Meta ecosystem who want the most capable Meta AI experience; researchers studying parallel multi-agent inference and Contemplating Mode architectures; developers accepted into Meta's private API preview building next-generation Meta-integrated applications; and teams benchmarking frontier model capabilities for future planning. It is not suited for teams that need a stable production API today — the private-preview-only status with no public launch date confirmed makes it impossible to build reliable production systems. Agentic software engineering teams should use models with SWE-bench scores above 70% rather than Muse Spark's 56.6%. Enterprise buyers needing SOC 2, HIPAA, or zero-retention guarantees should look at Anthropic or OpenAI enterprise tiers instead. For open-weights needs, Llama 4 Maverick (Meta's last open-weights frontier model) remains the better choice and is available through major API providers today. In short: Muse Spark is an impressive capability demonstration but not a practical API option until Meta opens access publicly.