Name: Muse Spark Review: 262K Context & 1491 Arena Elo (2026)
Brand: Meta AI
Availability: InStock

Question 1

What is Muse Spark and who built it?

Accepted Answer

Muse Spark is the first model in Meta's Muse family, released on April 8, 2026, by Meta Superintelligence Labs (MSL), a research division established by Meta CEO Mark Zuckerberg in June 2025 and led by Chief AI Officer Alexandr Wang (founder of Scale AI) and Chief Scientist Shengjia Zhao (ex-OpenAI). It is a natively multimodal frontier model that processes text, images, and voice in a unified architecture rather than attaching a vision module to a language backbone. Muse Spark is also Meta's first frontier model released without open weights, a deliberate break from the company's Llama open-weights strategy. It scores 89.5% on GPQA Diamond and 56.6% on SWE-bench Verified, and reached 1491 Elo on Chatbot Arena in May 2026 (rank 5). The model powers the Meta AI assistant across Facebook, Instagram, WhatsApp, Messenger, and Ray-Ban AI glasses, reaching over 700 million monthly active users. Meta reports that Muse Spark matches Llama 4 Maverick's performance with less than one-tenth of the pretraining compute, which, if accurate, would be a significant gain in training efficiency.

Question 2

How much does Muse Spark cost per 1M tokens?

Accepted Answer

Meta has not announced public API pricing for Muse Spark as of June 2026. API access is in private preview for select Meta partners, with no application portal or public waitlist. Consumer access via meta.ai and the Meta AI mobile app is free. Analysts estimate that the public API, when it launches, will cost $3-6 per million input tokens and $20-30 per million output tokens, based on its benchmark positioning relative to GPT-5.4 and Claude Opus 4.6. For teams that need a stable cost model today, OpenAI's GPT-5.4 and Anthropic's Claude Opus 4.6 both have published pricing at similar capability tiers. If cost is the primary driver and API access is needed now, Llama 4 Maverick through Together AI or Fireworks AI is likely the best Meta-architecture option, at a fraction of Muse Spark's estimated price. Monitor ai.meta.com/blog/ for official pricing announcements.
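To make the estimate above concrete, here is a minimal Python sketch that turns the analyst price range into a per-request cost range. The rates are the analyst estimates quoted above, not published Meta pricing, and the function name and the example token counts are hypothetical, chosen only for illustration.

```python
# Analyst ESTIMATES quoted in the answer above (USD per 1M tokens).
# These are not published Meta prices and may change at launch.
EST_INPUT_PER_M = (3.0, 6.0)     # (low, high) per 1M input tokens
EST_OUTPUT_PER_M = (20.0, 30.0)  # (low, high) per 1M output tokens


def estimate_cost_range(input_tokens: int, output_tokens: int) -> tuple[float, float]:
    """Return the (low, high) estimated USD cost for one request."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    low = (input_tokens * EST_INPUT_PER_M[0]
           + output_tokens * EST_OUTPUT_PER_M[0]) / 1_000_000
    high = (input_tokens * EST_INPUT_PER_M[1]
            + output_tokens * EST_OUTPUT_PER_M[1]) / 1_000_000
    return low, high


if __name__ == "__main__":
    # Hypothetical workload: a 50K-token prompt with a 4K-token reply.
    low, high = estimate_cost_range(input_tokens=50_000, output_tokens=4_000)
    print(f"Estimated cost per request: ${low:.2f} to ${high:.2f}")
```

Because output tokens are estimated at several times the input rate, long generations would dominate the bill under these assumptions even when the prompt is much larger than the reply.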

Question 3

What is Muse Spark's context window and max output?

Accepted Answer

Muse Spark has a context window of 262,144 tokens (roughly 262K), which places it between Claude's 200K context and Gemini's 1M extended context. The maximum output per response is 131,072 tokens, which is 2-4x higher than most frontier models, whose output caps typically fall between 32K and 64K. This high output cap makes Muse Spark particularly useful for long-form generation tasks such as extended reports, large code files, or multi-section analysis. Meta has not published needle-in-haystack or long-context recall benchmarks showing how well the model performs above 100K input tokens, so independent verification of long-context quality is not available as of mid-2026. For comparison: Gemini 3.1 Pro offers 1M tokens, Claude Opus 4.6 offers 200K, and GPT-5.4 offers 128K as its standard context size. If your workload involves documents above 100K tokens and you need verified long-context recall accuracy, Claude Opus 4.6's needle-in-haystack results are currently better documented.
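The two limits above interact when planning a request. The sketch below computes how much output budget remains for a given prompt size. It assumes, without confirmation from Meta, that input and output tokens share the 262,144-token window (the `shared_window` flag lets you turn that assumption off). The function name is hypothetical.

```python
CONTEXT_WINDOW = 262_144  # tokens (2**18), as stated in the answer above
MAX_OUTPUT = 131_072      # tokens (2**17), as stated in the answer above


def max_output_budget(input_tokens: int, shared_window: bool = True) -> int:
    """Return the largest output, in tokens, that can follow the prompt.

    shared_window=True ASSUMES prompt and reply count against the same
    context window; Meta has not documented this either way.
    """
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    if input_tokens > CONTEXT_WINDOW:
        return 0  # the prompt alone does not fit
    remaining = CONTEXT_WINDOW - input_tokens if shared_window else CONTEXT_WINDOW
    return min(MAX_OUTPUT, remaining)


if __name__ == "__main__":
    for prompt in (100_000, 200_000, 300_000):
        print(prompt, "->", max_output_budget(prompt))
```

Under the shared-window assumption, the full 131,072-token output cap is only reachable while the prompt stays at or below 131,072 tokens; beyond that, the remaining window becomes the binding limit.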

Question 4

How does Muse Spark compare to GPT-5.4 and Claude Opus 4.6 on benchmarks?

Accepted Answer

Muse Spark scores 89.5% on GPQA Diamond, 56.6% on SWE-bench Verified, 78.2% on MMLU-Pro, and 1491 Elo on Chatbot Arena (rank 5, May 2026). The Artificial Analysis Intelligence Index places Muse Spark at 52, versus 57 for Gemini 3.1 Pro and GPT-5.4, and 53 for Claude Opus 4.6. On GPQA Diamond, Muse Spark's 89.5% is a strong result, competitive with or ahead of those models. On SWE-bench Verified (software engineering), its 56.6% trails the top performers significantly; models in the 70-80% range handle autonomous coding loops more reliably. In Chatbot Arena's human preference voting, 1491 Elo places Muse Spark fifth globally, which suggests that human evaluators rate it highly even though its coding-specific benchmarks are weaker. In practice, Muse Spark is a strong choice for science reasoning and multimodal tasks, but not the right model for autonomous coding agents, where higher SWE-bench scores tend to track task completion rates.
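To read an Arena rating such as 1491 as a head-to-head preference rate, the standard Elo logistic formula can be applied. This is a sketch under the assumption that Arena scores behave like conventional Elo ratings on a 400-point scale, which is an approximation. The opponent rating in the example is hypothetical, since the answer above gives no ratings for other models.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of A against B (between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


if __name__ == "__main__":
    muse_spark = 1491          # rating quoted in the answer above
    hypothetical_rival = 1521  # made-up rating, for illustration only
    p = expected_score(muse_spark, hypothetical_rival)
    print(f"Expected preference rate vs. rival: {p:.3f}")
```

Under this formula, equal ratings give 0.5, and a gap of a few dozen points corresponds to a preference rate only slightly away from even, so small rank differences near the top of the leaderboard should not be over-interpreted.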

Question 5

Is Muse Spark open source or proprietary?

Accepted Answer

Muse Spark is fully proprietary. Meta has not released the model weights and has not committed to doing so. This is a significant departure from Meta's prior strategy: all Llama-family models (Llama 1 through Llama 4) were released as open weights that could be downloaded, fine-tuned, and self-hosted, with licenses permitting commercial use from Llama 2 onward. Muse Spark marks Meta's first closed frontier model. The shift was announced without prior indication and appears to reflect a belief at Meta that frontier-level capability now requires keeping weights private to maintain competitive advantage. Consumer access is available for free via meta.ai and the Meta AI app. API access is in private preview for select partners, with no public endpoint as of mid-2026. Developers who need an open-weights Meta-architecture model today should use Llama 4 Maverick or Llama 4 Scout, available via Together AI, Fireworks AI, AWS Bedrock, and Google Vertex AI. Meta has stated that it may release open weights in the future but has given no timeline.

Question 6

What modalities does Muse Spark support?

Accepted Answer

Muse Spark supports text, image, and audio as inputs, with text and tool calls as outputs. Unlike prior-generation multimodal models that attach a separate vision encoder to a language backbone, Muse Spark processes text, images, and voice in a unified architecture where visual information is integrated at the model level. Visual chain-of-thought is supported, meaning the model can step through image-based reasoning problems iteratively rather than producing a single-pass answer. Tool use and function calling are native capabilities. The model also supports multi-agent orchestration through Contemplating Mode, which runs multiple reasoning agents in parallel on complex tasks. Video input is not supported in the initial release. Audio output is not supported either: Muse Spark produces text responses, not synthesized speech, so applications that need audio output must pair it with a separate text-to-speech model. Compared to GPT-5.4 (which supports audio-in and audio-out natively) and Gemini 3.1 Pro (which supports video-in), Muse Spark's modality coverage is strong on vision and voice input but lacks both audio output and video input.

Question 7

Does Muse Spark train on user data?

Accepted Answer

Meta has not published a specific zero-retention or data handling policy for the Muse Spark API, as the API is in private preview and no public terms exist. Consumer usage via meta.ai is subject to Meta's standard privacy policy, which allows Meta to use interactions to improve products and services unless the user has opted out through Meta's data controls. API partners in the private preview program operate under separate terms that are not publicly disclosed. There is no confirmed SOC 2 Type II, ISO 27001, or HIPAA-eligible option for Muse Spark as of mid-2026. GDPR compliance applies for EU users under Meta's general data processing policies. For enterprise teams with strict data governance requirements (including zero-retention, SOC 2, or HIPAA), Anthropic's Claude API or OpenAI's enterprise tiers are better-documented options today. Check ai.meta.com for updates as the public API matures.

Question 8

Who should use Muse Spark and who should avoid it?

Accepted Answer

Muse Spark is best for consumer users already in the Meta ecosystem who want the most capable Meta AI experience; researchers studying parallel multi-agent inference and Contemplating Mode architectures; developers accepted into Meta's private API preview who are building Meta-integrated applications; and teams benchmarking frontier model capabilities for future planning. It is not suited for teams that need a stable production API today: with access limited to a private preview and no public launch date confirmed, building reliable production systems on it is impractical. Agentic software engineering teams should use models with SWE-bench scores above 70% rather than Muse Spark's 56.6%. Enterprise buyers needing SOC 2, HIPAA, or zero-retention guarantees should look at Anthropic or OpenAI enterprise tiers instead. For open-weights needs, Llama 4 Maverick (Meta's last open-weights frontier model) remains the better choice and is available through major API providers today. In short: Muse Spark is an impressive capability demonstration but not a practical API option until Meta opens access publicly.
