Name: Sakana Fugu: 95.5% GPQA, $5/M Tokens (2026 Review)
Brand: Sakana AI
Price: 5.00 USD
Availability: InStock

Question 1

What is Sakana Fugu and who built it?

Accepted Answer

Sakana Fugu is a multi-agent orchestration model released June 22, 2026 by Sakana AI, a Tokyo-based startup founded in July 2023. Unlike a standard single-model LLM, Fugu is itself a trained language model that coordinates a pool of frontier LLMs: it decides whether to answer a query directly or break it into subtasks, delegate to specialist models, verify outputs, and synthesize one response behind a single OpenAI-compatible API endpoint. The system is built on two ICLR 2026 papers: TRINITY (which assigns Thinker, Worker, and Verifier roles to models in the pool) and Conductor (which learns natural-language coordination strategies via reinforcement learning). Fugu comes in two variants: Fugu (optimized for lower latency on everyday tasks) and Fugu Ultra (tuned for maximum accuracy on hard multi-step problems, with a 1,000,000-token context window). Fugu Ultra scored 95.5% on GPQA Diamond and 73.7% on SWE-Bench Pro in Sakana's June 2026 technical report; both scores are vendor-reported and not yet independently verified. The model is built by the same team behind The AI Scientist (published in Nature) and AB-MCTS (NeurIPS 2025 Spotlight).

Question 2

How much does Sakana Fugu cost per 1M tokens?

Accepted Answer

Sakana Fugu Ultra is priced at $5.00 per million input tokens and $30.00 per million output tokens for contexts up to 272,000 tokens, with cached input at $0.50 per million tokens. For extended context requests (272K to 1,000,000 tokens), the rate rises to $10.00 input and $45.00 output per million tokens. The base Fugu tier does not have a flat rate: it charges at the rate of whichever underlying frontier model handles the request, and when multiple models are active, Sakana only charges for the most expensive one (no stacking). Subscription plans are available at $20/month (Standard), $100/month (Pro, roughly 10x the Standard usage allowance), and $200/month (Max, 20x Standard). A practical cost example: 10 Fugu Ultra requests averaging 5,000 input tokens and 2,000 output tokens each would cost roughly $0.31. A daily engineering loop of 1 million input and 200,000 output tokens via Fugu Ultra would run approximately $11. Compared to Claude Opus 4.8 ($5 input / $25 output per 1M tokens), Fugu Ultra's output rate ($30) is meaningfully higher.

Question 3

What is Sakana Fugu's context window and max output?

Accepted Answer

Sakana Fugu Ultra supports a maximum context window of 1,000,000 tokens. The standard pricing tier covers contexts up to 272,000 tokens ($5/$30 per 1M input/output). Requests exceeding 272K tokens shift to the extended context rate ($10/$45 per 1M). The maximum output is 128,000 tokens. Compared to competing models, the 1M context window matches Claude Opus 4.8's maximum context, while GPT-5.5 and Gemini 3.1 Pro also offer large context options. Sakana has not published a needle-in-haystack or long-context recall eval for Fugu Ultra as of June 2026, so long-context recall quality at the full 1M limit has not been independently characterized. The standard Fugu tier's context window is not separately specified in available documentation and likely matches the context of the underlying model chosen for each request.

Question 4

How does Sakana Fugu compare on benchmarks vs Claude Opus 4.8?

Accepted Answer

On GPQA Diamond (graduate-level reasoning), Fugu Ultra scored 95.5% vs Claude Opus 4.8's 92.0%, a 3.5-point gap in Fugu's favor. On SWE-Bench Pro (Sakana's engineering evaluation), Fugu Ultra scored 73.7% vs Opus 4.8's 69.2%, a 4.5-point gap. On Humanity's Last Exam, Fugu Ultra scored 50.0% vs Opus 4.8's 49.8%, essentially tied. On LiveCodeBench (competitive coding), Fugu scored 93.2% vs Gemini 3.1 Pro's 88.5%. However, all Fugu scores are reported in Sakana's own June 2026 technical report and have not been independently reproduced. SWE-Bench Pro and the standard SWE-bench Verified are different evaluations: Opus 4.8's published SWE-bench Verified score (88.6%) and Fugu's SWE-Bench Pro score (73.7%) are not directly comparable numbers. Claude Opus 4.8 also has independently verified benchmark results and lower latency, while Fugu Ultra is the faster-latency-variance choice only if multi-model coordination quality is more important than response-time predictability.

Question 5

Is Sakana Fugu open source or proprietary?

Accepted Answer

Sakana Fugu is proprietary and API-only. There are no open weights, no Hugging Face repository, and no option to self-host or run the model air-gapped. The Fugu API is served at https://api.sakana.ai/v1 and is OpenAI-compatible, meaning existing OpenAI SDK code works with only a base URL and API key change. There is no deployment of Fugu on AWS Bedrock, Google Vertex AI, or Microsoft Azure as of June 2026. Sakana AI has not published a commercial license document separately from its standard terms; the license is proprietary. For teams that need open-weights models for on-premise deployment, alternatives include Llama 4 (Meta AI, open-weights) or Mistral AI's open-source models. The underlying models in Fugu's pool are third-party proprietary models; Sakana does not publish which models are in the pool by default.

Question 6

What modalities does Sakana Fugu support?

Accepted Answer

Sakana Fugu supports text input and text output, plus tool-calls for function calling and structured outputs. The API is OpenAI-compatible and supports the standard messages format with system, user, and assistant roles. As of June 2026, Fugu does not accept image, audio, video, or PDF inputs natively; it is a text-in, text-out orchestration model. Structured output via tool-calls is supported, though Sakana has not published detailed documentation on supported tool schemas or parallel tool-call behavior. Code generation is a supported use case given the SWE-Bench Pro score, but there is no native code execution sandbox within the Fugu API itself. Compared to Claude Opus 4.8 (which supports vision and PDF inputs) or GPT-5.5 (which supports multimodal inputs), Fugu is text-only in its current form, which limits its applicability for workflows requiring image or document understanding.

Question 7

Does Sakana Fugu train on user data?

Accepted Answer

Sakana AI has not published a data retention policy, privacy policy, or trust center as of June 2026. The company has not publicly confirmed whether API inputs are used for training future models, what the default data retention period is, or whether an enterprise zero-retention option exists. The API is based in Japan (ap-northeast-1 region), which may be relevant for Japanese data sovereignty requirements, but no formal data residency commitment has been documented. Sakana Fugu has not confirmed SOC 2 Type II, ISO 27001, HIPAA eligibility, or GDPR compliance certifications. Enterprise customers with strict data governance requirements should contact Sakana AI directly to request a data processing agreement (DPA) or security documentation before using the API in production. This is a notable gap compared to Anthropic (SOC 2 Type II, HIPAA, GDPR), OpenAI (SOC 2 Type II, GDPR), and Google Vertex (all major certs).

Question 8

Who is Sakana Fugu best for and who should avoid it?

Accepted Answer

Sakana Fugu is best for enterprise teams in Japan needing export-control-resilient LLM routing (the swappable model pool can exclude US-restricted providers without API changes), research teams running complex multi-step reasoning where GPQA-class quality is needed, and engineering teams already using the OpenAI SDK who want a frontier-level coding alternative with a drop-in endpoint change. It is also suitable for batch workflows where response latency can exceed minutes. Teams that should avoid Fugu Ultra include real-time chat or voice applications (11-second to 4-minute latency variance is disqualifying), cost-sensitive high-volume API consumers ($30 output per 1M tokens is among the highest in the market), and enterprises requiring SOC 2, HIPAA, or GDPR certification before procurement (none confirmed as of June 2026). For latency-sensitive needs, Claude Opus 4.8 or GPT-5.5 are better choices. For cost-sensitive bulk inference, GPT-4o mini or Gemini Flash are significantly cheaper. For open-weights or self-hosting requirements, Llama 4 or Mistral Large are the right alternatives.

Sakana Fugu: 95.5% GPQA, $5/M Tokens (2026 Review)

About Sakana Fugu

Pricing

Key Features

Pros

Cons

Benchmarks

Frequently Asked Questions