Fugu by Sakana AI: Multi-Model Router (2026)

Sakana Fugu, launched June 2026 by Sakana AI, routes tasks across GPT-5.5, Gemini 3.1 Pro and Claude. Fugu Ultra hits 73.7% SWE-Bench Pro at $5/$30 per 1M.

Sakana Fugu, released June 22, 2026 by Sakana AI, routes each request across GPT-5.5, Gemini 3.1 Pro, and Claude using a learned Thinker/Worker/Verifier coordinator, with Fugu Ultra scoring 73.7% on SWE-Bench Pro and 95.5% on GPQA Diamond. It costs $5 input and $30 output per 1M tokens (double above 272K context) or $20 to $200 monthly.

Sakana Fugu, released June 22, 2026 by Tokyo's Sakana AI, is a multi-agent orchestration system exposed as a single OpenAI-compatible model. Fugu Ultra scores 73.7% on SWE-Bench Pro and 95.5% on GPQA Diamond by dynamically routing tasks across GPT-5.5, Gemini 3.1 Pro, and Claude. It costs $5 input and $30 output per 1M tokens, plus $20-$200 monthly subscription tiers.

Provider: Sakana AI · Family: Fugu

Input modalities: text, tool-calls · Output: text, tool-calls

About Fugu

Fugu is a multi-agent orchestration system built by Sakana AI, the Tokyo research lab founded in July 2023 by David Ha, Llion Jones, and Ren Ito. Sakana publicly launched Fugu on June 22, 2026, positioning it not as a single trained foundation model but as a coordinator that dynamically assembles a team of frontier LLMs (GPT-5.5, Gemini 3.1 Pro, and Anthropic's Claude family) to answer each incoming request, presenting the whole system to developers as one OpenAI-compatible endpoint. The launch landed during a two-and-a-half-week window (June 12 to June 30, 2026) when US export controls temporarily blocked access to Anthropic's Fable 5 and Mythos 5 models, and Sakana explicitly marketed Fugu as a hedge against exactly that kind of single-vendor disruption. On benchmarks Sakana publishes for the higher Fugu Ultra tier, it scores 73.7% on SWE-Bench Pro, 82.1% on TerminalBench 2.1, 93.2% on LiveCodeBench, 50.0% on Humanity's Last Exam, and 95.5% on GPQA Diamond, describing these as putting it shoulder to shoulder with Fable 5 and Mythos Preview across coding, reasoning, and scientific benchmarks. The base Fugu tier trails Ultra on most axes (59.0% SWE-Bench Pro, 80.2% TerminalBench 2.1, 92.9% LiveCodeBench, 47.2% Humanity's Last Exam, matching Ultra's 95.5% on GPQA Diamond). These are Sakana's own reported numbers; no independent third party had reproduced them at launch, since the models Fugu benchmarks itself against (Fable 5, Mythos Preview) were themselves under export restriction during the comparison window. Architecturally, Fugu is grounded in two Sakana papers accepted to ICLR 2026: TRINITY, a compact coordinator model trained with evolutionary optimization that assigns Thinker, Worker, and Verifier roles to pooled LLMs sequentially without merging their weights, and Conductor, a reinforcement-learning-trained model that designs the communication topology between agents and generates the targeted instructions each one receives. Sakana has not disclosed a parameter count for the coordinator itself, nor published a context window or maximum output token limit for either Fugu tier. The only length-related detail confirmed on Sakana's own pricing page is a 272,000-token threshold: input, output, and cached-input rates for Fugu Ultra roughly double once a request crosses that point. Modality support is text-only as far as Sakana documents it; there is no mention of image, audio, video, or PDF input on the product or release pages. Tool-calling and structured output are supported through the OpenAI-compatible Chat Completions and Responses endpoints, since Fugu is explicitly designed as a drop-in swap for existing OpenAI SDK clients with no migration required. Pricing runs on two parallel models. A monthly subscription covers both Fugu and Fugu Ultra: Standard at $20 for light daily use, Pro at $100 for 10 times the Standard allowance, and Max at $200 for 20 times the allowance. Separately, pay-as-you-go token pricing for Fugu Ultra (pinned build fugu-ultra-20260615) is $5 per 1M input tokens and $30 per 1M output tokens below the 272K threshold, rising to $10 and $45 above it; cached input runs $0.50 per 1M tokens ($1.00 above 272K). Base Fugu bills at the standard rate of whichever underlying model handled the request, with no extra fee stacked on top for routing. Deployment is direct API only, through console.sakana.ai, plus third-party access via OpenRouter (listed as sakana/fugu-ultra) and integrations with the Vercel AI Gateway, opencode, Creao, and Merge. There is no Bedrock, Vertex, or Azure listing. Sakana also ships a one-line CLI installer (curl -fsSL https://sakana.ai/fugu/install | bash) that wires Fugu into OpenAI's Codex coding agent under the command codex-fugu, covering Ubuntu and macOS with manual steps for Windows. Sakana has not published a system card, training data cutoff, or red-teaming disclosure for Fugu itself; safety posture is effectively inherited from whichever underlying model a given request is routed to (GPT-5.5, Gemini 3.1 Pro, or Claude), each of which publishes its own safety documentation independently. Weights are fully closed. Sakana states the routing and orchestration logic is 'not exposed by design,' and the GitHub repository (SakanaAI/fugu) contains only the CLI installer script, not any model weights. Independent reception was more skeptical than Sakana's own framing. A Hacker News discussion following the launch questioned whether Fugu was meaningfully different from OpenRouter's existing multi-model routing, and users reported the $200 Max plan yielding under three hours of sustained heavy use per week, calling the API noticeably slow and the output quality 'nowhere near' the Fable 5 and Mythos Preview models it benchmarks against. Coverage from TechCrunch, VentureBeat, and the-decoder framed Fugu as part of a wider wave of Asian AI labs launching Mythos-adjacent products during Anthropic's export-control gap, describing the reception as cautiously optimistic in the press while noting the harsher grassroots sentiment. Fugu is best suited to teams that want one endpoint hedged against any single frontier vendor being unavailable, rate-limited, or export-restricted, and that are comfortable with usage-capped subscription tiers or metered token billing instead of a guaranteed fixed-model SLA. It is a poor fit for vision or audio workloads, for teams that need a documented context window or system card before committing budget, or for cost-sensitive high-volume production use until the reported latency and usage-cap issues are independently resolved.

Pricing

Fugu Ultra (fugu-ultra-20260615) token pricing: $5/1M input and $30/1M output below 272K tokens of context, rising to $10/1M input and $45/1M output above it; cached input is $0.50/1M ($1.00/1M above 272K). Base Fugu bills at the standard rate of whichever underlying model handled the request, with no routing surcharge. Separately, monthly subscriptions cover both tiers: Standard $20, Pro $100 (10x usage), Max $200 (20x usage).

Key Features

Pros

Cons

Benchmarks

Frequently Asked Questions

What is Fugu and who built it?

Fugu is a multi-agent orchestration system built by Sakana AI, a Tokyo research lab founded in July 2023 by David Ha, Llion Jones, and Ren Ito. Sakana launched Fugu publicly on June 22, 2026, positioning it not as a single trained foundation model but as a coordinator that assembles a team of frontier LLMs, GPT-5.5, Gemini 3.1 Pro, and Anthropic's Claude family, for each incoming request. The system is grounded in two ICLR 2026 papers: TRINITY, a compact coordinator trained with evolutionary optimization that assigns Thinker, Worker, and Verifier roles without merging model weights, and Conductor, a reinforcement-learning-trained router that designs agent communication topology. Fugu Ultra is the higher-accuracy tier, scoring 73.7% on SWE-Bench Pro and 95.5% on GPQA Diamond by Sakana's own reporting. The launch landed during a two-and-a-half-week window when US export controls blocked Anthropic's Fable 5 and Mythos 5, and Sakana explicitly marketed Fugu as a hedge against that kind of single-vendor disruption. It is exposed to developers as a single OpenAI-compatible API endpoint.

How much does Fugu cost per 1M tokens?

Fugu Ultra (pinned build fugu-ultra-20260615) costs $5 per 1M input tokens and $30 per 1M output tokens for requests under roughly 272,000 tokens of context, rising to $10 input and $45 output above that threshold. Cached input costs $0.50 per 1M tokens below the threshold and $1.00 per 1M above it. Base Fugu bills at the standard rate of whichever underlying model handled a given request, with no extra routing fee stacked on top. Separately, Sakana offers monthly subscriptions covering both tiers: Standard at $20 for light daily use, Pro at $100 for 10 times the Standard allowance, and Max at $200 for 20 times the allowance. A 500K-token agentic coding session (above the 272K threshold) on Fugu Ultra's pay-as-you-go plan works out to roughly $9.50. Hacker News users reported the $200 Max subscription capping out under 3 hours of sustained heavy use per week, so pay-as-you-go may work out cheaper for bursty workloads.

What is Fugu's context window and max output?

Sakana has not published an exact context window or maximum output token limit for either Fugu tier as of the June 2026 launch. The only length-related figure confirmed on Sakana's own pricing page is a roughly 272,000-token threshold at which Fugu Ultra's input, output, and cached-input rates approximately double. There is no stated separate extended-context tier, and no published data on long-context recall accuracy above that point. Because Fugu routes requests to different underlying models (GPT-5.5, Gemini 3.1 Pro, Claude), effective context handling may vary by which model a given request lands on, though Sakana does not document this per-request behavior. Compared to vendors that publish exact figures, such as Anthropic's 1M-token Claude tiers, this is a notable transparency gap. Developers with strict long-document requirements should test their specific workload directly rather than relying on a published spec.

How does Fugu compare on benchmarks vs Anthropic's Fable 5 and Mythos Preview?

Sakana states that Fugu Ultra performs 'shoulder to shoulder' with Fable 5 and Mythos Preview across coding, reasoning, and scientific benchmarks, citing scores of 73.7% on SWE-Bench Pro, 82.1% on TerminalBench 2.1, 93.2% on LiveCodeBench, 50.0% on Humanity's Last Exam, and 95.5% on GPQA Diamond for Fugu Ultra. The base Fugu tier trails on most of these (59.0% SWE-Bench Pro, 80.2% TerminalBench 2.1, 92.9% LiveCodeBench, 47.2% Humanity's Last Exam) but matches Ultra's 95.5% on GPQA Diamond. These figures are Sakana's own reported numbers; no independent third party had reproduced them at launch, in part because Fable 5 and Mythos Preview were themselves under US export control during the exact comparison window Sakana used. Hacker News commenters who tested Fugu directly reported output quality 'nowhere near' Fable 5 and Mythos Preview, so the benchmark gap in Sakana's marketing did not match independent hands-on impressions at launch.

Is Fugu open source or proprietary?

Fugu is fully proprietary. Sakana states the routing and orchestration logic is 'not exposed by design,' and no model weights are published anywhere. The public GitHub repository (SakanaAI/fugu) contains only a shell-script CLI installer that wires Fugu into OpenAI's Codex coding agent under the command codex-fugu, not the model or coordinator itself. Access is API-only through console.sakana.ai, with third-party availability via OpenRouter (listed as sakana/fugu-ultra) and integrations with the Vercel AI Gateway, opencode, Creao, and Merge. There is no Hugging Face listing, no downloadable weights, and no offline or self-hosted deployment option. The exact license terms for the CLI installer script itself were not confirmed at research time.

What modalities does Fugu support?

As documented by Sakana, Fugu is text-only: there is no mention of image, audio, video, or PDF input on the product or release pages. Tool-calling and structured output are supported through the OpenAI-compatible Chat Completions and Responses endpoints, since Fugu is explicitly designed as a drop-in replacement for existing OpenAI SDK clients requiring no migration. There is no confirmation of parallel tool calls, computer use, or native JSON mode beyond what the OpenAI-compatible surface implies. Because Fugu routes each request to one of several pooled models internally, some multimodal capability could theoretically pass through depending on which underlying model handles a call, but Sakana does not document or guarantee this behavior. Teams needing confirmed vision or audio support should look to the underlying vendor APIs (GPT-5.5, Gemini 3.1 Pro, Claude) directly instead.

Does Fugu train on user data?

Sakana has not published a data retention policy, training-on-inputs disclosure, or compliance certifications (SOC 2, ISO 27001, HIPAA, GDPR) specifically for Fugu as of its June 2026 launch. No system card or safety documentation was found on Sakana's product or release pages. Because Fugu routes each request through third-party pooled models (GPT-5.5, Gemini 3.1 Pro, Claude), the effective data handling for a given request may depend on which underlying vendor processed it, though Sakana does not document this pass-through behavior explicitly. Teams with strict data governance requirements should treat Fugu's data handling as unconfirmed and verify directly with Sakana before sending sensitive data, rather than assuming parity with any single underlying vendor's policy.

Who is Fugu best for and who should avoid it?

Fugu suits engineering teams that want redundancy across GPT-5.5, Gemini 3.1 Pro, and Claude behind a single OpenAI-compatible endpoint, especially teams affected by vendor-specific disruptions like the June 2026 export-control gap that blocked Anthropic's Fable 5 and Mythos 5. It also fits developers building agentic coding workflows through the Codex CLI integration, where Fugu Ultra's 73.7% SWE-Bench Pro and 82.1% TerminalBench 2.1 scores are relevant. Teams should avoid Fugu for vision or audio workloads, since it is documented as text-only, and for cost-sensitive high-volume production, since Hacker News users reported the $200/month Max plan capping under 3 hours of heavy weekly use and called the API slow. Teams that need a guaranteed fixed underlying model, a documented context window, or a published system card before committing budget should also look elsewhere until Sakana discloses more, since Fugu's routing decisions are not exposed to the caller.

Visit Fugu Official Page