Claude Opus 4.6: 1M Context, 80.8% SWE-bench | hokai.io

Claude Opus 4.6 by Anthropic (Feb 2026): 1M-token context, 80.8% SWE-bench Verified, 91.3% GPQA Diamond. Priced at $5 input / $25 output per 1M tokens.

Claude Opus 4.6 is Anthropic's frontier model released February 5, 2026. It achieves 80.8% on SWE-bench Verified and 91.3% on GPQA Diamond, features a 1 million token context window with 128K max output, and costs $5 per million input tokens and $25 per million output tokens. It leads on agentic coding, BrowseComp, and Humanity's Last Exam.

Prompt caching reduces cached input cost to $0.50 per million tokens. Claude Opus 4.6 is available via the Anthropic API, AWS Bedrock, Google Vertex AI, and Azure.

Provider: Anthropic · Family: Claude 4

Context window: 1,000,000 tokens · Max output: 128,000

Input modalities: text, image, pdf, tool-calls · Output: text, tool-calls

About Claude Opus 4.6

Claude Opus 4.6 is Anthropic's fourth-generation flagship large language model, released on February 5, 2026. Built on a dense Transformer architecture with extended reasoning support, it succeeded Claude Opus 4.5 and sat at the top of Anthropic's lineup until the later Opus 4.7. The model was designed to push the limits of long-horizon agentic work: tasks spanning hours, dozens of tool calls, and context that would overflow any previous Opus model. Anthropic did not disclose a parameter count, but independent analysts estimate it in the high hundreds of billions. The API model ID is claude-opus-4-6.

On benchmarks, Claude Opus 4.6 achieves 80.8% on SWE-bench Verified, the leading real-world software engineering evaluation, scoring within 1 point of GPT-5.3-Codex and ahead of every other generally available model at release time. On GPQA Diamond, which tests doctoral-level reasoning in biology, chemistry, and physics, it scores 91.3%. It also leads the field on Humanity's Last Exam, Terminal-Bench 2.0 (the agentic coding evaluation), and BrowseComp, which measures the ability to locate hard-to-find information online. On GDPval-AA, a benchmark for economically valuable knowledge work in finance and legal domains, it outperforms OpenAI's GPT-5.2 by 144 Elo points. The HumanEval coding score sits at 95% and MATH at 93%.

The most significant architectural addition in Opus 4.6 is its 1 million token context window, initially launched in beta and made generally available at standard pricing on March 13, 2026. The max output is 128,000 tokens per request. Long-context recall is substantially stronger than in previous Opus models: on MRCR v2, the 1 million token retrieval benchmark, Opus 4.6 scores 76%, compared to Sonnet 4.5 at 18.5%. On the Message Batches API, max output can be raised to 300,000 tokens using the output-300k-2026-03-24 beta header.
The full 1 million token context is billed at the same per-token rate regardless of prompt length, with no long-context surcharge. Input modalities are text and image. Opus 4.6 handles charts, diagrams, screenshots, PDFs, and photographs with strong visual reasoning. Audio input is not supported. Output is text only, plus tool call results.

The model has full computer use capability, letting it read screens and take actions on a user's desktop. Native function calling supports parallel tool calls and structured JSON output. The model also gained two new API-level capabilities: adaptive thinking, where the model decides autonomously how much extended reasoning to apply based on task complexity, and effort controls, with four levels (low, medium, high, max) that let developers trade intelligence for speed and cost. A compaction API (beta) lets the model summarize its own context to sustain conversations indefinitely.

Standard pricing is $5 per million input tokens and $25 per million output tokens, matching Claude Opus 4.5's price while delivering substantially better performance. Prompt caching costs $0.50 per million cache-hit tokens (10% of the base input rate), with cache write costs of $6.25 per million tokens for a 5-minute cache and $10 per million tokens for a 1-hour cache. Batch API requests receive a 50% discount, bringing prices to $2.50 input and $12.50 output per million tokens. A fast mode (beta research preview) runs the same model at 2.5x higher output token throughput at 6x standard rates ($30/$150 per million tokens). US-only inference via the inference_geo parameter costs 1.1x standard rates. A worked cost example: a 100K-token document analysis costs $0.50 for input and around $1.25 for a 50K-token output, totaling about $1.75.

Opus 4.6 is available through four platforms. The direct Anthropic API (api.anthropic.com) offers the lowest latency and earliest access to new features.
AWS Bedrock is available with native IAM authentication, VPC connectivity, and CloudWatch logging; the Bedrock model ID is anthropic.claude-opus-4-6-v1. Google Vertex AI is available with three endpoint types: global, multi-region, and regional (the last two carry a 10% premium); the Vertex ID is claude-opus-4-6. Microsoft Foundry (Azure) supports the model with 1 million token context and Azure billing. SDKs are available in Python, TypeScript, Java, Go, and Ruby (Go and Ruby do not support Foundry). Anthropic has stated that, on Vertex AI, the model will not be retired before February 5, 2027.

Safety training for Opus 4.6 uses reinforcement learning from human feedback (RLHF) and reinforcement learning from AI feedback (RLAIF), along with Constitutional AI alignment. The training data cutoff is August 2025, with a reliable knowledge cutoff of May 2025. The system card, published simultaneously with the model and exceeding 200 pages, was the most detailed safety audit Anthropic had published to that point. Independent red teaming was conducted by METR (which reported a 50%-time-horizon of 14 hours 30 minutes on long-horizon task evaluations), Apollo Research, and the UK AISI. The model is deployed under ASL-3 safeguards. Anthropic noted that Opus 4.6 is in a gray zone near the ASL-4 threshold for autonomous AI capabilities. On ARC-AGI-2, it scores 69.17% with 120K thinking tokens, a state-of-the-art result at release.

Teams choosing Opus 4.6 over alternatives should weigh several factors. For agentic coding, it outperforms Gemini 2.5 Pro and GPT-5.2 on SWE-bench Verified at launch and on sustained multi-hour tasks. For long-document work above 500K tokens, it is one of only a few models with a verified 1 million token context at standard pricing. For professional knowledge work (legal, finance), the GDPval-AA lead of 144 Elo points over GPT-5.2 is meaningful.
However, for real-time voice applications, Opus 4.6 has no native audio I/O and a time-to-first-token of roughly 1.6 seconds, making it unsuitable. For cost-sensitive inference at sub-second latency, Claude Haiku 4.5 ($1/$5 per million tokens) or Sonnet 4.6 ($3/$15 per million tokens) are better fits. Teams needing self-hosted or air-gapped deployment cannot use Opus 4.6: it is proprietary with no downloadable weights.

On governance and compliance: Anthropic does not train on API inputs by default, with a 30-day retention window for abuse monitoring. Enterprise zero-retention plans are available. The model meets SOC 2 Type II, ISO 27001, and HIPAA-eligible standards, and is GDPR compliant. Anthropic classifies Opus 4.6 as a general-purpose AI with systemic risk obligations under the EU AI Act. Inputs sent through AWS Bedrock and Google Vertex AI are subject to those platforms' data handling terms rather than Anthropic's directly. The EU data residency option is available on the direct API.

Claude Opus 4.6 was succeeded by Claude Opus 4.7 on April 16, 2026. Opus 4.7 delivers a step-change in agentic coding, raising SWE-bench Verified to 87.6% and GPQA Diamond to 94.2%, while maintaining the same $5/$25 per million token pricing. Opus 4.6 remains available as a legacy model, with Anthropic recommending migration to 4.7 for new projects. The earlier Claude Sonnet 4 and Claude Opus 4 (without the .5/.6 suffix) were deprecated on June 15, 2026, but Opus 4.6 itself has no announced deprecation date as of May 2026.
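The effort levels and the 128K synchronous output cap described above can be sketched as a request-payload builder. This is a minimal sketch: the top-level `effort` field is an assumed parameter name inferred from the feature description, not a confirmed API field.

```python
# Sketch of a Messages API request body for claude-opus-4-6.
# The "effort" field name is an assumption based on the effort-controls
# feature described above; model ID and max_tokens come from the spec sheet.

def build_request(prompt: str, effort: str = "high") -> dict:
    """Build a request payload targeting the documented limits."""
    assert effort in ("low", "medium", "high", "max")  # the four documented levels
    return {
        "model": "claude-opus-4-6",   # API model ID
        "max_tokens": 128_000,        # max output per synchronous call
        "effort": effort,             # assumed name for the effort control
        "messages": [{"role": "user", "content": prompt}],
    }

req = build_request("Summarize this 400K-token filing.", effort="medium")
```

In practice the payload would be sent via one of the listed SDKs; the dict above only illustrates the shape of the documented knobs.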

Pricing

$5.00 per 1M input tokens, $25.00 per 1M output tokens. Cache hits cost $0.50 per 1M tokens (10% of input rate). Cache writes cost $6.25/1M (5-min) or $10/1M (1-hour). Batch API: 50% off ($2.50/$12.50). Fast mode beta: 6x rates ($30/$150). US-only inference adds 1.1x multiplier.
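Under these rates, a small helper makes the arithmetic explicit. This is a sketch; the rate table simply restates the published prices.

```python
# Sketch: compute a request's cost from the published per-million-token rates.

RATES = {  # USD per 1M tokens, standard tier
    "input": 5.00,
    "output": 25.00,
    "cache_hit": 0.50,
    "batch_input": 2.50,
    "batch_output": 12.50,
}

def cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0,
         batch: bool = False) -> float:
    """Cached tokens are billed at the cache-hit rate instead of the input rate."""
    in_rate = RATES["batch_input"] if batch else RATES["input"]
    out_rate = RATES["batch_output"] if batch else RATES["output"]
    fresh = input_tokens - cached_tokens
    return (fresh * in_rate + cached_tokens * RATES["cache_hit"]
            + output_tokens * out_rate) / 1_000_000

# The worked example from the text: 100K input + 50K output
print(round(cost(100_000, 50_000), 2))  # 1.75
```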


Frequently Asked Questions

What is Claude Opus 4.6 and who built it?

Claude Opus 4.6 is Anthropic's fourth-generation flagship large language model, released on February 5, 2026. It is built on a dense Transformer architecture with support for extended and adaptive reasoning. Parameter count is not disclosed, though independent analysts estimate it in the high hundreds of billions. The model sits at the top of Anthropic's generally available lineup at release, succeeding Claude Opus 4.5 from November 2025. It was purpose-built for long-horizon agentic tasks: software engineering, multi-step research, financial analysis, and document work that spans hours and millions of tokens. On SWE-bench Verified it scores 80.8%, leading all generally available models at release and coming within 1 point of GPT-5.3-Codex. On GPQA Diamond it scores 91.3%, and it leads the field on Humanity's Last Exam and Terminal-Bench 2.0. The API model ID is claude-opus-4-6 and pricing is $5 per million input tokens and $25 per million output tokens.

How much does Claude Opus 4.6 cost per 1M tokens?

Claude Opus 4.6 costs $5.00 per million input tokens and $25.00 per million output tokens on the standard pay-as-you-go API. Prompt caching cache hits cost $0.50 per million tokens, which is 10% of the base input rate. Cache writes cost $6.25 per million tokens for a 5-minute cache duration or $10.00 per million tokens for a 1-hour cache duration. The Batch API reduces both input and output by 50%, bringing prices to $2.50 and $12.50 per million tokens respectively for asynchronous workloads with up to 24-hour turnaround. For a practical cost example: a 100K-token document analysis with a 50K-token response costs approximately $1.75. A daily agentic coding loop using 1 million input tokens and 200K output tokens costs about $10.00 at standard rates. Fast mode beta provides 2.5x throughput at 6x pricing ($30/$150 per million tokens). The full 1M token context window is billed at the same per-token rate as short prompts, with no long-context surcharge since March 13, 2026.
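One consequence of these rates worth working through: a 5-minute cache write ($6.25 per million tokens) costs more than a single uncached read ($5.00), so caching only pays off when a prefix is actually reused. A sketch of the break-even arithmetic, using the rates above:

```python
# Sketch: when does prompt caching pay off for a shared prefix reused across calls?
# Rates in USD per 1M tokens (5-minute cache tier), from the pricing above.

BASE_IN, CACHE_WRITE, CACHE_HIT = 5.00, 6.25, 0.50

def prefix_cost(prefix_tokens: int, calls: int, cached: bool) -> float:
    per_m = prefix_tokens / 1_000_000
    if not cached:
        return calls * per_m * BASE_IN  # pay the full input rate on every call
    # Write the cache once, then pay the cache-hit rate on subsequent calls.
    return per_m * CACHE_WRITE + (calls - 1) * per_m * CACHE_HIT

# An 80K-token system prompt reused 10 times within the 5-minute window:
print(round(prefix_cost(80_000, 10, cached=False), 2))  # 4.0
print(round(prefix_cost(80_000, 10, cached=True), 2))   # 0.86
```

With this rate structure, caching is already cheaper by the second call; a single call is the only case where it loses.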

What is Claude Opus 4.6's context window and max output?

Claude Opus 4.6 has a 1 million token context window, which became generally available at standard pricing on March 13, 2026. The maximum output per synchronous API call is 128,000 tokens. On the Message Batches API, max output can be raised to 300,000 tokens using the output-300k-2026-03-24 beta header. Long-context recall is strong: on MRCR v2, the 1 million token retrieval benchmark, Opus 4.6 scores 76%, compared to Sonnet 4.5 at 18.5%. The compaction API (beta) lets the model summarize its own context to extend conversations effectively beyond the 1M token limit. For comparison, Claude Opus 4.5 had a 200K token context window, and Claude Opus 4.7 (the successor) also has a 1M token context window. PDFs and multi-file inputs are handled natively through the document API, with files processed as part of the input token count. There is no sliding window or KV truncation behavior; the full context is processed at the same per-token rate.
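A minimal pre-flight check against these limits can look like the sketch below. It assumes prompt and output tokens share the 1M window, which the text does not state explicitly.

```python
# Sketch: check whether a job fits the documented limits before sending it.
# Assumption: the 1M context window covers prompt + output combined.

CONTEXT_WINDOW = 1_000_000   # total tokens
MAX_OUTPUT_SYNC = 128_000    # synchronous API cap
MAX_OUTPUT_BATCH = 300_000   # Message Batches API with the output-300k beta header

def fits(prompt_tokens: int, output_tokens: int, batch: bool = False) -> bool:
    cap = MAX_OUTPUT_BATCH if batch else MAX_OUTPUT_SYNC
    return output_tokens <= cap and prompt_tokens + output_tokens <= CONTEXT_WINDOW

print(fits(850_000, 128_000))              # True: within the 1M window
print(fits(850_000, 200_000, batch=True))  # False: 1,050,000 exceeds the window
```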

How does Claude Opus 4.6 compare on benchmarks vs GPT-5 and Gemini?

On SWE-bench Verified, the leading real-world software engineering benchmark, Claude Opus 4.6 scores 80.8%, which leads all generally available models at its February 2026 release date and trails only GPT-5.3-Codex by roughly 0.8 points. On GPQA Diamond, doctoral-level reasoning across biology, chemistry, and physics, Opus 4.6 scores 91.3%, trailing GPT-5.4 Pro (94.4%) and Gemini 3.1 Pro (94.3%) which were released later. On GDPval-AA, the professional knowledge work benchmark, Opus 4.6 leads GPT-5.2 by 144 Elo points and its predecessor Opus 4.5 by 190 points. It holds the top score on Humanity's Last Exam, BrowseComp, and Terminal-Bench 2.0 at release. AIME 2025 scores vary by source: one comparative analysis puts it at 75.5%. HumanEval coding accuracy is 95%. Anthropic did not publish an MMLU score for Opus 4.6, as MMLU is considered saturated (88-94% for all frontier models) and no longer meaningfully differentiates at this tier. Successor Claude Opus 4.7 raised SWE-bench to 87.6% and GPQA to 94.2% in April 2026.
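To give the Elo figures some intuition, the standard Elo expected-score formula converts a rating gap into a head-to-head preference probability. This assumes GDPval-AA uses conventional Elo scoring, which the text does not confirm.

```python
# Sketch: convert an Elo rating gap into an expected head-to-head win rate,
# using the standard Elo expected-score formula E = 1 / (1 + 10^(-d/400)).

def elo_win_prob(delta: float) -> float:
    return 1.0 / (1.0 + 10 ** (-delta / 400.0))

print(round(elo_win_prob(144), 3))  # ~0.70: preferred in roughly 7 of 10 comparisons
print(round(elo_win_prob(190), 3))  # ~0.75 for the 190-point lead over Opus 4.5
```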

Is Claude Opus 4.6 open source or proprietary?

Claude Opus 4.6 is proprietary with no downloadable weights. It is accessible exclusively through APIs: the direct Anthropic API (api.anthropic.com), AWS Bedrock (model ID: anthropic.claude-opus-4-6-v1), Google Vertex AI (model ID: claude-opus-4-6), and Microsoft Foundry (Azure). Authentication on the direct API requires an API key; Bedrock uses AWS IAM; Vertex uses GCP IAM; Foundry uses Azure Active Directory, with billing through Azure. There are no quantization options, no VRAM requirements to manage, and no GGUF or ONNX weight files. Commercial use is governed by Anthropic's Commercial Terms. There are no restrictions on the type of commercial application beyond the acceptable use policy. Teams requiring self-hosted or air-gapped deployment should evaluate open-weights alternatives such as Llama 4 or Qwen3-235B. Anthropic has stated the model will not be retired before February 5, 2027 on Google Vertex AI, giving enterprise teams at least a one-year API availability window.

What modalities does Claude Opus 4.6 support?

Claude Opus 4.6 accepts text and image inputs, handling charts, diagrams, screenshots, PDFs, and photographs natively. PDF files are processed through the document API and count toward the input token budget. Audio input and audio output are not supported; voice applications must pair Opus 4.6 with a separate automatic speech recognition model for input and a text-to-speech model for output. Video input is also not supported. Output is text only, plus structured tool call results. Function calling supports parallel tool calls and native JSON mode. The model has full computer use capability, enabling it to read screen contents and control a desktop environment via the computer use API. Context compaction (beta) enables server-side summarization so conversations can extend beyond the 1M token context window. Compared to Google Gemini 2.5 Pro, which supports native audio and video inputs, Opus 4.6 has a narrower modality footprint but stronger performance on text-based reasoning and coding benchmarks.
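A sketch of a multimodal message body combining an image with a text question, following the base64 content-block shape the Messages API uses for images; treat the exact field names as assumptions if your SDK version differs.

```python
# Sketch: build a user message containing a base64-encoded PNG plus a question.

import base64

def image_message(question: str, png_bytes: bytes) -> dict:
    return {
        "role": "user",
        "content": [
            {"type": "image",
             "source": {"type": "base64",
                        "media_type": "image/png",
                        "data": base64.b64encode(png_bytes).decode("ascii")}},
            {"type": "text", "text": question},
        ],
    }

msg = image_message("What does this chart show?", b"\x89PNG...")
```

PDFs follow the same pattern through the document API, with the file counted against the input token budget.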

Does Claude Opus 4.6 train on user data?

Anthropic does not train Claude Opus 4.6 on API inputs by default. Inputs and outputs sent through the direct Anthropic API are retained for 30 days for abuse monitoring purposes, then deleted unless flagged. Users can opt out of even this retention through Anthropic's enterprise zero-retention plan, which is available to qualifying enterprise customers. Claude Opus 4.6 meets SOC 2 Type II and ISO 27001 standards and is eligible for HIPAA Business Associate Agreements. It is GDPR compliant and Anthropic classifies it as a general-purpose AI with systemic risk obligations under the EU AI Act. API inputs processed through AWS Bedrock or Google Vertex AI are subject to those platforms' respective data handling terms rather than Anthropic's directly; each platform offers its own data residency options. The Anthropic direct API offers US and EU data residency options with a 1.1x pricing multiplier for US-only inference via the inference_geo parameter. The trust and compliance documentation is available at anthropic.com/transparency.

Who is Claude Opus 4.6 best for and who should avoid it?

Claude Opus 4.6 is the strongest choice for autonomous coding agents: its 80.8% SWE-bench Verified score and METR-verified 14.5-hour task horizon make it reliable for long CI/CD and code review loops. Teams processing legal documents, research papers, or financial filings that exceed 200K tokens benefit from the 1M context at standard pricing, with no chunking required for most real-world corpora. Professional knowledge workers in finance and legal domains get the best accuracy on GDPval-AA, beating GPT-5.2 by 144 Elo points. Multi-agent orchestration teams gain from the agent teams feature, which allows parallel independent Claude instances. Teams that should avoid it include: real-time voice assistant developers, because there is no audio I/O and the 1.6-second time-to-first-token is too slow; mobile or edge teams, because there is no self-hosted option; and cost-sensitive teams running high-volume short-context tasks, where Claude Haiku 4.5 at $1/$5 per million tokens delivers similar accuracy at 80% lower cost. Teams starting new long-horizon agentic coding projects should also consider migrating directly to Claude Opus 4.7, which raised SWE-bench from 80.8% to 87.6%.
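The cost trade-off in that last point can be made concrete. For a short-context call, the per-request price scales directly with the tier rates quoted above (a sketch; the 2K-input / 500-output workload is illustrative):

```python
# Sketch: per-call cost across the three Claude tiers mentioned above,
# for a high-volume short-context workload.

TIERS = {  # (input, output) in USD per 1M tokens
    "opus-4.6":   (5.00, 25.00),
    "sonnet-4.6": (3.00, 15.00),
    "haiku-4.5":  (1.00, 5.00),
}

def per_call(model: str, in_tok: int, out_tok: int) -> float:
    i, o = TIERS[model]
    return (in_tok * i + out_tok * o) / 1_000_000

for m in TIERS:
    print(m, round(per_call(m, 2_000, 500), 5))
# Haiku lands at 20% of the Opus price, i.e. the "80% lower cost" above.
```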
