Claude Opus 4.6: 1M Context, 80.8% SWE-bench | hokai.io

Claude Opus 4.6 by Anthropic (Feb 2026): 1M-token context, 80.8% SWE-bench Verified, 91.3% GPQA Diamond. Priced at $5 input / $25 output per 1M tokens.

Claude Opus 4.6 is Anthropic's frontier model released February 5, 2026. It achieves 80.8% on SWE-bench Verified and 91.3% on GPQA Diamond, features a 1 million token context window with 128K max output, and costs $5 per million input tokens and $25 per million output tokens. It leads on agentic coding, BrowseComp, and Humanity's Last Exam.

Prompt caching reduces cached input cost to $0.50 per million tokens. Claude Opus 4.6 is available via the Anthropic API, AWS Bedrock, Google Vertex AI, and Azure.

Provider: Anthropic · Family: Claude 4

Context window: 1,000,000 tokens · Max output: 128,000

Input modalities: text, image, pdf, tool-calls · Output: text, tool-calls

About Claude Opus 4.6

Claude Opus 4.6 is Anthropic's fourth-generation flagship large language model, released on February 5, 2026. Built on a dense Transformer architecture with extended reasoning support, it succeeded Claude Opus 4.5 and sat at the top of Anthropic's lineup until the later Opus 4.7. The model was designed to push the limits of long-horizon agentic work: tasks spanning hours, dozens of tool calls, and context that would overflow any previous Opus model. Anthropic did not disclose a parameter count, but independent analysts estimate it in the high hundreds of billions. The API model ID is claude-opus-4-6.

On benchmarks, Claude Opus 4.6 achieves 80.8% on SWE-bench Verified, the leading real-world software engineering evaluation, scoring within 1 point of GPT-5.3-Codex and ahead of every other generally available model at release time. On GPQA Diamond, which tests doctoral-level reasoning in biology, chemistry, and physics, it scores 91.3%. It also leads the field on Humanity's Last Exam, Terminal-Bench 2.0 (the agentic coding evaluation), and BrowseComp, which measures the ability to locate hard-to-find information online. On GDPval-AA, a benchmark for economically valuable knowledge work in finance and legal domains, it outperforms OpenAI's GPT-5.2 by 144 Elo points. The HumanEval coding score sits at 95% and MATH at 93%.

The most significant architectural addition in Opus 4.6 is its 1 million token context window, initially launched in beta and made generally available at standard pricing on March 13, 2026. The max output is 128,000 tokens per request. Long-context recall is substantially stronger than in previous Opus models: on MRCR v2, the 1 million token retrieval benchmark, Opus 4.6 scores 76%, compared to Sonnet 4.5 at 18.5%. On the Message Batches API, max output can be raised to 300,000 tokens using the output-300k-2026-03-24 beta header.
The full 1 million token context is billed at the same per-token rate regardless of prompt length, with no long-context surcharge. Input modalities are text and image. Opus 4.6 handles charts, diagrams, screenshots, PDFs, and photographs with strong visual reasoning. Audio input is not supported. Output is text only, plus tool call results.

The model has full computer use capability, letting it read screens and take actions on a user's desktop. Native function calling supports parallel tool calls and structured JSON output. The model also gained two new API-level capabilities: adaptive thinking, where the model decides autonomously how much extended reasoning to apply based on task complexity, and effort controls, with four levels (low, medium, high, max) that let developers trade intelligence for speed and cost. A compaction API (beta) lets the model summarize its own context to sustain conversations indefinitely.

Standard pricing is $5 per million input tokens and $25 per million output tokens, matching Claude Opus 4.5's price while delivering substantially better performance. Prompt caching costs $0.50 per million cache-hit tokens (10% of the base input rate), with cache write costs of $6.25 per million tokens for a 5-minute cache and $10 per million tokens for a 1-hour cache. Batch API requests receive a 50% discount, bringing prices to $2.50 input and $12.50 output per million tokens. A fast mode (beta research preview) runs the same model at 2.5x higher output token throughput at 6x standard rates ($30/$150 per million tokens). US-only inference via the inference_geo parameter costs 1.1x standard rates. A worked cost example: a 100K-token document analysis costs $0.50 for input and around $1.25 for a 50K-token output, totaling about $1.75.

Opus 4.6 is available through four platforms. The direct Anthropic API (api.anthropic.com) offers the lowest latency and earliest access to new features.
AWS Bedrock is available with native IAM authentication, VPC connectivity, and CloudWatch logging; the Bedrock model ID is anthropic.claude-opus-4-6-v1. Google Vertex AI is available with three endpoint types: global, multi-region, and regional (the last two carry a 10% premium); the Vertex ID is claude-opus-4-6. Microsoft Foundry (Azure) supports the model with 1 million token context and Azure billing. SDKs are available in Python, TypeScript, Java, Go, and Ruby (Go and Ruby do not support Foundry). Anthropic has stated that, on Vertex AI, the model will not be retired before February 5, 2027.

Safety training for Opus 4.6 uses reinforcement learning from human feedback (RLHF) and reinforcement learning from AI feedback (RLAIF), along with Constitutional AI alignment. The training data cutoff is August 2025, with a reliable knowledge cutoff of May 2025. The system card, published simultaneously with the model and exceeding 200 pages, was the most detailed safety audit Anthropic had published to that point. Independent red teaming was conducted by METR (which reported a 50%-time-horizon of 14 hours 30 minutes on long-horizon task evaluations), Apollo Research, and the UK AISI. The model is deployed under ASL-3 safeguards. Anthropic noted that Opus 4.6 is in a gray zone near the ASL-4 threshold for autonomous AI capabilities. On ARC-AGI-2, it scores 69.17% with 120K thinking tokens, a state-of-the-art result at release.

Teams choosing Opus 4.6 over alternatives should weigh several factors. For agentic coding, it outperforms Gemini 2.5 Pro and GPT-5.2 on SWE-bench Verified at launch and on sustained multi-hour tasks. For long-document work above 500K tokens, it is one of only a few models with a verified 1 million token context at standard pricing. For professional knowledge work (legal, finance), the GDPval-AA lead of 144 Elo points over GPT-5.2 is meaningful.
However, for real-time voice applications, Opus 4.6 has no native audio I/O and a time-to-first-token of roughly 1.6 seconds, making it unsuitable. For cost-sensitive inference at sub-second latency, Claude Haiku 4.5 ($1/$5 per million tokens) or Sonnet 4.6 ($3/$15 per million tokens) are better fits. Teams needing self-hosted or air-gapped deployment cannot use Opus 4.6: it is proprietary with no downloadable weights.

On governance and compliance: Anthropic does not train on API inputs by default, with a 30-day retention window for abuse monitoring. Enterprise zero-retention plans are available. The model meets SOC 2 Type II, ISO 27001, and HIPAA-eligible standards, and is GDPR compliant. Anthropic classifies Opus 4.6 as a general-purpose AI with systemic risk obligations under the EU AI Act. Inputs sent through AWS Bedrock and Google Vertex AI are subject to those platforms' data handling terms rather than Anthropic's directly. The EU data residency option is available on the direct API.

Claude Opus 4.6 was succeeded by Claude Opus 4.7 on April 16, 2026. Opus 4.7 delivers a step-change in agentic coding, raising SWE-bench Verified to 87.6% and GPQA Diamond to 94.2%, while maintaining the same $5/$25 per million token pricing. Opus 4.6 remains available as a legacy model, with Anthropic recommending migration to 4.7 for new projects. The earlier Claude Sonnet 4 and Claude Opus 4 (without the .5/.6 suffix) were deprecated on June 15, 2026, but Opus 4.6 itself has no announced deprecation date as of May 2026.
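The effort levels and the 128K synchronous output cap described above can be sketched as a request-payload builder. This is a minimal sketch: the top-level `effort` field is an assumed parameter name inferred from the feature description, not a confirmed API field.

```python
# Sketch of a Messages API request body for claude-opus-4-6.
# The "effort" field name is an assumption based on the effort-controls
# feature described above; model ID and max_tokens come from the spec sheet.

def build_request(prompt: str, effort: str = "high") -> dict:
    """Build a request payload targeting the documented limits."""
    assert effort in ("low", "medium", "high", "max")  # the four documented levels
    return {
        "model": "claude-opus-4-6",   # API model ID
        "max_tokens": 128_000,        # max output per synchronous call
        "effort": effort,             # assumed name for the effort control
        "messages": [{"role": "user", "content": prompt}],
    }

req = build_request("Summarize this 400K-token filing.", effort="medium")
```

In practice the payload would be sent via one of the listed SDKs; the dict above only illustrates the shape of the documented knobs.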

Pricing

$5.00 per 1M input tokens, $25.00 per 1M output tokens. Cache hits cost $0.50 per 1M tokens (10% of input rate). Cache writes cost $6.25/1M (5-min) or $10/1M (1-hour). Batch API: 50% off ($2.50/$12.50). Fast mode beta: 6x rates ($30/$150). US-only inference adds 1.1x multiplier.
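Under these rates, a small helper makes the arithmetic explicit. This is a sketch; the rate table simply restates the published prices.

```python
# Sketch: compute a request's cost from the published per-million-token rates.

RATES = {  # USD per 1M tokens, standard tier
    "input": 5.00,
    "output": 25.00,
    "cache_hit": 0.50,
    "batch_input": 2.50,
    "batch_output": 12.50,
}

def cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0,
         batch: bool = False) -> float:
    """Cached tokens are billed at the cache-hit rate instead of the input rate."""
    in_rate = RATES["batch_input"] if batch else RATES["input"]
    out_rate = RATES["batch_output"] if batch else RATES["output"]
    fresh = input_tokens - cached_tokens
    return (fresh * in_rate + cached_tokens * RATES["cache_hit"]
            + output_tokens * out_rate) / 1_000_000

# The worked example from the text: 100K input + 50K output
print(round(cost(100_000, 50_000), 2))  # 1.75
```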


Frequently Asked Questions

What is Claude Opus 4.6 and who built it?

Claude Opus 4.6 is Anthropic's fourth-generation flagship large language model, released on February 5, 2026. It is built on a dense Transformer architecture with support for extended and adaptive reasoning. Parameter count is not disclosed, though independent analysts estimate it in the high hundreds of billions. The model sits at the top of Anthropic's generally available lineup at release, succeeding Claude Opus 4.5 from November 2025. It was purpose-built for long-horizon agentic tasks: software engineering, multi-step research, financial analysis, and document work that spans hours and millions of tokens. On SWE-bench Verified it scores 80.8%, leading all generally available models at release and coming within 1 point of GPT-5.3-Codex. On GPQA Diamond it scores 91.3%, and it leads the field on Humanity's Last Exam and Terminal-Bench 2.0. The API model ID is claude-opus-4-6 and pricing is $5 per million input tokens and $25 per million output tokens.

How much does Claude Opus 4.6 cost per 1M tokens?

Claude Opus 4.6 costs $5.00 per million input tokens and $25.00 per million output tokens on the standard pay-as-you-go API. Prompt caching cache hits cost $0.50 per million tokens, which is 10% of the base input rate. Cache writes cost $6.25 per million tokens for a 5-minute cache duration or $10.00 per million tokens for a 1-hour cache duration. The Batch API reduces both input and output by 50%, bringing prices to $2.50 and $12.50 per million tokens respectively for asynchronous workloads with up to 24-hour turnaround. For a practical cost example: a 100K-token document analysis with a 50K-token response costs approximately $1.75. A daily agentic coding loop using 1 million input tokens and 200K output tokens costs about $10.00 at standard rates. Fast mode beta provides 2.5x throughput at 6x pricing ($30/$150 per million tokens). The full 1M token context window is billed at the same per-token rate as short prompts, with no long-context surcharge since March 13, 2026.
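One consequence of these rates worth working through: a 5-minute cache write ($6.25 per million tokens) costs more than a single uncached read ($5.00), so caching only pays off when a prefix is actually reused. A sketch of the break-even arithmetic, using the rates above:

```python
# Sketch: when does prompt caching pay off for a shared prefix reused across calls?
# Rates in USD per 1M tokens (5-minute cache tier), from the pricing above.

BASE_IN, CACHE_WRITE, CACHE_HIT = 5.00, 6.25, 0.50

def prefix_cost(prefix_tokens: int, calls: int, cached: bool) -> float:
    per_m = prefix_tokens / 1_000_000
    if not cached:
        return calls * per_m * BASE_IN  # pay the full input rate on every call
    # Write the cache once, then pay the cache-hit rate on subsequent calls.
    return per_m * CACHE_WRITE + (calls - 1) * per_m * CACHE_HIT

# An 80K-token system prompt reused 10 times within the 5-minute window:
print(round(prefix_cost(80_000, 10, cached=False), 2))  # 4.0
print(round(prefix_cost(80_000, 10, cached=True), 2))   # 0.86
```

With this rate structure, caching is already cheaper by the second call; a single call is the only case where it loses.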

What is Claude Opus 4.6's context window and max output?

Claude Opus 4.6 has a 1 million token context window, which became generally available at standard pricing on March 13, 2026. The maximum output per synchronous API call is 128,000 tokens. On the Message Batches API, max output can be raised to 300,000 tokens using the output-300k-2026-03-24 beta header. Long-context recall is strong: on MRCR v2, the 1 million token retrieval benchmark, Opus 4.6 scores 76%, compared to Sonnet 4.5 at 18.5%. The compaction API (beta) lets the model summarize its own context to extend conversations effectively beyond the 1M token limit. For comparison, Claude Opus 4.5 had a 200K token context window, and Claude Opus 4.7 (the successor) also has a 1M token context window. PDFs and multi-file inputs are handled natively through the document API, with files processed as part of the input token count. There is no sliding window or KV truncation behavior; the full context is processed at the same per-token rate.
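A minimal pre-flight check against these limits can look like the sketch below. It assumes prompt and output tokens share the 1M window, which the text does not state explicitly.

```python
# Sketch: check whether a job fits the documented limits before sending it.
# Assumption: the 1M context window covers prompt + output combined.

CONTEXT_WINDOW = 1_000_000   # total tokens
MAX_OUTPUT_SYNC = 128_000    # synchronous API cap
MAX_OUTPUT_BATCH = 300_000   # Message Batches API with the output-300k beta header

def fits(prompt_tokens: int, output_tokens: int, batch: bool = False) -> bool:
    cap = MAX_OUTPUT_BATCH if batch else MAX_OUTPUT_SYNC
    return output_tokens <= cap and prompt_tokens + output_tokens <= CONTEXT_WINDOW

print(fits(850_000, 128_000))              # True: within the 1M window
print(fits(850_000, 200_000, batch=True))  # False: 1,050,000 exceeds the window
```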

How does Claude Opus 4.6 compare on benchmarks vs GPT-5 and Gemini?

On SWE-bench Verified, the leading real-world software engineering benchmark, Claude Opus 4.6 scores 80.8%, which leads all generally available models at its February 2026 release date and trails only GPT-5.3-Codex by roughly 0.8 points. On GPQA Diamond, doctoral-level reasoning across biology, chemistry, and physics, Opus 4.6 scores 91.3%, trailing GPT-5.4 Pro (94.4%) and Gemini 3.1 Pro (94.3%) which were released later. On GDPval-AA, the professional knowledge work benchmark, Opus 4.6 leads GPT-5.2 by 144 Elo points and its predecessor Opus 4.5 by 190 points. It holds the top score on Humanity's Last Exam, BrowseComp, and Terminal-Bench 2.0 at release. AIME 2025 scores vary by source: one comparative analysis puts it at 75.5%. HumanEval coding accuracy is 95%. Anthropic did not publish an MMLU score for Opus 4.6, as MMLU is considered saturated (88-94% for all frontier models) and no longer meaningfully differentiates at this tier. Successor Claude Opus 4.7 raised SWE-bench to 87.6% and GPQA to 94.2% in April 2026.
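To give the Elo figures some intuition, the standard Elo expected-score formula converts a rating gap into a head-to-head preference probability. This assumes GDPval-AA uses conventional Elo scoring, which the text does not confirm.

```python
# Sketch: convert an Elo rating gap into an expected head-to-head win rate,
# using the standard Elo expected-score formula E = 1 / (1 + 10^(-d/400)).

def elo_win_prob(delta: float) -> float:
    return 1.0 / (1.0 + 10 ** (-delta / 400.0))

print(round(elo_win_prob(144), 3))  # ~0.70: preferred in roughly 7 of 10 comparisons
print(round(elo_win_prob(190), 3))  # ~0.75 for the 190-point lead over Opus 4.5
```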

Is Claude Opus 4.6 open source or proprietary?

Claude Opus 4.6 is proprietary with no downloadable weights. It is accessible exclusively through APIs: the direct Anthropic API (api.anthropic.com), AWS Bedrock (model ID: anthropic.claude-opus-4-6-v1), Google Vertex AI (model ID: claude-opus-4-6), and Microsoft Foundry (Azure). Authentication on the direct API requires an API key; Bedrock uses AWS IAM; Vertex uses GCP IAM; Foundry uses Azure Active Directory, with billing through Azure. There are no quantization options, no VRAM requirements to manage, and no GGUF or ONNX weight files. Commercial use is governed by Anthropic's Commercial Terms. There are no restrictions on the type of commercial application beyond the acceptable use policy. Teams requiring self-hosted or air-gapped deployment should evaluate open-weights alternatives such as Llama 4 or Qwen3-235B. Anthropic has stated the model will not be retired before February 5, 2027 on Google Vertex AI, giving enterprise teams at least a one-year API availability window.

What modalities does Claude Opus 4.6 support?

Claude Opus 4.6 accepts text and image inputs, handling charts, diagrams, screenshots, PDFs, and photographs natively. PDF files are processed through the document API and count toward the input token budget. Audio input and audio output are not supported; voice applications must pair Opus 4.6 with a separate automatic speech recognition model for input and a text-to-speech model for output. Video input is also not supported. Output is text only, plus structured tool call results. Function calling supports parallel tool calls and native JSON mode. The model has full computer use capability, enabling it to read screen contents and control a desktop environment via the computer use API. Context compaction (beta) enables server-side summarization so conversations can extend beyond the 1M token context window. Compared to Google Gemini 2.5 Pro, which supports native audio and video inputs, Opus 4.6 has a narrower modality footprint but stronger performance on text-based reasoning and coding benchmarks.
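A sketch of a multimodal message body combining an image with a text question, following the base64 content-block shape the Messages API uses for images; treat the exact field names as assumptions if your SDK version differs.

```python
# Sketch: build a user message containing a base64-encoded PNG plus a question.

import base64

def image_message(question: str, png_bytes: bytes) -> dict:
    return {
        "role": "user",
        "content": [
            {"type": "image",
             "source": {"type": "base64",
                        "media_type": "image/png",
                        "data": base64.b64encode(png_bytes).decode("ascii")}},
            {"type": "text", "text": question},
        ],
    }

msg = image_message("What does this chart show?", b"\x89PNG...")
```

PDFs follow the same pattern through the document API, with the file counted against the input token budget.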

Does Claude Opus 4.6 train on user data?

Anthropic does not train Claude Opus 4.6 on API inputs by default. Inputs and outputs sent through the direct Anthropic API are retained for 30 days for abuse monitoring purposes, then deleted unless flagged. Users can opt out of even this retention through Anthropic's enterprise zero-retention plan, which is available to qualifying enterprise customers. Claude Opus 4.6 meets SOC 2 Type II and ISO 27001 standards and is eligible for HIPAA Business Associate Agreements. It is GDPR compliant and Anthropic classifies it as a general-purpose AI with systemic risk obligations under the EU AI Act. API inputs processed through AWS Bedrock or Google Vertex AI are subject to those platforms' respective data handling terms rather than Anthropic's directly; each platform offers its own data residency options. The Anthropic direct API offers US and EU data residency options with a 1.1x pricing multiplier for US-only inference via the inference_geo parameter. The trust and compliance documentation is available at anthropic.com/transparency.

Who is Claude Opus 4.6 best for and who should avoid it?

Claude Opus 4.6 is the strongest choice for autonomous coding agents: its 80.8% SWE-bench Verified score and METR-verified 14.5-hour task horizon make it reliable for long CI/CD and code review loops. Teams processing legal documents, research papers, or financial filings that exceed 200K tokens benefit from the 1M context at standard pricing, with no chunking required for most real-world corpora. Professional knowledge workers in finance and legal domains get the best accuracy on GDPval-AA, beating GPT-5.2 by 144 Elo points. Multi-agent orchestration teams gain from the agent teams feature, which allows parallel independent Claude instances. Teams that should avoid it include: real-time voice assistant developers, because there is no audio I/O and the 1.6-second time-to-first-token is too slow; mobile or edge teams, because there is no self-hosted option; and cost-sensitive teams running high-volume short-context tasks, where Claude Haiku 4.5 at $1/$5 per million tokens delivers similar accuracy at 80% lower cost. Teams starting new long-horizon agentic coding projects should also consider migrating directly to Claude Opus 4.7, which raised SWE-bench from 80.8% to 87.6%.
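The cost trade-off in that last point can be made concrete. For a short-context call, the per-request price scales directly with the tier rates quoted above (a sketch; the 2K-input / 500-output workload is illustrative):

```python
# Sketch: per-call cost across the three Claude tiers mentioned above,
# for a high-volume short-context workload.

TIERS = {  # (input, output) in USD per 1M tokens
    "opus-4.6":   (5.00, 25.00),
    "sonnet-4.6": (3.00, 15.00),
    "haiku-4.5":  (1.00, 5.00),
}

def per_call(model: str, in_tok: int, out_tok: int) -> float:
    i, o = TIERS[model]
    return (in_tok * i + out_tok * o) / 1_000_000

for m in TIERS:
    print(m, round(per_call(m, 2_000, 500), 5))
# Haiku lands at 20% of the Opus price, i.e. the "80% lower cost" above.
```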
