Claude Sonnet 4.6: Benchmarks, Pricing & API Guide | hokai.io
Claude Sonnet 4.6 by Anthropic (Feb 2026): 79.6% SWE-bench Verified, 1M-token context, $3/$15 per 1M tokens. Best for agentic coding and computer use.
Claude Sonnet 4.6 by Anthropic launched February 17, 2026, scoring 79.6% on SWE-bench Verified and 72.5% on OSWorld (computer use). It carries a 1-million-token context window at no premium, priced at $3 per million input tokens and $15 per million output tokens. The model is the default on claude.ai for Free and Pro users and is available on AWS Bedrock, Vertex AI, and Microsoft Foundry.
Provider: Anthropic · Family: Claude 4
Context window: 1,000,000 tokens · Max output: 64,000
Input modalities: text, image, pdf, tool-calls · Output: text, tool-calls
About Claude Sonnet 4.6
Claude Sonnet 4.6 is Anthropic's mid-tier flagship model, released on February 17, 2026. It sits in the Claude 4 family between the budget-oriented Haiku 4.5 and the research-grade Opus 4.6, and it replaced Sonnet 4.5 as the default model for Free and Pro users on claude.ai. The model uses a dense transformer architecture; Anthropic has not disclosed the parameter count, though third-party estimates place it in the 50-100 billion range. Sonnet 4.6 was designed to collapse the performance gap between Anthropic's mid-tier and flagship tiers, delivering near-Opus results on the tasks most development teams care about: agentic coding, long-document analysis, and computer-use automation.

On benchmark evaluations, Claude Sonnet 4.6 scores 79.6% on SWE-bench Verified, within 1.2 points of Claude Opus 4.6 (80.8%) and 2.4 points ahead of Claude Sonnet 4.5 (77.2%). On GPQA Diamond, the model scores 74.1%, compared to 91.3% for Opus 4.6 and 73.8% for GPT-5.2; that gap in graduate-level scientific reasoning tips decisions toward Opus when deep domain expertise matters. On OSWorld (computer use), Sonnet 4.6 scores 72.5%, nearly matching Opus 4.6 at 72.7% and far ahead of GPT-5.2 at 38.2%. On MATH, the model achieves 89% accuracy, up from 62% on Sonnet 4.5, and on ARC-AGI-2 it scores 60.4%.

Claude Sonnet 4.6 has a 1-million-token context window, generally available since March 13, 2026: no beta header is required and no long-context premium applies, so a 900,000-token request bills at the same per-token rate as a 9,000-token request. The synchronous API supports up to 64,000 output tokens per call; for larger generation tasks, the Message Batches API supports up to 300,000 output tokens per call via the output-300k-2026-03-24 beta header. The model's reliable knowledge cutoff is August 2025, with training data extending to January 2026.
Anthropic has not published a needle-in-a-haystack recall figure for Sonnet 4.6, but the 1M context window is architecturally shared with Opus 4.6, which showed strong recall in internal evaluations. The model also features adaptive context compaction for extended agentic sessions.

Claude Sonnet 4.6 accepts text, image, and PDF inputs and produces text and tool-call outputs. Vision is live and supports up to 600 images or PDFs per request; individual images can be up to 8,000x8,000 pixels. There is no native audio input or output. Function calling uses Anthropic's standard tool_use schema with support for parallel tool calls and structured JSON output, and the model supports computer use, including GUI navigation, web form completion, and spreadsheet manipulation. Adaptive thinking (the successor to extended thinking) lets the model decide how much internal reasoning to apply per query based on an effort parameter; the three standard levels are low, medium, and high, plus a max level for intensive tasks. Sonnet 4.6 also inherits Opus-level prompt injection resistance, a notable upgrade over Sonnet 4.5.

Pricing for Claude Sonnet 4.6 on the Anthropic API is $3.00 per million input tokens and $15.00 per million output tokens. Prompt caching reduces costs substantially: a 5-minute cache write costs $3.75 per million tokens, a 1-hour cache write costs $6.00 per million tokens, and cache reads cost $0.30 per million tokens (10% of the base input rate). The Batch API offers a 50% discount, bringing prices to $1.50 input and $7.50 output per million tokens, with results returned asynchronously and a 300K output-token ceiling via beta header.
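The effort parameter can be sketched as a request fragment. The effort levels (low/medium/high/max) are the ones described above, but the exact payload shape shown here, a "thinking" block carrying an "effort" field, is an assumption for illustration only, not a confirmed API schema; check Anthropic's API reference for the real parameter names.

```python
# Hypothetical adaptive-thinking request fragment. The effort levels come from
# the text above; the "thinking" field name and structure are assumptions.

EFFORT_LEVELS = ("low", "medium", "high", "max")

def thinking_request(prompt: str, effort: str = "medium") -> dict:
    """Build a request dict with an assumed adaptive-thinking effort setting."""
    if effort not in EFFORT_LEVELS:
        raise ValueError(f"effort must be one of {EFFORT_LEVELS}")
    return {
        "model": "claude-sonnet-4-6",
        "max_tokens": 8_192,
        "thinking": {"effort": effort},  # model scales internal reasoning to this level
        "messages": [{"role": "user", "content": prompt}],
    }

request = thinking_request("Prove that this retry loop terminates.", effort="high")
```

In practice this means simple queries can run at low effort to cut latency and token spend, while hard agentic steps opt into high or max.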
To put the pricing in context: summarizing a 100,000-token research paper costs roughly $0.30; running a daily coding agent generating 1 million input tokens and 200,000 output tokens costs approximately $6.00; and processing 1,000 customer support turns at an average of 2,000 input and 500 output tokens each costs approximately $13.50.

Claude Sonnet 4.6 is accessible via the Anthropic API (api.anthropic.com), AWS Bedrock (model ID: anthropic.claude-sonnet-4-6), Google Vertex AI (model ID: claude-sonnet-4-6), and Microsoft Foundry. On Bedrock, the model supports global cross-region, geo cross-region (US, EU, AU, JP), and select in-region endpoints; on Vertex, global, multi-region, and regional endpoints are available. Regional and multi-region endpoints on Bedrock and Vertex carry a 10% premium over global routing. Authentication uses API keys for the Anthropic API and IAM credentials for Bedrock and Vertex. SDKs are available in Python, TypeScript, Java, Go, and Ruby; Go and Ruby do not support Microsoft Foundry. The model is closed-weights and cannot be self-hosted.

Anthropic deployed Claude Sonnet 4.6 under AI Safety Level 3 (ASL-3), the same standard as Opus 4.6. The system card notes that automated safety evaluations placed Sonnet 4.6 at or below the capability level of Claude Opus 4.6, meaning it does not push the capability frontier beyond what was already managed under ASL-3 safeguards, and the model did not cross the threshold for ASL-4 classification on biological-domain uplift tasks. On cyber capability evaluations across more than 1,500 CyberGym tasks, Sonnet 4.6 found security flaws 65% of the time (versus 67% for Opus 4.6 and 83% for the experimental Claude Mythos Preview). The alignment method combines Constitutional AI with RLHF, and safety evaluations included agentic and computer-use scenarios, prompt injection resistance, and alignment under unusual and extreme conditions.
On some alignment measures, Sonnet 4.6 showed the strongest results Anthropic has recorded for any Claude model.

Claude Sonnet 4.6 is the right choice for teams that need near-flagship coding and computer-use performance at a lower cost per token. The 79.6% SWE-bench score and 72.5% OSWorld score place it well ahead of older Opus-class models, and teams running high-volume agentic coding loops that previously required Opus 4.5 can often downgrade to Sonnet 4.6 without a measurable quality drop. The model is also the right tool for long-document analysis, given the 1M-token context window at no premium. It is a poor fit for tasks requiring deep graduate-level scientific reasoning (GPQA 74.1% versus Opus 4.6's 91.3%), for real-time voice applications (no audio I/O), and for teams that need on-device deployment or air-gapped inference (closed weights, API-only). Gemini 3.1 Pro (80.6% SWE-bench) and GPT-5.2 (73.8% GPQA) are the primary alternatives to evaluate on those specific benchmarks.

Training data for Claude Sonnet 4.6 includes a curated mix of public web text, licensed datasets, and synthetic reasoning traces. Anthropic does not train on API inputs by default; inputs are retained for up to 30 days for safety and abuse monitoring and then deleted unless flagged. Enterprise customers can request a zero-retention arrangement. The model is deployed under Anthropic's Responsible Scaling Policy, and SOC 2 Type II, HIPAA-eligible, and GDPR-compliant configurations are available through the Anthropic API and via AWS Bedrock's data governance controls. Anthropic's trust center is at anthropic.com/transparency.

Claude Sonnet 4.6 replaced Claude Sonnet 4.5 as the default model on claude.ai for Free and Pro users on its launch date. The older Claude Sonnet 4 (claude-sonnet-4-20250514) and Claude Opus 4 (claude-opus-4-20250514) models are deprecated and will be retired on June 15, 2026; Anthropic's official migration target for Sonnet 4 users is Sonnet 4.6.
Claude Sonnet 3.7 and Haiku 3.5 have already been retired. The next Sonnet-class model is expected to be Sonnet 4.8, with no announced release date. Sonnet 4.6 is the current production standard for mid-tier Claude deployments as of May 2026.
Pricing
$3.00 per 1M input tokens, $15.00 per 1M output tokens. Prompt caching: 5-minute cache write $3.75/MTok, 1-hour cache write $6.00/MTok, cache reads $0.30/MTok. Batch API: $1.50 input / $7.50 output per 1M tokens with 300K output ceiling via beta header. Regional endpoints add 10% premium.
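The rates above translate into a simple back-of-the-envelope estimator. This is plain arithmetic over the published prices, not an official SDK utility; the function name is illustrative:

```python
# Back-of-the-envelope cost estimator using the published Sonnet 4.6 rates.
# All figures in USD; per-token rate = price per 1M tokens / 1,000,000.

INPUT_RATE = 3.00 / 1_000_000    # $3.00 per 1M input tokens
OUTPUT_RATE = 15.00 / 1_000_000  # $15.00 per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int, batch: bool = False) -> float:
    """Estimate the cost of one workload; the Batch API halves both rates."""
    multiplier = 0.5 if batch else 1.0
    return multiplier * (input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE)

# The daily coding agent example from the text: 1M input + 200K output tokens.
print(round(estimate_cost(1_000_000, 200_000), 2))              # 6.0
print(round(estimate_cost(1_000_000, 200_000, batch=True), 2))  # 3.0
```

Cache reads and regional premiums are not modeled here; a fuller estimator would price cached prefix tokens at $0.30/MTok and apply the 10% regional multiplier where applicable.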
Key Features
- Adaptive Thinking: Replaces fixed extended thinking with a dynamic system that allocates reasoning tokens based on task complexity and the configured effort level (low/medium/high/max). The model averages 246 thinking tokens per question on standard tasks, reducing overhead on simple queries while scaling up for hard ones.
- 1M Token Context Window: Generally available as of March 13, 2026, at standard per-token pricing. No beta header or long-context premium required. Accepts up to 600 images or PDFs per request alongside text.
- Computer Use: Supports GUI navigation, spreadsheet manipulation, and multi-step web form completion. Scores 72.5% on OSWorld-Verified, matching Opus 4.6 and more than doubling GPT-5.2's 38.2% score.
- Prompt Caching: Caches system prompts, documents, and conversation history. Cache reads cost $0.30 per million tokens, a 90% reduction from the $3.00 base input rate. The 1-hour TTL supports long-running agent sessions without re-processing context on every turn.
- 300K Batch Output: The Message Batches API supports up to 300,000 output tokens per call via the output-300k-2026-03-24 beta header, paired with a 50% price reduction. Suited for large code generation, documentation, and report synthesis tasks.
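The prompt-caching flow can be sketched as a request body. The cache_control marker follows the Messages API convention of flagging a stable prefix as ephemeral; the system prompt here is a placeholder, and the overall payload shape assumes current Anthropic API conventions carry over to this model:

```python
# Sketch of a prompt-caching request body. The stable system prompt is marked
# with cache_control so later requests reusing it bill cache reads at
# $0.30/MTok instead of the $3.00/MTok base input rate.

LARGE_SYSTEM_PROMPT = "You are a code-review assistant. " * 200  # stable prefix (placeholder)

def build_cached_request(user_turn: str) -> dict:
    return {
        "model": "claude-sonnet-4-6",
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": LARGE_SYSTEM_PROMPT,
                # Mark the stable prefix as cacheable.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        # Per-turn user content stays outside the cached prefix.
        "messages": [{"role": "user", "content": user_turn}],
    }

request = build_cached_request("Review this diff for race conditions.")
```

With the official anthropic Python SDK, a dict like this would map onto `client.messages.create(**request)`; the first call pays the cache-write rate on the prefix, and subsequent calls within the TTL pay the read rate.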
Pros
- 79.6% SWE-bench Verified, within 1.2 points of Opus 4.6 at 60% of the price, making it the default choice for high-volume coding agents.
- 72.5% OSWorld computer use score with Opus-level prompt injection resistance; this is the first time a Sonnet-class model has cleared that safety bar.
- 1M token context window at no surcharge, allowing full-codebase or multi-document analysis without chunking or vector retrieval.
Cons
- GPQA Diamond 74.1% trails Opus 4.6 by 17.2 points; scientific reasoning tasks in chemistry, biology, or advanced physics still require the flagship model.
- No native audio I/O; voice products must add a separate transcription and synthesis layer.
- Closed weights; teams requiring on-device, air-gapped, or fine-tuned deployment cannot use this model.
Benchmarks
- MATH: 89%
- ARC-AGI-2: 60.4%
- GPQA Diamond: 74.1%
- SWE-bench Verified: 79.6%
Frequently Asked Questions
What is Claude Sonnet 4.6 and who built it?
Claude Sonnet 4.6 is a mid-tier large language model built by Anthropic and released on February 17, 2026. It sits in the Claude 4 model family between the budget-oriented Claude Haiku 4.5 and the research-grade Claude Opus 4.6, and it replaced Claude Sonnet 4.5 as the default model on claude.ai for Free and Pro users on launch day. The model uses a dense transformer architecture with an undisclosed parameter count. On the most important coding benchmark, SWE-bench Verified, it scores 79.6%, within 1.2 points of Claude Opus 4.6 (80.8%) at 60% of the price. On computer use (OSWorld), it scores 72.5%, nearly matching Opus 4.6 (72.7%) and far ahead of GPT-5.2 (38.2%). The model was specifically designed to close the performance gap between Anthropic's mid-tier and flagship tiers on the workloads most development teams rely on: agentic coding, long-document analysis, and GUI automation. Pricing starts at $3.00 per million input tokens and $15.00 per million output tokens, with a 1-million-token context window included at no surcharge.
How much does Claude Sonnet 4.6 cost per 1M tokens?
Claude Sonnet 4.6 costs $3.00 per million input tokens and $15.00 per million output tokens on the Anthropic API, confirmed on Anthropic's official pricing page as of May 2026. Prompt caching reduces input costs significantly: writing to a 5-minute cache costs $3.75 per million tokens, writing to a 1-hour cache costs $6.00 per million tokens, and cache reads cost $0.30 per million tokens, a 90% reduction from the base input rate. The Batch API offers a 50% discount, bringing prices to $1.50 input and $7.50 output per million tokens, with asynchronous result delivery. To illustrate real workload costs: summarizing a 100,000-token research paper costs roughly $0.30; running a daily coding agent at 1 million input tokens and 200,000 output tokens costs approximately $6.00; processing 1,000 customer support turns at an average of 2,000 input and 500 output tokens each costs roughly $13.50. Regional endpoints on AWS Bedrock and Google Vertex AI add a 10% premium over global routing. The pricing is identical to Claude Sonnet 4.5, maintaining Anthropic's $3/$15 per million tokens across four Sonnet generations.
What is Claude Sonnet 4.6's context window and max output?
Claude Sonnet 4.6 has a 1-million-token context window that became generally available on March 13, 2026, with no beta header required and no long-context price premium. A 900,000-token request bills at the same per-token rate as a 9,000-token request. The synchronous Messages API supports up to 64,000 output tokens per call, making it suitable for generating long documents, code files, and reports. For larger generation tasks, the Message Batches API supports up to 300,000 output tokens per call via the output-300k-2026-03-24 beta header, paired with the standard 50% Batch API discount. The model accepts up to 600 images or PDFs in a single request alongside text. Anthropic has not published a public needle-in-a-haystack recall benchmark for Sonnet 4.6, though the 1M context window is architecturally shared with Opus 4.6. The model also features adaptive context compaction, which summarizes older conversation turns as context fills, enabling sustained long-horizon agentic sessions without manual truncation. Claude Haiku 4.5, by contrast, has a 200K-token context window, and Claude Opus 4.6 has a 1M window with up to 128K output tokens.
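A batch entry opting into the larger output ceiling can be sketched as below. The beta header name is the one given above; the custom_id-plus-params shape follows the Message Batches API convention, and the header-passing mechanics are an assumption rather than confirmed documentation for this model:

```python
# Sketch of one Message Batches entry requesting the 300K-output beta.
# custom_id ties the asynchronous result back to this request.

batch_request = {
    "custom_id": "report-section-01",
    "params": {
        "model": "claude-sonnet-4-6",
        "max_tokens": 300_000,  # only valid with the beta header set
        "messages": [
            {"role": "user", "content": "Draft the full API reference for the payments module."}
        ],
    },
}

# The beta opt-in is assumed to travel as an anthropic-beta header on the
# batch submission itself:
headers = {"anthropic-beta": "output-300k-2026-03-24"}
```

Batch results are delivered asynchronously, so this pattern fits large one-shot generation jobs (documentation, reports, bulk code) rather than interactive use.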
How does Claude Sonnet 4.6 compare to Claude Opus 4.6 and GPT-5.2 on benchmarks?
Claude Sonnet 4.6 and Claude Opus 4.6 are nearly tied on coding and computer use: SWE-bench Verified is 79.6% for Sonnet versus 80.8% for Opus (a gap of 1.2 points), and OSWorld is 72.5% versus 72.7% (essentially identical). The meaningful gap between the two models appears in scientific reasoning: GPQA Diamond is 74.1% for Sonnet versus 91.3% for Opus, a 17.2-point difference that favors Opus for tasks in chemistry, biology, advanced physics, and graduate-level problem solving. Against GPT-5.2, Sonnet 4.6 leads clearly on computer use (72.5% versus 38.2% OSWorld) and is comparable on coding (79.6% versus approximately 78% SWE-bench). On GPQA Diamond, GPT-5.2 (73.8%) and Sonnet 4.6 (74.1%) are nearly identical. Gemini 3.1 Pro sits at 80.6% on SWE-bench, slightly ahead of Sonnet 4.6. On MATH, Sonnet 4.6 scores 89%, up sharply from 62% on Sonnet 4.5. The benchmark numbers reported above for Sonnet 4.6 and Opus 4.6 come from independent and vendor sources published in February-March 2026; GPT-5.2 scores are third-party estimates and should be treated with some caution.
Is Claude Sonnet 4.6 open source or proprietary?
Claude Sonnet 4.6 is fully proprietary. Anthropic has not released the model weights, and there is no self-hosted deployment option. Access is API-only through four platforms: the Anthropic API directly (api.anthropic.com), AWS Bedrock (model ID: anthropic.claude-sonnet-4-6), Google Vertex AI (model ID: claude-sonnet-4-6), and Microsoft Foundry. On Bedrock, the model supports global cross-region routing and geo cross-region routing across US, EU, AU, and JP geographies. On Vertex, global, multi-region, and regional endpoints are available. Authentication uses API keys for the Anthropic API and IAM credentials for Bedrock and Vertex. Regional and multi-region endpoints carry a 10% price premium. SDKs are available in Python, TypeScript, Java, Go, and Ruby; Go and Ruby do not support Microsoft Foundry. Commercial use is governed by Anthropic's Commercial Terms of Service. Teams that require on-device deployment, air-gapped inference, or the ability to fine-tune the base weights should instead evaluate open-weights models such as Meta's Llama 4 or Mistral's open-licensed models.
What modalities does Claude Sonnet 4.6 support?
Claude Sonnet 4.6 accepts text, images, PDFs, and tool-calls as input, and produces text and tool-calls as output. Vision is fully live: the model processes images and PDFs natively in the API without external OCR preprocessing. Up to 600 images or PDFs can be included in a single request, with individual images up to 8,000x8,000 pixels. There is no native audio input or output; teams building voice products must add a separate ASR layer for transcription and a TTS layer for speech synthesis. Video input is not supported. Function calling uses Anthropic's standard tool_use schema with support for parallel tool calls, structured JSON output, and the tool_choice parameter for enforcing specific tool invocation. Computer use is supported via a dedicated tool, enabling GUI navigation, spreadsheet manipulation, and multi-step web form completion, with a 72.5% OSWorld score. Adaptive thinking provides chain-of-thought reasoning with configurable effort levels. Compared to Google Gemini 3.1 Pro, which supports native audio and video input, Sonnet 4.6's modality coverage is narrower, but its computer use capability is significantly stronger.
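The tool_use schema described above can be illustrated with a minimal tool definition. The name/description/input_schema structure and the tool_choice parameter follow Anthropic's documented conventions; the tool itself (get_ticket_status) is hypothetical, invented for this example:

```python
# Minimal tool definition in Anthropic's tool_use schema. The tool name and
# fields are illustrative; input_schema is standard JSON Schema.

get_ticket_status_tool = {
    "name": "get_ticket_status",
    "description": "Look up the current status of a support ticket by ID.",
    "input_schema": {
        "type": "object",
        "properties": {
            "ticket_id": {
                "type": "string",
                "description": "Ticket identifier, e.g. 'TCK-1042'.",
            }
        },
        "required": ["ticket_id"],
    },
}

# Forcing the model to invoke this specific tool uses the tool_choice parameter:
request_fragment = {
    "tools": [get_ticket_status_tool],
    "tool_choice": {"type": "tool", "name": "get_ticket_status"},
}
```

When the model decides to call the tool, the response carries a tool_use content block whose input conforms to this JSON Schema; the caller executes the tool and returns a tool_result block on the next turn.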
Does Claude Sonnet 4.6 train on user data?
Anthropic does not train Claude Sonnet 4.6 on API inputs by default. API inputs and outputs are retained for up to 30 days for safety and abuse monitoring, and then deleted unless a specific retention flag is applied. Enterprise customers can request a zero-retention arrangement, in which inputs are not stored after processing. This zero-retention option is available through both the direct Anthropic API and AWS Bedrock. Anthropic holds SOC 2 Type II certification, is HIPAA-eligible for qualifying use cases, and is GDPR-compliant. The EU AI Act classifies Sonnet 4.6 as a general-purpose AI with systemic risk obligations. Anthropic's full compliance documentation is available at anthropic.com/transparency. On AWS Bedrock, data residency and governance controls are managed separately at the infrastructure level. On Google Vertex AI, data handling follows Google's enterprise data processing addendum. For US-only inference routing via the inference_geo parameter on the direct API, a 1.1x pricing multiplier applies on all token categories.
Who is Claude Sonnet 4.6 best for and who should avoid it?
Claude Sonnet 4.6 is the best choice for teams running agentic coding loops at scale: its 79.6% SWE-bench Verified score is near-Opus quality at 60% of Opus pricing, and the 1M-token context window enables full-codebase operations without chunking. It is also the leading model for computer use and GUI automation pipelines, with a 72.5% OSWorld score that far exceeds GPT-5.2's 38.2%. Teams doing enterprise document comprehension or long-document RAG benefit from the 1M context at no premium and from Sonnet 4.6's near-Opus score on OfficeQA. Teams that should avoid Sonnet 4.6 include those requiring deep scientific reasoning: the 17-point GPQA gap versus Opus 4.6 (74.1% vs 91.3%) means Opus is the correct choice for chemistry, biology, and graduate-level problem solving. Real-time voice applications are ruled out by the absence of native audio I/O; teams should look at models with native audio support such as GPT-4o Audio. Teams with air-gapped or on-device deployment requirements should evaluate open-weights models like Llama 4 or Mistral, since Sonnet 4.6 is API-only with no self-hosting path.