Claude Sonnet 4.6: 1M Token Context, Features & Benchmarks

Summary: Released February 17, 2026, Claude Sonnet 4.6 delivers Opus-level intelligence at Sonnet pricing. Key features: 1 million token context window (GA March 13, 2026), adaptive and extended thinking modes, 72.5% OSWorld computer use score (vs Opus 4.6's 72.7%), 79.6% SWE-bench Verified, and first place on Finance Agent v1.1 and GDPval-AA benchmarks. Priced at $3 per million input tokens.

What Is Claude Sonnet 4.6?

Claude Sonnet 4.6 is Anthropic's mid-tier frontier model, released February 17, 2026. It delivers a 1 million token context window, adaptive and extended thinking modes, a 72.5% OSWorld computer use score, and Opus-level performance on financial and office benchmarks — all at $3 per million input tokens. This guide covers every major feature, benchmark, and practical use case worth knowing before you build with it or switch to it.

The Pricing Shift That Changes the Calculation

Every AI model release promises more capability. Claude Sonnet 4.6 does something more specific and commercially significant: it delivers performance that was previously exclusive to Opus-class models at a price that has not changed from Sonnet 4.5.

The pricing is $3 per million input tokens and $15 per million output tokens, identical to its predecessor. The capability is a different story. In Anthropic's own internal testing, developers using Claude Code preferred Sonnet 4.6 over Opus 4.5, the previous frontier model, 59% of the time. They cited fewer hallucinations, better instruction following, less overengineering, and more consistent execution of multi-step tasks. A mid-tier model, preferred over the last frontier. That is the headline.
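The economics are easy to check with back-of-the-envelope arithmetic. A minimal sketch, using the $3/$15 Sonnet rates above and assuming the roughly 5x Opus premium cited later in this article (the Opus figures are an illustration, not published pricing):

```python
# Sonnet 4.6 list pricing from this article; the Opus rates below assume
# the ~5x premium the article cites and are illustrative only.
def monthly_cost(input_mtok: float, output_mtok: float,
                 in_rate: float = 3.0, out_rate: float = 15.0) -> float:
    """USD cost for a month, given millions of tokens in each direction."""
    return input_mtok * in_rate + output_mtok * out_rate

# Example workload: 500M input tokens, 50M output tokens per month.
sonnet = monthly_cost(500, 50)                               # 2250.0
opus = monthly_cost(500, 50, in_rate=15.0, out_rate=75.0)    # 11250.0 (assumed rates)
```

At that workload the same capability tier drops from five figures to four per month, which is the whole commercial argument of this release.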

The 1M Token Context Window

When Sonnet 4.6 launched, the 1 million token context window was in beta. On March 13, 2026, it became generally available for both Claude Opus 4.6 and Sonnet 4.6 at standard pricing: no extra charge, no waitlist.

To make that number concrete: one million tokens holds approximately 750,000 words, an entire large codebase with its dependencies and tests, or up to 600 images and PDF pages in a single request.

Raw window size is only half the story. Previous long-context models suffered severe degradation in the middle of their input, a well-documented failure mode where the model loses track of information not near the beginning or end of the context. Opus 4.6 scores 78.3% on the MRCR v2 benchmark at 1 million tokens, placing it first among all leading models tested. Sonnet 4.6 tracks closely.

For Claude Code users specifically, this enables full-codebase loading for dependency reasoning that retrieval-augmented generation (RAG) routinely misses. The practical impact is not just a bigger context window — it is context the model reasons effectively across, which is a fundamentally different engineering challenge.
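Whether a given repository actually fits in the window is worth estimating before shipping it wholesale. A rough sketch using the common ~4 characters per token heuristic; the helper and the heuristic are illustrative, not Anthropic's tokenizer:

```python
import os

CHARS_PER_TOKEN = 4  # rough heuristic; real tokenizers vary by content

def estimate_tokens(root: str, exts=(".py", ".md", ".toml")) -> int:
    """Walk a source tree and estimate its total token count."""
    total_chars = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(exts):
                path = os.path.join(dirpath, name)
                try:
                    with open(path, encoding="utf-8", errors="ignore") as f:
                        total_chars += len(f.read())
                except OSError:
                    continue  # skip unreadable files
    return total_chars // CHARS_PER_TOKEN

def fits_in_window(root: str, window: int = 1_000_000) -> bool:
    """True if the estimated repo size fits in a 1M-token context."""
    return estimate_tokens(root) <= window
```

If the estimate lands near the limit, a real tokenizer count is worth the extra step before deciding between full-codebase loading and RAG.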

Hybrid Reasoning: Adaptive and Extended Thinking

Sonnet 4.6 introduces a two-mode reasoning architecture that changes how the model allocates its internal processing depending on what a task actually requires.

Adaptive Thinking

Adaptive thinking lets Claude automatically decide when and how deeply to reason before responding. Simple queries get fast responses. Complex multi-step problems trigger extended internal reasoning chains. The model adjusts dynamically without requiring the user to specify a reasoning mode. On the API, it can be enabled with:

"thinking": {"type": "adaptive"}

In the Claude.ai web interface and Claude Code, adaptive thinking is on by default for all users, including free tier.
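As a sketch, the request body looks like this; the model identifier string and the exact thinking parameter shape are taken from this article and should be verified against Anthropic's current API reference:

```python
# Build a Messages API request body with adaptive thinking enabled.
# The model id and the "thinking" parameter shape follow this article;
# confirm both against Anthropic's API documentation before use.
def build_request(prompt: str, model: str = "claude-sonnet-4-6") -> dict:
    return {
        "model": model,
        "max_tokens": 4096,
        "thinking": {"type": "adaptive"},  # let the model choose its depth
        "messages": [{"role": "user", "content": prompt}],
    }

req = build_request("Summarize this diff")
```

The point of the parameter is that nothing else changes: the same request serves both quick lookups and deep multi-step problems.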

Extended Thinking

Extended thinking gives developers explicit control over the model's reasoning depth. When enabled, Claude runs a visible chain-of-thought reasoning process before producing its final output. This is useful for high-stakes domains such as legal analysis, financial modeling, and complex debugging, where transparency of reasoning matters as much as the conclusion.

Anthropic's guidance is clear: Sonnet 4.6 performs strongly even with extended thinking off. For exploratory use, adaptive mode is the right default. For maximum accuracy on the hardest problems, extended thinking at maximum effort remains the strongest configuration.

Context Compaction: Extending Context Beyond the Window

Alongside the 1M window, Anthropic shipped context compaction in beta: a feature that automatically summarizes older portions of a conversation as it approaches context limits. The practical effect is that effective context length extends significantly beyond 1 million tokens for long-running sessions, because the model compresses what it has already processed into a compact representation rather than truncating.

In the Humanity's Last Exam evaluation, Claude models running with context compaction had compaction triggered at 50,000 tokens, with up to 3 million total tokens permitted across the full session. For BrowseComp, the ceiling was 10 million total tokens. This is not a theoretical limit. It is an architecture designed for production-grade agentic workflows that run for hours across large document sets.
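The mechanism can be sketched as a summarize-and-replace loop. Everything below is a stand-in: token_len approximates a tokenizer, and summarize would be a model call in a real deployment:

```python
def token_len(text: str) -> int:
    return len(text.split())  # crude stand-in for a real tokenizer

def summarize(messages: list[str]) -> str:
    # Placeholder: in production this would be a model call that compresses
    # the transcript; here we keep the first 5 words of each turn.
    return " / ".join(" ".join(m.split()[:5]) for m in messages)

def compact(history: list[str], trigger: int = 50_000) -> list[str]:
    """When the transcript exceeds the trigger, fold the older half into a
    single summary message and keep the recent half verbatim."""
    if sum(token_len(m) for m in history) <= trigger:
        return history
    mid = len(history) // 2
    return ["[summary] " + summarize(history[:mid])] + history[mid:]
```

Run repeatedly, this keeps the live context bounded while the total tokens processed across the session can grow far past the window, which is exactly the behavior the evaluation limits above describe.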

Context compaction is now included in the free tier.

Computer Use: The OSWorld Benchmark Leap

Sixteen months of progress on the OSWorld benchmark shows how quickly Anthropic's computer use capabilities have advanced:

| Model | OSWorld-Verified Score |
| --- | --- |
| Early Claude Sonnet (Oct 2024) | ~35% |
| Claude Sonnet 4.5 | ~58% |
| Claude Sonnet 4.6 | 72.5% |
| Claude Opus 4.6 | 72.7% |
| GPT-5.3 Codex | 64.7% |

OSWorld presents hundreds of real software tasks across Chrome, LibreOffice, VS Code, and more, running on a simulated computer with no special APIs or purpose-built connectors. The model sees the screen and interacts via mouse and keyboard, the same way a person would.
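That observe-act loop is straightforward to sketch. The Action type and the scripted policy below are illustrative stand-ins for a model that reads a screenshot and returns the next step:

```python
from dataclasses import dataclass

@dataclass
class Action:
    kind: str          # "click", "type", or "done"
    payload: str = ""

def scripted_policy(actions):
    """Stand-in for the model: replay a fixed action script."""
    it = iter(actions)
    def policy(screenshot: bytes) -> Action:
        return next(it)
    return policy

def run_agent(policy, take_screenshot, execute, max_steps: int = 50) -> list:
    """Observe the screen, ask the policy for the next action, execute it,
    and stop when the policy signals completion."""
    trace = []
    for _ in range(max_steps):
        action = policy(take_screenshot())
        trace.append(action)
        if action.kind == "done":
            break
        execute(action)
    return trace
```

A production harness would swap scripted_policy for a model call carrying the screenshot, and execute for real mouse and keyboard events; the control flow is the same.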

Sonnet 4.6's 72.5% score places it effectively tied with the Opus flagship, and nearly 8 percentage points ahead of OpenAI's Codex. In real-world production, early enterprise users are reporting human-level capability; one reported 94% accuracy on an insurance submission intake workflow.

Prompt injection resistance has also improved substantially. Attacks where malicious actors hide instructions in web content to hijack an agent's behavior were a genuine concern for production computer-use deployments. Sonnet 4.6 now performs on par with Opus 4.6 in safety evaluations on this vector.

Coding: Why Developers Are Choosing It Over the Previous Flagship

The Claude Code preference data is the most commercially significant signal in this release: in Anthropic's internal testing, developers preferred Sonnet 4.6 over Opus 4.5 in 59% of head-to-head sessions.

The stated reasons point directly at what makes long-session coding frustrating with earlier models: fewer hallucinations, better instruction following, less overengineering, and more consistent execution of multi-step tasks.

On SWE-bench Verified, Sonnet 4.6 scores 79.6%, competitive with the top cluster of models from OpenAI, Google, and Anthropic itself. With a prompt modification, Anthropic reported 80.2% on the same benchmark.

One enterprise customer, a code review platform, summarized it plainly: "Claude Sonnet 4.6 has meaningfully closed the gap with Opus on bug detection, letting us run more reviewers in parallel, catch a wider variety of bugs, and do it all without increasing cost."

Full Benchmark Reference

| Benchmark | Claude Sonnet 4.6 | Claude Opus 4.6 | GPT-5.3 Codex |
| --- | --- | --- | --- |
| OSWorld-Verified (computer use) | 72.5% | 72.7% | 64.7% |
| SWE-bench Verified (coding) | 79.6% | ~80% | ~80% |
| Terminal-Bench 2.0 | 59.1% | 65.4% | 75.1% |
| Finance Agent v1.1 | 63.3% (1st) | 60.1% | — |
| GDPval-AA (office tasks) | 1633 Elo (1st) | 1606 | — |
| ARC-AGI-2 (max effort) | 62% | Higher | — |
| OfficeQA (doc comprehension) | Matches Opus | Best | — |

The two wins that matter most commercially are Finance Agent v1.1 and GDPval-AA, where Sonnet 4.6 outperforms its own flagship. For financial services and enterprise document workflows, a Sonnet-priced model that beats Opus on the benchmarks that matter to those buyers is a structurally significant pricing event.

API Updates: What Is Now Generally Available

Alongside the model itself, Anthropic shipped a substantial set of API capability upgrades, generally available as of Sonnet 4.6's release.

The web search and fetch tools received an important upgrade: they now automatically write and execute code to filter and process search results, keeping only the relevant content in context. This improves both response quality and token efficiency. The model no longer dumps entire web pages into context when only three paragraphs are relevant.
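The filter-before-context pattern itself is simple. Below is a keyword-scoring sketch of the kind of throwaway code the tool might generate on the fly; the scoring rule is a stand-in, not Anthropic's implementation:

```python
def filter_page(page_text: str, query: str, keep: int = 3) -> str:
    """Score each paragraph by query-term overlap and keep only the
    top few, instead of loading the whole page into context."""
    terms = set(query.lower().split())
    paragraphs = [p.strip() for p in page_text.split("\n\n") if p.strip()]
    scored = sorted(
        paragraphs,
        key=lambda p: sum(t in p.lower() for t in terms),
        reverse=True,
    )
    return "\n\n".join(scored[:keep])
```

Even this naive filter captures the token-efficiency win: three relevant paragraphs enter the context instead of an entire page.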

Claude in Excel: MCP Connectors for Financial Workflows

For financial analysts and enterprise teams using Claude in Excel, Sonnet 4.6 introduces MCP connector support directly inside the Excel add-in. Supported integrations at launch include S&P Global, LSEG, Daloopa, PitchBook, Moody's, and FactSet.

If you have already configured MCP connectors in Claude.ai, those connections work automatically inside Excel with no additional setup. This is available on Pro, Max, Team, and Enterprise plans. Financial analysts can ask Claude to pull live market data, company financials, and research directly into their models without leaving Excel or running a separate API integration.

Long-Horizon Planning: The Vending-Bench Result

Anthropic used an unusual evaluation to showcase Sonnet 4.6's long-context reasoning: the Vending-Bench Arena, which pits AI models against each other in a simulated business competition run over a full simulated year.

Sonnet 4.6 developed a strategy none of its competitors matched. It invested heavily in infrastructure for the first ten simulated months, spending significantly more than competing models, then pivoted sharply to maximizing profitability in the final stretch. The timing of that pivot allowed it to finish well ahead of the competition.

This demonstrates that the model can reason across extended time horizons, maintain a plan through adversarial conditions, update its strategy based on evolving context, and execute a late-phase pivot. These are exactly the properties that separate an agent that completes a three-step task from one that runs a multi-week autonomous workflow. Sonnet 4.6 almost tripled Sonnet 4.5's average gains over the same simulated year.

Free Tier: What Everyone Gets Now

The release of Sonnet 4.6 as the default model for Free plan users is a meaningful capability expansion. Free tier now includes the model itself, adaptive thinking enabled by default, and context compaction.

Features previously gated behind Pro or higher are now in the free product by default.

Architecture: Dense Transformer, Not MoE

For developers evaluating Sonnet 4.6 for production deployment, one architectural detail is worth noting: Anthropic identifies Sonnet 4.6 as a dense transformer architecture, a deliberate departure from the Mixture of Experts (MoE) approach adopted by several competing frontier models.

In practice, inference behavior is more consistent and predictable than sparse-activation architectures. There is no routing variability between expert subnetworks that can cause inconsistent outputs on semantically similar inputs. For agentic workflows where reliability matters more than raw throughput, this is an architectural advantage worth factoring into model selection.

Sonnet 4.6 vs Opus 4.6: When to Use Which

The performance gap between Sonnet 4.6 and Opus 4.6 is now narrow enough in most domains that the choice is primarily economic, not capability-driven.

| Use Case | Recommended Model | Reason |
| --- | --- | --- |
| Everyday coding and PR review | Sonnet 4.6 | Near-identical SWE-bench, 5x cheaper |
| Financial analysis and OfficeQA | Sonnet 4.6 | Outperforms Opus on Finance Agent v1.1 |
| Computer use automations | Sonnet 4.6 | 72.5% vs 72.7%, effectively tied |
| Multi-agent orchestration | Opus 4.6 | Stronger at coordinating agent networks |
| Codebase refactoring (full repo) | Opus 4.6 | 128k max output vs Sonnet's 64k |
| Expert-level novel reasoning | Opus 4.6 | GPQA and ARC-AGI-2 advantage at depth |
| Long-context document retrieval | Opus 4.6 | 78.3% MRCR v2 vs Sonnet's slightly lower score |
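That decision table can be folded into a small routing helper. The task labels mirror this article's categories, and the model id strings are assumptions to be checked against official naming:

```python
# Map this article's use-case categories to the recommended model tier.
# Model id strings are illustrative; verify against Anthropic's docs.
ROUTING = {
    "coding": "claude-sonnet-4-6",
    "finance": "claude-sonnet-4-6",
    "computer_use": "claude-sonnet-4-6",
    "multi_agent": "claude-opus-4-6",
    "full_repo_refactor": "claude-opus-4-6",
    "novel_reasoning": "claude-opus-4-6",
    "long_context_retrieval": "claude-opus-4-6",
}

def pick_model(task: str, default: str = "claude-sonnet-4-6") -> str:
    """Default to Sonnet, the cheaper tier, for anything unclassified."""
    return ROUTING.get(task, default)
```

Defaulting unclassified work to Sonnet reflects the economics above: reach for Opus only where the table gives a concrete capability reason.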

How to Access Claude Sonnet 4.6

Claude Sonnet 4.6 is available immediately across every Anthropic surface: the API, the Claude.ai apps (where it is the free-tier default), Claude Code, and Claude in Excel on Pro, Max, Team, and Enterprise plans.

Final Thoughts

Claude Sonnet 4.6 is not an incremental update. It is a pricing and capability reset that shifts the decision calculus for every team currently paying Opus-tier prices for Opus-tier results.

A 1M token context window that retrieves reliably. Adaptive thinking that eliminates the reasoning-mode guessing game. Computer use that matches the flagship within rounding error. Financial and office benchmarks where Sonnet beats its own Opus model. All of it at $3 per million input tokens.

The model that was supposed to sit below the frontier has, in most production domains, become the frontier. What that means for pricing strategy, model selection, and inference cost planning is a conversation that is just beginning. Teams weighing ChatGPT, Gemini, or Microsoft Copilot alongside Claude will find the full decision framework in our LLM selection guide.

> Related: If you're hitting Claude's usage walls faster than expected, read our analysis of why Claude Max limits expire by Wednesday — the quiet tightening of capacity that most Pro users haven't noticed.

Frequently Asked Questions

What is Claude Sonnet 4.6's context window size?

Claude Sonnet 4.6 has a 1 million token context window, which became generally available on March 13, 2026, at no additional cost. One million tokens holds approximately 750,000 words, an entire large codebase with dependencies and tests, or up to 600 images and PDF pages in a single request. Context compaction further extends effective session length beyond 1 million tokens for long-running agentic workflows.

How does Claude Sonnet 4.6 compare to Opus 4.6?

Claude Sonnet 4.6 and Opus 4.6 are nearly identical on most production benchmarks. Sonnet 4.6 scores 72.5% on OSWorld (Opus scores 72.7%), 79.6% on SWE-bench Verified, and outperforms Opus on Finance Agent v1.1 (63.3% vs 60.1%) and GDPval-AA (1633 Elo vs 1606). Opus 4.6 retains advantages in multi-agent orchestration, full-repo refactoring (128k max output vs Sonnet's 64k), and ARC-AGI-2. Sonnet 4.6 costs approximately 5x less.

What is adaptive thinking in Claude Sonnet 4.6?

Adaptive thinking is a reasoning mode in Claude Sonnet 4.6 that automatically determines how deeply to reason before responding. Simple queries get fast responses; complex multi-step tasks trigger extended internal reasoning chains. It requires no configuration from the user and is enabled by default in Claude.ai and Claude Code for all plan tiers including free. On the API it is enabled via "thinking": {"type": "adaptive"}.

What is Claude Sonnet 4.6's OSWorld score?

Claude Sonnet 4.6 scores 72.5% on OSWorld-Verified, the benchmark for AI computer use across real software tasks in Chrome, LibreOffice, VS Code, and other applications. This is effectively tied with Claude Opus 4.6 (72.7%) and nearly 8 percentage points ahead of GPT-5.3 Codex (64.7%). Enterprise users have reported 94% accuracy on insurance submission intake workflows using Sonnet 4.6's computer use capabilities.

How much does Claude Sonnet 4.6 cost?

Claude Sonnet 4.6 is priced at $3 per million input tokens and $15 per million output tokens via the Anthropic API, unchanged from Sonnet 4.5. The 1 million token context window is included at standard pricing with no surcharge. Claude Sonnet 4.6 is also available as the default model on the free tier of Claude.ai, and on Pro, Max, Team, and Enterprise plans.