Qwen3.7-Max Review: 1M Context and 92.4 GPQA Diamond (2026)

Qwen3.7-Max: Alibaba's May 2026 agent model with 1M context, 92.4 GPQA Diamond, SWE-Pro 60.6, priced at $2.50/$7.50 per 1M tokens for reasoning tasks.

Qwen3.7-Max is Alibaba's proprietary reasoning agent model released May 19, 2026, offering a 1,000,000-token context window, 92.4 GPQA Diamond, and 60.6 on SWE-Pro. Standard API pricing on Alibaba Cloud DashScope is $2.50 per 1M input tokens and $7.50 per 1M output tokens, with cached reads at $0.25 per 1M.

Qwen3.7-Max, released May 19, 2026 by Alibaba's Qwen team, is a proprietary reasoning and agent model with a 1 million token context window. It scores 92.4 on GPQA Diamond and 60.6 on SWE-Pro, outpacing several frontier competitors on long-horizon autonomous coding tasks. Pricing is $2.50 per 1M input tokens and $7.50 per 1M output tokens on Alibaba Cloud DashScope, with cached reads at $0.25 per 1M.

Provider: Alibaba Cloud · Family: Qwen3.7

Context window: 1,000,000 tokens · Max output: 65,536

Input modalities: text, tool-calls · Output: text, tool-calls

About Qwen3.7-Max

Qwen3.7-Max is Alibaba Cloud's proprietary flagship reasoning and agent model, released May 19, 2026, one day ahead of its official announcement at the Alibaba Cloud Summit in Hangzhou. Built by the Qwen research team under Alibaba Group, the model sits at the top of the Qwen3.7 series, which also includes the multimodal Qwen3.7-Plus released June 3, 2026. The architecture is Mixture-of-Experts with an undisclosed but estimated parameter count exceeding 1 trillion, following the MoE lineage of its predecessor Qwen3-235B-A22B. It was designed specifically as an "Agent Frontier" model for long-horizon autonomous execution, multi-step reasoning, and tool-use workflows rather than general-purpose chat. On reasoning benchmarks, Qwen3.7-Max scores 92.4 on GPQA Diamond, 97.1 on HMMT 2026 Feb, and 90 on IMOAnswerBench, placing it among the strongest reasoning models in mid-2026. On agentic coding it achieves 60.6 on SWE-Pro and 78.3 on SWE-Multilingual, beating Qwen3.6-Max and matching or exceeding several proprietary competitors at a fraction of the cost. Artificial Analysis ranks it at 56.6 on its composite Intelligence Index with output speed of 113.1 tokens per second. LM Arena placed the Qwen3.7-Max-Preview at approximately Elo 1,475, ranked 13th overall and 7th for math as of May 2026. Compared to GPT-5.5 and Claude Opus 4.7, Qwen3.7-Max wins on cost efficiency per reasoning task while trailing on general assistant tasks and modality coverage. Qwen3.7-Max ships with a 1,000,000-token context window and a maximum output of 65,536 tokens per request. Alibaba Cloud DashScope lists a maximum input of 991,800 tokens after internal formatting overhead. This is a fourfold increase over Qwen3-Max's 262,144-token limit, enabling processing of large codebases, lengthy legal documents, or multi-session agent transcripts in a single call. Prompt caching applies to repeated system prompt content at $0.25 per 1M cached input tokens, a 90% discount that makes repeat-context agent loops significantly cheaper. Qwen3.7-Max is a text-only model: it accepts text and structured tool call inputs and produces text and tool call outputs, but does not support vision, audio, video, or image inputs. For multimodal workflows, Alibaba released Qwen3.7-Plus on June 3, 2026, which adds image and video understanding, deep reasoning, self-programming, tool invocation, verification testing, and autonomous iteration. Qwen3.7-Max supports native function calling with OpenAI-compatible tool schemas, structured JSON output, parallel tool calls, and an extended thinking mode for deeper chain-of-thought. The model targets sustained autonomous execution across hundreds or thousands of steps, including code writing, debugging, and office workflow automation. Standard pricing on Alibaba Cloud DashScope is $2.50 per 1M input tokens and $7.50 per 1M output tokens. Alibaba ran a 50% launch promotion from May 19 through June 22, 2026, at $1.25 and $3.75 per 1M respectively. Cached input reads cost $0.25 per 1M tokens, making long-context reuse very efficient. Summarizing a 200K-token document costs roughly $0.50 input plus $0.08 output at standard rates; a daily coding agent processing 1M tokens in and 100K tokens out costs about $3.25; a 1,000-turn customer support pipeline at 3K in and 800 out per turn runs roughly $8.10 per day. Compared to Claude Opus 4.7 at $15/M output or GPT-5 at $30-75/M, Qwen3.7-Max is among the most affordable frontier-tier options for output-heavy agentic workloads. Qwen3.7-Max is available via Alibaba Cloud Model Studio (DashScope API), OpenRouter (qwen/qwen3.7-max), Together AI, Fireworks AI, and ModelScope, all live from May 19, 2026. Together AI and Fireworks provide US-region hosting for teams with data residency requirements outside China. The model ID on DashScope is qwen3.7-max and authentication requires a DashScope API key obtained through Alibaba Cloud account creation. There are no open weights, no self-hosting option, and no fine-tuning support for Qwen3.7-Max. Qwen3.7-Max uses RLHF and instruction-tuning alignment, with a hallucination rate of 22.9% on TruthfulQA-style evaluations. The model abstains on more than 50% of questions it previously attempted in prior versions, reflecting a conservative posture on ambiguous inputs. Alibaba has not published a standalone system card for Qwen3.7-Max, though the official Qwen3.7 blog post at qwen.ai provides technical details and safety notes. The model refuses CSAM, weapons manufacturing instructions, and malware generation by default. Qwen3.7-Max is best for teams building text-based long-horizon autonomous agents, coding agents processing large codebases, and scientific reasoning pipelines where GPQA Diamond performance of 92.4 and competition math scores matter. Its SWE-Multilingual score of 78.3 makes it a strong pick for multilingual software engineering. Teams building multimodal applications requiring vision or audio should use Qwen3.7-Plus instead. Organizations with strict air-gapped or on-premise requirements cannot use Qwen3.7-Max (no open weights); Qwen3-32B under Apache 2.0 is the best open alternative. Teams prioritizing human preference alignment over raw reasoning scores may prefer Claude Opus 4.7 or GPT-5.5, which rank higher on LM Arena overall.

Pricing

$2.50 per 1M input, $7.50 per 1M output, $0.25 per 1M cached input (90% discount). A 50% launch promotion ran May 19 through June 22, 2026, at $1.25/$3.75 per 1M. Max output is 65,536 tokens per request.

Key Features

Pros

Cons

Benchmarks

Frequently Asked Questions

What is Qwen3.7-Max and who built it?

Qwen3.7-Max is a proprietary large language model built by Alibaba Cloud's Qwen team and released on May 19, 2026, with the official announcement at the Alibaba Cloud Summit in Hangzhou the following day. It is the flagship text-and-reasoning model of the Qwen3.7 series, sitting above the multimodal Qwen3.7-Plus released June 3, 2026. The architecture is Mixture-of-Experts with an estimated parameter count exceeding 1 trillion, following the MoE lineage of Qwen3-235B-A22B. Alibaba positioned it as an 'Agent Frontier' model designed for long-horizon autonomous execution, multi-step reasoning, and sustained tool-use workflows. On benchmarks, it scores 92.4 on GPQA Diamond, 60.6 on SWE-Pro, and 78.3 on SWE-Multilingual, placing it competitively among frontier models for reasoning and coding at mid-tier pricing. The context window is 1,000,000 tokens with up to 65,536 output tokens per request. Standard pricing is $2.50 per 1M input tokens and $7.50 per 1M output tokens on Alibaba Cloud DashScope.

How much does Qwen3.7-Max cost per 1M tokens?

Qwen3.7-Max is priced at $2.50 per 1M input tokens and $7.50 per 1M output tokens at standard rates on Alibaba Cloud DashScope, live from May 19, 2026. Alibaba ran a 50% launch promotion from May 19 through June 22, 2026, at $1.25 input and $3.75 output per 1M. Cached input reads cost $0.25 per 1M tokens, a 90% discount over the standard input rate, making prompt caching highly cost-effective for long-context agent loops. Summarizing a 200K-token document costs roughly $0.50 input plus $0.08 output at standard rates. A daily coding agent processing 1M tokens in and 100K out costs approximately $3.25 per day. A customer support pipeline at 1,000 turns averaging 3K input and 800 output tokens per turn costs roughly $8.10 per day. Compared to Claude Opus 4.7 at $15/M output and GPT-5 at $30-75/M, Qwen3.7-Max is one of the most affordable frontier-tier options for output-heavy agentic workloads.

What is Qwen3.7-Max's context window and max output?

Qwen3.7-Max supports a context window of 1,000,000 tokens with a maximum output of 65,536 tokens per request, confirmed by Alibaba Cloud DashScope documentation from May 2026. Effective max input is 991,800 tokens after internal formatting overhead. This is a fourfold increase over Qwen3-Max's 262,144-token window, enabling processing of full codebases, multi-session agent transcripts, and long legal documents in a single API call. Prompt caching applies to repeated system prompt content at $0.25 per 1M cached tokens, a 90% discount over standard input pricing. There is no separate extended-context tier; all API calls access the full 1M context at the same per-token rate. Compared to Claude Opus 4.7 at 200K and GPT-5 at 128K, the 1M window is a meaningful differentiator for large-scale agentic workflows. Long-context recall accuracy has not been independently benchmarked by third parties at the time of writing.

How does Qwen3.7-Max compare on benchmarks vs Claude Opus 4.7?

On GPQA Diamond, Qwen3.7-Max scores 92.4, placing it above Claude Opus 4.7 (approximately 78-80) on this reasoning benchmark. For competition-level math, Qwen3.7-Max achieves 97.1 on HMMT 2026 Feb and 90 on IMOAnswerBench, putting it among the strongest models available for hard reasoning at its price point. On agentic coding, it scores 60.6 on SWE-Pro and 78.3 on SWE-Multilingual, though Claude Opus 4.7 leads on standard SWE-bench Verified. On LM Arena as of May 2026, Qwen3.7-Max-Preview held approximately Elo 1,475 (ranked 13th), while Claude Opus 4.7 ranks higher in overall human preference. The Artificial Analysis Intelligence Index places Qwen3.7-Max at 56.6. At $7.50/M output versus $15/M for Claude Opus 4.7, Qwen3.7-Max delivers competitive reasoning at roughly half the output cost. Teams that primarily care about GPQA Diamond or competition math will find Qwen3.7-Max leads; teams prioritizing general assistant quality, tool orchestration reliability, and safety alignment will likely prefer Claude Opus 4.7.

Is Qwen3.7-Max open source or proprietary?

Qwen3.7-Max is fully proprietary and API-only: Alibaba has not released weights, architecture blueprints, or training code for this specific model. This distinguishes it from Alibaba's open-weight Qwen3 models (Qwen3-32B, Qwen3-14B, Qwen3-8B, Qwen3-4B, Qwen3-1.7B, Qwen3-0.6B) released under Apache 2.0 on Hugging Face and Ollama. Access to Qwen3.7-Max requires an API key from one of four providers: Alibaba Cloud DashScope, OpenRouter (qwen/qwen3.7-max), Together AI, or Fireworks AI. There are no self-hosting options, no VRAM requirements to consider, and no fine-tuning support for the Max variant. Alibaba has not indicated a timeline for open-weight release of Qwen3.7-Max. Teams needing open weights for self-hosting or compliance reasons should use Qwen3-32B (Apache 2.0) or Qwen3-235B-A22B on Hugging Face. Commercial use is permitted under Alibaba Cloud DashScope Terms of Service.

What modalities does Qwen3.7-Max support?

Qwen3.7-Max is a text-only model: it accepts text input and structured tool call inputs, and produces text output and tool call outputs. It does not support image, audio, video, or PDF inputs. For multimodal needs within the Qwen3.7 family, Alibaba released Qwen3.7-Plus on June 3, 2026, which adds image and video understanding, deep reasoning, self-programming, tool invocation, verification testing, and autonomous iteration. Qwen3.7-Max supports native function calling with OpenAI-compatible tool schemas, structured JSON output, and parallel tool calls. Extended thinking mode is available per request, generating visible chain-of-thought before the final answer. Computer use or screen-reading capabilities are not documented for this model. Audio workflows require a separate ASR model to transcribe speech before passing text to Qwen3.7-Max. The model is purpose-built for text-based agentic loops, not multimodal pipelines.

Does Qwen3.7-Max train on user data?

Alibaba Cloud's standard DashScope API terms do not explicitly guarantee zero data retention, and users should review the current DashScope privacy policy before sending sensitive data. Alibaba has indicated that API inputs are not used to train production models under its standard enterprise agreement, but this is not identical to a formally certified zero-retention policy. Enterprise customers can negotiate custom data handling agreements with Alibaba Cloud for stricter retention controls and audit logging. Data sent to Together AI or Fireworks AI endpoints is subject to those providers' separate privacy policies, which generally offer stronger data isolation and US-region residency. Qwen3.7-Max does not carry SOC 2 Type II, ISO 27001, or HIPAA certifications directly through DashScope at launch; Together AI and Fireworks offer their own compliance coverage that may satisfy these requirements. Teams with GDPR or HIPAA obligations should route through Together AI or Fireworks and verify their current certification status. The EU AI Act classification for Qwen3.7-Max has not been formally published by Alibaba as of May 2026.

Who is Qwen3.7-Max best for and who should avoid it?

Qwen3.7-Max is best for teams building text-based long-horizon autonomous agents, especially coding agents processing large codebases that need 1M context without chunking at affordable output pricing. Its 92.4 GPQA Diamond and 97.1 HMMT 2026 Feb scores make it a strong pick for scientific reasoning, quantitative analysis, and competition-level math pipelines. SWE-Multilingual at 78.3 makes it a strong choice for multilingual software engineering workflows. Teams building multimodal applications requiring vision, audio, or video input should use Qwen3.7-Plus instead, as Qwen3.7-Max will not process non-text inputs. Organizations with strict air-gapped or on-premise deployment requirements cannot use Qwen3.7-Max; Qwen3-32B under Apache 2.0 is the best open alternative. Real-time voice assistant teams will find the 2.72s TTFT latency and absence of audio I/O prohibitive. Teams prioritizing human preference alignment, safety certification, or general assistant polish should evaluate Claude Opus 4.7 or GPT-5.5, which rank higher on LM Arena overall despite higher output costs.

Visit Qwen3.7-Max Official Page