Name: Qwen3.7-Max Review: 1M Context and 92.4 GPQA Diamond (2026)
Brand: Alibaba Cloud
Price: 2.50 USD
Availability: InStock

Question 1

What is Qwen3.7-Max and who built it?

Accepted Answer

Qwen3.7-Max is a proprietary large language model built by Alibaba Cloud's Qwen team and released on May 19, 2026, with the official announcement at the Alibaba Cloud Summit in Hangzhou the following day. It is the flagship text-and-reasoning model of the Qwen3.7 series, sitting above the multimodal Qwen3.7-Plus released June 3, 2026. The architecture is Mixture-of-Experts with an estimated parameter count exceeding 1 trillion, following the MoE lineage of Qwen3-235B-A22B. Alibaba positioned it as an 'Agent Frontier' model designed for long-horizon autonomous execution, multi-step reasoning, and sustained tool-use workflows. On benchmarks, it scores 92.4 on GPQA Diamond, 60.6 on SWE-Pro, and 78.3 on SWE-Multilingual, placing it competitively among frontier models for reasoning and coding at mid-tier pricing. The context window is 1,000,000 tokens with up to 65,536 output tokens per request. Standard pricing is $2.50 per 1M input tokens and $7.50 per 1M output tokens on Alibaba Cloud DashScope.

Question 2

How much does Qwen3.7-Max cost per 1M tokens?

Accepted Answer

Qwen3.7-Max is priced at $2.50 per 1M input tokens and $7.50 per 1M output tokens at standard rates on Alibaba Cloud DashScope, live from May 19, 2026. Alibaba ran a 50% launch promotion from May 19 through June 22, 2026, at $1.25 input and $3.75 output per 1M. Cached input reads cost $0.25 per 1M tokens, a 90% discount over the standard input rate, making prompt caching highly cost-effective for long-context agent loops. Summarizing a 200K-token document costs roughly $0.50 input plus $0.08 output at standard rates. A daily coding agent processing 1M tokens in and 100K out costs approximately $3.25 per day. A customer support pipeline at 1,000 turns averaging 3K input and 800 output tokens per turn costs roughly $8.10 per day. Compared to Claude Opus 4.7 at $15/M output and GPT-5 at $30-75/M, Qwen3.7-Max is one of the most affordable frontier-tier options for output-heavy agentic workloads.

Question 3

What is Qwen3.7-Max's context window and max output?

Accepted Answer

Qwen3.7-Max supports a context window of 1,000,000 tokens with a maximum output of 65,536 tokens per request, confirmed by Alibaba Cloud DashScope documentation from May 2026. Effective max input is 991,800 tokens after internal formatting overhead. This is a fourfold increase over Qwen3-Max's 262,144-token window, enabling processing of full codebases, multi-session agent transcripts, and long legal documents in a single API call. Prompt caching applies to repeated system prompt content at $0.25 per 1M cached tokens, a 90% discount over standard input pricing. There is no separate extended-context tier; all API calls access the full 1M context at the same per-token rate. Compared to Claude Opus 4.7 at 200K and GPT-5 at 128K, the 1M window is a meaningful differentiator for large-scale agentic workflows. Long-context recall accuracy has not been independently benchmarked by third parties at the time of writing.

Question 4

How does Qwen3.7-Max compare on benchmarks vs Claude Opus 4.7?

Accepted Answer

On GPQA Diamond, Qwen3.7-Max scores 92.4, placing it above Claude Opus 4.7 (approximately 78-80) on this reasoning benchmark. For competition-level math, Qwen3.7-Max achieves 97.1 on HMMT 2026 Feb and 90 on IMOAnswerBench, putting it among the strongest models available for hard reasoning at its price point. On agentic coding, it scores 60.6 on SWE-Pro and 78.3 on SWE-Multilingual, though Claude Opus 4.7 leads on standard SWE-bench Verified. On LM Arena as of May 2026, Qwen3.7-Max-Preview held approximately Elo 1,475 (ranked 13th), while Claude Opus 4.7 ranks higher in overall human preference. The Artificial Analysis Intelligence Index places Qwen3.7-Max at 56.6. At $7.50/M output versus $15/M for Claude Opus 4.7, Qwen3.7-Max delivers competitive reasoning at roughly half the output cost. Teams that primarily care about GPQA Diamond or competition math will find Qwen3.7-Max leads; teams prioritizing general assistant quality, tool orchestration reliability, and safety alignment will likely prefer Claude Opus 4.7.

Question 5

Is Qwen3.7-Max open source or proprietary?

Accepted Answer

Qwen3.7-Max is fully proprietary and API-only: Alibaba has not released weights, architecture blueprints, or training code for this specific model. This distinguishes it from Alibaba's open-weight Qwen3 models (Qwen3-32B, Qwen3-14B, Qwen3-8B, Qwen3-4B, Qwen3-1.7B, Qwen3-0.6B) released under Apache 2.0 on Hugging Face and Ollama. Access to Qwen3.7-Max requires an API key from one of four providers: Alibaba Cloud DashScope, OpenRouter (qwen/qwen3.7-max), Together AI, or Fireworks AI. There are no self-hosting options, no VRAM requirements to consider, and no fine-tuning support for the Max variant. Alibaba has not indicated a timeline for open-weight release of Qwen3.7-Max. Teams needing open weights for self-hosting or compliance reasons should use Qwen3-32B (Apache 2.0) or Qwen3-235B-A22B on Hugging Face. Commercial use is permitted under Alibaba Cloud DashScope Terms of Service.

Question 6

What modalities does Qwen3.7-Max support?

Accepted Answer

Qwen3.7-Max is a text-only model: it accepts text input and structured tool call inputs, and produces text output and tool call outputs. It does not support image, audio, video, or PDF inputs. For multimodal needs within the Qwen3.7 family, Alibaba released Qwen3.7-Plus on June 3, 2026, which adds image and video understanding, deep reasoning, self-programming, tool invocation, verification testing, and autonomous iteration. Qwen3.7-Max supports native function calling with OpenAI-compatible tool schemas, structured JSON output, and parallel tool calls. Extended thinking mode is available per request, generating visible chain-of-thought before the final answer. Computer use or screen-reading capabilities are not documented for this model. Audio workflows require a separate ASR model to transcribe speech before passing text to Qwen3.7-Max. The model is purpose-built for text-based agentic loops, not multimodal pipelines.

Question 7

Does Qwen3.7-Max train on user data?

Accepted Answer

Alibaba Cloud's standard DashScope API terms do not explicitly guarantee zero data retention, and users should review the current DashScope privacy policy before sending sensitive data. Alibaba has indicated that API inputs are not used to train production models under its standard enterprise agreement, but this is not identical to a formally certified zero-retention policy. Enterprise customers can negotiate custom data handling agreements with Alibaba Cloud for stricter retention controls and audit logging. Data sent to Together AI or Fireworks AI endpoints is subject to those providers' separate privacy policies, which generally offer stronger data isolation and US-region residency. Qwen3.7-Max does not carry SOC 2 Type II, ISO 27001, or HIPAA certifications directly through DashScope at launch; Together AI and Fireworks offer their own compliance coverage that may satisfy these requirements. Teams with GDPR or HIPAA obligations should route through Together AI or Fireworks and verify their current certification status. The EU AI Act classification for Qwen3.7-Max has not been formally published by Alibaba as of May 2026.

Question 8

Who is Qwen3.7-Max best for and who should avoid it?

Accepted Answer

Qwen3.7-Max is best for teams building text-based long-horizon autonomous agents, especially coding agents processing large codebases that need 1M context without chunking at affordable output pricing. Its 92.4 GPQA Diamond and 97.1 HMMT 2026 Feb scores make it a strong pick for scientific reasoning, quantitative analysis, and competition-level math pipelines. SWE-Multilingual at 78.3 makes it a strong choice for multilingual software engineering workflows. Teams building multimodal applications requiring vision, audio, or video input should use Qwen3.7-Plus instead, as Qwen3.7-Max will not process non-text inputs. Organizations with strict air-gapped or on-premise deployment requirements cannot use Qwen3.7-Max; Qwen3-32B under Apache 2.0 is the best open alternative. Real-time voice assistant teams will find the 2.72s TTFT latency and absence of audio I/O prohibitive. Teams prioritizing human preference alignment, safety certification, or general assistant polish should evaluate Claude Opus 4.7 or GPT-5.5, which rank higher on LM Arena overall despite higher output costs.

Qwen3.7-Max Review: 1M Context and 92.4 GPQA Diamond (2026)

About Qwen3.7-Max

Pricing

Key Features

Pros

Cons

Benchmarks

Frequently Asked Questions