Name: Qwen3.7-Plus: 1M Context Vision Agent at $0.40/M (2026)
Brand: Alibaba Cloud
Price: 0.40 USD
Availability: InStock

Question 1

What is Qwen3.7-Plus and who built it?

Accepted Answer

Qwen3.7-Plus is a multimodal agent model built by the Qwen team at Alibaba Cloud, previewed May 20, 2026 at the Alibaba Cloud Summit in Hangzhou and shipped to general availability on June 1, 2026 via the Bailian platform, marketed internationally as Model Studio. It is a mixture-of-experts transformer sized to sit below the flagship Qwen3.7-Max in Alibaba's lineup, trading some raw text-reasoning ceiling for native vision and video understanding at a much lower price. It was designed to give agents 'eyes': the ability to read UI screenshots, charts, and video frames and act on what they see, rather than reasoning over text alone. It scores 39 on the Artificial Analysis Intelligence Index, above the 16-point average for tracked models. Alibaba built it specifically to compete for high-volume, budget-sensitive agent workloads rather than to top pure-reasoning leaderboards, a role Qwen3.7-Max is aimed at instead. It carries a 1M-token context window and $0.40/$1.60 per 1M token pricing.

Question 2

How much does Qwen3.7-Plus cost per 1M tokens?

Accepted Answer

Qwen3.7-Plus is priced at $0.40 per 1M input tokens and $1.60 per 1M output tokens, with a lower cached-input rate around $0.08 per 1M tokens for repeat-context calls. That is roughly one-sixth the per-token cost of its sibling Qwen3.7-Max, which lists at $2.50/$7.50 per 1M tokens for the same 1M context ceiling. A concrete example: a support pipeline reading 2,000 UI screenshots a day, at roughly 500 input tokens of image context and 200 output tokens per response, runs under $1 a day at these rates. A 500K-input/50K-output-token daily coding agent workload costs roughly $0.28 a day. No batch API discount tier has been publicly disclosed for this variant, unlike some competitor models. There is no self-hosting option, so there is no infrastructure cost alternative to the per-token rate; the model is API-only through Alibaba Cloud Bailian and third-party gateways like OpenRouter and Fireworks.

Question 3

What is Qwen3.7-Plus's context window and max output?

Accepted Answer

Qwen3.7-Plus carries a 1,000,000-token context window, matching its Qwen3.7-Max sibling. That is large enough to hold an entire multimodal document set, hours of screen-recording transcript, or a long multi-turn agent session in a single call without external retrieval. Alibaba has not published an independently verified needle-in-haystack recall result specifically for this variant, so long-context recall reliability above 500K tokens is unverified rather than confirmed. Max output token limits have not been precisely disclosed by Alibaba for this model; third-party API docs list output caps in the tens of thousands of tokens depending on the gateway. Compared to competitors, this 1M window matches Qwen3.7-Max and exceeds most competing multimodal models in its price bracket, most of which cap out around 128K-200K tokens. Document handling includes multi-image and video-frame inputs processed alongside text in the same context.

Question 4

How does Qwen3.7-Plus compare on benchmarks vs Qwen3.7-Max?

Accepted Answer

Qwen3.7-Plus scores 39 on the Artificial Analysis Intelligence Index, well below Qwen3.7-Max's 56.6 on the same index, reflecting that Plus trades reasoning ceiling for multimodal capability and lower price. Qwen3.7-Max separately posts 92.4 on GPQA Diamond and 60.6% on SWE-bench Pro; Alibaba has not published an equivalent sub-benchmark table for Plus, so a like-for-like SWE-bench or GPQA comparison between the two variants is not currently possible from public data. In practice this means Max should be the pick for hard reasoning and coding tasks judged purely on published benchmarks, while Plus is the pick when the task genuinely needs vision or video input and the budget doesn't support Max's $2.50/$7.50 rate. Neither variant currently discloses a fully independent, third-party-verified benchmark suite covering both models on identical tasks, which is a real gap for anyone trying to make a rigorous side-by-side technical decision. The intelligence-index gap (39 vs 56.6) is the clearest public signal of the tradeoff.

Question 5

Is Qwen3.7-Plus open source or proprietary?

Accepted Answer

Qwen3.7-Plus is proprietary and API-only. There is no downloadable checkpoint, no Hugging Face weights release, and no open license attached to this variant, unlike many smaller Qwen models (such as Qwen3.6-35B-A3B) that Alibaba has released with open weights. It ships exclusively through Alibaba Cloud's Bailian platform (Model Studio internationally) and through third-party API gateways including OpenRouter, Fireworks AI, and Together AI. This closed-weight approach is a departure from Alibaba's historical open-source-first strategy for the Qwen line and mirrors the same decision made for the Qwen3.7-Max flagship released alongside it. Commercial use is governed entirely by Alibaba Cloud's Model Studio API terms rather than a redistributable license file, since there are no weights to redistribute. Teams needing a self-hostable or fine-tunable Qwen model should look to the openly licensed Qwen3.6 series instead.

Question 6

What modalities does Qwen3.7-Plus support?

Accepted Answer

Qwen3.7-Plus accepts text, static images, and video as input, and produces only text as output; it does not generate images or video. Vision is treated as a first-class input channel: UI screenshots, charts, handwritten pages, and video frames can be reasoned about directly rather than requiring a separate OCR or captioning step first. Its agent capabilities include deep reasoning, self-programming, tool invocation, verification and testing, and autonomous iteration inside a single loop, letting it read a screen, decide on an action, call a tool, and check the result without a human step in between. It supports both Alibaba's native Bailian/DashScope API and the Anthropic Messages API protocol, which lets it plug into Claude Code and other tools already built for that schema. There is no audio input or output support disclosed for this variant. Compared to Qwen3.7-Max, which is text-only, the addition of vision and video is the entire reason Plus exists as a separate model rather than a cheaper tier of the same one.

Question 7

Does Qwen3.7-Plus train on user data?

Accepted Answer

Alibaba has not published a detailed, specific data retention or training-on-inputs policy for the Qwen3.7-Plus API at the time of writing, so this should be treated as unverified rather than confirmed either way. The Bailian platform (Model Studio) provides built-in safety guardrails that constrain what an autonomous agent session can do operationally, but no quantified refusal-rate, red-team partner list, or system card has been made public for this specific variant, unlike the fuller disclosure some Western frontier labs publish. No SOC 2, ISO 27001, HIPAA, or GDPR compliance certification has been publicly confirmed for this model as of this writing. Data residency is available through both Alibaba's China region (cn-beijing) and its international Singapore-based endpoint (ap-southeast-1), giving some regional data-handling choice even without a full compliance disclosure. Enterprise teams with strict data-governance requirements should request Alibaba's current terms directly before committing sensitive workloads, since the public documentation gap here is real.

Question 8

Who is Qwen3.7-Plus best for and who should avoid it?

Accepted Answer

Qwen3.7-Plus is best for teams building screen-reading or GUI automation agents, high-volume visual QA and UI-regression pipelines, and multimodal document or video-transcript analysis, where its 1M-token context and native vision/video input pay off at a sixth of Qwen3.7-Max's per-token cost. It should be avoided by teams making a hard technical decision purely on published reasoning benchmarks (SWE-bench, AIME, MMLU-Pro), since Alibaba has only fully disclosed those numbers for the Max variant, not Plus. It is also a poor fit for latency-sensitive real-time applications, since Artificial Analysis rates it as comparatively slow and verbose relative to peers at its price point, and for teams needing self-hosted or air-gapped deployment, since it ships closed-weight and API-only with no downloadable checkpoint. Competitors worth considering instead: Qwen3.7-Max for pure text-reasoning ceiling at higher cost, or GPT-5 mini / Gemini 2.5 Flash for teams wanting a comparably priced vision model with fuller independent benchmark disclosure.

Qwen3.7-Plus: 1M Context Vision Agent at $0.40/M (2026)

About Qwen3.7-Plus

Pricing

Key Features

Pros

Cons

Benchmarks

Frequently Asked Questions