Qwen3.7-Plus: 1M Context Vision Agent at $0.40/M (2026)

Qwen3.7-Plus is Alibaba's 2026 multimodal agent model: 1M-token context, native vision and video input, $0.40/$1.60 per 1M tokens, AA Intelligence Index 39.

Qwen3.7-Plus is Alibaba Cloud's multimodal agent model, released June 1, 2026, with a 1M-token context window, native vision and video input, and a 39 score on the Artificial Analysis Intelligence Index. It costs $0.40 input / $1.60 output per 1M tokens, about one-sixth Qwen3.7-Max's rate, and supports the Anthropic API protocol for agent tooling like Claude Code.

Qwen3.7-Plus, released by Alibaba Cloud on June 1, 2026, is a multimodal agent model with a 1M-token context window and a 39 score on the Artificial Analysis Intelligence Index. It costs $0.40 per 1M input tokens and $1.60 per 1M output tokens, roughly a sixth of Qwen3.7-Max, and reads images and video as native input for screen-reading and tool-calling agents.

Provider: Alibaba Cloud · Family: Qwen3.7

Context window: 1,000,000 tokens · Max output: 32,000

Input modalities: text, image, video, tool-calls · Output: text, tool-calls

About Qwen3.7-Plus

Qwen3.7-Plus is Alibaba Cloud's multimodal agent model, unveiled in preview at the Alibaba Cloud Summit in Hangzhou on May 20, 2026 and shipped to general availability on Alibaba Cloud's Bailian platform (marketed internationally as Model Studio) on June 1-2, 2026. It is the perception-and-action sibling to the text-only flagship Qwen3.7-Max, built by the same Qwen team inside Alibaba Group. Where Max is optimized for raw text reasoning and coding throughput, Plus is designed around a single mandate: give an agent eyes. It accepts text, static images, and video as input and reasons over all three in the same context, while still only producing text as output. On benchmarks, Alibaba and Artificial Analysis report Qwen3.7-Plus scoring 39 on the Artificial Analysis Intelligence Index v4.1, a composite spanning GDPval-AA v2, Terminal-Bench 2.1, SciCode, Humanity's Last Exam, GPQA Diamond, and AA-LCR. That places it well above the tracked-model average of 16 and among the leading models in its price bracket, though Artificial Analysis also flags it as comparatively slow and verbose relative to peers at the same cost. Alibaba has not published a full breakdown by individual sub-benchmark (SWE-bench, AIME, MMLU-Pro) specifically for the Plus variant at the time of writing, unlike the more heavily benchmarked Max sibling which posts 92.4 on GPQA Diamond and 60.6% on SWE-bench Pro. The defining feature is that vision and video are first-class inputs, not an afterthought bolted onto a text model. Feed it a UI screenshot, a chart, a handwritten page, or a clip of a browser session, and it can extract structure, answer questions about the content, or feed what it sees into a longer chain of tool-calling logic. This is read-only visual understanding: the model does not generate images or video, only reasons about them. The model carries a 1,000,000-token context window, matching Qwen3.7-Max, letting an agent hold an entire multi-hour screen-recording transcript or a large multimodal document set in a single call without external retrieval. On capabilities, Qwen3.7-Plus is pitched squarely at agent workloads: deep reasoning, self-programming, tool invocation, verification and testing, and autonomous iteration inside a single agent loop. It reads screens, navigates applications, writes code from visual templates or mockups, and invokes external tools without a human in the loop for each step. It supports the Anthropic Messages API protocol in addition to Alibaba's native Bailian/DashScope API, which lets it slot into Claude Code and other third-party coding tools that already speak that protocol. Pricing lists at $0.40 per 1M input tokens and $1.60 per 1M output tokens, with a lower cached-input rate around $0.08 per 1M tokens on repeat-context calls. That is roughly one-sixth the per-token cost of Qwen3.7-Max's $2.50/$7.50 rate card, which is the entire strategic point of the Plus tier: put multimodal agent capability within reach of budget-sensitive, high-volume pipelines rather than reasoning-per-token showcases. A worked example: a support pipeline reading 2,000 screenshots a day (roughly 500 tokens of image context each, 200 output tokens per response) runs well under $1/day at these rates, versus several times that on Max. Qwen3.7-Plus is available through Alibaba Cloud's Bailian/Model Studio platform and through third-party gateways including Fireworks, Together AI, and OpenRouter. Like Qwen3.7-Max, it ships closed-weight and API-only: there is no downloadable checkpoint and no open license attached to this variant, a departure from Alibaba's historical open-weight releases for smaller Qwen models. The Bailian platform layers built-in guardrails around autonomous tool use, capping what an agent session can do without an operational limit being hit, though Alibaba has not published a standalone system card with quantified refusal-rate or red-team figures for this specific model at time of writing. Who should reach for it: teams building screen-reading or GUI automation agents, visual QA pipelines, or multimodal document processing at volume, where Qwen3.7-Max's text-only input or GPT-5/Claude pricing would be overkill on cost. Teams that need heavily-published, independently verified reasoning benchmarks (SWE-bench Verified, AIME, MMLU-Pro) for a hard technical decision should look at Qwen3.7-Max or a competitor with fuller public benchmark disclosure instead, since Plus's public evaluation surface is thinner than its sibling's.

Pricing

$0.40 per 1M input tokens, $1.60 per 1M output tokens, roughly $0.08 per 1M cached input tokens. About one-sixth Qwen3.7-Max's per-token rate for the same 1M context window.

Key Features

Pros

Cons

Benchmarks

Frequently Asked Questions

What is Qwen3.7-Plus and who built it?

Qwen3.7-Plus is a multimodal agent model built by the Qwen team at Alibaba Cloud, previewed May 20, 2026 at the Alibaba Cloud Summit in Hangzhou and shipped to general availability on June 1, 2026 via the Bailian platform, marketed internationally as Model Studio. It is a mixture-of-experts transformer sized to sit below the flagship Qwen3.7-Max in Alibaba's lineup, trading some raw text-reasoning ceiling for native vision and video understanding at a much lower price. It was designed to give agents 'eyes': the ability to read UI screenshots, charts, and video frames and act on what they see, rather than reasoning over text alone. It scores 39 on the Artificial Analysis Intelligence Index, above the 16-point average for tracked models. Alibaba built it specifically to compete for high-volume, budget-sensitive agent workloads rather than to top pure-reasoning leaderboards, a role Qwen3.7-Max is aimed at instead. It carries a 1M-token context window and $0.40/$1.60 per 1M token pricing.

How much does Qwen3.7-Plus cost per 1M tokens?

Qwen3.7-Plus is priced at $0.40 per 1M input tokens and $1.60 per 1M output tokens, with a lower cached-input rate around $0.08 per 1M tokens for repeat-context calls. That is roughly one-sixth the per-token cost of its sibling Qwen3.7-Max, which lists at $2.50/$7.50 per 1M tokens for the same 1M context ceiling. A concrete example: a support pipeline reading 2,000 UI screenshots a day, at roughly 500 input tokens of image context and 200 output tokens per response, runs under $1 a day at these rates. A 500K-input/50K-output-token daily coding agent workload costs roughly $0.28 a day. No batch API discount tier has been publicly disclosed for this variant, unlike some competitor models. There is no self-hosting option, so there is no infrastructure cost alternative to the per-token rate; the model is API-only through Alibaba Cloud Bailian and third-party gateways like OpenRouter and Fireworks.

What is Qwen3.7-Plus's context window and max output?

Qwen3.7-Plus carries a 1,000,000-token context window, matching its Qwen3.7-Max sibling. That is large enough to hold an entire multimodal document set, hours of screen-recording transcript, or a long multi-turn agent session in a single call without external retrieval. Alibaba has not published an independently verified needle-in-haystack recall result specifically for this variant, so long-context recall reliability above 500K tokens is unverified rather than confirmed. Max output token limits have not been precisely disclosed by Alibaba for this model; third-party API docs list output caps in the tens of thousands of tokens depending on the gateway. Compared to competitors, this 1M window matches Qwen3.7-Max and exceeds most competing multimodal models in its price bracket, most of which cap out around 128K-200K tokens. Document handling includes multi-image and video-frame inputs processed alongside text in the same context.

How does Qwen3.7-Plus compare on benchmarks vs Qwen3.7-Max?

Qwen3.7-Plus scores 39 on the Artificial Analysis Intelligence Index, well below Qwen3.7-Max's 56.6 on the same index, reflecting that Plus trades reasoning ceiling for multimodal capability and lower price. Qwen3.7-Max separately posts 92.4 on GPQA Diamond and 60.6% on SWE-bench Pro; Alibaba has not published an equivalent sub-benchmark table for Plus, so a like-for-like SWE-bench or GPQA comparison between the two variants is not currently possible from public data. In practice this means Max should be the pick for hard reasoning and coding tasks judged purely on published benchmarks, while Plus is the pick when the task genuinely needs vision or video input and the budget doesn't support Max's $2.50/$7.50 rate. Neither variant currently discloses a fully independent, third-party-verified benchmark suite covering both models on identical tasks, which is a real gap for anyone trying to make a rigorous side-by-side technical decision. The intelligence-index gap (39 vs 56.6) is the clearest public signal of the tradeoff.

Is Qwen3.7-Plus open source or proprietary?

Qwen3.7-Plus is proprietary and API-only. There is no downloadable checkpoint, no Hugging Face weights release, and no open license attached to this variant, unlike many smaller Qwen models (such as Qwen3.6-35B-A3B) that Alibaba has released with open weights. It ships exclusively through Alibaba Cloud's Bailian platform (Model Studio internationally) and through third-party API gateways including OpenRouter, Fireworks AI, and Together AI. This closed-weight approach is a departure from Alibaba's historical open-source-first strategy for the Qwen line and mirrors the same decision made for the Qwen3.7-Max flagship released alongside it. Commercial use is governed entirely by Alibaba Cloud's Model Studio API terms rather than a redistributable license file, since there are no weights to redistribute. Teams needing a self-hostable or fine-tunable Qwen model should look to the openly licensed Qwen3.6 series instead.

What modalities does Qwen3.7-Plus support?

Qwen3.7-Plus accepts text, static images, and video as input, and produces only text as output; it does not generate images or video. Vision is treated as a first-class input channel: UI screenshots, charts, handwritten pages, and video frames can be reasoned about directly rather than requiring a separate OCR or captioning step first. Its agent capabilities include deep reasoning, self-programming, tool invocation, verification and testing, and autonomous iteration inside a single loop, letting it read a screen, decide on an action, call a tool, and check the result without a human step in between. It supports both Alibaba's native Bailian/DashScope API and the Anthropic Messages API protocol, which lets it plug into Claude Code and other tools already built for that schema. There is no audio input or output support disclosed for this variant. Compared to Qwen3.7-Max, which is text-only, the addition of vision and video is the entire reason Plus exists as a separate model rather than a cheaper tier of the same one.

Does Qwen3.7-Plus train on user data?

Alibaba has not published a detailed, specific data retention or training-on-inputs policy for the Qwen3.7-Plus API at the time of writing, so this should be treated as unverified rather than confirmed either way. The Bailian platform (Model Studio) provides built-in safety guardrails that constrain what an autonomous agent session can do operationally, but no quantified refusal-rate, red-team partner list, or system card has been made public for this specific variant, unlike the fuller disclosure some Western frontier labs publish. No SOC 2, ISO 27001, HIPAA, or GDPR compliance certification has been publicly confirmed for this model as of this writing. Data residency is available through both Alibaba's China region (cn-beijing) and its international Singapore-based endpoint (ap-southeast-1), giving some regional data-handling choice even without a full compliance disclosure. Enterprise teams with strict data-governance requirements should request Alibaba's current terms directly before committing sensitive workloads, since the public documentation gap here is real.

Who is Qwen3.7-Plus best for and who should avoid it?

Qwen3.7-Plus is best for teams building screen-reading or GUI automation agents, high-volume visual QA and UI-regression pipelines, and multimodal document or video-transcript analysis, where its 1M-token context and native vision/video input pay off at a sixth of Qwen3.7-Max's per-token cost. It should be avoided by teams making a hard technical decision purely on published reasoning benchmarks (SWE-bench, AIME, MMLU-Pro), since Alibaba has only fully disclosed those numbers for the Max variant, not Plus. It is also a poor fit for latency-sensitive real-time applications, since Artificial Analysis rates it as comparatively slow and verbose relative to peers at its price point, and for teams needing self-hosted or air-gapped deployment, since it ships closed-weight and API-only with no downloadable checkpoint. Competitors worth considering instead: Qwen3.7-Max for pure text-reasoning ceiling at higher cost, or GPT-5 mini / Gemini 2.5 Flash for teams wanting a comparably priced vision model with fuller independent benchmark disclosure.

Visit Qwen3.7-Plus Official Page