Qwen: Alibaba Open-Source LLM May 2026
Last updated: 2026-05-19
Alibaba's AI assistant with 203M monthly active users, supporting 119+ languages, multimodal input, and models up to 1M token context windows.
Qwen is Alibaba's open-source LLM series, available free under Apache 2.0 for commercial use. Qwen 3.6, released May 2026, features Mixture-of-Experts architecture delivering 2x throughput versus Qwen 2.5, with a 1M token context window, 119+ language support, and six model size tiers. Ideal for enterprises deploying custom AI without per-token licensing costs.
About Qwen
Qwen (short for Tongyi Qianwen) is Alibaba Cloud's family of large language models and chat assistant, first launched in beta in April 2023 and opened to the public in September 2023. It reached 203 million monthly active users by February 2026, a 554% spike in a single month, and became the most-downloaded open-weight model family on Hugging Face with over 700 million downloads by January 2026, surpassing Meta's Llama in cumulative downloads.
The model family spans 0.5 billion to over 1 trillion parameters, with both dense and mixture-of-experts (MoE) architectures. The flagship Qwen3-235B model activates only 22 billion parameters per generation step, keeping inference costs low while delivering competitive results. A standout design choice is the hybrid thinking mode: users can toggle between fast non-thinking mode for quick answers and a slower deliberate reasoning mode for complex math, code, or analysis tasks. The Qwen 3.6 Plus Preview extends context to 1 million tokens and matches GPT-5 mini on SWE-bench Verified at 72.4.
Qwen Chat is the consumer-facing product, available on web at chat.qwen.ai, and via native apps for iOS, Android, Windows, and macOS. It handles text chat, document processing, image understanding, image generation, video understanding, web search, and code execution in a single interface. The underlying Qwen2.5-Coder model was trained on 5.5 trillion tokens and supports 92 programming languages.
API access runs through Alibaba Cloud's Model Studio (DashScope), which also offers an OpenAI-compatible endpoint. Qwen-Flash costs $0.10 per million input tokens, Qwen-Plus costs $0.40 per million input tokens, and Qwen-Max starts at $1.20 per million input tokens. All new API accounts get 1 million free tokens per model valid for 90 days. Over 90,000 enterprises have adopted Qwen models via Model Studio.
Qwen models are released under Apache 2.0, letting developers self-host or fine-tune without licensing restrictions. The Qwen Agent framework provides tooling for building multi-step AI workflows. Alibaba released Qwen 3.6-Plus on April 2, 2026, adding stronger coding and agent capabilities, continuing a rapid release cadence that has kept Qwen competitive against Western frontier models despite US chip export restrictions.
Pricing
Free tier: 1M tokens per model for 90 days after activating Model Studio. API pricing: Qwen-Flash $0.10/M input, $0.40/M output. Qwen-Plus $0.40/M input, $1.20/M output (non-thinking). Qwen-Max $1.20/M input, $6.00/M output (0-32K). 50% batch discount available. Qwen Chat consumer app is free to use.
Key Features
- Hybrid Thinking Mode: Users can switch between fast non-thinking mode for quick answers and a slow deliberate reasoning mode for complex tasks, giving direct control over the speed-vs-depth tradeoff in a single toggle.
- 1M Token Context Window: Qwen 3.6 Plus Preview supports a 1-million-token context window with up to 65,536 output tokens, enabling full-codebase analysis or book-length document processing in a single call.
- Multimodal Input: Text, Image, Audio, Video: Qwen Chat handles text, image understanding via Qwen2.5-VL, audio input via Qwen2.5-Audio (speech, music, natural sounds), and video understanding, all within the same chat interface.
- 119+ Language Support: Qwen3 models support 119 languages and dialects, with Qwen3.5 extending coverage to 201 languages, making it one of the broadest multilingual LLM families available.
- Coding Across 92 Programming Languages: Qwen2.5-Coder, trained on 5.5 trillion tokens, supports 92 programming languages and achieved 69.6% on SWE-bench Verified, placing it among the top open-weight coding models.
- Qwen 3.6 MoE Architecture with 2x Throughput: Qwen 3.6 uses mature Mixture-of-Experts sparse activation with AITemplate kernel fusion, delivering approximately 2x inference throughput compared to Qwen 2.5 on comparable hardware.
- OpenAI-Compatible API: Alibaba's DashScope provides an OpenAI-compatible Chat Completions endpoint, so existing OpenAI SDK integrations can switch to Qwen models by changing the base URL and API key.
- Qwen 3.6: 6 Size Tiers, Apache 2.0: Qwen 3.6 was released in May 2026 across six model size tiers under Apache 2.0, enabling royalty-free commercial deployment for businesses matching compute budgets to task complexity.
Pros
- Most-downloaded open-weight model family globally with 700M Hugging Face downloads by January 2026, giving it a large community, broad fine-tune availability, and active third-party support.
- API pricing is significantly cheaper than frontier alternatives: Qwen-Flash at $0.10/M input tokens is roughly 40x cheaper than GPT-4o, making it practical for high-volume production workloads.
- Apache 2.0 license means no vendor lock-in for self-hosted deployments, unlike Llama's custom license that restricts commercial use above 700M monthly active users.
- Hybrid thinking mode lets developers choose reasoning depth per query, which is a concrete cost-saving lever not available in GPT-4o or base Claude models.
- Qwen 3.6 Plus Preview ties GPT-5 mini on SWE-bench Verified at 72.4, showing frontier coding performance without frontier pricing.
Cons
- Qwen models refuse to answer questions about topics that conflict with the Chinese government's political positions, including Taiwan's government status and certain historical events, which limits use in journalism or political research.
- Image generation quality lags behind dedicated image models like Stable Diffusion or DALL-E 3, based on independent reviewer tests in 2025.
- Debugging and modifying existing codebases is weaker than writing new code from scratch, with reviewers noting fumbled refactors that a tool like Claude Sonnet handles better.
- The best API features and free-tier quotas are primarily available in the Singapore region; Global deployment has no free quota and Chinese Mainland endpoints are restricted to China-registered accounts.
Visit Qwen Official Website