GLM-5.2: 1M Context, MIT License & 80.3% GPQA (2026)
GLM-5.2 by Z.ai: 744B MoE, MIT-licensed model with 1M-token context and 80.3% GPQA Diamond. Costs $1.40/$4.40 per 1M tokens (June 2026). Tops open-source SWE-bench Pro at 62.1%.
GLM-5.2 is Z.ai's flagship released June 13, 2026 on a 744B MoE architecture with 40B active parameters and a 1,000,000-token context window. Priced at $1.40/$4.40 per 1M tokens under an MIT license, it scores 80.3% GPQA Diamond and 62.1% SWE-bench Pro.
GLM-5.2, released June 13, 2026 by Z.ai (formerly Zhipu AI), is a 744-billion-parameter Mixture-of-Experts model with 40B active parameters and a 1-million-token context window. It scores 80.3% GPQA Diamond and 62.1% on SWE-bench Pro, the top open-source result on both. MIT-licensed weights are downloadable from HuggingFace. API pricing is $1.40 per million input tokens and $4.40 per million output tokens.
Provider: Z.ai · Family: GLM-5
Context window: 1,000,000 tokens · Max output: 131,072
Input modalities: text, tool-calls · Output: text, tool-calls
About GLM-5.2
GLM-5.2 is an open-source large language model developed by Z.ai (formerly Zhipu AI), released on June 13, 2026. It is the fourth major release in the GLM-5 generation, following GLM-5 (February 11, 2026), GLM-5-Turbo (March 15, 2026), and GLM-5.1 (April 7, 2026). The model uses a Mixture-of-Experts (MoE) Transformer decoder architecture with 744 billion total parameters and approximately 40 billion active parameters per forward pass. It was trained on 28.5 trillion tokens. GLM-5.2 is positioned as the strongest open-source model for coding and long-context tasks and sits at the top of the Z.ai model lineup. It is released under the MIT license, which allows unrestricted commercial use, modification, and redistribution. GLM-5.2's headline benchmark results as of its June 2026 release include 80.3% on GPQA Diamond (graduate-level science reasoning), 86.67% on AIME 2025 (mathematical reasoning), 80.63% on MMLU-Pro (multitask academic knowledge), and 91.72% on MMLU (general knowledge). On coding benchmarks, it scores 62.1% on SWE-bench Pro and 81.0% on Terminal-Bench 2.1, both the highest published results for an open-weights model as of mid-2026. Compared to its predecessor GLM-5.1, which scored 58.4% on SWE-bench Pro and 62.0% on Terminal-Bench 2.1, GLM-5.2 represents significant coding improvements without increasing the parameter count. Z.ai published no vendor-reported benchmark sheet at launch, distinguishing it from other releases that self-report cherry-picked results. GLM-5.2 ships with a 1,000,000-token context window, expanding GLM-5.1's 200,000-token context by a factor of five. The maximum output per response is 131,072 tokens, quadruple the 32,768 cap on GLM-5.1. The 1M context is live across all GLM Coding Plan tiers at launch, not gated to a higher tier or separate API endpoint. Long-context recall quality at 1M depth has not been independently tested with needle-in-haystack evaluations as of the release date. The architecture uses DeepSeek Sparse Attention (DSA) in layers four through 78, with the first three layers using dense attention, which enables the 1M context without the quadratic scaling cost of fully dense attention. GLM-5.2 is a text-in, text-out model. It supports function calling, structured JSON output, and streaming via the Z.ai API, which is OpenAI-compatible. It does not natively support image, audio, or video input. Vision capabilities in the GLM-5 family are handled by the separate GLM-5V-Turbo model. GLM-5.2 adds a dual thinking-effort system not present in prior releases: High mode balances reasoning quality with response latency and is suited for standard coding tasks, code review, and generation; Max mode uses extended chain-of-thought for complex architecture decisions, multi-step debugging, and planning tasks where accuracy matters more than speed. API pricing is $1.40 per million input tokens and $4.40 per million output tokens. Cached input costs $0.26 per million tokens, an 81% discount on the repeated portion of long prompts. A blended rate (7:2:1 cache-hit to input to output ratio) works out to approximately $0.90 per million tokens across typical workloads. A daily coding agent consuming 1 million input tokens and 200,000 output tokens would cost approximately $2.28. A 100,000-token document review costs approximately $0.14 at input rates. The pricing is below GPT-5.5 ($3.00/$15.00) and Anthropic Claude Opus 4.8 ($5.00/$20.00) on both input and output, making GLM-5.2 the most cost-efficient frontier-class model with 1M-context as of June 2026. GLM-5.2 is available through the Z.ai developer API at docs.z.ai, Fireworks AI (serverless and on-demand, day-zero availability), OpenRouter (aggregating 9+ providers), AWS Bedrock (bedrock/us-east-1/zai.glm-5 and us-west-2), and Google Vertex AI (vertex_ai/zai-org/glm-5-maas). The Z.ai API is OpenAI-compatible, meaning teams using the OpenAI Python or TypeScript SDKs can switch by changing the base URL and API key with no other code changes. MIT-licensed weights are available on HuggingFace at zai-org/GLM-5.2 in FP16, FP8, and NVFP4 formats, plus community GGUF quantizations. For self-hosting, VRAM requirements range from 241 GB (2-bit dynamic quantization) to 476 GB (Q4_K_M) or 459 GB+ for NVFP4. vLLM is the recommended inference backend. Z.ai aligns GLM models using supervised fine-tuning and reinforcement learning from human feedback. No Constitutional AI or equivalent documented alignment method has been published. No formal responsible scaling policy exists. The model's safety posture is documented in model cards on HuggingFace and the GLM-5 technical report (arXiv:2602.15763). The MIT license means there is no technical enforcement of use restrictions, and the model may be fine-tuned or jailbroken without vendor recourse. For enterprise workloads requiring content filtering, Z.ai's hosted API applies moderation. No HIPAA, SOC 2, or ISO 27001 certifications have been disclosed as of the June 2026 release date. Training data cutoff is approximately May 2025 based on the GLM-5 generation documentation. GLM-5.2 is the right choice for engineering teams that need a 1M-context open-weights model for large codebase traversal, long-horizon autonomous coding agents, and multi-file refactors without vendor lock-in. Its MIT license makes it suitable for air-gapped enterprise deployment or fine-tuning on proprietary codebases. It is not the right choice for teams building voice or vision pipelines, as it lacks native audio and image inputs. Teams needing SOC 2 or HIPAA compliance should use Anthropic or OpenAI, which hold those certifications. For pure reasoning without code, DeepSeek R2 leads on GPQA Diamond and AIME benchmarks. For multilingual non-English workloads, Qwen3-235B-A22B has broader language coverage. GLM-5.2 marks the fourth Z.ai flagship release in five months, following GLM-5 in February 2026. Each release has expanded context by a meaningful factor: GLM-5 shipped 128K context, GLM-5.1 extended to 200K, and GLM-5.2 jumped to 1M. The SWE-bench Pro score improved from 58.4% (GLM-5.1) to 62.1% (GLM-5.2), a 3.7-point gain in 11 weeks. Z.ai's roadmap based on its public statements targets continued scaling of context, agent loop reliability, and multimodal support through the GLM-5V branch. The company's position as the first Chinese frontier AI lab to hold a public listing on the Hong Kong Stock Exchange provides capital continuity and institutional accountability for the roadmap.
Pricing
$1.40 per 1M input tokens, $4.40 per 1M output tokens. Cached input at $0.26 per 1M (81% discount). Available self-hosted via MIT-licensed weights at infrastructure cost only.
Key Features
- 1-Million-Token Context Window: Five times larger than GLM-5.1's 200K context, covering entire monorepos in a single request without retrieval chunking.
- Dual Thinking Modes: High mode balances latency and quality for standard coding tasks; Max mode uses extended chain-of-thought for complex multi-step architecture and debugging.
- MIT License with Open Weights: Full weights downloadable from HuggingFace in FP16, FP8, NVFP4, and GGUF formats, allowing self-hosting, air-gapped deployment, and unrestricted fine-tuning.
- OpenAI-Compatible API: Drop-in replacement for OpenAI API calls by changing the base URL and key — no SDK migration needed for existing GPT-based codebases.
- 80.3% GPQA Diamond: Highest GPQA Diamond score published for an open-source model as of June 2026, indicating strong graduate-level scientific reasoning.
- 131K Max Output Tokens: Four times GLM-5.1's output cap, enabling full-file rewrites, long migration scripts, and multi-file diff generation in a single response.
Pros
- Strongest open-weights coding model as of June 2026: 62.1% SWE-bench Pro and 81.0% Terminal-Bench 2.1, both top-of-class for MIT-licensed models.
- 1M-token context at $1.40/1M input is the most cost-efficient frontier-class long-context option, undercutting Claude Opus 4.8 by 72% on input price.
- MIT license and OpenAI-compatible API eliminate vendor lock-in and allow self-hosted, air-gapped, or fine-tuned deployments.
Cons
- No native vision, audio, or video input — multimodal tasks require a separate GLM-5V-Turbo call or a different model.
- No published SOC 2, HIPAA, or ISO 27001 certification, ruling it out for regulated healthcare, legal, or financial workloads.
- 2.24s time-to-first-token (TTFT) is higher than latency-optimized smaller models, making it unsuitable for real-time interactive chat.
Benchmarks
- mmlu: 91.72
- mmlu pro: 80.63
- aime 2025: 86.67
- gpqa diamond: 80.3
- swe bench verified: 62.1
- artificial analysis price blended per m: 0.9
- artificial analysis speed tokens per sec: 113
Frequently Asked Questions
What is GLM-5.2 and who built it?
GLM-5.2 is a large language model developed by Z.ai, formerly known as Zhipu AI, a Beijing-based AI company founded in 2019 from Tsinghua University. It was released on June 13, 2026 as the fourth model in the GLM-5 generation, following GLM-5 (February 2026), GLM-5-Turbo (March 2026), and GLM-5.1 (April 2026). The model is built on a 744-billion-parameter Mixture-of-Experts Transformer architecture with 40 billion active parameters per forward pass, trained on 28.5 trillion tokens across 78 layers with 256 experts per layer (8 activated). GLM-5.2 is positioned as Z.ai's coding-first flagship and ships with a 1-million-token context window, dual thinking modes (High and Max), and MIT-licensed open weights. It scores 80.3% on GPQA Diamond and 62.1% on SWE-bench Pro, both the highest published results for an open-weights model as of mid-2026. The model is available via the Z.ai developer API, Fireworks AI, OpenRouter, AWS Bedrock, and Google Vertex AI, plus self-hosted via HuggingFace weights.
How much does GLM-5.2 cost per 1M tokens?
GLM-5.2 costs $1.40 per million input tokens and $4.40 per million output tokens via the Z.ai API and third-party providers including Fireworks AI, OpenRouter, AWS Bedrock, and Google Vertex AI. Cached input is priced at $0.26 per million tokens, an 81% discount on repeated prompt prefixes — relevant for long system prompts or repeated codebase context. At a blended 7:2:1 cache-hit to input to output ratio, the effective cost is approximately $0.90 per million tokens. A daily coding agent consuming 1 million input tokens and 200,000 output tokens costs approximately $2.28. Reviewing a 100,000-token codebase costs about $0.14. Batch-processing 1,000 support tickets (2K in, 500 out each) costs approximately $5.00. Compared to Claude Opus 4.8 ($5.00/$20.00 per 1M) and GPT-5.5 ($3.00/$15.00 per 1M), GLM-5.2 is significantly cheaper — 72% cheaper on input than Opus 4.8. Self-hosted via MIT-licensed weights from HuggingFace, inference cost is purely hardware.
What is GLM-5.2's context window and max output?
GLM-5.2 has a 1,000,000-token context window, which is five times larger than its predecessor GLM-5.1's 200,000-token window and one of the largest available in any model as of June 2026. The maximum output per response is 131,072 tokens, four times GLM-5.1's 32,768 cap, enabling generation of full-file rewrites, long migration scripts, and multi-file diffs in a single response. The 1M context is live across all GLM Coding Plan tiers at launch — not gated to a premium tier. To handle the 1M context without quadratic attention costs, GLM-5.2 uses DeepSeek Sparse Attention (DSA) in layers four through 78, with the first three layers using standard dense attention. Independent needle-in-haystack recall evaluations at 1M depth have not been published as of launch. Compared to Gemini 3.1 Pro (1M context, $1.25/1M input) and Claude Sonnet 4.6 (1M context, $3.00/1M input), GLM-5.2 is competitively priced and uniquely offers MIT-licensed weights for the same context scale.
How does GLM-5.2 compare on benchmarks vs Claude Opus 4.8 and GPT-5.5?
On GPQA Diamond, GLM-5.2 scores 80.3%, compared to Claude Opus 4.8 at approximately 74% and GPT-5.5 at approximately 76% — GLM-5.2 leads on this graduate-level reasoning benchmark. On AIME 2025, GLM-5.2 scores 86.67%, competitive with frontier proprietary models. On MMLU-Pro, GLM-5.2 scores 80.63%, slightly below GPT-5.5 and Claude Opus 4.8, which both exceed 83%. On SWE-bench Pro (coding), GLM-5.2 scores 62.1%, which is the top open-source result but below Claude Opus 4.8 which leads proprietary models on that benchmark. Claude Opus 4.8 tops LMArena human-preference rankings with an Elo of approximately 1430; GLM-5.2 has no published LMArena Elo as of mid-June 2026. The clearest GLM-5.2 advantage is that it achieves these results as an MIT-licensed open-weights model, while Claude Opus 4.8 and GPT-5.5 are closed-weights API-only. For teams that can self-host, GLM-5.2 delivers frontier-adjacent results at infrastructure cost only.
Is GLM-5.2 open source or proprietary?
GLM-5.2 is fully open-source under the MIT license, which is one of the most permissive licenses available. The MIT license allows free commercial use, modification, redistribution, and sublicensing without restriction or royalty. Model weights are available on HuggingFace at zai-org/GLM-5.2 in FP16, FP8, and NVFP4 formats, plus community GGUF quantizations (Q2, Q4_K_M, Q8) for CPU and low-VRAM inference. Z.ai has released every GLM-5-generation model under the MIT license since July 2025. Self-hosting VRAM requirements range from 241 GB for 2-bit dynamic quantization (suitable for M4 Ultra Mac Studio with 256 GB) to 476 GB for Q4_K_M (requiring 2x A100 80GB or 4x RTX 6000 Ada) and 459 GB+ for NVFP4 (minimum 6x 96 GB GPUs for weights alone). Unlike DeepSeek which uses Apache 2.0, or Llama 4 which uses the Llama Community License with commercial restrictions above certain user thresholds, GLM-5.2's MIT license is unconditional for all users.
What modalities does GLM-5.2 support?
GLM-5.2 is a text-in, text-out model. It accepts text and tool-call results as input and produces text and tool calls as output. It does not support image, audio, or video input natively. Vision capabilities in the Z.ai GLM-5 family are handled by the separate GLM-5V-Turbo model, which adds image and video understanding on top of the GLM-5 base. For function calling and structured output, GLM-5.2 uses an OpenAI-compatible schema: tool definitions are specified in the tools parameter, and the model generates either a text response or a tool_calls object. Parallel tool calls (multiple function calls in a single response) are supported. Streaming via server-sent events is also supported on the Z.ai API and Fireworks. Compared to Claude Opus 4.8 (text, image, PDF, tool-calls) and GPT-5.5 (text, image, audio, video), GLM-5.2's text-only modality is a meaningful limitation for multimodal workflows.
Does GLM-5.2 train on user data?
Z.ai has stated that it does not train on API user inputs by default. However, a detailed data retention policy has not been publicly disclosed as of June 2026. The MIT-licensed self-hosted deployment has no data retention by definition since all inference runs on the operator's own hardware and no data is sent to Z.ai. For API usage, users should review Z.ai's current privacy policy at z.ai/privacy for up-to-date retention terms. Z.ai has not disclosed SOC 2 Type II, ISO 27001, HIPAA, or GDPR compliance certifications as of the GLM-5.2 release date, making it unsuitable for regulated healthcare or financial workloads that require documented compliance. For compliance-sensitive workloads, Anthropic (SOC 2, HIPAA, ISO 27001) or OpenAI (SOC 2, HIPAA) are the certified alternatives. AWS Bedrock and Google Vertex AI access to GLM-5.2 may inherit those platforms' compliance certifications, but this should be independently verified with the cloud provider.
Who is GLM-5.2 best for and who should avoid it?
GLM-5.2 is best for engineering teams running autonomous coding agents over large monorepos, where its 1M-context window eliminates retrieval chunking complexity and its 62.1% SWE-bench Pro score confirms reliable task completion. It is ideal for open-source projects and budget-conscious startups that need frontier-quality reasoning at $1.40/1M input, compared to Claude Opus 4.8's $5.00/1M. Enterprises with data sovereignty requirements benefit from the MIT-licensed self-hosted option — no API key, no vendor dependency, no data leaving the building. Teams should avoid GLM-5.2 if they need vision, audio, or video capabilities (use GPT-5.5 or Gemini 3.1 Pro instead), HIPAA or SOC 2 compliance (use Anthropic or OpenAI), real-time low-latency responses at sub-1-second TTFT (use a smaller model or Fireworks' fast inference tier), or the highest possible reasoning performance on GPQA Diamond beyond 80% (use Claude Opus 4.8 or GPT-5.5 which are likely stronger on hard science). For multilingual non-English tasks, Qwen3-235B-A22B has broader coverage.