Name: GLM-5.2: 1M Context, MIT License & 80.3% GPQA (2026)
Brand: Z.ai
Price: 1.40 USD
Availability: InStock

Question 1

What is GLM-5.2 and who built it?

Accepted Answer

GLM-5.2 is a large language model developed by Z.ai, formerly known as Zhipu AI, a Beijing-based AI company founded in 2019 from Tsinghua University. It was released on June 13, 2026 as the fourth model in the GLM-5 generation, following GLM-5 (February 2026), GLM-5-Turbo (March 2026), and GLM-5.1 (April 2026). The model is built on a 744-billion-parameter Mixture-of-Experts Transformer architecture with 40 billion active parameters per forward pass, trained on 28.5 trillion tokens across 78 layers with 256 experts per layer (8 activated). GLM-5.2 is positioned as Z.ai's coding-first flagship and ships with a 1-million-token context window, dual thinking modes (High and Max), and MIT-licensed open weights. It scores 80.3% on GPQA Diamond and 62.1% on SWE-bench Pro, both the highest published results for an open-weights model as of mid-2026. The model is available via the Z.ai developer API, Fireworks AI, OpenRouter, AWS Bedrock, and Google Vertex AI, plus self-hosted via HuggingFace weights.

Question 2

How much does GLM-5.2 cost per 1M tokens?

Accepted Answer

GLM-5.2 costs $1.40 per million input tokens and $4.40 per million output tokens via the Z.ai API and third-party providers including Fireworks AI, OpenRouter, AWS Bedrock, and Google Vertex AI. Cached input is priced at $0.26 per million tokens, an 81% discount on repeated prompt prefixes — relevant for long system prompts or repeated codebase context. At a blended 7:2:1 cache-hit to input to output ratio, the effective cost is approximately $0.90 per million tokens. A daily coding agent consuming 1 million input tokens and 200,000 output tokens costs approximately $2.28. Reviewing a 100,000-token codebase costs about $0.14. Batch-processing 1,000 support tickets (2K in, 500 out each) costs approximately $5.00. Compared to Claude Opus 4.8 ($5.00/$20.00 per 1M) and GPT-5.5 ($3.00/$15.00 per 1M), GLM-5.2 is significantly cheaper — 72% cheaper on input than Opus 4.8. Self-hosted via MIT-licensed weights from HuggingFace, inference cost is purely hardware.

Question 3

What is GLM-5.2's context window and max output?

Accepted Answer

GLM-5.2 has a 1,000,000-token context window, which is five times larger than its predecessor GLM-5.1's 200,000-token window and one of the largest available in any model as of June 2026. The maximum output per response is 131,072 tokens, four times GLM-5.1's 32,768 cap, enabling generation of full-file rewrites, long migration scripts, and multi-file diffs in a single response. The 1M context is live across all GLM Coding Plan tiers at launch — not gated to a premium tier. To handle the 1M context without quadratic attention costs, GLM-5.2 uses DeepSeek Sparse Attention (DSA) in layers four through 78, with the first three layers using standard dense attention. Independent needle-in-haystack recall evaluations at 1M depth have not been published as of launch. Compared to Gemini 3.1 Pro (1M context, $1.25/1M input) and Claude Sonnet 4.6 (1M context, $3.00/1M input), GLM-5.2 is competitively priced and uniquely offers MIT-licensed weights for the same context scale.

Question 4

How does GLM-5.2 compare on benchmarks vs Claude Opus 4.8 and GPT-5.5?

Accepted Answer

On GPQA Diamond, GLM-5.2 scores 80.3%, compared to Claude Opus 4.8 at approximately 74% and GPT-5.5 at approximately 76% — GLM-5.2 leads on this graduate-level reasoning benchmark. On AIME 2025, GLM-5.2 scores 86.67%, competitive with frontier proprietary models. On MMLU-Pro, GLM-5.2 scores 80.63%, slightly below GPT-5.5 and Claude Opus 4.8, which both exceed 83%. On SWE-bench Pro (coding), GLM-5.2 scores 62.1%, which is the top open-source result but below Claude Opus 4.8 which leads proprietary models on that benchmark. Claude Opus 4.8 tops LMArena human-preference rankings with an Elo of approximately 1430; GLM-5.2 has no published LMArena Elo as of mid-June 2026. The clearest GLM-5.2 advantage is that it achieves these results as an MIT-licensed open-weights model, while Claude Opus 4.8 and GPT-5.5 are closed-weights API-only. For teams that can self-host, GLM-5.2 delivers frontier-adjacent results at infrastructure cost only.

Question 5

Is GLM-5.2 open source or proprietary?

Accepted Answer

GLM-5.2 is fully open-source under the MIT license, which is one of the most permissive licenses available. The MIT license allows free commercial use, modification, redistribution, and sublicensing without restriction or royalty. Model weights are available on HuggingFace at zai-org/GLM-5.2 in FP16, FP8, and NVFP4 formats, plus community GGUF quantizations (Q2, Q4_K_M, Q8) for CPU and low-VRAM inference. Z.ai has released every GLM-5-generation model under the MIT license since July 2025. Self-hosting VRAM requirements range from 241 GB for 2-bit dynamic quantization (suitable for M4 Ultra Mac Studio with 256 GB) to 476 GB for Q4_K_M (requiring 2x A100 80GB or 4x RTX 6000 Ada) and 459 GB+ for NVFP4 (minimum 6x 96 GB GPUs for weights alone). Unlike DeepSeek which uses Apache 2.0, or Llama 4 which uses the Llama Community License with commercial restrictions above certain user thresholds, GLM-5.2's MIT license is unconditional for all users.

Question 6

What modalities does GLM-5.2 support?

Accepted Answer

GLM-5.2 is a text-in, text-out model. It accepts text and tool-call results as input and produces text and tool calls as output. It does not support image, audio, or video input natively. Vision capabilities in the Z.ai GLM-5 family are handled by the separate GLM-5V-Turbo model, which adds image and video understanding on top of the GLM-5 base. For function calling and structured output, GLM-5.2 uses an OpenAI-compatible schema: tool definitions are specified in the tools parameter, and the model generates either a text response or a tool_calls object. Parallel tool calls (multiple function calls in a single response) are supported. Streaming via server-sent events is also supported on the Z.ai API and Fireworks. Compared to Claude Opus 4.8 (text, image, PDF, tool-calls) and GPT-5.5 (text, image, audio, video), GLM-5.2's text-only modality is a meaningful limitation for multimodal workflows.

Question 7

Does GLM-5.2 train on user data?

Accepted Answer

Z.ai has stated that it does not train on API user inputs by default. However, a detailed data retention policy has not been publicly disclosed as of June 2026. The MIT-licensed self-hosted deployment has no data retention by definition since all inference runs on the operator's own hardware and no data is sent to Z.ai. For API usage, users should review Z.ai's current privacy policy at z.ai/privacy for up-to-date retention terms. Z.ai has not disclosed SOC 2 Type II, ISO 27001, HIPAA, or GDPR compliance certifications as of the GLM-5.2 release date, making it unsuitable for regulated healthcare or financial workloads that require documented compliance. For compliance-sensitive workloads, Anthropic (SOC 2, HIPAA, ISO 27001) or OpenAI (SOC 2, HIPAA) are the certified alternatives. AWS Bedrock and Google Vertex AI access to GLM-5.2 may inherit those platforms' compliance certifications, but this should be independently verified with the cloud provider.

Question 8

Who is GLM-5.2 best for and who should avoid it?

Accepted Answer

GLM-5.2 is best for engineering teams running autonomous coding agents over large monorepos, where its 1M-context window eliminates retrieval chunking complexity and its 62.1% SWE-bench Pro score confirms reliable task completion. It is ideal for open-source projects and budget-conscious startups that need frontier-quality reasoning at $1.40/1M input, compared to Claude Opus 4.8's $5.00/1M. Enterprises with data sovereignty requirements benefit from the MIT-licensed self-hosted option — no API key, no vendor dependency, no data leaving the building. Teams should avoid GLM-5.2 if they need vision, audio, or video capabilities (use GPT-5.5 or Gemini 3.1 Pro instead), HIPAA or SOC 2 compliance (use Anthropic or OpenAI), real-time low-latency responses at sub-1-second TTFT (use a smaller model or Fireworks' fast inference tier), or the highest possible reasoning performance on GPQA Diamond beyond 80% (use Claude Opus 4.8 or GPT-5.5 which are likely stronger on hard science). For multilingual non-English tasks, Qwen3-235B-A22B has broader coverage.

GLM-5.2: 1M Context, MIT License & 80.3% GPQA (2026)

About GLM-5.2

Pricing

Key Features

Pros

Cons

Benchmarks

Frequently Asked Questions