Kimi K2.7-Code: 256K Context & Open Weights (2026)
Kimi K2.7-Code is Moonshot AI's open-weight 1T-parameter coding model (June 2026), with 256K context, $0.95/$4.00 per million tokens, and Modified MIT license.
Kimi K2.7-Code is Moonshot AI's coding-specialist model (released June 12, 2026), a 1T-parameter/32B-active MoE with a 256K-token context window and Modified MIT open weights. Priced at $0.95/$4.00 per million input/output tokens (same as K2.6 but with about 30% fewer thinking tokens), it reports a 21.8% gain on Moonshot's Kimi Code Bench v2.
Kimi K2.7-Code, released June 12, 2026 by Moonshot AI, is a 1-trillion-parameter Mixture-of-Experts coding model with 32B active parameters and a 256K-token context window. It costs $0.95 per million input tokens and $4.00 per million output tokens. Moonshot reports a 21.8% gain on its Kimi Code Bench v2 over Kimi K2.6, with about 30% fewer thinking tokens, though no independent SWE-bench scores exist yet.
Provider: Moonshot AI · Family: Kimi K2
Context window: 262,144 tokens
Input modalities: text, image, video, tool-calls, code · Output: text, tool-calls, code
About Kimi K2.7-Code
Kimi K2.7-Code is a coding-specialist large language model released by Moonshot AI on June 12, 2026, the fifth major release in the Kimi K2 family in under a year. It is built on the same architecture as Kimi K2.5 and Kimi K2.6: a Mixture-of-Experts (MoE) transformer with 1 trillion total parameters across 384 experts, of which 8 experts (32 billion parameters) activate per token via Multi-head Latent Attention. The base architecture was pre-trained on 15.5 trillion tokens. K2.7-Code's specific job is long-horizon, multi-step software engineering: codebase analysis, debugging, refactoring, and tool-calling inside agentic coding loops, sitting at the top of Moonshot's lineup for coding work alongside the more general-purpose K2.6. Moonshot's own benchmark numbers show substantial gains over K2.6: +21.8% on Kimi Code Bench v2, +11.0% on Program Bench, and +31.5% on the multi-language MLS Bench Lite, alongside roughly 30% fewer reasoning tokens per task. These are proprietary Moonshot benchmarks, not the standard public suites. As of June 12, 2026, no independent third party has published SWE-bench Verified, SWE-bench Pro, GPQA Diamond, AIME 2025, MMLU-Pro, or ARC-AGI 2 scores for K2.7-Code, and some practitioners have publicly questioned whether the proprietary benchmark gains translate to real-world tasks. Prior K2.6 numbers (independently reported around parity with Claude and Gemini on SWE-bench Verified, 90.5% on GPQA Diamond, and 96.4% on AIME 2026) give a sense of the family's general tier, but should not be assumed to carry over directly to K2.7-Code's coding-tuned weights. The model supports a 256K-token (262,144) context window, unchanged from K2.5 and K2.6, enough to load large codebases, long agent transcripts, or extended multi-file diffs in a single call. K2.7-Code runs in an always-on, interleaved thinking mode with preserve_thinking enabled by default across turns, which Moonshot says is what drives the roughly 30% reduction in thinking-token usage versus K2.6 on equivalent tasks, since the model carries reasoning state forward instead of re-deriving it each turn. No separate extended-context tier has been announced; the 256K window is the standard offering across the API and self-hosted deployments. K2.7-Code is natively multimodal, accepting text, image, and video input through Moonshot's MoonViT vision encoder (roughly 400 million parameters), useful for coding tasks that involve screenshots, diagrams, or UI mockups. It supports function calling and multi-step tool use, including MCP-based environments, with interleaved thinking preserved across tool calls for coherent agent sessions. Output is text and tool-calls; there is no audio input or output. Computer-use style screen control is not documented for this release. Pricing on the official Moonshot API is $0.95 per million input tokens ($0.19 per million on a cache hit) and $4.00 per million output tokens, the same headline rate as K2.6, with the effective cost per completed task lower due to the thinking-token reduction. OpenRouter lists a third-party route at roughly $0.75 per million input and $3.50 per million output tokens. Because the weights are open under a Modified MIT license, teams can also self-host the full model for the cost of compute alone, with no per-token fee. A 200K-token codebase-context call costs about $0.19 on the official API; a full day of agentic coding (roughly 2 million input tokens and 400,000 output tokens) costs around $3.50. The model is available through the Moonshot API platform (OpenAI-compatible endpoint), OpenRouter, Cloudflare Workers AI, and Vercel's AI Gateway, with Together, Fireworks, and DeepInfra expected to add support within days of launch, following the pattern of prior K2 releases. For self-hosting, the full weight set is roughly 340GB and is published on Hugging Face under moonshotai/Kimi-K2.7-Code, with native INT4 quantization (the same approach used for Kimi K2 Thinking) and community GGUF builds. FP16 inference of the 32B active parameters needs roughly 64GB or more of VRAM; quantized builds reduce that requirement for smaller multi-GPU setups. Moonshot has not published a safety system card, jailbreak-resistance evaluation, or named third-party red-team partners for any K2 release, including K2.7-Code. The model's safety posture should be treated as permissive and largely undocumented: there is no disclosed content-filter API or moderation endpoint, and refusal behavior has not been independently characterized. Moonshot's privacy policy for the hosted API and Kimi app states that user prompts and uploaded content may be used to improve and train future models, with no documented opt-out or zero-retention enterprise tier, and data is processed in China. K2.7-Code is best suited for teams building self-hosted or budget-conscious agentic coding tools who want a frontier-scale open-weight model with a long context window and low per-token cost, and who are comfortable running their own evaluations rather than relying on vendor benchmark claims. It is a weaker choice for regulated industries that need SOC 2, HIPAA, or GDPR documentation, or for teams that specifically need independently verified SWE-bench Verified or GPQA Diamond scores to justify a model choice; Anthropic's Claude 4.x models, OpenAI's GPT-5.x Codex line, and Alibaba's Qwen3-Coder are the more documented alternatives on those axes. Training data details specific to K2.7-Code's coding fine-tuning have not been disclosed beyond the inherited 15.5-trillion-token pre-training corpus shared with K2.5 and K2.6; the training cutoff date for this release has not been independently confirmed. K2.7-Code reuses the K2.5/K2.6 architecture, context window, and Modified MIT license without changes, so the release is positioned as a fine-tuning and data update rather than a new base model. Moonshot's release cadence (K2 in July 2025, K2 Thinking in November 2025, K2.5 in January 2026, K2.6 in April 2026, K2.7-Code in June 2026) suggests another K2 release is likely within two to three months, continuing the pattern of incremental coding and agentic improvements on the same underlying architecture.
Pricing
Moonshot direct API: $0.95 per 1M input tokens ($0.19 per 1M on a cache hit), $4.00 per 1M output tokens, the same headline price as K2.6. OpenRouter offers a third-party route at roughly $0.75/$3.50 per 1M input/output. Self-hosting the open weights (Modified MIT license) has no per-token fee, only compute cost.
Key Features
- 1T/32B-Active MoE Architecture: 1 trillion total parameters across 384 experts, with 32 billion activated per token via 8 routed experts and Multi-head Latent Attention for efficient inference.
- 256K Context Window: Handles up to 262,144 tokens, enough to load large codebases or long agent transcripts in a single call.
- Always-On Interleaved Thinking: Runs in a forced reasoning mode with preserve_thinking enabled across turns for coherent multi-step coding agent sessions, using roughly 30% fewer thinking tokens than K2.6.
- Native Multimodal Input: Accepts text, image, and video input through the MoonViT ~400M-parameter vision encoder for coding tasks involving screenshots or diagrams.
- Modified MIT Open Weights: Full 1T-parameter weights downloadable from Hugging Face under a Modified MIT license for free self-hosting and commercial use.
Pros
- Reports +21.8% on Moonshot's Kimi Code Bench v2 versus K2.6, the largest coding gain in the K2 series to date.
- Cuts thinking-token usage by roughly 30% versus K2.6, lowering effective cost per agentic coding task at the same $0.95/$4.00 per million token price.
- Open weights under a Modified MIT license let teams self-host the full 1T-parameter model for free, unlike closed competitors.
Cons
- No independent third-party benchmark scores (SWE-bench Verified, GPQA Diamond, AIME 2025) published as of June 2026, only Moonshot's own proprietary benchmarks.
- No published safety system card, SOC 2, or GDPR certification; the hosted API processes data in China with no opt-out from model training.
- Self-hosting the full weights requires roughly 340GB of storage and 64GB+ VRAM for FP16 inference of the 32B active parameters.
Benchmarks
- program bench delta pct vs k2 6: 11
- mls bench lite delta pct vs k2 6: 31.5
- kimi code bench v2 delta pct vs k2 6: 21.8
- thinking token reduction pct vs k2 6: 30
Frequently Asked Questions
What is Kimi K2.7-Code and who built it?
Kimi K2.7-Code is a coding-specialist large language model released by Moonshot AI on June 12, 2026, the fifth major release in the Kimi K2 family in under a year. It uses a Mixture-of-Experts architecture with 1 trillion total parameters across 384 experts, of which 32 billion parameters (8 experts) activate per token, the same architecture as Kimi K2.5 and Kimi K2.6. The model is built for long-horizon, multi-step software engineering tasks: codebase analysis, debugging, refactoring, and agentic tool-calling. Moonshot reports a 21.8% gain on its proprietary Kimi Code Bench v2 versus K2.6, plus an 11.0% gain on Program Bench and a 31.5% gain on MLS Bench Lite. It also uses roughly 30% fewer reasoning tokens per task than K2.6. The model is positioned as Moonshot's top coding-focused release, sitting alongside the more general-purpose K2.6 in the Kimi lineup. It is named after Moonshot's Kimi consumer assistant brand.
How much does Kimi K2.7-Code cost per 1M tokens?
On Moonshot's official API, Kimi K2.7-Code costs $0.95 per million input tokens, dropping to $0.19 per million on a cache hit, and $4.00 per million output tokens, the same headline price as the prior K2.6 release. OpenRouter offers a third-party routed option at roughly $0.75 per million input tokens and $3.50 per million output tokens. Because K2.7-Code uses about 30% fewer thinking tokens than K2.6 for equivalent tasks, the effective cost per completed task is lower even at the same per-token rate. As worked examples: loading a 200,000-token codebase context for one review costs about $0.19, a full day of agentic coding using roughly 2 million input and 400,000 output tokens costs about $3.50, and a 1,000-turn code-review bot averaging 3,000 input and 800 output tokens per turn costs about $6.05. Since the model is released under a Modified MIT license, teams can also self-host the weights for free and pay only for their own compute, with no per-token API fee. No provisioned-throughput tier has been announced for K2.7-Code specifically.
What is Kimi K2.7-Code's context window and max output?
Kimi K2.7-Code supports a 256K-token context window, exactly 262,144 tokens, unchanged from Kimi K2.5 and K2.6. This is large enough to load substantial codebases, long agent conversation histories, or multi-file diffs in a single request. Moonshot has not published a separate maximum output token limit specific to K2.7-Code, though the prior K2.6 release used a per-step generation limit within its overall 262,144-token window. There is no separate extended-context tier; the 256K window is the standard offering across both the hosted API and self-hosted deployments. The model's always-on interleaved thinking mode preserves reasoning state across turns within that window, which Moonshot credits for the roughly 30% reduction in thinking-token usage versus K2.6 on equivalent tasks. Compared to closed competitors, 256K sits below context windows like Gemini 2.5 Pro's 1M+ tokens but above many standard 128K-context models. Document handling for multi-file codebases works by concatenating files into the context window rather than through a separate retrieval mechanism.
How does Kimi K2.7-Code compare on benchmarks vs other coding models?
As of June 2026, Moonshot has only published proprietary benchmark deltas for K2.7-Code: a 21.8% improvement on its own Kimi Code Bench v2, an 11.0% improvement on Program Bench, and a 31.5% improvement on MLS Bench Lite, all measured against the prior K2.6 release, plus roughly 30% fewer thinking tokens per task. No independent third party has run SWE-bench Verified, SWE-bench Pro, GPQA Diamond, AIME 2025, MMLU-Pro, or ARC-AGI 2 on K2.7-Code, and some practitioners have publicly questioned whether the proprietary gains hold up on real-world tasks. For reference, the prior K2.6 model was independently reported as roughly tied with Claude and Gemini on SWE-bench Verified, and scored 90.5% on GPQA Diamond and 96.4% on AIME 2026, but those numbers belong to K2.6, not K2.7-Code, and should not be assumed to carry over. Until independent evaluations appear, teams comparing K2.7-Code to models like Qwen3-Coder, DeepSeek's coding variants, Claude 4.x, or GPT-5.x Codex should run their own benchmarks on representative tasks rather than relying on vendor-reported deltas. The absence of standard-suite scores is itself a meaningful gap for any model claiming frontier coding performance in 2026.
Is Kimi K2.7-Code open source or proprietary?
Kimi K2.7-Code is open-weight: Moonshot AI publishes the full 1-trillion-parameter model on Hugging Face at moonshotai/Kimi-K2.7-Code under a Modified MIT License, which permits commercial use and self-hosting, including for large-scale deployments, with attribution requirements. The full weight set is roughly 340GB. Moonshot also ships a native INT4 quantization (the same approach used for Kimi K2 Thinking), and community quantized builds (including GGUF formats from groups like Unsloth) are available for smaller hardware. For self-hosting, FP16 inference of the 32 billion active parameters needs roughly 64GB or more of VRAM, with quantized builds reducing that requirement for multi-GPU consumer or workstation setups. For users who don't want to self-host, the same model is available through Moonshot's hosted API (OpenAI-compatible), OpenRouter, Cloudflare Workers AI, and Vercel's AI Gateway. There are no closed-only variants of K2.7-Code; the hosted API and the downloadable weights are the same model.
What modalities does Kimi K2.7-Code support?
Kimi K2.7-Code accepts text, image, and video input through Moonshot's MoonViT vision encoder, a roughly 400-million-parameter component shared across the K2.5/K2.6/K2.7 family, which is useful for coding tasks involving screenshots, UI mockups, or diagrams. Output is text and tool-calls; the model does not support audio input or audio output. It supports native function calling and structured output, with multi-step tool use including MCP-based environments, and preserves interleaved reasoning across tool calls via its always-on thinking mode (preserve_thinking). There is no documented computer-use or screen-control capability for this release, unlike some agentic models from other vendors. Parallel tool calls are supported as part of its agentic coding design, though Moonshot has not published exact concurrency limits. Compared to K2 Thinking, which emphasized browsing (60.2% on BrowseComp), K2.7-Code's modality focus is coding and multimodal code-adjacent inputs rather than general web browsing, and web browsing is not listed as a supported capability.
Does Kimi K2.7-Code train on user data?
When used through Moonshot's hosted API or the Kimi consumer app, yes: Moonshot's privacy policy states that user prompts and uploaded content may be used to improve and train future models, and that personal information may be shared with service providers and affiliates. There is no documented opt-out mechanism or zero-retention enterprise tier for the hosted API as of June 2026. Data sent to the hosted API and Kimi app is processed in China. Moonshot has not published SOC 2, ISO 27001, or HIPAA certifications, and has no disclosed trust center or EU AI Act classification. Third-party AI governance reviewers have recommended that EU-based or regulated organizations avoid sending personal data to the hosted API and instead self-host the open-weight model on their own infrastructure, where Moonshot's data-use policy does not apply. Independent security researchers have also reported issues including an exposed database containing chat logs and API keys and hardcoded encryption keys in Kimi's mobile app, which is a further reason regulated teams favor self-hosting. Self-hosted deployments are entirely under the operator's own data governance.
Who is Kimi K2.7-Code best for and who should avoid it?
Kimi K2.7-Code is best for teams building agentic coding tools who want a frontier-scale, open-weight model with a 256K-token context window at a fraction of closed-model API prices, especially teams that plan to self-host since the Modified MIT license permits free commercial use of the full 1T-parameter weights. It's a good fit for cost-sensitive, high-volume coding agents, since the roughly 30% reduction in thinking tokens versus K2.6 lowers effective cost per task at the same $0.95/$4.00 per million token rate. It also suits developers comfortable running their own evaluations, since Moonshot has only published proprietary benchmark deltas, not standard-suite scores. Teams should avoid K2.7-Code if they operate in regulated industries needing SOC 2, HIPAA, or GDPR documentation, since none exist and the hosted API processes data in China with no training opt-out; Anthropic's Claude 4.x or OpenAI's GPT-5.x Codex are better documented choices there. It's also a weaker pick for teams that need independently verified frontier reasoning benchmarks (SWE-bench Verified, GPQA Diamond) to justify a procurement decision, and for teams without the infrastructure to self-host a 340GB, 64GB+ VRAM model if they want to avoid sending data to a Chinese-hosted API.