Name: Kimi K2.7-Code: 256K Context & Open Weights (2026)
Brand: Moonshot AI
Price: 0.95 USD
Availability: InStock

Question 1

What is Kimi K2.7-Code and who built it?

Accepted Answer

Kimi K2.7-Code is a coding-specialist large language model released by Moonshot AI on June 12, 2026, the fifth major release in the Kimi K2 family in under a year. It uses a Mixture-of-Experts architecture with 1 trillion total parameters across 384 experts, of which 32 billion parameters (8 experts) activate per token, the same architecture as Kimi K2.5 and Kimi K2.6. The model is built for long-horizon, multi-step software engineering tasks: codebase analysis, debugging, refactoring, and agentic tool-calling. Moonshot reports a 21.8% gain on its proprietary Kimi Code Bench v2 versus K2.6, plus an 11.0% gain on Program Bench and a 31.5% gain on MLS Bench Lite. It also uses roughly 30% fewer reasoning tokens per task than K2.6. The model is positioned as Moonshot's top coding-focused release, sitting alongside the more general-purpose K2.6 in the Kimi lineup. It is named after Moonshot's Kimi consumer assistant brand.

Question 2

How much does Kimi K2.7-Code cost per 1M tokens?

Accepted Answer

On Moonshot's official API, Kimi K2.7-Code costs $0.95 per million input tokens, dropping to $0.19 per million on a cache hit, and $4.00 per million output tokens, the same headline price as the prior K2.6 release. OpenRouter offers a third-party routed option at roughly $0.75 per million input tokens and $3.50 per million output tokens. Because K2.7-Code uses about 30% fewer thinking tokens than K2.6 for equivalent tasks, the effective cost per completed task is lower even at the same per-token rate. As worked examples: loading a 200,000-token codebase context for one review costs about $0.19, a full day of agentic coding using roughly 2 million input and 400,000 output tokens costs about $3.50, and a 1,000-turn code-review bot averaging 3,000 input and 800 output tokens per turn costs about $6.05. Since the model is released under a Modified MIT license, teams can also self-host the weights for free and pay only for their own compute, with no per-token API fee. No provisioned-throughput tier has been announced for K2.7-Code specifically.

Question 3

What is Kimi K2.7-Code's context window and max output?

Accepted Answer

Kimi K2.7-Code supports a 256K-token context window, exactly 262,144 tokens, unchanged from Kimi K2.5 and K2.6. This is large enough to load substantial codebases, long agent conversation histories, or multi-file diffs in a single request. Moonshot has not published a separate maximum output token limit specific to K2.7-Code, though the prior K2.6 release used a per-step generation limit within its overall 262,144-token window. There is no separate extended-context tier; the 256K window is the standard offering across both the hosted API and self-hosted deployments. The model's always-on interleaved thinking mode preserves reasoning state across turns within that window, which Moonshot credits for the roughly 30% reduction in thinking-token usage versus K2.6 on equivalent tasks. Compared to closed competitors, 256K sits below context windows like Gemini 2.5 Pro's 1M+ tokens but above many standard 128K-context models. Document handling for multi-file codebases works by concatenating files into the context window rather than through a separate retrieval mechanism.

Question 4

How does Kimi K2.7-Code compare on benchmarks vs other coding models?

Accepted Answer

As of June 2026, Moonshot has only published proprietary benchmark deltas for K2.7-Code: a 21.8% improvement on its own Kimi Code Bench v2, an 11.0% improvement on Program Bench, and a 31.5% improvement on MLS Bench Lite, all measured against the prior K2.6 release, plus roughly 30% fewer thinking tokens per task. No independent third party has run SWE-bench Verified, SWE-bench Pro, GPQA Diamond, AIME 2025, MMLU-Pro, or ARC-AGI 2 on K2.7-Code, and some practitioners have publicly questioned whether the proprietary gains hold up on real-world tasks. For reference, the prior K2.6 model was independently reported as roughly tied with Claude and Gemini on SWE-bench Verified, and scored 90.5% on GPQA Diamond and 96.4% on AIME 2026, but those numbers belong to K2.6, not K2.7-Code, and should not be assumed to carry over. Until independent evaluations appear, teams comparing K2.7-Code to models like Qwen3-Coder, DeepSeek's coding variants, Claude 4.x, or GPT-5.x Codex should run their own benchmarks on representative tasks rather than relying on vendor-reported deltas. The absence of standard-suite scores is itself a meaningful gap for any model claiming frontier coding performance in 2026.

Question 5

Is Kimi K2.7-Code open source or proprietary?

Accepted Answer

Kimi K2.7-Code is open-weight: Moonshot AI publishes the full 1-trillion-parameter model on Hugging Face at moonshotai/Kimi-K2.7-Code under a Modified MIT License, which permits commercial use and self-hosting, including for large-scale deployments, with attribution requirements. The full weight set is roughly 340GB. Moonshot also ships a native INT4 quantization (the same approach used for Kimi K2 Thinking), and community quantized builds (including GGUF formats from groups like Unsloth) are available for smaller hardware. For self-hosting, FP16 inference of the 32 billion active parameters needs roughly 64GB or more of VRAM, with quantized builds reducing that requirement for multi-GPU consumer or workstation setups. For users who don't want to self-host, the same model is available through Moonshot's hosted API (OpenAI-compatible), OpenRouter, Cloudflare Workers AI, and Vercel's AI Gateway. There are no closed-only variants of K2.7-Code; the hosted API and the downloadable weights are the same model.

Question 6

What modalities does Kimi K2.7-Code support?

Accepted Answer

Kimi K2.7-Code accepts text, image, and video input through Moonshot's MoonViT vision encoder, a roughly 400-million-parameter component shared across the K2.5/K2.6/K2.7 family, which is useful for coding tasks involving screenshots, UI mockups, or diagrams. Output is text and tool-calls; the model does not support audio input or audio output. It supports native function calling and structured output, with multi-step tool use including MCP-based environments, and preserves interleaved reasoning across tool calls via its always-on thinking mode (preserve_thinking). There is no documented computer-use or screen-control capability for this release, unlike some agentic models from other vendors. Parallel tool calls are supported as part of its agentic coding design, though Moonshot has not published exact concurrency limits. Compared to K2 Thinking, which emphasized browsing (60.2% on BrowseComp), K2.7-Code's modality focus is coding and multimodal code-adjacent inputs rather than general web browsing, and web browsing is not listed as a supported capability.

Question 7

Does Kimi K2.7-Code train on user data?

Accepted Answer

When used through Moonshot's hosted API or the Kimi consumer app, yes: Moonshot's privacy policy states that user prompts and uploaded content may be used to improve and train future models, and that personal information may be shared with service providers and affiliates. There is no documented opt-out mechanism or zero-retention enterprise tier for the hosted API as of June 2026. Data sent to the hosted API and Kimi app is processed in China. Moonshot has not published SOC 2, ISO 27001, or HIPAA certifications, and has no disclosed trust center or EU AI Act classification. Third-party AI governance reviewers have recommended that EU-based or regulated organizations avoid sending personal data to the hosted API and instead self-host the open-weight model on their own infrastructure, where Moonshot's data-use policy does not apply. Independent security researchers have also reported issues including an exposed database containing chat logs and API keys and hardcoded encryption keys in Kimi's mobile app, which is a further reason regulated teams favor self-hosting. Self-hosted deployments are entirely under the operator's own data governance.

Question 8

Who is Kimi K2.7-Code best for and who should avoid it?

Accepted Answer

Kimi K2.7-Code is best for teams building agentic coding tools who want a frontier-scale, open-weight model with a 256K-token context window at a fraction of closed-model API prices, especially teams that plan to self-host since the Modified MIT license permits free commercial use of the full 1T-parameter weights. It's a good fit for cost-sensitive, high-volume coding agents, since the roughly 30% reduction in thinking tokens versus K2.6 lowers effective cost per task at the same $0.95/$4.00 per million token rate. It also suits developers comfortable running their own evaluations, since Moonshot has only published proprietary benchmark deltas, not standard-suite scores. Teams should avoid K2.7-Code if they operate in regulated industries needing SOC 2, HIPAA, or GDPR documentation, since none exist and the hosted API processes data in China with no training opt-out; Anthropic's Claude 4.x or OpenAI's GPT-5.x Codex are better documented choices there. It's also a weaker pick for teams that need independently verified frontier reasoning benchmarks (SWE-bench Verified, GPQA Diamond) to justify a procurement decision, and for teams without the infrastructure to self-host a 340GB, 64GB+ VRAM model if they want to avoid sending data to a Chinese-hosted API.

Kimi K2.7-Code: 256K Context & Open Weights (2026)

About Kimi K2.7-Code

Pricing

Key Features

Pros

Cons

Benchmarks

Frequently Asked Questions