Name: Kimi K2.7 Code HighSpeed: 260 Tok/s & $0.95/M (2026)
Brand: Moonshot AI
Price: 0.95 USD
Availability: InStock

Question 1

What is Kimi K2.7 Code HighSpeed and who built it?

Accepted Answer

Kimi K2.7 Code HighSpeed is a fast-serving mode of Kimi K2.7 Code, an open-weight coding model built by Moonshot AI, a Beijing lab founded in 2023. The base K2.7 Code model launched on June 12, 2026 as a fine-tune of Moonshot's K2.6 checkpoint, and the HighSpeed serving mode followed on June 15, 2026, rolling out first to the Kimi Code Beta channel. The model is a Mixture-of-Experts transformer with 1 trillion total parameters, of which 32 billion are activated per token. It has 61 layers and 384 experts, with 8 routed experts plus 1 shared expert, and it uses Multi-head Latent Attention to compress the KV cache. Moonshot built it to push agentic coding and tool-use performance past K2.6 while cutting reasoning-token overhead by roughly 30%. It scores 62.0 on Moonshot's own Kimi Code Bench v2 and 81.1 on the third-party MCP Mark Verified tool-use benchmark, ahead of Claude Opus 4.8's 76.4 on that test. On Moonshot's native API it costs $0.95 per 1M input tokens and $4.00 per 1M output tokens.

Question 2

How much does Kimi K2.7 Code HighSpeed cost per 1M tokens?

Accepted Answer

On Moonshot's native Kimi API, Kimi K2.7 Code HighSpeed costs $0.95 per 1M input tokens and $4.00 per 1M output tokens, with cache-hit input priced at $0.19 per 1M tokens. OpenRouter lists it at lower rates, $0.74 input and $3.50 output per 1M tokens, for teams that want third-party routing instead of a direct Moonshot account. A daily coding agent pushing roughly 1M input tokens and 200K output tokens costs about $1.75 on the native API ($0.95 for input plus $0.80 for output). A 100K-token codebase review runs about $0.50; the 100K input tokens alone come to roughly $0.10, which implies that most of that figure is output tokens. Because the weights are open under a Modified MIT License, teams can also self-host for the cost of GPU infrastructure alone, with no per-token fee to Moonshot at all. That infrastructure means roughly 1TB of HBM for the FP8 weights, or about half that for the native INT4 build. There is no provisioned-throughput tier publicly listed.
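The per-token arithmetic above can be sketched as a small estimator. This is a minimal illustration, not an official billing tool: the `PRICES` table and the `estimate_cost` helper are hypothetical names, the rates are copied from this answer, and the sketch assumes cached input tokens are counted separately from uncached input.

```python
# Prices in USD per 1M tokens, copied from the answer above.
# No cache-hit rate is quoted for OpenRouter, so none is listed here.
PRICES = {
    "moonshot": {"input": 0.95, "cached_input": 0.19, "output": 4.00},
    "openrouter": {"input": 0.74, "output": 3.50},
}


def estimate_cost(provider, input_tokens, output_tokens, cached_input_tokens=0):
    """Estimate the USD cost of one workload (hypothetical helper).

    input_tokens counts uncached input only; cached_input_tokens is billed
    at the cache-hit rate when the provider lists one.
    """
    rates = PRICES[provider]
    if cached_input_tokens and "cached_input" not in rates:
        raise ValueError(f"no cache-hit rate listed for {provider}")
    total = (
        input_tokens * rates["input"]
        + output_tokens * rates["output"]
        + cached_input_tokens * rates.get("cached_input", 0.0)
    )
    return total / 1_000_000


# Daily agent workload from the answer: 1M input tokens, 200K output tokens.
print(f"{estimate_cost('moonshot', 1_000_000, 200_000):.2f}")    # 1.75
print(f"{estimate_cost('openrouter', 1_000_000, 200_000):.2f}")  # 1.44
```

On the same workload the OpenRouter rates work out to about $1.44, so the gap between the two routes is roughly $0.31 per day at this volume.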

Question 3

What is Kimi K2.7 Code HighSpeed's context window and max output?

Accepted Answer

Kimi K2.7 Code HighSpeed shares its 256K (262,144-token) context window with the standard K2.7 Code release, unchanged from the K2.6 base model. The per-step generation limit is 49,152 tokens, though the model can chain multiple generation steps within a single long-horizon agentic session up to the full 262,144-token context budget. Multi-head Latent Attention compresses the KV cache, which is what makes self-hosting at full context feasible: on an 8x H200 SXM5 node with roughly 1TB of HBM used for FP8 weights, about 128GB remains for KV cache at 256K context with small batch sizes. Moonshot has not published an independent needle-in-haystack or long-context recall evaluation for K2.7 Code specifically, so recall quality above 100K tokens is unverified rather than benchmarked. Document and multi-file handling relies on the same context budget as any other input; there is no separate extended-context tier.
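The 128GB figure follows from simple subtraction, sketched below. The helper name `kv_cache_headroom_gb` is hypothetical, "1TB" is treated as 1,000GB, and the 141GB per-GPU capacity is the published H200 HBM size. The sketch ignores activation memory and runtime overhead, so real headroom will be somewhat lower.

```python
def kv_cache_headroom_gb(num_gpus, hbm_per_gpu_gb, weights_gb):
    """Return the HBM left over after loading weights (hypothetical helper).

    Ignores activations, CUDA context, and fragmentation, so this is an
    upper bound on what is actually available for KV cache.
    """
    total_hbm_gb = num_gpus * hbm_per_gpu_gb
    headroom = total_hbm_gb - weights_gb
    if headroom < 0:
        raise ValueError("weights do not fit in the available HBM")
    return headroom


H200_HBM_GB = 141        # per-GPU HBM capacity of an H200
FP8_WEIGHTS_GB = 1000    # "roughly 1TB" from the answer above
INT4_WEIGHTS_GB = 500    # assumption: "about half" of the FP8 footprint

print(kv_cache_headroom_gb(8, H200_HBM_GB, FP8_WEIGHTS_GB))   # 128
print(kv_cache_headroom_gb(8, H200_HBM_GB, INT4_WEIGHTS_GB))  # 628
```

The INT4 line only illustrates how much extra room the smaller build would leave on the same node under the stated assumption; it is not a measured figure.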

Question 4

How does Kimi K2.7 Code HighSpeed compare on benchmarks vs GLM 5.2 and Claude Opus 4.8?

Accepted Answer

The comparison is uneven because Moonshot has not submitted K2.7 Code to any independently audited coding benchmark. On audited suites, GLM 5.2 scores 62.1% on SWE-bench Pro versus roughly 58.6% for K2.6 (K2.7 Code's predecessor; no K2.7 number exists), while Claude Opus 4.8 leads Terminal-Bench 2.1 at 85.0 against GLM 5.2's 81.0. Where K2.7 Code does win is MCP Mark Verified, a tool-invocation benchmark, where it scores 81.1 against Claude Opus 4.8's 76.4. On Moonshot's own Kimi Code Bench v2, K2.7 Code scores 62.0, a 21.8% relative gain (11.1 points) over K2.6's 50.9, but that benchmark has no independent verification or cross-vendor comparison points. In practice, K2.7 Code HighSpeed is a credible choice for tool-heavy agentic workflows, where it has a measured edge, and a weaker choice anywhere a team needs an audited SWE-bench or GPQA number to justify the pick.
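Because the answer mixes relative percentages with absolute score gaps, a short sketch of the difference may help. The function names are hypothetical and the scores are the ones quoted above.

```python
def point_gap(new_score, old_score):
    """Absolute difference in benchmark points."""
    return new_score - old_score


def relative_gain_pct(new_score, old_score):
    """Relative improvement of new_score over old_score, in percent."""
    return (new_score - old_score) / old_score * 100


# Kimi Code Bench v2: K2.7 Code (62.0) vs K2.6 (50.9).
print(round(point_gap(62.0, 50.9), 1))          # 11.1
print(round(relative_gain_pct(62.0, 50.9), 1))  # 21.8

# MCP Mark Verified: K2.7 Code (81.1) vs Claude Opus 4.8 (76.4).
print(round(point_gap(81.1, 76.4), 1))          # 4.7
print(round(relative_gain_pct(81.1, 76.4), 1))  # 6.2
```

The 21.8% figure is therefore a relative gain over K2.6's score, not a 21.8-point jump, and the MCP Mark lead over Claude Opus 4.8 is 4.7 points.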

Question 5

Is Kimi K2.7 Code HighSpeed open source or proprietary?

Accepted Answer

Kimi K2.7 Code and the HighSpeed serving mode built on it are open-weight: Moonshot AI publishes the model weights on Hugging Face and GitHub under a Modified MIT License. The license permits commercial use and modification, with an attribution requirement that applies specifically to very large-scale commercial deployments, the same style of condition Moonshot has used since the original K2 release. Weights are available in FP8 and native INT4 formats; the INT4 build was trained with quantization-aware training rather than quantized after the fact, for better quality retention. Community GGUF conversions are also available via Unsloth on Hugging Face. Self-hosting the FP8 weights needs roughly 1TB of GPU memory, realistically an 8x H200 SXM5-class node, while the native INT4 build needs roughly half that. There is no separate closed-source tier: the HighSpeed serving optimization is a deployment-side change, not a different license.

Question 6

What modalities does Kimi K2.7 Code HighSpeed support?

Accepted Answer

Kimi K2.7 Code HighSpeed accepts text, image, and video input through a 400M-parameter MoonViT vision encoder built into the model, alongside native code and tool-call handling. Output is text, code, and structured tool calls. There is no audio input or output, so voice workflows need a separate ASR model in front of it and a TTS model behind it. Function calling and tool use are a specific strength: the model scores 81.1 on MCP Mark Verified across Notion, GitHub, Postgres, Filesystem, and Playwright environments, and it preserves interleaved reasoning across multi-turn tool-calling sessions when preserve_thinking is set. Extended thinking runs on every request by default and cannot be fully disabled: a request that tries to turn it off is silently rerouted to the K2.6 model instead. There is no documented web-browsing or code-execution sandbox built into the model itself.

Question 7

Does Kimi K2.7 Code HighSpeed train on user data?

Accepted Answer

Moonshot AI has not publicly disclosed a data retention or training-on-inputs policy specifically for Kimi K2.7 Code HighSpeed, and no system card exists for this release to check against. There is no published SOC 2 Type II, ISO 27001, HIPAA-eligibility, or GDPR-compliance statement for the API, and no stated EU AI Act classification. Because the weights are open under a Modified MIT License, the most concrete way to control data handling is to self-host: run the FP8 or INT4 weights on your own infrastructure and no request data leaves your environment at all. Teams with strict data-governance requirements and no appetite for self-hosting should treat the hosted API as undocumented on this axis and confirm directly with Moonshot before sending sensitive data.

Question 8

Who is Kimi K2.7 Code HighSpeed best for and who should avoid it?

Accepted Answer

It is best for teams running autonomous coding agents with heavy tool use: the 81.1 MCP Mark Verified score reflects its tool-calling strength, and the 180-260 token/sec HighSpeed throughput cuts latency on long agentic loops. It also fits infrastructure teams that want to self-host an open-weight coding model, given the Modified MIT License and the native INT4 build that roughly halves VRAM needs. Teams already using Kimi Code CLI as their agent framework get the tightest integration. It is a poor fit for procurement processes that require an independently audited SWE-bench or GPQA score, since none has been published for this release. It should also be avoided for regulated deployments that need a documented safety posture, since no system card exists, and for general-purpose writing or chat use cases, where Moonshot itself recommends the base K2.6 model instead. Teams needing native audio should look elsewhere, since there is no audio input or output.
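To put the 180-260 token/sec range in concrete terms, the sketch below estimates how long one maximum-length generation step (49,152 tokens, the per-step limit quoted under Question 3) would take at each end of the range. It assumes the quoted throughput is a sustained decode rate and ignores time to first token, queueing, and tool-call round trips; `generation_seconds` is a hypothetical helper.

```python
def generation_seconds(output_tokens, tokens_per_second):
    """Decode time for a given output length at a steady token rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


MAX_STEP_TOKENS = 49_152  # per-step generation limit from Question 3

fastest = generation_seconds(MAX_STEP_TOKENS, 260)
slowest = generation_seconds(MAX_STEP_TOKENS, 180)

print(f"{fastest:.0f}")  # 189
print(f"{slowest:.0f}")  # 273
```

Under those assumptions a full-length step takes roughly three to four and a half minutes, which is why throughput matters most for agents that chain many long steps.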
