Kimi AI | Multimodal LLM with Agent Swarm & 256K Context

Kimi by Moonshot AI: frontier multimodal LLM with native vision, 256K context, Agent Swarm technology. MMLU 92.0, HumanEval 99.0. $0.60/$2.50 per M tokens.

Kimi is a native multimodal AI assistant developed by Moonshot AI, a Beijing-based company founded in March 2023. It features a 1 trillion parameter Mixture-of-Experts architecture with 32 billion activated parameters, supporting 256K token context windows and natively processing images and videos alongside text. The platform excels at reasoning (MMLU 92.0), coding (HumanEval 99.0), and multi-step autonomous tasks through Agent Swarm technology. Available as both a free consumer chat interface and a developer API priced at $0.60/$2.50 per million tokens—significantly undercutting competitors.

Pricing

Free tier (Adagio) with unlimited basic chat but limited agent usage. Consumer subscriptions: Andante $19/month (¥49 CNY), Moderato/Allegretto $39-79/month (¥99-199 CNY), Vivace $159-199/month. API pricing: $0.60 per million input tokens, $2.50 per million output tokens; cached input tokens $0.15/M (75% discount). Turbo model: $1.15/$8.00/M. Automatic $5 voucher when cumulative recharge reaches $5.

Frequently Asked Questions

What is Kimi K2.5 and how does it differ from K2?

Kimi K2.5 is the latest multimodal version of Kimi released in January 2026, featuring native vision capabilities through a 400M-parameter vision encoder called MoonViT-3D. K2.5 can process images and video natively, enabling visual coding from UI designs and autonomous visual task execution. Both K2 and K2.5 use a 1T parameter MoE architecture with 32B activated parameters and 256K context, but K2.5 excels at visual understanding tasks while K2 remains the standard model for text-only workloads.

How much does Kimi cost and what are the pricing tiers?

Kimi offers three pricing models: (1) Free consumer tier (Adagio) with unlimited basic chat but limited agent usage; (2) Consumer subscriptions ranging from Andante ($19/month) to Vivace ($159-199/month), each unlocking more agent quotas and research credits; (3) Developer API at $0.60/$2.50 per million input/output tokens with automatic 75% context caching discounts. Pay-as-you-go minimum is $1 recharge, with cumulative spend unlocking higher rate limits and a $5 bonus voucher at $5 recharge.

How does Kimi's Agent Swarm technology work?

Agent Swarm enables Kimi to decompose complex tasks into parallel sub-tasks executed by dynamically instantiated, domain-specific agents. Instead of processing tasks sequentially, the orchestrator agent delegates work to multiple specialized agents that run concurrently, dramatically speeding up multi-step workflows. This is particularly powerful for research, document analysis, coding tasks, and autonomous web browsing where multiple parallel operations can be executed simultaneously.

Is Kimi open-source and can I run it locally?

Yes, Kimi K2 and K2.5 are open-source under a Modified MIT license with weights publicly available on Hugging Face and GitHub. You can download, fine-tune, and run them locally using inference engines like vLLM or SGLang. However, the consumer chat interface and commercial API remain proprietary Moonshot AI services. Local deployment is ideal for researchers and teams with on-premise infrastructure requirements.

What makes Kimi cheaper than GPT-4 and Claude?

Kimi's low cost stems from its Mixture-of-Experts (MoE) architecture, which activates only 32 billion of its 1 trillion parameters per request, dramatically reducing computational overhead. Moonshot AI also employs aggressive pricing for rapid market adoption. The automatic context caching feature reduces input costs by 75% on repeated content, providing additional savings for applications processing similar documents or maintaining long conversation histories. Combined, these factors make Kimi 4-17x cheaper than GPT-5.4 while delivering competitive benchmark performance.

Can I use Kimi as a drop-in replacement for OpenAI's API?

Yes, Kimi's API is fully OpenAI-compatible, allowing you to use it as a drop-in replacement by simply changing the endpoint to api.moonshot.ai/v1 and providing your Moonshot API key. All OpenAI SDKs (Python, JavaScript, Go, etc.) work without modification. This compatibility eliminates migration friction and allows developers to test Kimi's performance and pricing with minimal integration effort.

What are Kimi's benchmark scores and how do they compare?

Kimi K2.5 achieves exceptional scores: MMLU 92.0 (general knowledge), HumanEval 99.0 (coding—highest on any leaderboard), MATH-500 98.0 (mathematics), GPQA Diamond 87.6, and Chatbot Arena 1447. These scores place it among the top frontier models and exceed many closed-source models from OpenAI and Google, while costing significantly less per token. K2 (text-only) scores 78.6 MMLU and 94.5 HumanEval, still competitive with most open-source models.