Pulsar 16B: AIME 87.22 Open Reasoning Model (2026)

Pulsar 16B by Multiverse Computing (June 2026): Apache 2.0 open model with 16B parameters, AIME 2025 score 87.22, GPQA Diamond 71.41, 4,808 tok/s on Blackwell.

Pulsar 16B, released June 23, 2026 by Multiverse Computing, is an open reasoning model with 16.15B total parameters (3.1B active), compressed from NVIDIA Nemotron 3 Nano 30B using quantum-inspired tensor networks. It scores 87.22 on AIME 2025 and 71.41 on GPQA Diamond, matching 30B-class performance at roughly half the parameter count. It is available free under Apache 2.0 on Hugging Face in BF16, FP8, and NVFP4 formats. It offers a 1,000,000-token context window and delivers 4,808 tokens per second on NVIDIA Blackwell in FP8 precision.

Provider: Multiverse Computing · Family: Pulsar

Context window: 1,000,000 tokens

Input modalities: text, tool-calls · Output: text, tool-calls

About Pulsar 16B

Pulsar 16B is an open reasoning model released on June 23, 2026 by Multiverse Computing, a Spanish AI infrastructure company headquartered in San Sebastian. It was built in collaboration with NVIDIA and has 16.15 billion total parameters, of which 3.1 billion are active per token. The architecture is a hybrid Mamba2-Transformer with Mixture-of-Experts. It was produced by compressing the larger NVIDIA Nemotron 3 Nano 30B base model (31.6B total, 3.5B active) with Multiverse Computing's proprietary CompactifAI quantum-inspired tensor-network technology. No retraining from scratch was involved, and the compression left the original reasoning behavior, instruction-following, and tool-use interfaces intact.

On the AIME 2025 math reasoning benchmark, Pulsar 16B scores 87.22. That is within a fraction of a point of the uncompressed 30B Nemotron base and 15 points ahead of OpenAI's gpt-oss-20B. On GPQA Diamond, the PhD-level science reasoning benchmark, it reaches 71.41, closely tracking the Nemotron base and beating gpt-oss-20B's 58.88 by more than 12 points. It also leads gpt-oss-20B by 14 points on instruction-following (IFBench) and by 11 points on function-calling (BFCL-v4). Across standard reasoning, knowledge, coding, and tool-use benchmarks, Pulsar 16B matches its 30B-class starting point and outperforms gpt-oss-20B on nearly every axis.

Pulsar 16B inherits the 1,000,000-token context window of its Nemotron 3 Nano 30B base. Multiverse Computing evaluated long-context recall using LongBench, AA-LCR, the RULER suite, and needle-in-a-haystack tasks at progressively longer spans. Needle retrieval remains essentially perfect on both sides of the 100K-token mark. On harder RULER tasks at extended context, Pulsar 16B closely tracks the uncompressed base model, which indicates that compression did not materially degrade long-context recall. For reference, the Nemotron base scores 87.5% on RULER at 64K tokens, 82.92% at 128K, and 70.56% at 512K.

Pulsar 16B accepts text input and produces text output. It retains the full function-calling and tool-use interfaces of the Nemotron base. It uses the same prompt format, reasoning interface, and tool-calling schema as the Nemotron 3 Nano family, so it can drop into pipelines already built on Nemotron-class models. Vision and audio are not supported in this release. Those modalities exist in the broader Nemotron 3 Omni family but are not part of this checkpoint. Structured output generation and parallel tool calls are supported natively.

Because Pulsar 16B is released under the Apache 2.0 license, the weights are free to download and self-host, and self-hosted inference costs reduce to hardware costs. On an NVIDIA Blackwell GPU serving 32 concurrent requests, the FP8 checkpoint delivers 4,808 tokens per second of system throughput. That is 43% more than the 30B base model's 3,363 tok/s. Under the same configuration, time-to-first-token drops from 2.18 seconds to 1.24 seconds. For teams that prefer managed access, Multiverse Computing's CompactifAI API offers token-based pricing on AWS, Azure, and GCP. Per-token costs are reported to be up to 75% lower than those of comparable frontier proprietary models for coding and reasoning tasks.

Pulsar 16B is available on Hugging Face under the MultiverseComputingCAI organization in BF16, FP8, and NVFP4 precision formats. The weights alone need approximately 32 GB of VRAM in BF16, 16 GB in FP8, and 8 GB in NVFP4. These figures exclude KV-cache memory, which grows substantially at large context lengths. The model was built and validated on NVIDIA Blackwell-class hardware. For teams that do not manage their own GPU infrastructure, the CompactifAI API on AWS Marketplace runs on Amazon SageMaker HyperPod for managed scaling. On-device deployment in NVFP4 is feasible on consumer hardware for short-context tasks, though native NVFP4 acceleration requires a Blackwell-generation GPU.

Pulsar 16B inherits the safety alignment of the NVIDIA Nemotron 3 Nano base model, which has a configurable, enterprise-oriented safety posture. No system card specific to Pulsar 16B had been published as of June 2026. Multiverse Computing's technical documentation covers the compression methodology and evaluation setup but does not disclose refusal rates or red-teaming partnerships. The Apache 2.0 license permits modifying the model, including its safety behavior, so users can adjust filtering. Organizations with strict compliance requirements should apply their own input and output filtering on top of the model.

Pulsar 16B is the right choice for teams that need 30B-class reasoning on hardware or power budgets that cannot accommodate a 30B model. Its 43% throughput improvement suits high-volume agentic pipelines where cost per token and latency are critical. It also fits directly into pipelines that already use the Nemotron prompt format. Teams should look elsewhere if they need vision or audio input without a separate preprocessing step. The same applies if their deployment environment requires a vendor SLA, SOC 2 certification, or an audit trail at the inference layer. For managed, compliance-covered inference, AWS Bedrock or Google Vertex AI with a certified proprietary model is the better option.

Pulsar 16B was not trained from scratch; it is derived from NVIDIA Nemotron 3 Nano 30B through CompactifAI compression. The Nemotron 3 base models were trained on curated public web text and synthetic reasoning traces. NVIDIA has not publicly confirmed the exact training cutoff, which is estimated at early-to-mid 2025 based on release timing. The Apache 2.0 license permits commercial use, modification, and redistribution without royalty obligations. Self-hosted deployments send no data to Multiverse Computing. For CompactifAI API deployments, the commercial agreement sets the data governance terms. No SOC 2 or HIPAA certification specific to Pulsar 16B had been announced at the time of writing.

Pulsar 16B is Multiverse Computing's second major open model release of 2026. It follows HyperNova 60B, a 50%-compressed version of GPT-OSS-120B released in early 2026. The Pulsar line targets the 15-20B total-parameter efficiency tier, while HyperNova targets the 55-65B range. Multiverse Computing's CompactifAI roadmap indicates that future Pulsar variants will track NVIDIA Nemotron updates as that family evolves. The company was reportedly pursuing a Series C of approximately EUR 500M at a EUR 1.5B valuation in early 2026. Its stated goal is to scale compressed AI across enterprise and edge environments globally.
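The throughput and latency gains quoted for the FP8 checkpoint can be rechecked from the raw figures. The sketch below assumes only the reported Blackwell measurements at 32 concurrent requests. The helper name `relative_change` is hypothetical and is not part of any Multiverse Computing tooling.

```python
def relative_change(new: float, old: float) -> float:
    """Percentage change from old to new; negative means the value went down."""
    return (new - old) / old * 100.0


# Reported figures: FP8, NVIDIA Blackwell GPU, 32 concurrent requests.
PULSAR_TOK_S, BASE_TOK_S = 4808.0, 3363.0
PULSAR_TTFT_S, BASE_TTFT_S = 1.24, 2.18

throughput_gain = relative_change(PULSAR_TOK_S, BASE_TOK_S)  # positive = more tokens/s
ttft_change = relative_change(PULSAR_TTFT_S, BASE_TTFT_S)    # negative = lower latency

if __name__ == "__main__":
    print(f"Throughput: {throughput_gain:+.1f}%")
    print(f"Time-to-first-token: {ttft_change:+.1f}%")
```

Running it prints `Throughput: +43.0%` and `Time-to-first-token: -43.1%`. The two headline improvements are nearly the same size, although they measure different things: aggregate tokens per second versus per-request startup latency.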

Pricing

Weights are free to download and self-host under Apache 2.0. CompactifAI API offers managed token-based access; contact Multiverse Computing for pricing. Costs are reported up to 75% below comparable frontier proprietary models for coding and reasoning workloads.

Frequently Asked Questions

What is Pulsar 16B and who built it?

Pulsar 16B is an open reasoning model released on June 23, 2026 by Multiverse Computing, a Spanish AI infrastructure company founded in 2019 in San Sebastian. It uses a hybrid Mamba2-Transformer Mixture-of-Experts architecture. It was compressed from the NVIDIA Nemotron 3 Nano 30B base (31.6B total, 3.5B active parameters) using Multiverse Computing's proprietary CompactifAI quantum-inspired tensor-network technology. The result has 16.15B total parameters with 3.1B active, with no retraining from scratch and with reasoning behavior preserved. On AIME 2025, Pulsar 16B scores 87.22, within a fraction of a point of the uncompressed 30B base and 15 points above OpenAI's gpt-oss-20B. On GPQA Diamond, it reaches 71.41, more than 12 points above gpt-oss-20B. The model was developed with NVIDIA using the Model Optimizer and Megatron Bridge libraries, and validated on NVIDIA Blackwell accelerated computing infrastructure. It belongs to Multiverse Computing's Pulsar family, which targets the 15-20B total-parameter efficiency tier below the HyperNova 60B line.
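To see where the compression lands, compare the reported parameter counts before and after. The sketch below uses only the counts quoted above. Treating active parameters as a rough proxy for per-token compute is the usual Mixture-of-Experts approximation, not a figure published by Multiverse Computing.

```python
# Parameter counts in billions, as reported.
BASE_TOTAL, BASE_ACTIVE = 31.6, 3.5        # NVIDIA Nemotron 3 Nano 30B
PULSAR_TOTAL, PULSAR_ACTIVE = 16.15, 3.1   # Pulsar 16B


def kept(compressed: float, original: float) -> float:
    """Fraction of the original parameter count that survives compression."""
    return compressed / original


total_kept = kept(PULSAR_TOTAL, BASE_TOTAL)      # governs weight memory
active_kept = kept(PULSAR_ACTIVE, BASE_ACTIVE)   # roughly governs per-token compute

if __name__ == "__main__":
    print(f"Total parameters kept:  {total_kept:.1%}")
    print(f"Active parameters kept: {active_kept:.1%}")
```

The output is about 51.1% of total parameters kept and 88.6% of active parameters kept. Most of the reduction therefore shows up as memory footprint, which is what "half the parameter count" refers to. Per-token active parameters shrink by only about 11%.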

How much does Pulsar 16B cost per 1M tokens?

Pulsar 16B is released under the Apache 2.0 license, so the weights are free to download and self-host. Self-hosted deployments carry no per-token charge; the only cost is the hardware running the model. The effective hardware cost per 1M tokens equals the hourly hardware cost multiplied by the hours needed to generate 1M tokens. In other words, it is (hourly cost × 1,000,000) / (throughput in tok/s × 3,600). The 4,808 tok/s figure was measured in FP8 on an NVIDIA Blackwell GPU at 32 concurrent requests. A single RTX 4090 (approximately $0.80 per hour on spot) will deliver less. Even so, at that rate 1M tokens costs under $0.20 in hardware time as long as the card sustains more than about 1,100 tok/s. For teams that do not want to manage GPU infrastructure, Multiverse Computing offers the CompactifAI API with token-based pricing, available via AWS Marketplace and the CompactifAI portal. The company reports that this API costs up to 75% less than comparable frontier proprietary models for coding and reasoning workloads. Specific CompactifAI API rates were not publicly listed as of June 2026; contact Multiverse Computing directly for a quote. There is no documented free API trial period.
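The cost formula above can be expressed as a small calculator. This is a sketch under stated assumptions. The $0.80/hour rate is the spot price quoted above. Pairing it with 4,808 tok/s is illustrative only, because that throughput was measured on Blackwell, not on an RTX 4090. Power, storage, and idle time are ignored. The function names are hypothetical.

```python
SECONDS_PER_HOUR = 3600.0


def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Hardware cost in USD to generate 1M tokens at a sustained throughput."""
    if tokens_per_second <= 0:
        raise ValueError("throughput must be positive")
    hours_needed = 1_000_000 / tokens_per_second / SECONDS_PER_HOUR
    return hourly_cost_usd * hours_needed


def breakeven_throughput(hourly_cost_usd: float, target_usd_per_million: float) -> float:
    """Minimum sustained tok/s that keeps cost at or below the target per 1M tokens."""
    return hourly_cost_usd * 1_000_000 / (target_usd_per_million * SECONDS_PER_HOUR)


if __name__ == "__main__":
    # Illustrative pairing: quoted spot price with the Blackwell throughput figure.
    print(f"${cost_per_million_tokens(0.80, 4808):.3f} per 1M tokens")
    # Throughput an $0.80/hour GPU must sustain to stay under $0.20 per 1M tokens.
    print(f"{breakeven_throughput(0.80, 0.20):.0f} tok/s needed")
```

At 4,808 tok/s the result is roughly $0.046 per 1M tokens. The break-even check shows that staying under $0.20 on an $0.80/hour GPU requires about 1,111 tok/s of sustained throughput.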

What is Pulsar 16B's context window and max output?

Pulsar 16B inherits a 1,000,000-token context window from its NVIDIA Nemotron 3 Nano 30B base model. Multiverse Computing validated long-context recall using LongBench, AA-LCR, RULER suite variants, and needle-in-a-haystack tasks at progressively longer spans. Needle retrieval remains essentially perfect on both sides of the 100K-token mark. On harder RULER tasks at extended lengths, the model closely tracks the uncompressed 30B base. For reference, the Nemotron 3 Nano base scores 87.5% on RULER at 64K tokens, 82.92% at 128K, and 70.56% at 512K; Pulsar 16B closely follows these curves. The 4,808 tok/s headline throughput was measured at short-to-medium context lengths. Real-world throughput drops significantly above 256K tokens as the KV cache of the model's attention layers grows with sequence length. Maximum output length had not been separately documented for Pulsar 16B as of June 2026. Among proprietary competitors, the 1M-token window exceeds those of GPT-4o (128K) and Claude Haiku 4.5 (200K), with no per-token cost for self-hosted workloads.
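Long contexts get expensive because of KV-cache growth. The sketch below estimates the cache with the standard formula: 2 (keys and values) × attention layers × KV heads × head dimension × bytes per value × tokens × batch. The layer counts and head shapes are HYPOTHETICAL placeholders, not the published Pulsar 16B or Nemotron configuration. The comparison shows why a hybrid model, whose Mamba2 layers keep a fixed-size state, caches far less than an all-attention model of similar depth.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttentionShape:
    """Dimensions of the layers that keep a per-token KV cache (placeholder values below)."""
    attention_layers: int      # Mamba2 layers hold a fixed-size state, so they are not counted
    kv_heads: int              # key/value heads under grouped-query attention
    head_dim: int
    bytes_per_value: int = 2   # 2 for a BF16/FP16 cache, 1 for an FP8 cache


def kv_cache_bytes(seq_len: int, shape: AttentionShape, batch: int = 1) -> int:
    """Bytes for keys and values across every attention layer, KV head, and token."""
    per_token = 2 * shape.attention_layers * shape.kv_heads * shape.head_dim * shape.bytes_per_value
    return per_token * seq_len * batch


# HYPOTHETICAL shapes for illustration only; read the real values from the
# checkpoint's config.json before relying on any number printed here.
HYBRID = AttentionShape(attention_layers=6, kv_heads=2, head_dim=128)
ALL_ATTENTION = AttentionShape(attention_layers=52, kv_heads=2, head_dim=128)

if __name__ == "__main__":
    for tokens in (131_072, 524_288, 1_000_000):
        hybrid_gb = kv_cache_bytes(tokens, HYBRID) / 1e9
        all_attn_gb = kv_cache_bytes(tokens, ALL_ATTENTION) / 1e9
        print(f"{tokens:>9,} tokens: hybrid ~{hybrid_gb:.1f} GB, all-attention ~{all_attn_gb:.1f} GB")
```

The cache grows linearly with both sequence length and batch size. Serving 32 concurrent long-context requests therefore multiplies whatever per-request figure the real configuration yields. This is one reason throughput falls at very long contexts.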

How does Pulsar 16B compare on benchmarks vs gpt-oss-20B?

Pulsar 16B outperforms gpt-oss-20B on all four headline benchmarks in the June 2026 launch announcement. On AIME 2025 math reasoning, it scores 87.22 versus gpt-oss-20B's 72.22, a 15-point gap. On GPQA Diamond science reasoning, it reaches 71.41 against gpt-oss-20B's 58.88, a 12.5-point lead. It leads by 14 points on instruction-following (IFBench) and by 11 points on function-calling (BFCL-v4). These results stand out because gpt-oss-20B is the larger model. It has roughly 21B total parameters (3.6B active), against Pulsar 16B's 16.15B (3.1B active). Pulsar 16B's stronger results reflect the 30B-class knowledge preserved through compression. Against the uncompressed Nemotron 3 Nano 30B base, Pulsar 16B is within a fraction of a point on AIME 2025 and GPQA Diamond. No SWE-bench Verified or ARC-AGI-2 scores had been published for Pulsar 16B as of June 2026. The benchmark numbers were measured on NVIDIA infrastructure and reported by the vendor; independent third-party confirmation was pending at the time of writing.
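The two comparisons with full scores published can be recomputed directly. IFBench and BFCL-v4 are left out because only their point deltas were reported. This is a minimal sketch. The function names are illustrative, and the relative figure is expressed against the gpt-oss-20B score.

```python
from typing import Dict, Tuple

# Scores quoted in the launch comparison: (Pulsar 16B, gpt-oss-20B).
SCORES: Dict[str, Tuple[float, float]] = {
    "AIME 2025": (87.22, 72.22),
    "GPQA Diamond": (71.41, 58.88),
}


def point_gaps(scores: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Absolute lead in points (Pulsar minus gpt-oss-20B), rounded to two decimals."""
    return {name: round(ours - theirs, 2) for name, (ours, theirs) in scores.items()}


def relative_gaps(scores: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Lead as a percentage of the gpt-oss-20B score, rounded to one decimal."""
    return {name: round((ours - theirs) / theirs * 100, 1) for name, (ours, theirs) in scores.items()}


if __name__ == "__main__":
    for name in SCORES:
        print(f"{name}: +{point_gaps(SCORES)[name]} points ({relative_gaps(SCORES)[name]}% relative)")
```

In relative terms, both leads come to roughly a fifth of gpt-oss-20B's score: about 20.8% on AIME 2025 and 21.3% on GPQA Diamond.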

Is Pulsar 16B open source or proprietary?

Pulsar 16B is released as open weights under the Apache 2.0 license, an OSI-approved open-source license. The weights can be downloaded at no cost from Hugging Face under the MultiverseComputingCAI organization. Apache 2.0 permits commercial use, modification, fine-tuning, redistribution, and integration into proprietary products without royalty obligations. No usage restrictions or community license terms limit production deployments, which places it among the most permissive licenses in the open-weights ecosystem. The model is available in three precision formats: BF16 (approximately 32 GB VRAM), FP8 (approximately 16 GB VRAM), and NVFP4 (approximately 8 GB VRAM). Unlike Meta's Llama community license or the research licenses Mistral uses for some of its models, Apache 2.0 places no cap on commercial usage or number of users. The underlying NVIDIA Nemotron 3 Nano 30B base model is also available on Hugging Face under a separate NVIDIA license. Pulsar 16B's Apache 2.0 license applies to the compressed checkpoint specifically. NVIDIA licenses the Model Optimizer and Megatron Bridge libraries used in the compression workflow separately.
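The VRAM figures for the three formats follow from parameter count × bits per weight. The sketch below reproduces them and picks the highest-precision format that fits a given GPU. Two assumptions apply. First, the 4 GB default headroom for KV cache and runtime overhead is an arbitrary placeholder. Second, NVFP4's per-block scale factors, which add a little on top of 4 bits per weight, are ignored. The function names are hypothetical.

```python
from typing import Optional

PULSAR_PARAMS = 16.15e9  # total parameters reported for Pulsar 16B

# Nominal bits per stored weight. NVFP4 checkpoints also carry small per-block
# scale factors, so their real size sits slightly above the 4-bit figure.
BITS_PER_WEIGHT = {"BF16": 16, "FP8": 8, "NVFP4": 4}


def weight_gb(fmt: str, params: float = PULSAR_PARAMS) -> float:
    """Approximate weight memory in GB (1e9 bytes); excludes KV cache and activations."""
    return params * BITS_PER_WEIGHT[fmt] / 8 / 1e9


def pick_format(vram_gb: float, headroom_gb: float = 4.0) -> Optional[str]:
    """Highest-precision format whose weights fit while leaving headroom_gb free.

    headroom_gb is a placeholder for KV cache and runtime overhead; size it
    from your own context length and batch size.
    """
    for fmt in ("BF16", "FP8", "NVFP4"):  # ordered from highest to lowest precision
        if weight_gb(fmt) + headroom_gb <= vram_gb:
            return fmt
    return None


if __name__ == "__main__":
    for fmt in BITS_PER_WEIGHT:
        print(f"{fmt}: ~{weight_gb(fmt):.1f} GB of weights")
    print(pick_format(24.0))  # a 24 GB card with 4 GB headroom
```

With the default headroom, a 24 GB card lands on FP8, an 80 GB card on BF16, and a 16 GB card on NVFP4. Long contexts or many concurrent requests call for a larger `headroom_gb`.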

What modalities does Pulsar 16B support?

Pulsar 16B supports text input and text output only in its initial June 2026 release. Vision, audio, and video input are not included in this checkpoint. The model inherits function-calling and structured output generation capabilities from the NVIDIA Nemotron 3 Nano base, supporting parallel tool calls and JSON-structured outputs natively. Function definitions follow the Nemotron 3 Nano tool-calling schema rather than the OpenAI function-calling convention, which requires a prompt-format migration for pipelines built on GPT-family models. The broader Nemotron 3 family includes multimodal variants (the Nemotron 3 Omni series) that support vision, audio, and video, but those capabilities were not included in the Pulsar 16B compression workflow. Multiverse Computing has not announced a multimodal Pulsar variant as of June 2026. For vision understanding in the same infrastructure, teams can pair Pulsar 16B with a separate vision encoder or use NVIDIA's Nemotron 3 Omni models alongside the text-reasoning checkpoint.

Does Pulsar 16B train on user data?

For self-hosted deployments of Pulsar 16B, no data is sent to Multiverse Computing or NVIDIA by default. The Apache 2.0 license covers the weights as a standalone artifact, and running inference locally involves no external data transmission. The weights are static files with no telemetry, so Multiverse Computing never sees, and therefore cannot train on, inputs processed in self-hosted deployments. For CompactifAI API deployments, data handling is governed by the Multiverse Computing commercial agreement. The company had not published a comprehensive data retention policy specific to Pulsar 16B as of June 2026. Enterprise teams should request a data processing agreement before using the managed API. No SOC 2 Type II, ISO 27001, or HIPAA certification had been announced for the Pulsar 16B checkpoint or the CompactifAI API as of June 2026. Teams in regulated industries should default to self-hosted deployment with their own data isolation controls until vendor certification is confirmed.

Who is Pulsar 16B best for and who should avoid it?

Pulsar 16B is best for ML engineering teams running high-throughput agentic pipelines on NVIDIA hardware. It suits teams that need 30B-class reasoning at 16B parameters, with the cost structure of open weights. Its FP8 throughput of 4,808 tok/s is 43% higher than the 30B base, and its time-to-first-token is roughly 43% lower. Together these make it well suited to high-volume reasoning calls where cost per token matters. Research teams that need AIME 87.22 or GPQA 71.41 performance benefit directly, since the FP8 weights occupy roughly 16 GB of VRAM. Organizations already using the Nemotron 3 Nano prompt format can migrate with minimal pipeline changes. Teams should avoid Pulsar 16B if they need vision or audio input. The model is text-only and requires a separate vision preprocessor for multimodal tasks; NVIDIA Nemotron 3 Omni is the better fit there. Compliance-heavy deployments in healthcare or finance that require SOC 2 certification, vendor SLAs, or audit trails should choose AWS Bedrock or Azure OpenAI with a certified proprietary model. Teams whose pipelines rely on ChatML or Llama-3 prompt templates face a migration burden because of the Nemotron-specific extra_id format. Qwen3-14B may be a lower-friction alternative at a comparable parameter count, as may Llama 4 at a larger footprint.
