Name: Bosun v1.1: Open-Source Knowledge Graph Reranker (2026)
Brand: Hanno Labs
Availability: InStock

Question 1

What is Bosun v1.1 and who built it?

Accepted Answer

Bosun v1.1 is a programmable judge model released June 11, 2026 by Hanno Labs, an AI research lab focused on causal intelligence and knowledge systems. It is a LoRA fine-tune of Qwen3-Reranker, available in two sizes: 0.6B (Bosun-XS) and 4B (Bosun-4B). Bosun is designed to evaluate whether connections in an agent's knowledge graph are warranted -- supported by evidence, non-redundant, and still factually current. Instead of using fixed criteria, Bosun is programmed at inference time by a natural-language instruction defining the specific rule to apply, allowing it to be reprogrammed per batch without fine-tuning. Version 1.1 added directional and typed-edge judgment (supersession, depends-on, supports, contradicts), expanding on the symmetric judgment of v1.0. The model outputs a calibrated probability P = sigmoid(logit_yes - logit_no) in the range 0 to 1. It is released under Apache 2.0 with no hosted API and no commercial license requirement.

Question 2

How much does Bosun cost?

Accepted Answer

Bosun is completely free. Both Bosun-XS (0.6B) and Bosun-4B are released under Apache 2.0 and available for free download from HuggingFace with no per-token fee, no rate limit, and no commercial license requirement. The only cost is your own infrastructure. GGUF builds at Q4_K_M quantization require approximately 0.5 GB RAM for Bosun-XS and 2.5 GB RAM for Bosun-4B, both runnable on CPU without a GPU. For a team processing 10,000 knowledge graph edges per run on a modern CPU, the electricity cost is negligible. For GPU-accelerated batch inference on a leased A100, running 100,000 pairs through Bosun-4B costs roughly 0.30 USD in cloud compute time. Compare this to calling a frontier LLM API at 3 to 15 USD per million tokens for equivalent judgment workloads, where costs accumulate quickly at scale. Apache 2.0 allows modification and redistribution without any vendor royalty.

Question 3

What is Bosun's context window and how does it handle long document pairs?

Accepted Answer

Bosun is built on Qwen3-Reranker, which has an effective maximum sequence length of approximately 8,192 tokens per pair. This covers the combined length of the instruction, the fixed query string, and the two findings being compared. Bosun is designed for paragraph-length comparisons, not multi-page document analysis. If you supply findings that together exceed roughly 8,000 tokens, the Qwen3 tokenizer will truncate the input silently, degrading accuracy without raising an error. There is no sliding window or long-context mode available. For knowledge graph curation, where individual facts are typically one to five sentences each, 8,192 tokens is more than adequate. For comparing long documents such as research papers or legal contracts, a frontier LLM with a 200,000-token context window is more appropriate. The model produces only a scalar score as output, so there is no max output token limit to consider.

Question 4

How does Bosun compare to using GPT or Gemini as a knowledge graph judge?

Accepted Answer

Bosun outperforms Gemini 3.1 Flash Lite on PAWS paraphrase detection (0.91 vs 0.81) and WarrantBench steerability (0.945 vs 0.575), making it the better choice when judgments must reliably flip when instructions are negated -- the core requirement for graph edge curation. On FollowIR instruction-following retrieval, Bosun-4B scores +17.9 p-MRR (first place) compared to Gemini's capped 12.0. On e-CARE causal direction, Bosun-4B (0.85) nearly matches Gemini (0.86). However, Gemini outperforms Bosun on ANLI adversarial NLI (0.74 vs 0.57 for Bosun-4B), so for ANLI-heavy workloads a frontier LLM API is more accurate. The cost difference is substantial: Bosun runs free on CPU, while calling Gemini or GPT at scale costs real money per token. For most knowledge graph and RAG filtering workloads, Bosun's instruction-following accuracy exceeds what a general-purpose API model offers at zero ongoing cost after setup.

Question 5

Is Bosun open source?

Accepted Answer

Yes, Bosun is fully open-source under Apache 2.0. Both the 0.6B (Bosun-XS) and 4B (Bosun-4B) model weights are publicly available on HuggingFace at Hanno-Labs/bosun-xs and Hanno-Labs/bosun-4b. The Apache 2.0 license allows commercial use, modification, redistribution, and sublicensing with no restrictions beyond attribution. GGUF quantizations (f16, Q8_0, Q4_K_M) are available at Hanno-Labs/bosun-xs-GGUF and Hanno-Labs/bosun-4b-GGUF for use with llama.cpp, Ollama, and other local inference runtimes. The WarrantBench evaluation benchmark and dataset are also open-source at github.com/Hanno-Labs/warrantbench. There is no private enterprise version or closed-weight variant; the publicly released weights are the production weights Hanno Labs uses.

Question 6

What inputs and outputs does Bosun support?

Accepted Answer

Bosun accepts text inputs only and produces a single floating-point score as output. The input follows a three-part template: an instruction block defining the rule to apply, a fixed query string ('These two findings share the specified relationship'), and a document block containing FINDING A and FINDING B as text strings. The output is a probability P = sigmoid(logit_yes - logit_no) in the range 0 to 1, where values closer to 1 indicate the pair satisfies the supplied rule more strongly. Bosun does not support vision, audio, video, structured JSON output, natural language generation, function calling, or code execution. It is a pure scoring model. The instruction block accepts any natural-language rule: 'Finding B supersedes Finding A', 'These two facts contradict each other', 'Finding B depends on Finding A being true', and so on. Version 1.1 specifically added training for directional and typed-edge rules, improving accuracy on asymmetric relationships like supersession and dependency.

Question 7

Does Bosun train on user data?

Accepted Answer

No. Bosun is a self-hosted model with no vendor-operated API. Because all inference runs on your own infrastructure, no data is ever sent to Hanno Labs. There is no telemetry, no usage monitoring, no input logging, and no model training on your data. The Apache 2.0 license gives you full control over the model weights and all outputs. For regulated industries or air-gapped deployments, the GGUF builds work completely offline with no external network calls. Hanno Labs has not published SOC 2 Type II, ISO 27001, HIPAA, or GDPR certifications for Bosun, which is expected for a self-hosted open-source model since those certifications apply to vendor-operated services. Your organization's own compliance posture governs your deployment.

Question 8

Who is Bosun best for and who should avoid it?

Accepted Answer

Bosun is best for AI engineers building agent memory systems with knowledge graphs who need a fast, cheap, accurate pruning layer that removes stale or unsupported edges at scale. RAG pipeline engineers who need instruction-following reranking -- where the acceptance rule changes per query -- will find Bosun's +17.9 FollowIR score far more relevant than static embedding similarity. Open-source teams needing a free judge model with full data sovereignty and no API dependency are the core user group. Teams that should not use Bosun include those who need a generative or chat model (Bosun outputs a float, not text), those processing document pairs over 8,000 tokens (inputs will be truncated), teams needing high adversarial NLI accuracy (Gemini 3.1 Flash Lite scores 0.74 vs Bosun-4B's 0.57), and organizations without ML engineers who can manage a self-hosted PEFT or GGUF inference stack. For non-English text, performance is untested and likely degraded.

Bosun v1.1: Open-Source Knowledge Graph Reranker (2026)

About Bosun

Pricing

Key Features

Pros

Cons

Benchmarks

Frequently Asked Questions