Scale AI: $29B Data Engine for AI Labs (Founded 2016)

Scale AI, founded in San Francisco in 2016 by Alexandr Wang and Lucy Guo to provide labeled training data to AI teams, is a leading AI training data company for frontier model development. Its Data Engine covers annotation, RLHF, and model evaluation for clients including frontier AI labs and the US Department of Defense. Scale has raised $15.9 billion in total and has been valued at $29 billion since Meta bought a 49% stake for $14.3 billion in June 2025. Jason Droege serves as Interim CEO, and the company counts about 6,700 workers globally, including contractors.

Founded: 2016 · HQ: San Francisco, CA, USA · Team: 1,000-5,000 · CEO: Jason Droege (Interim) · Funding: $15.9B total raised across 9 rounds (Series G lead: Meta $14.3B, June 2025; also Y Combinator, Dragoneer, Tiger Global, Index Ventures) · Valuation: $29B (Series G, June 2025, Meta 49% stake)

About Scale AI

Scale AI was founded in San Francisco in 2016 by Alexandr Wang, a 19-year-old MIT freshman, and Lucy Guo, a 22-year-old Carnegie Mellon student, who met during a Quora internship. Both dropped out to join Y Combinator's winter 2016 cohort, building a platform to solve a universal AI bottleneck: producing high-quality labeled training data at volume. Scale built an API-first service combining machine learning pre-annotation with expert human review, turning structured data annotation into a managed pipeline rather than a manual burden. Guo left Scale in 2018; Wang continued as CEO and built the company into the dominant AI training data provider before departing for Meta in June 2025.

Scale's flagship product is the Data Engine, an end-to-end platform for the full AI training data lifecycle: collection, curation, annotation, RLHF (reinforcement learning from human feedback), and model evaluation. The platform supports text, images, video, 3D point clouds, audio, and sensor fusion data types, with applications spanning autonomous vehicles, robotics, and large language model training. The Data Engine is offered in two modes: Rapid, where Scale provides its software alongside a managed contractor workforce handling recruitment and quality control; and Self-Serve, where enterprise customers operate Scale's tooling with their own annotation teams. Both modes feed into the same quality pipeline, with Scale's AI pre-labeling tools accelerating work that human reviewers then verify and correct.

Scale Donovan is the government and national defense product line, a secure interface for US military and intelligence customers that surfaces insights from classified and unclassified data sources. In April 2026, Scale acquired Illumina Computing Group (ICG) to deepen its defense analytics capabilities with ICG's specialized government tooling.
Scale also offers test and evaluation services that combine human red-teamers with LLM-assisted techniques to identify risks and vulnerabilities in frontier AI models before deployment. The government segment has become Scale's highest-margin business and the clearest remaining competitive advantage following the Meta deal.

In September 2025, Scale launched a global physical AI program, hiring contractors worldwide to record point-of-view demonstrations for companies training AI-powered robots. That same year, Scale won a $100 million contract from the US Department of Defense Chief Digital and Artificial Intelligence Office to deliver advanced AI tools to warfighters. The applications business more than doubled revenue in the second half of 2025 as Scale moved capabilities from its data division into deployable enterprise products. In 2026, Scale is investing in realistic reinforcement learning environments and agentic AI tooling for enterprise customers.

Scale AI raised $15.9 billion in total funding across nine rounds from 62 investors including Y Combinator, Dragoneer Investment Group, Tiger Global Management, and Index Ventures. The defining event was Meta's June 2025 commitment: $14.3 billion for a 49% minority stake, valuing Scale at over $29 billion, the largest single private AI investment on record at that time. Meta had first invested in Scale's Series F round in May 2024. Scale remains a private company as of mid-2026 with no public IPO filing, and analysts estimate a potential public offering between 2027 and 2029.

Scale earns revenue through managed annotation services priced per task (from cents for simple image classification to several dollars for complex 3D bounding boxes) and software platform subscriptions for enterprise self-serve customers. Government contracts, measured in tens to hundreds of millions of dollars per engagement, represent the highest-margin segment.
Scale reported $2 billion in annual recurring revenue in 2025, with the data business turning profitable and the applications segment more than doubling in the second half of the year. Meta's 49% stake triggered Google, OpenAI, and Microsoft to reduce or end their Scale relationships over vendor-neutrality concerns, shifting the commercial balance toward government and defense customers.

Jason Droege serves as Interim CEO, promoted from Chief Strategy Officer in June 2025 when founder Alexandr Wang joined Meta to lead Meta Superintelligence Labs. Droege joined Scale in September 2024 with over 20 years of technology leadership including senior roles at Uber Eats and Axon. Wang remains a director on Scale's board.

Scale employs approximately 1,200 full-time staff in engineering, sales, and operations, with contractors bringing total headcount to about 6,693 as of May 2026. Full-time headcount declined roughly 15% year-over-year in 2025 as AI-assisted labeling tools automated more of the annotation pipeline.

Scale's mission, updated in September 2025, is to develop reliable AI systems for the world's most important decisions. The company publishes work on data quality methodology, model evaluation benchmarks, and frontier model red-teaming, using human expert evaluators and LLM-assisted methods to find failure modes in production AI systems. Scale's test and evaluation practice positions the company as an independent quality auditor for some of the largest AI models in deployment. The 2026 expansion into physical AI training data and robotics extends this mandate into autonomous systems.

Scale's main competitors are Appen (a publicly traded managed annotation service), Labelbox (an enterprise self-serve software platform), and Surge AI (a bootstrapped RLHF specialist that crossed $1 billion in ARR by 2025). Scale beats Appen on RLHF depth for frontier models and US government security clearances; Appen has lower costs on high-volume, lower-complexity tasks.
Against Labelbox, Scale wins on managed workforce depth while Labelbox offers more self-serve flexibility. The clearest competitive advantage Scale holds today is the US defense vertical, where FedRAMP High certification, CDAO contracts, and government-cleared teams create switching costs that commercial competitors cannot replicate.

Scale's April 2026 acquisition of Illumina Computing Group deepened its defense analytics portfolio, and the company projects its international business to double in 2026 through government partnerships in allied nations. With $15.9 billion raised and a $29 billion valuation, pressure for a liquidity event remains, though the Meta transaction gave shareholders significant liquidity and reduced near-term IPO urgency. Jason Droege's interim status is expected to resolve in 2026 or 2027 via a permanent appointment or new hire. The commercial losses of Google, OpenAI, and Microsoft make the defense and government segment critical to Scale's next growth chapter.

Mission

Develop reliable AI systems for the world's most important decisions.

Products

Data Engine (Rapid and Self-Serve modes) · Scale Donovan · Test and evaluation services

Compliance

SOC 2 Type II, ISO 27001:2022, HIPAA-eligible, FedRAMP High, UK Cyber Essentials

Links

Website · Twitter · LinkedIn · Blog

Frequently Asked Questions

What is Scale AI and what do they build?

Scale AI is an AI data infrastructure company founded in San Francisco in 2016 that builds the training data layer behind frontier AI models and government AI systems. The company's core platform, the Data Engine, handles every stage of the data preparation pipeline: collection, curation, annotation, RLHF (reinforcement learning from human feedback), and model evaluation. Scale supports text, images, video, 3D point clouds, audio, and sensor fusion data types across verticals including autonomous vehicles, robotics, and large language model development. A separate product, Scale Donovan, serves US government and defense customers with a secure interface for extracting insights from classified and unclassified data sources. In April 2026, Scale acquired Illumina Computing Group to expand its defense analytics capabilities. Scale reported $2 billion in annual recurring revenue in 2025 and is valued at $29 billion following Meta's $14.3 billion investment in June 2025. The platform is accessible at scale.com, with enterprise and government contracts negotiated directly with Scale's sales team.

Who founded Scale AI and who is the CEO?

Scale AI was co-founded in 2016 by Alexandr Wang and Lucy Guo after the pair met during a Quora internship in San Francisco. Wang was a 19-year-old MIT freshman and Guo was a 22-year-old Carnegie Mellon student; both dropped out to join Y Combinator's winter 2016 cohort. Guo left Scale in 2018, with both founders retaining equity that later made them billionaires as the company grew to a $29 billion valuation. Wang served as CEO from founding through June 2025, when Meta's $14.3 billion investment was paired with his move to lead Meta Superintelligence Labs as Chief AI Officer. Jason Droege, who joined Scale in September 2024 as Chief Strategy Officer, was promoted to Interim CEO in June 2025. Droege brings over 20 years of technology leadership, including senior roles at Uber Eats and Axon. Wang remains on Scale's board of directors as of mid-2026.

How much funding has Scale AI raised?

Scale AI has raised $15.9 billion in total funding across nine rounds from 62 investors. Early backers include Y Combinator from the winter 2016 cohort, followed by Dragoneer Investment Group, Tiger Global Management, and Index Ventures across subsequent series as Scale won clients including OpenAI, Google, Microsoft, and the US Department of Defense. Meta Platforms first invested in Scale's Series F round in May 2024, then committed to the transformative deal in June 2025: $14.3 billion for a 49% minority stake, valuing Scale at over $29 billion and setting a record as the largest single private AI investment at that time. The Meta transaction provided existing shareholders with significant liquidity and reduced near-term pressure for an IPO. Scale remains a private company as of mid-2026 with no public IPO filing. Analysts estimate a potential public offering between 2027 and 2029. Scale's 2025 ARR of $2 billion supports the case for a public company valuation at current levels.
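The "over $29 billion" figure can be checked against the deal terms quoted above. The short sketch below divides the investment by the stake; it assumes the 49% applies to the whole company at the deal price, which the text does not spell out, and the variable names are illustrative only.

```python
# Figures come from the text above; treating the 49% stake as priced
# against the whole company is an assumption made for this check.
investment_usd_b = 14.3   # Meta's June 2025 investment, in billions of dollars
stake_fraction = 0.49     # minority stake Meta received

implied_valuation_usd_b = investment_usd_b / stake_fraction
print(f"Implied valuation: ${implied_valuation_usd_b:.1f}B")  # prints "Implied valuation: $29.2B"
```

The result, roughly $29.2 billion, is consistent with the stated valuation of just over $29 billion.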

What products does Scale AI make?

Scale AI's primary product is the Data Engine, available in two modes: Rapid (Scale provides a managed contractor workforce alongside its software, handling recruitment, quality control, and workflow) and Self-Serve (enterprise customers use Scale's tooling with their own annotation teams). The Data Engine covers the full machine learning data lifecycle: collection, curation, annotation, RLHF for language model fine-tuning, and model evaluation. Scale Donovan is the government and defense product: a secure, auditable interface for US military and intelligence customers to analyze classified and unclassified data, deployed under contracts with the Chief Digital and Artificial Intelligence Office. Scale also offers test and evaluation services, combining human red-teamers and LLM-assisted techniques to identify risks in AI models before deployment. In April 2026, Scale acquired Illumina Computing Group's defense analytics platform, adding government-specific tooling to the portfolio. Task pricing for the Data Engine ranges from cents per simple classification to several dollars for complex 3D bounding boxes, with enterprise and government contracts negotiated separately.

Where is Scale AI headquartered and how big is the team?

Scale AI is headquartered in San Francisco, California, where it has been based since founding in 2016. The company maintains government-cleared facilities for its Donovan and defense-focused teams separate from its main commercial offices. Full-time headcount is approximately 1,200 employees covering engineering, sales, operations, and management roles. Total headcount including contractors reaches about 6,693 as of May 2026, with the contractor workforce performing the bulk of actual annotation and labeling tasks. Full-time headcount declined roughly 15% year-over-year in 2025 as AI-assisted labeling tools automated more of the annotation pipeline. Scale is actively hiring for government, physical AI, and enterprise applications roles in 2026. The company projects its international business to double in 2026 through government partnerships in allied nations, suggesting headcount growth outside the US.
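To put "the bulk" in numbers, the headcount figures above imply the following split. This is a rough sketch that assumes the approximate full-time figure and the May 2026 total describe the same point in time.

```python
# Headcount figures from the text above; both are approximate, and treating
# them as a same-date snapshot is an assumption.
full_time = 1_200          # approximate full-time staff
total_headcount = 6_693    # total including contractors, May 2026

contractors = total_headcount - full_time
contractor_share = contractors / total_headcount
print(f"{contractors:,} contractors, {contractor_share:.0%} of total")  # prints "5,493 contractors, 82% of total"
```

On these figures, contractors make up a little over four fifths of the workforce.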

What is Scale AI's mission or research focus?

Scale AI's stated mission is to develop reliable AI systems for the world's most important decisions, a formulation updated in September 2025 to emphasize production-readiness as AI moves from research pilots to deployed products. The company publishes research on data quality methodology, model evaluation benchmarks, and red-teaming practices for frontier language models. Scale's test and evaluation practice functions as an independent auditor for some of the largest AI systems in production, using both human expert evaluators and automated LLM-based testing to identify failure modes and vulnerabilities. In 2026, Scale is expanding into physical AI training data through a global robotics data program that recruits contractors to produce point-of-view demonstrations for companies training AI-powered robots. The government mission through Scale Donovan connects the data work to national security applications, from intelligence analysis to warfighter decision support. Scale does not primarily identify as a research lab, but its data curation methodology and model evaluation standards influence how the industry measures AI system quality.

Is Scale AI compliant with SOC 2, GDPR, and HIPAA?

Scale AI holds SOC 2 Type II certification, confirming its security controls have been independently audited for availability, confidentiality, and processing integrity. The company is also certified under ISO 27001:2022 for information security management, with the certificate available from Scale's trust center at trust.scale.com. Scale is HIPAA-eligible for customers handling protected health information. FedRAMP High authorization enables Scale to handle high-impact, sensitive unclassified US government data on approved cloud infrastructure; FedRAMP itself does not cover classified data. UK Cyber Essentials certification covers Scale's British government work. GDPR compliance is addressed through Scale's data processing agreements for European enterprise customers. Scale does not train its own AI models on customer annotation data by default, and enterprise customers can negotiate zero-retention terms for sensitive data.

Who are Scale AI's main competitors?

Scale AI's primary competitors are Appen, Labelbox, and Surge AI in the AI training data market. Appen is a publicly traded managed annotation service with a model similar to Scale's Rapid tier; Scale beats Appen on RLHF quality for frontier models and US government security clearances, while Appen offers lower costs on high-volume, lower-complexity tasks. Labelbox is an enterprise software platform that lets teams manage their own annotation workflows; Scale wins on managed workforce depth while Labelbox wins on self-serve flexibility for teams that want full pipeline control. Surge AI, a bootstrapped San Francisco company, crossed $1 billion in ARR by 2025, competing specifically on RLHF and preference-tuning for language model providers. Meta's June 2025 investment triggered Google, OpenAI, and Microsoft to reduce or exit their Scale relationships over vendor-neutrality concerns about Meta's 49% stake. Scale's clearest competitive barrier today is the US defense vertical, where FedRAMP High certification, CDAO contracts, and government-cleared personnel create switching costs that commercial annotation competitors cannot match.