Claude 1.0: Anthropic's First LLM (2023) | hokai.io
Claude 1.0 by Anthropic, released March 2023. First-gen text-only LLM with 9,000-token context. Retired November 6, 2024. Trained on Constitutional AI.
Claude 1.0 was Anthropic's founding LLM, released March 14, 2023 in limited API beta. It had a 9,000-token context window (expanded to 100K in v1.3), accepted only text input, and was trained with Constitutional AI plus RLHF. No formal benchmarks were published, and pricing was per-token via the Anthropic API. The model was deprecated September 4, 2024 and fully retired November 6, 2024, with no API access remaining for any workload.
Provider: Anthropic · Family: Claude 1
Context window: 9,000 tokens
Input modalities: text · Output: text
About Claude 1.0
Claude 1.0 is the founding model of Anthropic's Claude family, released on March 14, 2023 via limited API beta to approved developers. Built on a dense Transformer architecture and trained using Anthropic's Constitutional AI method combined with reinforcement learning from human feedback, Claude 1.0 represented the company's first public deployment of its alignment-centric training philosophy. The model's parameter count was never disclosed, a pattern Anthropic maintained across all subsequent generations. Claude 1.0 sat at the entry tier of what would become a multi-generation model lineup, positioned for general text tasks with an explicit safety-first design.

Anthropic did not publish formal benchmark results for Claude 1.0 in any model card or technical report. The company's first detailed public benchmarks came with Claude 2 in July 2023, which showed improved scores versus the Claude 1 baseline on HumanEval, GSM8K, and MMLU, without specifying exact Claude 1.0 figures. Independent evaluators at the time placed performance in the GPT-3.5 tier for most text tasks, but no independently verified numbers exist for MMLU, HumanEval, or reasoning tasks for this model. The absence of published benchmarks aligned with Anthropic's early philosophy of avoiding arms-race comparisons.

Claude 1.0 launched with a 9,000-token context window, roughly equivalent to 6,000 to 7,000 words or about 20 to 25 pages of text. This was expanded substantially when claude-1.3 introduced a 100,000-token context in May 2023, an approximately eleven-fold increase; Anthropic applied the same per-token pricing to both the 9K and 100K tiers when the expansion arrived. A separate max-output-token limit was never documented. The original 9K window constrained the model to short-to-medium document processing, with no ability to ingest full books, large codebases, or extended conversation histories in a single pass.
Claude 1.0 was strictly text-to-text. It accepted text input and produced text output, with no support for images, audio, video, or native PDF parsing. Multimodal input capabilities were not introduced until Claude 3 in March 2024. Function calling existed only through structured prompting conventions, not a formal tool-use API. The model handled code generation, summarization, translation, and dialogue via text alone. No computer use, file upload, or web browsing was supported natively.

Claude 1.0 was priced on a per-token model through the Anthropic API. Exact historical per-million-token rates are no longer published following the model's retirement in November 2024. At launch in 2023, pricing was competitive with GPT-3.5 class models. Anthropic applied the same rate to both the 9K and 100K context variants when the context expansion was announced in May 2023, an unusual pricing decision that drew developer attention at the time. No free tier, batch discount, or prompt caching feature existed for Claude 1.0.

Access was available only via the Anthropic direct API during the model's operational period, with Amazon Bedrock adding support for the Claude 1 series during 2023. Authentication required an Anthropic API key. Google Vertex AI and Azure deployments were not available for the Claude 1 family. The API model IDs claude-1.0, claude-1.1, claude-1.2, and claude-1.3 all now return errors, as all four were retired on November 6, 2024. Anthropic announced the deprecation on September 4, 2024 and gave developers 60 days to migrate to claude-2.0 or newer models.

Claude 1.0 was trained using Constitutional AI, a method Anthropic published in December 2022. The technique trains the model to critique and revise its own outputs against a written set of guiding principles called a constitution, then fine-tunes on the revised outputs via supervised learning. A reinforcement learning phase follows, using AI-generated preference signals rather than purely human-labeled comparisons, which makes alignment more scalable. Anthropic published its first constitution in May 2023, with 75 principles drawn from sources including the UN Universal Declaration of Human Rights and Apple's terms of service. The model carried a strict safety posture, refusing a wider range of requests than later Claude generations, reflecting Anthropic's conservative alignment defaults during this period. No formal red-teaming partner list was published for Claude 1.0.

Claude 1.0 is fully retired and inaccessible via any API endpoint; no active production workload can use it. Developers who built on the claude-1.0 API model ID were required to migrate before November 6, 2024. The model is of historical interest as the origin of the Claude family and the first commercial deployment of Constitutional AI training, but it cannot serve any current inference use case. Anthropic has committed to long-term preservation of model weights, so the model may become accessible again in research contexts, but no public timeline exists.

Training data for Claude 1.0 consisted of a proprietary mix of publicly available Internet text, licensed third-party datasets, and human-annotated examples. The training data cutoff is estimated at early 2023, consistent with what Anthropic disclosed for Claude 2, its direct successor. Inputs to the API were retained under Anthropic's standard data policy during the model's active period. No SOC 2 Type II, HIPAA, or ISO 27001 certifications were held for the Claude 1 generation; those enterprise compliance tiers were added in later model generations alongside Anthropic's expanding enterprise business.
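The supervised phase of Constitutional AI described above can be sketched as a critique-and-revise loop whose outputs become fine-tuning data. This is an illustrative toy, not Anthropic's implementation: every function below is a hypothetical stand-in for a model call, and the single-principle constitution is invented for the example.

```python
# Illustrative sketch of the Constitutional AI supervised phase:
# the model critiques its own draft against a written principle,
# then revises it; the (prompt, revision) pairs become SFT data.
# All functions here are toy stand-ins, not Anthropic's implementation.

CONSTITUTION = [
    "Choose the response that is most helpful, honest, and harmless.",
]

def draft(prompt: str) -> str:
    # Stand-in for an initial model completion.
    return f"DRAFT answer to: {prompt}"

def critique(response: str, principle: str) -> str:
    # Stand-in for a self-critique conditioned on one principle.
    return f"Critique of '{response}' under: {principle}"

def revise(response: str, critique_text: str) -> str:
    # Stand-in for a revision that addresses the critique.
    return f"REVISED({response})"

def build_sft_pairs(prompts):
    pairs = []
    for p in prompts:
        r = draft(p)
        for principle in CONSTITUTION:
            c = critique(r, principle)
            r = revise(r, c)          # iteratively revise per principle
        pairs.append((p, r))          # fine-tune on the final revision
    return pairs

pairs = build_sft_pairs(["Explain photosynthesis simply."])
print(pairs[0][1])
```

The subsequent RL phase would then rank pairs of responses with an AI preference model rather than human labelers, which is what makes the approach scale.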
Pricing
Historical per-token pricing no longer published. Model retired November 6, 2024. No API access available.
Key Features
- Constitutional AI Training: First commercial model trained via Anthropic's Constitutional AI method, using a written set of 75 principles to guide self-critique and revision during training.
- 100K Context (v1.3): Context window expanded from 9,000 to 100,000 tokens with claude-1.3 in May 2023, eleven times the initial limit, at no additional per-token cost.
- Strict Safety Defaults: Wider refusal range than later Claude generations, reflecting Anthropic's conservative alignment posture during the 2023 period of AI development.
- Human/Assistant Prompt Format: Used the original Human/Assistant stop sequence format before the Messages API was introduced, establishing the conversational interface pattern.
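The Human/Assistant format listed above can be sketched as plain string assembly. The `build_prompt` helper is illustrative (not part of any SDK), but the `\n\nHuman:` / `\n\nAssistant:` markers and the stop-sequence convention match the legacy Text Completions interface the Claude 1 series used.

```python
# Sketch of the legacy Human/Assistant completion format used by the
# Claude 1 series (pre-Messages API). Prompts were plain strings with
# alternating "\n\nHuman:" / "\n\nAssistant:" markers, and generation
# stopped at the next "\n\nHuman:" stop sequence.

HUMAN = "\n\nHuman:"
ASSISTANT = "\n\nAssistant:"

def build_prompt(turns):
    """turns: list of (role, text) with roles 'human' or 'assistant'."""
    parts = []
    for role, text in turns:
        marker = HUMAN if role == "human" else ASSISTANT
        parts.append(f"{marker} {text}")
    parts.append(ASSISTANT)  # trailing marker cues the model to respond
    return "".join(parts)

prompt = build_prompt([("human", "Summarize this memo in two sentences.")])
print(repr(prompt))
# A request at the time would pair this with stop_sequences=["\n\nHuman:"].
```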
Pros
- First deployment of Constitutional AI training in a commercial LLM, establishing alignment practices that all later Claude models built on.
- Context window expanded to 100K tokens in May 2023 via claude-1.3, ahead of GPT-3.5 at the time.
- Strict safety posture produced lower rates of harmful content generation than many GPT-3.5 class contemporaries.
Cons
- Fully retired as of November 6, 2024. No API access is possible under any circumstances.
- No published benchmarks, making capability assessment against contemporaries reliant on informal comparisons only.
- Text-only input with no image, audio, or vision support in any version of the Claude 1 family.
Frequently Asked Questions
What is Claude 1.0 and who built it?
Claude 1.0 is the founding language model of the Claude family, built by Anthropic and released on March 14, 2023 via limited API beta to approved developers. Anthropic is a San Francisco AI safety company founded in January 2021 by Dario Amodei, Daniela Amodei, and five other former OpenAI researchers. Claude 1.0 uses a dense Transformer architecture and was the first commercial deployment of Anthropic's Constitutional AI training method, which guides model behavior using a written set of principles during training rather than relying solely on human feedback at every step. Parameter count was never disclosed, which became standard practice across all Anthropic model generations. The model was positioned for general text tasks with a strong safety focus and was available only to selected API partners during its initial release period. It sat at the entry point of what would become a multi-generation lineup including Claude 2, Claude 3, and Claude 4 families.
How much did Claude 1.0 cost per 1M tokens?
Exact historical per-million-token pricing for Claude 1.0 is no longer available in Anthropic's documentation following the model's retirement on November 6, 2024. During its active period in 2023, the model was priced on a per-token basis via the Anthropic API at rates competitive with GPT-3.5 class models. Anthropic made an unusual pricing decision when it expanded the context window to 100,000 tokens with claude-1.3 in May 2023, charging the same per-token rate as the original 9,000-token model. No free tier, batch discount, or prompt caching existed for the Claude 1 family. AWS Bedrock also offered the Claude 1 series at per-token pricing consistent with Anthropic's direct API rates during 2023. Since the model is fully retired, no current pricing applies.
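The per-token billing arithmetic itself is simple to sketch. The rates below are hypothetical placeholders, since Claude 1.0's real historical rates are no longer published:

```python
# Per-token billing arithmetic. The $2 / $6 per-million-token rates used
# in the example are HYPOTHETICAL; Claude 1.0's actual historical rates
# are no longer documented.

def request_cost(input_tokens, output_tokens,
                 in_rate_per_mtok, out_rate_per_mtok):
    return (input_tokens / 1_000_000) * in_rate_per_mtok \
         + (output_tokens / 1_000_000) * out_rate_per_mtok

# e.g. 8,000 input + 1,000 output tokens at illustrative $2 / $6 per MTok:
cost = request_cost(8_000, 1_000, 2.00, 6.00)
print(f"${cost:.4f}")  # → $0.0220
```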
What was Claude 1.0's context window and max output?
Claude 1.0 launched with a 9,000-token context window, equivalent to approximately 6,000 to 7,000 words or about 20 to 25 pages of text. This was the combined limit for input plus output in a single request. The model's specific max output token limit was not separately documented by Anthropic. The context window was expanded significantly to 100,000 tokens with the release of claude-1.3 in May 2023, an approximately eleven-fold increase that enabled ingesting full books, large codebases, or lengthy conversation histories. Anthropic notably applied the same per-token pricing to both the 9K and 100K context tiers when the expansion was announced. By comparison, the current Claude Opus 4.7 supports a 1,000,000-token context window, more than a hundred times the original Claude 1.0 limit.
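The word and page figures above follow from a common rule-of-thumb conversion, roughly 0.75 English words per token. Both constants below are heuristics, not part of any Anthropic specification:

```python
# Rough token-to-length arithmetic behind the figures above, using the
# common ~0.75 words-per-token heuristic (an approximation, not a spec).

TOKENS = 9_000
WORDS_PER_TOKEN = 0.75   # typical English-text heuristic
WORDS_PER_PAGE = 300     # rough single-spaced page

words = TOKENS * WORDS_PER_TOKEN
pages = words / WORDS_PER_PAGE
print(f"~{words:.0f} words, ~{pages:.0f} pages")   # ~6750 words, ~22 pages
```

Varying the words-per-page assumption between roughly 270 and 340 gives the 20-to-25-page range quoted in the text.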
How did Claude 1.0 compare on benchmarks vs GPT-3.5?
Anthropic did not publish formal benchmark scores for Claude 1.0. No MMLU, HumanEval, GSM8K, or SWE-bench results were released for any version of the Claude 1 family in a public model card or technical report. Anthropic's first detailed benchmark disclosures came with Claude 2 in July 2023, which the company described as improved over the Claude 1 baseline, without citing specific Claude 1 figures. Contemporary evaluators in 2023 placed Claude 1.0 in a comparable tier to GPT-3.5 for general text tasks, with lower harmful output rates attributed to Constitutional AI training. No independent third-party benchmark scores from that period are widely cited. The absence of benchmark data makes direct numerical comparison to GPT-3.5 impossible from current sources.
Was Claude 1.0 open source or proprietary?
Claude 1.0 was fully proprietary, with closed weights and API-only access. No model weights were made available for download, and no self-hosting option existed. Access required an Anthropic API key during the model's active period from March 2023 to November 2024. The model was available via the Anthropic direct API and, during 2023, via Amazon Bedrock as a managed cloud deployment. Google Vertex AI and Azure were not available for the Claude 1 series. Anthropic has committed to long-term preservation of model weights for retired models and has expressed intent to make past models accessible again in the future, but as of May 2026 no public access has been restored.
What modalities did Claude 1.0 support?
Claude 1.0 supported only text input and text output. There was no support for images, audio, video, PDFs, or file uploads of any kind. The Claude 1 family across all versions (1.0, 1.1, 1.2, 1.3) remained strictly text-to-text throughout its operational period. Multimodal input capabilities first appeared in the Claude 3 family in March 2024, more than a year after Claude 1.0's release. Function calling and tool use were not part of the Claude 1 API feature set. The Messages API that introduced structured tool use came with Claude 2 and later generations. For any task requiring image analysis, code execution, file parsing, or audio processing, Claude 1.0 was not usable.
Did Claude 1.0 train on user data?
Anthropic's policy for the Claude 1 series was that API inputs were not used to train production models by default. Standard data retention applied during the model's active period, with inputs retained under Anthropic's API terms for abuse monitoring purposes. Enterprise zero-retention options that became prominent with Claude 3 and Claude 4 were not a feature of the Claude 1 API offering. The Claude 1 generation predated Anthropic's formal SOC 2 Type II, ISO 27001, and HIPAA certifications, which were added in later product cycles as the company expanded into regulated enterprise markets. For research and historical analysis, the model's training data consisted of a proprietary mix of public web text, licensed datasets, and human feedback annotations, with a cutoff estimated at early 2023.
Who was Claude 1.0 best for and who should have avoided it?
During its active period, Claude 1.0 was best suited for developers building text-based applications who wanted a safety-conscious alternative to GPT-3.5 with Constitutional AI alignment guarantees. Content moderation teams, summarization tools, and question-answering systems were common use cases. Teams requiring vision, audio, or document parsing should have avoided Claude 1.0 entirely, as it was text-only. Developers needing large context for legal documents or codebases would have been better served by claude-1.3 (100K) or Claude 2 (same context, stronger performance). As of November 6, 2024, the model is fully retired and inaccessible for any workload. Any developer referencing claude-1.0 in production code must migrate to an active model; Anthropic recommended claude-haiku-4-5-20251001 as the migration target for the Claude 1 family.
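The migration path above can be sketched as a mapping from a legacy completions-style request to a Messages-style request body. The `migrate_request` helper is illustrative and not part of any SDK; the target model ID simply follows the recommendation quoted in the answer:

```python
# Illustrative migration sketch: converting a legacy claude-1 Text
# Completions request into a Messages API request body. The helper is
# hypothetical, not an SDK function; the target model ID follows the
# recommendation stated above.

LEGACY_TO_CURRENT = {"claude-1.0": "claude-haiku-4-5-20251001"}

def migrate_request(legacy):
    """Map a legacy completions-style request dict to a Messages-style one."""
    human_text = (
        legacy["prompt"]
        .removeprefix("\n\nHuman:")
        .removesuffix("\n\nAssistant:")
        .strip()
    )
    return {
        "model": LEGACY_TO_CURRENT[legacy["model"]],
        "max_tokens": legacy["max_tokens_to_sample"],
        "messages": [{"role": "user", "content": human_text}],
    }

old = {
    "model": "claude-1.0",
    "prompt": "\n\nHuman: Summarize this memo.\n\nAssistant:",
    "max_tokens_to_sample": 300,
}
print(migrate_request(old)["model"])   # claude-haiku-4-5-20251001
```

The resulting dict matches the shape of a Messages API request, where the prompt string's Human turn becomes a `{"role": "user"}` message and `max_tokens_to_sample` becomes `max_tokens`.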