Anthropic

Definition

Anthropic is an AI safety company and model provider founded in 2021 by former OpenAI researchers. Its core thesis is that building capable AI models and solving the alignment problem are inseparable goals — the company pursues cutting-edge capability alongside safety research such as Constitutional AI, interpretability, and mechanistic understanding of model internals. The commercial product of that research is the Claude family of models, available through the Anthropic API and enterprise products.

The Claude model lineup follows a three-tier naming convention reflecting capability and cost trade-offs: Opus (highest quality, complex reasoning), Sonnet (balanced quality and speed), and Haiku (fastest and most cost-efficient). As of 2025 the current generation is Claude 3.7 Sonnet — the flagship model with extended thinking capabilities — along with Claude 3 Opus, Claude 3.5 Sonnet, and Claude 3.5 Haiku. All Claude 3+ models support vision input (images), and the entire family is designed around a 200K-token context window that can handle books, large codebases, and long conversation histories without truncation.

From a platform perspective, Anthropic's API centers on the Messages API — a clean, purpose-built interface for multi-turn conversations. The platform includes tool use (Anthropic's term for function calling), extended thinking (visible chain-of-thought reasoning), prompt caching (reduces cost and latency for large repeated contexts), and batch processing. The Python SDK (anthropic) and TypeScript SDK are the primary client libraries. Claude models are also available through Amazon Bedrock, Google Cloud Vertex AI, and enterprise contracts with data residency options.
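Of the platform features above, batch processing is the one not demonstrated later in this page, so here is a minimal sketch. It builds request payloads in the shape the Message Batches API expects (each entry pairs a caller-chosen custom_id with ordinary Messages API params); the submit call is shown as a comment because it needs the SDK and an API key. The helper function, model name, and questions are illustrative choices, not part of the API.

```python
def build_batch_requests(model, questions, max_tokens=256):
    """Build request payloads for the Message Batches API.

    Each entry pairs a caller-chosen custom_id (used to match results
    back to inputs when the batch completes) with ordinary Messages
    API parameters.
    """
    return [
        {
            "custom_id": f"q-{i}",
            "params": {
                "model": model,
                "max_tokens": max_tokens,
                "messages": [{"role": "user", "content": question}],
            },
        }
        for i, question in enumerate(questions)
    ]

requests = build_batch_requests(
    "claude-3-5-haiku-20241022",
    ["What is prompt caching?", "What is tool use?"],
)

# Submitting requires the SDK and an API key:
#   import anthropic
#   client = anthropic.Anthropic()
#   batch = client.messages.batches.create(requests=requests)
#   print(batch.id, batch.processing_status)  # poll until "ended"
```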

How it works

Messages API

The Messages API (POST /v1/messages) is Anthropic's primary interface. Unlike some APIs that use a flat prompt string, the Messages API is conversation-first: you send a messages array of alternating user and assistant turns, with an optional system parameter for context and persona. The model returns a Message object containing a content list — text blocks by default, tool use blocks when the model decides to call a tool. Streaming is supported and recommended for interactive use; the SDK provides both streaming helpers and raw SSE access.
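The streaming helpers mentioned above can be sketched as follows. The call is guarded behind an API-key check so the snippet stays runnable offline; the prompt and model choice are illustrative.

```python
import os

# Conversation payload — plain dicts, identical in shape to the non-streaming call.
messages = [{"role": "user", "content": "Explain SSE streaming in one sentence."}]

if os.environ.get("ANTHROPIC_API_KEY"):
    import anthropic

    client = anthropic.Anthropic()
    # messages.stream() is the SDK's high-level helper: it manages the SSE
    # connection and exposes incremental text deltas via stream.text_stream.
    with client.messages.stream(
        model="claude-3-7-sonnet-20250219",
        max_tokens=512,
        messages=messages,
    ) as stream:
        for text in stream.text_stream:
            print(text, end="", flush=True)  # render tokens as they arrive
        final = stream.get_final_message()    # the assembled Message object
```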

Tool use

Tool use lets Claude call external functions by emitting structured tool_use content blocks. You declare tools as JSON schemas in the tools parameter. When Claude decides a tool is needed, the response contains a tool_use block with the tool name and input; your code executes the function and returns a tool_result in the next user turn. Claude then uses the result to complete its response. This pattern enables agents, code execution environments, database queries, and API integrations without the model needing direct access to any system.

Extended thinking

Extended thinking is a mode available on Claude 3.7 Sonnet that allows the model to reason at length before producing its final answer. When you set thinking: {type: "enabled", budget_tokens: N}, the model emits thinking content blocks containing its internal scratchpad — similar to chain-of-thought but native and structured. Extended thinking significantly improves performance on competition math, complex code, multi-step reasoning, and tasks requiring careful step-by-step analysis. Thinking tokens are billed as output tokens and count toward max_tokens (which must be set higher than budget_tokens), but the thinking blocks are returned in the response, giving you transparency into how the model arrived at its answer.

Prompt caching

Prompt caching dramatically reduces cost and latency for workloads that repeatedly use large system prompts or document contexts. You mark prefix sections of your request with cache_control: {type: "ephemeral"}. On the first call, Anthropic caches the prompt prefix on their infrastructure; subsequent calls that match the prefix are served from cache at 90% lower input token cost and significantly reduced time-to-first-token. This is especially valuable for RAG pipelines (large context passed with every query), agent loops (large system prompts repeated every turn), and batch document processing.
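As a rough sketch of the economics: the function below estimates input-token cost for a cached prefix reused across many calls. The multipliers are assumptions drawn from Anthropic's published ephemeral-cache pricing (reads at 10% of the base input rate, writes at a 25% surcharge), and the $3 per million input tokens is an illustrative rate — check current pricing before relying on these numbers.

```python
def caching_cost(prefix_tokens, calls, base_price_per_mtok,
                 write_multiplier=1.25, read_multiplier=0.10):
    """Estimated input cost (USD) for a cached prefix across repeated calls.

    The first call writes the cache at a surcharge; every later call reads
    it at the discounted rate. Multipliers are assumptions based on
    Anthropic's published ephemeral-cache pricing.
    """
    per_token = base_price_per_mtok / 1_000_000
    write = prefix_tokens * per_token * write_multiplier
    reads = prefix_tokens * per_token * read_multiplier * (calls - 1)
    return write + reads

# A 50K-token system prompt reused across a 100-turn agent loop,
# at an assumed $3 per million input tokens:
cached = caching_cost(50_000, 100, 3.0)
uncached = 50_000 * (3.0 / 1_000_000) * 100
print(f"cached ~${cached:.2f} vs uncached ~${uncached:.2f}")  # ~$1.67 vs ~$15.00
```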

Long context (200K tokens)

All Claude 3 and later models support a 200K-token context window — equivalent to roughly 150,000 words or ~500 pages of text. Long context enables entire codebases, legal documents, research papers, or full conversation histories to be processed in a single call without chunking. Anthropic's research on long-context performance ("needle in a haystack" evaluations) shows Claude maintains strong recall accuracy across the full 200K range, making it reliable for document Q&A, contract analysis, and code review over large repositories. This is one of Anthropic's clearest differentiators relative to GPT-4o's 128K window.
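For quick capacity planning against the 200K window, a common heuristic is ~4 characters per token for English text. The sketch below uses that heuristic to check whether a document plausibly fits alongside a response budget; the numbers and helper names are illustrative.

```python
CONTEXT_WINDOW = 200_000  # tokens, per the Claude 3+ family

def estimate_tokens(text: str) -> int:
    """Crude offline estimate: English text averages ~4 characters per token."""
    return max(1, len(text) // 4)

def fits_in_context(document: str, reserved_for_output: int = 4_096) -> bool:
    """Check whether a document plausibly fits alongside a response budget."""
    return estimate_tokens(document) + reserved_for_output <= CONTEXT_WINDOW

page = "word " * 400   # roughly one page (~500 tokens by this estimate)
book = page * 500      # roughly 500 pages
print(fits_in_context(page))  # True
print(fits_in_context(book))  # False — ~250K estimated tokens overflows 200K
```

For exact counts before sending a large payload, recent versions of the Python SDK also expose a server-side token-counting endpoint (client.messages.count_tokens); the heuristic above is only an approximation.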

When to use / When NOT to use

| Use Anthropic when | Avoid or consider alternatives when |
| --- | --- |
| You need a 200K context window to process long documents, codebases, or extended conversations without chunking | Your workload requires image generation, audio transcription, or text-to-speech — Claude is text/vision only; OpenAI covers audio |
| Safety constraints and predictable refusal behavior are critical (compliance, healthcare, finance) | You need open-weights models for self-hosting, fine-tuning, or data residency — Anthropic offers no open-weights option |
| You want extended thinking for deep reasoning tasks (math, complex code, multi-step analysis) | Your primary use case is high-volume embedding generation — Anthropic does not offer an embeddings API |
| Prompt caching will meaningfully reduce cost (large repeated contexts, agent system prompts) | You rely heavily on OpenAI-specific tooling (Assistants API, DALL-E, Whisper) that has no Anthropic equivalent |
| You are building tool use or computer use workflows and want a model well-calibrated for structured outputs | You need the absolute lowest cost-per-token at scale — Claude Haiku competes on price but GPT-4o-mini and open models are cheaper |

Comparisons

| Criteria | Anthropic | OpenAI | Google Gemini |
| --- | --- | --- | --- |
| Flagship model | Claude 3.7 Sonnet | GPT-4o | Gemini 2.5 Pro |
| Context window | 200K (all Claude 3+) | 128K (GPT-4o) | Up to 1M (Gemini 1.5 Pro) |
| Reasoning / thinking | Extended thinking (native CoT) | o1, o3 series | Gemini 2.5 Pro thinking |
| Multimodal input | Text, image | Text, image, audio, video | Text, image, audio, video |
| Audio / speech | No | Yes (Whisper, TTS) | Yes (Gemini) |
| Image generation | No | Yes (DALL-E 3) | Yes (Imagen) |
| Embeddings API | No | Yes | Yes |
| Open-weights | No | No | Gemma (partial) |
| Prompt caching | Yes (native, 90% discount) | Context caching (limited) | Yes (Gemini) |
| Tool use / function calling | Mature, computer use support | Mature, widely adopted | Mature |
| Safety philosophy | Constitutional AI, refusal-tuned | Moderation API, usage policy | Responsible AI guidelines |
| Data residency options | Enterprise contract | Enterprise contract | Google Cloud regions |

Pros and cons

| Pros | Cons |
| --- | --- |
| 200K context window across all models — best-in-class for long documents | No audio, speech, or image generation APIs |
| Extended thinking gives transparent chain-of-thought for hard reasoning tasks | No embeddings API — you need a second provider for RAG |
| Prompt caching significantly reduces cost for repeated large contexts | Closed model with no open-weights option |
| Safety-first design with careful refusal calibration and Constitutional AI | Smaller ecosystem than OpenAI — fewer third-party tutorials and integrations |
| Computer use (beta) enables agentic control of desktop GUIs | Pricing can be higher than GPT-4o-mini or open-weights alternatives for simple tasks |

Code examples

Messages API — basic completion and system prompt

import anthropic

client = anthropic.Anthropic(api_key="sk-ant-...")  # or set ANTHROPIC_API_KEY env var

# Basic message
message = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=1024,
    system="You are a concise technical assistant. Answer in plain English.",
    messages=[
        {"role": "user", "content": "What is the Anthropic Messages API?"}
    ],
)
print(message.content[0].text)

# Multi-turn conversation
messages = [
    {"role": "user", "content": "What is prompt caching?"},
    {"role": "assistant", "content": "Prompt caching stores repeated large context..."},
    {"role": "user", "content": "How much does it save?"},
]
response = client.messages.create(
    model="claude-3-5-haiku-20241022",
    max_tokens=512,
    messages=messages,
)
print(response.content[0].text)

Tool use

import anthropic

client = anthropic.Anthropic()

# Define tools as JSON schemas
tools = [
    {
        "name": "search_docs",
        "description": "Search the documentation for a given query and return relevant passages.",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query"},
                "max_results": {"type": "integer", "default": 3},
            },
            "required": ["query"],
        },
    }
]

messages = [{"role": "user", "content": "How do I enable prompt caching?"}]

# First call — Claude may request a tool
response = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=1024,
    tools=tools,
    messages=messages,
)

# Process tool calls
if response.stop_reason == "tool_use":
    messages.append({"role": "assistant", "content": response.content})

    tool_results = []
    for block in response.content:
        if block.type == "tool_use":
            # Simulated tool execution
            result = f"Prompt caching docs for '{block.input['query']}': use cache_control param..."
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": result,
            })

    messages.append({"role": "user", "content": tool_results})

    # Final call with tool result
    final = client.messages.create(
        model="claude-3-7-sonnet-20250219",
        max_tokens=1024,
        tools=tools,
        messages=messages,
    )
    print(final.content[0].text)

Extended thinking

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000,  # tokens allocated for internal reasoning
    },
    messages=[{
        "role": "user",
        "content": (
            "A train leaves city A at 9am traveling at 80 km/h. "
            "Another train leaves city B (320 km away) at 10am traveling at 100 km/h. "
            "At what time do they meet, and how far from city A?"
        ),
    }],
)

for block in response.content:
    if block.type == "thinking":
        print("=== Model's internal reasoning ===")
        print(block.thinking[:500], "...")  # first 500 chars for brevity
    elif block.type == "text":
        print("=== Final answer ===")
        print(block.text)

Prompt caching for repeated large context

import anthropic

client = anthropic.Anthropic()

# Large document loaded once — cached after first call
with open("contract.txt") as f:
    large_document = f.read()  # e.g., 50K tokens

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "You are a legal document analyst. Answer questions based solely on the document provided.",
        },
        {
            "type": "text",
            "text": large_document,
            "cache_control": {"type": "ephemeral"},  # mark for caching
        },
    ],
    messages=[{"role": "user", "content": "What are the termination clauses?"}],
)

print(response.content[0].text)
# usage.cache_creation_input_tokens — tokens written to cache this call (billed at a write premium)
# usage.cache_read_input_tokens — tokens served from cache (10% of base input price)
print(response.usage)

Practical resources

See also

  • Model providers — Overview and comparison of all providers including a 7-provider comparison table
  • Case study: Claude — For a deeper look at model architecture and training methodology, see the Claude case study
  • OpenAI — GPT-4o, o-series reasoning, function calling, DALL-E, Whisper
  • Prompt engineering — Techniques applicable to all Claude models
  • Tools — Claude Code, Anthropic's AI coding agent built on the Claude API
  • Agents — Building agentic workflows with Claude tool use