DeepSeek

Definition

DeepSeek is a Chinese AI research lab and commercial platform that has gained significant international attention for producing models competitive with the best proprietary models while releasing the weights openly and operating at a fraction of the cost. Founded in 2023 as a subsidiary of High-Flyer (a quantitative hedge fund), DeepSeek is characterized by rigorous research into training efficiency, including innovations in mixture-of-experts (MoE) architectures, multi-head latent attention, and reinforcement-learning methods that elicit strong reasoning without massive compute budgets.

The model lineup spans three major capability areas. DeepSeek-V3 is a general-purpose chat and instruction-following model that rivals GPT-4o and Claude 3.5 Sonnet on standard benchmarks while being dramatically cheaper to access via API. DeepSeek-R1 is a dedicated reasoning model that uses extended chain-of-thought (CoT) — the model generates explicit reasoning traces before producing a final answer — making it particularly strong on mathematics, logical deduction, and multi-step problem solving. DeepSeek-Coder (and its successor variants integrated into V3/R1) specializes in code generation, completion, and debugging across a wide range of programming languages.

DeepSeek's open-weights approach means that all major models are available on Hugging Face and can be self-hosted on your own infrastructure — a critical capability for organizations with data sovereignty requirements or those seeking to avoid per-token API costs at scale. The DeepSeek platform also exposes an API that is wire-compatible with the OpenAI API format, meaning that any application built with the OpenAI Python SDK can switch to DeepSeek models by changing the base_url and API key with no other code changes.

How it works

API platform

DeepSeek hosts a cloud inference API at api.deepseek.com that accepts requests in the OpenAI Chat Completions format. This compatibility layer means the integration overhead is minimal — developers familiar with the OpenAI SDK can migrate or test DeepSeek models in minutes. The platform supports streaming responses, function calling, and system prompts. Pricing is token-based and publicly listed, with rates that are typically 90–95% lower than equivalent-tier OpenAI models, making high-volume production deployments substantially cheaper.
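
A back-of-envelope illustration of what that price gap means for a high-volume pipeline (the rates are the indicative per-million-token figures cited in this article, and the 50M-tokens/day workload is hypothetical; always check current price lists before budgeting):

```python
# Rough cost comparison for a batch workload.
# Rates are illustrative (USD per million input tokens).
DEEPSEEK_V3_INPUT = 0.27   # ~$0.27 / M input tokens
GPT_4O_INPUT = 2.50        # ~$2.50 / M input tokens

def monthly_input_cost(tokens_per_day: int, rate_per_million: float) -> float:
    """Cost of input tokens over a 30-day month."""
    return tokens_per_day * 30 / 1_000_000 * rate_per_million

daily_tokens = 50_000_000  # hypothetical 50M input tokens/day pipeline
deepseek_cost = monthly_input_cost(daily_tokens, DEEPSEEK_V3_INPUT)
openai_cost = monthly_input_cost(daily_tokens, GPT_4O_INPUT)

print(f"DeepSeek-V3: ${deepseek_cost:,.0f}/month")   # $405/month
print(f"GPT-4o:      ${openai_cost:,.0f}/month")     # $3,750/month
print(f"Savings:     {1 - deepseek_cost / openai_cost:.0%}")  # 89%
```

At these rates the input-token bill drops by roughly an order of magnitude, which is why the saving compounds so strongly for batch workloads.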

Reasoning models (DeepSeek-R1)

DeepSeek-R1 is trained using a multi-stage process that incorporates reinforcement learning to reward the model for producing correct final answers — crucially, without relying on supervised chain-of-thought data at the core training stage. The model generates a <think> block containing its reasoning trace before the final answer. This explicit scratchpad allows the model to perform multi-step deduction, check its work, and back-track from incorrect paths — behaviors that dramatically improve performance on math olympiad problems, formal logic, and complex coding tasks that require planning across many steps.
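
When R1 is self-hosted, the reasoning trace arrives inline in the generated text rather than in a separate API field, so callers typically split it out themselves. A minimal sketch, assuming the <think>…</think> delimiters of R1's output convention (the helper name is ours):

```python
import re

def split_reasoning(raw_output: str) -> tuple[str, str]:
    """Split an R1-style output into (reasoning trace, final answer)."""
    match = re.match(r"\s*<think>(.*?)</think>\s*(.*)", raw_output, re.DOTALL)
    if match:
        return match.group(1).strip(), match.group(2).strip()
    return "", raw_output.strip()  # no trace found: treat it all as the answer

raw = (
    "<think>By 09:00 train A has covered 120 km, leaving a 180 km gap. "
    "Closing speed is 200 km/h, so they meet 0.9 h later.</think>"
    "They meet at 09:54."
)
trace, answer = split_reasoning(raw)
print(answer)  # They meet at 09:54.
```

The hosted API does this separation for you, returning the trace in reasoning_content; the split is only needed for raw self-hosted output.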

Code models and DeepSeek-Coder

DeepSeek's code-specialized models are pre-trained on large corpora of source code (GitHub, competitive-programming platforms, documentation) and fine-tuned for instruction following on coding tasks. They support fill-in-the-middle (FIM) completion, the prompt format that IDE autocomplete tools such as GitHub Copilot rely on. DeepSeek-Coder posts strong results on code-generation benchmarks such as HumanEval and MBPP, often matching or beating substantially larger models from other providers. These coding capabilities are also folded into DeepSeek-V3 and R1, so the general-purpose models perform well on code tasks too.
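
A FIM request supplies the code before and after the cursor and asks the model to generate the missing middle. A minimal sketch of assembling such a request in OpenAI-style completions form (the helper is ours, and the field names follow the OpenAI completions format; verify the exact endpoint and fields your serving stack expects against its documentation):

```python
def build_fim_request(prefix: str, suffix: str, model: str = "deepseek-chat") -> dict:
    """Assemble an OpenAI-style completions payload for fill-in-the-middle."""
    return {
        "model": model,
        "prompt": prefix,    # code before the cursor
        "suffix": suffix,    # code after the cursor
        "max_tokens": 128,
        "temperature": 0.0,  # deterministic output suits autocomplete
    }

payload = build_fim_request(
    prefix="def binary_search(arr, target):\n    lo, hi = 0, len(arr) - 1\n    while lo <= hi:\n",
    suffix="\n    return -1\n",
)
print(sorted(payload))  # ['max_tokens', 'model', 'prompt', 'suffix', 'temperature']
```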

Open-weights deployment

All major DeepSeek models have their weights released on Hugging Face under permissive licenses, enabling self-hosted inference on consumer or enterprise GPU hardware. DeepSeek-V3 uses a mixture-of-experts architecture where only a subset of parameters are activated per token, reducing inference cost significantly compared to dense models of comparable capability. Popular deployment options include vLLM, Ollama (for quantized versions), and NVIDIA NIM containers. Self-hosted deployment is particularly attractive for large-scale batch workloads, fine-tuning on proprietary data, or scenarios where all data must remain on-premises.
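
For a quick local start with quantized weights, Ollama is a low-friction route. The commands below are a sketch; the model tag is illustrative (check the Ollama model library for currently published tags, and note that Ollama's DeepSeek-R1 entries are distilled variants, not the full model):

```shell
# Pull and run a quantized DeepSeek-R1 distillation locally
ollama pull deepseek-r1:7b
ollama run deepseek-r1:7b "Explain mixture-of-experts routing in two sentences."

# Ollama also exposes an OpenAI-compatible endpoint on localhost:11434,
# so the same OpenAI-SDK client code shown below works against it:
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "deepseek-r1:7b", "messages": [{"role": "user", "content": "Hello"}]}'
```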

When to use / When NOT to use

| Use when | Avoid when |
| --- | --- |
| Cost is a primary constraint — DeepSeek API is 90%+ cheaper than GPT-4o at comparable quality | You need a provider with an established enterprise SLA, compliance certifications (SOC 2, HIPAA), or US-based data processing |
| Tasks require deep multi-step reasoning: math, logic, formal proofs, complex coding | Your task is primarily multimodal — DeepSeek-V3/R1 are text-only models |
| You want to self-host open-weight models for data sovereignty or custom fine-tuning | You need the broadest possible plugin/tool ecosystem and third-party integrations |
| Building high-volume batch pipelines where per-token cost reduction compounds significantly | Latency-critical consumer applications where R1's reasoning trace adds response time |
| Code generation, code review, or debugging are your primary use cases | You are in a jurisdiction with regulatory requirements around AI model origin |

Comparisons

| Criteria | DeepSeek (V3 / R1) | OpenAI (GPT-4o / o1) | Meta / Llama |
| --- | --- | --- | --- |
| Reasoning performance | R1 competitive with o1 on math/logic benchmarks | o1 is top-tier; GPT-4o strong on general reasoning | Llama 3.x competitive but below R1/o1 on hard reasoning |
| General chat quality | V3 competitive with GPT-4o | GPT-4o best-in-class general quality | Llama 3.3 70B competitive for its size |
| Open weights | Yes (all models on Hugging Face) | No (proprietary only) | Yes (Meta releases Llama weights) |
| API cost | Very low (~$0.27/M input tokens for V3) | High (~$2.50/M input tokens for GPT-4o) | Free (self-host); Fireworks/Together APIs affordable |
| Ecosystem & integrations | Growing; OpenAI-compatible API eases adoption | Largest ecosystem, most integrations | Large open-source ecosystem |
| Data sovereignty | Self-host possible; API data processed in China | Azure OpenAI for US-region processing | Full self-host possible |
| Multimodal | Text only (V3/R1) | Yes (GPT-4o vision/audio, DALL·E images) | Llama 3.2 has vision variants |

Pros and cons

| Pros | Cons |
| --- | --- |
| Dramatically lower API cost than OpenAI/Anthropic | API data routed through Chinese servers — a concern for some regulated industries |
| R1 delivers frontier-level reasoning performance | R1 reasoning traces add latency and token usage |
| OpenAI-compatible API — near-zero switching cost | Smaller trust/brand recognition in Western enterprise sales cycles |
| Open weights allow self-hosting and fine-tuning | V3/R1 are text-only; no native image or audio capabilities |
| Strong code generation across most mainstream languages | Community and documentation primarily in Chinese; English resources still catching up |

Code examples

Chat completion with DeepSeek-V3 (OpenAI-compatible)

from openai import OpenAI

# DeepSeek uses the OpenAI SDK with a custom base_url
client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-chat",  # maps to DeepSeek-V3
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user", "content": "Explain the difference between MoE and dense transformer architectures."},
    ],
    temperature=0.7,
    max_tokens=1024,
)

print(response.choices[0].message.content)

Reasoning with DeepSeek-R1 (chain-of-thought)

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-reasoner",  # maps to DeepSeek-R1
    messages=[
        {
            "role": "user",
            "content": (
                "A train leaves City A at 08:00 and travels at 120 km/h. "
                "Another train leaves City B (300 km away) at 09:00 and travels "
                "toward City A at 80 km/h. At what time do they meet?"
            ),
        }
    ],
)

# R1 exposes the reasoning trace in reasoning_content
message = response.choices[0].message
if getattr(message, "reasoning_content", None):
    print("=== Reasoning trace ===")
    print(message.reasoning_content)
    print()

print("=== Final answer ===")
print(message.content)

Streaming response with DeepSeek-V3

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",
    base_url="https://api.deepseek.com",
)

stream = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "user", "content": "Write a Python function that implements binary search."},
    ],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)
print()
print()

Self-hosted inference with vLLM

# Start vLLM server (run in terminal):
# vllm serve deepseek-ai/DeepSeek-V3 --tensor-parallel-size 4 --port 8000

from openai import OpenAI

# Point to your local vLLM server instead of DeepSeek cloud
client = OpenAI(
    api_key="not-needed",  # vLLM does not require a real key
    base_url="http://localhost:8000/v1",
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3",
    messages=[
        {"role": "user", "content": "Summarize the key advantages of mixture-of-experts models."},
    ],
)

print(response.choices[0].message.content)
