DeepSeek

Definition

DeepSeek is a Chinese AI research lab and commercial platform that has gained significant international attention for producing models competitive with the best proprietary models while releasing the weights openly and operating at a fraction of the cost. Founded in 2023 as a subsidiary of High-Flyer (a quantitative hedge fund), DeepSeek is characterized by rigorous research into training efficiency, including innovations in mixture-of-experts (MoE) architectures, multi-head latent attention, and reinforcement-learning methods that elicit strong reasoning without massive compute budgets.

The model lineup spans three major capability areas. DeepSeek-V3 is a general-purpose chat and instruction-following model that rivals GPT-4o and Claude 3.5 Sonnet on standard benchmarks while being dramatically cheaper to access via API. DeepSeek-R1 is a dedicated reasoning model that uses extended chain-of-thought (CoT) — the model generates explicit reasoning traces before producing a final answer — making it particularly strong on mathematics, logical deduction, and multi-step problem solving. DeepSeek-Coder (and its successor variants integrated into V3/R1) specializes in code generation, completion, and debugging across a wide range of programming languages.

DeepSeek's open-weights approach means that all major models are available on Hugging Face and can be self-hosted on your own infrastructure — a critical capability for organizations with data sovereignty requirements or those seeking to avoid per-token API costs at scale. The DeepSeek platform also exposes an API that is wire-compatible with the OpenAI API format, meaning that any application built with the OpenAI Python SDK can switch to DeepSeek models by changing the base_url and API key with no other code changes.

How it works

API platform

DeepSeek hosts a cloud inference API at api.deepseek.com that accepts requests in the OpenAI Chat Completions format. This compatibility layer means the integration overhead is minimal — developers familiar with the OpenAI SDK can migrate or test DeepSeek models in minutes. The platform supports streaming responses, function calling, and system prompts. Pricing is token-based and publicly listed, with rates that are typically 90–95% lower than equivalent-tier OpenAI models, making high-volume production deployments substantially cheaper.
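
A back-of-envelope illustration of what that price gap means for a high-volume pipeline (the rates are the indicative per-million-token figures cited in this article, and the 50M-tokens/day workload is hypothetical; always check current price lists before budgeting):

```python
# Rough cost comparison for a batch workload.
# Rates are illustrative (USD per million input tokens).
DEEPSEEK_V3_INPUT = 0.27   # ~$0.27 / M input tokens
GPT_4O_INPUT = 2.50        # ~$2.50 / M input tokens

def monthly_input_cost(tokens_per_day: int, rate_per_million: float) -> float:
    """Cost of input tokens over a 30-day month."""
    return tokens_per_day * 30 / 1_000_000 * rate_per_million

daily_tokens = 50_000_000  # hypothetical 50M input tokens/day pipeline
deepseek_cost = monthly_input_cost(daily_tokens, DEEPSEEK_V3_INPUT)
openai_cost = monthly_input_cost(daily_tokens, GPT_4O_INPUT)

print(f"DeepSeek-V3: ${deepseek_cost:,.0f}/month")   # $405/month
print(f"GPT-4o:      ${openai_cost:,.0f}/month")     # $3,750/month
print(f"Savings:     {1 - deepseek_cost / openai_cost:.0%}")  # 89%
```

At these rates the input-token bill drops by roughly an order of magnitude, which is why the saving compounds so strongly for batch workloads.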

Reasoning models (DeepSeek-R1)

DeepSeek-R1 is trained using a multi-stage process that incorporates reinforcement learning to reward the model for producing correct final answers — crucially, without relying on supervised chain-of-thought data at the core training stage. The model generates a <think> block containing its reasoning trace before the final answer. This explicit scratchpad allows the model to perform multi-step deduction, check its work, and back-track from incorrect paths — behaviors that dramatically improve performance on math olympiad problems, formal logic, and complex coding tasks that require planning across many steps.
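
When R1 is self-hosted, the reasoning trace arrives inline in the generated text rather than in a separate API field, so callers typically split it out themselves. A minimal sketch, assuming the <think>…</think> delimiters of R1's output convention (the helper name is ours):

```python
import re

def split_reasoning(raw_output: str) -> tuple[str, str]:
    """Split an R1-style output into (reasoning trace, final answer)."""
    match = re.match(r"\s*<think>(.*?)</think>\s*(.*)", raw_output, re.DOTALL)
    if match:
        return match.group(1).strip(), match.group(2).strip()
    return "", raw_output.strip()  # no trace found: treat it all as the answer

raw = (
    "<think>By 09:00 train A has covered 120 km, leaving a 180 km gap. "
    "Closing speed is 200 km/h, so they meet 0.9 h later.</think>"
    "They meet at 09:54."
)
trace, answer = split_reasoning(raw)
print(answer)  # They meet at 09:54.
```

The hosted API does this separation for you, returning the trace in reasoning_content; the split is only needed for raw self-hosted output.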

Code models and DeepSeek-Coder

DeepSeek's code-specialized models are pre-trained on large corpora of source code (GitHub, competitive-programming platforms, documentation) and fine-tuned for instruction following on coding tasks. They support fill-in-the-middle (FIM) completion, the prompt format that IDE autocomplete tools such as GitHub Copilot rely on. DeepSeek-Coder posts strong results on code-generation benchmarks such as HumanEval and MBPP, often matching or beating substantially larger models from other providers. These coding capabilities are also folded into DeepSeek-V3 and R1, so the general-purpose models perform well on code tasks too.
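
A FIM request supplies the code before and after the cursor and asks the model to generate the missing middle. A minimal sketch of assembling such a request in OpenAI-style completions form (the helper is ours, and the field names follow the OpenAI completions format; verify the exact endpoint and fields your serving stack expects against its documentation):

```python
def build_fim_request(prefix: str, suffix: str, model: str = "deepseek-chat") -> dict:
    """Assemble an OpenAI-style completions payload for fill-in-the-middle."""
    return {
        "model": model,
        "prompt": prefix,    # code before the cursor
        "suffix": suffix,    # code after the cursor
        "max_tokens": 128,
        "temperature": 0.0,  # deterministic output suits autocomplete
    }

payload = build_fim_request(
    prefix="def binary_search(arr, target):\n    lo, hi = 0, len(arr) - 1\n    while lo <= hi:\n",
    suffix="\n    return -1\n",
)
print(sorted(payload))  # ['max_tokens', 'model', 'prompt', 'suffix', 'temperature']
```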

Open-weights deployment

All major DeepSeek models have their weights released on Hugging Face under permissive licenses, enabling self-hosted inference on consumer or enterprise GPU hardware. DeepSeek-V3 uses a mixture-of-experts architecture where only a subset of parameters are activated per token, reducing inference cost significantly compared to dense models of comparable capability. Popular deployment options include vLLM, Ollama (for quantized versions), and NVIDIA NIM containers. Self-hosted deployment is particularly attractive for large-scale batch workloads, fine-tuning on proprietary data, or scenarios where all data must remain on-premises.
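
For a quick local start with quantized weights, Ollama is a low-friction route. The commands below are a sketch; the model tag is illustrative (check the Ollama model library for currently published tags, and note that Ollama's DeepSeek-R1 entries are distilled variants, not the full model):

```shell
# Pull and run a quantized DeepSeek-R1 distillation locally
ollama pull deepseek-r1:7b
ollama run deepseek-r1:7b "Explain mixture-of-experts routing in two sentences."

# Ollama also exposes an OpenAI-compatible endpoint on localhost:11434,
# so the same OpenAI-SDK client code shown below works against it:
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "deepseek-r1:7b", "messages": [{"role": "user", "content": "Hello"}]}'
```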

When to use / When NOT to use

| Use when | Avoid when |
| --- | --- |
| Cost is a primary constraint — DeepSeek API is 90%+ cheaper than GPT-4o at comparable quality | You need a provider with an established enterprise SLA, compliance certifications (SOC 2, HIPAA), or US-based data processing |
| Tasks require deep multi-step reasoning: math, logic, formal proofs, complex coding | Your task is primarily multimodal — DeepSeek-V3/R1 are text-only models |
| You want to self-host open-weight models for data sovereignty or custom fine-tuning | You need the broadest possible plugin/tool ecosystem and third-party integrations |
| Building high-volume batch pipelines where per-token cost reduction compounds significantly | Latency-critical consumer applications where R1's reasoning trace adds response time |
| Code generation, code review, or debugging are your primary use cases | You are in a jurisdiction with regulatory requirements around AI model origin |

Comparisons

| Criteria | DeepSeek (V3 / R1) | OpenAI (GPT-4o / o1) | Meta / Llama |
| --- | --- | --- | --- |
| Reasoning performance | R1 competitive with o1 on math/logic benchmarks | o1 is top-tier; GPT-4o strong on general reasoning | Llama 3.x competitive but below R1/o1 on hard reasoning |
| General chat quality | V3 competitive with GPT-4o | GPT-4o best-in-class general quality | Llama 3.3 70B competitive for its size |
| Open weights | Yes (all models on Hugging Face) | No (proprietary only) | Yes (Meta releases Llama weights) |
| API cost | Very low (~$0.27/M input tokens for V3) | High (~$2.50/M input tokens for GPT-4o) | Free (self-host); Fireworks/Together APIs affordable |
| Ecosystem & integrations | Growing; OpenAI-compatible API eases adoption | Largest ecosystem, most integrations | Large open-source ecosystem |
| Data sovereignty | Self-host possible; API data processed in China | Azure OpenAI for US-region processing | Full self-host possible |
| Multimodal | Text only (V3/R1) | Yes (GPT-4o vision/audio, DALL·E images) | Llama 3.2 has vision variants |

Pros and cons

| Pros | Cons |
| --- | --- |
| Dramatically lower API cost than OpenAI/Anthropic | API data routed through Chinese servers — a concern for some regulated industries |
| R1 delivers frontier-level reasoning performance | R1 reasoning traces add latency and token usage |
| OpenAI-compatible API — near-zero switching cost | Smaller trust/brand recognition in Western enterprise sales cycles |
| Open weights allow self-hosting and fine-tuning | V3/R1 are text-only; no native image or audio capabilities |
| Strong code generation across most mainstream languages | Community and documentation primarily in Chinese; English resources still catching up |

Code examples

Chat completion with DeepSeek-V3 (OpenAI-compatible)

from openai import OpenAI

# DeepSeek uses the OpenAI SDK with a custom base_url
client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-chat",  # maps to DeepSeek-V3
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user", "content": "Explain the difference between MoE and dense transformer architectures."},
    ],
    temperature=0.7,
    max_tokens=1024,
)

print(response.choices[0].message.content)

Reasoning with DeepSeek-R1 (chain-of-thought)

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-reasoner",  # maps to DeepSeek-R1
    messages=[
        {
            "role": "user",
            "content": (
                "A train leaves City A at 08:00 and travels at 120 km/h. "
                "Another train leaves City B (300 km away) at 09:00 and travels "
                "toward City A at 80 km/h. At what time do they meet?"
            ),
        }
    ],
)

# R1 exposes the reasoning trace in reasoning_content
message = response.choices[0].message
if getattr(message, "reasoning_content", None):
    print("=== Reasoning trace ===")
    print(message.reasoning_content)
    print()

print("=== Final answer ===")
print(message.content)

Streaming response with DeepSeek-V3

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",
    base_url="https://api.deepseek.com",
)

stream = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "user", "content": "Write a Python function that implements binary search."},
    ],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)
print()
print()

Self-hosted inference with vLLM

# Start vLLM server (run in terminal):
# vllm serve deepseek-ai/DeepSeek-V3 --tensor-parallel-size 4 --port 8000

from openai import OpenAI

# Point to your local vLLM server instead of DeepSeek cloud
client = OpenAI(
    api_key="not-needed",  # vLLM does not require a real key
    base_url="http://localhost:8000/v1",
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V3",
    messages=[
        {"role": "user", "content": "Summarize the key advantages of mixture-of-experts models."},
    ],
)

print(response.choices[0].message.content)
