Model providers
Definition
A model provider is an organization that offers access to large language models, either through hosted APIs, downloadable open weights, or both. The choice of provider shapes your application's capabilities, cost structure, data privacy posture, and deployment flexibility. Understanding the provider landscape is a prerequisite for any production AI system.
The market divides into three categories. API-based providers such as OpenAI, Anthropic, and Google offer models exclusively through managed APIs — you send requests, they handle inference infrastructure. Open-weights providers such as Meta release model weights that you can download and run on your own hardware or through third-party hosting. Hybrid providers such as Mistral and DeepSeek do both, publishing open weights alongside a commercial API, which lets developers choose the access model that fits their needs.
Choosing a provider involves tradeoffs across multiple dimensions: model quality, pricing, context window size, multimodal capabilities, data privacy, fine-tuning support, and ecosystem maturity. No single provider dominates across all criteria, which is why most production systems evaluate multiple options and sometimes use different providers for different tasks within the same application.
How it works
API-based providers
API providers host models on their infrastructure and expose them through REST APIs. You authenticate with an API key, send a request with your prompt and configuration parameters, and receive a response. The provider handles scaling, GPU allocation, model updates, and uptime. This is the simplest path to production — no infrastructure to manage — but you send your data to a third party and pay per token.
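The request/response cycle follows a common shape across providers: a JSON body carrying the model name and messages, authenticated with a bearer token in a header. A minimal sketch of assembling such a request (the endpoint and field names follow OpenAI's public chat completions API; the key value is a placeholder):

```python
import json

def build_chat_request(model: str, prompt: str, api_key: str) -> dict:
    """Assemble the parts of an OpenAI-style chat completion request:
    endpoint URL, auth headers, and JSON body."""
    return {
        "url": "https://api.openai.com/v1/chat/completions",
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        },
    }

req = build_chat_request("gpt-4o", "Explain RAG in one sentence.", api_key="sk-placeholder")
print(json.dumps(req["body"], indent=2))
```

Sending it is then a single `requests.post(req["url"], headers=req["headers"], json=req["body"])`; the official SDKs shown later wrap exactly this exchange.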
Open-weights providers
Open-weights providers release model files (typically on Hugging Face) that you download and run locally or on your cloud infrastructure. You control the full stack: hardware selection, quantization, serving framework (vLLM, TGI, llama.cpp), and scaling. This gives maximum privacy and customization but requires ML infrastructure expertise. Third-party inference providers (Together AI, Groq, Fireworks) offer a middle ground — they host open models with an API interface.
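Hardware sizing is the first practical question when self-hosting. A common rule of thumb is parameter count times bytes per parameter, plus headroom for the KV cache and activations; the sketch below uses an assumed 20% overhead factor, so treat the figures as rough estimates rather than vendor specifications:

```python
def estimate_vram_gb(params_billion: float, bits_per_param: int, overhead: float = 1.2) -> float:
    """Rough VRAM (in GB) needed to hold model weights, with ~20% headroom
    for KV cache and activations. A rule of thumb, not a guarantee."""
    bytes_per_param = bits_per_param / 8
    return params_billion * bytes_per_param * overhead

# A 70B-parameter model at different quantization levels:
for bits, label in [(16, "FP16"), (8, "INT8"), (4, "4-bit")]:
    print(f"70B @ {label}: ~{estimate_vram_gb(70, bits):.0f} GB")
```

This is why quantization matters so much for self-hosting: dropping from FP16 to 4-bit cuts the weight footprint by 4x, often the difference between needing a multi-GPU node and fitting on a single card.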
Choosing a provider
The decision tree depends on your constraints. Start with your requirements — data privacy, budget, latency, model quality — and narrow from there. Many teams start with API providers for prototyping and evaluate open-weights alternatives for production cost optimization or data sovereignty requirements.
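The narrowing process can be sketched as a simple rule chain. The thresholds and labels below are illustrative assumptions, not a universal prescription:

```python
def suggest_provider_type(
    data_must_stay_onprem: bool,
    has_ml_infra_team: bool,
    monthly_tokens_millions: float,
) -> str:
    """Toy decision helper mirroring the prose above. The 500M-token
    threshold is an illustrative assumption, not an industry figure."""
    if data_must_stay_onprem:
        if has_ml_infra_team:
            return "open-weights, self-hosted"
        return "open-weights via a private/VPC deployment"
    if monthly_tokens_millions > 500 and has_ml_infra_team:
        return "open-weights for cost optimization"
    return "API provider"

print(suggest_provider_type(False, False, 10))   # → API provider
print(suggest_provider_type(True, True, 100))    # → open-weights, self-hosted
```

Real decisions involve more dimensions (latency, fine-tuning needs, ecosystem maturity), but making the constraints explicit like this keeps the evaluation honest.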
When to use / When NOT to use
| Use when | Avoid when |
|---|---|
| API providers: rapid prototyping, no ML infra team, need cutting-edge models immediately | Data cannot leave your infrastructure (regulated industries, PII) |
| Open-weights: data privacy requirements, need fine-tuning control, high-volume cost optimization | You lack GPU infrastructure and ML ops expertise |
| Third-party hosted open models: want open model flexibility without managing infrastructure | You need guaranteed SLAs and enterprise support (use first-party APIs) |
| Multiple providers: different tasks have different quality/cost requirements | Your use case is simple enough that one provider covers everything |
Comparisons
| Criteria | OpenAI | Anthropic | Google Gemini | Meta Llama | Mistral | Cohere | DeepSeek |
|---|---|---|---|---|---|---|---|
| Model access | API only | API only | API + Vertex AI | Open weights | Open + API | API only | Open + API |
| Top model tier | GPT-4o, o3 | Claude Opus/Sonnet | Gemini Ultra/Pro | Llama 3.1 405B | Mistral Large | Command R+ | DeepSeek-V3 |
| Context window | 128K | 200K | 1M+ | 128K | 128K | 128K | 128K |
| Multimodal | Vision, audio, image gen | Vision | Vision, audio, video | Vision (3.2) | Vision | Text-focused | Text-focused |
| Specialty | General-purpose, ecosystem | Safety, long context | Multimodal, search grounding | Open-weights, customization | Efficiency, multilingual | Embeddings, RAG, reranking | Reasoning, cost efficiency |
| Fine-tuning | API fine-tuning | Not available | Vertex AI tuning | Full weight access | API fine-tuning | Not available | Full weight access |
| Pricing model | Per token | Per token | Per token + free tier | Free (self-host) or third-party | Per token + free models | Per token | Per token (very low cost) |
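Per-token pricing makes cost comparison straightforward arithmetic: input tokens times the input rate, plus output tokens times the output rate. The prices below are placeholder assumptions for illustration only; check each provider's current pricing page before relying on any numbers:

```python
# Hypothetical per-million-token prices (input, output) in USD --
# placeholders for illustration, NOT current provider pricing.
PRICES = {
    "provider-a": (2.50, 10.00),
    "provider-b": (3.00, 15.00),
    "provider-c": (0.30, 1.10),
}

def monthly_cost(provider: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost for a month's traffic at the assumed rates."""
    in_price, out_price = PRICES[provider]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Example workload: 100M input + 20M output tokens per month.
for p in PRICES:
    print(f"{p}: ${monthly_cost(p, 100_000_000, 20_000_000):,.2f}")
```

Note that output tokens typically cost several times more than input tokens, so prompt-heavy workloads (long-context RAG) and generation-heavy workloads (long-form writing) can rank the same providers very differently.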
Code examples
Side-by-side API calls (Python)
```python
# OpenAI
from openai import OpenAI

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment
openai_response = openai_client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain RAG in one sentence."}],
)
print("OpenAI:", openai_response.choices[0].message.content)

# Anthropic
import anthropic

anthropic_client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
anthropic_response = anthropic_client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=256,
    messages=[{"role": "user", "content": "Explain RAG in one sentence."}],
)
print("Anthropic:", anthropic_response.content[0].text)

# Google Gemini
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-pro")
gemini_response = model.generate_content("Explain RAG in one sentence.")
print("Gemini:", gemini_response.text)
```
Unified interface with LiteLLM (Python)
```python
from litellm import completion

# Same interface, different providers
providers = {
    "OpenAI": "gpt-4o",
    "Anthropic": "claude-sonnet-4-20250514",
    "Gemini": "gemini/gemini-1.5-pro",
}

for name, model in providers.items():
    response = completion(
        model=model,
        messages=[{"role": "user", "content": "Explain RAG in one sentence."}],
    )
    print(f"{name}: {response.choices[0].message.content}")
```
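A unified interface also enables cross-provider fallback: try the primary model and, on failure, retry with the next one. A minimal sketch of the pattern (the `call` function is injected so the same logic works with LiteLLM's `completion`, a raw SDK, or the stub used here for demonstration):

```python
from typing import Callable

def complete_with_fallback(
    models: list[str],
    prompt: str,
    call: Callable[[str, str], str],
) -> tuple[str, str]:
    """Try each model in order; return (model_used, response_text).
    `call` is any function mapping (model, prompt) -> response text."""
    errors = []
    for model in models:
        try:
            return model, call(model, prompt)
        except Exception as exc:  # fall through to the next provider
            errors.append(f"{model}: {exc}")
    raise RuntimeError("All providers failed: " + "; ".join(errors))

# Demo with a stub in place of a real API call:
def flaky_call(model: str, prompt: str) -> str:
    if model == "gpt-4o":
        raise TimeoutError("simulated outage")
    return f"[{model}] ok"

used, text = complete_with_fallback(
    ["gpt-4o", "claude-sonnet-4-20250514"], "hi", flaky_call
)
print(used, text)  # → claude-sonnet-4-20250514 [claude-sonnet-4-20250514] ok
```

In production you would typically catch only transient errors (timeouts, rate limits) rather than all exceptions, and log which provider ultimately served each request.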
Practical resources
- Artificial Analysis — Independent LLM benchmarks and pricing comparison
- LiteLLM — Unified API for 100+ LLM providers
- OpenRouter — Single API gateway to multiple providers
- Hugging Face Open LLM Leaderboard — Open model benchmarks
- LMSYS Chatbot Arena — Crowdsourced LLM rankings via blind human evaluation