Model providers
Definition
A model provider is an organization that offers access to large language models, either through hosted APIs, downloadable open weights, or both. The choice of provider shapes your application's capabilities, cost structure, data privacy posture, and deployment flexibility. Understanding the provider landscape is a prerequisite for any production AI system.
The market divides into three categories. API-based providers such as OpenAI, Anthropic, and Google offer models exclusively through managed APIs — you send requests, they handle inference infrastructure. Open-weights providers such as Meta release model weights that you can download and run on your own hardware or through third-party hosting. Hybrid providers such as Mistral and DeepSeek do both, publishing open weights alongside a commercial API, which lets developers choose the access model that fits their needs.
Choosing a provider involves tradeoffs across multiple dimensions: model quality, pricing, context window size, multimodal capabilities, data privacy, fine-tuning support, and ecosystem maturity. No single provider dominates across all criteria, which is why most production systems evaluate multiple options and sometimes use different providers for different tasks within the same application.
How it works
API-based providers
API providers host models on their infrastructure and expose them through REST APIs. You authenticate with an API key, send a request with your prompt and configuration parameters, and receive a response. The provider handles scaling, GPU allocation, model updates, and uptime. This is the simplest path to production — no infrastructure to manage — but you send your data to a third party and pay per token.
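The request/response cycle follows a common shape across providers: a JSON body carrying the model name and messages, authenticated with a bearer token in a header. A minimal sketch of assembling such a request (the endpoint and field names follow OpenAI's public chat completions API; the key value is a placeholder):

```python
import json

def build_chat_request(model: str, prompt: str, api_key: str) -> dict:
    """Assemble the parts of an OpenAI-style chat completion request:
    endpoint URL, auth headers, and JSON body."""
    return {
        "url": "https://api.openai.com/v1/chat/completions",
        "headers": {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        },
    }

req = build_chat_request("gpt-4o", "Explain RAG in one sentence.", api_key="sk-placeholder")
print(json.dumps(req["body"], indent=2))
```

Sending it is then a single `requests.post(req["url"], headers=req["headers"], json=req["body"])`; the official SDKs shown later wrap exactly this exchange.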
Open-weights providers
Open-weights providers release model files (typically on Hugging Face) that you download and run locally or on your cloud infrastructure. You control the full stack: hardware selection, quantization, serving framework (vLLM, TGI, llama.cpp), and scaling. This gives maximum privacy and customization but requires ML infrastructure expertise. Third-party inference providers (Together AI, Groq, Fireworks) offer a middle ground — they host open models with an API interface.
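Hardware sizing is the first practical question when self-hosting. A common rule of thumb is parameter count times bytes per parameter, plus headroom for the KV cache and activations; the sketch below uses an assumed 20% overhead factor, so treat the figures as rough estimates rather than vendor specifications:

```python
def estimate_vram_gb(params_billion: float, bits_per_param: int, overhead: float = 1.2) -> float:
    """Rough VRAM (in GB) needed to hold model weights, with ~20% headroom
    for KV cache and activations. A rule of thumb, not a guarantee."""
    bytes_per_param = bits_per_param / 8
    return params_billion * bytes_per_param * overhead

# A 70B-parameter model at different quantization levels:
for bits, label in [(16, "FP16"), (8, "INT8"), (4, "4-bit")]:
    print(f"70B @ {label}: ~{estimate_vram_gb(70, bits):.0f} GB")
```

This is why quantization matters so much for self-hosting: dropping from FP16 to 4-bit cuts the weight footprint by 4x, often the difference between needing a multi-GPU node and fitting on a single card.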
Choosing a provider
The decision tree depends on your constraints. Start with your requirements — data privacy, budget, latency, model quality — and narrow from there. Many teams start with API providers for prototyping and evaluate open-weights alternatives for production cost optimization or data sovereignty requirements.
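The narrowing process can be sketched as a simple rule chain. The thresholds and labels below are illustrative assumptions, not a universal prescription:

```python
def suggest_provider_type(
    data_must_stay_onprem: bool,
    has_ml_infra_team: bool,
    monthly_tokens_millions: float,
) -> str:
    """Toy decision helper mirroring the prose above. The 500M-token
    threshold is an illustrative assumption, not an industry figure."""
    if data_must_stay_onprem:
        if has_ml_infra_team:
            return "open-weights, self-hosted"
        return "open-weights via a private/VPC deployment"
    if monthly_tokens_millions > 500 and has_ml_infra_team:
        return "open-weights for cost optimization"
    return "API provider"

print(suggest_provider_type(False, False, 10))   # → API provider
print(suggest_provider_type(True, True, 100))    # → open-weights, self-hosted
```

Real decisions involve more dimensions (latency, fine-tuning needs, ecosystem maturity), but making the constraints explicit like this keeps the evaluation honest.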
When to use / When NOT to use
| Use when | Avoid when |
|---|---|
| API providers: rapid prototyping, no ML infra team, need cutting-edge models immediately | Data cannot leave your infrastructure (regulated industries, PII) |
| Open-weights: data privacy requirements, need fine-tuning control, high-volume cost optimization | You lack GPU infrastructure and ML ops expertise |
| Third-party hosted open models: want open model flexibility without managing infrastructure | You need guaranteed SLAs and enterprise support (use first-party APIs) |
| Multiple providers: different tasks have different quality/cost requirements | Your use case is simple enough that one provider covers everything |
Comparisons
| Criteria | OpenAI | Anthropic | Google Gemini | Meta Llama | Mistral | Cohere | DeepSeek |
|---|---|---|---|---|---|---|---|
| Model access | API only | API only | API + Vertex AI | Open weights | Open + API | API only | Open + API |
| Top model tier | GPT-4o, o3 | Claude Opus/Sonnet | Gemini Ultra/Pro | Llama 3.1 405B | Mistral Large | Command R+ | DeepSeek-V3 |
| Context window | 128K | 200K | 1M+ | 128K | 128K | 128K | 128K |
| Multimodal | Vision, audio, image gen | Vision | Vision, audio, video | Vision (3.2) | Vision | Text-focused | Text-focused |
| Specialty | General-purpose, ecosystem | Safety, long context | Multimodal, search grounding | Open-weights, customization | Efficiency, multilingual | Embeddings, RAG, reranking | Reasoning, cost efficiency |
| Fine-tuning | API fine-tuning | Not available | Vertex AI tuning | Full weight access | API fine-tuning | Not available | Full weight access |
| Pricing model | Per token | Per token | Per token + free tier | Free (self-host) or third-party | Per token + free models | Per token | Per token (very low cost) |
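Per-token pricing makes cost comparison straightforward arithmetic: input tokens times the input rate, plus output tokens times the output rate. The prices below are placeholder assumptions for illustration only; check each provider's current pricing page before relying on any numbers:

```python
# Hypothetical per-million-token prices (input, output) in USD --
# placeholders for illustration, NOT current provider pricing.
PRICES = {
    "provider-a": (2.50, 10.00),
    "provider-b": (3.00, 15.00),
    "provider-c": (0.30, 1.10),
}

def monthly_cost(provider: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost for a month's traffic at the assumed rates."""
    in_price, out_price = PRICES[provider]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Example workload: 100M input + 20M output tokens per month.
for p in PRICES:
    print(f"{p}: ${monthly_cost(p, 100_000_000, 20_000_000):,.2f}")
```

Note that output tokens typically cost several times more than input tokens, so prompt-heavy workloads (long-context RAG) and generation-heavy workloads (long-form writing) can rank the same providers very differently.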
Code examples
Side-by-side API calls (Python)
```python
# OpenAI
from openai import OpenAI

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment
openai_response = openai_client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain RAG in one sentence."}],
)
print("OpenAI:", openai_response.choices[0].message.content)

# Anthropic
import anthropic

anthropic_client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
anthropic_response = anthropic_client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=256,
    messages=[{"role": "user", "content": "Explain RAG in one sentence."}],
)
print("Anthropic:", anthropic_response.content[0].text)

# Google Gemini
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-pro")
gemini_response = model.generate_content("Explain RAG in one sentence.")
print("Gemini:", gemini_response.text)
```
Unified interface with LiteLLM (Python)
```python
from litellm import completion

# Same interface, different providers
providers = {
    "OpenAI": "gpt-4o",
    "Anthropic": "claude-sonnet-4-20250514",
    "Gemini": "gemini/gemini-1.5-pro",
}

for name, model in providers.items():
    response = completion(
        model=model,
        messages=[{"role": "user", "content": "Explain RAG in one sentence."}],
    )
    print(f"{name}: {response.choices[0].message.content}")
```
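A unified interface also enables cross-provider fallback: try the primary model and, on failure, retry with the next one. A minimal sketch of the pattern (the `call` function is injected so the same logic works with LiteLLM's `completion`, a raw SDK, or the stub used here for demonstration):

```python
from typing import Callable

def complete_with_fallback(
    models: list[str],
    prompt: str,
    call: Callable[[str, str], str],
) -> tuple[str, str]:
    """Try each model in order; return (model_used, response_text).
    `call` is any function mapping (model, prompt) -> response text."""
    errors = []
    for model in models:
        try:
            return model, call(model, prompt)
        except Exception as exc:  # fall through to the next provider
            errors.append(f"{model}: {exc}")
    raise RuntimeError("All providers failed: " + "; ".join(errors))

# Demo with a stub in place of a real API call:
def flaky_call(model: str, prompt: str) -> str:
    if model == "gpt-4o":
        raise TimeoutError("simulated outage")
    return f"[{model}] ok"

used, text = complete_with_fallback(
    ["gpt-4o", "claude-sonnet-4-20250514"], "hi", flaky_call
)
print(used, text)  # → claude-sonnet-4-20250514 [claude-sonnet-4-20250514] ok
```

In production you would typically catch only transient errors (timeouts, rate limits) rather than all exceptions, and log which provider ultimately served each request.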
Practical resources
- Artificial Analysis — Independent LLM benchmarks and pricing comparison
- LiteLLM — Unified API for 100+ LLM providers
- OpenRouter — Single API gateway to multiple providers
- Hugging Face Open LLM Leaderboard — Open model benchmarks
- LMSYS Chatbot Arena — Crowdsourced LLM rankings via blind human evaluation