OpenAI

Definition

OpenAI is an AI research company and developer platform headquartered in San Francisco. Founded in 2015 and widely known for releasing ChatGPT in late 2022, OpenAI operates one of the most widely used model APIs in the industry. The platform gives developers programmatic access to a family of models spanning language, vision, audio, and image generation — making it a one-stop shop for most generative AI use cases.

The OpenAI model lineup as of 2025 includes:

  • GPT-4o: flagship multimodal model handling text, images, and audio in a single model
  • GPT-4o-mini: cost-optimized variant for high-volume tasks
  • o-series reasoning models (o1, o1-mini, o3, o3-mini): extended chain-of-thought reasoning for math, coding, and complex analysis
  • DALL-E 3: text-to-image generation
  • Whisper: speech-to-text transcription
  • TTS: text-to-speech audio synthesis
  • text-embedding-3-small and text-embedding-3-large: embedding models that power semantic search and RAG pipelines

From a platform perspective, OpenAI offers a tiered API with usage-based pricing, a Playground for interactive testing, a Batch API for async bulk inference at 50% cost reduction, fine-tuning for GPT-4o-mini and GPT-3.5-turbo, an Assistants API for stateful agent-style interactions, and an Evals framework for systematic model evaluation. The Python SDK (openai) and a TypeScript/Node.js SDK are the primary client libraries, and the API format has become a de facto standard that other providers (Mistral, Together, Groq) partially mirror.
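
As a hedged illustration of the Batch API's input format: each line of the uploaded file is a self-contained JSON request object with a custom_id for matching results to inputs. This sketch uses only the standard library; the prompts and custom_id values are placeholders.

```python
import json

# One batch input line per request: a custom_id for matching results back
# to inputs, the HTTP method, the target endpoint, and the request body.
def batch_line(custom_id: str, prompt: str, model: str = "gpt-4o-mini") -> str:
    request = {
        "custom_id": custom_id,
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        },
    }
    return json.dumps(request)

# A batch file is JSONL: one JSON object per line.
lines = [batch_line(f"task-{i}", p) for i, p in enumerate(["Summarize X", "Summarize Y"])]
with open("batch_input.jsonl", "w") as f:
    f.write("\n".join(lines) + "\n")
```

The file is then uploaded via the Files API and referenced when creating the batch job; results arrive asynchronously keyed by custom_id.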

How it works

Chat completions API

The chat completions endpoint (POST /v1/chat/completions) is the core of the OpenAI platform. You send an array of messages with roles (system, user, assistant) and receive a completion. The system message sets the assistant's persona and constraints; user messages carry user input; assistant messages represent prior model turns for multi-turn conversation. Streaming is supported via server-sent events so the response can be displayed token-by-token. Temperature and top-p control response randomness; max_tokens caps output length.
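
The request shape above can be sketched at the wire level without the SDK, using only the standard library. This is a minimal sketch: the API key is a placeholder, and the actual network call is left commented out.

```python
import json
import urllib.request

API_URL = "https://api.openai.com/v1/chat/completions"

# The body mirrors the paragraph above: a model, a messages array with roles,
# and sampling/length controls.
payload = {
    "model": "gpt-4o",
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "What is an embedding?"},
    ],
    "temperature": 0.2,
    "max_tokens": 128,
    "stream": False,  # set True to receive server-sent events token-by-token
}

def build_request(api_key: str) -> urllib.request.Request:
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )

# req = build_request("sk-...")          # placeholder key
# urllib.request.urlopen(req)            # would send the request
```

This is the same wire format that other providers mirror, which is why an OpenAI-shaped client can often be pointed at a different base URL unchanged.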

Function calling and tools

Function calling (also called "tool use") lets the model invoke external tools by outputting structured JSON instead of free text. You declare tool schemas in the request; the model decides when to call a tool and populates its arguments. Your code executes the function and returns the result as a tool message; the model then uses that result to produce its final answer. This is the foundation of most agent frameworks: the model acts as a reasoning and routing layer while actual computation happens in your code.

Embeddings API

The embeddings endpoint (POST /v1/embeddings) converts text into dense numerical vectors. These vectors encode semantic meaning: similar texts produce similar vectors. text-embedding-3-large (3072 dimensions) delivers the best retrieval quality; text-embedding-3-small (1536 dimensions) is faster and cheaper. Embeddings are the backbone of RAG pipelines: you embed documents at index time and embed queries at search time, then retrieve documents by cosine similarity.

Image and audio APIs

DALL-E 3 (POST /v1/images/generations) generates images from text prompts. You specify size (1024×1024, 1792×1024, or 1024×1792), quality (standard or HD), and style (vivid or natural). Whisper (POST /v1/audio/transcriptions) transcribes audio files with high accuracy across 57+ languages. TTS (POST /v1/audio/speech) converts text to natural-sounding speech with six built-in voices. These APIs share the same authentication and billing model as the text APIs, making it straightforward to build multimodal pipelines in a single application.
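
A hedged sketch of the JSON bodies these endpoints accept, with the parameter names taken from the paragraph above; the prompt text and voice choice are illustrative placeholders.

```python
import json

# DALL-E 3 image generation: size, quality, and style as described above.
image_body = {
    "model": "dall-e-3",
    "prompt": "A lighthouse at dusk, watercolor",
    "size": "1024x1024",
    "quality": "standard",
    "style": "vivid",
}

# TTS: text plus one of the six built-in voices (e.g. "alloy").
speech_body = {
    "model": "tts-1",
    "voice": "alloy",
    "input": "Hello from the TTS endpoint.",
}

# Whisper transcription is a multipart file upload rather than a JSON body,
# so only its form fields are sketched here.
transcription_fields = {"model": "whisper-1"}  # plus file=<audio file>

for endpoint, body in [("images/generations", image_body), ("audio/speech", speech_body)]:
    print(endpoint, json.dumps(body))
```

All three use the same Bearer-token authentication as the chat and embeddings endpoints.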

When to use / When NOT to use

Use OpenAI when:

  • You need the broadest ecosystem: libraries, tutorials, and community support all default to OpenAI
  • You want multimodal support (text + image + audio) from a single vendor
  • You need advanced reasoning on math, code, or logic problems (o1, o3 series)
  • You are building function-calling or agent workflows — OpenAI's structured outputs and tool calling are mature
  • You want the Assistants API for stateful, file-enabled agents without building the state layer yourself

Avoid or consider alternatives when:

  • Your workload involves highly sensitive or regulated data and you cannot send it to a third-party US provider
  • You need to fine-tune deeply or control every aspect of the model — open-weights models offer more flexibility
  • Costs at scale are prohibitive — at very high token volumes, open-weights hosting often beats per-token pricing
  • You need a non-English-first experience — Qwen or Mistral may outperform on certain languages
  • You need reproducible deterministic outputs from a frozen model version — OpenAI updates models on a rolling basis

Comparisons

Criteria compared across OpenAI, Anthropic, and Google Gemini:

  • Flagship model: GPT-4o (OpenAI); Claude 3.7 Sonnet / Opus (Anthropic); Gemini 2.5 Pro (Google)
  • Reasoning model: o3, o1 (OpenAI); extended thinking in Claude 3.7 (Anthropic); Gemini 2.5 Pro thinking mode (Google)
  • Context window: 128K for GPT-4o, 200K for o1 (OpenAI); 200K (Anthropic); up to 1M for Gemini 1.5 Pro (Google)
  • Multimodal input: text, image, audio, video (OpenAI); text, image (Anthropic); text, image, audio, video, code (Google)
  • Open-weights option: no (OpenAI); no (Anthropic); Gemma, partial (Google)
  • Function / tool calling: mature and widely adopted (OpenAI); strong, with computer use (Anthropic); mature, Google ecosystem (Google)
  • Pricing (flagship): ~$2.50/1M input tokens for GPT-4o (OpenAI); ~$3/1M input tokens for Sonnet (Anthropic); ~$1.25/1M input tokens for Gemini 1.5 Pro (Google)
  • Safety approach: Moderation API and usage policies (OpenAI); Constitutional AI and refusal tuning (Anthropic); Responsible AI guidelines (Google)
  • Data residency: US by default with enterprise options (OpenAI); US by default with enterprise options (Anthropic); multi-region via Google Cloud (Google)
  • Best for: broadest ecosystem, agent tooling, reasoning (OpenAI); long docs, safety, nuanced instruction (Anthropic); long context, multimodal, Google Cloud users (Google)

Pros and cons

Pros:

  • Industry-standard API adopted by most frameworks and libraries
  • Widest model lineup: language, vision, audio, image in one platform
  • o-series reasoning models excel at math, code, and logic
  • Strong ecosystem: cookbook, evals, fine-tuning, batch API
  • Reliable rate limits and enterprise SLAs

Cons:

  • Closed model — no visibility into weights or training data
  • US-hosted by default; data residency is limited
  • Pricing can be high at scale compared to self-hosted open models
  • Model versions change on a rolling basis — behavior may shift without notice
  • No true open-weights offering

Code examples

Chat completion with streaming

from openai import OpenAI

client = OpenAI(api_key="sk-...") # or set OPENAI_API_KEY env var

# Basic completion
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a concise technical assistant."},
        {"role": "user", "content": "Explain embeddings in two sentences."},
    ],
    temperature=0.2,
    max_tokens=256,
)
print(response.choices[0].message.content)

# Streaming response: pass stream=True and iterate over the chunks,
# printing each delta as it arrives
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a haiku about APIs."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Function calling

import json
from openai import OpenAI

client = OpenAI()

# Define a tool schema
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["city"],
            },
        },
    }
]

messages = [{"role": "user", "content": "What's the weather in Tokyo?"}]

# First call — model may decide to call a tool
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools,
    tool_choice="auto",
)

msg = response.choices[0].message
messages.append(msg)

# If the model called a tool, execute it and return the result
if msg.tool_calls:
    for tc in msg.tool_calls:
        args = json.loads(tc.function.arguments)
        # Simulated function execution
        result = {"city": args["city"], "temp": "18°C", "condition": "Partly cloudy"}
        messages.append({
            "role": "tool",
            "tool_call_id": tc.id,
            "content": json.dumps(result),
        })

# Second call — model uses the tool result to answer
final = client.chat.completions.create(model="gpt-4o", messages=messages)
print(final.choices[0].message.content)
Embeddings and semantic search

from openai import OpenAI
import numpy as np

client = OpenAI()

def embed(texts: list[str], model: str = "text-embedding-3-small") -> list[list[float]]:
    response = client.embeddings.create(input=texts, model=model)
    return [item.embedding for item in response.data]

# Index documents
docs = [
    "Python is a high-level programming language.",
    "OpenAI provides a REST API for language models.",
    "RAG combines retrieval with generation.",
]
doc_vectors = embed(docs)

# Query
query = "How do I call OpenAI from Python?"
query_vector = embed([query])[0]

# Cosine similarity
def cosine_sim(a, b):
    a, b = np.array(a), np.array(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

scores = [(cosine_sim(query_vector, dv), doc) for dv, doc in zip(doc_vectors, docs)]
scores.sort(reverse=True)
for score, doc in scores:
    print(f"{score:.3f}  {doc}")

See also

  • Model providers — Overview and comparison of all providers
  • Case study: ChatGPT — For a deeper look at model architecture, see the ChatGPT case study
  • Anthropic — Claude model family, tool use, long context
  • Prompt engineering — Techniques that apply across all OpenAI models
  • Agents — Building agentic workflows with function calling
  • RAG — Using OpenAI embeddings in retrieval-augmented generation