Structured outputs

Definition

Structured outputs refers to the practice of constraining or guiding an LLM to produce machine-readable data — most commonly JSON — rather than free-form prose. In a production pipeline, the gap between an LLM that returns a correct answer and one that returns a correct answer in a parsable format is the gap between a toy demo and a deployable system. A downstream service that needs to extract a product name, a sentiment label, or a list of action items cannot reliably operate on unstructured text; it needs a guaranteed shape it can deserialize, validate, and route.

The evolution of structured output techniques tracks the maturation of LLM APIs. Early systems relied on fragile prompt instructions ("respond only with valid JSON") combined with regex parsing and retry loops. This approach broke whenever the model added an explanatory preamble, wrapped the JSON in a markdown code block, or subtly violated the schema under edge cases. The next generation introduced function calling (OpenAI, mid-2023) and tool use (Anthropic), which move the schema definition out of the prompt and into a first-class API parameter, allowing the model to be explicitly trained and constrained on the output contract. Most recently, providers introduced strict grammar-constrained decoding that makes schema compliance a hard guarantee at the token level, not a soft prompt instruction.

Understanding which technique to apply — and why — matters for anyone building pipelines that depend on LLM output. JSON mode is the simplest entry point but provides no schema validation. Function calling / tool use provides a typed schema and structured parsing in the API response, but requires defining tool schemas upfront. Pydantic-based extraction libraries (Instructor, LangChain output parsers) sit above the API layer and add Python-level validation, automatic retry on schema violations, and ergonomic model definition. The right choice depends on the complexity of the target schema, the criticality of validation, and how much retry/correction logic you want the library to handle for you.

How it works

JSON mode

JSON mode is the most basic structured output mechanism. When enabled, the model is constrained to produce only valid JSON as its top-level output. In OpenAI's API this is activated by setting response_format={"type": "json_object"} on the request; in Anthropic's API a similar effect can be achieved by prefilling the assistant turn with {. JSON mode guarantees syntactic validity (the output can always be parsed by json.loads), but it does not validate against any schema — the model might return {"result": "yes"} when you expected {"score": 0.87, "label": "positive", "confidence": 0.92}. You must add schema validation (e.g. with Pydantic or jsonschema) as a separate step, and implement retry logic for schema mismatches. JSON mode is best suited for simple, flat structures where the risk of schema drift is low.
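The Anthropic prefill trick works because the API continues generating from the prefilled assistant turn, so the returned text contains only what comes after the `{` and the prefill must be re-attached before parsing. A minimal sketch of that mechanic (the `parse_prefilled` and `classify` helpers and the prompt wording are illustrative, not part of any SDK):

```python
# Sketch: approximating "JSON mode" on Anthropic's API via assistant prefill.
# Prefilling the assistant turn with "{" suppresses any prose preamble; the
# completion then contains only the text generated AFTER the prefill.
import json


def parse_prefilled(prefix: str, completion: str) -> dict:
    # Re-attach the prefill before parsing, since the API returns only
    # the continuation of the prefilled assistant turn.
    return json.loads(prefix + completion)


def classify(client, text: str) -> dict:
    # `client` is assumed to be an anthropic.Anthropic instance.
    prefill = "{"
    resp = client.messages.create(
        model="claude-opus-4-5",
        max_tokens=256,
        messages=[
            {"role": "user",
             "content": f'Classify the sentiment of: {text}\n'
                        'Reply only with JSON: {"label": ..., "score": ...}'},
            {"role": "assistant", "content": prefill},  # prefilled turn
        ],
    )
    return parse_prefilled(prefill, resp.content[0].text)
```

Note that this still gives only syntactic validity; schema validation remains a separate step, exactly as with OpenAI's JSON mode.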

Function calling and tool use

Function calling (OpenAI) and tool use (Anthropic) represent a qualitative step forward. Instead of embedding the output schema in the system prompt, you declare it as a tool or function definition with a JSON Schema object. The API returns the model's output as a structured tool_use block with a parsed input dict, separate from any text content. This decoupling is significant: the text and the structured data live in different parts of the response, and the API itself handles the JSON parsing. You get type annotations for every field, required vs. optional field semantics, enum constraints, and nested object support — all enforced by the schema at the API level. OpenAI's strict mode (2024) goes further by enabling constrained decoding, making schema adherence a hard guarantee. Tool use is the right choice for extracting structured data from documents, populating database records, or driving downstream API calls with typed arguments.

Schema-based extraction with Pydantic

Libraries like Instructor and LangChain's output parsers wrap the function calling / tool use API with a Pydantic-first interface. You define your output schema as a pydantic.BaseModel subclass and pass the model class to the library; it automatically generates the JSON Schema for the tool definition, calls the API, validates the response against your model, and retries with validation error feedback if the schema is violated. This approach is the most ergonomic for Python practitioners because the output is a fully typed Python object — not a raw dict — with field validation, default values, and nested model support. Automatic retry with error context dramatically reduces the rate of silent schema violations. The cost is an additional library dependency and slightly more token usage when validation errors trigger retry messages.

When to use / When NOT to use

Use when:

- The LLM output must be consumed programmatically (API response, DB insert, workflow trigger)
- You need a typed, validated Python object rather than a raw string
- You are building pipelines where schema violations would cause silent data corruption
- The extraction involves nested structures, arrays, or enum-constrained fields
- You need reproducible, testable extraction behavior across model versions

Avoid when:

- The output is read only by humans and no downstream parsing is needed
- The schema is so simple (a single string or number) that plain text is easier to parse
- Latency is extremely tight and you cannot afford the overhead of retry loops
- You are in early prototyping and the output schema is not yet stable
- The model you are using has poor support for tool use / function calling

Code examples

OpenAI — JSON mode with Pydantic validation

# Structured extraction with OpenAI JSON mode + Pydantic validation
# pip install openai pydantic

import json
import os

from openai import OpenAI
from pydantic import BaseModel, ValidationError

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])


class SentimentResult(BaseModel):
    label: str  # "positive" | "negative" | "neutral"
    score: float  # 0.0 - 1.0
    key_phrases: list[str]


def extract_sentiment(text: str, max_retries: int = 3) -> SentimentResult:
    system = (
        "You are a sentiment analysis engine. Respond ONLY with valid JSON: "
        '{"label": "positive"|"negative"|"neutral", "score": <float>, "key_phrases": [...]}'
    )
    for attempt in range(max_retries):
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            response_format={"type": "json_object"},
            messages=[
                {"role": "system", "content": system},
                {"role": "user", "content": f"Analyze: {text}"},
            ],
            temperature=0,
        )
        try:
            return SentimentResult(**json.loads(resp.choices[0].message.content))
        except (json.JSONDecodeError, ValidationError) as e:
            if attempt == max_retries - 1:
                raise RuntimeError(f"Validation failed after {max_retries} attempts: {e}") from e
    raise RuntimeError("Unreachable")


if __name__ == "__main__":
    r = extract_sentiment("The model is fast, but docs leave much to be desired.")
    print(r.label, r.score, r.key_phrases)

OpenAI — function calling with strict schema

# Structured extraction with OpenAI function calling (strict mode)
# pip install openai

import json
import os

from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

TOOL = {
    "type": "function",
    "function": {
        "name": "extract_product_info",
        "description": "Extract structured product info from a description.",
        "strict": True,
        "parameters": {
            "type": "object",
            "properties": {
                "product_name": {"type": "string"},
                "price_usd": {"type": "number"},
                "features": {"type": "array", "items": {"type": "string"}},
                "in_stock": {"type": "boolean"},
            },
            "required": ["product_name", "price_usd", "features", "in_stock"],
            "additionalProperties": False,
        },
    },
}


def extract_product(description: str) -> dict:
    resp = client.chat.completions.create(
        model="gpt-4o",
        tools=[TOOL],
        tool_choice={"type": "function", "function": {"name": "extract_product_info"}},
        messages=[
            {"role": "system", "content": "Extract product information."},
            {"role": "user", "content": description},
        ],
        temperature=0,
    )
    return json.loads(resp.choices[0].message.tool_calls[0].function.arguments)


if __name__ == "__main__":
    desc = ("AcmePro X200 headphones — ships now at $149.99. "
            "Features: 40-hour battery, ANC, USB-C charging.")
    print(json.dumps(extract_product(desc), indent=2))

Anthropic — tool use for structured extraction

# Structured extraction with Anthropic tool use
# pip install anthropic pydantic

import os

import anthropic
from pydantic import BaseModel

client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

TOOL = {
    "name": "extract_meeting_notes",
    "description": "Extract structured meeting notes. Always call this tool.",
    "input_schema": {
        "type": "object",
        "properties": {
            "summary": {"type": "string"},
            "action_items": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "owner": {"type": "string"},
                        "task": {"type": "string"},
                        "due_date": {"type": "string"},
                    },
                    "required": ["owner", "task", "due_date"],
                },
            },
            "decisions": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["summary", "action_items", "decisions"],
    },
}


class ActionItem(BaseModel):
    owner: str
    task: str
    due_date: str | None


class MeetingNotes(BaseModel):
    summary: str
    action_items: list[ActionItem]
    decisions: list[str]


def extract_meeting_notes(transcript: str) -> MeetingNotes:
    resp = client.messages.create(
        model="claude-opus-4-5",
        max_tokens=1024,
        tools=[TOOL],
        tool_choice={"type": "tool", "name": "extract_meeting_notes"},
        messages=[{"role": "user", "content": f"Extract notes:\n\n{transcript}"}],
    )
    for block in resp.content:
        if block.type == "tool_use":
            return MeetingNotes(**block.input)
    raise RuntimeError("No tool_use block in response")


if __name__ == "__main__":
    notes = extract_meeting_notes("""
    Alice: New pricing model starts Q3. Bob: I'll update the pricing page by June 15.
    Carol: I'll brief legal by end of week. Alice: We dropped the free tier.
    """)
    print("Summary:", notes.summary)
    print("Decisions:", notes.decisions)
    for item in notes.action_items:
        print(f"  [{item.owner}] {item.task} — due {item.due_date}")

Comparisons

| Criterion | JSON mode | Function calling / Tool use | Pydantic-based (Instructor) |
|---|---|---|---|
| Schema enforcement | Syntactic only (valid JSON, no schema) | Structural (fields, types, required) | Structural + semantic (validators, field constraints) |
| API surface | response_format parameter | tools + tool_choice parameters | Library wrapper over tools |
| Output type | Raw string requiring json.loads | Parsed dict in tool call arguments | Typed Pydantic model instance |
| Retry on failure | Manual (implement yourself) | Manual | Automatic (library retries with error context) |
| Nested schemas | Possible but error-prone | Well-supported via JSON Schema | First-class via nested BaseModel |
| Best for | Simple, flat structures; quick prototyping | Production extraction and typed API dispatch | Complex schemas with Python-level validation needs |

Practical resources

See also