Case study: ChatGPT
Definition
ChatGPT is a family of LLMs from OpenAI, trained with supervised fine-tuning and reinforcement learning from human feedback (RLHF) to follow instructions and converse safely.
These models illustrate the full LLM stack: a pretrained base model, instruction tuning, and RLHF-based alignment. The same ideas (instruction tuning, preference optimization) appear in open and other proprietary models. Typical uses: chat, prompt-driven tasks, and agent-like workflows with tools.
How it works
- Base model (e.g. GPT-4): a decoder-only transformer pretrained on next-token prediction.
- Instruction tuning: fine-tune on (instruction, response) pairs so the model follows user intent.
- RLHF: train a reward model on human preference data (which of two responses is better), then optimize the policy (the LLM) with reinforcement learning (e.g. PPO) to maximize the reward.

The result is a model that is helpful, follows instructions, and is less likely to produce harmful or off-policy content. Safety guardrails (content filters, refusals, monitoring) are applied in the product. Prompt engineering and RAG or agents extend the system for specific use cases.
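The reward-model step above is commonly trained with a pairwise (Bradley-Terry) objective: the loss shrinks when the human-preferred response scores higher than the rejected one. A minimal sketch in plain Python, where the scores are stand-ins for reward-model outputs on two candidate responses:

```python
import math

def preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise preference loss: -log(sigmoid(r_chosen - r_rejected)).

    r_chosen / r_rejected are scalar reward-model scores for the
    human-preferred and rejected responses. Minimizing this loss pushes
    the reward model to score preferred responses higher.
    """
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Loss is small when the preferred response already scores higher,
# and large when the ranking is inverted.
print(round(preference_loss(2.0, -1.0), 4))   # 0.0486
print(round(preference_loss(-1.0, 2.0), 4))   # 3.0486
```

In a real RLHF pipeline the scores come from a learned model over full (prompt, response) pairs, and the trained reward model then supplies the scalar reward that PPO maximizes in the RL step.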
Use cases
ChatGPT-style systems fit chat, writing, code help, and task automation that benefit from instruction-following and tool use.
- Conversational assistants and customer support
- Writing, summarization, and brainstorming
- Code help, tutoring, and task automation via chat
External documentation
- OpenAI – ChatGPT and models
- InstructGPT (Ouyang et al.) — RLHF and instruction tuning
See also
- LLMs
- Reinforcement learning
- Prompt engineering
- Claude — A comparable conversational LLM
- Gemini — Multimodal LLM family