Case study: ChatGPT
Definition
ChatGPT is a family of LLMs from OpenAI, trained with supervised fine-tuning and reinforcement learning from human feedback (RLHF) to follow instructions and converse safely.
These models illustrate the full LLM stack: a pretrained base model, instruction tuning, and RLHF-based alignment. The same ideas (instruction tuning, preference optimization) appear in open and other proprietary models. Typical uses: chat, prompt-driven tasks, and agent-like workflows with tools.
How it works
- Base model (e.g. GPT-4): a decoder-only transformer pretrained on next-token prediction.
- Instruction tuning: fine-tune on (instruction, response) pairs so the model follows user intent.
- RLHF: train a reward model on human preference data (which of two responses is better), then optimize the policy (the LLM) with reinforcement learning (e.g. PPO) to maximize the reward.

The result is a model that is helpful, follows instructions, and is less likely to produce harmful or off-policy content. Safety guardrails (content filters, refusals, monitoring) are applied in the product. Prompt engineering and RAG or agents extend the system for specific use cases.
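The reward-model step above is commonly trained with a pairwise (Bradley-Terry) objective: the loss shrinks when the human-preferred response scores higher than the rejected one. A minimal sketch in plain Python, where the scores are stand-ins for reward-model outputs on two candidate responses:

```python
import math

def preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise preference loss: -log(sigmoid(r_chosen - r_rejected)).

    r_chosen / r_rejected are scalar reward-model scores for the
    human-preferred and rejected responses. Minimizing this loss pushes
    the reward model to score preferred responses higher.
    """
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Loss is small when the preferred response already scores higher,
# and large when the ranking is inverted.
print(round(preference_loss(2.0, -1.0), 4))   # 0.0486
print(round(preference_loss(-1.0, 2.0), 4))   # 3.0486
```

In a real RLHF pipeline the scores come from a learned model over full (prompt, response) pairs, and the trained reward model then supplies the scalar reward that PPO maximizes in the RL step.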
Use cases
ChatGPT-style systems fit chat, writing, code help, and task automation that benefit from instruction-following and tool use.
- Conversational assistants and customer support
- Writing, summarization, and brainstorming
- Code help, tutoring, and task automation via chat
External documentation
- OpenAI – ChatGPT and models
- InstructGPT (Ouyang et al.) — RLHF and instruction tuning
See also
- LLMs
- Reinforcement learning
- Prompt engineering
- Claude — A comparable conversational LLM
- Gemini — Multimodal LLM family