Case study: ChatGPT
Definition
ChatGPT is a family of conversational LLMs from OpenAI. They are trained with supervised fine-tuning and reinforcement learning from human feedback (RLHF) to follow instructions and converse safely.
They illustrate the full LLM stack: a pretrained base model, instruction tuning, and RL-based alignment (RLHF). The same ideas (instruction tuning, preference optimization) appear in open and other proprietary models. Use cases: chat, prompt-driven tasks, and agent-like workflows with tools.
How it works
Start from a base model (e.g., GPT-4): a decoder-only transformer pretrained on next-token prediction. Then:
- Instruction tuning: fine-tune on (instruction, response) pairs so the model follows user intent.
- RLHF: train a reward model on human preference data (which of two responses is better), then optimize the policy (the LLM) with reinforcement learning (e.g., PPO) to maximize that reward.
The result is a model that is helpful, follows instructions, and is less likely to produce harmful or policy-violating content. Safety guardrails (content filters, refusals, monitoring) are applied in the product. Prompt engineering and RAG or agents extend the system for specific use cases.
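The two RLHF ingredients above can be sketched numerically. A minimal illustration, not OpenAI's implementation: the reward model is typically trained with a pairwise (Bradley–Terry) loss on preference data, and during PPO the reward-model score is combined with a KL penalty that keeps the policy close to the reference (SFT) model. The function names and the toy scores below are hypothetical.

```python
import math

def reward_model_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise preference loss: the modeled probability that the chosen
    response beats the rejected one is sigmoid(r_chosen - r_rejected);
    training minimizes its negative log."""
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))

def rlhf_reward(rm_score: float, logprob_policy: float,
                logprob_ref: float, beta: float = 0.1) -> float:
    """Per-sample reward used during RL: reward-model score minus a
    KL-style penalty for drifting from the reference (SFT) model."""
    return rm_score - beta * (logprob_policy - logprob_ref)

# Toy reward-model scores (illustrative numbers only).
low = reward_model_loss(2.0, 0.5)   # chosen already scored higher: small loss
high = reward_model_loss(0.5, 2.0)  # preference violated: large loss
print(low < high)                    # True

# If the policy assigns much higher log-probability than the reference,
# the penalty reduces the effective reward.
print(rlhf_reward(rm_score=1.0, logprob_policy=-1.0, logprob_ref=-3.0))
```

The KL term is what prevents the policy from "reward hacking" its way into degenerate text that scores well under the reward model but drifts far from fluent language.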
Use cases
ChatGPT-style systems fit chat, writing, code help, and task automation that benefit from instruction-following and tool use.
- Conversational assistants and customer support
- Writing, summarization, and brainstorming
- Code help, tutoring, and task automation via chat
External documentation
- OpenAI – ChatGPT and models
- InstructGPT (Ouyang et al.) — RLHF and instruction tuning
See also
- LLMs
- Reinforcement learning
- Prompt engineering
- Claude — Comparable conversational LLM
- Gemini — Multimodal LLM family