Case study: ChatGPT
Definition
ChatGPT is a family of conversational LLMs from OpenAI. They are trained with supervised fine-tuning and reinforcement learning from human feedback (RLHF) to follow instructions and converse safely.
They illustrate the full LLM stack: a pretrained base model, instruction tuning, and RL-based alignment (RLHF). The same ideas (instruction tuning, preference optimization) appear in open and other proprietary models. Use cases: chat, prompt-driven tasks, and agent-like workflows with tools.
How it works
Start from a base model (e.g., GPT-4): a decoder-only transformer pretrained on next-token prediction. Then:
- Instruction tuning: fine-tune on (instruction, response) pairs so the model follows user intent.
- RLHF: train a reward model on human preference data (which of two responses is better), then optimize the policy (the LLM) with reinforcement learning (e.g., PPO) to maximize that reward.
The result is a model that is helpful, follows instructions, and is less likely to produce harmful or policy-violating content. Safety guardrails (content filters, refusals, monitoring) are applied in the product. Prompt engineering and RAG or agents extend the system for specific use cases.
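The two RLHF ingredients above can be sketched numerically. A minimal illustration, not OpenAI's implementation: the reward model is typically trained with a pairwise (Bradley–Terry) loss on preference data, and during PPO the reward-model score is combined with a KL penalty that keeps the policy close to the reference (SFT) model. The function names and the toy scores below are hypothetical.

```python
import math

def reward_model_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise preference loss: the modeled probability that the chosen
    response beats the rejected one is sigmoid(r_chosen - r_rejected);
    training minimizes its negative log."""
    return -math.log(1.0 / (1.0 + math.exp(-(r_chosen - r_rejected))))

def rlhf_reward(rm_score: float, logprob_policy: float,
                logprob_ref: float, beta: float = 0.1) -> float:
    """Per-sample reward used during RL: reward-model score minus a
    KL-style penalty for drifting from the reference (SFT) model."""
    return rm_score - beta * (logprob_policy - logprob_ref)

# Toy reward-model scores (illustrative numbers only).
low = reward_model_loss(2.0, 0.5)   # chosen already scored higher: small loss
high = reward_model_loss(0.5, 2.0)  # preference violated: large loss
print(low < high)                    # True

# If the policy assigns much higher log-probability than the reference,
# the penalty reduces the effective reward.
print(rlhf_reward(rm_score=1.0, logprob_policy=-1.0, logprob_ref=-3.0))
```

The KL term is what prevents the policy from "reward hacking" its way into degenerate text that scores well under the reward model but drifts far from fluent language.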
Use cases
ChatGPT-style systems fit chat, writing, code help, and task automation that benefit from instruction-following and tool use.
- Conversational assistants and customer support
- Writing, summarization, and brainstorming
- Code help, tutoring, and task automation via chat
External documentation
- OpenAI – ChatGPT and models
- InstructGPT (Ouyang et al.) — RLHF and instruction tuning
See also
- LLMs
- Reinforcement learning
- Prompt engineering
- Claude — Comparable conversational LLM
- Gemini — Multimodal LLM family