GPT
Definition
GPT (Generative Pre-trained Transformer) refers to decoder-only transformer models trained autoregressively to predict the next token. Scaling these models has produced today's large language models (LLMs), which can perform tasks from a few examples in the prompt (few-shot) or from instructions alone (zero-shot).
The decoder-only design is well-suited to generation: at each step the model conditions on all previous tokens and predicts the next one. LLMs built on this idea are then instruction-tuned and aligned (e.g., with RLHF, reinforcement learning from human feedback) for chat and tool use. For understanding-only tasks such as classification, BERT-style encoders can be more parameter-efficient.
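The step-by-step conditioning described above can be sketched as a loop. The "model" here is a toy stand-in (a random per-token logit table), not a trained network; the token ids, vocabulary size, and stop token are all made-up values for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
vocab_size, stop_token, max_len = 6, 0, 12

# Toy stand-in for a trained model: a fixed logit table indexed by the
# last token. A real LLM would run the whole context through its layers.
W = rng.normal(size=(vocab_size, vocab_size))

def model_logits(context):
    return W[context[-1]]

context = [3]  # a one-token "prompt"
while len(context) < max_len:
    nxt = int(np.argmax(model_logits(context)))  # greedy decoding
    context.append(nxt)                          # condition on it next step
    if nxt == stop_token:
        break
```

The loop ends either on the stop token or at a length cap, mirroring the stop conditions used in real decoding.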
How it works
Tokens are embedded and fed through a stack of causal decoder layers: each position can attend only to itself and earlier positions (masked self-attention), so the model cannot “see” the future. The next token is predicted from the last position’s representation, typically via a linear layer and a softmax over the vocabulary. Training maximizes the likelihood of each next token given the preceding ground-truth context (teacher forcing). Inference is autoregressive: sample or greedily pick the next token, append it to the context, and repeat until a stop condition. Prompt engineering and fine-tuning shape how the model applies this mechanism to tasks.
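A minimal NumPy sketch of one such step, with a single attention head and randomly initialized (i.e., untrained, purely illustrative) weights: embed the tokens, apply causally masked self-attention, and read next-token probabilities off the last position.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d_model, seq_len = 10, 8, 4

# Embedding table and an example input sequence of token ids
E = rng.normal(size=(vocab_size, d_model))
tokens = np.array([1, 4, 2, 7])
x = E[tokens]                                   # (seq_len, d_model)

# Single-head self-attention projections (hypothetical weights)
Wq = rng.normal(size=(d_model, d_model))
Wk = rng.normal(size=(d_model, d_model))
Wv = rng.normal(size=(d_model, d_model))
Q, K, V = x @ Wq, x @ Wk, x @ Wv

scores = Q @ K.T / np.sqrt(d_model)             # (seq_len, seq_len)

# Causal mask: position i may attend only to positions j <= i
mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
scores[mask] = -np.inf                          # future scores -> zero weight

# Row-wise softmax over attention scores
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
attended = weights @ V                          # (seq_len, d_model)

# Predict the next token from the LAST position's representation
W_out = rng.normal(size=(d_model, vocab_size))  # "unembedding" head
logits = attended[-1] @ W_out
probs = np.exp(logits - logits.max())
probs /= probs.sum()                            # softmax over the vocabulary
next_token = int(np.argmax(probs))              # greedy pick
```

A real GPT stacks many such layers, adds positional information, multiple heads, and feed-forward sublayers, but the masking and last-position prediction work the same way.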
Use cases
Decoder-only models are the backbone of chat, code, and any task that benefits from autoregressive generation or few-shot prompting.
- Text and code generation (completion, summarization, dialogue)
- Few-shot and zero-shot classification via prompts
- Assistants and chatbots built on instruction-tuned models
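Few-shot classification, as in the list above, amounts to packing labeled examples into the prompt so the model's next tokens continue the pattern with a label. A minimal sketch of such a prompt; the reviews, labels, and template wording are invented for illustration:

```python
# Labeled examples the model should imitate (hypothetical data)
examples = [
    ("The battery died after an hour.", "negative"),
    ("Setup took two minutes and it just works.", "positive"),
]
query = "Shipping was fast but the screen scratches easily."

# Build the prompt: each example as "Review: ... / Sentiment: ...",
# then the query with the label left for the model to generate.
prompt = "\n".join(f"Review: {t}\nSentiment: {l}" for t, l in examples)
prompt += f"\nReview: {query}\nSentiment:"
print(prompt)
```

The prompt deliberately ends mid-pattern ("Sentiment:"), so next-token prediction is steered toward emitting one of the demonstrated labels.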