Retrieval-Augmented Generation (RAG)
Definition
**Retrieval-Augmented Generation (RAG)** augments a large language model with a retrieval step: given a query, you retrieve relevant documents (from a vector store or search index), then pass them as context to the LLM to generate an answer. This reduces hallucination and grounds answers in your own data.
RAG is usually preferred over fine-tuning when knowledge updates frequently (e.g., internal docs, support articles) and retraining for every change is impractical, when you have domain-specific or private data that shouldn't be baked into model weights, or when you want to cite sources in the model's answer. Fine-tuning is better when the desired behavior or style is stable and you can afford training and hosting.
How it works
- **Index:** Documents are chunked and embedded; the vectors are stored in a vector database.
- **Query:** The user's query is embedded; the system retrieves the top-k most similar chunks (see embeddings and RAG architecture).
- **Generate:** The LLM receives the query plus the retrieved text and generates the final answer.
The diagram below shows the query-time flow: the query and the vector database feed into embedding and retrieval; the retrieved text becomes context and is passed with the query to the LLM to produce the answer. Indexing (chunking, embedding, storing) is done offline or incrementally; retrieval and generation run at query time. Quality depends on chunking, embedding choice, and how the prompt includes context.
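The query-time flow above can be sketched without any libraries. In this toy version, `embed` is a bag-of-words count vector over a fixed vocabulary (a hypothetical stand-in for a real embedding model), and the chunk texts are made up for illustration:

```python
import math

# Toy embedding: map text to a bag-of-words count vector over a fixed
# vocabulary. A real system would call an embedding model instead.
VOCAB = ["rag", "retrieval", "generation", "llm", "vector", "database"]

def embed(text: str) -> list[float]:
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# "Index": embed each chunk once and store the vectors.
chunks = [
    "rag combines retrieval with generation",
    "a vector database stores embeddings",
    "the llm produces the final answer",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# "Query": embed the query and retrieve the top-k most similar chunks.
def retrieve(query: str, k: int = 2) -> list[str]:
    qv = embed(query)
    ranked = sorted(index, key=lambda item: cosine(qv, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

top = retrieve("what is rag retrieval")
```

The retrieved chunks in `top` would then be joined into the prompt context, which is the step the pipeline below performs with a real vector store.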
Simple RAG pipeline (Python)
```python
from langchain_community.vectorstores import Chroma
from langchain_core.documents import Document
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# Index documents (one-time or incremental)
documents = [Document(page_content="RAG combines retrieval with generation.")]
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(documents, embeddings)

# Query: retrieve the top-k most similar chunks and join them as context
query = "What is RAG?"
docs = vectorstore.similarity_search(query, k=4)
context = "\n\n".join(d.page_content for d in docs)

# Generate: pass the context and question to the LLM
prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer using only the context below.\n\n{context}"),
    ("human", "{question}"),
])
llm = ChatOpenAI(model="gpt-4")
chain = prompt | llm
answer = chain.invoke({"context": context, "question": query})
print(answer.content)  # invoke returns a message; .content is the text
```
Use cases
RAG fits any application where answers must be grounded in up-to-date or private documents rather than the model’s training data.
- Customer support chatbots that answer from a knowledge base
- Internal wiki and document Q&A
- Legal or contract search and summarization
- Product and FAQ search with cited answers
Pros and cons
| Pros | Cons |
|---|---|
| Reduces hallucination | Retrieval quality depends on chunks and embeddings |
| No need to retrain for new docs | Latency from retrieval + generation |
| Easy to update knowledge | Need good chunking and indexing strategy |
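Since retrieval quality is the main failure mode in the table above, it helps to measure it directly. Below is a minimal sketch of hit rate@k over a small labeled query set; the word-overlap `retrieve` function, the corpus, and the labels are all made-up stand-ins for a real vector store and evaluation set:

```python
def hit_rate_at_k(retrieve, labeled_queries, k: int = 4) -> float:
    """Fraction of queries whose expected chunk appears in the top-k results."""
    hits = sum(
        1 for query, expected in labeled_queries
        if expected in retrieve(query, k)
    )
    return hits / len(labeled_queries)

# Toy retriever over a tiny corpus: rank chunks by word overlap with the query.
corpus = [
    "rag grounds answers in retrieved documents",
    "fine-tuning bakes knowledge into model weights",
]

def retrieve(query: str, k: int) -> list[str]:
    qwords = set(query.lower().split())
    ranked = sorted(corpus, key=lambda c: len(qwords & set(c.split())), reverse=True)
    return ranked[:k]

labeled = [("what grounds rag answers", "rag grounds answers in retrieved documents")]
score = hit_rate_at_k(retrieve, labeled, k=1)
```

Tracking a metric like this while varying chunk size or the embedding model turns the "need good chunking and indexing strategy" caveat into something you can optimize.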
External documentation
- RAG paper (Lewis et al.) — Original retrieval-augmented generation paper
- LangChain – Question answering / RAG
- LlamaIndex – RAG
- Vertex AI – RAG and grounding — RAG on Google Cloud