RAG Architecture
Definition
RAG architecture covers how you split documents, choose embeddings and vector stores, run retrieval (dense, sparse, or hybrid), and combine retrieved context with the LLM (prompt design, reranking).
Design decisions here directly affect RAG quality and latency. Trade-offs include chunk size (larger chunks carry more context each but retrieve less precisely), embedding model (quality vs. cost), and whether to add a reranker or hybrid search. See vector databases for indexing options.
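The chunk-size trade-off can be illustrated with a minimal fixed-size character chunker; this is a sketch, and the `chunk` helper and the sizes used are illustrative rather than taken from any particular library.

```python
def chunk(text: str, size: int, overlap: int = 0) -> list[str]:
    """Split text into fixed-size character chunks with optional overlap."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

doc = "a" * 1000  # stand-in for a 1000-character document

small = chunk(doc, size=100, overlap=20)   # many small, precise chunks
large = chunk(doc, size=500, overlap=50)   # few broad, context-rich chunks

print(len(small), len(large))  # prints "13 3"
```

Smaller chunks let retrieval pinpoint the relevant passage; larger chunks hand the LLM more surrounding context per hit but dilute the similarity signal.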
How It Works
- Chunking: Documents are split into segments (by paragraph, by sentence, or at a fixed size); overlap and metadata can be added.
- Embed and index: Chunks are turned into vectors via an embedding model and stored in a vector database.
- Query: At query time the query is embedded; retrieval fetches the top-k most similar chunks (dense search), optionally combined with keyword (sparse) search for hybrid retrieval.
- Rank: An optional reranker (e.g., a cross-encoder) rescores the top candidates. The chosen chunks are then formatted into the LLM prompt.

Advanced setups add query rewriting, multi-hop retrieval, and citation extraction.
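The steps above can be sketched end to end. Here a toy bag-of-words vector stands in for a real embedding model, and the documents, query, and prompt template are all made up for illustration.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy embedding: a bag-of-words count vector. A real system would
    # call an embedding model and store dense float vectors instead.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Chunking: here, trivially, one chunk per document string.
docs = [
    "RAG combines retrieval with generation",
    "Vector databases store chunk embeddings",
    "Cross-encoders rerank retrieved candidates",
]
index = [(d, embed(d)) for d in docs]  # embed and index

# Query: embed the query, then retrieve the top-k similar chunks (dense search).
query = "how are chunk embeddings stored"
q_vec = embed(query)
top_k = sorted(index, key=lambda item: cosine(q_vec, item[1]), reverse=True)[:2]

# Combine: format the retrieved chunks into the LLM prompt.
context = "\n".join(d for d, _ in top_k)
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)
```

The reranking step would slot in between retrieval and prompt assembly, rescoring `top_k` with a cross-encoder before the final cut.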
Use Cases
Architecture choices (chunking, retrieval, reranking) directly affect answer quality and latency in production RAG.
- Designing chunking and indexing for long documents or codebases
- Choosing dense vs. sparse or hybrid retrieval for domain data
- Adding reranking and citations for production RAG systems
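For the dense-vs.-sparse decision, hybrid retrieval typically fuses both result lists. Reciprocal rank fusion (RRF) is a common, model-free way to do this; the sketch below uses made-up document IDs and result lists.

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense_hits = ["doc_a", "doc_b", "doc_c"]   # from vector (dense) search
sparse_hits = ["doc_b", "doc_d", "doc_a"]  # from keyword (sparse) search

fused = rrf([dense_hits, sparse_hits])
print(fused[0])  # prints "doc_b": ranked high in both lists
```

Because RRF only looks at ranks, it needs no score normalization between the dense and sparse retrievers, which is why it is a popular default for hybrid search.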