RAG (Retrieval-Augmented Generation)
Add knowledge retrieval to your agents with plug-and-play RAG.
When to use
- You have documents or knowledge bases to ground answers beyond the model weights.
- You already use
@agentskit/memoryvector backends and an@agentskit/adaptersembedder.
createRAG wires chunk → embed → store → retrieve; you still choose where vectors live (file, Redis, or custom store).
Install
npm install @agentskit/rag @agentskit/memory @agentskit/adapters
Quick Start
import { createRAG } from '@agentskit/rag'
import { openaiEmbedder } from '@agentskit/adapters'
import { fileVectorMemory } from '@agentskit/memory'
const rag = createRAG({
embed: openaiEmbedder({ apiKey: process.env.OPENAI_API_KEY }),
store: fileVectorMemory({ path: './vectors' }),
})
// Ingest documents
await rag.ingest([
{ id: 'doc-1', content: 'AgentsKit is a JavaScript agent toolkit...' },
{ id: 'doc-2', content: 'The runtime supports ReAct loops...' },
])
// Retrieve relevant context
const docs = await rag.retrieve('How does AgentsKit work?', { topK: 3 })
With Runtime
import { createRuntime } from '@agentskit/runtime'
import { openai } from '@agentskit/adapters'
const runtime = createRuntime({
adapter: openai({ apiKey: process.env.OPENAI_API_KEY!, model: 'gpt-4o' }),
retriever: rag, // auto-injects retrieved context into prompts
})
const result = await runtime.run('Explain the AgentsKit architecture')
With React
import { useRAGChat } from '@agentskit/rag'
import { openai } from '@agentskit/adapters'
const chat = useRAGChat({
adapter: openai({ apiKey: process.env.OPENAI_API_KEY!, model: 'gpt-4o' }),
rag,
})
Lifecycle: ingest vs retrieve
ingest(documents)— splits text into chunks (see Chunking), embeds each chunk, upserts intoVectorMemory. Duplicateids are overwritten per backend semantics.retrieve(query, { topK, threshold? })— embeds the query, runs vector search, returns ranked chunks for prompting.- Runtime /
useRAGChat— callretrieve(or equivalent) on your behalf each turn so the model sees fresh context.
Re-ingest when source documents change; there is no automatic filesystem watcher.
Public surface (summary)
| Export | Role |
|---|---|
createRAG(config) | Factory returning RAG instance with ingest, retrieve, and retriever-compatible surface |
useRAGChat | React hook: chat + automatic retrieval wiring |
Chunking
Documents are automatically chunked before embedding:
const rag = createRAG({
embed: openaiEmbedder({ apiKey }),
store: fileVectorMemory({ path: './vectors' }),
chunkSize: 512, // characters per chunk (default: 1000)
chunkOverlap: 50, // overlap between chunks (default: 100)
})
Bring Your Own Embedder
Any function matching (text: string) => Promise<number[]> works:
import { openaiEmbedder, geminiEmbedder, ollamaEmbedder } from '@agentskit/adapters'
openaiEmbedder({ apiKey, model: 'text-embedding-3-small' })
geminiEmbedder({ apiKey })
ollamaEmbedder({ model: 'nomic-embed-text' })
// Custom
const myEmbedder = async (text: string) => {
const response = await fetch('/api/embed', { method: 'POST', body: text })
return response.json()
}
Vector Stores
RAG works with any VectorMemory from @agentskit/memory:
| Store | Best for |
|---|---|
fileVectorMemory | Local development, small datasets |
redisVectorMemory | Production, fast networked access |
Custom VectorStore | LanceDB, Pinecone, Qdrant, etc. |
Troubleshooting
| Issue | What to check |
|---|---|
| No results / low quality | Increase topK, lower similarity threshold, shorten chunkSize, or improve chunk overlap. |
| Dimension errors | Embedder output size must match vector store dimensions (Redis) or first-write inference rules. |
| Stale answers | Re-run ingest after content changes; clear or rotate the vector path/index if needed. |
| Rate limits on ingest | Batch smaller; backoff between ingest calls; use local ollamaEmbedder for dev. |
See also
Start here · Packages · TypeDoc (@agentskit/rag) · Memory · Adapters · Runtime · RAG Pipeline example · @agentskit/core