RAG (Retrieval-Augmented Generation)

Add knowledge retrieval to your agents with plug-and-play RAG.

When to use

You have documents or knowledge bases to ground answers beyond the model weights.
You already use @agentskit/memory vector backends and an @agentskit/adapters embedder.

createRAG wires chunk → embed → store → retrieve; you still choose where vectors live (file, Redis, or custom store).

Install

npm install @agentskit/rag @agentskit/memory @agentskit/adapters

Quick Start

import { createRAG } from '@agentskit/rag'
import { openaiEmbedder } from '@agentskit/adapters'
import { fileVectorMemory } from '@agentskit/memory'

const rag = createRAG({
  embed: openaiEmbedder({ apiKey: process.env.OPENAI_API_KEY }),
  store: fileVectorMemory({ path: './vectors' }),
})

// Ingest documents
await rag.ingest([
  { id: 'doc-1', content: 'AgentsKit is a JavaScript agent toolkit...' },
  { id: 'doc-2', content: 'The runtime supports ReAct loops...' },
])

// Retrieve relevant context
const docs = await rag.retrieve('How does AgentsKit work?', { topK: 3 })

With Runtime

import { createRuntime } from '@agentskit/runtime'
import { openai } from '@agentskit/adapters'

const runtime = createRuntime({
  adapter: openai({ apiKey: process.env.OPENAI_API_KEY!, model: 'gpt-4o' }),
  retriever: rag, // auto-injects retrieved context into prompts
})

const result = await runtime.run('Explain the AgentsKit architecture')

With React

import { useRAGChat } from '@agentskit/rag'
import { openai } from '@agentskit/adapters'

const chat = useRAGChat({
  adapter: openai({ apiKey: process.env.OPENAI_API_KEY!, model: 'gpt-4o' }),
  rag,
})

Lifecycle: ingest vs retrieve

ingest(documents) — splits text into chunks (see Chunking), embeds each chunk, upserts into VectorMemory. Duplicate ids are overwritten per backend semantics.
retrieve(query, { topK, threshold? }) — embeds the query, runs vector search, returns ranked chunks for prompting.
Runtime / useRAGChat — call retrieve (or equivalent) on your behalf each turn so the model sees fresh context.

Re-ingest when source documents change; there is no automatic filesystem watcher.

Public surface (summary)

Export	Role
`createRAG(config)`	Factory returning RAG instance with `ingest`, `retrieve`, and retriever-compatible surface
`useRAGChat`	React hook: chat + automatic retrieval wiring

Chunking

Documents are automatically chunked before embedding:

const rag = createRAG({
  embed: openaiEmbedder({ apiKey }),
  store: fileVectorMemory({ path: './vectors' }),
  chunkSize: 512,    // characters per chunk (default: 1000)
  chunkOverlap: 50,  // overlap between chunks (default: 100)
})

Bring Your Own Embedder

Any function matching (text: string) => Promise<number[]> works:

import { openaiEmbedder, geminiEmbedder, ollamaEmbedder } from '@agentskit/adapters'

openaiEmbedder({ apiKey, model: 'text-embedding-3-small' })
geminiEmbedder({ apiKey })
ollamaEmbedder({ model: 'nomic-embed-text' })

// Custom
const myEmbedder = async (text: string) => {
  const response = await fetch('/api/embed', { method: 'POST', body: text })
  return response.json()
}

Vector Stores

RAG works with any VectorMemory from @agentskit/memory:

Store	Best for
`fileVectorMemory`	Local development, small datasets
`redisVectorMemory`	Production, fast networked access
Custom `VectorStore`	LanceDB, Pinecone, Qdrant, etc.

Troubleshooting

Issue	What to check
No results / low quality	Increase `topK`, lower similarity `threshold`, shorten `chunkSize`, or improve chunk overlap.
Dimension errors	Embedder output size must match vector store `dimensions` (Redis) or first-write inference rules.
Stale answers	Re-run `ingest` after content changes; clear or rotate the vector path/index if needed.
Rate limits on ingest	Batch smaller; backoff between `ingest` calls; use local `ollamaEmbedder` for dev.

When to use​

Install​

Quick Start​

With Runtime​

With React​

Lifecycle: ingest vs retrieve​

Public surface (summary)​

Chunking​

Bring Your Own Embedder​

Vector Stores​

Troubleshooting​

See also​