RAG
Retrieval-Augmented Generation: chunk documents, store embeddings, and retrieve relevant context at query time.
Flint's RAG module gives you the building blocks — a chunk() splitter, an Embedder interface you implement with your preferred embedding model, a memoryStore() in-memory vector store, and a retrieve() helper that wires them together. Bring your own embeddings: Flint does not bundle an embedding model.
Importing
import { chunk, memoryStore, retrieve } from 'flint/rag';
import type { Doc, Match, Filter, Embedder, VectorStore, ChunkOpts, RetrieveOpts } from 'flint/rag';

memoryStore()
An in-memory vector store backed by cosine-similarity search. Documents survive for the lifetime of the VectorStore instance — nothing is persisted to disk.
Type
interface VectorStore {
upsert(docs: Doc[]): Promise<void>;
query(vec: number[], k: number, filter?: Filter): Promise<Match[]>;
delete(ids: string[]): Promise<void>;
}
function memoryStore(): VectorStore;

Supporting types
type Doc = {
id: string;
text: string;
embedding: number[];
metadata?: Record<string, unknown>;
};
type Match = {
id: string;
text: string;
score: number; // cosine similarity: 1.0 = identical, 0.0 = orthogonal
metadata?: Record<string, unknown>;
};
type Filter = Record<string, unknown>;

Methods
| Method | Description |
|---|---|
| upsert(docs) | Insert or update documents. If a doc with the same id already exists it is replaced. All embeddings in the store must share the same dimension — mixing dimensions throws a TypeError. |
| query(vec, k, filter?) | Return the top-k matches by cosine similarity. Optionally restrict candidates to docs whose metadata contains all key-value pairs in filter. |
| delete(ids) | Remove documents by id. |
Example
import { memoryStore } from 'flint/rag';
const store = memoryStore();
// Upsert two documents (embeddings must be pre-computed)
await store.upsert([
{ id: 'doc-1', text: 'The capital of France is Paris.', embedding: [0.1, 0.9, 0.3] },
{ id: 'doc-2', text: 'The capital of Germany is Berlin.', embedding: [0.2, 0.8, 0.4] },
]);
// Query with a pre-computed vector
const matches = await store.query([0.15, 0.85, 0.35], 1);
// matches[0].text === 'The capital of France is Paris.'
// matches[0].score ≈ 0.99 (cosine similarity)
// Metadata filtering — only return docs tagged with source: 'wiki'
const filtered = await store.query(queryVec, 5, { source: 'wiki' }); // queryVec: any pre-computed query embedding

Limitation: All documents are stored in process memory. For production workloads with large corpora, swap memoryStore() for a persistent store (e.g. Pinecone, Qdrant, pgvector) that implements the same VectorStore interface.
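Because the interface is small, adapting a persistent backend is mostly glue code. A minimal sketch, assuming a hypothetical db client whose upsert/search/remove methods are placeholders for whatever your database's SDK actually exposes:

import type { Doc, Filter, Match, VectorStore } from 'flint/rag';

// Hypothetical client shape: real SDKs (Pinecone, Qdrant, pgvector wrappers)
// differ, so map their calls into these three methods.
type DbClient = {
  upsert(rows: Doc[]): Promise<void>;
  search(vec: number[], k: number, filter?: Filter): Promise<Match[]>;
  remove(ids: string[]): Promise<void>;
};

function dbStore(db: DbClient): VectorStore {
  return {
    upsert: (docs) => db.upsert(docs),
    query: (vec, k, filter) => db.search(vec, k, filter),
    delete: (ids) => db.remove(ids),
  };
}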
Embedder interface
Flint defines the Embedder interface but does not provide an implementation. You supply one using whatever embedding API or model you prefer.
Type
interface Embedder {
embed(texts: string[]): Promise<number[][]>;
dimensions: number;
}

| Member | Description |
|---|---|
| embed(texts) | Accept a batch of strings; return one embedding vector per string. The returned array must be the same length as texts. |
| dimensions | The fixed vector dimension this embedder produces. |
Example implementation (OpenAI)
import OpenAI from 'openai';
import type { Embedder } from 'flint/rag';
const openai = new OpenAI();
const openAIEmbedder: Embedder = {
dimensions: 1536, // text-embedding-3-small output size
async embed(texts: string[]): Promise<number[][]> {
const response = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: texts,
});
return response.data.map((d) => d.embedding);
},
};
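A quick sanity check: embedding a single-element batch should yield one vector whose length equals dimensions (1536 here).

const [vec] = await openAIEmbedder.embed(['What is the capital of France?']);
console.log(vec!.length); // 1536

Example implementation (local / custom)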
import type { Embedder } from 'flint/rag';
// Stub for illustration — replace with your model inference call
const localEmbedder: Embedder = {
dimensions: 384,
async embed(texts) {
// e.g. call a local Ollama endpoint, a WASM model, etc.
return texts.map(() => Array.from({ length: 384 }, Math.random));
},
};

chunk()
Split a long string into overlapping or non-overlapping fixed-size chunks before embedding. Chunking prevents individual embeddings from spanning too much semantic territory and keeps each chunk within your embedding model's context limit.
Type
type ChunkOpts = {
size: number;
overlap?: number;
};
function chunk(text: string, opts: ChunkOpts): string[];

Options
| Option | Type | Default | Description |
|---|---|---|---|
| size | number | — | Character count of each chunk. Must be > 0. |
| overlap | number | 0 | Characters to repeat from the end of one chunk at the start of the next. Must be < size. |
Example
import { chunk } from 'flint/rag';
const text = 'The quick brown fox jumps over the lazy dog.';
// Non-overlapping chunks of 15 characters
chunk(text, { size: 15 });
// ['The quick brown', ' fox jumps over', ' the lazy dog.']
// Overlapping chunks — 5-character overlap preserves context across boundaries
chunk(text, { size: 15, overlap: 5 });
// ['The quick brown', 'brown fox jumps', 'jumps over the ', ' the lazy dog.']

When to use overlap: Set overlap to roughly 10–20% of size when your documents contain sentences that straddle chunk boundaries and losing the boundary context would harm retrieval quality.
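As a concrete instance of that guideline (both numbers are illustrative, not library defaults, and longText stands in for your own document):

import { chunk } from 'flint/rag';

const longText = '...your document text...';

// ~15% overlap on 512-character chunks
const size = 512;
const overlap = Math.round(size * 0.15); // 77 characters
const pieces = chunk(longText, { size, overlap });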
retrieve()
A convenience wrapper that embeds a query string and calls store.query() in one step.
Type
type RetrieveOpts = {
embedder: Embedder;
store: VectorStore;
k: number;
filter?: Filter;
};
async function retrieve(query: string, opts: RetrieveOpts): Promise<Match[]>;
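Each returned Match carries a score: 1.0 means the query and document vectors point in the same direction, 0.0 means they are orthogonal. Filter by a score threshold for quality control (embedder and store as defined in the sections above):

// query: the user's query string
const results = await retrieve(query, { embedder, store, k: 10 });
const relevant = results.filter((r) => r.score > 0.7);

Full RAG pipeline example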
This example shows the complete flow: chunk a document, embed the chunks, store them, then retrieve relevant context at query time and inject it into an LLM call.
import { chunk, memoryStore, retrieve } from 'flint/rag';
import { call } from 'flint';
import type { Embedder } from 'flint/rag';
import { anthropicAdapter } from '@flint/adapter-anthropic';
import OpenAI from 'openai';
// 1. Set up adapter and embedder
const adapter = anthropicAdapter({ apiKey: process.env.ANTHROPIC_API_KEY! });
const openai = new OpenAI();
const embedder: Embedder = {
dimensions: 1536,
async embed(texts) {
const res = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: texts,
});
return res.data.map((d) => d.embedding);
},
};
// 2. Build the store
const store = memoryStore();
// 3. Index a document
const document = `
Flint is a TypeScript library for building LLM-powered applications.
It provides primitives for calling language models, managing conversation
history, and retrieving relevant documents via RAG.
Flint is model-agnostic and works with any OpenAI-compatible API.
`;
const chunks = chunk(document, { size: 200, overlap: 40 });
// Embed all chunks in one batch call
const embeddings = await embedder.embed(chunks);
await store.upsert(
chunks.map((text, i) => ({
id: `doc-chunk-${i}`,
text,
embedding: embeddings[i]!,
metadata: { source: 'flint-overview' },
})),
);
// 4. At query time — retrieve then generate
async function answer(question: string): Promise<string> {
const matches = await retrieve(question, { embedder, store, k: 3 });
const context = matches.map((m) => m.text).join('\n\n');
const result = await call({
adapter,
messages: [
{
role: 'system',
content: `Answer using only the provided context.\n\nContext:\n${context}`,
},
{ role: 'user', content: question },
],
model: 'claude-haiku-4-5',
});
if (!result.ok) throw result.error;
return result.value.message.content as string;
}
console.log(await answer('What is Flint?'));

Limitations
- In-memory only. memoryStore() loses all data when the process exits. For persistence, implement VectorStore against a database.
- Bring your own embedder. Flint intentionally ships no embedding model to avoid bundling large dependencies.
- Character-based chunking. chunk() splits on character count, not token count or sentence boundaries. For higher-quality retrieval, pre-process text with a sentence splitter before passing it to chunk(), as in the sketch below.
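A minimal sketch of that pre-processing step. The regex splitter below is deliberately naive (it mishandles abbreviations and decimals), and sentenceChunks is an illustrative helper name, not a Flint export:

import { chunk } from 'flint/rag';

// Naive sentence boundary detection: replace with a real sentence
// tokenizer for production text.
function splitSentences(text: string): string[] {
  return text.match(/[^.!?]+[.!?]+["')\]]*\s*/g) ?? [text];
}

// Pack whole sentences into chunks of roughly `size` characters, falling
// back to character-based chunk() only for oversized single sentences.
function sentenceChunks(text: string, size = 512, overlap = 64): string[] {
  const out: string[] = [];
  let current = '';
  for (const sentence of splitSentences(text)) {
    if (current && current.length + sentence.length > size) {
      out.push(current.trim());
      current = '';
    }
    current += sentence;
  }
  if (current.trim()) out.push(current.trim());
  return out.flatMap((c) => (c.length > size ? chunk(c, { size, overlap }) : [c]));
}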
See Also
- Memory — conversation history and scratchpad for injecting retrieved context
- call() — the low-level LLM call primitive used in the pipeline example
- Example: RAG Pipeline
- FAQ: How does Flint handle RAG?
- FAQ: Does Flint include a vector database?
