Quick RAG ⚡

🚀 Production-ready RAG (Retrieval-Augmented Generation) for JavaScript & React
Built on official Ollama & LM Studio SDKs.

🎉 v2.5.3 Released! React subpath exports, hybrid search sync fixes, live Ollama integration tests, and npm script stability improvements. See CHANGELOG.md for details.

✨ Features

🆕 v2.5.3 - Stability & Release Hardening

✅ React Subpath Exports - quick-rag/react now exports useRAG, initRAG, and createBrowserModelClient
✅ Hybrid Search Sync Fix - BM25 sync now indexes the full document set instead of truncating at 100 docs
✅ SmartRetriever Compatibility - Result shape is retriever-compatible while still exposing decision metadata
✅ Live Ollama Integration Test - validated against qwen3.5:9b + qwen3-embedding:0.6b
✅ Stable npm Scripts on Windows - npm script shell pinned to cmd.exe for reliable test execution

🆕 v2.5.2 - Stability & Compatibility

✅ React Export Fix - quick-rag/react now resolves correctly to useRAG
✅ Deterministic initRAG Tests - core test suite no longer depends on external Ollama availability
✅ Ollama Base Model Alignment - examples and tests standardized to granite4:3b
🐛 Critical Bug Fix - ConversationManager.addAssistantMessage() now correctly passes content
🌐 Browser Compatibility - Cross-platform UUID generation (Node.js + Browser)
📦 Cleaner Dependencies - Removed invalid self-referencing dependency
🤖 Updated Default Models - qwen3-embedding:0.6b (Ollama) & google/gemma-3-4b (LM Studio)

v2.4.0 - Robustness & Explainability

🔪 Robust Chunking - Abbreviation-aware sentence splitting & word-safe text chunking
🔍 Rich Explainability - Detailed retrieval snippets, keyword density & term match metrics
🚀 BM25 Optimization - Min-Heap based top-K selection for fast retrieval in large datasets
🌐 Environment Stability - Universal UUID support for Node.js and Browser (globalThis.crypto)

v2.3.0 - Performance & Evaluation

🚀 Caching Layer - LRU cache, embedding cache, query cache for up to 10x speedup on repeated operations
💬 Conversation Manager - Context window management & auto-summarization
📊 RAG Evaluation - Precision@K, Recall, MRR, NDCG metrics
🗄️ Vector DB Connectors - ChromaDB & Qdrant adapters

🔍 v2.2.0 - Advanced Search

🔍 BM25 Sparse Search - Pure JS keyword-based retrieval (no dependencies!)
🔀 Hybrid Search - Combines BM25 + Vector with RRF fusion (20-30% better retrieval)
📊 Reranking - Multi-signal scoring (keyword, semantic, coverage, coherence)
🔄 Query Transformation - Expansion, decomposition, multi-query, HyDE

Core Features

🎯 Official SDKs - Built on ollama and @lmstudio/sdk packages
💾 Embedded Persistence - SQLite-based vector store (No server required!)
🛡️ Robust Error Handling - 7 custom error classes with recovery suggestions
📊 Telemetry & Metrics - Track performance, latency, and usage
📝 Structured Logging - JSON logging with Pino integration
⚡ 5x Faster - Parallel batch embedding
📄 Document Loaders - PDF, Word, Excel, Text, Markdown, URLs
🔪 Robust Chunking - Intelligent splitting that respects abbreviations (Dr., Prof.) and avoids word cutting
🏷️ Metadata Filtering - Filter by document properties
🔍 Rich Query Explainability - See WHY docs were retrieved with snippets and density metrics (unique!)
🎨 Dynamic Prompts - 10 built-in templates + full customization
🧠 Weighted Decision Making - Multi-criteria document scoring
🎯 Heuristic Reasoning - Pattern learning and query optimization
🔄 CRUD Operations - Add, update, delete documents on the fly
🌊 Streaming Support - Real-time AI responses
🔧 Zero Config - Works with React, Next.js, Vite, Node.js
💪 Type Safe - Full TypeScript support

📦 Installation

npm install quick-rag

Default Ollama models (examples/docs):

ollama pull granite4:3b
ollama pull qwen3-embedding:0.6b

Optional Dependencies:

# For embedded persistence
npm install better-sqlite3

# For vector databases (optional)
npm install chromadb @qdrant/js-client-rest

🆕 What's New in v2.3.0

🚀 Caching Layer

Speed up repeated operations with intelligent caching:

import { CacheManager, EmbeddingCache } from 'quick-rag';

// Unified cache manager
const cache = new CacheManager({
  embeddings: { maxSize: 5000, ttl: 3600000 }, // 1 hour
  queries: { maxSize: 500, ttl: 1800000 }      // 30 min
});

// Wrap embedding function for automatic caching
const cachedEmbed = cache.wrapEmbedding(embedFn);

// Check statistics
console.log(cache.getStats());
// { embeddings: { size: 100, cacheHits: 450, cacheMisses: 50, hitRate: 0.9 } }
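
To make the idea concrete, here is a minimal standalone sketch of an LRU cache with a TTL wrapped around an embedding function. It is not quick-rag's CacheManager; TinyLRUCache and withEmbeddingCache are hypothetical names used only for illustration.

```javascript
// Minimal LRU cache with per-entry TTL (illustrative; not quick-rag's CacheManager).
class TinyLRUCache {
  constructor({ maxSize = 100, ttl = 60000, now = () => Date.now() } = {}) {
    this.maxSize = maxSize;
    this.ttl = ttl;
    this.now = now;        // injectable clock, handy for tests
    this.map = new Map();  // Map keeps insertion order: first key = least recently used
    this.hits = 0;
    this.misses = 0;
  }

  get(key) {
    const entry = this.map.get(key);
    if (!entry || this.now() - entry.storedAt > this.ttl) {
      this.map.delete(key); // drop expired entries lazily
      this.misses++;
      return undefined;
    }
    // Re-insert to mark the key as most recently used.
    this.map.delete(key);
    this.map.set(key, entry);
    this.hits++;
    return entry.value;
  }

  set(key, value) {
    this.map.delete(key);
    this.map.set(key, { value, storedAt: this.now() });
    if (this.map.size > this.maxSize) {
      // Evict the least recently used key (the oldest insertion).
      this.map.delete(this.map.keys().next().value);
    }
  }
}

// Wrap any async embedding function so repeated texts skip the model call.
function withEmbeddingCache(embedFn, cache) {
  return async (text) => {
    const cached = cache.get(text);
    if (cached !== undefined) return cached;
    const vector = await embedFn(text);
    cache.set(text, vector);
    return vector;
  };
}
```

The hit rate reported by a cache like this is simply hits / (hits + misses), which is how a figure such as hitRate: 0.9 for 450 hits and 50 misses comes about.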

💬 Conversation Manager

Manage chat history with context window limits:

import { ConversationManager, getContextLimit } from 'quick-rag';

const conversation = new ConversationManager({
  maxTokens: getContextLimit('llama3'), // 8192
  autoSummarize: true,
  systemPrompt: 'You are a helpful assistant.'
});

conversation.addMessage('user', 'What is RAG?');
conversation.addMessage('assistant', 'RAG stands for...');

// Get context for LLM (respects token limits)
const context = conversation.getContext();

// Fork, export, or summarize
const forked = conversation.fork();
const json = conversation.toJSON();
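
As a rough illustration of what "respects token limits" can mean, the sketch below keeps the system prompt and then adds messages newest-first until a budget is used up. It is a simplified stand-in, not ConversationManager's actual logic; estimateTokens (about 4 characters per token) and fitToContext are assumptions made for this example.

```javascript
// Rough token estimate: ~4 characters per token (an assumption; real tokenizers differ).
const estimateTokens = (text) => Math.ceil(text.length / 4);

// Keep the system prompt, then add messages newest-first until the budget is spent.
function fitToContext(systemPrompt, messages, maxTokens) {
  let used = estimateTokens(systemPrompt);
  const kept = [];
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimateTokens(messages[i].content);
    if (used + cost > maxTokens) break; // older messages no longer fit
    kept.unshift(messages[i]);          // preserve chronological order
    used += cost;
  }
  return [{ role: 'system', content: systemPrompt }, ...kept];
}
```

A summarizing manager would typically replace the dropped prefix with a short summary message instead of discarding it.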

📊 RAG Evaluation

Measure retrieval quality with standard metrics:

import { precisionAtK, meanReciprocalRank, RAGEvaluator } from 'quick-rag';

// Individual metrics
const retrieved = ['doc1', 'doc4', 'doc2'];
const relevant = ['doc1', 'doc2', 'doc3'];

console.log(precisionAtK(retrieved, relevant, 3));  // 0.667
console.log(meanReciprocalRank(retrieved, relevant)); // 1.0

// Full evaluation
const evaluator = new RAGEvaluator(retriever);
const results = await evaluator.evaluate(testQueries);
console.log(results.metrics); // { precision, recall, mrr, ndcg }
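
For reference, the metrics reduce to a few lines each. The sketch below recomputes them by hand for a single query; precisionAt, recallAt, and reciprocalRank are hypothetical helper names, not the library's exports.

```javascript
// Fraction of the top-k retrieved ids that are relevant.
function precisionAt(retrieved, relevant, k) {
  const rel = new Set(relevant);
  return retrieved.slice(0, k).filter((id) => rel.has(id)).length / k;
}

// Fraction of all relevant ids that appear in the top-k.
function recallAt(retrieved, relevant, k) {
  const rel = new Set(relevant);
  const found = retrieved.slice(0, k).filter((id) => rel.has(id)).length;
  return rel.size === 0 ? 0 : found / rel.size;
}

// 1 / rank of the first relevant id (0 if none is retrieved).
// Mean Reciprocal Rank is this value averaged over many queries.
function reciprocalRank(retrieved, relevant) {
  const rel = new Set(relevant);
  const index = retrieved.findIndex((id) => rel.has(id));
  return index === -1 ? 0 : 1 / (index + 1);
}

const retrievedIds = ['doc1', 'doc4', 'doc2'];
const relevantIds = ['doc1', 'doc2', 'doc3'];
console.log(precisionAt(retrievedIds, relevantIds, 3).toFixed(3)); // 0.667
console.log(reciprocalRank(retrievedIds, relevantIds));            // 1
```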

🗄️ Vector Database Connectors

Connect to external vector databases:

import { createVectorStore, ChromaVectorStore, QdrantVectorStore } from 'quick-rag';

// Factory pattern
const store = await createVectorStore('chroma', embedFn, {
  collectionName: 'my-docs',
  host: 'localhost',
  port: 8000
});

// Or direct usage
const qdrant = new QdrantVectorStore(embedFn, {
  url: 'http://localhost:6333',
  collectionName: 'documents'
});

🆕 What's New in v2.4.0

🔪 Robust Chunking

Intelligent text splitting that handles abbreviations and prevents word splitting:

import { chunkBySentences, chunkText } from 'quick-rag';

// Handles Dr., Prof., LTD., approx., etc.
const chunks = chunkBySentences(text, { 
  sentencesPerChunk: 3,
  overlapSentences: 1 
});

// Avoids cutting words in half
const textChunks = chunkText(text, { 
  chunkSize: 500,
  overlap: 50,
  separator: ' ' // Word-safe splitting
});
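
The word-safe idea can be sketched in plain JavaScript as follows. This is an illustration under simplifying assumptions (no single word longer than chunkSize, overlap much smaller than chunkSize), not the library's chunkText; chunkWords is a hypothetical name.

```javascript
// Split text into chunks of up to `chunkSize` characters without cutting words,
// carrying up to `overlap` characters of trailing words into the next chunk.
function chunkWords(text, { chunkSize = 500, overlap = 50 } = {}) {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  let current = [];
  let length = 0;

  for (const word of words) {
    const extra = current.length === 0 ? word.length : word.length + 1;
    if (current.length > 0 && length + extra > chunkSize) {
      chunks.push(current.join(' '));
      // Seed the next chunk with trailing whole words that fit in the overlap budget.
      const tail = [];
      let tailLength = 0;
      for (let i = current.length - 1; i >= 0; i--) {
        const cost = current[i].length + (tail.length ? 1 : 0);
        if (tailLength + cost > overlap) break;
        tail.unshift(current[i]);
        tailLength += cost;
      }
      current = tail;
      length = tailLength;
    }
    length += current.length === 0 ? word.length : word.length + 1;
    current.push(word);
  }
  if (current.length > 0) chunks.push(current.join(' '));
  return chunks;
}

console.log(chunkWords('one two three four five six', { chunkSize: 13, overlap: 5 }));
// [ 'one two three', 'three four', 'four five six' ]
```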

🔍 Rich Query Explainability

Get deep insights into why a document was retrieved:

const results = await retriever.getRelevant(query, 3, { explain: true });

console.log(results[0].explanation);
/*
{
  score: 0.88,
  snippet: "...context surrounding the match...",
  relevanceFactors: {
    semanticScore: 0.88,
    termMatch: 0.75,   // 3/4 terms matched
    density: 0.15      // concentration of keywords
  }
}
*/

🔍 What's in v2.2.0

🔍 BM25 Sparse Search

Pure JavaScript implementation - no external dependencies!

import { BM25 } from 'quick-rag';

const bm25 = new BM25({ k1: 1.2, b: 0.75 });
bm25.addDocuments([
  { id: '1', text: 'Machine learning is a subset of AI' },
  { id: '2', text: 'Deep learning uses neural networks' },
  { id: '3', text: 'Natural language processing handles text' }
]);

const results = bm25.search('neural networks AI', 2);
// Fast keyword-based retrieval with BM25 scoring (a TF-IDF-style ranking function)
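
For intuition about what the scoring does, here is a minimal standalone sketch of Okapi BM25 using the same k1 and b values as above. It is not quick-rag's BM25 class; tokenize and bm25Scores are hypothetical helpers, and the IDF form (with 1 added inside the log) is one common variant chosen as an assumption.

```javascript
// Okapi BM25 over a tiny in-memory corpus (illustrative only).
const tokenize = (text) => text.toLowerCase().match(/[a-z0-9]+/g) || [];

function bm25Scores(query, docs, { k1 = 1.2, b = 0.75 } = {}) {
  const tokenized = docs.map((d) => tokenize(d.text));
  const avgLength = tokenized.reduce((sum, t) => sum + t.length, 0) / docs.length;

  // Document frequency: in how many documents each term appears.
  const df = new Map();
  for (const tokens of tokenized) {
    for (const term of new Set(tokens)) df.set(term, (df.get(term) || 0) + 1);
  }

  const queryTerms = new Set(tokenize(query));
  return docs
    .map((doc, i) => {
      const tokens = tokenized[i];
      let score = 0;
      for (const term of queryTerms) {
        const tf = tokens.filter((t) => t === term).length;
        if (tf === 0) continue;
        const n = df.get(term);
        const idf = Math.log(1 + (docs.length - n + 0.5) / (n + 0.5));
        // Length normalization: longer-than-average documents are penalized via b.
        const norm = tf + k1 * (1 - b + b * (tokens.length / avgLength));
        score += idf * ((tf * (k1 + 1)) / norm);
      }
      return { id: doc.id, score };
    })
    .sort((x, y) => y.score - x.score);
}

const ranked = bm25Scores('neural networks AI', [
  { id: '1', text: 'Machine learning is a subset of AI' },
  { id: '2', text: 'Deep learning uses neural networks' },
  { id: '3', text: 'Natural language processing handles text' }
]);
console.log(ranked.map((r) => r.id)); // [ '2', '1', '3' ]
```

Document 2 ranks first because it matches two rare query terms, document 1 matches one, and document 3 matches none.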

🔀 Hybrid Search (BM25 + Vector)

Combine sparse and dense retrieval for 20-30% better results!

import { HybridRetriever, InMemoryVectorStore } from 'quick-rag';

const vectorStore = new InMemoryVectorStore(embedFn);
await vectorStore.addDocuments(docs);

const hybrid = new HybridRetriever(vectorStore, {
  alpha: 0.5,           // Balance: 0=sparse only, 1=dense only
  fusionMethod: 'rrf',  // Reciprocal Rank Fusion
  rrfK: 60
});

const results = await hybrid.search('query', 5, { explain: true });
// Results include both dense and sparse scores
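
Reciprocal Rank Fusion itself is small enough to sketch. The version below merges two ranked id lists with k = 60; how alpha weights the two lists here is an assumption for illustration and may differ from HybridRetriever's internals. rrfFuse is a hypothetical helper.

```javascript
// Reciprocal Rank Fusion: score(d) = sum over rankings of weight / (k + rank(d)).
function rrfFuse(denseIds, sparseIds, { k = 60, alpha = 0.5 } = {}) {
  const scores = new Map();
  const add = (ids, weight) => {
    ids.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) || 0) + weight / (k + rank));
    });
  };
  add(denseIds, alpha);      // alpha weights the dense (vector) ranking
  add(sparseIds, 1 - alpha); // 1 - alpha weights the sparse (BM25) ranking
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((x, y) => y.score - x.score);
}

const fused = rrfFuse(['a', 'b', 'c'], ['b', 'd', 'a']);
console.log(fused.map((r) => r.id)); // [ 'b', 'a', 'd', 'c' ]
```

Documents that appear in both lists ("a" and "b") rise to the top because RRF only looks at ranks, so the two retrievers' raw scores never need to be on the same scale.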

📊 Reranking

Multi-signal scoring to improve top-K precision:

import { Reranker, createRerankedRetriever } from 'quick-rag';

const reranker = new Reranker({
  keywordWeight: 0.35,   // Keyword overlap
  semanticWeight: 0.35,  // Semantic similarity
  coverageWeight: 0.20,  // Query term coverage
  coherenceWeight: 0.10  // Text coherence
});

// Rerank any retriever's results
const reranked = reranker.rerank(query, initialResults, { explain: true });

// Or wrap a retriever for automatic reranking
const smartRetriever = createRerankedRetriever(hybridRetriever, rerankerOptions);

🔄 Query Transformation

Advanced query processing techniques:

import { QueryExpander, QueryDecomposer, MultiQueryGenerator } from 'quick-rag';

// 1. Query Expansion - Add synonyms
const expander = new QueryExpander();
expander.addSynonyms('ml', ['machine learning', 'AI']);
const expanded = expander.expand('ml models');
// "ml models machine learning AI"

// 2. Query Decomposition - Split complex queries
const decomposer = new QueryDecomposer();
const parts = decomposer.decompose('Compare BM25 with vector search and explain differences');
// ["Compare BM25 with vector search", "explain differences"]

// 3. Multi-Query - Generate variations
const generator = new MultiQueryGenerator();
const variations = generator.generate('How does RAG work?');
// ["How does RAG work?", "What is RAG?", "RAG explanation"]
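
Synonym expansion, the simplest of the three, can be sketched as below. This is only an illustration of the idea and not QueryExpander's implementation; expandQuery and the Map-based synonym table are assumptions.

```javascript
// Append registered synonyms for any query term that has them.
function expandQuery(query, synonyms) {
  const extra = [];
  for (const term of query.toLowerCase().split(/\s+/)) {
    for (const synonym of synonyms.get(term) || []) {
      if (!extra.includes(synonym)) extra.push(synonym); // avoid duplicates
    }
  }
  return extra.length ? `${query} ${extra.join(' ')}` : query;
}

const synonymTable = new Map([['ml', ['machine learning', 'AI']]]);
console.log(expandQuery('ml models', synonymTable));
// "ml models machine learning AI"
```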

🎯 Full Pipeline Example

Combine all features for maximum retrieval quality:

import {
  OllamaRAGClient,
  createOllamaRAGEmbedding,
  InMemoryVectorStore,
  HybridRetriever,
  createRerankedRetriever,
  QueryExpander,
  generateWithRAG
} from 'quick-rag';

// Setup
const client = new OllamaRAGClient();
const embed = createOllamaRAGEmbedding(client, 'qwen3-embedding:0.6b');
const store = new InMemoryVectorStore(embed);
await store.addDocuments(documents);

// Create hybrid + reranked retriever
const hybrid = new HybridRetriever(store, { alpha: 0.5, fusionMethod: 'rrf' });
const retriever = createRerankedRetriever(hybrid, { keywordWeight: 0.3 });

// Expand query and retrieve
const expander = new QueryExpander();
const { expanded } = expander.expand(userQuery);
const results = await retriever.getRelevant(expanded, 5);

// Generate response
const response = await generateWithRAG(client, 'granite4:3b', userQuery, results);

📚 Previous Features

💾 Embedded Persistence (v2.1.0)

Store your vectors locally without setting up a complex database server!

Zero Setup: Just provide a file path (./rag.db)
Fast: Built on better-sqlite3
Full Features: Batch insert, metadata filtering, CRUD

🛡️ Advanced Error Handling

Never crash without knowing why. The error system provides:

Specific Error Types: RAGError, EmbeddingError, RetrievalError, etc.
Error Codes: Programmatic handling
Recovery Hints: Actionable suggestions in error messages

📊 Metrics & Logging

Monitor your RAG pipeline in production:

Performance Tracking: Embedding time, search latency, generation speed
Structured Logs: JSON format for easy parsing
Prometheus Support: Export metrics for monitoring dashboards

Advanced filtering with custom logic - filter documents using JavaScript functions:

const results = await retriever.getRelevant('latest AI news', 5, {
  filter: (meta) => {
    return meta.year === 2024 && 
           meta.tags.includes('AI') &&
           meta.difficulty !== 'beginner';
  }
});
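
Conceptually, a function filter is a predicate applied to each candidate's metadata before the top results are kept. The standalone sketch below shows that flow, plus a defensive predicate that tolerates documents without a tags array; filterThenTopK and the candidate shape ({ id, score, meta }) are assumptions for illustration.

```javascript
// Apply a metadata predicate to scored candidates, then keep the top k.
function filterThenTopK(candidates, k, filter = () => true) {
  return candidates
    .filter((doc) => filter(doc.meta || {}))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Same conditions as the filter above, but safe when `tags` is missing.
const recentAdvancedAI = (meta) =>
  meta.year === 2024 &&
  Array.isArray(meta.tags) && meta.tags.includes('AI') &&
  meta.difficulty !== 'beginner';

const candidates = [
  { id: 'a', score: 0.9, meta: { year: 2024, tags: ['AI'], difficulty: 'advanced' } },
  { id: 'b', score: 0.95, meta: { year: 2023, tags: ['AI'] } },
  { id: 'c', score: 0.5, meta: { year: 2024 } },
  { id: 'd', score: 0.7, meta: { year: 2024, tags: ['AI', 'ML'], difficulty: 'intermediate' } }
];
console.log(filterThenTopK(candidates, 5, recentAdvancedAI).map((d) => d.id)); // [ 'a', 'd' ]
```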

📽️ PowerPoint Support

Load .pptx and .ppt files with officeparser:

import { loadDocument } from 'quick-rag';
const pptDoc = await loadDocument('./presentation.pptx');

📁 Organized Examples

12 comprehensive examples covering all features:

Basic Usage (Ollama & LM Studio)
Document Loading (PDF, Word, Excel)
Metadata Filtering
Streaming Responses
Advanced Filtering
Query Explainability
Prompt Management
Decision Engine (Simple & Real-World)
Conversation History & Export
New examples/ folder for direct npm i quick-rag usage

🆕 Previous Features (v1.1.x)

📝 Internationalization Update

Translated all example files to English for better international accessibility
examples/10-decision-engine.js - Smart Document Selection example
examples/11-loaders.js - Document loaders example

🧠 Decision Engine (v1.1.0)

Revolutionary AI-powered retrieval system - The most advanced RAG retrieval available!

Quick RAG now includes a Decision Engine that goes far beyond simple cosine similarity. It combines:

🎯 Multi-Criteria Weighted Scoring - 5 factors evaluated together
🧠 Heuristic Reasoning - Pattern-based query optimization
Adaptive Learning - Learns from user feedback
🔍 Full Transparency - See exactly why each document was selected

Multi-Criteria Scoring

5 weighted factors beyond similarity:

📊 Semantic Similarity (50%) - Cosine similarity score
🔤 Keyword Match (20%) - Term matching in document
📅 Recency (15%) - Document freshness with exponential decay
⭐ Source Quality (10%) - Source reliability (official=1.0, research=0.9, blog=0.7, forum=0.6)
🎯 Context Relevance (5%) - Contextual fit

import { SmartRetriever, DEFAULT_WEIGHTS } from 'quick-rag';

// Create smart retriever with default weights
const defaultRetriever = new SmartRetriever(basicRetriever);

// Or customize weights for your use case
const smartRetriever = new SmartRetriever(basicRetriever, {
  weights: {
    semanticSimilarity: 0.35,
    keywordMatch: 0.20,
    recency: 0.30,         // Higher for news sites
    sourceQuality: 0.10,
    contextRelevance: 0.05
  }
});

// Get results with decision transparency
const results = await smartRetriever.getRelevant('latest AI news', 3);

// See scoring breakdown for each document
console.log(results[0]);
// {
//   text: "...",
//   weightedScore: 0.857,  // sum of the contributions below
//   scoreBreakdown: {
//     semanticSimilarity: { score: 0.85, weight: 0.35, contribution: 0.298 },
//     keywordMatch: { score: 0.67, weight: 0.20, contribution: 0.134 },
//     recency: { score: 0.95, weight: 0.30, contribution: 0.285 },
//     sourceQuality: { score: 0.90, weight: 0.10, contribution: 0.090 },
//     contextRelevance: { score: 1.00, weight: 0.05, contribution: 0.050 }
//   }
// }

// Decision context shows WHY these results
console.log(results.decisions);
// {
//   weights: { ... },
//   appliedRules: ["boost-recent-for-news"],
//   suggestions: [
//     "Time-sensitive query detected. Prioritizing recent documents.",
//     "Consider using filters if you need older historical content."
//   ]
// }
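
The arithmetic behind a breakdown like this is a weighted sum. The sketch below shows it together with one possible exponential recency decay; it is an illustration, not SmartRetriever's code, and weightedScore, recencyScore, and the half-life parameter are assumptions.

```javascript
// Weighted multi-criteria score with a per-factor breakdown.
function weightedScore(scores, weights) {
  const breakdown = {};
  let total = 0;
  for (const [factor, weight] of Object.entries(weights)) {
    const score = scores[factor] ?? 0; // missing factors contribute nothing
    const contribution = score * weight;
    breakdown[factor] = { score, weight, contribution };
    total += contribution;
  }
  return { weightedScore: total, breakdown };
}

// Exponential recency decay: 1.0 for a brand-new document, 0.5 after one half-life.
// The 365-day half-life is a placeholder, not a library default.
function recencyScore(ageDays, halfLifeDays = 365) {
  return Math.pow(0.5, ageDays / halfLifeDays);
}

const example = weightedScore(
  { semanticSimilarity: 0.85, keywordMatch: 0.67, recency: 0.95, sourceQuality: 0.9, contextRelevance: 1.0 },
  { semanticSimilarity: 0.35, keywordMatch: 0.20, recency: 0.30, sourceQuality: 0.10, contextRelevance: 0.05 }
);
console.log(example.weightedScore.toFixed(3));
```

Because the weights sum to 1, the total stays in the same 0 to 1 range as the individual factor scores.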

Heuristic Reasoning

Pattern-based optimization that learns:

// Enable learning mode
const smartRetriever = new SmartRetriever(basicRetriever, {
  enableLearning: true,
  enableHeuristics: true
});

// Add custom rules
smartRetriever.heuristicEngine.addRule(
  'boost-documentation',
  (query, context) => query.includes('documentation'),
  (query, context) => {
    context.adjustWeight('sourceQuality', 0.15);  // Increase quality weight
    return { adjusted: true, reason: 'Documentation query prioritizes quality' };
  },
  5  // Priority
);

// Provide feedback to enable learning
smartRetriever.provideFeedback(query, results, {
  rating: 5,           // 1-5 rating
  hasFilters: true,    // User applied filters
  comment: 'Perfect results!'
});

// System learns successful patterns
const insights = smartRetriever.getInsights();
console.log(insights.heuristics.successfulPatterns);
// ["latest", "documentation", "official release"]

// Export learned knowledge
const knowledge = smartRetriever.exportKnowledge();

// Import to another instance
const newRetriever = new SmartRetriever(basicRetriever);
newRetriever.importKnowledge(knowledge);

Scenario Customization

Different weights for different use cases:

// News Platform - Recency Priority
const newsRetriever = new SmartRetriever(basicRetriever, {
  weights: {
    semanticSimilarity: 0.30,
    keywordMatch: 0.20,
    recency: 0.40,         // 🔥 High recency
    sourceQuality: 0.05,
    contextRelevance: 0.05
  }
});

// Documentation Site - Quality Priority  
const docsRetriever = new SmartRetriever(basicRetriever, {
  weights: {
    semanticSimilarity: 0.35,
    keywordMatch: 0.20,
    recency: 0.10,
    sourceQuality: 0.30,   // 🔥 High quality
    contextRelevance: 0.05
  }
});

// Research Platform - Balanced
const researchRetriever = new SmartRetriever(basicRetriever, {
  weights: DEFAULT_WEIGHTS  // Balanced approach
});

Real-World Example

See examples/11-loaders.js for a complete example with:

PDF document loading
Multiple source types (official, blog, research, forum)
3 different scenarios (news, documentation, research)
RAG generation with quality metrics
Decision transparency and explanations

Benefits:

✅ More accurate retrieval than pure similarity
✅ Adapts to different content types automatically
✅ Learns from user interactions
✅ Fully explainable decisions
✅ Customizable for any use case
✅ Production-ready with proven patterns

🔍 Query Explainability (v1.1.0)

Understand WHY documents were retrieved - A first-of-its-kind feature!

const results = await retriever.getRelevant('What is Ollama?', 3, {
  explain: true
});

// Each result includes detailed explanation:
console.log(results[0].explanation);
// {
//   queryTerms: ["ollama", "local", "ai"],
//   matchedTerms: ["ollama", "local"],
//   matchCount: 2,
//   matchRatio: 0.67,
//   cosineSimilarity: 0.856,
//   relevanceFactors: {
//     termMatches: 2,
//     semanticSimilarity: 0.856,
//     coverage: "67%"
//   }
// }

Use cases: Debug searches, optimize queries, validate accuracy, explain to users
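
The term-matching part of such an explanation can be approximated in a few lines. This sketch is illustrative only and does not reproduce the library's explanation logic; explainMatch and the small stop-word list are assumptions.

```javascript
// Hypothetical stop-word list; a real one would be much longer.
const STOP_WORDS = new Set(['what', 'is', 'the', 'a', 'an', 'of', 'how', 'does']);

// Report which query terms appear in a document and the share that matched.
function explainMatch(query, docText) {
  const terms = (text) => text.toLowerCase().match(/[a-z0-9]+/g) || [];
  const queryTerms = [...new Set(terms(query))].filter((t) => !STOP_WORDS.has(t));
  const docTerms = new Set(terms(docText));
  const matchedTerms = queryTerms.filter((t) => docTerms.has(t));
  const matchRatio = queryTerms.length ? matchedTerms.length / queryTerms.length : 0;
  return { queryTerms, matchedTerms, matchCount: matchedTerms.length, matchRatio };
}

const explanation = explainMatch('local Ollama AI', 'Ollama provides local LLM hosting.');
console.log(explanation.matchedTerms);          // [ 'local', 'ollama' ]
console.log(explanation.matchRatio.toFixed(2)); // 0.67
```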

🎨 Dynamic Prompt Management (v1.1.0)

10 built-in templates + full customization

// Quick template selection
await generateWithRAG(client, model, query, docs, {
  template: 'conversational'  // or: technical, academic, code, etc.
});

// System prompts for role definition
await generateWithRAG(client, model, query, docs, {
  systemPrompt: 'You are a helpful programming tutor',
  template: 'instructional'
});

// Advanced: Reusable PromptManager
import { createPromptManager } from 'quick-rag';

const promptMgr = createPromptManager({
  systemPrompt: 'You are an expert engineer',
  template: 'technical'
});

await generateWithRAG(client, model, query, docs, {
  promptManager: promptMgr
});

Templates: default, conversational, technical, academic, code, concise, detailed, qa, instructional, creative

🚀 Quick Start

Option 1: With Official Ollama SDK (Recommended)

import { 
  OllamaRAGClient, 
  createOllamaRAGEmbedding,
  InMemoryVectorStore, 
  Retriever 
} from 'quick-rag';

// 1. Initialize client (official SDK)
const client = new OllamaRAGClient({
  host: 'http://127.0.0.1:11434'
});

// 2. Setup embedding
const embed = createOllamaRAGEmbedding(client, 'qwen3-embedding:0.6b');

// 3. Create vector store
const vectorStore = new InMemoryVectorStore(embed);
const retriever = new Retriever(vectorStore);

// 4. Add documents
await vectorStore.addDocument({ 
  text: 'Ollama provides local LLM hosting.' 
});

// 5. Query with streaming (official SDK feature!)
const results = await retriever.getRelevant('What is Ollama?', 2);
const context = results.map(d => d.text).join('\n');

const response = await client.chat({
  model: 'granite4:3b',
  messages: [{ 
    role: 'user', 
    content: `Context: ${context}\n\nQuestion: What is Ollama?` 
  }],
  stream: true, // Official SDK streaming!
});

// Stream response
for await (const part of response) {
  process.stdout.write(part.message?.content || '');
}

Option 2: React with Vite

💡 Starting from scratch? Check out the detailed step-by-step guide in QUICKSTART_REACT.md!

Step 1: Create your project

npm create vite@latest my-rag-app -- --template react
cd my-rag-app
npm install quick-rag express concurrently

Step 2: Create backend proxy (server.js in project root)

import express from 'express';
import { OllamaRAGClient } from 'quick-rag';

const app = express();
app.use(express.json());

const client = new OllamaRAGClient({ host: 'http://127.0.0.1:11434' });

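app.post('/api/generate', async (req, res) => {
  try {
    const { model = 'granite4:3b', messages } = req.body;
    const response = await client.chat({ model, messages, stream: false });
    res.json({ response: response.message.content });
  } catch (err) {
    // Report Ollama/network failures instead of leaving the request hanging
    res.status(500).json({ error: err.message });
  }
});

app.post('/api/embed', async (req, res) => {
  try {
    const { model = 'qwen3-embedding:0.6b', input } = req.body;
    const response = await client.embed(model, input);
    res.json(response);
  } catch (err) {
    res.status(500).json({ error: err.message });
  }
});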

app.listen(3001, () => console.log('🚀 Server: http://127.0.0.1:3001'));

Step 3: Configure Vite proxy (vite.config.js)

import { defineConfig } from 'vite';
import react from '@vitejs/plugin-react';

export default defineConfig({
  plugins: [react()],
  server: {
    proxy: {
      '/api': {
        target: 'http://127.0.0.1:3001',
        changeOrigin: true
      }
    }
  }
});

Step 4: Update package.json scripts

{
  "scripts": {
    "dev": "concurrently \"npm:server\" \"npm:client\"",
    "server": "node server.js",
    "client": "vite"
  }
}

Step 5: Use in your React component (src/App.jsx)

import { useState, useEffect } from 'react';
import { useRAG, initRAG, createBrowserModelClient } from 'quick-rag/react';

const docs = [
  { id: '1', text: 'React is a JavaScript library for building user interfaces.' },
  { id: '2', text: 'Ollama provides local LLM hosting.' },
  { id: '3', text: 'RAG combines retrieval with AI generation.' }
];

export default function App() {
  const [rag, setRAG] = useState(null);
  const [query, setQuery] = useState('');
  
  const { run, loading, response, docs: results } = useRAG({
    retriever: rag?.retriever,
    modelClient: createBrowserModelClient(),
    model: 'granite4:3b'
  });

  useEffect(() => {
    initRAG(docs, {
      baseEmbeddingOptions: {
        useBrowser: true,
        baseUrl: '/api/embed',
        model: 'qwen3-embedding:0.6b'
      }
    }).then(core => setRAG(core));
  }, []);

  return (
    <div style={{ padding: 40 }}>
      <h1>🤖 RAG Demo</h1>
      <input 
        value={query} 
        onChange={e => setQuery(e.target.value)}
        placeholder="Ask something..."
        style={{ width: 300, padding: 10 }}
      />
      <button onClick={() => run(query)} disabled={loading}>
        {loading ? 'Thinking...' : 'Ask AI'}
      </button>
      
      {results && (
        <div>
          <h3>📚 Retrieved:</h3>
          {results.map(d => <p key={d.id}>{d.text}</p>)}
        </div>
      )}
      
      {response && (
        <div>
          <h3>✨ Answer:</h3>
          <p>{response}</p>
        </div>
      )}
    </div>
  );
}

Step 6: Run your app

npm run dev

Open http://localhost:5173 🎉

Option 2b: Next.js (Pages Router)

Step 1: Create API routes

// pages/api/generate.js
import { OllamaClient } from 'quick-rag';

export default async function handler(req, res) {
  const client = new OllamaClient();
  const { model = 'granite4:3b', prompt } = req.body;
  const response = await client.generate(model, prompt);
  res.json({ response });
}

// pages/api/embed.js
import { OllamaClient } from 'quick-rag';

export default async function handler(req, res) {
  const client = new OllamaClient();
  const { model = 'qwen3-embedding:0.6b', input } = req.body;
  const response = await client.embed(model, input);
  res.json(response);
}

Step 2: Use in your page (same React component as above)

Option 3: Vanilla JavaScript (Node.js)

Simple approach with official Ollama SDK:

import { 
  OllamaRAGClient, 
  createOllamaRAGEmbedding, 
  InMemoryVectorStore, 
  Retriever 
} from 'quick-rag';

// 1. Initialize client
const client = new OllamaRAGClient();

// 2. Setup embedding
const embed = createOllamaRAGEmbedding(client, 'qwen3-embedding:0.6b');

// 3. Create vector store and retriever
const vectorStore = new InMemoryVectorStore(embed);
const retriever = new Retriever(vectorStore);

// 4. Add documents
await vectorStore.addDocuments([
  { text: 'JavaScript is a programming language.' },
  { text: 'Python is great for data science.' },
  { text: 'Rust is a systems programming language.' }
]);

// 5. Query
const query = 'What is JavaScript?';
const results = await retriever.getRelevant(query, 2);

// 6. Generate answer
const context = results.map(d => d.text).join('\n');
const response = await client.chat({
  model: 'granite4:3b',
  messages: [{ 
    role: 'user', 
    content: `Context:\n${context}\n\nQuestion: ${query}\n\nAnswer:` 
  }]
});

// Clean output
console.log('📚 Retrieved:', results.map(d => d.text));
console.log('🤖 Answer:', response.message.content);

Output:

📚 Retrieved: [
  'JavaScript is a programming language.',
  'Python is great for data science.'
]
🤖 Answer: JavaScript is a programming language that allows developers 
to write code and implement functionality in web browsers...

Option 4: LM Studio 🎨

Use LM Studio, which exposes an OpenAI-compatible API, instead of Ollama:

import { 
  LMStudioRAGClient, 
  createLMStudioRAGEmbedding, 
  InMemoryVectorStore, 
  Retriever, 
  generateWithRAG 
} from 'quick-rag';

// 1. Initialize LM Studio client
const client = new LMStudioRAGClient();

// 2. Setup embedding (use your embedding model from LM Studio)
const embed = createLMStudioRAGEmbedding(client, 'text-embedding-embeddinggemma-300m');

// 3. Create vector store and retriever
const vectorStore = new InMemoryVectorStore(embed);
const retriever = new Retriever(vectorStore);

// 4. Add documents
await vectorStore.addDocuments([
  { text: 'LM Studio is a desktop app for running LLMs locally.' },
  { text: 'It provides an OpenAI-compatible API.' },
  { text: 'You can use models like Llama, Mistral, and more.' }
]);

// 5. Query with RAG
const results = await retriever.getRelevant('What is LM Studio?', 2);
const answer = await generateWithRAG(
  client,
  'google/gemma-3-4b', // or your model name
  'What is LM Studio?',
  results
);

console.log('Answer:', answer);

Prerequisites for LM Studio:

Download and install LM Studio
Download a language model (e.g., Llama 3.2, Mistral)
Download an embedding model (e.g., text-embedding-embeddinggemma-300m)
Start the local server: Developer > Local Server (default: http://localhost:1234)

For React projects: Import hooks from 'quick-rag/react':

import { useRAG, initRAG, createBrowserModelClient } from 'quick-rag/react';

📖 API Reference

React Hook: `useRAG`

const { run, loading, response, docs, streaming, error } = useRAG({
  retriever,        // Retriever instance
  modelClient,      // Model client (OllamaClient or BrowserModelClient)
  model            // Model name (e.g., 'granite4:3b')
});

// Ask a question
await run('What is React?');

// With options
await run('What is React?', {
  topK: 5,           // Number of documents to retrieve
  stream: true,      // Enable streaming
  onDelta: (chunk, fullText) => console.log(chunk)
});

Core Functions

Initialize RAG

const { retriever, store, mrl } = await initRAG(documents, {
  defaultDim: 128,              // Embedding dimension
  k: 2,                         // Default number of results
  mrlBaseDim: 768,             // Base embedding dimension
  baseEmbeddingOptions: {
    useBrowser: true,           // Use browser-safe fetch
    baseUrl: '/api/embed',      // Embedding endpoint
    model: 'qwen3-embedding:0.6b'    // Embedding model
  }
});

Generate with RAG

const result = await generateWithRAG({
  retriever,
  modelClient,
  model,
  query: 'Your question',
  topK: 3              // Optional: override default k
});

// Returns: { docs, response, prompt }

VectorStore API

const store = new InMemoryVectorStore(embeddingFn, { defaultDim: 128 });

// Add documents
await store.addDocument({ id: '1', text: 'Document text' });

// Add multiple documents with batch processing (v2.0.3!)
await store.addDocuments([{ id: '1', text: '...' }], { 
  dim: 128,
  batchSize: 20,        // Process 20 chunks at a time
  maxConcurrent: 5,     // Max 5 concurrent requests
  onProgress: (current, total) => {
    console.log(`Progress: ${current}/${total}`);
  }
});

// Query
const results = await store.similaritySearch('query', k, queryDim);

// CRUD
const doc = store.getDocument('id');
const all = store.getAllDocuments();
await store.updateDocument('id', 'new text', { meta: 'data' });
store.deleteDocument('id');
store.clear();
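
Under the hood, a similarity search over an in-memory store comes down to scoring every stored vector against the query vector and keeping the best k. The sketch below shows that with cosine similarity; it is a conceptual illustration, not InMemoryVectorStore's code, and the document shape ({ id, vector }) is an assumption.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom; // zero vectors have no direction
}

// Score every document against the query vector and return the k best.
function topKBySimilarity(queryVector, docs, k) {
  return docs
    .map((doc) => ({ ...doc, score: cosineSimilarity(queryVector, doc.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

const hits = topKBySimilarity([1, 0], [
  { id: 'x', vector: [0, 1] },
  { id: 'y', vector: [1, 0] },
  { id: 'z', vector: [1, 1] }
], 2);
console.log(hits.map((h) => h.id)); // [ 'y', 'z' ]
```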

Batch Processing for Large Documents (v2.0.3):

// Process large PDFs efficiently
const chunks = chunkDocuments([largePDF], { chunkSize: 1000, overlap: 100 });

await store.addDocuments(chunks, {
  batchSize: 20,        // Process 20 chunks per batch
  maxConcurrent: 5,     // Max 5 concurrent embedding requests
  onProgress: (current, total) => {
    console.log(`Embedding progress: ${current}/${total} (${Math.round(current/total*100)}%)`);
  }
});
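
To show how batchSize, maxConcurrent, and onProgress can interact, here is a standalone sketch of batched embedding with a simple worker pool. It is an assumption about the general pattern, not the library's implementation; embedInBatches and embedBatch (an async function mapping an array of items to an array of vectors) are hypothetical.

```javascript
// Embed items in batches, running at most `maxConcurrent` batches at once.
async function embedInBatches(items, embedBatch, { batchSize = 20, maxConcurrent = 5, onProgress } = {}) {
  const batches = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push({ start: i, items: items.slice(i, i + batchSize) });
  }

  const results = new Array(items.length);
  let next = 0; // index of the next unclaimed batch
  let done = 0; // number of items embedded so far

  // Each worker pulls the next unclaimed batch until none remain.
  async function worker() {
    while (next < batches.length) {
      const batch = batches[next++];
      const vectors = await embedBatch(batch.items);
      vectors.forEach((v, j) => { results[batch.start + j] = v; }); // keep input order
      done += batch.items.length;
      if (onProgress) onProgress(done, items.length);
    }
  }

  const workers = Array.from({ length: Math.min(maxConcurrent, batches.length) }, () => worker());
  await Promise.all(workers);
  return results;
}
```

Results are written back by original index, so the output order matches the input even when batches finish out of order.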

Model Clients

Browser (with proxy)

const client = createBrowserModelClient({
  endpoint: '/api/generate'  // Your proxy endpoint
});

Node.js (direct)

const client = new OllamaClient({
  baseUrl: 'http://127.0.0.1:11434/api'
});

💡 Examples

CRUD Operations

// Add document dynamically
await store.addDocument({ 
  id: 'new-doc', 
  text: 'TypeScript adds types to JavaScript.' 
});

// Add multiple documents with batch processing (v2.0.3!)
await store.addDocuments([
  { id: 'doc1', text: 'First document' },
  { id: 'doc2', text: 'Second document' }
], {
  batchSize: 10,        // Process in batches
  maxConcurrent: 5,     // Rate limiting
  onProgress: (current, total) => {
    console.log(`Added ${current}/${total} documents`);
  }
});

// Update existing
await store.updateDocument('1', 'React 19 is the latest version.', {
  version: '19',
  updated: Date.now()
});

// Delete
store.deleteDocument('2');

// Query all
const allDocs = store.getAllDocuments();
console.log(`Total documents: ${allDocs.length}`);

Dynamic Retrieval

// Ask with different topK values
const result1 = await run('What is JavaScript?', { topK: 1 }); // Get 1 doc
const result2 = await run('What is JavaScript?', { topK: 5 }); // Get 5 docs

Streaming Responses

await run('Explain React hooks', {
  stream: true,
  onDelta: (chunk, fullText) => {
    console.log('New chunk:', chunk);
    // Update UI in real-time
  }
});

Custom Embedding Models

// Use different embedding models
const rag = await initRAG(docs, {
  baseEmbeddingOptions: {
    useBrowser: true,
    baseUrl: '/api/embed',
    model: 'qwen3-embedding:0.6b'  // or another compatible embedding model
  }
});

More examples: Check the examples/ folder for complete demos.

📄 Document Loaders (v0.7.4+)

Load documents from various formats and use them with RAG!

Supported Formats

| Format | Function | Requires |
|--------|----------|----------|
| PDF | `loadPDF()` | `npm install pdf-parse` |
| Word (.docx) | `loadWord()` | `npm install mammoth` |
| Excel (.xlsx) | `loadExcel()` | `npm install xlsx` |
| Text (.txt) | `loadText()` | Built-in ✅ |
| JSON | `loadJSON()` | Built-in ✅ |
| Markdown | `loadMarkdown()` | Built-in ✅ |
| Web URLs | `loadURL()` | Built-in ✅ |

Quick Start

Load PDF:

import { loadPDF, chunkDocuments } from 'quick-rag';

// Load PDF
const pdf = await loadPDF('./document.pdf');
console.log(`Loaded ${pdf.meta.pages} pages`);

// Chunk and add to RAG
const chunks = chunkDocuments([pdf], { 
  chunkSize: 500, 
  overlap: 50 
});
await store.addDocuments(chunks);

Load from URL:

import { loadURL } from 'quick-rag';

const doc = await loadURL('https://example.com', {
  extractText: true  // Convert HTML to plain text
});
await store.addDocuments([doc]);

Load Directory:

import { loadDirectory } from 'quick-rag';

// Load all supported documents from a folder
const docs = await loadDirectory('./documents', {
  extensions: ['.pdf', '.docx', '.txt', '.md'],
  recursive: true
});

console.log(`Loaded ${docs.length} documents`);

// Chunk and add to vector store
const chunks = chunkDocuments(docs, { chunkSize: 500 });
await store.addDocuments(chunks);

Auto-Detect Format:

import { loadDocument } from 'quick-rag';

// Automatically detects file type
const doc = await loadDocument('./file.pdf');
// Works with: .pdf, .docx, .xlsx, .txt, .md, .json

Installation

# Core package (includes text, JSON, markdown, URL loaders)
npm install quick-rag

# Optional: PDF support
npm install pdf-parse

# Optional: Word support
npm install mammoth

# Optional: Excel support
npm install xlsx

# Or install all at once:
npm install quick-rag pdf-parse mammoth xlsx

Complete Example

import {
  loadPDF,
  loadDirectory,
  chunkDocuments,
  InMemoryVectorStore,
  Retriever,
  OllamaRAGClient,
  createOllamaRAGEmbedding,
  generateWithRAG
} from 'quick-rag';

// Load documents
const pdf = await loadPDF('./research.pdf');
const docs = await loadDirectory('./articles');

// Combine and chunk
const allDocs = [pdf, ...docs];
const chunks = chunkDocuments(allDocs, { 
  chunkSize: 500,
  overlap: 50 
});

// Setup RAG
const client = new OllamaRAGClient();
const embed = createOllamaRAGEmbedding(client, 'qwen3-embedding:0.6b');
const store = new InMemoryVectorStore(embed);
const retriever = new Retriever(store);

// Add to vector store
await store.addDocuments(chunks);

// Query
const results = await retriever.getRelevant('What is the main topic?', 3);
const answer = await generateWithRAG(client, 'granite4:3b', 
  'What is the main topic?', results);

console.log(answer);

See full example: examples/11-loaders.js
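As a rough mental model of the query step, which asks the retriever for the 3 most relevant chunks, the sketch below ranks stored vectors by cosine similarity and keeps the top K. This is an assumption for illustration: the source does not state which metric `InMemoryVectorStore` uses, and `cosineSimilarity` and `topK` are hypothetical helpers, not quick-rag exports.

```javascript
// Illustrative only: assumes cosine similarity, which quick-rag may or may not use.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero for empty vectors
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Scores every entry against the query vector and returns the k best, highest first.
function topK(queryVector, entries, k = 3) {
  return entries
    .map((entry) => ({ ...entry, score: cosineSimilarity(queryVector, entry.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

const entries = [
  { id: 'a', vector: [1, 0] },
  { id: 'b', vector: [0, 1] },
  { id: 'c', vector: [1, 1] },
];
console.log(topK([1, 0], entries, 2).map((e) => e.id)); // prints [ 'a', 'c' ]
```

In the real pipeline the vectors come from the embedding model, so the query and the documents must be embedded with the same model for the scores to be comparable.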

❓ Troubleshooting

Problem	Solution
🚫 CORS errors	Use a proxy server (Express/Next.js API routes)
🔌 Connection refused	Ensure Ollama is running: `ollama serve`
📦 Models not found	Pull models: `ollama pull granite4:3b && ollama pull qwen3-embedding:0.6b`
🌐 404 on `/api/embed`	Check your proxy configuration in `vite.config.js` or API routes
💻 Windows IPv6 issues	Use `127.0.0.1` instead of `localhost`
📦 Module not found	Check imports: use `'quick-rag'` (or `'quick-rag/react'` for the React exports), not other `'quick-rag/...'` deep paths

Note: v0.6.5+ automatically detects and uses the correct API (generate or chat) for any model.
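For the CORS and `/api/embed` rows in the table above, a dev-server proxy in a Vite project could look roughly like the following. This is a sketch under assumptions: it assumes Ollama listens on its default port 11434 and that the browser client sends requests to paths under `/api`, so adjust the prefix and target to your setup.

```javascript
// vite.config.js (sketch; assumes Ollama on its default port 11434)
export default {
  server: {
    proxy: {
      // Forward same-origin /api/* requests to Ollama so the browser avoids CORS.
      '/api': {
        // 127.0.0.1 rather than localhost sidesteps the Windows IPv6 issue.
        target: 'http://127.0.0.1:11434',
        changeOrigin: true,
      },
    },
  },
};
```

With this in place, a request to `/api/embed` on the dev server is forwarded to `http://127.0.0.1:11434/api/embed`. A 404 usually means the prefix in the proxy key does not match the path the client requests.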

📚 Documentation

📖 API Reference - Complete API documentation
🛡️ Error Handling - Error handling best practices
💾 SQLite Persistence - Embedded storage guide
📊 Metrics & Telemetry - Monitoring and logging
🤝 Contributing - Contribution guidelines
📝 Changelog - Version history
💡 Examples - Working code examples
🚀 Quickstart - Quick start guides

🔗 Resources

Ollama Models: ollama.ai/library
LM Studio: lmstudio.ai
Issues: GitHub Issues
Discussions: GitHub Discussions
NPM Package: npmjs.com/package/quick-rag

📄 License

🙏 Acknowledgments

Built with:

Ollama JS SDK
LM Studio SDK
Pino - Fast logging
Better SQLite3 - Embedded database

Special thanks to all contributors and the open-source community!

Made with ❤️ for the JavaScript & AI community
