This repo demonstrates a minimal Retrieval-Augmented Generation (RAG) pipeline that runs fully offline on your machine.

## Stack

- Ollama to run a local LLM (e.g., mistral, llama3)
- SentenceTransformers for text embeddings
- ChromaDB as the local vector database

## Quickstart

1) Install Ollama and pull a model:

```bash
ollama pull mistral
```

2) Create a virtual environment & install dependencies:

```bash
python -m venv .venv
.venv\Scripts\activate    # Windows; on macOS/Linux: source .venv/bin/activate
pip install -r requirements.txt
```
Note: For GPU acceleration, install the appropriate PyTorch build following the instructions at https://pytorch.org/get-started/locally/ before installing sentence-transformers.

3) Add your .txt files to the data/ folder (a sample is provided).

4) Run the pipeline:

```bash
python simple_rag.py --reset --query "What does the document say about renewable energy?"
```

Common flags:

- `--data_dir data` Folder with .txt files
- `--persist_dir .chroma` Directory for Chroma persistence ('' for in-memory)
- `--collection local_rag` Chroma collection name
- `--embed_model sentence-transformers/all-MiniLM-L6-v2`
- `--llm_model mistral` (try llama3 or qwen if installed)
- `--chunk_size 500 --overlap 50`
- `--n_results 3`

5) Example:

```bash
python simple_rag.py --reset --n_results 5 --query "List the barriers to renewable deployment mentioned in the documents."
```
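For orientation, here is a minimal sketch of the pipeline the script implements: chunk, embed, store in Chroma, retrieve, and prompt Ollama. The file name `data/sample.txt`, the `chunk` helper, and the prompt are illustrative assumptions, not the script's actual internals.

```python
# Minimal RAG sketch (illustrative; not simple_rag.py's actual code).
import chromadb
import ollama  # assumes the `ollama` Python package and a running daemon
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
db = chromadb.PersistentClient(path=".chroma")  # chromadb.Client() for in-memory
col = db.get_or_create_collection("local_rag")

def chunk(text, size=500, overlap=50):
    # Fixed-size character chunks with overlap (hypothetical helper).
    return [text[i:i + size] for i in range(0, len(text), size - overlap)]

docs = chunk(open("data/sample.txt", encoding="utf-8").read())  # assumed file name
col.add(ids=[str(i) for i in range(len(docs))],
        documents=docs,
        embeddings=embedder.encode(docs).tolist())

query = "What does the document say about renewable energy?"
hits = col.query(query_embeddings=[embedder.encode(query).tolist()], n_results=3)
context = "\n\n".join(hits["documents"][0])
reply = ollama.chat(model="mistral", messages=[{
    "role": "user",
    "content": f"Answer using only this context:\n{context}\n\nQuestion: {query}"}])
print(reply["message"]["content"])
```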
## Troubleshooting

- If `ollama` is not found in your PATH, start the Ollama app/daemon and ensure the CLI is available.
- To start fresh, pass `--reset` to recreate the Chroma collection.
- You can switch models with `--llm_model llama3` (after `ollama pull llama3`).
- For larger datasets, consider a persistent DB (`--persist_dir .chroma`) so you don't re-index every run.
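For reference, the two persistence modes map onto Chroma's client constructors roughly like this (a sketch, assuming a current chromadb release):

```python
import chromadb

mem_client = chromadb.Client()                           # in-memory: re-indexes every run
disk_client = chromadb.PersistentClient(path=".chroma")  # on disk: index survives restarts
```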
## Local RAG with MCP

This guide shows how to set up a local RAG system using MCP (Model Context Protocol) with FastMCP. You get:
- A local MCP server that exposes RAG capabilities as tools
- Search your documents using semantic, keyword, or hybrid search (a scoring sketch follows this list)
- Get AI-generated answers grounded in your documents
- All running locally with free, open-source tools
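As promised above, here is a conceptual sketch of how hybrid retrieval can blend the two signals. The 50/50 weighting and the keyword-overlap score are illustrative assumptions, not necessarily how this repo implements it:

```python
# Hybrid scoring sketch: blend semantic similarity with keyword overlap.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

def hybrid_scores(query, chunks, alpha=0.5):
    # Semantic score: cosine similarity via normalized embeddings.
    embs = model.encode([query] + chunks, normalize_embeddings=True)
    semantic = embs[1:] @ embs[0]
    # Keyword score: fraction of query terms appearing in each chunk.
    terms = set(query.lower().split())
    keyword = np.array([len(terms & set(c.lower().split())) / max(len(terms), 1)
                        for c in chunks])
    return alpha * semantic + (1 - alpha) * keyword
```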
### Prerequisites

- Python 3.8+ installed
- Ollama installed and running (https://ollama.com)
- At least one Ollama model downloaded (e.g., `ollama pull mistral`)
### Installation

```bash
# Install MCP RAG requirements
pip install -r requirements-mcp.txt
# Download NLTK data (for tokenization)
python -c "import nltk; nltk.download('punkt')"Make sure you installed all required dependencies and downloaded an Ollama model before proceeding.
### Start the Server

```bash
# Start with auto-initialization
python mcp_rag.py --auto-init
# Or start and initialize manually
python mcp_rag.py
```

The server will run on http://localhost:8000 by default.
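Under the hood, a FastMCP server registers plain Python functions as tools. A rough sketch of the shape (the tool body is a placeholder, and the exact `run()` kwargs vary by FastMCP version):

```python
from fastmcp import FastMCP

mcp = FastMCP("local-rag")

@mcp.tool()
def rag_status() -> dict:
    """Get current system status."""
    return {"initialized": True, "chunks": 45}  # placeholder payload

if __name__ == "__main__":
    # Serve over a network transport; kwargs depend on the FastMCP version.
    mcp.run(transport="sse", host="localhost", port=8000)
```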
### Run the Client

```bash
# Interactive mode
python mcp_client.py --mode interactive
# Demo mode
python mcp_client.py --mode demo
```

### Available Tools

Your MCP server exposes these tools:
#### `initialize_rag`

Set up the RAG system with your documents.
Parameters:
- `data_dir` (str): Directory with text files (default: "data")
- `pattern` (str): File pattern to match (default: "*.txt")
- `embed_model` (str): Sentence transformer model
- `chunk_size` (int): Text chunk size (default: 500)
- `overlap` (int): Chunk overlap (default: 50)
- `reset` (bool): Reset collection (default: false)
#### Search

Search your knowledge base.
Parameters:
- `query` (str): Search query
- `mode` (str): "semantic", "keyword", or "hybrid"
- `top_k` (int): Number of results (default: 3)
#### Answer

Get AI-generated answers using RAG.
Parameters:
- `query` (str): Question to answer
- `mode` (str): Search mode
- `llm_model` (str): Ollama model (default: "mistral")
- `top_k` (int): Context chunks to use
#### Calculator

Perform mathematical calculations.
Parameters:
- `expression` (str): Math expression to evaluate
#### `rag_status`

Get current system status.
#### List Models

List available Ollama models.
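You can also call these tools programmatically. A sketch using FastMCP's async client (the `search` tool name and the `/sse` endpoint path are assumptions; adjust to match the server):

```python
import asyncio
from fastmcp import Client

async def main():
    # Endpoint path assumed; adjust to match your server's transport.
    async with Client("http://localhost:8000/sse") as client:
        status = await client.call_tool("rag_status", {})
        print(status)
        hits = await client.call_tool("search", {"query": "renewable energy",
                                                 "mode": "hybrid", "top_k": 3})
        print(hits)

asyncio.run(main())
```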
### Advanced Usage

```bash
# Custom host and port
python mcp_rag.py --host 0.0.0.0 --port 8080
# Auto-initialize with custom data directory
python mcp_rag.py --auto-init --data-dir /path/to/docs
```

```bash
# Connect to a different server
python mcp_client.py --server http://your-server:8080
```

### Claude Desktop Integration

Add to your Claude Desktop configuration:

```json
{
"mcpServers": {
"local-rag": {
"command": "python",
"args": ["/path/to/your/mcp_rag.py"],
"env": {}
}
}
}
```

The server implements the standard MCP protocol and works with:
- Claude Desktop
- Continue.dev
- Any MCP-compatible client
### Example Session

```
💬 Enter command: init data_dir=data reset=true
✅ RAG system initialized successfully!
📁 Loaded 2 files from 'data'
📄 Created 45 chunks

💬 Enter command: search renewable energy
🔍 Search Results (hybrid mode, top 3):
[Retrieved context about renewable energy...]

💬 Enter command: answer What are the benefits of solar power?
🤖 AI Answer (using mistral):
Based on the provided context, solar power offers several benefits...

💬 Enter command: calc 25 * 8 + 150
🧮 Calculation: 25 * 8 + 150 = 350
```
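The calculator result above suggests the tool evaluates plain arithmetic; one safe way to implement that without calling `eval()` on raw input is an AST walk (a sketch, not the repo's code):

```python
import ast
import operator

# Whitelisted operators for arithmetic expressions.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv,
       ast.USub: operator.neg}

def safe_calc(expression):
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
            return OPS[type(node.op)](ev(node.operand))
        raise ValueError("unsupported expression")
    return ev(ast.parse(expression, mode="eval"))

print(safe_calc("25 * 8 + 150"))  # 350
```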
This implementation demonstrates:
- Local MCP Server: No external dependencies or API keys needed
- RAG Integration: Full semantic, keyword, and hybrid search
- Tool Composition: Multiple tools working together (RAG + calculator)
- State Management: Persistent RAG state across tool calls
- Error Handling: Graceful error handling and user feedback
- Extensibility: Easy to add new tools and capabilities
- 100% Local: No cloud services or API keys required
- Production Ready: Proper error handling and state management
- MCP Standard: Works with any MCP-compatible client
- Backwards Compatible: Your existing `agentic_rag.py` still works
- Extensible: Easy to add new MCP tools (see the sketch below)
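For example, an extra tool is only a few decorated lines; the `word_count` tool here is hypothetical and not part of the repo:

```python
from fastmcp import FastMCP

mcp = FastMCP("local-rag")

@mcp.tool()
def word_count(text: str) -> int:
    """Count the words in a piece of text (hypothetical example tool)."""
    return len(text.split())
```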
### Troubleshooting

- "Ollama not found"
  - Install Ollama from https://ollama.com
  - Make sure the `ollama` command is in your PATH
- "FastMCP not installed"
  - Run: `pip install fastmcp`
- "No files found"
  - Check that your `data/` directory has .txt files
  - Verify file permissions
- "Collection already exists"
  - Use the `reset=true` parameter in `initialize_rag`

Debugging tips:

- Check server logs for detailed error messages
- Use the `rag_status` tool to check system state
- Test with the included client before using other MCP clients
### Next Steps

- Add more document formats (PDF, Word, etc.); a PDF sketch follows this list
- Implement document management tools
- Add query history and analytics
- Create custom embedding models
- Scale to larger document collections
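As a starting point for the PDF item above, a loader sketch assuming the `pypdf` package (not currently a dependency):

```python
from pypdf import PdfReader

def load_pdf_text(path):
    # Concatenate the extractable text from every page.
    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)
```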
Happy RAG-ing with MCP! 🚀