
prk2007/Dynamic-Rag

πŸ—„οΈ LanceDB MCP Server for Cursor

Node.js 18+ License: MIT

An HTTP-based Model Context Protocol (MCP) server that enables Cursor IDE to interact directly with documents through agentic RAG and hybrid search in LanceDB. Ask questions about your document dataset as a whole or about specific documents.

✨ Features

  • πŸ” LanceDB-powered serverless vector index and document summary catalog
  • 🌐 HTTP-based MCP server compatible with Cursor IDE
  • 🎯 OpenAI Embeddings for high-quality semantic search (text-embedding-3-small)
  • πŸ“Š Efficient token usage - The LLM looks up what it needs when it needs it
  • πŸ“ˆ Security - Index is stored locally, minimizing data transfer
  • πŸš€ Multiple seed script options - TypeScript/Node.js and Python implementations available

⚠️ Important Notes

  • This is an HTTP-based MCP server, not stdio-based
  • Designed for Cursor IDE, not Claude Desktop
  • Requires OpenAI API key for embeddings (uses text-embedding-3-small model)
  • The client.js uses OpenAI embeddings, not Ollama embeddings

🚀 Quick Start

Prerequisites

  • Node.js 18+
  • npx
  • OpenAI API Key (for embeddings)
  • Cursor IDE with MCP support
  • Ollama models for seeding (only if using the Ollama-based seed scripts):
    • ollama pull snowflake-arctic-embed2 (embeddings for the seed scripts)
    • ollama pull gemma3:4b (summarization)

Configuration Required

Before running the server, you need to configure your OpenAI API key in the client.ts file:

  1. Open src/lancedb/client.ts
  2. Replace {OPEN_AI_KEY} with your actual OpenAI API key:
const OPENAI_API_KEY = "your-actual-openai-api-key-here";

Installation & Setup

  1. Build the project:
npm install
npm run build
  2. Configure Cursor IDE:

Create or edit ~/.cursor/mcp.json:

{
  "mcpServers": {
    "lancedb": {
      "url": "http://localhost:3001/mcp"
    }
  }
}
  3. Start the MCP server:
node dist/index.js

The server will run on port 3001 by default. You can change this with the PORT environment variable:

PORT=3002 node dist/index.js

Server Endpoints

The HTTP server provides multiple endpoints:

  • GET / - Server information and configuration instructions
  • POST /mcp - Main MCP JSON-RPC endpoint for Cursor
  • GET /tools - List available tools
  • POST /search - Direct search API for testing
  • GET /health - Health check endpoint
  • POST /test-mcp - Test MCP protocol implementation
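For scripting against the POST /mcp endpoint, requests are standard JSON-RPC 2.0 envelopes. A minimal sketch of building one (the helper name and interface are illustrative, not part of this repo):

```typescript
// Hypothetical helper: build the JSON-RPC 2.0 envelope that POST /mcp expects.
// Method names such as "tools/list" follow the MCP protocol.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params: Record<string, unknown>;
}

function buildMcpRequest(
  id: number,
  method: string,
  params: Record<string, unknown> = {}
): JsonRpcRequest {
  return { jsonrpc: "2.0", id, method, params };
}

// Example: the body for a tools/list call, ready for fetch() or curl.
const body = JSON.stringify(buildMcpRequest(1, "tools/list"));
```

This produces the same payload as the curl example in the Testing section below.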

Testing the Server

Test if the server is running correctly:

# Check server health
curl http://localhost:3001/health

# List available tools
curl -X POST http://localhost:3001/mcp \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}'

# Direct search test
curl -X POST http://localhost:3001/search \
  -H "Content-Type: application/json" \
  -d '{"query":"test search"}'

📚 Seeding Data

The seed script creates two tables in LanceDB:

  1. Catalog table - Document summaries and metadata
  2. Chunks table - Vectorized document chunks for semantic search
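The chunks table is populated by splitting each document into windows before embedding. A sketch of that step, assuming a simple character-based splitter with overlap (the sizes and function name are placeholders; the repo's actual chunker may differ):

```typescript
// Illustrative chunking helper: split text into overlapping character
// windows so that context straddling a boundary appears in two chunks.
function chunkText(text: string, chunkSize = 500, overlap = 50): string[] {
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += chunkSize - overlap) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last window reached the end
  }
  return chunks;
}
```

Each chunk is then embedded and stored alongside a reference to its source document, which is what chunks_search queries later.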

⚠️ Embedding Model Mismatch Warning

Important: The seed scripts use different embedding models than the MCP server:

  • Seed scripts: Use Ollama embeddings (snowflake-arctic-embed2) or API-specific embeddings
  • MCP server: Uses OpenAI embeddings (text-embedding-3-small)

This means you must use compatible seed scripts or modify the embedding configuration to match.
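Why the mismatch matters: vectors from different embedding models are not interchangeable, and usually differ even in dimensionality (for example, OpenAI's text-embedding-3-small emits 1536-dimensional vectors, while Ollama's snowflake-arctic-embed2 emits 1024-dimensional ones). A simple guard, shown as a sketch rather than code from the repo:

```typescript
// Illustrative guard: refuse to search an index with a query vector
// of the wrong dimensionality. Note that even matching dimensions do
// not guarantee compatibility if the models differ.
function assertQueryCompatible(queryVector: number[], indexDim: number): void {
  if (queryVector.length !== indexDim) {
    throw new Error(
      `Embedding dimension mismatch: query has ${queryVector.length}, index expects ${indexDim}`
    );
  }
}
```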

Option 1: TypeScript/Node.js Seed Script (Built-in)

⚠️ Note: The default TypeScript seed script uses Ollama embeddings. You'll need to modify it to use OpenAI embeddings for compatibility with the server.

npm run seed -- --dbpath <PATH_TO_LOCAL_INDEX_DIR> --filesdir <PATH_TO_DOCS>

Optional flags:

  • --overwrite - Recreate the index from scratch

Example:

npm run seed -- --dbpath /Users/username/lancedb-index --filesdir /Users/username/documents --overwrite

Option 2: Python Seed Scripts

The Python seed scripts are available in the Seed Script directory within the project:

OpenAI-based Script (Recommended for compatibility)

cd "Seed Script"
python3 seed_openai.py --dbpath <PATH_TO_LOCAL_INDEX_DIR> --filesdir <PATH_TO_DOCS> --api-key <YOUR_OPENAI_API_KEY>

Gemini-based Script

cd "Seed Script"
python3 seed_gemini.py --dbpath <PATH_TO_LOCAL_INDEX_DIR> --filesdir <PATH_TO_DOCS> --api-key <YOUR_GEMINI_API_KEY>

Standard Python Script (Ollama models - Not compatible without modification)

cd "Seed Script"
python3 seed.py --dbpath <PATH_TO_LOCAL_INDEX_DIR> --filesdir <PATH_TO_DOCS> [--overwrite]

Python Requirements

Before running Python scripts, install dependencies:

cd "Seed Script"
pip install -r requirenments.txt

Configuration

Default models can be adjusted in:

  • TypeScript: src/config.ts
  • Python: Variables at the top of each seed script
  • Client embeddings: src/lancedb/client.ts (OpenAI embeddings)

Current server configuration in src/lancedb/client.ts:

const OPENAI_API_KEY = "{OPEN_AI_KEY}"; // Replace with your key
const OPENAI_EMBEDDING_MODEL = "text-embedding-3-small";
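Hardcoding the key works, but a common alternative pattern is to fall back to an environment variable. This is a sketch of that approach, not something the repo does out of the box; OPENAI_API_KEY is the conventional variable name, not one client.ts necessarily reads:

```typescript
// Sketch: prefer an environment variable over a hardcoded key, and
// fail fast if the placeholder was never replaced.
function resolveApiKey(hardcoded: string): string {
  const key = process.env.OPENAI_API_KEY ?? hardcoded;
  if (!key || key === "{OPEN_AI_KEY}") {
    throw new Error("OpenAI API key not configured");
  }
  return key;
}
```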

🎯 Example Usage in Cursor

Once configured, you can use the MCP server in Cursor by:

  1. Opening Cursor IDE
  2. Using the MCP integration to query your documents
  3. Example prompts:
    • "What documents do we have in the catalog?"
    • "Summarize the key topics across all documents"
    • "Find information about [specific topic] in our documents"
    • "What does [specific document] say about [topic]?"

πŸ“ Available Tools

The server provides these tools for interaction with the index:

Catalog Tools

  • catalog_search: Search for relevant documents in the catalog

Chunks Tools

  • chunks_search: Find relevant chunks within a specific document from the catalog
  • all_chunks_search: Find relevant chunks from all known documents
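Over the MCP protocol, these tools are invoked with a tools/call request. A sketch of the request shape (the argument key "text" is an assumption about this server's tool schema, not confirmed from the source):

```typescript
// Illustrative: the JSON-RPC shape for invoking an MCP tool.
// "tools/call" and the { name, arguments } params layout come from
// the MCP protocol; the argument key below is hypothetical.
function buildToolCall(id: number, name: string, args: Record<string, unknown>) {
  return {
    jsonrpc: "2.0" as const,
    id,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

const req = buildToolCall(2, "catalog_search", { text: "quarterly report" });
```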

πŸ—οΈ Project Structure

lance-mcp/
β”œβ”€β”€ src/
β”‚   β”œβ”€β”€ Seed/           # Seed script utilities
β”‚   β”œβ”€β”€ lancedb/        # LanceDB integration
β”‚   β”‚   └── client.ts   # OpenAI embeddings configuration
β”‚   β”œβ”€β”€ tools/          # MCP tools implementation
β”‚   β”œβ”€β”€ utils/          # Utility functions
β”‚   β”œβ”€β”€ config.ts       # Configuration settings
β”‚   β”œβ”€β”€ index.ts        # HTTP MCP server entry point
β”‚   └── seed.ts         # TypeScript seed script
β”œβ”€β”€ Seed Script/        # Python seed scripts
β”‚   β”œβ”€β”€ seed.py         # Python seed script (Ollama)
β”‚   β”œβ”€β”€ seed_gemini.py  # Gemini-based seed script
β”‚   β”œβ”€β”€ seed_openai.py  # OpenAI-based seed script
β”‚   β”œβ”€β”€ requirenments.txt # Python dependencies
β”‚   └── ReadMe.md       # Python scripts documentation
β”œβ”€β”€ sample-docs/        # Sample documents for testing
β”œβ”€β”€ dist/               # Compiled JavaScript (generated)
β”œβ”€β”€ package.json        # Node dependencies
└── tsconfig.json       # TypeScript configuration

🔧 Development

Building the Project

npm run build

Watch Mode (Auto-rebuild on changes)

npm run watch

Environment Variables

  • PORT - Server port (default: 3001)
  • DEFAULT_DB_PATH - Default database path (default: '{path_to_lanceDB}')
  • ALLOWED_ORIGINS - Comma-separated list of allowed CORS origins

Example:

PORT=3002 DEFAULT_DB_PATH=/path/to/db node dist/index.js

πŸ› Troubleshooting

Common Issues

  1. OpenAI API Key not configured:

    • Edit src/lancedb/client.ts and add your OpenAI API key
    • Rebuild the project: npm run build
  2. Embedding model mismatch:

    • Ensure seed scripts and server use compatible embeddings
    • Recommended: Use seed_openai.py for seeding when using OpenAI embeddings in the server
  3. Cursor not connecting:

    • Verify ~/.cursor/mcp.json configuration
    • Check server is running on the correct port
    • Test with: curl http://localhost:3001/health
  4. Database path issues:

    • Use absolute paths for database and file directories
    • Ensure the path exists and has write permissions
  5. CORS errors:

    • The server is configured to accept Cursor requests
    • Additional origins can be added via ALLOWED_ORIGINS environment variable
  6. Memory issues with large documents:

    • Consider adjusting chunk size in seed scripts
    • Process documents in batches
  7. Python dependencies issues:

    • Make sure you're in the Seed Script directory when installing requirements
    • Use virtual environment if needed:
      cd "Seed Script"
      python3 -m venv venv
      source venv/bin/activate  # On Windows: venv\Scripts\activate
      pip install -r requirenments.txt

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📧 Support

For issues and questions, please open an issue on the GitHub repository.

