An HTTP-based Model Context Protocol (MCP) server that enables Cursor IDE to interact directly with documents through agentic RAG and hybrid search in LanceDB. Ask questions about your document dataset as a whole or about specific documents.
- LanceDB-powered serverless vector index and document summary catalog
- HTTP-based MCP server compatible with Cursor IDE
- OpenAI embeddings for high-quality semantic search (`text-embedding-3-small`)
- Efficient token usage: the LLM looks up what it needs when it needs it
- Security: the index is stored locally, minimizing data transfer
- Multiple seed script options: TypeScript/Node.js and Python implementations available
- This is an HTTP-based MCP server, not stdio-based
- Designed for Cursor IDE, not Claude Desktop
- Requires OpenAI API key for embeddings (uses text-embedding-3-small model)
- The LanceDB client (`src/lancedb/client.ts`) uses OpenAI embeddings, not Ollama embeddings
- Node.js 18+
- npx
- OpenAI API Key (for embeddings)
- Cursor IDE with MCP support
- Embedding and summarization models for seeding (if using Ollama):

```bash
ollama pull snowflake-arctic-embed2   # embeddings for the seed scripts
ollama pull gemma3:4b                 # summarization
```
Before running the server, you need to configure your OpenAI API key in the client file:

- Open `src/lancedb/client.ts`
- Replace `{OPEN_AI_KEY}` with your actual OpenAI API key:

```typescript
const OPENAI_API_KEY = "your-actual-openai-api-key-here";
```

- Build the project:

```bash
npm install
npm run build
```

- Configure Cursor IDE by creating or editing `~/.cursor/mcp.json`:

```json
{
"mcpServers": {
"lancedb": {
"url": "http://localhost:3001/mcp"
}
}
}
```

- Start the MCP server:

```bash
node dist/index.js
```

The server will run on port 3001 by default. You can change this with the `PORT` environment variable:

```bash
PORT=3002 node dist/index.js
```

The HTTP server provides multiple endpoints:

- `GET /` - Server information and configuration instructions
- `POST /mcp` - Main MCP JSON-RPC endpoint for Cursor
- `GET /tools` - List available tools
- `POST /search` - Direct search API for testing
- `GET /health` - Health check endpoint
- `POST /test-mcp` - Test MCP protocol implementation
Test if the server is running correctly:

```bash
# Check server health
curl http://localhost:3001/health

# List available tools
curl -X POST http://localhost:3001/mcp \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":1,"method":"tools/list","params":{}}'

# Direct search test
curl -X POST http://localhost:3001/search \
  -H "Content-Type: application/json" \
  -d '{"query":"test search"}'
```

The seed script creates two tables in LanceDB:
- Catalog table - Document summaries and metadata
- Chunks table - Vectorized document chunks for semantic search
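
For orientation, here is a minimal sketch of how these two tables can be opened and queried with the `@lancedb/lancedb` Node client. The table and field names (`catalog`, `chunks`) are illustrative assumptions, not necessarily the seed scripts' exact schema:

```typescript
import * as lancedb from "@lancedb/lancedb";

// Illustrative sketch: table names are assumptions, not the exact
// schema produced by the seed scripts.
async function inspectIndex(dbPath: string, queryVector: number[]) {
  const db = await lancedb.connect(dbPath);

  // Catalog table: one row per document (summary + metadata).
  const catalog = await db.openTable("catalog");
  console.log(await catalog.query().limit(5).toArray());

  // Chunks table: one embedded chunk per row; a vector search returns
  // the chunks nearest to the embedded query.
  const chunks = await db.openTable("chunks");
  console.log(await chunks.search(queryVector).limit(3).toArray());
}
```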
**Important:** The seed scripts use different embedding models than the MCP server:

- Seed scripts: Ollama embeddings (`snowflake-arctic-embed2`) or API-specific embeddings
- MCP server: OpenAI embeddings (`text-embedding-3-small`)
This means you must use compatible seed scripts or modify the embedding configuration to match.
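
To make the compatibility requirement concrete: at query time the server embeds your question and compares it against the stored vectors, so both sides must come from the same model (and therefore the same vector dimension). A minimal sketch of the query-side embedding, assuming the standard `openai` Node SDK:

```typescript
import OpenAI from "openai";

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

// Embed a query with the same model used at seed time. Vectors produced
// by a different model (e.g. snowflake-arctic-embed2) have a different
// dimension and are not comparable to text-embedding-3-small vectors.
async function embedQuery(text: string): Promise<number[]> {
  const res = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: text,
  });
  return res.data[0].embedding;
}
```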
Run the TypeScript seed script with:

```bash
npm run seed -- --dbpath <PATH_TO_LOCAL_INDEX_DIR> --filesdir <PATH_TO_DOCS>
```

Optional flags:

- `--overwrite` - Recreate the index from scratch

Example:

```bash
npm run seed -- --dbpath /Users/username/lancedb-index --filesdir /Users/username/documents --overwrite
```

The Python seed scripts are available in the `Seed Script` directory within the project:
cd "Seed Script"
python3 seed_openai.py --dbpath <PATH_TO_LOCAL_INDEX_DIR> --filesdir <PATH_TO_DOCS> --api-key <YOUR_OPENAI_API_KEY>cd "Seed Script"
python3 seed_gemini.py --dbpath <PATH_TO_LOCAL_INDEX_DIR> --filesdir <PATH_TO_DOCS> --api-key <YOUR_GEMINI_API_KEY>cd "Seed Script"
python3 seed.py --dbpath <PATH_TO_LOCAL_INDEX_DIR> --filesdir <PATH_TO_DOCS> [--overwrite]Before running Python scripts, install dependencies:
cd "Seed Script"
pip install -r requirenments.txtDefault models can be adjusted in:
- TypeScript:
src/config.ts - Python: Variables at the top of each seed script
- Client embeddings:
src/lancedb/client.ts(OpenAI embeddings)
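
As a point of reference, the defaults involved (grounded in the environment variables listed under Development below) could be centralized roughly like this. This is a hypothetical sketch, not the project's actual `src/config.ts`:

```typescript
// Hypothetical sketch only -- consult the real src/config.ts for the
// actual names and values used by this project.
export const config = {
  port: Number(process.env.PORT ?? 3001),
  dbPath: process.env.DEFAULT_DB_PATH ?? "{path_to_lanceDB}",
  embeddingModel: "text-embedding-3-small",
};
```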
Current server configuration in `src/lancedb/client.ts`:

```typescript
const OPENAI_API_KEY = "{OPEN_AI_KEY}"; // Replace with your key
const OPENAI_EMBEDDING_MODEL = "text-embedding-3-small";
```
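
If you would rather not hardcode the key, one local tweak (an assumption about how you might modify the file, not something the project ships) is to fall back to an environment variable and rebuild with `npm run build`:

```typescript
// Hypothetical local change to src/lancedb/client.ts: prefer an
// environment variable so the key never lands in source control.
const OPENAI_API_KEY = process.env.OPENAI_API_KEY ?? "{OPEN_AI_KEY}";
```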
Once configured, you can use the MCP server in Cursor by:

- Opening Cursor IDE
- Using the MCP integration to query your documents
- Example prompts:
- "What documents do we have in the catalog?"
- "Summarize the key topics across all documents"
- "Find information about [specific topic] in our documents"
- "What does [specific document] say about [topic]?"
The server provides these tools for interacting with the index:

- `catalog_search`: Search for relevant documents in the catalog
- `chunks_search`: Find relevant chunks based on a specific document from the catalog
- `all_chunks_search`: Find relevant chunks from all known documents
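
Since the tools are exposed over the same JSON-RPC endpoint tested above, you can exercise one directly with curl. The `tools/call` method is standard MCP, but the argument name shown here (`text`) is an assumption; check `GET /tools` for each tool's real input schema:

```bash
curl -X POST http://localhost:3001/mcp \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","id":2,"method":"tools/call","params":{"name":"catalog_search","arguments":{"text":"quarterly report"}}}'
```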
```
lance-mcp/
├── src/
│   ├── Seed/              # Seed script utilities
│   ├── lancedb/           # LanceDB integration
│   │   └── client.ts      # OpenAI embeddings configuration
│   ├── tools/             # MCP tools implementation
│   ├── utils/             # Utility functions
│   ├── config.ts          # Configuration settings
│   ├── index.ts           # HTTP MCP server entry point
│   └── seed.ts            # TypeScript seed script
├── Seed Script/           # Python seed scripts
│   ├── seed.py            # Python seed script (Ollama)
│   ├── seed_gemini.py     # Gemini-based seed script
│   ├── seed_openai.py     # OpenAI-based seed script
│   ├── requirenments.txt  # Python dependencies
│   └── ReadMe.md          # Python scripts documentation
├── sample-docs/           # Sample documents for testing
├── dist/                  # Compiled JavaScript (generated)
├── package.json           # Node dependencies
└── tsconfig.json          # TypeScript configuration
```
Build commands:

```bash
npm run build   # compile TypeScript
npm run watch   # recompile on changes
```

Environment variables:

- `PORT` - Server port (default: 3001)
- `DEFAULT_DB_PATH` - Default database path (default: `{path_to_lanceDB}`)
- `ALLOWED_ORIGINS` - Comma-separated list of allowed CORS origins

Example:
```bash
PORT=3002 DEFAULT_DB_PATH=/path/to/db node dist/index.js
```

- **OpenAI API Key not configured:**
  - Edit `src/lancedb/client.ts` and add your OpenAI API key
  - Rebuild the project: `npm run build`
- **Embedding model mismatch:**
  - Ensure the seed scripts and the server use compatible embeddings
  - Recommended: use `seed_openai.py` for seeding when the server uses OpenAI embeddings
- **Cursor not connecting:**
  - Verify the `~/.cursor/mcp.json` configuration
  - Check that the server is running on the correct port
  - Test with: `curl http://localhost:3001/health`
- **Database path issues:**
  - Use absolute paths for database and file directories
  - Ensure the path exists and has write permissions
- **CORS errors:**
  - The server is configured to accept Cursor requests
  - Additional origins can be added via the `ALLOWED_ORIGINS` environment variable
- **Memory issues with large documents:**
  - Consider adjusting the chunk size in the seed scripts
  - Process documents in batches
- **Python dependency issues:**
  - Make sure you're in the `Seed Script` directory when installing requirements
  - Use a virtual environment if needed:

```bash
cd "Seed Script"
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirenments.txt
```
This project is licensed under the MIT License - see the LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request.
For issues and questions:
- GitHub Issues (original repo): https://github.com/adiom-data/lance-mcp/issues
- Author: Alex Komyagin (alex@adiom.io)
- Prashant Khurana (modified to support OpenAI embeddings for publicly available data)