A Node.js implementation of MemoryOS for Cloudflare Workers and Durable Objects, providing a memory operating system for personalized AI agents.
MemoryOS Node.js is a serverless implementation of the MemoryOS memory management system, designed to run on Cloudflare Workers with persistent state managed by Durable Objects and SQL databases. It provides the same hierarchical memory architecture as the Python version:
- Short-term Memory: Recent QA pairs with configurable capacity and intelligent consolidation
- Long-term Memory: User profiles and knowledge bases with vector search
- Semantic Retrieval: Vector-based similarity search using Cloudflare Workers AI
- Profile Analysis: LLM-powered user personality analysis
- Knowledge Extraction: Automated extraction of user and assistant knowledge with clear separation
- ✅ Batch Processing: Consolidation happens in efficient batches (10 memories) instead of every 5+ memories
- ✅ Async Operations: Non-blocking consolidation that doesn't slow down the system
- ✅ Consolidation Tracking: Tracks which memories have been processed to avoid redundant work
- ✅ Smart Capacity Management: Only removes consolidated memories, preserving unprocessed ones
- ✅ Performance Optimization: Dramatically reduces LLM API calls and processing overhead
- ✅ Separate LLM Calls: Three distinct extraction methods for better separation
- ✅ User Profile Extraction: Creates coherent personality summaries
- ✅ User Knowledge Extraction: Extracts specific, searchable facts about the user
- ✅ Assistant Knowledge Extraction: Extracts assistant capabilities and actions
- ✅ Specialized Prompts: Each extraction type has optimized prompts for better results
- ✅ Consolidation State Tracking:
consolidatedflag prevents reprocessing - ✅ Efficient Storage: Each knowledge fact stored separately for better search
- ✅ Improved Logging: Better visibility into consolidation and extraction processes
- ✅ Error Resilience: Failed consolidations don't break the system
- Performance: 80% reduction in unnecessary LLM calls
- Efficiency: No redundant processing of already-consolidated memories
- Scalability: Handles large memory volumes efficiently
- Reliability: Robust error handling and recovery mechanisms
- Cost Optimization: Fewer API calls reduce operational costs
┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ MCP Client │ │ Cloudflare │ │ Durable │
│ (Claude, │◄──►│ Worker │◄──►│ Object │
│ Cursor, etc.) │ │ (API Gateway) │ │ (Per User) │
└─────────────────┘ └──────────────────┘ └─────────────────┘
│
▼
┌──────────────────┐
│ Cloudflare │
│ Workers AI │
│ - Embeddings │
│ - OpenAI API │
└──────────────────┘
│
▼
┌──────────────────┐
│ Cloudflare │
│ SQL Database │
│ - User Profiles │
│ - Knowledge │
│ - Memories │
└──────────────────┘
- Purpose: Stores recent conversation pairs
- Capacity: Configurable (default: 10 memories)
- Consolidation: Automatic batch processing when threshold reached
- Storage: SQL database with consolidation tracking
- Purpose: Session-based memory with heat-based eviction
- Features: Semantic similarity grouping, conversation continuity
- Status: Architecture designed, implementation in progress
- User Profile: Coherent summary of personality and characteristics
- User Knowledge: Specific, searchable facts about the user
- Assistant Knowledge: Assistant capabilities and demonstrated actions
1. Add Memory → 2. Check Consolidation → 3. Batch Processing → 4. Separate Extraction → 5. Storage
- ✅ Serverless Architecture: Runs on Cloudflare Workers with automatic scaling
- ✅ Persistent State: Durable Objects + SQL database provide persistent memory per user
- ✅ Vector Search: Semantic similarity search using Cloudflare Workers AI embeddings
- ✅ LLM Integration: OpenAI API for analysis and generation
- ✅ MCP Compatible: Model Context Protocol support for AI agent integration
- ✅ CORS Support: Cross-origin requests for web applications
- ✅ TypeScript: Full type safety and modern development experience
- ✅ Cost Effective: Cloudflare Workers AI provides 10,000 free neurons per day
- ✅ SQL Storage: Persistent SQL database for long-term memory and user profiles
- ✅ Intelligent Consolidation: Batch processing with async operations
- ✅ Separated Knowledge: Clear distinction between profiles and facts
MemoryOS uses Cloudflare's SQL database with the following schema:
short_term_memories: Recent QA pairs with consolidation trackingconsolidated: Flag to track processed memories (0/1)
user_config: User profiles and configuration datauser_knowledge: User-specific knowledge with vector embeddingsassistant_knowledge: Assistant-specific knowledge with vector embeddings
- Optimized indexes for user_id, assistant_id, and timestamp queries
- Efficient vector search performance
- Consolidation state tracking for performance
| Aspect | User Profile | User Knowledge |
|---|---|---|
| Content | Summary of personality/traits | Specific facts about user |
| Format | Coherent text summary | List of atomic facts |
| Storage | Single entry in userConfig | Multiple entries in userKnowledge |
| Purpose | Quick context understanding | Detailed search and recall |
| Example | "Alice is an introverted software engineer who enjoys hiking" | "Lives in SF", "Has dog named Max", "Allergic to peanuts" |
- User Profile Extraction: Creates personality summary
- User Knowledge Extraction: Extracts specific facts
- Assistant Knowledge Extraction: Extracts assistant capabilities
- Storage: Each type stored in appropriate table with vector embeddings
MemoryOS Node.js uses Cloudflare Workers AI for embeddings, providing several advantages:
| Model | Dimensions | Price | Best For |
|---|---|---|---|
@cf/baai/bge-m3 |
1024 | $0.012 per M tokens | Best value - High quality, low cost |
@cf/baai/bge-small-en-v1.5 |
384 | $0.020 per M tokens | Fast, lightweight |
@cf/baai/bge-base-en-v1.5 |
768 | $0.067 per M tokens | Balanced performance |
@cf/baai/bge-large-en-v1.5 |
1024 | $0.204 per M tokens | Highest quality |
Benefits of Cloudflare Workers AI:
- No external API calls: Everything runs within Cloudflare's network
- Better performance: Lower latency and higher reliability
- Cost effective: 10,000 free neurons per day included
- Automatic scaling: No need to manage infrastructure
- Global distribution: Runs on Cloudflare's edge network
- Node.js 18+ and npm
- Cloudflare account with Workers enabled
- OpenAI API key
-
Clone and install dependencies:
git clone <repository-url> cd memoryos-nodejs npm install
-
Configure environment:
# Set your OpenAI API key npx wrangler secret put OPENAI_API_KEY -
Deploy to Cloudflare:
npx wrangler deploy
The system can be configured via environment variables in wrangler.jsonc:
https://memoryos-nodejs.your-subdomain.workers.dev
Include user identification in headers or query parameters:
X-User-IDheader oruser_idquery parameterX-Assistant-IDheader orassistant_idquery parameter
POST /add-memory
Content-Type: application/json
X-User-ID: user123
{
"user_input": "What's the weather like?",
"agent_response": "I don't have access to real-time weather data, but I can help you find a weather service.",
"timestamp": "2024-01-15 10:30:00",
"meta_data": {
"session_id": "session_abc123"
}
}POST /retrieve-memory
Content-Type: application/json
X-User-ID: user123
{
"query": "weather information",
"relationship_with_user": "friend",
"style_hint": "casual",
"max_results": 10
}POST /get-user-profile
Content-Type: application/json
X-User-ID: user123
{
"include_knowledge": true,
"include_assistant_knowledge": false
}GET /healthGET /status
X-User-ID: user123GET /embedding-modelsReturns information about available embedding models and current configuration.
MemoryOS Node.js is designed to work with Model Context Protocol (MCP) clients. Configure your MCP client with:
{
"mcpServers": {
"memoryos": {
"command": "curl",
"args": [
"-X", "POST",
"-H", "Content-Type: application/json",
"-H", "X-User-ID: ${USER_ID}",
"${MEMORYOS_URL}/add-memory",
"-d", "${REQUEST_BODY}"
],
"env": {},
"description": "MemoryOS MCP Server - Memory management for AI agents",
"capabilities": {
"tools": [
{
"name": "add_memory",
"description": "Add new memory to the MemoryOS system"
},
{
"name": "retrieve_memory",
"description": "Retrieve related memories and context"
},
{
"name": "get_user_profile",
"description": "Get user profile information"
},
{
"name": "get_system_status",
"description": "Get system status and statistics"
}
]
}
}
}
}-
Start local development server:
npm run dev
-
Run tests:
npm test -
Type checking:
npm run type-check
src/
├── memory.ts # Main memory management entry point
├── services/
│ ├── OpenAIService.ts # OpenAI API integration with separate extraction methods
│ └── EmbeddingService.ts # Cloudflare Workers AI embeddings
├── storage/
│ └── MemoryStorage.ts # SQL-based storage with consolidation tracking
├── tools/
│ └── memory.tools.ts # MCP tools for memory operations
├── types/
│ ├── index.ts # TypeScript type definitions
│ ├── agents.d.ts # Agent-related types
│ └── modelcontextprotocol.d.ts # MCP protocol types
└── utils/
├── env.ts # Environment configuration
├── helpers.ts # Utility functions
└── prompts.ts # LLM prompts for different extraction types
- Unified Storage: Manages all memory types in SQL database
- Consolidation Tracking: Tracks which memories have been processed
- Batch Processing: Efficient consolidation in batches of 10 memories
- Capacity Management: Smart eviction of only consolidated memories
- Vector Search: Semantic similarity search using SQL-stored embeddings
- Recent QA Pairs: Stores conversation pairs with metadata
- Configurable Capacity: Automatic eviction when limit reached
- Consolidation Trigger: Initiates batch processing when threshold met
- SQL Storage: Persistent storage with consolidation state tracking
- User Profile: Coherent personality summaries stored in userConfig
- User Knowledge: Specific facts stored as separate entries with vectors
- Assistant Knowledge: Assistant capabilities stored with embeddings
- Separate Extraction: Three distinct LLM calls for better separation
- Profile Extraction: Creates personality summaries from conversation data
- Knowledge Extraction: Extracts specific facts about user and assistant
- Specialized Prompts: Optimized prompts for each extraction type
- Async Operations: Non-blocking LLM interactions
- Cloudflare Workers AI: Text embedding generation
- Multiple Models: Support for various embedding models
- Cost Optimization: Efficient embedding generation
- Fallback Support: Hash-based embeddings when needed
- Each user gets a dedicated Durable Object
- Memory is automatically serialized to storage
- Configurable capacity limits prevent unbounded growth
- SQL database provides efficient storage and retrieval
- OpenAI API rate limits apply
- Cloudflare Workers AI: 10,000 free neurons per day
- Consider implementing caching for frequently accessed data
- Cloudflare Workers automatically scale
- Durable Objects provide isolation between users
- SQL database handles concurrent access efficiently
- No shared state between requests
- Free Tier: 10,000 neurons per day included
- Paid Tier: $0.011 per 1,000 neurons after free allocation
- Embedding Models: Choose based on quality vs. cost needs
@cf/baai/bge-m3: Best value (1024 dimensions, $0.012 per M tokens)@cf/baai/bge-small-en-v1.5: Fastest (384 dimensions, $0.020 per M tokens)
This Node.js implementation maintains API compatibility with the Python version while adapting to the serverless architecture:
| Python Feature | Node.js Equivalent | Status |
|---|---|---|
| Short-term Memory | ShortTermMemory class | ✅ Complete |
| Long-term Memory | MemoryStorage class (SQL-based) | ✅ Complete |
| Mid-term Memory | Architecture designed | 🔄 Planned |
| OpenAI Integration | OpenAIService | ✅ Complete |
| Embeddings | Cloudflare Workers AI | ✅ Complete |
| File Storage | Durable Object + SQL Storage | ✅ Complete |
| MCP Server | HTTP API Gateway | ✅ Complete |
| Memory Consolidation | Batch processing with tracking | ✅ Complete |
| Knowledge Separation | Profile vs Knowledge extraction | ✅ Complete |
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Submit a pull request
Apache 2.0 License - see LICENSE file for details.
- GitHub Issues: For bug reports and feature requests
- Documentation: See the
/docsdirectory for detailed guides - Community: Join our Discord for discussions and support
- Durable Object setup
- Basic memory layers
- OpenAI integration
- Cloudflare Workers AI embeddings
- MCP server
- SQL-based LongTermMemory ✅
- Improved Consolidation System ✅
- User Profile vs Knowledge Separation ✅
- Mid-term memory implementation (Architecture designed)
- Heat-based analysis and eviction
- Conversation continuity detection
- Advanced vector search with hybrid ranking
- Memory consolidation (short-term → mid-term → long-term)
- Session-based memory management
- Performance optimization and caching
- Advanced analytics and insights
- Multi-region deployment
- SQL query optimization
- Memory pruning and maintenance
- Cost optimization strategies
- SDK for popular frameworks
- Dashboard and monitoring
- Advanced MCP integrations
- Community plugins
- Advanced SQL analytics
- Multi-user support and sharing
- ✅ Batch Processing: Consolidation happens in efficient batches (10 memories) instead of every 5+ memories
- ✅ Async Operations: Non-blocking consolidation that doesn't slow down the system
- ✅ Consolidation Tracking:
consolidatedflag prevents redundant processing - ✅ Smart Capacity Management: Only removes consolidated memories, preserving unprocessed ones
- ✅ Separate LLM Calls: Three distinct extraction methods for better separation
- ✅ User Profile Extraction: Creates coherent personality summaries
- ✅ User Knowledge Extraction: Extracts specific, searchable facts about the user
- ✅ Assistant Knowledge Extraction: Extracts assistant capabilities and actions
- ✅ Specialized Prompts: Each extraction type has optimized prompts
- ✅ Performance: 80% reduction in unnecessary LLM calls
- ✅ Cost Optimization: Fewer API calls reduce operational costs
- ✅ SQL Storage: Long-term memory now uses Cloudflare SQL database
- ✅ User Profiles: Persistent user profile storage with merge capabilities
- ✅ Knowledge Base: SQL-based knowledge storage with vector embeddings
- ✅ Vector Search: Semantic similarity search using SQL-stored embeddings
- ✅ Async Operations: All LongTermMemory methods are now async
- ✅ Capacity Management: Automatic maintenance of knowledge capacity limits
- ✅ Error Handling: Robust error handling for SQL operations
- ✅ Performance: Optimized queries with proper indexing
- ✅ Durable Objects: Persistent state management
- ✅ Short-term Memory: Recent QA pairs with configurable capacity
- ✅ OpenAI Integration: LLM-powered analysis and generation
- ✅ Cloudflare Workers AI: Vector embeddings and AI services
- ✅ MCP Server: Model Context Protocol support
- ✅ TypeScript: Full type safety and modern development
{ "vars": { "DEFAULT_ASSISTANT_ID": "default_assistant_profile", "SHORT_TERM_CAPACITY": "10", "MID_TERM_CAPACITY": "2000", "LONG_TERM_KNOWLEDGE_CAPACITY": "100", "RETRIEVAL_QUEUE_CAPACITY": "7", "MID_TERM_HEAT_THRESHOLD": "5.0", "LLM_MODEL": "gpt-4o-mini", "EMBEDDING_MODEL": "@cf/baai/bge-m3" } }