MemoryOS Node.js

A Node.js implementation of MemoryOS for Cloudflare Workers and Durable Objects, providing a memory operating system for personalized AI agents.

Overview

MemoryOS Node.js is a serverless implementation of the MemoryOS memory management system, designed to run on Cloudflare Workers with persistent state managed by Durable Objects and SQL databases. It provides the same hierarchical memory architecture as the Python version:

  • Short-term Memory: Recent QA pairs with configurable capacity and intelligent consolidation
  • Long-term Memory: User profiles and knowledge bases with vector search
  • Semantic Retrieval: Vector-based similarity search using Cloudflare Workers AI
  • Profile Analysis: LLM-powered user personality analysis
  • Knowledge Extraction: Automated extraction of user and assistant knowledge with clear separation

Recent Updates (Phase 1.2) 🚀

Improved Memory Consolidation System

  • Batch Processing: Consolidation runs in efficient batches of 10 memories rather than being triggered after every 5+ new memories (see the sketch below)
  • Async Operations: Non-blocking consolidation that doesn't slow down the system
  • Consolidation Tracking: Tracks which memories have been processed to avoid redundant work
  • Smart Capacity Management: Only removes consolidated memories, preserving unprocessed ones
  • Performance Optimization: Dramatically reduces LLM API calls and processing overhead
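
A minimal TypeScript sketch of this batching pattern follows; the helper names (fetchUnconsolidated, extractAndStore, markConsolidated) are illustrative assumptions, not the repository's actual API:

// Hypothetical sketch of the batch-consolidation pattern; helper names
// are illustrative, not the project's actual API.
interface MemoryRow {
  id: number;
  userInput: string;
  agentResponse: string;
}

const BATCH_SIZE = 10;

async function maybeConsolidate(
  fetchUnconsolidated: () => Promise<MemoryRow[]>,
  extractAndStore: (batch: MemoryRow[]) => Promise<void>,
  markConsolidated: (ids: number[]) => Promise<void>,
): Promise<void> {
  const pending = await fetchUnconsolidated();
  if (pending.length < BATCH_SIZE) return; // wait until a full batch accumulates

  const batch = pending.slice(0, BATCH_SIZE);
  // Fire-and-forget so the request path is never blocked by the LLM call.
  void (async () => {
    try {
      await extractAndStore(batch); // one LLM round-trip per batch of 10
      await markConsolidated(batch.map((m) => m.id)); // prevents reprocessing
    } catch (err) {
      console.error('Consolidation failed; rows stay unconsolidated for retry:', err);
    }
  })();
}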

User Profile vs Knowledge Separation

  • Separate LLM Calls: Three distinct extraction methods for better separation
  • User Profile Extraction: Creates coherent personality summaries
  • User Knowledge Extraction: Extracts specific, searchable facts about the user
  • Assistant Knowledge Extraction: Extracts assistant capabilities and actions
  • Specialized Prompts: Each extraction type has optimized prompts for better results

Enhanced Memory Architecture

  • Consolidation State Tracking: consolidated flag prevents reprocessing
  • Efficient Storage: Each knowledge fact stored separately for better search
  • Improved Logging: Better visibility into consolidation and extraction processes
  • Error Resilience: Failed consolidations don't break the system

Key Improvements

  • Performance: 80% reduction in unnecessary LLM calls
  • Efficiency: No redundant processing of already-consolidated memories
  • Scalability: Handles large memory volumes efficiently
  • Reliability: Robust error handling and recovery mechanisms
  • Cost Optimization: Fewer API calls reduce operational costs

Architecture

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   MCP Client    │    │  Cloudflare      │    │  Durable        │
│   (Claude,      │◄──►│  Worker          │◄──►│  Object         │
│   Cursor, etc.) │    │  (API Gateway)   │    │  (Per User)     │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                              │
                              ▼
                       ┌──────────────────┐
                       │  Cloudflare      │
                       │  Workers AI      │
                       │  - Embeddings    │
                       │  - OpenAI API    │
                       └──────────────────┘
                              │
                              ▼
                       ┌──────────────────┐
                       │  Cloudflare      │
                       │  SQL Database    │
                       │  - User Profiles │
                       │  - Knowledge     │
                       │  - Memories      │
                       └──────────────────┘

Memory System Overview

Three-Tier Memory Architecture

1. Short-Term Memory

  • Purpose: Stores recent conversation pairs
  • Capacity: Configurable (default: 10 memories)
  • Consolidation: Automatic batch processing when threshold reached
  • Storage: SQL database with consolidation tracking

2. Mid-Term Memory (Planned)

  • Purpose: Session-based memory with heat-based eviction
  • Features: Semantic similarity grouping, conversation continuity
  • Status: Architecture designed, implementation in progress

3. Long-Term Memory

  • User Profile: Coherent summary of personality and characteristics
  • User Knowledge: Specific, searchable facts about the user
  • Assistant Knowledge: Assistant capabilities and demonstrated actions

Memory Flow

1. Add Memory → 2. Check Consolidation → 3. Batch Processing → 4. Separate Extraction → 5. Storage
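
In TypeScript terms, the entry point for this flow might look like the following sketch (the storage and extractor method names are assumptions made for illustration):

// Illustrative request path for the five-step flow above; the `storage`
// and `extractor` interfaces are assumptions, not the real ones.
async function handleAddMemory(
  storage: {
    insertShortTerm(userInput: string, agentResponse: string): Promise<void>;
    countUnconsolidated(): Promise<number>;
  },
  extractor: { consolidateBatch(): Promise<void> },
  userInput: string,
  agentResponse: string,
): Promise<void> {
  await storage.insertShortTerm(userInput, agentResponse); // 1. Add Memory
  const pending = await storage.countUnconsolidated();     // 2. Check Consolidation
  if (pending >= 10) {
    // 3-5. Batch processing, separate extraction, and storage run async.
    void extractor.consolidateBatch();
  }
}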

Features

  • Serverless Architecture: Runs on Cloudflare Workers with automatic scaling
  • Persistent State: Durable Objects + SQL database provide persistent memory per user
  • Vector Search: Semantic similarity search using Cloudflare Workers AI embeddings
  • LLM Integration: OpenAI API for analysis and generation
  • MCP Compatible: Model Context Protocol support for AI agent integration
  • CORS Support: Cross-origin requests for web applications
  • TypeScript: Full type safety and modern development experience
  • Cost Effective: Cloudflare Workers AI provides 10,000 free neurons per day
  • SQL Storage: Persistent SQL database for long-term memory and user profiles
  • Intelligent Consolidation: Batch processing with async operations
  • Separated Knowledge: Clear distinction between profiles and facts

Database Schema

MemoryOS uses Cloudflare's SQL database with the following schema:

Tables

  • short_term_memories: Recent QA pairs with consolidation tracking
    • consolidated: Flag to track processed memories (0/1)
  • user_config: User profiles and configuration data
  • user_knowledge: User-specific knowledge with vector embeddings
  • assistant_knowledge: Assistant-specific knowledge with vector embeddings

Indexes

  • Optimized indexes for user_id, assistant_id, and timestamp queries
  • Efficient vector search performance
  • Consolidation state tracking for performance
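
A plausible shape for this schema, expressed against the Durable Object SQLite API (the SqlStorage type from @cloudflare/workers-types); any column not named above is an assumption:

// Assumed DDL matching the tables and indexes described above; the actual
// schema in MemoryStorage.ts may differ in column names and types.
export function initSchema(sql: SqlStorage): void {
  const statements = [
    `CREATE TABLE IF NOT EXISTS short_term_memories (
       id             INTEGER PRIMARY KEY AUTOINCREMENT,
       user_id        TEXT NOT NULL,
       user_input     TEXT NOT NULL,
       agent_response TEXT NOT NULL,
       timestamp      TEXT NOT NULL,
       consolidated   INTEGER NOT NULL DEFAULT 0 -- 0 = pending, 1 = processed
     )`,
    `CREATE TABLE IF NOT EXISTS user_knowledge (
       id        INTEGER PRIMARY KEY AUTOINCREMENT,
       user_id   TEXT NOT NULL,
       fact      TEXT NOT NULL,
       embedding BLOB, -- serialized vector used for similarity search
       timestamp TEXT NOT NULL
     )`,
    `CREATE INDEX IF NOT EXISTS idx_stm_user_consolidated
       ON short_term_memories (user_id, consolidated)`,
    `CREATE INDEX IF NOT EXISTS idx_uk_user ON user_knowledge (user_id)`,
  ];
  for (const stmt of statements) sql.exec(stmt);
}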

Knowledge Extraction System

User Profile vs User Knowledge

| Aspect  | User Profile                                                  | User Knowledge                                            |
| ------- | ------------------------------------------------------------- | ---------------------------------------------------------- |
| Content | Summary of personality/traits                                  | Specific facts about the user                               |
| Format  | Coherent text summary                                          | List of atomic facts                                        |
| Storage | Single entry in user_config                                    | Multiple entries in user_knowledge                          |
| Purpose | Quick context understanding                                    | Detailed search and recall                                  |
| Example | "Alice is an introverted software engineer who enjoys hiking" | "Lives in SF", "Has dog named Max", "Allergic to peanuts"   |

Extraction Process

  1. User Profile Extraction: Creates personality summary
  2. User Knowledge Extraction: Extracts specific facts
  3. Assistant Knowledge Extraction: Extracts assistant capabilities
  4. Storage: Each type stored in appropriate table with vector embeddings
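
The three-call pattern can be pictured as below; the prompt wording is a paraphrase, and the chat helper is a stand-in for OpenAIService:

// Sketch of the three separate extraction calls; prompt text is a
// paraphrase, not the repository's actual prompts.ts content.
type Chat = (systemPrompt: string, conversation: string) => Promise<string>;

async function extractAll(chat: Chat, conversation: string) {
  const [profile, userFacts, assistantFacts] = await Promise.all([
    chat('Summarize the user as a short, coherent personality profile.', conversation),
    chat('List atomic, searchable facts about the user, one per line.', conversation),
    chat('List capabilities or actions the assistant demonstrated, one per line.', conversation),
  ]);
  return {
    profile, // single entry merged into user_config
    userKnowledge: userFacts.split('\n').filter(Boolean),           // one row each
    assistantKnowledge: assistantFacts.split('\n').filter(Boolean), // one row each
  };
}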

Embedding Models

MemoryOS Node.js uses Cloudflare Workers AI for embeddings, providing several advantages:

| Model                      | Dimensions | Price               | Best For                            |
| -------------------------- | ---------- | ------------------- | ----------------------------------- |
| @cf/baai/bge-m3            | 1024       | $0.012 per M tokens | Best value - high quality, low cost |
| @cf/baai/bge-small-en-v1.5 | 384        | $0.020 per M tokens | Fast, lightweight                   |
| @cf/baai/bge-base-en-v1.5  | 768        | $0.067 per M tokens | Balanced performance                |
| @cf/baai/bge-large-en-v1.5 | 1024       | $0.204 per M tokens | Highest quality                     |

Benefits of Cloudflare Workers AI:

  • No external API calls: Everything runs within Cloudflare's network
  • Better performance: Lower latency and higher reliability
  • Cost effective: 10,000 free neurons per day included
  • Automatic scaling: No need to manage infrastructure
  • Global distribution: Runs on Cloudflare's edge network
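
Generating an embedding through the Workers AI binding follows the standard env.AI.run shape; the wrapper in EmbeddingService may differ in detail:

// Minimal embedding call via the Workers AI binding (types from
// @cloudflare/workers-types); returns one vector per input string.
export interface Env {
  AI: Ai; // bound in wrangler.jsonc
}

export async function embed(env: Env, texts: string[]): Promise<number[][]> {
  const result = await env.AI.run('@cf/baai/bge-m3', { text: texts });
  return (result as { data: number[][] }).data; // 1024-dim vectors for bge-m3
}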

Quick Start

Prerequisites

  • Node.js 18+ and npm
  • Cloudflare account with Workers enabled
  • OpenAI API key

Installation

  1. Clone and install dependencies:

    git clone <repository-url>
    cd memoryos-nodejs
    npm install
  2. Configure environment:

    # Set your OpenAI API key
    npx wrangler secret put OPENAI_API_KEY
  3. Deploy to Cloudflare:

    npx wrangler deploy

Configuration

The system can be configured via environment variables in wrangler.jsonc:

{
  "vars": {
    "DEFAULT_ASSISTANT_ID": "default_assistant_profile",
    "SHORT_TERM_CAPACITY": "10",
    "MID_TERM_CAPACITY": "2000",
    "LONG_TERM_KNOWLEDGE_CAPACITY": "100",
    "RETRIEVAL_QUEUE_CAPACITY": "7",
    "MID_TERM_HEAT_THRESHOLD": "5.0",
    "LLM_MODEL": "gpt-4o-mini",
    "EMBEDDING_MODEL": "@cf/baai/bge-m3"
  }
}
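
Note that wrangler vars always arrive as strings, so numeric settings need parsing; a small loader like the following keeps them type-safe (the names here are assumed, and the real env.ts may differ):

// Assumed config loader illustrating string-to-number parsing of vars.
interface MemoryConfig {
  shortTermCapacity: number;
  midTermCapacity: number;
  llmModel: string;
  embeddingModel: string;
}

export function loadConfig(env: Record<string, string | undefined>): MemoryConfig {
  return {
    shortTermCapacity: parseInt(env.SHORT_TERM_CAPACITY ?? '10', 10),
    midTermCapacity: parseInt(env.MID_TERM_CAPACITY ?? '2000', 10),
    llmModel: env.LLM_MODEL ?? 'gpt-4o-mini',
    embeddingModel: env.EMBEDDING_MODEL ?? '@cf/baai/bge-m3',
  };
}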

API Reference

Base URL

https://memoryos-nodejs.your-subdomain.workers.dev

Authentication

Include user identification in headers or query parameters:

  • X-User-ID header or user_id query parameter
  • X-Assistant-ID header or assistant_id query parameter
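
For example, a client call carrying the identification headers (the endpoint itself is documented below):

// Example client request; BASE_URL is the deployment placeholder from above.
const BASE_URL = 'https://memoryos-nodejs.your-subdomain.workers.dev';

const res = await fetch(`${BASE_URL}/retrieve-memory`, {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'X-User-ID': 'user123',
    'X-Assistant-ID': 'default_assistant_profile',
  },
  body: JSON.stringify({ query: 'weather information', max_results: 10 }),
});
const memories = await res.json();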

Endpoints

1. Add Memory

POST /add-memory
Content-Type: application/json
X-User-ID: user123

{
  "user_input": "What's the weather like?",
  "agent_response": "I don't have access to real-time weather data, but I can help you find a weather service.",
  "timestamp": "2024-01-15 10:30:00",
  "meta_data": {
    "session_id": "session_abc123"
  }
}

2. Retrieve Memory

POST /retrieve-memory
Content-Type: application/json
X-User-ID: user123

{
  "query": "weather information",
  "relationship_with_user": "friend",
  "style_hint": "casual",
  "max_results": 10
}

3. Get User Profile

POST /get-user-profile
Content-Type: application/json
X-User-ID: user123

{
  "include_knowledge": true,
  "include_assistant_knowledge": false
}

4. Health Check

GET /health

5. Status

GET /status
X-User-ID: user123

6. Embedding Models Info

GET /embedding-models

Returns information about available embedding models and current configuration.

MCP Integration

MemoryOS Node.js is designed to work with Model Context Protocol (MCP) clients. Configure your MCP client with:

{
  "mcpServers": {
    "memoryos": {
      "command": "curl",
      "args": [
        "-X", "POST",
        "-H", "Content-Type: application/json",
        "-H", "X-User-ID: ${USER_ID}",
        "${MEMORYOS_URL}/add-memory",
        "-d", "${REQUEST_BODY}"
      ],
      "env": {},
      "description": "MemoryOS MCP Server - Memory management for AI agents",
      "capabilities": {
        "tools": [
          {
            "name": "add_memory",
            "description": "Add new memory to the MemoryOS system"
          },
          {
            "name": "retrieve_memory",
            "description": "Retrieve related memories and context"
          },
          {
            "name": "get_user_profile",
            "description": "Get user profile information"
          },
          {
            "name": "get_system_status",
            "description": "Get system status and statistics"
          }
        ]
      }
    }
  }
}

Development

Local Development

  1. Start local development server:

    npm run dev
  2. Run tests:

    npm test
  3. Type checking:

    npm run type-check

Project Structure

src/
├── memory.ts                 # Main memory management entry point
├── services/
│   ├── OpenAIService.ts      # OpenAI API integration with separate extraction methods
│   └── EmbeddingService.ts   # Cloudflare Workers AI embeddings
├── storage/
│   └── MemoryStorage.ts      # SQL-based storage with consolidation tracking
├── tools/
│   └── memory.tools.ts       # MCP tools for memory operations
├── types/
│   ├── index.ts              # TypeScript type definitions
│   ├── agents.d.ts           # Agent-related types
│   └── modelcontextprotocol.d.ts # MCP protocol types
└── utils/
    ├── env.ts                # Environment configuration
    ├── helpers.ts            # Utility functions
    └── prompts.ts            # LLM prompts for different extraction types

Key Components

MemoryStorage (SQL-based)

  • Unified Storage: Manages all memory types in SQL database
  • Consolidation Tracking: Tracks which memories have been processed
  • Batch Processing: Efficient consolidation in batches of 10 memories
  • Capacity Management: Smart eviction of only consolidated memories
  • Vector Search: Semantic similarity search using SQL-stored embeddings
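
The eviction policy can be expressed as a single statement that deletes only the oldest consolidated rows over capacity; this is a sketch against the schema described earlier, not the actual MemoryStorage code:

// Sketch: evict the oldest consolidated rows once the user is over capacity;
// unconsolidated rows are never touched. Table/column names follow the
// Database Schema section.
function enforceCapacity(sql: SqlStorage, userId: string, capacity: number): void {
  sql.exec(
    `DELETE FROM short_term_memories
     WHERE id IN (
       SELECT id FROM short_term_memories
       WHERE user_id = ? AND consolidated = 1
       ORDER BY timestamp ASC
       LIMIT MAX((SELECT COUNT(*) FROM short_term_memories WHERE user_id = ?) - ?, 0)
     )`,
    userId, userId, capacity,
  );
}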

ShortTermMemory

  • Recent QA Pairs: Stores conversation pairs with metadata
  • Configurable Capacity: Automatic eviction when limit reached
  • Consolidation Trigger: Initiates batch processing when threshold met
  • SQL Storage: Persistent storage with consolidation state tracking

LongTermMemory (Integrated in MemoryStorage)

  • User Profile: Coherent personality summaries stored in user_config
  • User Knowledge: Specific facts stored as separate entries in user_knowledge, each with a vector embedding
  • Assistant Knowledge: Assistant capabilities stored with embeddings
  • Separate Extraction: Three distinct LLM calls for better separation

OpenAIService

  • Profile Extraction: Creates personality summaries from conversation data
  • Knowledge Extraction: Extracts specific facts about user and assistant
  • Specialized Prompts: Optimized prompts for each extraction type
  • Async Operations: Non-blocking LLM interactions

EmbeddingService

  • Cloudflare Workers AI: Text embedding generation
  • Multiple Models: Support for various embedding models
  • Cost Optimization: Efficient embedding generation
  • Fallback Support: Hash-based embeddings when needed
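
The hash-based fallback might look like a cheap bag-of-words projection; the version below is purely illustrative of the idea, not the actual EmbeddingService fallback:

// Illustrative hash-based fallback embedding (FNV-1a over tokens).
// Deterministic and dependency-free, but far weaker than a learned model.
function hashEmbedding(text: string, dims = 384): number[] {
  const vec = new Array<number>(dims).fill(0);
  for (const token of text.toLowerCase().split(/\W+/).filter(Boolean)) {
    let h = 2166136261;
    for (let i = 0; i < token.length; i++) {
      h ^= token.charCodeAt(i);
      h = Math.imul(h, 16777619);
    }
    vec[Math.abs(h) % dims] += 1;
  }
  const norm = Math.hypot(...vec) || 1; // L2-normalize for cosine similarity
  return vec.map((v) => v / norm);
}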

Performance Considerations

Memory Usage

  • Each user gets a dedicated Durable Object
  • Memory is automatically serialized to storage
  • Configurable capacity limits prevent unbounded growth
  • SQL database provides efficient storage and retrieval

API Limits

  • OpenAI API rate limits apply
  • Cloudflare Workers AI: 10,000 free neurons per day
  • Consider implementing caching for frequently accessed data
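
One option for that caching suggestion is the Workers Cache API, sketched below; note that the Cache API only stores GET requests, so it fits read-only endpoints such as /status or /embedding-models:

// Possible caching layer for GET endpoints using the Workers Cache API.
// Whether to cache, and for how long, is a deployment decision.
async function withCache(request: Request, handler: () => Promise<Response>): Promise<Response> {
  const cache = caches.default;
  const hit = await cache.match(request);
  if (hit) return hit;

  const response = await handler();
  if (response.ok) {
    const copy = new Response(response.clone().body, response);
    copy.headers.set('Cache-Control', 'max-age=60'); // short TTL; data changes
    await cache.put(request, copy);
  }
  return response;
}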

Scaling

  • Cloudflare Workers automatically scale
  • Durable Objects provide isolation between users
  • SQL database handles concurrent access efficiently
  • No shared state between requests

Cost Optimization

  • Free Tier: 10,000 neurons per day included
  • Paid Tier: $0.011 per 1,000 neurons after free allocation
  • Embedding Models: Choose based on quality vs. cost needs
    • @cf/baai/bge-m3: Best value (1024 dimensions, $0.012 per M tokens)
    • @cf/baai/bge-small-en-v1.5: Fastest (384 dimensions, $0.020 per M tokens)

Migration from Python

This Node.js implementation maintains API compatibility with the Python version while adapting to the serverless architecture:

| Python Feature       | Node.js Equivalent              | Status      |
| -------------------- | ------------------------------- | ----------- |
| Short-term Memory    | ShortTermMemory class           | ✅ Complete |
| Long-term Memory     | MemoryStorage class (SQL-based) | ✅ Complete |
| Mid-term Memory      | Architecture designed           | 🔄 Planned  |
| OpenAI Integration   | OpenAIService                   | ✅ Complete |
| Embeddings           | Cloudflare Workers AI           | ✅ Complete |
| File Storage         | Durable Object + SQL Storage    | ✅ Complete |
| MCP Server           | HTTP API Gateway                | ✅ Complete |
| Memory Consolidation | Batch processing with tracking  | ✅ Complete |
| Knowledge Separation | Profile vs Knowledge extraction | ✅ Complete |

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Submit a pull request

License

Apache 2.0 License - see LICENSE file for details.

Support

  • GitHub Issues: For bug reports and feature requests
  • Documentation: See the /docs directory for detailed guides
  • Community: Join our Discord for discussions and support

Roadmap

Phase 1: Core Infrastructure ✅

  • Durable Object setup
  • Basic memory layers
  • OpenAI integration
  • Cloudflare Workers AI embeddings
  • MCP server
  • SQL-based LongTermMemory
  • Improved Consolidation System
  • User Profile vs Knowledge Separation

Phase 2: Advanced Features 🔄

  • Mid-term memory implementation (Architecture designed)
  • Heat-based analysis and eviction
  • Conversation continuity detection
  • Advanced vector search with hybrid ranking
  • Memory consolidation (short-term → mid-term → long-term)
  • Session-based memory management

Phase 3: Optimization 🎯

  • Performance optimization and caching
  • Advanced analytics and insights
  • Multi-region deployment
  • SQL query optimization
  • Memory pruning and maintenance
  • Cost optimization strategies

Phase 4: Ecosystem 🚀

  • SDK for popular frameworks
  • Dashboard and monitoring
  • Advanced MCP integrations
  • Community plugins
  • Advanced SQL analytics
  • Multi-user support and sharing

Changelog

v1.2.0 (Phase 1.2) - Improved Consolidation & Knowledge Separation

  • Batch Processing: Consolidation runs in efficient batches of 10 memories rather than being triggered after every 5+ new memories
  • Async Operations: Non-blocking consolidation that doesn't slow down the system
  • Consolidation Tracking: consolidated flag prevents redundant processing
  • Smart Capacity Management: Only removes consolidated memories, preserving unprocessed ones
  • Separate LLM Calls: Three distinct extraction methods for better separation
  • User Profile Extraction: Creates coherent personality summaries
  • User Knowledge Extraction: Extracts specific, searchable facts about the user
  • Assistant Knowledge Extraction: Extracts assistant capabilities and actions
  • Specialized Prompts: Each extraction type has optimized prompts
  • Performance: 80% reduction in unnecessary LLM calls
  • Cost Optimization: Fewer API calls reduce operational costs

v1.1.0 (Phase 1.1) - SQL-Based LongTermMemory

  • SQL Storage: Long-term memory now uses Cloudflare SQL database
  • User Profiles: Persistent user profile storage with merge capabilities
  • Knowledge Base: SQL-based knowledge storage with vector embeddings
  • Vector Search: Semantic similarity search using SQL-stored embeddings
  • Async Operations: All LongTermMemory methods are now async
  • Capacity Management: Automatic maintenance of knowledge capacity limits
  • Error Handling: Robust error handling for SQL operations
  • Performance: Optimized queries with proper indexing

v1.0.0 (Phase 1.0) - Initial Release

  • Durable Objects: Persistent state management
  • Short-term Memory: Recent QA pairs with configurable capacity
  • OpenAI Integration: LLM-powered analysis and generation
  • Cloudflare Workers AI: Vector embeddings and AI services
  • MCP Server: Model Context Protocol support
  • TypeScript: Full type safety and modern development
