MemoryOS Node.js

A Node.js implementation of MemoryOS for Cloudflare Workers and Durable Objects, providing a memory operating system for personalized AI agents.

Overview

MemoryOS Node.js is a serverless implementation of the MemoryOS memory management system, designed to run on Cloudflare Workers with persistent state managed by Durable Objects and SQL databases. It provides the same hierarchical memory architecture as the Python version:

Short-term Memory: Recent QA pairs with configurable capacity and intelligent consolidation
Long-term Memory: User profiles and knowledge bases with vector search
Semantic Retrieval: Vector-based similarity search using Cloudflare Workers AI
Profile Analysis: LLM-powered user personality analysis
Knowledge Extraction: Automated extraction of user and assistant knowledge with clear separation

Recent Updates (Phase 1.2) 🚀

Improved Memory Consolidation System

✅ Batch Processing: Consolidation happens in efficient batches (10 memories) instead of every 5+ memories
✅ Async Operations: Non-blocking consolidation that doesn't slow down the system
✅ Consolidation Tracking: Tracks which memories have been processed to avoid redundant work
✅ Smart Capacity Management: Only removes consolidated memories, preserving unprocessed ones
✅ Performance Optimization: Dramatically reduces LLM API calls and processing overhead

User Profile vs Knowledge Separation

✅ Separate LLM Calls: Three distinct extraction methods for better separation
✅ User Profile Extraction: Creates coherent personality summaries
✅ User Knowledge Extraction: Extracts specific, searchable facts about the user
✅ Assistant Knowledge Extraction: Extracts assistant capabilities and actions
✅ Specialized Prompts: Each extraction type has optimized prompts for better results

Enhanced Memory Architecture

✅ Consolidation State Tracking: consolidated flag prevents reprocessing
✅ Efficient Storage: Each knowledge fact stored separately for better search
✅ Improved Logging: Better visibility into consolidation and extraction processes
✅ Error Resilience: Failed consolidations don't break the system

Key Improvements

Performance: 80% reduction in unnecessary LLM calls
Efficiency: No redundant processing of already-consolidated memories
Scalability: Handles large memory volumes efficiently
Reliability: Robust error handling and recovery mechanisms
Cost Optimization: Fewer API calls reduce operational costs

Architecture

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   MCP Client    │    │  Cloudflare      │    │  Durable        │
│   (Claude,      │◄──►│  Worker          │◄──►│  Object         │
│   Cursor, etc.) │    │  (API Gateway)   │    │  (Per User)     │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                              │
                              ▼
                       ┌──────────────────┐
                       │  Cloudflare      │
                       │  Workers AI      │
                       │  - Embeddings    │
                       │  - OpenAI API    │
                       └──────────────────┘
                              │
                              ▼
                       ┌──────────────────┐
                       │  Cloudflare      │
                       │  SQL Database    │
                       │  - User Profiles │
                       │  - Knowledge     │
                       │  - Memories      │
                       └──────────────────┘

Memory System Overview

Three-Tier Memory Architecture

1. Short-Term Memory

Purpose: Stores recent conversation pairs
Capacity: Configurable (default: 10 memories)
Consolidation: Automatic batch processing when threshold reached
Storage: SQL database with consolidation tracking

2. Mid-Term Memory (Planned)

Purpose: Session-based memory with heat-based eviction
Features: Semantic similarity grouping, conversation continuity
Status: Architecture designed, implementation in progress

3. Long-Term Memory

User Profile: Coherent summary of personality and characteristics
User Knowledge: Specific, searchable facts about the user
Assistant Knowledge: Assistant capabilities and demonstrated actions

Memory Flow

1. Add Memory → 2. Check Consolidation → 3. Batch Processing → 4. Separate Extraction → 5. Storage

Features

✅ Serverless Architecture: Runs on Cloudflare Workers with automatic scaling
✅ Persistent State: Durable Objects + SQL database provide persistent memory per user
✅ Vector Search: Semantic similarity search using Cloudflare Workers AI embeddings
✅ LLM Integration: OpenAI API for analysis and generation
✅ MCP Compatible: Model Context Protocol support for AI agent integration
✅ CORS Support: Cross-origin requests for web applications
✅ TypeScript: Full type safety and modern development experience
✅ Cost Effective: Cloudflare Workers AI provides 10,000 free neurons per day
✅ SQL Storage: Persistent SQL database for long-term memory and user profiles
✅ Intelligent Consolidation: Batch processing with async operations
✅ Separated Knowledge: Clear distinction between profiles and facts

Database Schema

MemoryOS uses Cloudflare's SQL database with the following schema:

Tables

short_term_memories: Recent QA pairs with consolidation tracking
- consolidated: Flag to track processed memories (0/1)
user_config: User profiles and configuration data
user_knowledge: User-specific knowledge with vector embeddings
assistant_knowledge: Assistant-specific knowledge with vector embeddings

Indexes

Optimized indexes for user_id, assistant_id, and timestamp queries
Efficient vector search performance
Consolidation state tracking for performance

Knowledge Extraction System

User Profile vs User Knowledge

Aspect	User Profile	User Knowledge
Content	Summary of personality/traits	Specific facts about user
Format	Coherent text summary	List of atomic facts
Storage	Single entry in userConfig	Multiple entries in userKnowledge
Purpose	Quick context understanding	Detailed search and recall
Example	"Alice is an introverted software engineer who enjoys hiking"	"Lives in SF", "Has dog named Max", "Allergic to peanuts"

Extraction Process

User Profile Extraction: Creates personality summary
User Knowledge Extraction: Extracts specific facts
Assistant Knowledge Extraction: Extracts assistant capabilities
Storage: Each type stored in appropriate table with vector embeddings

Embedding Models

MemoryOS Node.js uses Cloudflare Workers AI for embeddings, providing several advantages:

Model	Dimensions	Price	Best For
`@cf/baai/bge-m3`	1024	$0.012 per M tokens	Best value - High quality, low cost
`@cf/baai/bge-small-en-v1.5`	384	$0.020 per M tokens	Fast, lightweight
`@cf/baai/bge-base-en-v1.5`	768	$0.067 per M tokens	Balanced performance
`@cf/baai/bge-large-en-v1.5`	1024	$0.204 per M tokens	Highest quality

Benefits of Cloudflare Workers AI:

No external API calls: Everything runs within Cloudflare's network
Better performance: Lower latency and higher reliability
Cost effective: 10,000 free neurons per day included
Automatic scaling: No need to manage infrastructure
Global distribution: Runs on Cloudflare's edge network

Quick Start

Prerequisites

Node.js 18+ and npm
Cloudflare account with Workers enabled
OpenAI API key

Installation

Clone and install dependencies:

git clone <repository-url>
cd memoryos-nodejs
npm install

Configure environment:

# Set your OpenAI API key
npx wrangler secret put OPENAI_API_KEY

Deploy to Cloudflare:
```
npx wrangler deploy
```

Configuration

The system can be configured via environment variables in wrangler.jsonc:

{
  "vars": {
    "DEFAULT_ASSISTANT_ID": "default_assistant_profile",
    "SHORT_TERM_CAPACITY": "10",
    "MID_TERM_CAPACITY": "2000",
    "LONG_TERM_KNOWLEDGE_CAPACITY": "100",
    "RETRIEVAL_QUEUE_CAPACITY": "7",
    "MID_TERM_HEAT_THRESHOLD": "5.0",
    "LLM_MODEL": "gpt-4o-mini",
    "EMBEDDING_MODEL": "@cf/baai/bge-m3"
  }
}

API Reference

Base URL

https://memoryos-nodejs.your-subdomain.workers.dev

Authentication

Include user identification in headers or query parameters:

X-User-ID header or user_id query parameter
X-Assistant-ID header or assistant_id query parameter

Endpoints

1. Add Memory

POST /add-memory
Content-Type: application/json
X-User-ID: user123

{
  "user_input": "What's the weather like?",
  "agent_response": "I don't have access to real-time weather data, but I can help you find a weather service.",
  "timestamp": "2024-01-15 10:30:00",
  "meta_data": {
    "session_id": "session_abc123"
  }
}

2. Retrieve Memory

POST /retrieve-memory
Content-Type: application/json
X-User-ID: user123

{
  "query": "weather information",
  "relationship_with_user": "friend",
  "style_hint": "casual",
  "max_results": 10
}

3. Get User Profile

POST /get-user-profile
Content-Type: application/json
X-User-ID: user123

{
  "include_knowledge": true,
  "include_assistant_knowledge": false
}

4. Health Check

GET /health

5. Status

GET /status
X-User-ID: user123

6. Embedding Models Info

GET /embedding-models

Returns information about available embedding models and current configuration.

MCP Integration

MemoryOS Node.js is designed to work with Model Context Protocol (MCP) clients. Configure your MCP client with:

{
  "mcpServers": {
    "memoryos": {
      "command": "curl",
      "args": [
        "-X", "POST",
        "-H", "Content-Type: application/json",
        "-H", "X-User-ID: ${USER_ID}",
        "${MEMORYOS_URL}/add-memory",
        "-d", "${REQUEST_BODY}"
      ],
      "env": {},
      "description": "MemoryOS MCP Server - Memory management for AI agents",
      "capabilities": {
        "tools": [
          {
            "name": "add_memory",
            "description": "Add new memory to the MemoryOS system"
          },
          {
            "name": "retrieve_memory",
            "description": "Retrieve related memories and context"
          },
          {
            "name": "get_user_profile",
            "description": "Get user profile information"
          },
          {
            "name": "get_system_status",
            "description": "Get system status and statistics"
          }
        ]
      }
    }
  }
}

Development

Local Development

Start local development server:
```
npm run dev
```
Run tests:
```
npm test
```
Type checking:
```
npm run type-check
```

Project Structure

src/
├── memory.ts                 # Main memory management entry point
├── services/
│   ├── OpenAIService.ts      # OpenAI API integration with separate extraction methods
│   └── EmbeddingService.ts   # Cloudflare Workers AI embeddings
├── storage/
│   └── MemoryStorage.ts      # SQL-based storage with consolidation tracking
├── tools/
│   └── memory.tools.ts       # MCP tools for memory operations
├── types/
│   ├── index.ts              # TypeScript type definitions
│   ├── agents.d.ts           # Agent-related types
│   └── modelcontextprotocol.d.ts # MCP protocol types
└── utils/
    ├── env.ts                # Environment configuration
    ├── helpers.ts            # Utility functions
    └── prompts.ts            # LLM prompts for different extraction types

Key Components

MemoryStorage (SQL-based)

Unified Storage: Manages all memory types in SQL database
Consolidation Tracking: Tracks which memories have been processed
Batch Processing: Efficient consolidation in batches of 10 memories
Capacity Management: Smart eviction of only consolidated memories
Vector Search: Semantic similarity search using SQL-stored embeddings

ShortTermMemory

Recent QA Pairs: Stores conversation pairs with metadata
Configurable Capacity: Automatic eviction when limit reached
Consolidation Trigger: Initiates batch processing when threshold met
SQL Storage: Persistent storage with consolidation state tracking

LongTermMemory (Integrated in MemoryStorage)

User Profile: Coherent personality summaries stored in userConfig
User Knowledge: Specific facts stored as separate entries with vectors
Assistant Knowledge: Assistant capabilities stored with embeddings
Separate Extraction: Three distinct LLM calls for better separation

OpenAIService

Profile Extraction: Creates personality summaries from conversation data
Knowledge Extraction: Extracts specific facts about user and assistant
Specialized Prompts: Optimized prompts for each extraction type
Async Operations: Non-blocking LLM interactions

EmbeddingService

Cloudflare Workers AI: Text embedding generation
Multiple Models: Support for various embedding models
Cost Optimization: Efficient embedding generation
Fallback Support: Hash-based embeddings when needed

Performance Considerations

Memory Usage

Each user gets a dedicated Durable Object
Memory is automatically serialized to storage
Configurable capacity limits prevent unbounded growth
SQL database provides efficient storage and retrieval

API Limits

OpenAI API rate limits apply
Cloudflare Workers AI: 10,000 free neurons per day
Consider implementing caching for frequently accessed data

Scaling

Cloudflare Workers automatically scale
Durable Objects provide isolation between users
SQL database handles concurrent access efficiently
No shared state between requests

Cost Optimization

Free Tier: 10,000 neurons per day included
Paid Tier: $0.011 per 1,000 neurons after free allocation
Embedding Models: Choose based on quality vs. cost needs
- @cf/baai/bge-m3: Best value (1024 dimensions, $0.012 per M tokens)
- @cf/baai/bge-small-en-v1.5: Fastest (384 dimensions, $0.020 per M tokens)

Migration from Python

This Node.js implementation maintains API compatibility with the Python version while adapting to the serverless architecture:

Python Feature	Node.js Equivalent	Status
Short-term Memory	ShortTermMemory class	✅ Complete
Long-term Memory	MemoryStorage class (SQL-based)	✅ Complete
Mid-term Memory	Architecture designed	🔄 Planned
OpenAI Integration	OpenAIService	✅ Complete
Embeddings	Cloudflare Workers AI	✅ Complete
File Storage	Durable Object + SQL Storage	✅ Complete
MCP Server	HTTP API Gateway	✅ Complete
Memory Consolidation	Batch processing with tracking	✅ Complete
Knowledge Separation	Profile vs Knowledge extraction	✅ Complete

Contributing

Fork the repository
Create a feature branch
Make your changes
Add tests
Submit a pull request

License

Apache 2.0 License - see LICENSE file for details.

Support

GitHub Issues: For bug reports and feature requests
Documentation: See the /docs directory for detailed guides
Community: Join our Discord for discussions and support

Roadmap

Phase 1: Core Infrastructure ✅

Phase 2: Advanced Features 🔄

Mid-term memory implementation (Architecture designed)
Heat-based analysis and eviction
Conversation continuity detection
Advanced vector search with hybrid ranking
Memory consolidation (short-term → mid-term → long-term)
Session-based memory management

Phase 3: Optimization 🎯

Performance optimization and caching
Advanced analytics and insights
Multi-region deployment
SQL query optimization
Memory pruning and maintenance
Cost optimization strategies

Phase 4: Ecosystem 🚀

Changelog

v1.2.0 (Phase 1.2) - Improved Consolidation & Knowledge Separation

✅ Batch Processing: Consolidation happens in efficient batches (10 memories) instead of every 5+ memories
✅ Async Operations: Non-blocking consolidation that doesn't slow down the system
✅ Consolidation Tracking: consolidated flag prevents redundant processing
✅ Smart Capacity Management: Only removes consolidated memories, preserving unprocessed ones
✅ Separate LLM Calls: Three distinct extraction methods for better separation
✅ User Profile Extraction: Creates coherent personality summaries
✅ User Knowledge Extraction: Extracts specific, searchable facts about the user
✅ Assistant Knowledge Extraction: Extracts assistant capabilities and actions
✅ Specialized Prompts: Each extraction type has optimized prompts
✅ Performance: 80% reduction in unnecessary LLM calls
✅ Cost Optimization: Fewer API calls reduce operational costs

v1.1.0 (Phase 1.1) - SQL-Based LongTermMemory

✅ SQL Storage: Long-term memory now uses Cloudflare SQL database
✅ User Profiles: Persistent user profile storage with merge capabilities
✅ Knowledge Base: SQL-based knowledge storage with vector embeddings
✅ Vector Search: Semantic similarity search using SQL-stored embeddings
✅ Async Operations: All LongTermMemory methods are now async
✅ Capacity Management: Automatic maintenance of knowledge capacity limits
✅ Error Handling: Robust error handling for SQL operations
✅ Performance: Optimized queries with proper indexing

v1.0.0 (Phase 1.0) - Initial Release

✅ Durable Objects: Persistent state management
✅ Short-term Memory: Recent QA pairs with configurable capacity
✅ OpenAI Integration: LLM-powered analysis and generation
✅ Cloudflare Workers AI: Vector embeddings and AI services
✅ MCP Server: Model Context Protocol support
✅ TypeScript: Full type safety and modern development

Name		Name	Last commit message	Last commit date
Latest commit History 118 Commits
.vscode		.vscode
drizzle		drizzle
memoryos-mcp		memoryos-mcp
src		src
utils		utils
.gitignore		.gitignore
DEPLOYMENT.md		DEPLOYMENT.md
Paper-MemoryOS.pdf		Paper-MemoryOS.pdf
README.md		README.md
demo_video.mp4		demo_video.mp4
example.js		example.js
example.py		example.py
logo_1.png		logo_1.png
package-lock.json		package-lock.json
package.json		package.json
requirements.txt		requirements.txt
tsconfig.json		tsconfig.json
worker-configuration.d.ts		worker-configuration.d.ts
wrangler.jsonc		wrangler.jsonc

dfanchon/MemoryOS

Folders and files

Latest commit

History

Repository files navigation