
Overview

Distributed KV, Built As One System

NoKV

Launch View

  • Embedded: use NoKV.Open as a serious local engine.
  • Seeded: promote an existing workdir into a distributed seed.
  • Replicated: roll out peers, move leaders, and verify recovery.

NoKV starts as a serious standalone engine and grows into a multi-Raft distributed KV database without swapping out its storage core. That is the hook: WAL, LSM, MVCC, migration, replication, and control-plane behavior are treated as one system, not a pile of loosely connected features.

  • Standalone to Cluster: seed a distributed region from an existing workdir and keep the same storage layer.
  • Correctness First: mode gates, logical region snapshots, recovery metadata, and an execution/control-plane split.
  • Tested as a System: migration flow, restart recovery, Coordinator degradation, transport chaos, and publish-boundary failpoints.

Start Here

Run a local cluster first, then follow the standalone → seeded → cluster path.

What You Can Actually Do

Use NoKV in three different ways

  • Embed it locally through NoKV.Open.
  • Start a multi-node cluster with scripts/dev/cluster.sh.
  • Take an existing standalone workdir and migrate it into a replicated region.
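To make the embedded shape concrete, here is a toy in-memory stand-in that mirrors the open/set/get/close flow of an embedded engine. It is not the real NoKV API (the actual NoKV.Open signature is not shown in this page), just a sketch of the usage pattern.

```go
package main

import "fmt"

// DB is a toy stand-in for an embedded KV engine. A real engine would
// back this with a workdir (WAL, LSM, manifest); this sketch keeps
// everything in memory to illustrate the call shape only.
type DB struct {
	data map[string]string
}

// Open mimics opening an engine against a workdir. A real implementation
// would replay the WAL and load the manifest here.
func Open(workdir string) (*DB, error) {
	return &DB{data: make(map[string]string)}, nil
}

func (db *DB) Set(key, value string) { db.data[key] = value }

func (db *DB) Get(key string) (string, bool) {
	v, ok := db.data[key]
	return v, ok
}

func (db *DB) Close() error { return nil }

func main() {
	db, err := Open("./workdir")
	if err != nil {
		panic(err)
	}
	defer db.Close()

	db.Set("hello", "world")
	if v, ok := db.Get("hello"); ok {
		fmt.Println(v)
	}
}
```

The point of the embedded path is that this same workdir can later be promoted into a distributed seed without an export step.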

What To Look For

What makes this project worth reading

  • One storage layer instead of separate standalone and distributed engines.
  • Formal lifecycle and migration protocol instead of dump/import glue.
  • System-level verification under restart, degraded Coordinator, chaos, and failpoints.

What Matters

Why NoKV

Three reasons this project is interesting

NoKV is not trying to be a feature checklist.

It is trying to answer a narrower and harder question well: can one storage core grow from an embedded engine into a distributed multi-Raft KV without turning migration, metadata, and recovery into glue code?

  • One storage layer across standalone and distributed modes.
  • Explicit lifecycle and migration semantics instead of hidden bootstrap magic.
  • Verification aimed at restart, degraded control plane, and publish-boundary correctness.

Storage Story

One data plane, two deployment shapes

NoKV does not fork into separate standalone and distributed engines. The distributed layer grows on top of the same underlying DB workdir.

That is why migration can be a protocol instead of a dump/import afterthought.

Runtime Ownership

Replication with clear ownership

Store owns the node runtime, Peer owns a region replica runtime, RaftAdmin is the execution plane, and Coordinator stays in the control plane.

The system avoids mixing local truth, local recovery metadata, and cluster control metadata.
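The ownership split above can be sketched as a type hierarchy. The role names (Store, Peer, RaftAdmin, Coordinator) come from this page, but the fields and constructors below are hypothetical; the point is who owns what, not a concrete API.

```go
package main

import "fmt"

// Coordinator lives in the control plane: routing, TSO, and placement
// decisions. It is deliberately not a field of Store.
type Coordinator struct{}

// RaftAdmin is the execution plane: it carries out admin commands on a node.
type RaftAdmin struct{}

// Peer owns the runtime of a single region replica.
type Peer struct {
	RegionID uint64
}

// Store is the node runtime root: it owns its peers and the execution
// plane, plus local recovery metadata, but never cluster control metadata.
type Store struct {
	Admin RaftAdmin
	Peers map[uint64]*Peer
}

func NewStore() *Store {
	return &Store{Peers: make(map[uint64]*Peer)}
}

// AddPeer registers a region replica runtime under this store.
func (s *Store) AddPeer(regionID uint64) *Peer {
	p := &Peer{RegionID: regionID}
	s.Peers[regionID] = p
	return p
}

func main() {
	store := NewStore()
	store.AddPeer(1)
	store.AddPeer(2)
	fmt.Println("peers:", len(store.Peers))
}
```

Keeping Coordinator out of the Store struct is the structural version of the rule above: cluster control metadata never mixes with local truth.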

Migration Primitive

Logical region snapshots

Raft durable snapshot metadata is split from logical region state snapshots, which keeps migration, add-peer install, and recovery semantics clean.

This is a correctness-first design, not a one-shot performance shortcut.
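One way to picture the split is as two separate record types, consumed by different paths. The field names below are hypothetical (this page only states that the two kinds of snapshot metadata are kept apart), but the shape shows why the separation keeps install and recovery clean.

```go
package main

import "fmt"

// RaftSnapshotMeta is what the Raft log needs for durability and
// truncation: which log position the snapshot covers.
type RaftSnapshotMeta struct {
	Index uint64 // last log index covered by the snapshot
	Term  uint64 // term of that index
}

// RegionSnapshot is the logical state handed to migration and add-peer
// install: region bounds plus a handle to the exported data, with no
// Raft log fields mixed in.
type RegionSnapshot struct {
	RegionID   uint64
	StartKey   []byte
	EndKey     []byte
	DataHandle string // e.g. a path to exported SSTs (hypothetical)
}

// InstallPeer illustrates the payoff: an add-peer install consumes the
// logical snapshot, while the Raft metadata is recorded separately, so
// log truncation and recovery never depend on the data transfer format.
func InstallPeer(meta RaftSnapshotMeta, snap RegionSnapshot) string {
	return fmt.Sprintf("region %d installed at index %d/term %d",
		snap.RegionID, meta.Index, meta.Term)
}

func main() {
	meta := RaftSnapshotMeta{Index: 42, Term: 3}
	snap := RegionSnapshot{RegionID: 7, DataHandle: "./artifacts/snap-7"}
	fmt.Println(InstallPeer(meta, snap))
}
```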

Validation Surface

System-level validation

The project is tested beyond unit semantics: migration flow, restart safety, degraded Coordinator behavior, transport chaos, and context propagation are all exercised.

The goal is to verify lifecycle and recovery behavior, not just happy-path RPCs.
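The publish-boundary failpoints mentioned above follow a common pattern: a named switch that a test can arm to force a failure at an exact point in a multi-step operation. This is a minimal sketch of that pattern (gofail-style); the failpoint name and the publish step are hypothetical, not taken from NoKV's code.

```go
package main

import (
	"errors"
	"fmt"
)

// failpoints is a toy registry; real implementations are usually
// concurrency-safe and support richer actions than on/off.
var failpoints = map[string]bool{}

func Enable(name string)  { failpoints[name] = true }
func Disable(name string) { delete(failpoints, name) }

// fail returns an injected error when the named failpoint is armed.
func fail(name string) error {
	if failpoints[name] {
		return errors.New("failpoint: " + name)
	}
	return nil
}

// publishRegion stands in for a publish-boundary step: a test can force
// a crash exactly between writing data and publishing metadata, then
// restart and assert that recovery observes a consistent state.
func publishRegion() error {
	// ... write region data ...
	if err := fail("before-publish"); err != nil {
		return err // simulated crash before the metadata publish
	}
	// ... publish metadata ...
	return nil
}

func main() {
	Enable("before-publish")
	fmt.Println("with failpoint:", publishRegion())
	Disable("before-publish")
	fmt.Println("without failpoint:", publishRegion())
}
```

Armed, the publish fails deterministically; disarmed, the same code path succeeds, which is what lets a test pin down recovery behavior at one boundary at a time.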

Benchmark methodology and result snapshots live in ../benchmark/README.md. The docs site keeps architecture and operating guidance separate from benchmark storytelling.

Fastest Path

Try NoKV In Five Minutes

Fastest Demo Loop

Boot a cluster, front it with Redis, inspect the runtime.

If you only want one practical path, this is the shortest route from clone to “I can see the system running”. It uses the local cluster helper, the Redis-compatible gateway, and the built-in runtime inspection commands.

1. Start the cluster

Use the shared topology file and bring up the local Coordinator + store layout.

2. Expose a familiar interface

Run the Redis-compatible gateway so you can talk to NoKV with an off-the-shelf client.

3. Inspect the system

Query stats and region ownership so the demo ends with visibility instead of blind writes.

# 1. Start a local cluster from the shared topology file
./scripts/dev/cluster.sh --config ./raft_config.example.json

# 2. In another terminal, front it with the Redis-compatible gateway
go run ./cmd/nokv-redis \
  --addr 127.0.0.1:6380 \
  --raft-config ./raft_config.example.json

# 3. Talk to NoKV with any Redis client
redis-cli -p 6380 set hello world
redis-cli -p 6380 get hello

# 4. Inspect the running cluster
go run ./cmd/nokv stats --expvar http://127.0.0.1:9100
go run ./cmd/nokv regions --workdir ./artifacts/cluster/store-1

Read This Next

Documentation Guide

If you only read three pages, read these first:

  1. Getting Started for the shortest path to a running cluster.
  2. Raftstore for runtime ownership and distributed boundaries.
  3. Migration for the standalone → cluster bridge that makes NoKV distinct.

Getting Started

Run NoKV locally, understand the topology file, and boot your first store or local cluster.

Raftstore

Read the distributed runtime layout: server wiring, store ownership, peer lifecycle, snapshots, and recovery surfaces.

Migration

Follow the standalone → seeded → cluster path, including SST snapshot install and membership rollout.

Testing

See how deterministic integration, failpoints, restart recovery, and distributed fault matrix coverage are organized.

Choose Your Route

Read By Interest

Storage Engine

Read this route if you care about WAL discipline, MemTable/flush, manifest semantics, and ValueLog GC.

Architecture · WAL · Flush · Value Log

Distributed Runtime

Read this route if Store/Peer ownership, transport, snapshots, and Coordinator are the parts you want to reason about.

Raftstore · Coordinator · Runtime

Migration & Operations

Read this route if the bridge from standalone workdir to replicated region is the part you want to demo or operate.

Migration · Scripts · CLI

Testing & Validation

Read this route if you want to see how NoKV verifies correctness under restart, degraded Coordinator, chaos, and failpoint boundaries.

Testing · Notes


Layer View

Architecture Sketch

%%{init: {
  "themeVariables": { "fontSize": "18px" },
  "flowchart": { "nodeSpacing": 45, "rankSpacing": 62, "curve": "basis" }
}}%%
graph TD
    Client["Client / App / Redis"] -->|RPC / RESP| Server["Node Server"]
    Client -->|Route / TSO / control queries| Coordinator["Coordinator"]

    subgraph "Distributed Runtime"
        Server --> Store["Store runtime root"]
        Store --> Peer["Peer runtime"]
        Store --> Admin["RaftAdmin"]
        Store --> Meta["Local recovery metadata"]
        Peer --> Raft["Raft durable state"]
        Peer --> Snap["Logical region snapshot"]
    end

    subgraph "Shared Data Plane"
        Peer --> DB["NoKV DB"]
        Snap --> DB
        DB --> LSM["LSM / WAL / VLog / MVCC"]
    end

The central design choice is simple: NoKV is not a separate standalone engine and distributed product glued together later. The distributed system is built over the same storage core, with migration and snapshot semantics made explicit instead of implicit.