A microVM sandbox runtime for AI agent fan-out. Children fork from a warmed parent snapshot, inheriting its address space copy-on-write instead of cold-booting their own kernel.
forkd is built on Firecracker. The parent VM boots once, loads your
runtime (Python + your dependencies, a JIT-warmed JVM, an
already-loaded ML model), and is paused and snapshotted to disk. Each
child is a separate Firecracker process that mmaps the parent's memory
image with MAP_PRIVATE; the kernel implements copy-on-write at the
page level, so children share the parent's resident memory until they
diverge.
The result is two properties at once: per-child KVM isolation, and a
spawn cost that's closer to fork(2) than to a cold-boot VM.
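Under the hood this is ordinary Linux demand paging rather than anything forkd-specific. A minimal Python sketch of the same MAP_PRIVATE copy-on-write behaviour, purely for illustration (forkd sets this up inside Firecracker's snapshot-restore path, not from Python):

```python
import mmap
import os

# Map a snapshot memory image privately: reads come from the shared page
# cache; the first write to any page triggers a copy-on-write fault and
# gives this process its own private copy of just that page.
fd = os.open("memory.bin", os.O_RDONLY)
size = os.fstat(fd).st_size
mem = mmap.mmap(fd, size, flags=mmap.MAP_PRIVATE,
                prot=mmap.PROT_READ | mmap.PROT_WRITE)

_ = mem[:4]          # read: shares the parent's resident pages, no copy
mem[:4] = b"fork"    # write: the kernel duplicates only this page for us
```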
- Hardware isolation. Each child is its own Firecracker microVM backed by KVM. Escape requires a hypervisor or kernel vulnerability, not a `runc` regression.
- Warmed runtimes inherit for free. Imports, JIT compilation, model weights, prefetched caches — anything the parent did is already resident in the child.
- Real Linux per child. Multi-vCPU, full TCP networking, `apt install`, outbound HTTPS. Unlike function-level snapshot runtimes that trade single-vCPU + serial-I/O for raw spawn speed, forkd children can run real Python servers, model inference, or any workload that needs a full kernel.
- Multi-tenant by construction. Per-child network namespace, per-child cgroup v2 memory limit, independent `/dev/urandom` re-seeded by `vmgenid` (Linux 5.20+).
- Built for agent fan-out. AI agent workloads that fan out into many short-lived sandboxes — code-interpreter, tool-use, evaluation rollouts — are the design point. The warmed parent collapses the per-request `import numpy` / `import torch` cost across the entire cohort.
- Operable. Daemon process owning state, REST API on Unix or TCP, Prometheus `/metrics`, append-only JSON audit log, systemd unit.
- Open source. Apache 2.0, no vendor SDK.
All backends were measured on the same Linux host (Ubuntu 24.04, Linux 6.14, 20 vCPU, 30 GiB RAM, KVM).
Workload: spawn 100 sandboxes that each run `import numpy; numpy.zeros(5).tolist()`.
| Backend | Wall-clock at N=100 | Memory delta per sandbox | Notes |
|---|---|---|---|
| forkd | 101 ms | 0.12 MiB | fork-from-warm via snapshot CoW |
| CubeSandbox¹ | 1.06 s | 5 MiB | RustVMM microVM, cold-boot (pool fast path) |
| BoxLite² | 113.2 s | — | KVM microVM, cold-boot OCI rootfs |
| OpenSandbox³ | 122.0 s | — | Docker runtime via abstraction layer |
| Firecracker cold-boot | 759 ms | 84 MiB | raw VM boot, no orchestration |
| gVisor (runsc) | 288.6 s | — | userspace kernel container |
| Docker (runc) | 335.3 s | 4 MiB | standard container runtime |
¹ CubeSandbox: 1.06 s wall-clock is the fast-path N=100 figure on this host (1056 ± 14 ms over five runs, 100 % success every run), measured with a bench script that pre-warms Python's ThreadPoolExecutor to keep client-side lazy-init out of the timing. An earlier slow-path measurement on the same host returned 20.3 s with 77/100 success — that template had a 2 GiB writable-layer size that didn't match the default 1 GiB pool, so every sandbox went through a live mkfs.ext4 + reflink-copy; after the upstream maintainer clarified the distinction in #235, we added `2Gi` to `pool_default_format_size_list` and re-ran. The host runs cube v0.2.0, which carries a ~50 ms latency regression that PR #234 fixes in v0.2.1; the value above is the v0.2.0 baseline. Cube advertises <60 ms single-instance cold-start on a 96 vCPU host; we did not retest that shape. See bench/CUBESANDBOX.md for the full methodology, both rows, and the `cmdTimeout` race we filed two PRs upstream against (#236 / #237).
² BoxLite is optimised for one long-lived stateful Box per workload,
not 100 concurrent fresh microVMs. The cold fan-out is included for
direct comparability. See bench/BOXLITE.md.
³ OpenSandbox is an abstraction layer over Docker / K8s / gVisor /
Kata / Firecracker; the number is for its default Docker runtime.
See bench/OPENSANDBOX.md.
Reproduce: bench/bench-spawn-100.sh then bench/generate_charts.py.
For one sandbox doing the same numpy expression two ways:
| Call | Time | What it does |
|---|---|---|
| `sandbox.eval("numpy.zeros(5).tolist()")` | 1 ms | Reuses the warmed Python in PID 1 |
| `sandbox.commands.run("python3 -c '...'")` | 96 ms | Cold subprocess re-imports numpy |
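To reproduce that comparison yourself, the two SDK calls from the table can be timed directly. A minimal sketch; absolute numbers depend on your parent image and host:

```python
import time
from forkd import Sandbox

with Sandbox() as sb:
    t0 = time.perf_counter()
    sb.eval("numpy.zeros(5).tolist()")       # warmed PID 1, numpy already imported
    t1 = time.perf_counter()
    sb.commands.run("python3 -c 'import numpy; print(numpy.zeros(5).tolist())'")
    t2 = time.perf_counter()
    print(f"eval: {(t1 - t0) * 1e3:.1f} ms   cold subprocess: {(t2 - t1) * 1e3:.1f} ms")
```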
flowchart TB
%% ─── parent ───────────────────────────────────────────────
subgraph PARENT["Parent VM (booted once, warmed)"]
direction TB
runtime["PID 1<br/>Python + numpy + your deps<br/>imported into RAM"]
end
PARENT -- "pause + snapshot" --> SNAP["Snapshot on disk<br/>memory.bin (CoW source)<br/>vmstate (vCPU + devices)"]
%% ─── controller ───────────────────────────────────────────
CLIENT["Client (CLI / Python SDK)"] -- "POST /v1/sandboxes n=100" --> CTL["forkd-controller<br/>REST · auth · audit · /metrics"]
CTL -- "restore_many_with(...)" --> SNAP
%% ─── children ─────────────────────────────────────────────
subgraph CHILDREN["100 Child Firecracker processes (kernel CoW per page)"]
direction LR
subgraph NS1["netns forkd-child-1"]
C1["Child 1<br/>mmap MAP_PRIVATE<br/>cgroup memory.max"]
end
subgraph NS2["netns forkd-child-2"]
C2["Child 2<br/>mmap MAP_PRIVATE<br/>cgroup memory.max"]
end
subgraph NSN["netns forkd-child-100"]
CN["Child 100<br/>mmap MAP_PRIVATE<br/>cgroup memory.max"]
end
end
SNAP -. "shared file<br/>(read-mostly)" .-> C1
SNAP -. "shared file" .-> C2
SNAP -. "shared file" .-> CN
%% ─── network ──────────────────────────────────────────────
C1 -- "veth" --> BR["host bridge forkd-br0<br/>MASQUERADE"]
C2 -- "veth" --> BR
CN -- "veth" --> BR
BR --> UPLINK(("uplink → internet"))
%% styling
classDef parent fill:#e8f3ec,stroke:#4c956c,color:#1f2933;
classDef snap fill:#fff3df,stroke:#d4a259,color:#1f2933;
classDef ctl fill:#e6efff,stroke:#5b7dba,color:#1f2933;
classDef child fill:#ffffff,stroke:#52606d,color:#1f2933;
classDef net fill:#f1f3f5,stroke:#8d99ae,color:#1f2933;
class PARENT,runtime parent;
class SNAP snap;
class CTL,CLIENT ctl;
class NS1,NS2,NSN,C1,C2,CN child;
class BR,UPLINK net;
See DESIGN.md for the full design and the open
problems the architecture leaves on the table.
The sandbox-runtime space has a wide spread of designs. The table below summarises forkd's positioning against the most-cited open-source projects. Numbers in quotes are as advertised by the upstream project unless they match a row in the benchmark chart above. We do not benchmark other projects on workloads they were not designed for.
| Project | Primitive | Cold-start (N=100) | Fork-from-warm | Quotas | Auth / TLS | License |
|---|---|---|---|---|---|---|
| forkd | Firecracker + snapshot CoW | 101 ms | ✓ | cgroup `memory.max` | bearer + rustls | Apache 2.0 |
| CubeSandbox | RustVMM + KVM microVM | 1.06 s¹ | "coming soon" | <5 MiB / instance | not in OSS | Apache 2.0 |
| Daytona | OCI workspace | <90 ms² | ✗ | per workspace | API keys (platform) | AGPL-3.0 |
| OpenSandbox | Docker / K8s + gVisor / Kata / FC | 122 s | ✗ | via runtime | gateway (k8s) | Apache 2.0 |
| E2B | Firecracker (in infra) | not in OSS | ✗ | platform | API keys (cloud) | Apache 2.0 |
| BoxLite | KVM / Hypervisor.framework + OCI | 113 s | ✗ stateful Box | KVM + seccomp | egress policy only | Apache 2.0 |
| Modal | proprietary snapshot fork | not public | ✓ | ✓ | ✓ | proprietary |
| Firecracker raw | microVM only | 759 ms | manual | n/a | n/a | Apache 2.0 |
| Docker (runc) | OCI container | 335 s | ✗ | cgroups | n/a | Apache 2.0 |
| gVisor (runsc) | userspace kernel | 289 s | ✗ | cgroups | n/a | Apache 2.0 |
¹ Wall-clock at N=100 concurrent on this bare-metal host (systemd-detect-virt: none, i7-12700, 20 vCPU, no nested virt). This is the fast-path number — pool_default_format_size_list was extended to include the template's writable-layer size, so each sandbox reuses a pre-formatted pool entry rather than going through a live mkfs.ext4 + reflink-copy. 1056 ± 14 ms over five runs, 100 % success every run, measured with a bench script that pre-warms Python's ThreadPoolExecutor to keep client-side lazy-init out of the timing. Host runs cube v0.2.0, which carries a ~50 ms latency regression that PR #234 fixes in v0.2.1 — the figure above is the v0.2.0 baseline. An earlier slow-path measurement on the same host (writable-layer size that didn't match the default pool) returned 20.3 s with 77/100 success — that mismatch was on our side and the maintainer corrected it at #235. Cube advertises <60 ms single-instance cold-start (P99 200 ms at N=100 concurrent) under the fast-path configuration on a 96 vCPU host — that figure isn't disputed and we did not retest it here. Note also that this row compares fork-from-warm (forkd) with cold-start (every other project); they're different operating points by design, not equivalent primitives. See bench/CUBESANDBOX.md for the full methodology, both rows, and the upstream cmdTimeout race we filed PRs #236 / #237 against.
² Daytona's advertised number; we did not measure it (workspace runtime, not a fan-out-comparable shape).
Where forkd fits.
- Code interpreters and Jupyter-kernel sandboxes. Each conversation turn or tool call spawns a fresh kernel; the warmed parent carries the SciPy / ML runtime, so the per-request `import numpy` / `import torch` cost collapses to zero. This is the design point — the workload shape Anthropic / OpenAI / Modal code-interpreter products are all on.
- Evaluation harnesses. Hundreds of repository checkouts or test rollouts in parallel — SWE-bench-style — without paying Docker cold-start per task.
- Per-user code execution at fan-out scale. Many short-lived sandboxes sharing one warmed parent, each child KVM-isolated by construction.
- Untrusted-code execution in CI. `git clone`, `pip install`, `pytest` inside a real Linux VM, not a container namespace.
- Self-hosted alternative to managed sandbox SaaS. One Linux box with KVM, single-binary daemon, Apache 2.0 — no per-second cloud fees, no vendor lock-in.
Where the others fit better. CubeSandbox: faster pure cold-start (<60 ms advertised). Daytona: workspace runtimes where each user owns one long-lived sandbox. OpenSandbox: one orchestration API across multiple isolation backends. BoxLite: embeddable, daemon-less, cross-platform (macOS via Hypervisor.framework). Modal: the closed-source managed system with the same primitive.
Where forkd is wrong. Function-level snapshot runtimes that give
up real Linux (single-vCPU, serial I/O only) beat forkd's ~100 ms by
an order of magnitude — at the cost of not running real Python
servers, apt install, or outbound HTTPS.
Skim answers for platform / procurement teams scoping forkd:
Can we deploy on Kubernetes? Yes — one forkd-controller Pod hosts N sandbox children; the K8s scheduler runs once at Pod creation regardless of fan-out (vs one Pod-per-sandbox in Kata / Firecracker-on-K8s designs). A starter manifest ships at packaging/k8s/. Requires nodes with /dev/kvm + cgroup v2; managed K8s (GKE / EKS / AKS) typically needs a metal SKU or explicit nested-virt to qualify.
How many sandboxes fit in one Pod? With a 512 MiB warmed Python+numpy parent, rough sizing:
- ~1 actively-running agent per vCPU (compute-bound bottleneck)
- ~50 idle-pooled agents per 8 GiB Pod RAM (process-state bottleneck, not memory)
Measured CoW overhead at N=100 is 0.12 MiB / child on top of the parent (bench/), so memory rarely caps fan-out — vCPU + process count dominate. Heavier parents (browser, ML inference) hit the ceiling sooner; measure with yours.
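As a back-of-envelope check of why memory is rarely the ceiling, using the numbers above (the Pod shape is illustrative; only the 512 MiB parent and 0.12 MiB CoW delta are measured figures):

```python
# Rough Pod sizing with the figures quoted above.
vcpus = 8                    # example Pod CPU allocation
pod_ram_mib = 8 * 1024       # 8 GiB Pod RAM
parent_mib = 512             # warmed Python+numpy parent (measured)
cow_delta_mib = 0.12         # CoW delta per child at N=100 (measured)

active_agents = vcpus        # ~1 actively-running agent per vCPU
memory_bound_children = (pod_ram_mib - parent_mib) / cow_delta_mib
print(f"compute-bound: ~{active_agents} active agents")
print(f"memory-bound ceiling: ~{memory_bound_children:.0f} children, "
      f"far above the ~50 process-state cap")
```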
How do existing agents connect?
- REST — `POST /v1/sandboxes n=100`, language-agnostic, bearer-token auth (see the sketch after this list)
- Python SDK — `from forkd import Sandbox` (drop-in for `from e2b import Sandbox`)
- LangGraph / AutoGen / CrewAI — through the Python SDK, no special glue
- MCP — `pip install forkd-mcp` ships an MCP server for Claude Desktop / Claude Code / Cursor / Cline. See `sdk/mcp/`
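For the REST path any HTTP client works; a minimal sketch with `requests`, using the same payload fields as the curl example further down (host, port, and token location are deployment-specific assumptions):

```python
import requests

token = open("/etc/forkd/token").read().strip()   # wherever your bearer token lives

resp = requests.post(
    "http://127.0.0.1:8889/v1/sandboxes",
    headers={"Authorization": f"Bearer {token}"},
    json={"snapshot_tag": "pyagent", "n": 100,
          "per_child_netns": True, "memory_limit_mib": 256},
    timeout=60,
)
resp.raise_for_status()
children = resp.json()        # [{"id": "sb-...", "pid": ...}, ...]
print(len(children), "children up")
```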
Production case shapes (from production users + this repo's recipes):
- AI code interpreter — one warmed parent (SciPy / torch pre-imported), fork-per-conversation-turn. Recipe: `e2b-codeinterpreter/`
- SWE-bench-style parallel evals — N parallel repo checkouts, each child runs `pytest` isolated. Recipe: `coding-agent/`
- Per-user code exec at scale — shared warmed parent, child KVM-isolated per user
- Untrusted CI — `git clone + pip install + pytest` inside a real Linux VM, not a container namespace
- Fork-per-test isolated databases — recipe: `postgres-fixture/` — ready-to-query postgres at ~10 ms per child instead of ~2 s of fresh `initdb` (see the pytest sketch after this list)
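For the fork-per-test database shape, a hypothetical pytest fixture built on the Controller API shown later in this README. The `postgres` snapshot tag is an assumed name for a parent built from `recipes/postgres-fixture/`, and teardown is only sketched in a comment:

```python
import pytest
from forkd import Controller

@pytest.fixture
def pg_sandbox():
    # Fork one child from a warmed postgres parent; it comes up ready to query.
    c = Controller()
    child = c.spawn_sandboxes("postgres", n=1, per_child_netns=True)[0]
    yield child["id"]
    # Teardown not shown here: reclaim children via `forkd cleanup` or the REST API.
```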
Requires: x86_64 Linux with KVM, Ubuntu 22.04 or newer.
# 1. Host setup: KVM, Firecracker, Rust, KSM, hugepages, tap device.
sudo bash scripts/setup-host.sh
sudo bash scripts/host-tap.sh
cargo build --release
sudo install -m 0755 target/release/{forkd,forkd-controller} /usr/local/bin/
# 2. Build a warmed rootfs from a Docker image.
sudo bash scripts/build-rootfs.sh python:3.12-slim python-rootfs.ext4 1536 python3-numpy
# 3. Fetch a kernel.
curl -O https://s3.amazonaws.com/spec.ccfc.min/firecracker-ci/v1.10/x86_64/vmlinux-6.1.141
# 4. Run a one-shot sandbox.
sudo -E forkd run --image python:3.12-slim --kernel ./vmlinux-6.1.141 \
-- python3 -c "import numpy; print(numpy.zeros(5).sum())"
# 0.0

# Provision N per-child network namespaces (one-time per N).
sudo bash scripts/netns-setup.sh 100
# Create a tagged parent snapshot.
sudo forkd snapshot --tag pyagent \
--kernel ./vmlinux-6.1.141 \
--rootfs ./python-rootfs.ext4 \
--tap forkd-tap0
# Fork 100 children sharing the parent's memory.
sudo -E forkd fork --tag pyagent -n 100 --per-child-netns --memory-limit-mib 256
# Talk to one of them.
sudo forkd eval --child forkd-child-42 -- "numpy.zeros(100).sum()"

In-guest agent (E2B-compatible):
from forkd import Sandbox # drop-in for `from e2b import Sandbox`
with Sandbox() as sb:
    print(sb.commands.run("uname -a").stdout)
    print(sb.eval("numpy.zeros(5).tolist()"))  # reuses warmed PID 1

Controller daemon (lifecycle + branching):
from forkd import Controller
c = Controller() # http://127.0.0.1:8889
children = c.spawn_sandboxes("pyagent", n=1, per_child_netns=True)
sb_id = children[0]["id"]
# … drive sb_id via in-guest Sandbox, then branch before a risky step:
branch = c.branch_sandbox(sb_id, tag="checkpoint-1")
grandchildren = c.spawn_sandboxes(branch["tag"], n=5)  # speculative fan-out

See docs/design/branching.md for the
fork-from-warm tree model and use cases.
For Claude Desktop, Claude Code, Cursor, and any other MCP-aware client:
pip install forkd-mcp
# then add to claude_desktop_config.json:
# "mcpServers": { "forkd": { "command": "forkd-mcp" } }

The server exposes `spawn_sandboxes`, `exec_command`, `eval_code`,
and five other tools — the agent can fork and drive forkd microVMs
directly. See sdk/mcp/README.md.
Skip the rootfs design step — pick one of the recipes/
and run its build.sh:
| Recipe | When to pick |
|---|---|
| `python-numpy/` | Reproduce the benchmark; lightest Python + numpy |
| `e2b-codeinterpreter/` | AI code-interpreter agents (E2B SDK-compatible) |
| `jupyter-kernel/` | Notebook / SciPy stack pre-imported; ~1 ms per kernel |
| `coding-agent/` | SWE-bench / coding agents with git + dev tools |
| `nodejs/` | JS / TS workloads, Playwright fan-out |
| `playwright-browser/` | Browser-driving agents (computer-use, web research, UI test gen). Fork warmed Chromium at ~10 ms |
| `agent-workbench/` | Kitchen sink — browser + VSCode + Jupyter + MCP |
| `postgres-fixture/` | Fork-per-test isolated postgres; ready-to-query in ~10 ms instead of ~2 s of fresh `initdb` |
Once a parent snapshot is built (yours or someone else's), forkd pack
ships it as a single .tar.zst file with a manifest + per-file sha256.
Other hosts pull and resume forking in seconds, no Docker round-trip:
# producer host
forkd pack --tag pyagent --out pyagent.forkd-snapshot.tar.zst
# 23× compression typical — pyagent's 512 MiB memory.bin → ~22 MiB on disk
# upload to an R2 / S3 bucket
forkd push --tag pyagent "https://<your-presigned-PUT-url>"
# consumer host
forkd pull https://hub.example.com/pyagent.forkd-snapshot.tar.zst
forkd fork --tag pyagent -n 100 --per-child-netns   # already warm

`forkd images` lists local snapshots with their sizes. Integrity is
verified on unpack via the manifest's sha256s.
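The integrity check is plain sha256-over-files. Conceptually it amounts to the sketch below; the JSON manifest layout shown is an assumption for illustration, not forkd's actual on-disk format:

```python
import hashlib
import json
from pathlib import Path

def verify_unpacked_snapshot(snapshot_dir: str) -> None:
    """Recompute each file's sha256 and compare it against the manifest."""
    root = Path(snapshot_dir)
    manifest = json.loads((root / "manifest.json").read_text())    # assumed layout
    for rel_path, expected in manifest["files"].items():           # {"memory.bin": "<sha256>", ...}
        digest = hashlib.sha256((root / rel_path).read_bytes()).hexdigest()
        if digest != expected:
            raise ValueError(f"sha256 mismatch for {rel_path}")
```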
The controller daemon owns the registry of snapshots and live sandboxes, exposes the REST API, and writes structured audit logs. Recommended for any deployment beyond local development.
sudo install -m 0644 packaging/systemd/forkd-controller.service /etc/systemd/system/
sudo mkdir -p /etc/forkd
sudo bash -c 'head -c 32 /dev/urandom | base64 > /etc/forkd/token'
sudo chmod 600 /etc/forkd/token
sudo systemctl enable --now forkd-controller

Then drive it over HTTP:
TOKEN=$(sudo cat /etc/forkd/token)
curl -H "Authorization: Bearer $TOKEN" -X POST http://127.0.0.1:8889/v1/sandboxes \
-H 'Content-Type: application/json' \
-d '{"snapshot_tag":"pyagent","n":5,"per_child_netns":true,"memory_limit_mib":256}'
# [{"id":"sb-67a1b3-0000","pid":...,...}, ...]
curl -H "Authorization: Bearer $TOKEN" http://127.0.0.1:8889/metrics
# forkd_sandboxes_active 5

Full API reference: docs/API.md.
Operator runbook: docs/RUNBOOK.md.
Security posture: docs/SECURITY.md.
crates/
forkd-vmm/ Firecracker wrapper: BootConfig, Vm, Snapshot, cgroup
forkd-cli/ `forkd` binary (snapshot, fork, run, exec, eval,
pack/unpack/pull/push/images, cleanup)
forkd-controller/ `forkd-controller` daemon: REST, registry, audit
rootfs-init/
forkd-init.sh PID 1 inside the guest; mounts pseudo-fs, launches agent
forkd-agent.py TCP server on :8888 inside the guest (ping/exec/eval)
sdk/python/ E2B-compatible Python SDK
sdk/mcp/ MCP server (`forkd-mcp`) — drive forkd from
Claude Desktop / Claude Code / any MCP client
scripts/ Host-side helpers (KVM, Firecracker, netns, rootfs)
packaging/systemd/ systemd unit for the controller
packaging/k8s/ Starter Kubernetes manifest for forkd-controller
recipes/ Pre-built parent-rootfs recipes (python-numpy,
e2b-codeinterpreter, jupyter-kernel, coding-agent,
nodejs, playwright-browser, agent-workbench,
postgres-fixture). See recipes/README.md.
bench/ Benchmark harness, chart generators, results
docs/ API.md, SECURITY.md, RUNBOOK.md
Alpha. The fork-on-write primitive, controller daemon, REST API, auth, audit logging, cgroup memory limits, Prometheus metrics, and Python SDK are in place and exercised by 25 unit + integration tests in CI. On-disk formats and API shapes may still change before 1.0.
Production-readiness items not yet in this release:
- Multi-node scheduling (one daemon = one host).
- Default-deny egress on per-child netns (today: shared MASQUERADE rule; users wanting an allow-list policy add their own `iptables` rules per netns).
- `cpu.max`, `io.max`, `pids.max` quotas beyond the existing `memory.max`.
- Third-party security audit.
Roadmap (v0.3 candidates + production-readiness gaps): docs/ROADMAP.md. Issue-level tracking: GitHub issues. Release notes per version: CHANGELOG.md. Security posture and past advisories: docs/SECURITY.md.
0.1.4 contains daemon security fixes. Two HIGH-class validation gaps in `POST /v1/sandboxes` (path-traversal via `snapshot_tag`) and `packaging/k8s/` (placeholder bearer token accepted at startup) affected 0.1.0–0.1.3 inclusive; users on those versions should upgrade. Full advisories in docs/SECURITY.md#past-advisories. 0.1.3 also fixed a CLI `--tag` path-traversal (affected 0.1.0–0.1.2).
Pull requests welcome. Before opening one, please:
- Open an issue describing what you want to change. APIs are still moving; we'd rather align early than ask you to rewrite the patch.
- Run `cargo fmt --all && cargo clippy --all-targets -- -D warnings && cargo test --all` locally.
- Sign off your commits (`git commit -s`).

