`sandboxset`: thread-safe sandbox pool (issue #217) by AmiBuch · Pull Request #425 · open-lambda/open-lambda

AmiBuch · 2026-03-03T19:20:49Z

`sandboxset`: thread-safe sandbox pool (issue #217)

API

GetOrCreateUnpaused returns a *SandboxRef — a handle that wraps the
sandbox with a back-pointer to its parent set. The caller uses
ref.Sandbox() to access the container. When done, set ref.Broken = true
if the sandbox is unhealthy, then always call ref.Put() — it recycles
the sandbox if healthy or destroys it if broken.

set, _ := sandboxset.New(&sandboxset.Config{
    Pool:        myPool,
    CodeDir:     "/path/to/lambda",
    ScratchDirs: myScratchDirs,
})

ref, _ := set.GetOrCreateUnpaused()   // create or reuse a sandbox
sb := ref.Sandbox()                    // access the underlying container
// ... handle request using sb.Client() ...

if broken {
    ref.Broken = true                  // mark unhealthy before returning
}
ref.Put()                              // recycles if healthy; destroys if Broken

set.Close()                            // tear down everything

For explicit teardown with a reason, ref.Destroy("reason") is also available.

SandboxRef

type SandboxRef struct {
	set    *sandboxSetImpl
	Broken bool
	sb    sandbox.Sandbox
	inUse bool
}

func (r *SandboxRef) Sandbox() sandbox.Sandbox    // access the container
func (r *SandboxRef) Put() error                   // return to pool (or destroy if Broken)
func (r *SandboxRef) Destroy(reason string) error  // explicitly destroy and remove

SandboxSet interface

type SandboxSet interface {
    GetOrCreateUnpaused() (*SandboxRef, error)
    Close() error
}

Config

type Config struct {
    Pool        sandbox.SandboxPool  // creates/destroys sandboxes
    Parent      SandboxSet           // optional parent set to fork from
    IsLeaf      bool                 // marks sandboxes as non-forkable
    CodeDir     string               // Lambda handler code directory
    Meta        *sandbox.SandboxMeta // runtime config (memory, packages, etc.)
    ScratchDirs *common.DirMaker     // creates a writable dir per new sandbox
}

Flow

     [created]
         |
         v
     [paused]  <---+
         |         |
         v         |
     [in-use]  ----+  (Put, Broken=false)
         |
         v
   [destroyed]     (Put with Broken=true / Destroy / Close / error)

File structure

go/worker/sandboxset/
    api.go           — SandboxSet interface, Config, New()
    sandboxset.go    — SandboxRef, sandboxSetImpl, all methods
    tests/
        sandboxset_test.go              — Unit tests (MockSandboxPool)
        sandboxset_integration_test.go  — Integration tests (real DockerPool)

Dependencies

sandboxset is a thin layer on top of sandbox — no new abstractions:

sandboxset
    │
    ├── sandbox.Sandbox      (3 of 11 methods used: Pause, Unpause, Destroy)
    ├── sandbox.SandboxPool  (1 method used: Create)
    ├── sandbox.SandboxMeta  (passed through to Create, never read)
    └── common.DirMaker      (1 method used: Make)

Testing

Unit tests (3 tests): Use MockSandboxPool from sandbox/mock.go.
Fast, no Docker needed.
```
cd go && go test ./worker/sandboxset/tests/ -v -race -count=1
```
Integration tests (4 tests): Use real DockerPool with ol-min
image. Gated with //go:build integration. Verify real containers are
created, paused, unpaused, and destroyed.
```
cd go && go test ./worker/sandboxset/tests/ -v -tags=integration -count=1
```

Next steps

Step 2: Replace LambdaInstance goroutines with a SandboxSet
Step 3: Use SandboxSet as the node in the zygote tree

…ace and microservice like architecture

go/worker/sandboxset/sandboxset.go

go/worker/sandboxset/api.go

tylerharter

I think I forgot to click submit on the feedback, sorry!

go/worker/sandboxset/api.go

go/worker/sandboxset/tests/sandboxset_test.go

go/worker/sandbox/mock.go

tylerharter

Good work! Getting closer.

go/worker/sandboxset/api.go

tylerharter · 2026-03-13T18:31:28Z

go/worker/sandboxset/api.go

+// Callers just ask for a sandbox and don't worry about whether it is
+// freshly created or recycled from a previous request.
+//
+// Sandbox lifecycle inside a SandboxSet:


This is the API, they don't need to know about internals.

The usage is good because that's what API users need. The internals in the flow diagram are not something users of your code should worry about. Those are internal details.

go/worker/sandboxset/api.go

go/worker/sandboxset/sandboxset.go

tylerharter

Great work!

tylerharter · 2026-03-20T18:42:42Z

go/worker/sandboxset/sandboxset.go

+	set       *sandboxSetImpl
+	Broken    bool        // public: caller sets true if request failed; Put will destroy instead of recycle
+	inUse     bool        // true when checked out; false when idle in pool
+	destroyed atomic.Bool // set atomically after Destroy(); guards against concurrent Close + Put


no atomic, then mutex of the containing sandbox set should be used to protect this.

instead of destroyed, perhaps sb can just be nil?

tylerharter · 2026-03-20T18:44:59Z

go/worker/sandboxset/sandboxset.go

+// sandboxSetImpl is the private concrete type returned by New.
+// All mutable state is guarded by mu.
+type sandboxSetImpl struct {
+	mu     sync.Mutex


pattern I like:

things not protected by the lock
mu synt.Mutex // protects below members
things protected by the lock

tylerharter · 2026-03-20T18:58:35Z

go/worker/sandboxset/sandboxset.go

+
+// claimIdle returns an idle ref from the pool, marking it inUse.
+// Caller must hold s.mu.
+func (s *sandboxSetImpl) claimIdle() *SandboxRef {


Recommendation: 3 paths

sandbox ref with healthy sandbox

sandbox ref with nil sandbox (because there was old one that got deleted)

sandbox ref with nil sandbox (because we're adding to the pool)

When a sandbox ref comes back with nil sandbox, GetOrCreateUnpaused can be responsible for giving it a new sandbox.

I was imagining those 3 paths in claimIdle. And I think claimIdle can do the locking. We can condense claimIdle, tryClaimIdle, and appendNilRef to just this function.

tylerharter

API is looking great! Lets iterate on the implementation a bit more.

tylerharter · 2026-03-27T17:20:13Z

go/worker/sandboxset/api.go

+// Callers just ask for a sandbox and don't worry about whether it is
+// freshly created or recycled from a previous request.
+//
+// Sandbox lifecycle inside a SandboxSet:


The usage is good because that's what API users need. The internals in the flow diagram are not something users of your code should worry about. Those are internal details.

tylerharter · 2026-03-27T17:22:06Z

go/worker/sandboxset/sandboxset.go

+	return r.set.Put(r.sb)
+}
+
+// Destroy removes the sandbox from its parent set and destroys it.


I still don't think we need it. Please justify why, or remove.

tylerharter · 2026-03-27T17:24:03Z

go/worker/sandboxset/sandboxset.go

+	closed bool
+}
+
+func newSandboxSet(cfg *Config) (*sandboxSetImpl, error) {


We should return errors for things we expect to happen during normal operation. For bugs, we actually want to just crash. So newSandboxSet should not return an error. You can panic if you get bad args. Or, just let the panic happen itself later when there is a nil deref.

tylerharter · 2026-03-27T17:24:43Z

go/worker/sandboxset/api.go

+
+// New creates a SandboxSet from cfg. Returns an error if any of
+// Pool, CodeDir, or ScratchDirs are missing.
+func New(cfg *Config) (SandboxSet, error) {


No error, should always succeed (or crash).

tylerharter · 2026-03-27T17:26:03Z

go/worker/sandboxset/sandboxset.go

+}
+
+// Put returns the sandbox to its parent set, or destroys it if Broken is true.
+func (r *SandboxRef) Put() error {


I like this being a very simple wrapper around the private put method in sandboxSetImpl. The code is easier to understand if locking is a detail internal to sandboxSetImpl, not split across multiple classes.

The only purpose/logic of this function: identifying which sandboxset should do the put.

tylerharter · 2026-03-27T17:37:16Z

go/worker/sandboxset/sandboxset.go

+// Fast path: claim an idle ref with an existing sandbox via tryClaimIdle, then Unpause outside the lock.
+// Slow path (nil ref): the claimed ref has no sandbox (either destroyed or newly added). A new sandbox is created outside the lock and assigned to the ref.
+func (s *sandboxSetImpl) GetOrCreateUnpaused() (*SandboxRef, error) {
+	for {


Let's not do retry for now. lambdaInstance.go already has retry logic for if Unpause fails. When we use sandbox set at a higher layer, then we can make a coherent decision about the 1 place to do retry (in sandbox set, or in the user of sandbox set).

tylerharter · 2026-03-27T17:39:20Z

go/worker/sandboxset/sandboxset.go

+			nilRef = ref
+		}
+	}
+	if nilRef != nil {


The naming is confusing me here. Why wouldn't a nilRef be nil?

tylerharter · 2026-03-27T17:41:07Z

go/worker/sandboxset/sandboxset.go

+
+// claimIdle returns an idle ref from the pool, marking it inUse.
+// Caller must hold s.mu.
+func (s *sandboxSetImpl) claimIdle() *SandboxRef {


I was imagining those 3 paths in claimIdle. And I think claimIdle can do the locking. We can condense claimIdle, tryClaimIdle, and appendNilRef to just this function.

tylerharter · 2026-03-27T17:43:26Z

go/worker/sandboxset/sandboxset.go

+			}
+		}
+
+		s.mu.Lock()


Do we need locking? I was imagining the design would be that when not inUse, any thread might want to use it, so locking would be needed. But when inUse, it would effectively be reserved by a single thread, so no more locking until returned to not inUse.

tylerharter · 2026-03-27T17:45:51Z

go/worker/sandboxset/sandboxset.go

+// GetOrCreateUnpaused implements SandboxSet.
+// Fast path: claim an idle ref with an existing sandbox via tryClaimIdle, then Unpause outside the lock.
+// Slow path (nil ref): the claimed ref has no sandbox (either destroyed or newly added). A new sandbox is created outside the lock and assigned to the ref.
+func (s *sandboxSetImpl) GetOrCreateUnpaused() (*SandboxRef, error) {


Really, the logic should break into two steps:

get a SandboxRef (with our without a Sandbox)

make sure that SandboxRef has a healthy, unpaused Sandbox inside it

Step 1 requires locking, step 2 does not.

GetOrCreateUnpaused could be very short, calling functions corresponding to the two steps. Or, GetOrCreateUnpaused could call step 1, then do step 2 inline.

AmiBuch and others added 7 commits February 19, 2026 14:09

refactor: separation of package, abstraction included, simpler interf…

4d351bc

…ace and microservice like architecture

Merge branch 'open-lambda:main' into main

47d5d10

refactor: simpler, much much simpler

8037432

fix: simpler diagram

a357a25

fix: commenting, some edge cases

e7fda1f

Merge branch 'open-lambda:main' into main

13ba7a0

fix: recursive Get, DirMaker panic, put after close documented

f85697d

tylerharter requested changes Mar 5, 2026

View reviewed changes

go/worker/sandboxset/sandboxset.go Show resolved Hide resolved

go/worker/sandboxset/api.go Outdated Show resolved Hide resolved

go/worker/sandboxset/api.go Outdated Show resolved Hide resolved

refactor: changed the function distribution and it return type

ae2ac69

tylerharter reviewed Mar 13, 2026

View reviewed changes

go/worker/sandboxset/api.go Outdated Show resolved Hide resolved

go/worker/sandboxset/api.go Outdated Show resolved Hide resolved

go/worker/sandboxset/tests/sandboxset_test.go Show resolved Hide resolved

go/worker/sandbox/mock.go Show resolved Hide resolved

tylerharter reviewed Mar 13, 2026

View reviewed changes

refactor: simpler interface and less comments, fixed race conditions too

34638f4

tylerharter requested changes Mar 20, 2026

View reviewed changes

AmiBuch and others added 2 commits March 21, 2026 14:33

Merge branch 'open-lambda:main' into main

83e27f4

fix: removed unnecessary fields

01ae0cf

AmiBuch marked this pull request as ready for review March 24, 2026 19:17

tylerharter requested changes Mar 27, 2026

View reviewed changes

Conversation

AmiBuch commented Mar 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

sandboxset: thread-safe sandbox pool (issue #217)

API

SandboxRef

SandboxSet interface

Config

Flow

File structure

Dependencies

Testing

Next steps

Uh oh!

Uh oh!

Uh oh!

Uh oh!

tylerharter left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

tylerharter left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

tylerharter left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tylerharter left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

AmiBuch commented Mar 3, 2026 •

edited

Loading

`sandboxset`: thread-safe sandbox pool (issue #217)