fix(sandbox): scope provisioner PVC data by user#2973
Conversation
There was a problem hiding this comment.
Pull request overview
This PR fixes a sandbox isolation gap in provisioner mode by scoping PVC-mounted user-data to the effective user_id (not just thread_id), aligning the provisioner’s PVC layout with DeerFlow’s per-user filesystem isolation model.
Changes:
- Forward
user_idfromRemoteSandboxBackendto the provisionerPOST /api/sandboxesrequest payload. - Add
user_id(validated, with a backwards-compatible default) to the provisioner request model and thread/user validation helpers. - Update PVC
subPathfor user-data mounts to./deer-flow/users/{user_id}/threads/{thread_id}/user-dataand add regression tests for user-scoped subPaths.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| docker/provisioner/README.md | Updates config docs to describe the new user-scoped PVC subPath layout. |
| docker/provisioner/app.py | Adds user_id to the create request model and applies user-scoped PVC subPath when mounting user-data. |
| docker/docker-compose-dev.yaml | Updates the inline comment to reflect the new user-scoped PVC subPath behavior. |
| backend/packages/harness/deerflow/community/aio_sandbox/remote_backend.py | Sends user_id when creating sandboxes via the provisioner API. |
| backend/tests/test_provisioner_pvc_volumes.py | Extends provisioner PVC tests to assert user-scoped subPath and default fallback behavior. |
| backend/tests/test_aio_sandbox_provider.py | Adds a regression test ensuring RemoteSandboxBackend forwards the effective user_id. |
| resp = requests.post( | ||
| f"{self._provisioner_url}/api/sandboxes", | ||
| json={ | ||
| "sandbox_id": sandbox_id, | ||
| "thread_id": thread_id, | ||
| "user_id": get_effective_user_id(), | ||
| }, |
| | `THREADS_HOST_PATH` | - | **Host machine** path to threads data directory (must be absolute) | | ||
| | `SKILLS_PVC_NAME` | empty (use hostPath) | PVC name for skills volume; when set, sandbox Pods use PVC instead of hostPath | | ||
| | `USERDATA_PVC_NAME` | empty (use hostPath) | PVC name for user-data volume; when set, uses PVC with `subPath: threads/{thread_id}/user-data` | | ||
| | `USERDATA_PVC_NAME` | empty (use hostPath) | PVC name for user-data volume; when set, uses PVC with `subPath: ./deer-flow/users/{user_id}/threads/{thread_id}/user-data` | | ||
| | `KUBECONFIG_PATH` | `/root/.kube/config` | Path to kubeconfig **inside** the provisioner container | | ||
| | `NODE_HOST` | `host.docker.internal` | Hostname that backend containers use to reach host NodePorts | |
9fd1db1 to
7f88bc8
Compare
|
@shenlihust thanks for the contribution. Here are some comments for the code.
|
|
@WillemJiang I pushed a follow-up commit without force-pushing: 2a0c33b. This update addresses the review feedback by:
I also re-ran the targeted backend tests and ruff checks locally. |
Summary
Fix provisioner-mode sandbox PVC mounting so user-data subPaths include the effective user id instead of being keyed only by thread id.
Root Cause
RemoteSandboxBackendcreated provisioner sandboxes with onlysandbox_idandthread_id. WhenUSERDATA_PVC_NAMEwas configured, the provisioner mounted all thread user-data under a thread-only subPath, which could collide across users and bypass the user isolation model used by backend filesystem paths.Changes
RemoteSandboxBackendto the provisioner create API.user_idvalidation and a backwards-compatible default to the provisioner request model../deer-flow/users/{user_id}/threads/{thread_id}/user-data.Tests
cd backend && uv run --group dev ruff check packages/harness/deerflow/community/aio_sandbox/remote_backend.py tests/test_aio_sandbox_provider.py tests/test_provisioner_pvc_volumes.py ../docker/provisioner/app.pycd backend && uv run --group dev pytest tests/test_aio_sandbox_provider.py tests/test_provisioner_pvc_volumes.py -qgit diff --check