Skip to content

fix(sandbox): scope provisioner PVC data by user#2973

Merged
WillemJiang merged 5 commits into
bytedance:mainfrom
shenlihust:fix/provisioner-user-scoped-pvc
May 17, 2026
Merged

fix(sandbox): scope provisioner PVC data by user#2973
WillemJiang merged 5 commits into
bytedance:mainfrom
shenlihust:fix/provisioner-user-scoped-pvc

Conversation

@shenlihust
Copy link
Copy Markdown
Contributor

Summary

Fix provisioner-mode sandbox PVC mounting so user-data subPaths include the effective user id instead of being keyed only by thread id.

Root Cause

RemoteSandboxBackend created provisioner sandboxes with only sandbox_id and thread_id. When USERDATA_PVC_NAME was configured, the provisioner mounted all thread user-data under a thread-only subPath, which could collide across users and bypass the user isolation model used by backend filesystem paths.

Changes

  • Forward the effective user id from RemoteSandboxBackend to the provisioner create API.
  • Add user_id validation and a backwards-compatible default to the provisioner request model.
  • Change PVC user-data subPath to ./deer-flow/users/{user_id}/threads/{thread_id}/user-data.
  • Update provisioner docs and regression coverage for user-scoped PVC mounts.

Tests

  • cd backend && uv run --group dev ruff check packages/harness/deerflow/community/aio_sandbox/remote_backend.py tests/test_aio_sandbox_provider.py tests/test_provisioner_pvc_volumes.py ../docker/provisioner/app.py
  • cd backend && uv run --group dev pytest tests/test_aio_sandbox_provider.py tests/test_provisioner_pvc_volumes.py -q
  • git diff --check

@CLAassistant
Copy link
Copy Markdown

CLAassistant commented May 15, 2026

CLA assistant check
All committers have signed the CLA.

@WillemJiang WillemJiang added the reviewing The PR is in reviewing status label May 15, 2026
@WillemJiang WillemJiang requested a review from Copilot May 15, 2026 14:52
@WillemJiang WillemJiang added this to the 2.0-m1 milestone May 15, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a sandbox isolation gap in provisioner mode by scoping PVC-mounted user-data to the effective user_id (not just thread_id), aligning the provisioner’s PVC layout with DeerFlow’s per-user filesystem isolation model.

Changes:

  • Forward user_id from RemoteSandboxBackend to the provisioner POST /api/sandboxes request payload.
  • Add user_id (validated, with a backwards-compatible default) to the provisioner request model and thread/user validation helpers.
  • Update PVC subPath for user-data mounts to ./deer-flow/users/{user_id}/threads/{thread_id}/user-data and add regression tests for user-scoped subPaths.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
docker/provisioner/README.md Updates config docs to describe the new user-scoped PVC subPath layout.
docker/provisioner/app.py Adds user_id to the create request model and applies user-scoped PVC subPath when mounting user-data.
docker/docker-compose-dev.yaml Updates the inline comment to reflect the new user-scoped PVC subPath behavior.
backend/packages/harness/deerflow/community/aio_sandbox/remote_backend.py Sends user_id when creating sandboxes via the provisioner API.
backend/tests/test_provisioner_pvc_volumes.py Extends provisioner PVC tests to assert user-scoped subPath and default fallback behavior.
backend/tests/test_aio_sandbox_provider.py Adds a regression test ensuring RemoteSandboxBackend forwards the effective user_id.

Comment on lines 138 to 144
resp = requests.post(
f"{self._provisioner_url}/api/sandboxes",
json={
"sandbox_id": sandbox_id,
"thread_id": thread_id,
"user_id": get_effective_user_id(),
},
Comment on lines 139 to 143
| `THREADS_HOST_PATH` | - | **Host machine** path to threads data directory (must be absolute) |
| `SKILLS_PVC_NAME` | empty (use hostPath) | PVC name for skills volume; when set, sandbox Pods use PVC instead of hostPath |
| `USERDATA_PVC_NAME` | empty (use hostPath) | PVC name for user-data volume; when set, uses PVC with `subPath: threads/{thread_id}/user-data` |
| `USERDATA_PVC_NAME` | empty (use hostPath) | PVC name for user-data volume; when set, uses PVC with `subPath: ./deer-flow/users/{user_id}/threads/{thread_id}/user-data` |
| `KUBECONFIG_PATH` | `/root/.kube/config` | Path to kubeconfig **inside** the provisioner container |
| `NODE_HOST` | `host.docker.internal` | Hostname that backend containers use to reach host NodePorts |
@shenlihust shenlihust force-pushed the fix/provisioner-user-scoped-pvc branch from 9fd1db1 to 7f88bc8 Compare May 15, 2026 15:05
@WillemJiang
Copy link
Copy Markdown
Collaborator

@shenlihust thanks for the contribution. Here are some comments for the code.

  1. Data migration concern (important). Existing deployments with PVC data at the old path threads/{thread_id}/user-data will lose access to that data after upgrading, because the new subPath is
    ./deer-flow/users/default/threads/{thread_id}/user-data. The PR description doesn't mention a migration strategy. Consider:

    • Adding a startup check or migration script that moves existing PVC data from old to new paths.
    • Or documenting that operators must manually migrate PVC contents before upgrading.
  2. Redundant validation. _validate_thread_id and _validate_user_id are called inside _build_volume_mounts and _build_pod, but these functions are only reachable from create_sandbox() which already validates via the Pydantic CreateSandboxRequest model (which has pattern= on both fields). The double validation is harmless but adds noise. Either remove the inner calls or add a brief comment noting it's defense-in-depth.

  3. Leading ./ in subPath. The subPath ./deer-flow/users/... uses a relative ./ prefix. Kubernetes subPath is always relative to the volume root, so the ./ is technically redundant. It works, but it's unconventional — K8s docs and examples never use a leading ./. Consider dropping it to match convention: deer-flow/users/{user_id}/threads/{thread_id}/user-data.

  4. _validate_thread_id called again in _build_volume_mounts. This is called even though thread_id was already validated by the Pydantic model. More importantly, _build_volume_mounts now validates thread_id redundantly but _build_volumes does not — inconsistent.

  5. Minor: f-string logging. Line ~476: f"Received request to create sandbox '{sandbox_id}' for thread '{thread_id}' user '{user_id}'" — uses eager f-string interpolation instead oflogger.info("... %s ... %s ... %s", sandbox_id, thread_id, user_id).The latter is standard practice for performance (avoids string formatting when log level is disabled). This was pre-existing for the other fields but the PR extends the pattern.

@shenlihust
Copy link
Copy Markdown
Contributor Author

@WillemJiang I pushed a follow-up commit without force-pushing: 2a0c33b.

This update addresses the review feedback by:

  • removing the redundant helper-level thread_id/user_id validation and relying on the Pydantic request model validation at the API boundary;
  • removing the leading ./ from the PVC subPath;
  • switching the create_sandbox log line to lazy logger formatting;
  • updating the provisioner README to document the optional user_id request field and the user-scoped PVC subPath;
  • documenting the PVC upgrade path: existing PVC-backed deployments should run the existing scripts/migrate_user_isolation.py migration against the gateway DeerFlow base directory before relying on the new subPath.

I also re-ran the targeted backend tests and ruff checks locally.

@WillemJiang WillemJiang merged commit e74e126 into bytedance:main May 17, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

reviewing The PR is in reviewing status

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants