perf(webapp): parallelize streaming batch-item ingest#3777

Open
matt-aitken wants to merge 4 commits into main from feature/tri-10273-managed-runs-streaming-batchtriggerandwait-fails-with-408

Conversation

@matt-aitken

@matt-aitken matt-aitken commented May 29, 2026

Member

Problem

The item-streaming endpoint of the two-phase batch API (POST /api/v3/batches/:batchId/items) processed streamed items strictly sequentially. For a batch of many large payloads, each offloaded to object storage inline, this serialized N object-store round-trips inside a single request and could exceed Node's default server.requestTimeout (300s). The webapp then returned 408, which the SDK reads as "408 terminated" and retries up to 5 times, turning a slow ingest into a failure that takes tens of minutes to surface.

Fix

Ingest now runs through p-map over the NDJSON async iterable with bounded concurrency (STREAMING_BATCH_INGEST_CONCURRENCY, default 10):

  • p-map pulls lazily from the stream, so at most `concurrency` items are read and in-flight at once. Peak memory stays bounded to roughly concurrency × STREAMING_BATCH_ITEM_MAXIMUM_SIZE, and request-body backpressure is preserved.
  • Set the env var to 1 for fully sequential ingestion (escape hatch).
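
To make the lazy-pull behaviour concrete, here is a minimal sketch of bounded-concurrency mapping over an async iterable. It is a hand-written stand-in rather than p-map or the code in this PR: `mapWithConcurrency`, `numbers`, and `demo` are hypothetical names, and the 5ms timer only simulates an object-store offload.

```typescript
// Hypothetical stand-in for bounded-concurrency mapping over an async iterable.
// This is NOT p-map and NOT the PR's code; all names are illustrative.
async function mapWithConcurrency<T, R>(
  source: AsyncIterable<T>,
  concurrency: number,
  mapper: (item: T) => Promise<R>
): Promise<R[]> {
  if (!Number.isInteger(concurrency) || concurrency < 1) {
    throw new RangeError("concurrency must be a positive integer");
  }
  const iterator = source[Symbol.asyncIterator]();
  const results: R[] = [];
  let nextSlot = 0;
  let exhausted = false;

  // Serialize pulls so only one iterator.next() call is outstanding at a time.
  let lastPull: Promise<unknown> = Promise.resolve();
  const pullNext = (): Promise<IteratorResult<T>> => {
    const step = lastPull.then(() => iterator.next());
    lastPull = step.catch(() => undefined);
    return step;
  };

  // Each worker pulls a new item only after finishing its previous one, so at
  // most `concurrency` items have been read from the source but not completed.
  const worker = async (): Promise<void> => {
    while (!exhausted) {
      const step = await pullNext();
      if (step.done) {
        exhausted = true;
        return;
      }
      const slot = nextSlot++; // slots are assigned in emit order
      results[slot] = await mapper(step.value);
    }
  };

  await Promise.all(Array.from({ length: concurrency }, () => worker()));
  return results;
}

async function* numbers(count: number): AsyncGenerator<number> {
  for (let i = 0; i < count; i++) yield i;
}

// Runs 20 simulated items and records the highest number in flight at once.
async function demo(concurrency: number): Promise<{ out: number[]; peak: number }> {
  let inFlight = 0;
  let peak = 0;
  const out = await mapWithConcurrency(numbers(20), concurrency, async (n) => {
    inFlight++;
    peak = Math.max(peak, inFlight);
    await new Promise<void>((resolve) => setTimeout(resolve, 5)); // simulated offload
    inFlight--;
    return n * 2;
  });
  return { out, peak };
}
```

Because a worker never reads ahead of its own in-flight item, the number of buffered items is capped by `concurrency`, which is where the memory bound of roughly concurrency × maximum item size comes from. With `concurrency` set to 1 the sketch degenerates to strictly sequential processing.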

Why this is safe (ordering and idempotency unchanged)

  • Ordering derives from each item's index (enqueue timestamp = batch.createdAt + index), not enqueue order.
  • Dedup is atomic per index in enqueueBatchItem.
  • The NDJSON parser now stamps oversized-item markers with their emit position, removing the consumer's sequential lastIndex assumption (the only order-dependent bit).
  • The count-check and conditional-seal path is untouched.
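
The first two points can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: `InMemoryBatchQueue` is a hypothetical stand-in (the real enqueueBatchItem is Redis-backed and its signature is not shown here), and a plain `Map` lookup stands in for the atomic per-index check.

```typescript
interface BatchItem {
  index: number;
  payload: string;
}

// Hypothetical in-memory stand-in for the queue; not the PR's implementation.
class InMemoryBatchQueue {
  private readonly batchCreatedAtMs: number;
  private readonly entries = new Map<number, { enqueueTimestamp: number; payload: string }>();

  constructor(batchCreatedAtMs: number) {
    this.batchCreatedAtMs = batchCreatedAtMs;
  }

  // Returns true when the item is accepted, false when its index was already enqueued.
  enqueue(item: BatchItem): boolean {
    if (this.entries.has(item.index)) return false;
    this.entries.set(item.index, {
      // The timestamp depends only on the item's index, never on arrival order.
      enqueueTimestamp: this.batchCreatedAtMs + item.index,
      payload: item.payload,
    });
    return true;
  }

  // Run order follows the enqueue timestamp, i.e. the item index.
  inRunOrder(): string[] {
    return Array.from(this.entries.values())
      .sort((a, b) => a.enqueueTimestamp - b.enqueueTimestamp)
      .map((entry) => entry.payload);
  }
}

// Items arrive out of order, and index 0 is sent twice (as on a retry).
const queue = new InMemoryBatchQueue(1000);
const accepted = [
  { index: 2, payload: "c" },
  { index: 0, payload: "a" },
  { index: 1, payload: "b" },
  { index: 0, payload: "a (retry)" },
].map((item) => queue.enqueue(item));

console.log(accepted, queue.inRunOrder());
```

In this model the duplicate for index 0 is rejected and the run order comes out as a, b, c even though the items were enqueued as c, a, b, which is the property that lets items be processed concurrently.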

Scope

This speeds up every batch ingested through the streaming endpoint, not just large-payload batches. Each item does a per-item Redis enqueue regardless of size, and those now overlap. Large payloads benefit most because they add an object-store offload round-trip on top of the enqueue.

Verification

Added an integration test (streamBatchItems.test.ts) that drives the real service against Postgres + Redis + RunEngine and times a 150-item batch at increasing concurrency. Object-store offload is modelled as a fixed per-item latency (local round-trips are too small to compare meaningfully):

runCount=150
  large payloads (10ms/item offload):
    concurrency=1   1739ms
    concurrency=10  192ms  (9.1x faster)
    concurrency=50  57ms   (30.7x faster)
  small payloads (Redis enqueue only, no offload):
    concurrency=1   90ms
    concurrency=10  24ms   (3.7x faster)

The test asserts correctness at every concurrency (all items accepted, sealed, enqueued exactly once), that parallel ingest beats the sequential floor, and that the small-payload case is strictly faster than sequential, so the win is not specific to large payloads.
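
As a rough sanity check on the large-payload numbers, the measured times can be compared against an idealized floor. The sketch below assumes the fixed 10ms offload dominates and that items complete in full waves of `concurrency`; `idealFloorMs` is a hypothetical helper, not part of the test suite.

```typescript
// Idealized lower bound: items complete in waves of `concurrency`,
// each wave costing one fixed per-item latency. An assumption, not a measurement.
function idealFloorMs(runCount: number, concurrency: number, perItemLatencyMs: number): number {
  return Math.ceil(runCount / concurrency) * perItemLatencyMs;
}

// [concurrency, measured ms] pairs copied from the large-payload results above.
const measuredMs: Array<[number, number]> = [
  [1, 1739],
  [10, 192],
  [50, 57],
];

for (const [concurrency, measured] of measuredMs) {
  const floor = idealFloorMs(150, concurrency, 10);
  console.log(
    `concurrency=${concurrency} floor=${floor}ms measured=${measured}ms overhead=${measured - floor}ms`
  );
}
```

Under this model the floors are 1500ms, 150ms, and 30ms, so the measured times sit 239ms, 42ms, and 27ms above the ideal, which is consistent with the per-item Redis enqueue and other fixed costs also overlapping as concurrency rises.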

Also exercised end-to-end over real HTTP against a local server: a 20-item batch (12MB body) ingests and seals, a re-stream of the sealed batch returns sealed: true with zero re-accepted items (idempotent retry), and an oversized item still seals at its correct index.

Existing coverage stays green: concurrent ingest of a 100-item batch, in-flight processing never exceeding the configured concurrency, concurrent dedup on streaming retry, and emit-position marker indexing.
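
The emit-position marker indexing can be sketched as follows. This is a simplified illustration, not the PR's parser: `parseNdjsonLines` and `ParsedLine` are hypothetical, the size limit is measured in string length rather than bytes, and the sketch assumes the client streams items in index order so that an oversized line's emit position equals its index.

```typescript
type ParsedLine =
  | { kind: "item"; index: number; payload: unknown }
  | { kind: "oversized"; index: number; length: number };

// Hypothetical parser: oversized lines are not JSON-parsed; instead a marker is
// emitted that carries the line's emit position as its index.
function* parseNdjsonLines(lines: string[], maxLineLength: number): Generator<ParsedLine> {
  let emitPosition = 0;
  for (const line of lines) {
    if (line.trim() === "") continue; // blank lines do not consume a position
    if (line.length > maxLineLength) {
      yield { kind: "oversized", index: emitPosition, length: line.length };
    } else {
      const parsed = JSON.parse(line) as { index: number; payload: unknown };
      yield { kind: "item", index: parsed.index, payload: parsed.payload };
    }
    emitPosition++;
  }
}

const lines = [
  JSON.stringify({ index: 0, payload: "small" }),
  JSON.stringify({ index: 1, payload: "x".repeat(200) }), // exceeds the limit below
  JSON.stringify({ index: 2, payload: "small" }),
];

const emitted = Array.from(parseNdjsonLines(lines, 100));

// A consumer that sees entries in a different order can still place the marker,
// because the index travels with the marker instead of being inferred from a
// running "last index" counter.
const outOfOrder = emitted.slice().reverse();
const oversizedMarker = outOfOrder.find((entry) => entry.kind === "oversized");
console.log(oversizedMarker);
```

Stamping the index at parse time is what removes the consumer's dependence on sequential processing: the marker for the second line reports index 1 regardless of when it is handled.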

Follow-ups (not in this PR)

  • SDK pre-offload of large item payloads (send application/store refs instead of raw blobs) to remove object-store work from the request hot path and shrink the request body.
  • Optional server.requestTimeout bump as a safety net.

@changeset-bot

changeset-bot Bot commented May 29, 2026


⚠️ No Changeset found

Latest commit: d566bcb

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

@coderabbitai

coderabbitai Bot commented May 29, 2026

Contributor

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

This PR changes Phase 2 streaming batch ingest to process NDJSON items with bounded concurrency (p-map) instead of strict sequential offload+enqueue. It adds STREAMING_BATCH_INGEST_CONCURRENCY (default 10) and wires it into the API route, injects an optional payloadProcessor for tests, extracts per-item logic into a concurrent-safe #processItem, and updates the NDJSON parser to track emit positions and backfill oversized-item indices when extraction fails. Ordering, per-index deduplication, sealing, and idempotency semantics are preserved. Tests cover concurrency bounds, deduplication, oversized-index validation, and parser backfill behavior.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name | Status | Explanation
--- | --- | ---
Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request.
Title check | ✅ Passed | The title clearly and concisely summarizes the main change: parallelize streaming batch-item ingest, which is the core objective of the PR.
Description check | ✅ Passed | The PR description covers all required template sections: problem statement, fix explanation, safety justification, scope, and verification. It also includes a detailed follow-ups section.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.


@devin-ai-integration devin-ai-integration Bot left a comment

Contributor

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 4 additional findings.

coderabbitai[bot]

This comment was marked as resolved.

@mintlify

mintlify Bot commented May 29, 2026

Contributor

Preview deployment for your docs. Learn more about Mintlify Previews.

Project | Status | Preview | Updated (UTC)
--- | --- | --- | ---
trigger | 🟢 Ready | View Preview | May 29, 2026, 6:09 PM

💡 Tip: Enable Workflows to automatically generate PRs for you.

@pkg-pr-new

pkg-pr-new Bot commented Jun 6, 2026


Open in StackBlitz

@trigger.dev/build

npm i https://pkg.pr.new/@trigger.dev/build@5f3b930

trigger.dev

npm i https://pkg.pr.new/trigger.dev@5f3b930

@trigger.dev/core

npm i https://pkg.pr.new/@trigger.dev/core@5f3b930

@trigger.dev/plugins

npm i https://pkg.pr.new/@trigger.dev/plugins@5f3b930

@trigger.dev/python

npm i https://pkg.pr.new/@trigger.dev/python@5f3b930

@trigger.dev/react-hooks

npm i https://pkg.pr.new/@trigger.dev/react-hooks@5f3b930

@trigger.dev/redis-worker

npm i https://pkg.pr.new/@trigger.dev/redis-worker@5f3b930

@trigger.dev/rsc

npm i https://pkg.pr.new/@trigger.dev/rsc@5f3b930

@trigger.dev/schema-to-json

npm i https://pkg.pr.new/@trigger.dev/schema-to-json@5f3b930

@trigger.dev/sdk

npm i https://pkg.pr.new/@trigger.dev/sdk@5f3b930

commit: 5f3b930

@matt-aitken matt-aitken force-pushed the feature/tri-10273-managed-runs-streaming-batchtriggerandwait-fails-with-408 branch from de3489f to 999ccad on June 8, 2026 10:20
matt-aitken and others added 2 commits June 10, 2026 15:13
…273)

Phase 2 of the v3 streaming batch API (POST /api/v3/batches/:batchId/items)
processed streamed items strictly sequentially. For batches of many large
payloads — each offloaded to object storage inline — this serialized N object-store
round-trips inside one request, blowing past Node's default 300s server.requestTimeout.
The webapp then returned 408, which the SDK reads as "408 terminated" and retries 5x,
turning a slow ingest into a ~26-minute failure.

Ingest now runs through p-map over the NDJSON async iterable with bounded concurrency
(STREAMING_BATCH_INGEST_CONCURRENCY, default 10). p-map pulls lazily, so at most
`concurrency` items are read/in-flight at once — bounding peak memory to roughly
concurrency x STREAMING_BATCH_ITEM_MAXIMUM_SIZE while preserving stream backpressure.
Set the env to 1 for fully sequential ingestion.

Safe by construction: run order derives from each item's index (enqueue timestamp =
batch.createdAt + index), and enqueueBatchItem dedups atomically per index — neither
depends on processing order. The NDJSON parser now stamps oversized-item markers with
their emit position, removing the consumer's sequential lastIndex assumption. The
count-check + conditional seal path is unchanged.

Tests: bounded-concurrency ingest of a 100-item batch, in-flight cap assertion,
concurrent dedup on Phase 2 retry, and emit-position marker indexing. Full existing
sealing/idempotency suite still green (42/42).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- Enforce positive STREAMING_BATCH_INGEST_CONCURRENCY in the env schema
  (.int().positive()) — p-map requires concurrency >= 1, so 0/negative would
  throw at runtime.
- Apply the same out-of-range index guard to oversized-item markers as normal
  items, so an oversized item with index >= runCount returns a 4xx instead of
  creating a stray pre-failed run. Covered by a new test.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
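
The positive-integer guard described in the commit above can be sketched without a schema library. This is an illustrative stand-in for the `.int().positive()` check, not the webapp's env schema: `parseIngestConcurrency` is a hypothetical helper, and the fallback of 10 mirrors the default stated in the PR description.

```typescript
// Hypothetical validator mirroring an integer-and-positive constraint.
// Rejects 0, negatives, fractions, and non-numeric input up front, so an
// invalid value fails at startup instead of inside the concurrency mapper.
function parseIngestConcurrency(raw: string | undefined, fallback: number = 10): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value <= 0) {
    throw new RangeError(
      `STREAMING_BATCH_INGEST_CONCURRENCY must be a positive integer, got "${raw}"`
    );
  }
  return value;
}

console.log(parseIngestConcurrency(undefined)); // falls back to the default
console.log(parseIngestConcurrency("1")); // sequential escape hatch
```

Validating at configuration time keeps the failure close to its cause: a misconfigured value is reported once when the process starts, rather than on every streaming request.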
@matt-aitken matt-aitken force-pushed the feature/tri-10273-managed-runs-streaming-batchtriggerandwait-fails-with-408 branch from 999ccad to 5f3b930 on June 10, 2026 14:13
@matt-aitken matt-aitken changed the title from "perf(webapp): parallelize Phase 2 streaming batch-item ingest (TRI-10273)" to "perf(webapp): parallelize streaming batch-item ingest" on Jun 10, 2026