The Task Execution Engine is the core orchestration system responsible for managing the complete lifecycle of task runs in Trigger.dev. It handles task triggering, queuing, execution attempts, state management, retries, concurrency control, and advanced features like checkpoints, waitpoints, and batch processing. The engine coordinates between the webapp API, Redis-backed queues, PostgreSQL state storage, and worker processes that execute user tasks.
For information about the queue system and concurrency management, see Queue Management. For details on how workers execute tasks, see Worker Execution. For batch processing specifics, see Batch Processing.
The Task Execution Engine is implemented as the RunEngine class, which coordinates multiple specialized subsystems to manage task execution. The engine is instantiated as a singleton in the webapp and relies on PostgreSQL for persistent state, Redis for queuing and locking, and background workers for asynchronous job processing.
RunEngine Class Structure
Sources: internal-packages/run-engine/src/engine/index.ts76-387 internal-packages/run-engine/src/engine/systems/
The RunEngine class is the main orchestrator that provides the public API for task execution operations. It initializes all subsystems and coordinates their interactions.
Key Initialization Parameters:
| Parameter | Purpose | Default |
|---|---|---|
| prisma | Database client for state persistence | Required |
| worker.redis | Redis connection for background jobs | Required |
| queue.redis | Redis connection for task queues | Required |
| runLock.redis | Redis connection for distributed locks | Required |
| machines | Machine preset configurations | Required |
| heartbeatTimeoutsMs | Timeout durations for execution states | See defaults |
| retryWarmStartThresholdMs | Threshold for checkpoint-based retries | 30000 |
Sources: internal-packages/run-engine/src/engine/types.ts23-112 apps/webapp/app/v3/runEngine.server.ts15-198
All subsystems share a common set of resources defined by the SystemResources interface:
Sources: internal-packages/run-engine/src/engine/systems/systems.ts
The execution lifecycle follows a well-defined flow from triggering a task to its completion or failure. Each stage involves state transitions tracked through execution snapshots.
Task Execution Flow with Code Methods
Sources: internal-packages/run-engine/src/engine/index.ts392-733 internal-packages/run-engine/src/engine/systems/dequeueSystem.ts105-603 internal-packages/run-engine/src/engine/systems/runAttemptSystem.ts298-632
The RunEngine.trigger() method creates a new task run and queues it for execution:
Method Signature:
Key Steps:
1. If the debounce param is provided, call debounceSystem.handleDebounce(). The result carries one of these statuses:
   - existing
   - max_duration_exceeded
   - new (with a debounceClaimId)
2. Create the run via prisma.taskRun.create() with:
   - id: Generated via RunId.fromFriendlyId()
   - status: "DELAYED" if delayUntil provided, else "PENDING"
   - executionSnapshots: Initial snapshot with status "DELAYED" or "RUN_CREATED"
   - associatedWaitpoint: Created via waitpointSystem.buildRunAssociatedWaitpoint()
3. If resumeParentOnCompletion and parentTaskRunId are set:
   - Call waitpointSystem.blockRunWithWaitpoint() to block parent run
4. If delayUntil: Call delayedRunSystem.scheduleDelayedRunEnqueuing()
5. Otherwise: Call enqueueSystem.enqueueRun() and optionally ttlSystem.scheduleExpireRun()
6. Emit eventBus.emit('runCreated')

Sources: internal-packages/run-engine/src/engine/index.ts392-733 internal-packages/run-engine/src/engine/types.ts122-194
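The status selection in the steps above can be sketched as follows. This is a minimal illustration, not the engine's code: the `InitialState` type and the `initialStateFor` helper are hypothetical names, and only the status strings and the two follow-up method names come from the steps listed above.

```typescript
// Hypothetical types for illustration; the real TaskRun model has many more fields.
type InitialRunStatus = "DELAYED" | "PENDING";
type InitialSnapshotStatus = "DELAYED" | "RUN_CREATED";

interface InitialState {
  runStatus: InitialRunStatus;
  snapshotStatus: InitialSnapshotStatus;
  // Which follow-up call the trigger flow would make next.
  nextStep: "scheduleDelayedRunEnqueuing" | "enqueueRun";
}

// Picks the initial statuses from an optional delayUntil, as described in the steps.
function initialStateFor(delayUntil?: Date): InitialState {
  if (delayUntil) {
    return {
      runStatus: "DELAYED",
      snapshotStatus: "DELAYED",
      nextStep: "scheduleDelayedRunEnqueuing",
    };
  }
  return {
    runStatus: "PENDING",
    snapshotStatus: "RUN_CREATED",
    nextStep: "enqueueRun",
  };
}
```

A run triggered without a delay therefore starts as PENDING with a RUN_CREATED snapshot and goes straight to the enqueue step.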
The DequeueSystem handles retrieving runs from queues and preparing them for execution:
Key Steps:
1. Call RunQueue.dequeueMessageFromWorkerQueue() - Get run from Redis queue
2. Acquire the run lock via RunLocker
3. Validate the latest snapshot status (QUEUED or QUEUED_EXECUTING)
4. Resolve BackgroundWorkerTask, BackgroundWorker, TaskQueue, WorkerDeployment
5. Update TaskRun with lock information (lockedAt, lockedById, lockedToVersionId)
6. Create a snapshot with status PENDING_EXECUTING
7. Return DequeuedMessage with execution details

Sources: internal-packages/run-engine/src/engine/systems/dequeueSystem.ts105-603
The RunAttemptSystem.startRunAttempt() method transitions a dequeued run to active execution:
Key Steps:
1. Increment attemptNumber (starting at 1)
2. Check the attempt number against MAX_TASK_RUN_ATTEMPTS (100)
3. Update TaskRun status to EXECUTING
4. Create an EXECUTING snapshot
5. Build the TaskRunExecution object for worker

Sources: internal-packages/run-engine/src/engine/systems/runAttemptSystem.ts300-635
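The attempt-number bookkeeping can be illustrated with a small sketch. The helper name `nextAttemptNumber` is hypothetical, and whether the real check rejects at exactly the cap or one past it is an assumption here; only the 1-based numbering and the limit of 100 come from the text above.

```typescript
// Limit quoted in the steps above.
const MAX_TASK_RUN_ATTEMPTS = 100;

// Hypothetical helper: returns the next 1-based attempt number,
// or throws once the next attempt would exceed the cap (assumed behavior).
function nextAttemptNumber(previous: number | null | undefined): number {
  const next = (previous ?? 0) + 1;
  if (next > MAX_TASK_RUN_ATTEMPTS) {
    throw new Error(`Max attempts reached (${MAX_TASK_RUN_ATTEMPTS})`);
  }
  return next;
}
```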
The completeRunAttempt() method finalizes an execution attempt, handling both success and failure:
Success Path (attemptSucceeded):
1. Update TaskRun status to COMPLETED_SUCCESSFULLY
2. Create a FINISHED snapshot
3. Emit the runAttemptCompleted event

Failure Path (attemptFailed):
1. Determine the retry outcome via retryOutcomeFromCompletion()
2. If retrying, call EnqueueSystem.enqueueRun() with retry metadata
3. A requeued run has status WAITING_FOR_DEPLOY or PENDING
4. Otherwise, set a final status (COMPLETED_WITH_ERRORS, SYSTEM_FAILURE, etc.)

Sources: internal-packages/run-engine/src/engine/systems/runAttemptSystem.ts637-1120
The execution engine tracks run state using two primary mechanisms: TaskRunStatus (on the TaskRun table) and TaskRunExecutionStatus (on the TaskRunExecutionSnapshot table). The snapshot-based approach provides an audit trail of all state transitions.
The ExecutionSnapshotSystem manages the creation and retrieval of execution snapshots, which track the detailed state of a run at specific points in time.
TaskRunExecutionSnapshot Fields:
| Field | Type | Purpose |
|---|---|---|
| id | String | Unique snapshot ID |
| runId | String | Associated task run |
| executionStatus | Enum | Current execution state |
| runStatus | Enum | TaskRun status at snapshot time |
| attemptNumber | Int | Attempt number (1-based) |
| description | String | Human-readable state description |
| previousSnapshotId | String | Previous snapshot (linked list) |
| checkpointId | String | Associated checkpoint if applicable |
| batchId | String | Batch context if applicable |
| completedWaitpoints | Relation | Waitpoints completed at this point |
| metadata | JSON | Additional state metadata |
Sources: internal-packages/run-engine/src/engine/systems/executionSnapshotSystem.ts1-393
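Because each snapshot points at its predecessor through previousSnapshotId, a run's history forms a singly linked list. The sketch below walks that chain in memory to show the structure only. `SnapshotNode` and `snapshotsSince` are hypothetical names, and the real system reads snapshots from PostgreSQL, so this is not how the engine's own queries are implemented.

```typescript
// Hypothetical in-memory stand-in for a snapshot row (only the fields needed here).
interface SnapshotNode {
  id: string;
  previousSnapshotId: string | null;
  executionStatus: string;
}

// Walks back from the latest snapshot until `sinceId` is reached and
// returns the snapshots created after `sinceId`, oldest first.
function snapshotsSince(
  byId: Map<string, SnapshotNode>,
  latestId: string,
  sinceId: string
): SnapshotNode[] {
  const result: SnapshotNode[] = [];
  const visited = new Set<string>(); // guards against a malformed, cyclic chain
  let cursor: string | null = latestId;
  while (cursor !== null && cursor !== sinceId && !visited.has(cursor)) {
    visited.add(cursor);
    const node: SnapshotNode | undefined = byId.get(cursor);
    if (!node) break;
    result.push(node);
    cursor = node.previousSnapshotId;
  }
  return result.reverse();
}
```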
Valid TaskRunExecutionStatus Values:
Sources: internal-packages/run-engine/src/engine/statuses.ts1-62
The engine uses helper functions to classify execution states:
Sources: internal-packages/run-engine/src/engine/statuses.ts3-62
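As a rough sketch of what such classification helpers might look like: the status union below lists only the values named on this page and may be incomplete, and the helper names and groupings are assumptions rather than the contents of statuses.ts.

```typescript
// Execution statuses mentioned on this page; the real enum may contain more.
type TaskRunExecutionStatus =
  | "RUN_CREATED"
  | "DELAYED"
  | "QUEUED"
  | "QUEUED_EXECUTING"
  | "PENDING_EXECUTING"
  | "EXECUTING"
  | "EXECUTING_WITH_WAITPOINTS"
  | "SUSPENDED"
  | "FINISHED";

// Hypothetical helper: statuses that dequeueing accepts, per the dequeue steps.
function isDequeueableStatus(status: TaskRunExecutionStatus): boolean {
  return status === "QUEUED" || status === "QUEUED_EXECUTING";
}

// Hypothetical helper: the run is actively held by a worker.
function isExecutingStatus(status: TaskRunExecutionStatus): boolean {
  return status === "EXECUTING" || status === "EXECUTING_WITH_WAITPOINTS";
}

// Hypothetical helper: no further transitions are expected.
function isFinalStatus(status: TaskRunExecutionStatus): boolean {
  return status === "FINISHED";
}
```

Keeping these checks in named functions lets every subsystem validate a snapshot the same way before acting on it.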
The ExecutionSnapshotSystem provides centralized snapshot management with helpers for creating snapshots and retrieving execution state.
Key Methods:
| Method | Purpose |
|---|---|
| createExecutionSnapshot() | Creates a new snapshot with state transition |
| getLatestExecutionSnapshot() | Retrieves most recent valid snapshot |
| getExecutionSnapshotsSince() | Gets all snapshots after a given snapshot |
| executionResultFromSnapshot() | Converts snapshot to API result format |
| executionDataFromSnapshot() | Converts to full execution data with waitpoints |
Snapshot Heartbeat Monitoring:
The system schedules heartbeat timeout jobs based on execution status:
When a heartbeat timeout expires, the heartbeatSnapshot worker job is executed to handle stalled runs.
Sources: internal-packages/run-engine/src/engine/systems/executionSnapshotSystem.ts226-393
The RunAttemptSystem manages the lifecycle of individual execution attempts, including starting, completing, and handling failures.
Caching Strategy:
The system uses a multi-tier cache (UnkeyCache) with memory and Redis stores to minimize database queries:
| Cache Namespace | Fresh TTL | Stale TTL | Contents |
|---|---|---|---|
| orgs | 24h | 48h | Organization info |
| projects | 24h | 48h | Project info |
| tasks | 24h | 48h | Task metadata |
| machinePresets | 24h | 48h | Machine configurations |
| deployments | 24h | 48h | Deployment info |
| queues | 1h | 2h | Queue configurations |
Retry Logic:
The retryOutcomeFromCompletion() function determines whether a failed attempt should be retried:
- Checks whether attemptNumber < maxAttempts
- Returns a RetryOutcome with delay and reason

Sources: internal-packages/run-engine/src/engine/systems/runAttemptSystem.ts110-1631 internal-packages/run-engine/src/engine/retrying.ts
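The retry decision can be sketched as below. Only the attemptNumber < maxAttempts check and the idea of an outcome carrying a delay and a reason come from the text. The `RetryOptions` fields, the exponential delay formula, the reason strings, and the function name `sketchRetryOutcome` are all assumptions for illustration and do not reflect the actual retryOutcomeFromCompletion() signature.

```typescript
// Hypothetical retry settings; field names are assumptions.
interface RetryOptions {
  maxAttempts: number;
  minTimeoutInMs: number;
  maxTimeoutInMs: number;
  factor: number;
}

// Hypothetical outcome shape: either retry after a delay, or fail for good.
type SketchRetryOutcome =
  | { outcome: "retry"; delayMs: number; reason: string }
  | { outcome: "fail"; reason: string };

function sketchRetryOutcome(attemptNumber: number, opts: RetryOptions): SketchRetryOutcome {
  // Retry only while attemptNumber < maxAttempts, as stated above.
  if (attemptNumber >= opts.maxAttempts) {
    return { outcome: "fail", reason: "max_attempts_reached" };
  }
  // Assumed exponential backoff, capped at maxTimeoutInMs.
  const raw = opts.minTimeoutInMs * Math.pow(opts.factor, attemptNumber - 1);
  const delayMs = Math.round(Math.min(opts.maxTimeoutInMs, raw));
  return { outcome: "retry", delayMs, reason: "attempt_failed" };
}
```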
The DequeueSystem handles the complex process of selecting and preparing runs for execution.
Dequeue Process with Code Methods
Background Worker Resolution:
The system must resolve the following before execution:
- BackgroundWorker - Specific deployed version
- BackgroundWorkerTask - Task definition within that version
- TaskQueue - Queue configuration
- WorkerDeployment - Container image reference (for production)

If any are missing, the run enters WAITING_FOR_DEPLOY status.
Sources: internal-packages/run-engine/src/engine/systems/dequeueSystem.ts88-630 internal-packages/run-engine/src/engine/statuses.ts3-6
The WaitpointSystem enables runs to block and wait for external events or conditions. Waitpoints support orchestration patterns like triggerAndWait, waitForDuration, and waitUntil.
Waitpoint Types:
| Type | Purpose | Completion Trigger |
|---|---|---|
| RUN | Wait for another task run to complete | Task run completes |
| DATETIME | Wait until a specific time | Worker job at scheduled time |
| MANUAL | Wait for manual approval/signal | External API call |
| BATCH | Wait for all batch items to complete | All runs in batch complete |
Key Operations:
Blocking Mechanism:
When a run is blocked:
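1. Entries are created in the TaskRunWaitpoint table linking run to waitpoint(s)
2. A snapshot is created with status EXECUTING_WITH_WAITPOINTS or SUSPENDED
3. The continueRunIfUnblocked job resumes the run once its waitpoints have completed

Sources: internal-packages/run-engine/src/engine/systems/waitpointSystem.ts40-622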
The CheckpointSystem enables runs to save execution state and resume later, supporting long-running tasks and graceful shutdowns.
Checkpoint Creation Process with Code Methods
Checkpoint Data:
| Field | Type | Purpose |
|---|---|---|
| type | Enum | DOCKER or KUBERNETES |
| location | String | Storage location (S3, registry, etc.) |
| imageRef | String | Container image reference |
| reason | String | Why checkpoint was created |
Resume Flow:
1. Call continueRunExecution(runId, snapshotId)
2. Verify the snapshot is PENDING_EXECUTING
3. Update TaskRun status to EXECUTING
4. Create an EXECUTING snapshot

Sources: internal-packages/run-engine/src/engine/systems/checkpointSystem.ts21-250 internal-packages/run-engine/src/engine/statuses.ts21-32
The BatchSystem manages batch operations using a waitpoint-based approach where a batch run waits for all child runs to complete.
Batch Waitpoint Flow:
1. createBatchWaitpoint() creates a BATCH type waitpoint
2. Child runs are triggered with batchId and resumeParentOnCompletion=true
3. TaskRunWaitpoint entries are created linking to the batch waitpoint
4. As each child run completes, incrementCompletedBatchItems() is called
5. When itemCount == completedItemCount, the batch waitpoint completes

Key Methods:
Sources: internal-packages/run-engine/src/engine/systems/batchSystem.ts
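The completion condition in the batch flow reduces to a counter comparison. The sketch below is an in-memory illustration under assumed names (`BatchProgress`, `recordCompletedItem`). The real incrementCompletedBatchItems() would need to update the count atomically in the database, which this does not attempt to model.

```typescript
// Hypothetical in-memory view of a batch's progress counters.
interface BatchProgress {
  itemCount: number;
  completedItemCount: number;
}

// Records one finished child run and reports whether the batch waitpoint
// should now complete (itemCount == completedItemCount).
function recordCompletedItem(batch: BatchProgress): {
  batch: BatchProgress;
  waitpointCompletes: boolean;
} {
  // Clamp so a duplicate completion signal cannot push the count past itemCount.
  const completedItemCount = Math.min(batch.itemCount, batch.completedItemCount + 1);
  return {
    batch: { ...batch, completedItemCount },
    waitpointCompletes: completedItemCount === batch.itemCount,
  };
}
```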
The EnqueueSystem handles adding runs to the execution queue, including retry scenarios.
Enqueue Process:
1. Create a snapshot (QUEUED or QUEUED_EXECUTING)
2. Call RunQueue.enqueue() with organization ID and run ID

Key Parameters:
| Parameter | Purpose |
|---|---|
| snapshot.status | QUEUED (normal) or QUEUED_EXECUTING (while executing) |
| snapshot.metadata | Retry information, delay data, etc. |
| previousSnapshotId | Links to previous snapshot |
| checkpointId | If resuming from checkpoint |
| completedWaitpoints | Carry forward completed waitpoints |
Sources: internal-packages/run-engine/src/engine/systems/enqueueSystem.ts16-92
The DelayedRunSystem handles runs scheduled to start at a future time.
Delayed Run Flow:
Key Operations:
- scheduleDelayedRunEnqueuing() - Schedule worker job for future time
- enqueueDelayedRun() - Execute scheduled enqueue operation
- rescheduleDelayedRun() - Change delay time (used by debounce)

Sources: internal-packages/run-engine/src/engine/systems/delayedRunSystem.ts14-192
The DebounceSystem implements leading and trailing debounce patterns for task triggering.
Debounce Modes:
| Mode | Behavior |
|---|---|
| leading | Execute first trigger immediately, ignore subsequent within delay |
| trailing | Schedule execution after delay, reschedule on new triggers |
Implementation:
Uses Redis for coordination:
- Key: debounce:{environmentId}:{taskIdentifier}:{debounceKey}
- Value: {claimId, runId, delayUntil}

Trailing Mode Update:
When a debounced run already exists:
1. Update the TaskRun payload/metadata with latest values
2. Reschedule enqueueDelayedRun to the new delay time

Sources: internal-packages/run-engine/src/engine/systems/debounceSystem.ts
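The Redis key layout and the difference between the two modes can be sketched as follows. The key format is taken from the text. The helper names, the `DebounceValue` field types, and the `onRepeatTrigger` decision function are assumptions for illustration.

```typescript
type DebounceMode = "leading" | "trailing";

// Assumed shape of the stored value; field types are guesses.
interface DebounceValue {
  claimId?: string;
  runId?: string;
  delayUntil?: string; // ISO timestamp (assumption)
}

// Builds the key in the format debounce:{environmentId}:{taskIdentifier}:{debounceKey}.
function debounceRedisKey(environmentId: string, taskIdentifier: string, debounceKey: string): string {
  return `debounce:${environmentId}:${taskIdentifier}:${debounceKey}`;
}

// What a repeated trigger does while a debounced run already exists,
// per the mode table: leading ignores it, trailing pushes the delay out.
function onRepeatTrigger(mode: DebounceMode): "ignore" | "reschedule" {
  return mode === "leading" ? "ignore" : "reschedule";
}
```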
The TtlSystem handles automatic expiration of runs that exceed their time-to-live.
TTL Expiration Process:
1. Schedule the expireRun worker job for TTL time
2. When the job runs, the run is moved to EXPIRED status

Sources: internal-packages/run-engine/src/engine/systems/ttlSystem.ts15-63
The RunLocker class provides distributed locking using Redlock to prevent race conditions when multiple processes access the same run.
Locking Architecture:
Configuration:
| Parameter | Default | Purpose |
|---|---|---|
| duration | 5000ms | Lock duration before expiry |
| automaticExtensionThreshold | 1000ms | When to auto-extend lock |
| retryConfig.maxAttempts | 10 | Maximum acquisition attempts |
| retryConfig.baseDelay | 100ms | Initial retry delay |
| retryConfig.maxDelay | 3000ms | Maximum retry delay |
| retryConfig.backoffMultiplier | 1.8 | Exponential backoff factor |
| retryConfig.jitterFactor | 0.15 | Random jitter percentage |
| retryConfig.maxTotalWaitTime | 15000ms | Total retry timeout |
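To see how the retry parameters interact, the sketch below computes a backoff delay from the defaults in the table. The formula (exponential growth capped at maxDelay, with symmetric jitter) is a common pattern and an assumption here, not a transcription of locking.ts. `lockRetryDelay` and the zero-based `attempt` index are hypothetical.

```typescript
interface LockRetryConfig {
  maxAttempts: number;
  baseDelay: number; // ms
  maxDelay: number; // ms
  backoffMultiplier: number;
  jitterFactor: number; // fraction of the delay, e.g. 0.15 = ±15%
  maxTotalWaitTime: number; // ms
}

// Defaults as listed in the configuration table above.
const defaultLockRetryConfig: LockRetryConfig = {
  maxAttempts: 10,
  baseDelay: 100,
  maxDelay: 3000,
  backoffMultiplier: 1.8,
  jitterFactor: 0.15,
  maxTotalWaitTime: 15000,
};

// Delay before retry number `attempt` (0-based). `random` is injectable
// so the jitter can be made deterministic in tests.
function lockRetryDelay(
  attempt: number,
  cfg: LockRetryConfig = defaultLockRetryConfig,
  random: () => number = Math.random
): number {
  const exponential = cfg.baseDelay * Math.pow(cfg.backoffMultiplier, attempt);
  const capped = Math.min(cfg.maxDelay, exponential);
  const jitter = capped * cfg.jitterFactor * (random() * 2 - 1);
  return Math.max(0, Math.round(capped + jitter));
}
```

Under these assumptions the un-jittered delays grow from 100ms to 180ms to 324ms and reach the 3000ms cap after a handful of attempts, so maxTotalWaitTime is what bounds the overall wait.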
Nested Lock Optimization:
When a lock is requested for resources already held, the system reuses the existing lock instead of acquiring a new one, preventing deadlocks and improving performance.
Manual Locking:
For long-running operations, the system supports manual lock management:
Sources: internal-packages/run-engine/src/engine/locking.ts70-497
The RunQueue manages task queuing using a hierarchical Redis-based system with fair queue selection and concurrency control.
Queue Hierarchy:
Fair Queue Selection Algorithm
The FairQueueSelectionStrategy uses a weighted scoring algorithm to select which environment to dequeue from:
Algorithm Components:
| Component | Code Reference | Default | Purpose |
|---|---|---|---|
| concurrencyLimitBias | biases.concurrencyLimitBias | 0.75 | Prefer environments with higher concurrency limits |
| availableCapacityBias | biases.availableCapacityBias | 0.3 | Prefer environments with more available capacity |
| queueAgeRandomization | biases.queueAgeRandomization | 0.25 | Add randomness to prevent starvation |
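One way biases like these can feed a weighted random choice is sketched below. This is not the scoring used in fairQueueSelectionStrategy.ts: the weight formula, the `EnvSnapshot` shape, and the function names are all assumptions, and queueAgeRandomization is left out for brevity. It only illustrates how a higher limit and more free capacity can raise an environment's chance of being picked.

```typescript
// Hypothetical per-environment view used for selection.
interface EnvSnapshot {
  id: string;
  concurrencyLimit: number;
  currentConcurrency: number;
}

interface SelectionBiases {
  concurrencyLimitBias: number;
  availableCapacityBias: number;
}

// Assumed weight: a base of 1, plus bonuses for relative limit and free capacity.
function envWeight(env: EnvSnapshot, maxLimit: number, biases: SelectionBiases): number {
  const limitShare = maxLimit > 0 ? env.concurrencyLimit / maxLimit : 0;
  const free = Math.max(0, env.concurrencyLimit - env.currentConcurrency);
  const freeShare = env.concurrencyLimit > 0 ? free / env.concurrencyLimit : 0;
  return 1 + limitShare * biases.concurrencyLimitBias + freeShare * biases.availableCapacityBias;
}

// Weighted random pick; `random` is injectable for deterministic tests.
function pickEnvironment(
  envs: EnvSnapshot[],
  biases: SelectionBiases,
  random: () => number = Math.random
): string | undefined {
  const maxLimit = envs.reduce((max, e) => Math.max(max, e.concurrencyLimit), 0);
  const weighted = envs.map((e) => ({ id: e.id, weight: envWeight(e, maxLimit, biases) }));
  const total = weighted.reduce((sum, w) => sum + w.weight, 0);
  let remaining = random() * total;
  let chosen: string | undefined;
  for (const w of weighted) {
    chosen = w.id;
    remaining -= w.weight;
    if (remaining < 0) break;
  }
  return chosen;
}
```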
Concurrency Management Methods:
Each environment has concurrency limits tracked in Redis:
Redis Key Structure:
- runqueue:concurrency:{environmentId}
- runqueue:concurrency:tokens:{environmentId}:{runId}
- RuntimeEnvironment.maximumConcurrencyLimit
- RuntimeEnvironment.concurrencyLimitBurstFactor

Sources: internal-packages/run-engine/src/run-queue/index.ts internal-packages/run-engine/src/run-queue/fairQueueSelectionStrategy.ts1-346
The engine uses a Worker (from @trigger.dev/redis-worker) to process background jobs. Jobs are defined in the worker catalog:
| Job Type | Payload | Purpose |
|---|---|---|
| finishWaitpoint | {waitpointId, error?} | Complete a waitpoint |
| heartbeatSnapshot | {runId, snapshotId, restartAttempt?} | Handle stalled snapshot |
| repairSnapshot | {runId, snapshotId, executionStatus} | Repair invalid snapshot |
| expireRun | {runId} | Expire run on TTL |
| cancelRun | {runId, completedAt, reason?} | Cancel a run |
| queueRunsPendingVersion | {backgroundWorkerId} | Enqueue runs waiting for deployment |
| tryCompleteBatch | {batchId} | Attempt batch completion |
| continueRunIfUnblocked | {runId} | Resume run if waitpoints completed |
| enqueueDelayedRun | {runId} | Enqueue delayed run |
Sources: internal-packages/run-engine/src/engine/workerCatalog.ts1-66
The RunEngine is configured via the RunEngineOptions interface. The webapp instantiates it with environment variables:
Key Configuration Groups:
| Group | Environment Variables | Purpose |
|---|---|---|
| Worker | RUN_ENGINE_WORKER_COUNT, RUN_ENGINE_TASKS_PER_WORKER | Background job processing |
| Queue | RUN_ENGINE_RUN_QUEUE_REDIS_*, DEFAULT_ENV_EXECUTION_CONCURRENCY_LIMIT | Queue connections and limits |
| Locks | RUN_ENGINE_RUN_LOCK_REDIS_*, RUN_ENGINE_RUN_LOCK_DURATION | Distributed locking |
| Timeouts | RUN_ENGINE_TIMEOUT_PENDING_EXECUTING, RUN_ENGINE_TIMEOUT_EXECUTING | Heartbeat timeouts |
| Retries | RUN_ENGINE_RETRY_WARM_START_THRESHOLD_MS | Checkpoint-based retry threshold |
| Batch | BATCH_QUEUE_CONSUMER_COUNT, BATCH_QUEUE_DRR_QUANTUM | Batch processing via DRR |
Example Initialization:
The webapp creates a singleton instance with production configuration loaded from environment variables and platform services.
Sources: apps/webapp/app/v3/runEngine.server.ts15-198 apps/webapp/app/env.server.ts560-694
The RunEngine emits events through an EventBus (Node.js EventEmitter) to notify other parts of the system about state changes:
Key Events:
| Event | Payload | Purpose |
|---|---|---|
| runCreated | {time, runId} | New run created |
| runLocked | {time, run, organization, project, environment} | Run dequeued and locked |
| runAttemptStarted | {time, run, organization, project, environment} | Attempt started |
| runAttemptCompleted | {time, run, organization, project, environment} | Attempt finished |
| runStatusChanged | {time, run, organization, project, environment} | Status changed |
| runEnqueuedAfterDelay | {time, run, organization, project, environment} | Delayed run queued |
| runDelayRescheduled | {time, run, organization, project, environment} | Delay time changed |
| cachedRunCompleted | {time, span, blockedRunId, hasError, cachedRunId} | Cached run completed (debounce) |
| incomingCheckpointDiscarded | {time, run, checkpoint, snapshot} | Invalid checkpoint rejected |
These events can be subscribed to for metrics, logging, and integration with other systems.
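A subscription for metrics might look like the sketch below. It uses a plain Node.js EventEmitter as a stand-in for the engine's bus, since how the bus is exposed on RunEngine is not shown here. The `RunCreatedEvent` type follows the payload column above with assumed field types, and `seenRunIds` is a hypothetical stand-in for a metrics sink.

```typescript
import { EventEmitter } from "node:events";

// Payload shape from the events table; field types are assumptions.
interface RunCreatedEvent {
  time: Date;
  runId: string;
}

// Stand-in for the engine's event bus.
const eventBus = new EventEmitter();

// Hypothetical metrics sink: records the ID of every run that gets created.
const seenRunIds: string[] = [];

eventBus.on("runCreated", (event: RunCreatedEvent) => {
  seenRunIds.push(event.runId);
});

// Simulate the engine emitting the event after creating a run.
eventBus.emit("runCreated", { time: new Date(), runId: "run_123" });
```

Listeners run synchronously inside emit(), so a slow handler would delay the emitting code path. Handing heavy work off to a queue is a common way to avoid that.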
Sources: internal-packages/run-engine/src/engine/eventBus.ts