The C# SDK exposes worker metrics via the standard System.Diagnostics.Metrics API, making them
compatible with any .NET metrics listener -- most notably the
OpenTelemetry .NET SDK.
All metrics are registered under the meter named Conductor.Client.
| Name | Type | Labels | Description |
|---|---|---|---|
task_poll_total |
Counter | task_type |
Total task poll attempts |
task_poll_error_total |
Counter | task_type, error_type |
Total task poll errors |
task_execute_error_total |
Counter | task_type, error_type |
Total task execution errors |
task_update_error_total |
Counter | task_type |
Total task update errors (after all retries) |
task_paused_total |
Counter | task_type |
Polls skipped because the worker is paused |
task_execution_queue_full_total |
Counter | task_type |
Polls returning zero capacity (all workers busy) |
thread_uncaught_exceptions_total |
Counter | -- | Uncaught exceptions in worker threads |
workflow_start_error_total |
Counter | workflow_type |
Errors starting workflows |
external_payload_used_total |
Counter | entity_name, operation, payload_type |
External payload storage usage |
task_poll_time_seconds |
Histogram | task_type |
Task poll round-trip duration (seconds) |
task_execute_time_seconds |
Histogram | task_type |
Task execution duration (seconds) |
task_update_time_seconds |
Histogram | task_type |
Task result update duration (seconds) |
task_result_size_bytes |
Histogram | task_type |
Task result payload size (bytes) |
workflow_input_size_bytes |
Histogram | workflow_type, version |
Workflow input payload size (bytes) |
active_workers |
Gauge | task_type |
Workers currently executing tasks |
MetricsCollector is automatically registered as a singleton when you call AddConductorWorker().
No additional setup is needed for the SDK to start recording -- metrics are written to
System.Diagnostics.Metrics instruments immediately.
var host = new HostBuilder()
.ConfigureServices(services =>
{
// MetricsCollector is registered automatically here.
services.AddConductorWorker(config);
services.AddConductorWorkflowTask(new MyWorker());
services.WithHostedService();
})
.Build();To expose the metrics externally (e.g. Prometheus scraping), attach a metrics listener or exporter as shown below.
If you use the one-liner WorkflowTaskHost.CreateWorkerHost(...), metrics are registered
automatically via the same AddConductorWorker() call:
var host = WorkflowTaskHost.CreateWorkerHost(config, workers: new MyWorker());
await host.RunAsync();Add the following NuGet packages to your project:
dotnet add package OpenTelemetry
dotnet add package OpenTelemetry.Exporter.Prometheus.HttpListener --prerelease
Then configure a MeterProvider before starting the host:
using OpenTelemetry;
using OpenTelemetry.Metrics;
using Conductor.Client.Telemetry;
var meterProvider = Sdk.CreateMeterProviderBuilder()
.AddMeter(MetricsCollector.MeterName) // "Conductor.Client"
.AddPrometheusHttpListener(options =>
{
options.UriPrefixes = new[] { "http://*:9090/" };
})
.Build();
// ... start the host ...
// Dispose when shutting down.
meterProvider?.Dispose();Metrics are now available at http://localhost:9090/metrics in Prometheus text format.
For quick debugging, the OpenTelemetry console exporter prints metrics to stdout:
dotnet add package OpenTelemetry.Exporter.Console
var meterProvider = Sdk.CreateMeterProviderBuilder()
.AddMeter(MetricsCollector.MeterName)
.AddConsoleExporter()
.Build();Monotonically increasing values. Prometheus exposes them with a _total suffix.
| Name | Labels | Description |
|---|---|---|
task_poll_total |
task_type |
Incremented once per poll round (regardless of how many tasks are returned). |
task_poll_error_total |
task_type, error_type |
Incremented when a poll HTTP call fails. error_type is the exception class name. |
task_execute_error_total |
task_type, error_type |
Incremented when Execute() throws. error_type is the exception class name. |
task_update_error_total |
task_type |
Incremented when all update retries are exhausted. |
task_paused_total |
task_type |
Incremented when a poll is skipped because the worker is paused. |
task_execution_queue_full_total |
task_type |
Incremented when a poll is skipped because all workers are busy (batch size reached). |
thread_uncaught_exceptions_total |
-- | Incremented on any exception in the top-level poll loop that is not an OperationCanceledException. |
workflow_start_error_total |
workflow_type |
Incremented when a workflow start call fails. |
external_payload_used_total |
entity_name, operation, payload_type |
Incremented when external payload storage is used. |
Distribution metrics with sum, count, and bucket breakdowns. All time values are in seconds. All size values are in bytes.
| Name | Labels | Unit | Description |
|---|---|---|---|
task_poll_time_seconds |
task_type |
seconds | Wall-clock time for the poll HTTP call. |
task_execute_time_seconds |
task_type |
seconds | Wall-clock time inside worker.Execute(). |
task_update_time_seconds |
task_type |
seconds | Wall-clock time for the update call (including retries). |
task_result_size_bytes |
task_type |
bytes | JSON-serialized size of TaskResult.OutputData. |
workflow_input_size_bytes |
workflow_type, version |
bytes | Workflow input payload size. |
Point-in-time values sampled by the metrics listener.
| Name | Labels | Description |
|---|---|---|
active_workers |
task_type |
Number of concurrent task executions in progress. Updated on every poll cycle. |
| Label | Used By | Values |
|---|---|---|
task_type |
Most metrics | Task definition name (e.g. "my_worker") |
error_type |
task_poll_error_total, task_execute_error_total |
Exception class name (e.g. "HttpRequestException") |
workflow_type |
workflow_start_error_total, workflow_input_size_bytes |
Workflow definition name |
version |
workflow_input_size_bytes |
Workflow version string |
entity_name |
external_payload_used_total |
Entity name |
operation |
external_payload_used_total |
Operation name |
payload_type |
external_payload_used_total |
Payload type (e.g. "TASK_INPUT", "TASK_OUTPUT") |
When scraped by Prometheus (via the OpenTelemetry exporter), the output looks like:
# HELP task_poll_total Total task poll attempts
# TYPE task_poll_total counter
task_poll_total{task_type="my_worker"} 142
# HELP task_poll_time_seconds Task poll round-trip duration (seconds)
# TYPE task_poll_time_seconds histogram
task_poll_time_seconds_bucket{task_type="my_worker",le="0.005"} 12
task_poll_time_seconds_bucket{task_type="my_worker",le="0.01"} 45
task_poll_time_seconds_bucket{task_type="my_worker",le="0.025"} 98
task_poll_time_seconds_bucket{task_type="my_worker",le="0.05"} 120
task_poll_time_seconds_bucket{task_type="my_worker",le="0.1"} 135
task_poll_time_seconds_bucket{task_type="my_worker",le="0.25"} 140
task_poll_time_seconds_bucket{task_type="my_worker",le="0.5"} 142
task_poll_time_seconds_bucket{task_type="my_worker",le="1"} 142
task_poll_time_seconds_bucket{task_type="my_worker",le="+Inf"} 142
task_poll_time_seconds_sum{task_type="my_worker"} 3.842
task_poll_time_seconds_count{task_type="my_worker"} 142
# HELP task_execute_time_seconds Task execution duration (seconds)
# TYPE task_execute_time_seconds histogram
task_execute_time_seconds_bucket{task_type="my_worker",le="0.25"} 50
task_execute_time_seconds_bucket{task_type="my_worker",le="0.5"} 80
task_execute_time_seconds_bucket{task_type="my_worker",le="1"} 110
task_execute_time_seconds_bucket{task_type="my_worker",le="2.5"} 135
task_execute_time_seconds_bucket{task_type="my_worker",le="+Inf"} 142
task_execute_time_seconds_sum{task_type="my_worker"} 98.553
task_execute_time_seconds_count{task_type="my_worker"} 142
# HELP active_workers Workers currently executing tasks
# TYPE active_workers gauge
active_workers{task_type="my_worker"} 5
# HELP task_execution_queue_full_total Polls returning zero capacity
# TYPE task_execution_queue_full_total counter
task_execution_queue_full_total{task_type="my_worker"} 3
-
Use the OpenTelemetry Prometheus exporter for production. It serves a standard
/metricsendpoint that Prometheus can scrape directly. -
Set histogram bucket boundaries via OpenTelemetry Views if the defaults don't match your workload. For example, if your workers are consistently fast (< 100ms), add more fine-grained lower buckets:
builder.AddView("task_execute_time_seconds", new ExplicitBucketHistogramConfiguration { Boundaries = new double[] { 0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5 } });
-
Alert on
task_update_error_total. A non-zero rate means task results are being lost after all retries are exhausted -- this is a critical failure. -
Monitor
task_execution_queue_full_total. A sustained rate indicates the worker needs more capacity (increaseBatchSizeor add replicas). -
Use
rate()on counters, not raw values. For example:rate(task_poll_total{task_type="my_worker"}[5m]) -
Track p99 execution latency using histogram quantiles:
histogram_quantile(0.99, rate(task_execute_time_seconds_bucket[5m])) -
The
MetricsCollectoris available as a singleton via DI. You can inject it into your own services to recordworkflow_start_error_total,external_payload_used_total, or any other metrics that occur outside the poll loop.