METRICS.md

Metrics Documentation

The C# SDK exposes worker metrics through the standard `System.Diagnostics.Metrics` API, so any .NET metrics listener can consume them, most notably the OpenTelemetry .NET SDK.
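The instruments are ordinary `System.Diagnostics.Metrics` instruments, so the BCL's `MeterListener` can observe them without any OpenTelemetry packages. This is handy for tests or a quick sanity check. The sketch below subscribes to every instrument on the `Conductor.Client` meter and keeps a running total per instrument name. To run on its own, it creates a stand-in meter with the same name and records two polls. In a real worker, the SDK's own instruments would be picked up instead. The measurement types (`long` for counters, `double` for histograms) are an assumption about how the SDK declares its instruments.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

// Running totals per instrument name, filled in by the listener callbacks.
var totals = new ConcurrentDictionary<string, double>();

using var listener = new MeterListener();

// Subscribe only to instruments published on the Conductor SDK's meter.
listener.InstrumentPublished = (instrument, l) =>
{
    if (instrument.Meter.Name == "Conductor.Client")
    {
        l.EnableMeasurementEvents(instrument);
    }
};

// Assumed measurement types: long for counters, double for histograms.
// A callback only fires for instruments whose type matches.
listener.SetMeasurementEventCallback<long>((instrument, value, tags, state) =>
    totals.AddOrUpdate(instrument.Name, value, (_, old) => old + value));
listener.SetMeasurementEventCallback<double>((instrument, value, tags, state) =>
    totals.AddOrUpdate(instrument.Name, value, (_, old) => old + value));

listener.Start();

// Stand-in for the SDK: a meter with the same name, used only so this sketch
// runs without a Conductor server. Real workers record these themselves.
using var meter = new Meter("Conductor.Client");
var pollCounter = meter.CreateCounter<long>("task_poll_total");
pollCounter.Add(1, new KeyValuePair<string, object?>("task_type", "my_worker"));
pollCounter.Add(1, new KeyValuePair<string, object?>("task_type", "my_worker"));

foreach (var (name, total) in totals)
{
    Console.WriteLine($"{name}: {total}");
}
```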

Quick Reference

All metrics are registered under the meter named `Conductor.Client`.

| Name | Type | Labels | Description |
|---|---|---|---|
| `task_poll_total` | Counter | `task_type` | Total task poll attempts |
| `task_poll_error_total` | Counter | `task_type`, `error_type` | Total task poll errors |
| `task_execute_error_total` | Counter | `task_type`, `error_type` | Total task execution errors |
| `task_update_error_total` | Counter | `task_type` | Total task update errors (after all retries) |
| `task_paused_total` | Counter | `task_type` | Polls skipped because the worker is paused |
| `task_execution_queue_full_total` | Counter | `task_type` | Polls returning zero capacity (all workers busy) |
| `thread_uncaught_exceptions_total` | Counter | none | Uncaught exceptions in worker threads |
| `workflow_start_error_total` | Counter | `workflow_type` | Errors starting workflows |
| `external_payload_used_total` | Counter | `entity_name`, `operation`, `payload_type` | External payload storage usage |
| `task_poll_time_seconds` | Histogram | `task_type` | Task poll round-trip duration (seconds) |
| `task_execute_time_seconds` | Histogram | `task_type` | Task execution duration (seconds) |
| `task_update_time_seconds` | Histogram | `task_type` | Task result update duration (seconds) |
| `task_result_size_bytes` | Histogram | `task_type` | Task result payload size (bytes) |
| `workflow_input_size_bytes` | Histogram | `workflow_type`, `version` | Workflow input payload size (bytes) |
| `active_workers` | Gauge | `task_type` | Workers currently executing tasks |

Configuration

DI-Based Workers

`MetricsCollector` is automatically registered as a singleton when you call `AddConductorWorker()`. The SDK needs no additional setup to record: workers write measurements to `System.Diagnostics.Metrics` instruments as soon as they start. Those measurements are only collected once a listener or exporter subscribes to the meter.

```csharp
// `config` is your Conductor client configuration (server URL, credentials, etc.).
var host = new HostBuilder()
    .ConfigureServices(services =>
    {
        // MetricsCollector is registered automatically here.
        services.AddConductorWorker(config);
        services.AddConductorWorkflowTask(new MyWorker());
        services.WithHostedService();
    })
    .Build();
```

To expose the metrics externally (e.g. Prometheus scraping), attach a metrics listener or exporter as shown below.

WorkflowTaskHost Convenience API

If you use the one-liner WorkflowTaskHost.CreateWorkerHost(...), metrics are registered automatically via the same AddConductorWorker() call:

```csharp
var host = WorkflowTaskHost.CreateWorkerHost(config, workers: new MyWorker());
await host.RunAsync();
```

Prometheus via OpenTelemetry

Add the following NuGet packages to your project:

```shell
dotnet add package OpenTelemetry
dotnet add package OpenTelemetry.Exporter.Prometheus.HttpListener --prerelease
```

Then configure a MeterProvider before starting the host:

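```csharp
using Conductor.Client.Telemetry;
using OpenTelemetry;
using OpenTelemetry.Metrics;

// Build the provider before the host starts: System.Diagnostics.Metrics drops
// measurements recorded while no listener is subscribed to the meter.
// `using var` disposes the provider on shutdown, which shuts down the exporter.
using var meterProvider = Sdk.CreateMeterProviderBuilder()
    .AddMeter(MetricsCollector.MeterName)   // "Conductor.Client"
    .AddPrometheusHttpListener(options =>
    {
        options.UriPrefixes = new[] { "http://*:9090/" };
    })
    .Build();

// ... build and start the host, e.g. await host.RunAsync(); ...
```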

Metrics are now available at http://localhost:9090/metrics in Prometheus text format.

Console Exporter (Development)

For quick debugging, the OpenTelemetry console exporter prints metrics to stdout:

```shell
dotnet add package OpenTelemetry.Exporter.Console
```

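```csharp
using Conductor.Client.Telemetry;
using OpenTelemetry;
using OpenTelemetry.Metrics;

using var meterProvider = Sdk.CreateMeterProviderBuilder()
    .AddMeter(MetricsCollector.MeterName)
    .AddConsoleExporter()   // periodically prints collected metrics to stdout
    .Build();
```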

Metric Types

Counters

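Monotonically increasing values that restart from zero when the worker process restarts. Prometheus exposes counters with a `_total` suffix, which every name below already includes.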

| Name | Labels | Description |
|---|---|---|
| `task_poll_total` | `task_type` | Incremented once per poll round (regardless of how many tasks are returned). |
| `task_poll_error_total` | `task_type`, `error_type` | Incremented when a poll HTTP call fails. `error_type` is the exception class name. |
| `task_execute_error_total` | `task_type`, `error_type` | Incremented when `Execute()` throws. `error_type` is the exception class name. |
| `task_update_error_total` | `task_type` | Incremented when all update retries are exhausted. |
| `task_paused_total` | `task_type` | Incremented when a poll is skipped because the worker is paused. |
| `task_execution_queue_full_total` | `task_type` | Incremented when a poll is skipped because all workers are busy (batch size reached). |
| `thread_uncaught_exceptions_total` | none | Incremented on any exception in the top-level poll loop that is not an `OperationCanceledException`. |
| `workflow_start_error_total` | `workflow_type` | Incremented when a workflow start call fails. |
| `external_payload_used_total` | `entity_name`, `operation`, `payload_type` | Incremented when external payload storage is used. |
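The `error_type` label carries the exception's class name. The following is a minimal sketch of that tagging pattern with a `Counter<long>`. The meter and instrument names match the table above. The helper `RecordExecuteError` is hypothetical and not part of the SDK's API, and a `MeterListener` is included only so the emitted tags can be inspected.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

using var meter = new Meter("Conductor.Client");
var executeErrors = meter.CreateCounter<long>("task_execute_error_total");

// Hypothetical helper: tag each increment with the task type and the
// exception's class name, e.g. "InvalidOperationException".
void RecordExecuteError(string taskType, Exception ex) =>
    executeErrors.Add(1,
        new KeyValuePair<string, object?>("task_type", taskType),
        new KeyValuePair<string, object?>("error_type", ex.GetType().Name));

// Capture what the counter emits so the labels can be checked.
var seen = new List<(string TaskType, string ErrorType, long Value)>();
using var listener = new MeterListener();
listener.InstrumentPublished = (instrument, l) =>
{
    if (instrument.Meter.Name == "Conductor.Client") l.EnableMeasurementEvents(instrument);
};
listener.SetMeasurementEventCallback<long>((instrument, value, tags, state) =>
{
    string taskType = "", errorType = "";
    foreach (var tag in tags)
    {
        if (tag.Key == "task_type") taskType = tag.Value?.ToString() ?? "";
        if (tag.Key == "error_type") errorType = tag.Value?.ToString() ?? "";
    }
    seen.Add((taskType, errorType, value));
});
listener.Start();

try
{
    throw new InvalidOperationException("worker failed");
}
catch (Exception ex)
{
    RecordExecuteError("my_worker", ex);
}

Console.WriteLine($"{seen[0].TaskType} {seen[0].ErrorType} {seen[0].Value}");
```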

Histograms

Distribution metrics with sum, count, and bucket breakdowns. All time values are in seconds. All size values are in bytes.

| Name | Labels | Unit | Description |
|---|---|---|---|
| `task_poll_time_seconds` | `task_type` | seconds | Wall-clock time for the poll HTTP call. |
| `task_execute_time_seconds` | `task_type` | seconds | Wall-clock time inside `worker.Execute()`. |
| `task_update_time_seconds` | `task_type` | seconds | Wall-clock time for the update call (including retries). |
| `task_result_size_bytes` | `task_type` | bytes | JSON-serialized size of `TaskResult.OutputData`. |
| `workflow_input_size_bytes` | `workflow_type`, `version` | bytes | Workflow input payload size. |
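The two task-level histograms can be pictured like this:

- Execution time is measured with a `Stopwatch` around the worker call and recorded in seconds.
- Result size is the UTF-8 byte length of the JSON-serialized output.

This is a sketch under assumptions. It uses `System.Text.Json`; the SDK may use a different serializer, which could produce a slightly different byte count. The wrapper `RunAndRecord` is hypothetical, and a plain dictionary stands in for `TaskResult.OutputData`.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Diagnostics.Metrics;
using System.Text.Json;

using var meter = new Meter("Conductor.Client");
var executeTime = meter.CreateHistogram<double>("task_execute_time_seconds");
var resultSize = meter.CreateHistogram<long>("task_result_size_bytes");

// Hypothetical wrapper around a worker call: times it and sizes its output.
Dictionary<string, object> RunAndRecord(string taskType, Func<Dictionary<string, object>> execute)
{
    var tag = new KeyValuePair<string, object?>("task_type", taskType);

    var stopwatch = Stopwatch.StartNew();
    var output = execute();
    stopwatch.Stop();
    executeTime.Record(stopwatch.Elapsed.TotalSeconds, tag);   // seconds, not milliseconds

    // Size in bytes of the UTF-8 JSON payload, not the number of characters.
    byte[] json = JsonSerializer.SerializeToUtf8Bytes(output);
    resultSize.Record(json.Length, tag);

    return output;
}

var result = RunAndRecord("my_worker", () => new Dictionary<string, object> { ["greeting"] = "hello" });
Console.WriteLine(JsonSerializer.Serialize(result));   // {"greeting":"hello"}
```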

Gauges

Point-in-time values sampled by the metrics listener.

| Name | Labels | Description |
|---|---|---|
| `active_workers` | `task_type` | Number of concurrent task executions in progress. Updated on every poll cycle. |
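A gauge like `active_workers` is commonly modelled as an observable instrument. The worker keeps a running count, and the callback reads it only when a listener collects, which is what makes the value a point-in-time sample. The sketch below shows that pattern with `Interlocked` and `ObservableGauge<int>`. The in-flight counter and the single `task_type` value are illustrative assumptions, not the SDK's implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;
using System.Threading;

int inFlight = 0;   // tasks currently executing for "my_worker"

using var meter = new Meter("Conductor.Client");

// The callback runs only when a listener collects, so each reading is a snapshot.
meter.CreateObservableGauge<int>("active_workers", () =>
    new Measurement<int>(Volatile.Read(ref inFlight),
        new KeyValuePair<string, object?>("task_type", "my_worker")));

// Simulate three tasks starting and one finishing.
Interlocked.Increment(ref inFlight);
Interlocked.Increment(ref inFlight);
Interlocked.Increment(ref inFlight);
Interlocked.Decrement(ref inFlight);

// Pull a sample the way an exporter does at collection time.
int sampled = -1;
using var listener = new MeterListener();
listener.InstrumentPublished = (instrument, l) =>
{
    if (instrument.Meter.Name == "Conductor.Client") l.EnableMeasurementEvents(instrument);
};
listener.SetMeasurementEventCallback<int>((instrument, value, tags, state) => sampled = value);
listener.Start();
listener.RecordObservableInstruments();

Console.WriteLine($"active_workers = {sampled}");   // active_workers = 2
```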

Labels

| Label | Used by | Values |
|---|---|---|
| `task_type` | Most metrics | Task definition name (e.g. `"my_worker"`) |
| `error_type` | `task_poll_error_total`, `task_execute_error_total` | Exception class name (e.g. `"HttpRequestException"`) |
| `workflow_type` | `workflow_start_error_total`, `workflow_input_size_bytes` | Workflow definition name |
| `version` | `workflow_input_size_bytes` | Workflow version string |
| `entity_name` | `external_payload_used_total` | Entity name |
| `operation` | `external_payload_used_total` | Operation name |
| `payload_type` | `external_payload_used_total` | Payload type (e.g. `"TASK_INPUT"`, `"TASK_OUTPUT"`) |

Example Metrics Output

When scraped by Prometheus (via the OpenTelemetry exporter), the output looks similar to the following (values are illustrative):

```text
# HELP task_poll_total Total task poll attempts
# TYPE task_poll_total counter
task_poll_total{task_type="my_worker"} 142

# HELP task_poll_time_seconds Task poll round-trip duration (seconds)
# TYPE task_poll_time_seconds histogram
task_poll_time_seconds_bucket{task_type="my_worker",le="0.005"} 12
task_poll_time_seconds_bucket{task_type="my_worker",le="0.01"} 45
task_poll_time_seconds_bucket{task_type="my_worker",le="0.025"} 98
task_poll_time_seconds_bucket{task_type="my_worker",le="0.05"} 120
task_poll_time_seconds_bucket{task_type="my_worker",le="0.1"} 135
task_poll_time_seconds_bucket{task_type="my_worker",le="0.25"} 140
task_poll_time_seconds_bucket{task_type="my_worker",le="0.5"} 142
task_poll_time_seconds_bucket{task_type="my_worker",le="1"} 142
task_poll_time_seconds_bucket{task_type="my_worker",le="+Inf"} 142
task_poll_time_seconds_sum{task_type="my_worker"} 3.842
task_poll_time_seconds_count{task_type="my_worker"} 142

# HELP task_execute_time_seconds Task execution duration (seconds)
# TYPE task_execute_time_seconds histogram
task_execute_time_seconds_bucket{task_type="my_worker",le="0.25"} 50
task_execute_time_seconds_bucket{task_type="my_worker",le="0.5"} 80
task_execute_time_seconds_bucket{task_type="my_worker",le="1"} 110
task_execute_time_seconds_bucket{task_type="my_worker",le="2.5"} 135
task_execute_time_seconds_bucket{task_type="my_worker",le="+Inf"} 142
task_execute_time_seconds_sum{task_type="my_worker"} 98.553
task_execute_time_seconds_count{task_type="my_worker"} 142

# HELP active_workers Workers currently executing tasks
# TYPE active_workers gauge
active_workers{task_type="my_worker"} 5

# HELP task_execution_queue_full_total Polls returning zero capacity
# TYPE task_execution_queue_full_total counter
task_execution_queue_full_total{task_type="my_worker"} 3
```

Best Practices

Use the OpenTelemetry Prometheus exporter for production. It serves a standard /metrics endpoint that Prometheus can scrape directly.

Set histogram bucket boundaries via OpenTelemetry Views if the defaults don't match your workload. For example, if your workers are consistently fast (< 100ms), add more fine-grained lower buckets:

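```csharp
// `builder` is the MeterProviderBuilder from Sdk.CreateMeterProviderBuilder();
// add the View before calling Build(). ExplicitBucketHistogramConfiguration
// lives in the OpenTelemetry.Metrics namespace.
builder.AddView("task_execute_time_seconds",
    new ExplicitBucketHistogramConfiguration
    {
        Boundaries = new double[] { 0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5 }
    });
```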
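OpenTelemetry's explicit-bucket histograms treat each boundary as an inclusive upper bound and keep one extra overflow bucket above the last boundary. The sketch below mirrors that assignment rule with `Array.BinarySearch` for the boundaries in the View above. It can help you check where a given duration will be counted before changing production config. It illustrates the bucketing rule and is not OpenTelemetry's internal code.

```csharp
using System;

double[] boundaries = { 0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5 };

// Index of the bucket a value falls into. Buckets are upper-inclusive:
// (-inf, b0], (b0, b1], ..., (b[n-1], +inf); index n is the overflow bucket.
static int BucketIndex(double[] bounds, double value)
{
    int index = Array.BinarySearch(bounds, value);
    // Exact match: the value belongs to the bucket whose upper bound it equals.
    // Otherwise BinarySearch returns the bitwise complement of the index of
    // the next larger bound (or of bounds.Length if there is none).
    return index >= 0 ? index : ~index;
}

Console.WriteLine(BucketIndex(boundaries, 0.3));    // 7 -> (0.25, 0.5]
Console.WriteLine(BucketIndex(boundaries, 0.25));   // 6 -> (0.1, 0.25]
Console.WriteLine(BucketIndex(boundaries, 1.2));    // 8 -> overflow (0.5, +inf)
```

With these boundaries, every execution longer than 0.5 seconds lands in the single overflow bucket, so this View only suits workers that really are consistently fast.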

- Alert on `task_update_error_total`. A non-zero rate means task results are being lost after all update retries are exhausted; treat it as a critical failure.
- Monitor `task_execution_queue_full_total`. A sustained rate means the worker is saturated: increase `BatchSize` or add replicas.

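Use `rate()` on counters rather than raw values: a raw counter only grows, and it drops back to zero whenever a worker restarts. For example:

```promql
rate(task_poll_total{task_type="my_worker"}[5m])
```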
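For intuition about what `rate()` computes, the sketch below derives a per-second rate from two consecutive counter samples and treats a drop in value as a process restart. It is a deliberate simplification. Prometheus's `rate()` uses every sample in the range and extrapolates to the window edges, so its numbers will differ slightly.

```csharp
using System;

// Per-second increase between two scrapes of a monotonic counter.
static double PerSecondRate(double previous, double current, TimeSpan elapsed)
{
    if (elapsed <= TimeSpan.Zero)
        throw new ArgumentOutOfRangeException(nameof(elapsed), "Scrapes must be in time order.");

    // A lower value means the process restarted and the counter began again at 0,
    // so everything counted since the restart is the current value itself.
    double increase = current >= previous ? current - previous : current;
    return increase / elapsed.TotalSeconds;
}

// task_poll_total went from 142 to 442 over 5 minutes.
Console.WriteLine(PerSecondRate(142, 442, TimeSpan.FromMinutes(5)));   // 1
// After a restart, the counter dropped from 442 to 60 within 30 seconds.
Console.WriteLine(PerSecondRate(442, 60, TimeSpan.FromSeconds(30)));   // 2
```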

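Track p99 execution latency per task type with `histogram_quantile`. Sum the bucket rates across worker replicas first so the result is not split per instance:

```promql
histogram_quantile(0.99, sum by (task_type, le) (rate(task_execute_time_seconds_bucket[5m])))
```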
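`histogram_quantile` estimates a quantile in three steps:

1. Multiply the quantile by the total count to get a target rank.
2. Find the first bucket whose cumulative count reaches that rank.
3. Interpolate linearly inside that bucket.

The sketch below reimplements that rule for 0 ≤ q ≤ 1 so it can be applied to the bucket lines in the example output above. It follows Prometheus's documented behavior: the lowest bucket's lower bound is taken as 0, and the highest finite bound is returned when the rank lands in `+Inf`. It is an illustration, not Prometheus's code.

```csharp
using System;

// upperBounds: finite "le" boundaries in ascending order.
// cumulative: cumulative counts for those buckets, followed by the +Inf bucket,
// so cumulative.Length == upperBounds.Length + 1.
static double HistogramQuantile(double q, double[] upperBounds, double[] cumulative)
{
    if (q < 0 || q > 1) throw new ArgumentOutOfRangeException(nameof(q));
    if (cumulative.Length != upperBounds.Length + 1)
        throw new ArgumentException("Expected one cumulative count per bound plus +Inf.");

    double total = cumulative[^1];
    if (total == 0) return double.NaN;
    double rank = q * total;

    // First bucket whose cumulative count reaches the rank.
    int i = 0;
    while (cumulative[i] < rank) i++;

    // Rank fell into +Inf: report the highest finite bound.
    if (i == upperBounds.Length) return upperBounds[^1];

    double lower = i == 0 ? 0 : upperBounds[i - 1];
    double countBelow = i == 0 ? 0 : cumulative[i - 1];
    double inBucket = cumulative[i] - countBelow;
    return lower + (upperBounds[i] - lower) * (rank - countBelow) / inBucket;
}

double[] pollBounds = { 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1 };
double[] pollCumulative = { 12, 45, 98, 120, 135, 140, 142, 142, 142 };
Console.WriteLine(HistogramQuantile(0.5, pollBounds, pollCumulative));   // about 0.01736

double[] execBounds = { 0.25, 0.5, 1, 2.5 };
double[] execCumulative = { 50, 80, 110, 135, 142 };
Console.WriteLine(HistogramQuantile(0.99, execBounds, execCumulative));  // 2.5
```

The p99 of `task_execute_time_seconds` lands in the `+Inf` bucket, so it is reported as 2.5 seconds, the highest finite bound. When more than 1% of tasks run longer than the largest bucket, the quantile is capped rather than accurate, which is one reason to tune boundaries with a View.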

The MetricsCollector is available as a singleton via DI. You can inject it into your own services to record workflow_start_error_total, external_payload_used_total, or any other metrics that occur outside the poll loop.
