The OpenTelemetry integration in Feast provides comprehensive monitoring and observability capabilities for your feature serving infrastructure. This component enables you to track key metrics, traces, and logs from your Feast deployment.
Monitoring and observability are critical for production machine learning systems. The OpenTelemetry integration addresses these needs by:
- Performance Monitoring: Track CPU and memory usage of feature servers
- Operational Insights: Collect metrics to understand system behavior and performance
- Troubleshooting: Enable effective debugging through distributed tracing
- Resource Optimization: Monitor resource utilization to optimize deployments
- Production Readiness: Provide enterprise-grade observability capabilities
The OpenTelemetry integration in Feast consists of several components working together:
- OpenTelemetry Collector: Receives, processes, and exports telemetry data
- Prometheus Integration: Enables metrics collection and monitoring
- Instrumentation: Automatic Python instrumentation for tracking metrics
- Exporters: Components that send telemetry data to monitoring systems
Key features of the integration include:
- Automated Instrumentation: Python auto-instrumentation for comprehensive metric collection
- Metric Collection: Track key performance indicators including:
  - Memory usage
  - CPU utilization
  - Request latencies
  - Feature retrieval statistics
- Flexible Configuration: Customizable metric collection and export settings
- Kubernetes Integration: Native support for Kubernetes deployments
- Prometheus Compatibility: Integration with Prometheus for metrics visualization
To add monitoring to the Feast Feature Server, follow these steps:
Follow the Prometheus Operator documentation to install the operator.
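As a reference, one common approach is the operator's release-bundle quickstart (requires `curl` and `jq`; verify against the current Prometheus Operator documentation):

```sh
# Install the Prometheus Operator from its latest release bundle (quickstart-style)
LATEST=$(curl -s https://api.github.com/repos/prometheus-operator/prometheus-operator/releases/latest | jq -cr .tag_name)
curl -sL https://github.com/prometheus-operator/prometheus-operator/releases/download/${LATEST}/bundle.yaml | kubectl create -f -
```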
Before installing the OpenTelemetry Operator:
- Install cert-manager (a sample command is shown after this list)
- Validate that the cert-manager pods are running
- Apply the OpenTelemetry operator:

```sh
kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/latest/download/opentelemetry-operator.yaml
```

For additional installation steps, refer to the OpenTelemetry Operator documentation.
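For the cert-manager prerequisite, it can be installed from its official release manifest; the version below is an example, so check the cert-manager documentation for the current release:

```sh
# Install cert-manager from the official release manifest (version is an example)
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.14.4/cert-manager.yaml

# Validate that the cert-manager pods are running before installing the OpenTelemetry Operator
kubectl get pods -n cert-manager
```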
Add the OpenTelemetry Collector configuration under the metrics section in your values.yaml file:
```yaml
metrics:
  enabled: true
  otelCollector:
    endpoint: "otel-collector.default.svc.cluster.local:4317" # sample
    headers:
      api-key: "your-api-key"
```

Add the following annotations and environment variables to your deployment.yaml:
```yaml
template:
  metadata:
    annotations:
      instrumentation.opentelemetry.io/inject-python: "true"
```

```yaml
- name: OTEL_EXPORTER_OTLP_ENDPOINT
  value: http://{{ .Values.service.name }}-collector.{{ .Release.Namespace }}.svc.cluster.local:{{ .Values.metrics.endpoint.port }}
- name: OTEL_EXPORTER_OTLP_INSECURE
  value: "true"
```
Add metric checks to all manifests and deployment files:

```yaml
{{ if .Values.metrics.enabled }}
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: feast-instrumentation
spec:
  exporter:
    endpoint: http://{{ .Values.service.name }}-collector.{{ .Release.Namespace }}.svc.cluster.local:4318
  env:
  propagators:
    - tracecontext
    - baggage
  python:
    env:
      - name: OTEL_METRICS_EXPORTER
        value: console,otlp_proto_http
      - name: OTEL_LOGS_EXPORTER
        value: otlp_proto_http
      - name: OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED
        value: "true"
{{ end }}
```

Add the following components to your chart:
- Instrumentation
- OpenTelemetryCollector
- ServiceMonitors
- Prometheus Instance
- RBAC rules
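Of these, the OpenTelemetryCollector resource is the piece that actually receives and exports telemetry. A minimal sketch, assuming a deployment-mode collector with an OTLP receiver and a Prometheus exporter (the name, mode, and pipeline are illustrative assumptions, not the chart's actual manifest):

```yaml
# Minimal OpenTelemetryCollector sketch (names and pipeline are illustrative).
# The prometheus exporter requires a collector distribution that includes it
# (e.g. opentelemetry-collector-contrib).
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel-collector
spec:
  mode: deployment
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
    exporters:
      prometheus:
        endpoint: 0.0.0.0:8889
    service:
      pipelines:
        metrics:
          receivers: [otlp]
          exporters: [prometheus]
```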
Deploy Feast with metrics enabled:
```sh
helm install feast-release infra/charts/feast-feature-server --set metrics.enabled=true --set feature_store_yaml_base64=""
```
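Once the release is up, a couple of quick checks can confirm the wiring; the label selector and collector workload name below are assumptions to adapt to your deployment:

```sh
# Confirm the auto-instrumentation annotation landed on the feature server pods
# (label selector is an assumption; adjust to your chart's labels)
kubectl get pods -l app.kubernetes.io/name=feast-feature-server -o yaml | grep inject-python

# Tail the collector logs to confirm telemetry is arriving
# (the operator names the collector workload "<name>-collector"; adjust to your CR name)
kubectl logs deploy/otel-collector-collector -f
```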
To enable OpenTelemetry monitoring in your Feast deployment:
- Set `metrics.enabled=true` in your Helm values
- Configure the OpenTelemetry Collector endpoint
- Deploy with proper annotations and environment variables
Example configuration:
```yaml
metrics:
  enabled: true
  otelCollector:
    endpoint: "otel-collector.default.svc.cluster.local:4317"
```

Once configured, you can monitor various metrics including:
- `feast_feature_server_memory_usage`: Memory utilization of the feature server
- `feast_feature_server_cpu_usage`: CPU usage statistics
- `feast_feature_server_request_latency_seconds`: Request latency with feature count dimensions
- `feast_feature_server_online_store_read_duration_seconds`: Online store read phase duration
- `feast_feature_server_transformation_duration_seconds`: ODFV read-path transformation duration (per ODFV, requires `track_metrics=True`)
- `feast_feature_server_write_transformation_duration_seconds`: ODFV write-path transformation duration (per ODFV, requires `track_metrics=True`)
- Additional custom metrics based on your configuration
For the full list of metrics, see the Python Feature Server reference.
These metrics can be visualized using Prometheus and other compatible monitoring tools.
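For example, once these metrics are scraped into Prometheus, queries along the following lines can drive dashboards or alerts; the `_bucket` series assumes the latency metric is exported as a Prometheus histogram, which you should verify against your deployment:

```promql
# p99 feature-serving latency over the last 5 minutes
# (assumes the latency metric is a Prometheus histogram)
histogram_quantile(0.99, sum(rate(feast_feature_server_request_latency_seconds_bucket[5m])) by (le))

# Current feature server memory usage
feast_feature_server_memory_usage
```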