The OpenTelemetry integration in Feast provides comprehensive monitoring and observability capabilities for your feature serving infrastructure. This component enables you to track key metrics, traces, and logs from your Feast deployment.
Monitoring and observability are critical for production machine learning systems. The OpenTelemetry integration addresses these needs by:
- Performance Monitoring: Track CPU and memory usage of feature servers
- Operational Insights: Collect metrics to understand system behavior and performance
- Troubleshooting: Enable effective debugging through distributed tracing
- Resource Optimization: Monitor resource utilization to optimize deployments
- Production Readiness: Provide enterprise-grade observability capabilities
The OpenTelemetry integration in Feast consists of several components working together:
- OpenTelemetry Collector: Receives, processes, and exports telemetry data
- Prometheus Integration: Enables metrics collection and monitoring
- Instrumentation: Automatic Python instrumentation for tracking metrics
- Exporters: Components that send telemetry data to monitoring systems
Key features of the integration include:
- Automated Instrumentation: Python auto-instrumentation for comprehensive metric collection
- Metric Collection: Track key performance indicators including:
  - Memory usage
  - CPU utilization
  - Request latencies
  - Feature retrieval statistics
- Flexible Configuration: Customizable metric collection and export settings
- Kubernetes Integration: Native support for Kubernetes deployments
- Prometheus Compatibility: Integration with Prometheus for metrics visualization
To add monitoring to the Feast Feature Server, follow these steps:
Follow the Prometheus Operator documentation to install the operator.
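As a reference, one common approach is the operator's release-bundle quickstart (requires `curl` and `jq`; verify against the current Prometheus Operator documentation):

```sh
# Install the Prometheus Operator from its latest release bundle (quickstart-style)
LATEST=$(curl -s https://api.github.com/repos/prometheus-operator/prometheus-operator/releases/latest | jq -cr .tag_name)
curl -sL https://github.com/prometheus-operator/prometheus-operator/releases/download/${LATEST}/bundle.yaml | kubectl create -f -
```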
Before installing the OpenTelemetry Operator:
- Install cert-manager (a sample command is shown after this list)
- Validate that the cert-manager pods are running
- Apply the OpenTelemetry operator:

```sh
kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/latest/download/opentelemetry-operator.yaml
```

For additional installation steps, refer to the OpenTelemetry Operator documentation.
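For the cert-manager prerequisite, it can be installed from its official release manifest; the version below is an example, so check the cert-manager documentation for the current release:

```sh
# Install cert-manager from the official release manifest (version is an example)
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.14.4/cert-manager.yaml

# Validate that the cert-manager pods are running before installing the OpenTelemetry Operator
kubectl get pods -n cert-manager
```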
Add the OpenTelemetry Collector configuration under the metrics section in your values.yaml file:
```yaml
metrics:
  enabled: true
  otelCollector:
    endpoint: "otel-collector.default.svc.cluster.local:4317" # sample
    headers:
      api-key: "your-api-key"
```

Add the following annotations and environment variables to your deployment.yaml:
```yaml
template:
  metadata:
    annotations:
      instrumentation.opentelemetry.io/inject-python: "true"
```

```yaml
- name: OTEL_EXPORTER_OTLP_ENDPOINT
  value: http://{{ .Values.service.name }}-collector.{{ .Release.Namespace }}.svc.cluster.local:{{ .Values.metrics.endpoint.port }}
- name: OTEL_EXPORTER_OTLP_INSECURE
  value: "true"
```
Add metric checks to all manifests and deployment files:

```yaml
{{ if .Values.metrics.enabled }}
apiVersion: opentelemetry.io/v1alpha1
kind: Instrumentation
metadata:
  name: feast-instrumentation
spec:
  exporter:
    endpoint: http://{{ .Values.service.name }}-collector.{{ .Release.Namespace }}.svc.cluster.local:4318
  env:
  propagators:
    - tracecontext
    - baggage
  python:
    env:
      - name: OTEL_METRICS_EXPORTER
        value: console,otlp_proto_http
      - name: OTEL_LOGS_EXPORTER
        value: otlp_proto_http
      - name: OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED
        value: "true"
{{ end }}
```

Add the following components to your chart:
- Instrumentation
- OpenTelemetryCollector
- ServiceMonitors
- Prometheus Instance
- RBAC rules
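Of these, the OpenTelemetryCollector resource is the piece that actually receives and exports telemetry. A minimal sketch, assuming a deployment-mode collector with an OTLP receiver and a Prometheus exporter (the name, mode, and pipeline are illustrative assumptions, not the chart's actual manifest):

```yaml
# Minimal OpenTelemetryCollector sketch (names and pipeline are illustrative).
# The prometheus exporter requires a collector distribution that includes it
# (e.g. opentelemetry-collector-contrib).
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: otel-collector
spec:
  mode: deployment
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
    exporters:
      prometheus:
        endpoint: 0.0.0.0:8889
    service:
      pipelines:
        metrics:
          receivers: [otlp]
          exporters: [prometheus]
```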
Deploy Feast with metrics enabled:
```sh
helm install feast-release infra/charts/feast-feature-server --set metrics.enabled=true --set feature_store_yaml_base64=""
```
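Once the release is up, a couple of quick checks can confirm the wiring; the label selector and collector workload name below are assumptions to adapt to your deployment:

```sh
# Confirm the auto-instrumentation annotation landed on the feature server pods
# (label selector is an assumption; adjust to your chart's labels)
kubectl get pods -l app.kubernetes.io/name=feast-feature-server -o yaml | grep inject-python

# Tail the collector logs to confirm telemetry is arriving
# (the operator names the collector workload "<name>-collector"; adjust to your CR name)
kubectl logs deploy/otel-collector-collector -f
```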
To enable OpenTelemetry monitoring in your Feast deployment:
- Set `metrics.enabled=true` in your Helm values
- Configure the OpenTelemetry Collector endpoint
- Deploy with proper annotations and environment variables
Example configuration:
```yaml
metrics:
  enabled: true
  otelCollector:
    endpoint: "otel-collector.default.svc.cluster.local:4317"
```

Once configured, you can monitor various metrics including:
- `feast_feature_server_memory_usage`: Memory utilization of the feature server
- `feast_feature_server_cpu_usage`: CPU usage statistics
- `feast_feature_server_request_latency_seconds`: Request latency with feature count dimensions
- `feast_feature_server_online_store_read_duration_seconds`: Online store read phase duration
- `feast_feature_server_transformation_duration_seconds`: ODFV read-path transformation duration (per ODFV, requires `track_metrics=True`)
- `feast_feature_server_write_transformation_duration_seconds`: ODFV write-path transformation duration (per ODFV, requires `track_metrics=True`)
- Additional custom metrics based on your configuration
For the full list of metrics, see the Python Feature Server reference.
These metrics can be visualized using Prometheus and other compatible monitoring tools.
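For example, once these metrics are scraped into Prometheus, queries along the following lines can drive dashboards or alerts; the `_bucket` series assumes the latency metric is exported as a Prometheus histogram, which you should verify against your deployment:

```promql
# p99 feature-serving latency over the last 5 minutes
# (assumes the latency metric is a Prometheus histogram)
histogram_quantile(0.99, sum(rate(feast_feature_server_request_latency_seconds_bucket[5m])) by (le))

# Current feature server memory usage
feast_feature_server_memory_usage
```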