This document describes best practices and common patterns for deploying Feast in production environments. It covers registry configuration choices, persistence strategies, deployment topologies, and operational considerations for running Feast at scale. For details on the Kubernetes Operator itself, see Kubernetes Operator. For Helm-based deployments, see Helm Charts.
The registry is the central metadata store that tracks all feature definitions, data sources, and infrastructure configurations. The choice of registry backend significantly impacts production scalability and reliability.
File-based registries serialize metadata to a single file stored on the local filesystem, in S3, or in GCS. This is suitable for small-scale deployments or development environments.
Configuration:
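A minimal feature_store.yaml sketch for a file-based registry; the project name and bucket path are placeholders to adapt:

```yaml
project: my_project
provider: local
registry:
  registry_type: file
  path: s3://my-feast-bucket/registry.db   # local path, s3://, or gs:// URI
```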
Limitations:
The file-based registry is configured via the path field in infra/feast-operator/api/v1alpha1/featurestore_types.go421-425
SQL-based registries use a database (PostgreSQL, MySQL, or SQLite with WAL mode) to store metadata, enabling concurrent access and improved reliability.
Configuration:
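A hedged sketch of a SQL-backed registry in feature_store.yaml; the connection URL, database name, and credentials are placeholders:

```yaml
project: my_project
provider: local
registry:
  registry_type: sql
  path: postgresql://feast:change-me@postgres.feast.svc.cluster.local:5432/feast
  cache_ttl_seconds: 60   # client-side metadata cache TTL
```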
The SQL registry backend provides:
SQL registry configuration is defined in infra/feast-operator/api/v1alpha1/featurestore_types.go438-447 with persistence types including sql and snowflake.registry.
A remote registry server exposes the registry as a gRPC/REST service, allowing clients to access metadata without direct database access.
Diagram: Remote Registry Architecture
Remote registry configuration is defined via the remote field in infra/feast-operator/api/v1alpha1/featurestore_types.go493-501. The operator can deploy a registry server container as defined in infra/feast-operator/internal/controller/services/services.go427-428.
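On the client side, a remote registry is addressed by host and port in feature_store.yaml; the hostname below is a placeholder for the registry server's Service address:

```yaml
project: my_project
provider: local
registry:
  registry_type: remote
  path: feast-production-registry.feast.svc.cluster.local:80
```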
Benefits:
Sources: docs/how-to-guides/running-feast-in-production.md33-38 infra/feast-operator/api/v1alpha1/featurestore_types.go402-452 infra/feast-operator/internal/controller/services/services_types.go78-82
Offline stores provide historical feature data for training dataset generation. The choice depends on your existing data infrastructure and scale requirements.
| Persistence Type | Use Case | Configuration |
|---|---|---|
| File-based (Parquet, DuckDB) | Development, small datasets | type: file, type: duckdb |
| Cloud Data Warehouses | Production, large-scale | type: snowflake.offline, type: bigquery, type: redshift |
| Distributed Computing | Custom infrastructure | type: spark, type: ray |
File-based offline stores are configured in infra/feast-operator/api/v1alpha1/featurestore_types.go310-321 with support for dask, duckdb, and file types.
Database-backed offline stores require a Kubernetes Secret containing connection parameters, as defined in infra/feast-operator/api/v1alpha1/featurestore_types.go323-346. Valid types include snowflake.offline, bigquery, redshift, spark, postgres, trino, athena, mssql, couchbase.offline, clickhouse, and ray.
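As a sketch of this pattern, the Secret below assumes the operator's convention of a key named after the store type whose value holds that store's connection parameters; the hostname and credentials are placeholders:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: feast-offline-store-secret
stringData:
  # key matches the persistence store type; value is that store's configuration
  postgres: |
    host: postgres.data.svc.cluster.local
    port: 5432
    database: feast
    db_schema: public
    user: feast
    password: change-me
```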
Online stores serve low-latency features for real-time inference. The persistence choice impacts latency, throughput, and operational complexity.
Diagram: Online Store Persistence Options
Ephemeral vs. Persistent:
The operator supports file-based online stores with optional PVC mounting as defined in infra/feast-operator/api/v1alpha1/featurestore_types.go362-369. Database-backed options are configured via Kubernetes Secrets as shown in infra/feast-operator/api/v1alpha1/featurestore_types.go371-400.
Registry persistence directly impacts the availability and consistency of your feature store metadata.
File Persistence with PVC:
SQL Database Persistence:
Registry file persistence supports S3 and GCS URIs with additional configuration options in infra/feast-operator/api/v1alpha1/featurestore_types.go421-436 including cache_ttl_seconds and cache_mode settings.
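Expressed in feature_store.yaml terms, an object-store-backed registry with caching might look like the following sketch; the bucket path and values are placeholders:

```yaml
registry:
  registry_type: file
  path: s3://my-feast-bucket/prod/registry.pb
  cache_ttl_seconds: 60     # how long clients cache registry contents
  cache_mode: sync          # or "thread" for background refresh
```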
Sources: infra/feast-operator/api/v1alpha1/featurestore_types.go296-453 infra/feast-operator/internal/controller/services/services.go154-221 docs/how-to-guides/running-feast-in-production.md33-38
The most common production pattern deploys all Feast services within a single Kubernetes cluster using the Feast Operator.
Diagram: Single-Cluster Deployment Architecture
The operator creates a single Deployment with multiple containers, as orchestrated by infra/feast-operator/internal/controller/services/services.go330-342. Each service type (registry, online, offline, UI) runs as a separate container within the same pod, configured in infra/feast-operator/internal/controller/services/services.go420-440.
Key characteristics:
For large organizations or multi-region deployments, a remote registry pattern allows multiple clusters to share a single source of truth.
Diagram: Multi-Cluster Deployment with Remote Registry
Remote registry configuration uses the RemoteRegistryConfig structure in infra/feast-operator/api/v1alpha1/featurestore_types.go493-501, which supports either a direct hostname or a reference to another FeatureStore CR using FeastRef.
Benefits:
Configuration example:
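A sketch of a consumer cluster's FeatureStore CR that points at a central registry via FeastRef; the names and namespaces are placeholders, and the field paths should be verified against the RemoteRegistryConfig definition referenced above:

```yaml
apiVersion: feast.dev/v1alpha1
kind: FeatureStore
metadata:
  name: serving-cluster-store
spec:
  feastProject: my_project
  services:
    onlineStore: {}          # serve features locally in this cluster
    registry:
      remote:
        feastRef:
          name: central-feature-store
          namespace: feast-central
        # alternatively: hostname: feast-central-registry.feast-central.svc.cluster.local:80
```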
Hybrid deployments combine Feast services with existing data infrastructure, useful during migration or for specific compliance requirements.
Common patterns:
The operator's flexible configuration supports these patterns through the services specification in infra/feast-operator/api/v1alpha1/featurestore_types.go281-294
Sources: infra/feast-operator/internal/controller/services/services.go52-152 infra/feast-operator/api/v1alpha1/featurestore_types.go486-502 docs/how-to-guides/running-feast-in-production.md1-18
Materialization is the process of moving feature values from offline storage to online storage for low-latency serving.
The most common production pattern uses scheduled jobs to periodically materialize features.
Diagram: Scheduled Batch Materialization
Kubernetes CronJob Pattern:
The operator supports automated materialization via the cronJob specification in infra/feast-operator/api/v1alpha1/featurestore_types.go112-156. By default, it executes feast apply and feast materialize-incremental commands, as defined in infra/feast-operator/internal/controller/services/cronjob.go.
Example configuration:
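A hedged sketch of a FeatureStore CR with a materialization CronJob; the schedule and names are placeholders, and field names should be checked against the cronJob specification referenced above:

```yaml
apiVersion: feast.dev/v1alpha1
kind: FeatureStore
metadata:
  name: my-feature-store
spec:
  feastProject: my_project
  cronJob:
    schedule: "0 * * * *"   # hourly: feast apply + feast materialize-incremental
```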
Airflow Integration:
For more complex orchestration, Airflow can invoke Feast materialization using the Python SDK, as documented in docs/how-to-guides/running-feast-in-production.md64-101.
Real-time feature updates use the Push API to write features directly to online (and optionally offline) stores.
Push API workflow:
- Use store.push() to write features to the online store

This is configured using PushSource data sources, which combine a batch source for historical data with streaming ingestion capability.
For large-scale deployments, custom materialization engines distribute the workload across compute clusters.
Available engines:
The choice of materialization engine significantly impacts performance at scale, as discussed in docs/how-to-guides/running-feast-in-production.md56-61
Sources: infra/feast-operator/api/v1alpha1/featurestore_types.go90-156 docs/how-to-guides/running-feast-in-production.md52-116
Production deployments should automate feature definition changes through CI/CD pipelines.
Diagram: CI/CD Pipeline for Feature Definitions
GitHub Actions example:
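A minimal workflow sketch that plans and applies feature definitions on merge to main; the repository paths, Python version, extras, and secret names are assumptions to adapt:

```yaml
name: feast-apply
on:
  push:
    branches: [main]
    paths:
      - "feature_repo/**"

jobs:
  apply:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install Feast
        run: pip install "feast[postgres]"
      - name: Plan changes
        run: feast --chdir feature_repo/production plan
      - name: Apply feature definitions
        run: feast --chdir feature_repo/production apply
        env:
          REGISTRY_PASSWORD: ${{ secrets.FEAST_REGISTRY_PASSWORD }}
```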
The registry tracks all changes with timestamps and versions, ensuring audit trails and rollback capability. Registry updates are atomic operations as implemented in the registry interface.
Maintain separate feature store configurations for different environments:
Directory structure:
feature_repo/
├── features/
│ ├── driver_features.py
│ └── customer_features.py
├── staging/
│ └── feature_store.yaml # Staging config
└── production/
└── feature_store.yaml # Production config
Each environment points to a separate registry and potentially different offline/online stores. The operator supports this through multiple FeatureStore CRs in different namespaces.
Sources: docs/how-to-guides/running-feast-in-production.md27-48 docs/how-to-guides/running-feast-in-production.md209-221
The Feast Operator deploys feature servers as Kubernetes Deployments, enabling horizontal pod autoscaling.
Scaling configuration:
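Because the operator creates a standard Kubernetes Deployment, a stock HorizontalPodAutoscaler can scale it; the Deployment name below is an assumption based on the CR name and should be verified in your cluster:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: feast-my-feature-store
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: feast-my-feature-store   # Deployment created by the operator (assumed name)
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```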
However, the current operator implementation uses a single Deployment for all services, as shown in infra/feast-operator/internal/controller/services/services.go330-342. For independent scaling, deploy separate FeatureStore CRs for each service type (online-only, offline-only, registry-only).
Worker configuration for online feature server:
The online feature server (using feast serve) supports Gunicorn worker configuration for concurrency tuning, as implemented in infra/feast-operator/internal/controller/services/services.go594-617. These settings control the server command-line arguments generated in infra/feast-operator/internal/controller/services/services.go566-632.
SQL-backed registry:
Registry server pattern:
- Clients inside the cluster reach the registry at feast-[name]-registry.[namespace].svc.cluster.local

The registry server supports both gRPC and REST API endpoints, configurable in infra/feast-operator/api/v1alpha1/featurestore_types.go540-548. Separate services are created for gRPC and REST as shown in infra/feast-operator/internal/controller/services/services.go232-278.
Online store scaling:
Offline store performance:
PVC storage classes:
When using file-based persistence with PVCs, select appropriate storage classes for performance:
PVC configuration is defined in infra/feast-operator/api/v1alpha1/featurestore_types.go454-483 with options for access modes, storage class, and resource requirements.
Sources: infra/feast-operator/internal/controller/services/services.go330-501 infra/feast-operator/api/v1alpha1/featurestore_types.go454-548 docs/how-to-guides/running-feast-in-production.md56-72
The operator supports multiple authorization patterns through the authz configuration in infra/feast-operator/api/v1alpha1/featurestore_types.go52-90
Diagram: Authentication and Authorization Patterns
Kubernetes RBAC:
The operator creates Kubernetes Roles and RoleBindings for fine-grained access control, as implemented in the authorization handler in infra/feast-operator/internal/controller/services/services.go135-147
OIDC Integration:
OIDC configuration properties are defined in infra/feast-operator/internal/controller/services/services_types.go88-94 and require all five properties.
The operator supports TLS for all service endpoints with automatic certificate mounting.
TLS configuration:
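The operator's TLS settings reference standard Kubernetes TLS Secrets; the sketch below shows the general shape of such a Secret (certificate data omitted), with the CR-side field names defined in the source cited in the next paragraph:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: feast-online-tls
type: kubernetes.io/tls
data:
  tls.crt: ""   # base64-encoded server certificate
  tls.key: ""   # base64-encoded private key
```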
TLS configuration is defined in infra/feast-operator/api/v1alpha1/featurestore_types.go605-628 for standard secrets, with OpenShift-specific CA bundle injection support shown in infra/feast-operator/internal/controller/services/services.go61-72
When TLS is enabled:
- Certificates are mounted at the /tls/ path per infra/feast-operator/internal/controller/services/services_types.go47
- Server commands receive --key and --cert flags as generated in infra/feast-operator/internal/controller/services/services.go619-624

Sensitive configuration (database credentials, API keys) should be stored in Kubernetes Secrets, never in feature_store.yaml or version control.
Pattern for database credentials:
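As a sketch, assuming the operator's convention of a Secret key named after the online store type containing that store's configuration block (Redis shown; the host and password are placeholders):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: feast-online-store-secret
stringData:
  # key matches the online store type; value is its feature_store.yaml configuration
  redis: |
    connection_string: redis.cache.svc.cluster.local:6379,password=change-me
```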
The operator reads the secret and injects it into the generated feature_store.yaml configuration as base64-encoded environment variables, processed in infra/feast-operator/internal/controller/services/repo_config.go30-48
Environment variable injection:
Additional secrets can be injected as environment variables using the env and envFrom fields in container configurations at infra/feast-operator/api/v1alpha1/featurestore_types.go549-577, supporting ConfigMaps and Secrets as sources.
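An illustrative fragment of such a container configuration; the exact placement inside the CR and all resource names are placeholders:

```yaml
env:
  - name: SNOWFLAKE_PASSWORD
    valueFrom:
      secretKeyRef:
        name: snowflake-credentials
        key: password
envFrom:
  - configMapRef:
      name: feast-common-config
  - secretRef:
      name: feast-extra-secrets
```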
Sources: infra/feast-operator/api/v1alpha1/featurestore_types.go44-90 infra/feast-operator/internal/controller/services/services.go61-72 infra/feast-operator/internal/controller/services/services.go619-624 docs/how-to-guides/running-feast-in-production.md209-221
The operator configures Kubernetes liveness, readiness, and startup probes for all service containers as shown in infra/feast-operator/internal/controller/services/services.go482-495:
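A sketch of the general probe shape on the online feature server container; the endpoint, port, and thresholds here are assumptions, since the operator generates the actual values from the referenced source:

```yaml
livenessProbe:
  httpGet:
    path: /health     # assumed health endpoint on the online feature server
    port: 6566        # assumed default serving port
  initialDelaySeconds: 30
  periodSeconds: 30
readinessProbe:
  httpGet:
    path: /health
    port: 6566
  periodSeconds: 10
startupProbe:
  httpGet:
    path: /health
    port: 6566
  failureThreshold: 30
  periodSeconds: 5
```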
These probes ensure that failed pods are automatically restarted and that traffic is only routed to healthy instances.
The online feature server can expose Prometheus metrics when enabled. The metrics port is exposed as a container port, as configured in infra/feast-operator/internal/controller/services/services.go473-479.
The operator maintains detailed status conditions for each service component, defined in infra/feast-operator/api/v1alpha1/featurestore_types.go32-66:
- OfflineStore - Offline store availability
- OnlineStore - Online store availability
- Registry - Registry availability
- UI - UI server availability
- Client - Client configuration readiness
- Authorization - Authorization setup status
- CronJob - CronJob status
- FeatureStore - Overall readiness

These conditions provide a clear operational view of the feature store's health and can be monitored programmatically or via kubectl:
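For example, the conditions can be read from the CR status; the excerpt below is illustrative, and the resource name and reason values are assumptions:

```yaml
# kubectl get featurestore production-feature-store -o yaml   (status excerpt, illustrative)
status:
  conditions:
    - type: Registry
      status: "True"
      reason: Ready
    - type: OnlineStore
      status: "True"
      reason: Ready
    - type: FeatureStore
      status: "True"
      reason: Ready
```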
Sources: infra/feast-operator/internal/controller/services/services.go482-495 infra/feast-operator/api/v1alpha1/featurestore_types.go26-66
This example demonstrates a production configuration with SQL registry, external online store, and scheduled materialization:
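A hedged sketch of such a FeatureStore CR, assembled from the persistence options described earlier; the secret names are placeholders, and the exact field paths should be verified against featurestore_types.go:

```yaml
apiVersion: feast.dev/v1alpha1
kind: FeatureStore
metadata:
  name: production-feature-store
spec:
  feastProject: my_project
  services:
    registry:
      local:
        persistence:
          store:
            type: sql
            secretRef:
              name: feast-registry-sql-secret      # SQL connection settings
    onlineStore:
      persistence:
        store:
          type: redis
          secretRef:
            name: feast-online-store-secret        # Redis connection settings
    offlineStore:
      persistence:
        store:
          type: snowflake.offline
          secretRef:
            name: feast-offline-store-secret       # Snowflake connection settings
  cronJob:
    schedule: "*/30 * * * *"                       # scheduled incremental materialization
```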
| Component | Small (< 100 features) | Medium (< 1000 features) | Large (1000+ features) |
|---|---|---|---|
| Registry Server | 1 CPU, 1Gi RAM | 2 CPU, 2Gi RAM | 4 CPU, 4Gi RAM |
| Online Feature Server | 1 CPU, 1Gi RAM | 2-4 CPU, 2-4Gi RAM | 8+ CPU, 8+ Gi RAM |
| Offline Feature Server | 2 CPU, 2Gi RAM | 4 CPU, 4Gi RAM | 8+ CPU, 8+ Gi RAM |
| Registry PVC | 5Gi | 10Gi | 20Gi |
| Online Store PVC | 5Gi | 20Gi | 50Gi+ |
These are starting points; actual requirements depend on feature cardinality, request rate, and data volume.
Production deployments should use environment variables for dynamic configuration, supported via the feature_store.yaml syntax:
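A sketch of environment-variable interpolation in feature_store.yaml; the variable names are placeholders:

```yaml
project: my_project
provider: local
registry:
  registry_type: sql
  path: postgresql://${REGISTRY_USER}:${REGISTRY_PASSWORD}@${REGISTRY_HOST}:5432/feast
online_store:
  type: redis
  connection_string: ${REDIS_HOST}:6379,password=${REDIS_PASSWORD}
```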
This pattern allows the same feature definitions to work across environments with different credentials, as documented in docs/how-to-guides/running-feast-in-production.md209-221
Sources: infra/feast-operator/api/v1alpha1/featurestore_types.go69-156 docs/how-to-guides/running-feast-in-production.md209-221
Phase 1: Registry Migration
Phase 2: Online Store Migration
Phase 3: Kubernetes Deployment
The operator's rolling update strategy ensures zero-downtime deployments. Configure the deployment strategy in infra/feast-operator/api/v1alpha1/featurestore_types.go288:
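The field embeds a standard Kubernetes DeploymentStrategy; a sketch favoring zero-downtime rollouts, assuming the field sits under the services block, looks like this:

```yaml
spec:
  services:
    deploymentStrategy:
      type: RollingUpdate
      rollingUpdate:
        maxSurge: 1
        maxUnavailable: 0
```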
Sources: infra/feast-operator/api/v1alpha1/featurestore_types.go281-294 docs/how-to-guides/running-feast-in-production.md1-240
Symptom: Clients see stale feature definitions
Causes:
Solutions:
- Adjust cache_ttl_seconds in the registry configuration
- Check registry server logs: kubectl logs -l feast.dev/service-type=registry

Symptom: CronJob fails or times out
Causes:
Solutions:
- Increase jobSpec.activeDeadlineSeconds for longer jobs
- Inspect job logs: kubectl logs job/feast-[name]-[timestamp]

Symptom: High p99 latency for feature serving
Causes:
Solutions:
- Increase registryTTLSeconds for a better cache hit rate

Symptom: FeatureStore CR rejected by API server
Causes:
Solutions:
- Check operator logs: kubectl logs -n feast-operator-system deployment/feast-operator-controller-manager

Sources: infra/feast-operator/api/v1alpha1/featurestore_types.go317-400 docs/how-to-guides/running-feast-in-production.md52-108