This document provides an overview of Feast's feature serving architecture, which enables low-latency feature retrieval for online inference and batch scoring. Feature serving is the layer that reads features from online and offline stores and delivers them to ML models in production environments.
For detailed information about specific server implementations, see the dedicated pages for the Python, Go, and Java feature servers.
For information about the underlying stores that feature servers read from, see Online Stores and Offline Stores.
Feast's feature serving layer provides multiple mechanisms for retrieving features at inference time. The system supports three main patterns: embedding the Python SDK directly in an application, calling a deployed feature server over HTTP or gRPC, and running managed feature-serving deployments on Kubernetes.
The serving layer is implemented in three languages (Python, Go, Java), each optimized for different deployment scenarios and ecosystems.
Multi-Language Feature Server Architecture
Sources: docs/SUMMARY.md38 infra/charts/feast/README.md1-82 README.md225-232
| Server | Language | Primary Use Case | Key Features |
|---|---|---|---|
| Python Feature Server | Python | Feature-rich deployments, experimentation | Native transformations, full SDK access, FastAPI |
| Go Feature Server | Go | High-performance production | Low latency, minimal memory, efficient concurrency |
| Java Feature Server | Java/Spring Boot | JVM ecosystems, enterprise | Spring Boot integration, gRPC/HTTP, Helm charts |
The Python feature server is implemented using FastAPI and supports both HTTP REST and gRPC protocols. It provides the most complete feature set including native support for on-demand transformations.
Key Components:
- `feast serve` CLI command to start the server
- `FeatureStore.get_online_features()` for direct SDK access

Deployment:
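The server is started with `feast serve`; once running, its HTTP surface can be exercised with any client. The sketch below assumes the default port (6566), the `/get-online-features` endpoint, and the feature view and entity names from the Feast quickstart:

```python
# A hedged sketch: querying a locally running Python feature server.
# Port, endpoint, and feature names are assumptions based on the quickstart;
# adjust them to your own feature repository.
import requests

payload = {
    # Fully qualified feature references: <feature_view>:<feature>
    "features": [
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
    ],
    # One list of entity values per join key
    "entities": {"driver_id": [1001, 1002]},
}

resp = requests.post(
    "http://localhost:6566/get-online-features",
    json=payload,
    timeout=5,
)
resp.raise_for_status()
print(resp.json())
```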
Sources: docs/getting-started/quickstart.md136-164 docs/how-to-guides/running-feast-in-production.md185-204 README.md136-164
The Go feature server is optimized for high-throughput, low-latency scenarios. It delegates transformation logic to the Python transformation service while handling feature retrieval directly.
Architecture:
Go Feature Server Request Flow
Configuration:
The Go server requires transformation_service_endpoint in feature_store.yaml when on-demand features are used.
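As a sketch of where that setting lives, the fragment below shows the general shape of a `feature_store.yaml` for this setup; only `transformation_service_endpoint` comes from this page, and the endpoint value and remaining keys are placeholders:

```python
# Illustrative only: a feature_store.yaml fragment for the Go feature server
# with on-demand feature views enabled. Values are placeholders, not defaults.
GO_FEATURE_STORE_YAML = """
project: my_project
provider: local
registry: data/registry.db
online_store:
  type: redis
  connection_string: localhost:6379
# Lets the Go server delegate on-demand transformations to the Python
# transformation service (host:port value is an assumption).
transformation_service_endpoint: localhost:6569
"""
```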
Sources: infra/charts/feast/charts/feature-server/values.yaml13-16 docs/reference/registries/metadata.md28-29 CHANGELOG.md25
The Java feature server is built on Spring Boot and provides a production-ready gRPC and HTTP interface. It's designed for deployment in JVM-based enterprise environments.
Deployment Configuration:
| Configuration | Description | Default |
|---|---|---|
| application.yaml | Default Spring Boot config | Included |
| application-override.yaml | Custom overrides (ConfigMap) | User-provided |
| application-secret.yaml | Secrets (Secret) | User-provided |
| javaOpts | JVM options | None |
Helm Chart Structure:
- `infra/charts/feast/charts/feature-server/charts/transformation-service/`

Sources: infra/charts/feast/README.md1-82 infra/charts/feast/charts/feature-server/README.md1-68 java/pom.xml30-35
| Server | HTTP REST | gRPC | Protocol Buffers |
|---|---|---|---|
| Python | ✓ | ✓ | ✓ |
| Go | ✓ | ✓ | ✓ |
| Java | ✓ | ✓ | ✓ |
Online Feature Retrieval Sequence
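For the in-process variant of this sequence, the Python SDK performs the same lookup against the online store directly; a minimal sketch (feature and entity names follow the quickstart example):

```python
# Direct SDK retrieval: no feature server involved; the SDK reads the
# online store configured in feature_store.yaml.
from feast import FeatureStore

store = FeatureStore(repo_path=".")  # directory containing feature_store.yaml

online_features = store.get_online_features(
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:avg_daily_trips",
    ],
    entity_rows=[{"driver_id": 1001}, {"driver_id": 1002}],
).to_dict()

print(online_features)
```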
Sources: docs/getting-started/quickstart.md136-164 docs/how-to-guides/running-feast-in-production.md181-204
The Transformation Service is a separate Python gRPC server that handles on-demand feature transformations. This separation allows non-Python feature servers (Go, Java) to leverage Python-based transformation logic.
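The transformation logic that this service executes is authored as ordinary Python on-demand feature views. A self-contained sketch using a request-time source (names are illustrative, and the exact decorator signature may vary by Feast version):

```python
# Illustrative on-demand feature view: the kind of Python transformation the
# transformation service runs on behalf of the Go and Java feature servers.
import pandas as pd

from feast import Field, RequestSource, on_demand_feature_view
from feast.types import Float64, Int64

# Values supplied by the caller at request time.
vals_to_add = RequestSource(
    name="vals_to_add",
    schema=[Field(name="val_to_add", dtype=Int64)],
)

@on_demand_feature_view(
    sources=[vals_to_add],
    schema=[Field(name="val_plus_one", dtype=Float64)],
)
def add_one(inputs: pd.DataFrame) -> pd.DataFrame:
    # Compute the derived feature from the request-time input.
    out = pd.DataFrame()
    out["val_plus_one"] = inputs["val_to_add"] + 1.0
    return out
```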
Transformation Service Architecture
Deployment: The transformation service is deployed alongside feature servers using Helm:
Service Configuration:
Sources: infra/charts/feast/requirements.yaml7-11 infra/charts/feast/charts/transformation-service/values.yaml1-37 docs/reference/registries/metadata.md27-29
Use Case: Python microservices, notebooks, local development
Advantages:
Limitations:
Sources: docs/getting-started/quickstart.md136-164 README.md136-164
Use Case: Polyglot services, centralized serving, load balancing
Advantages:
Limitations:
Sources: docs/how-to-guides/running-feast-in-production.md185-204
Use Case: Production deployments, high availability, auto-scaling
Deployment Options:
Sources: infra/feast-operator/README.md1-166 docs/how-to-guides/feast-on-kubernetes.md1-72 infra/feast-operator/config/samples/kustomization.yaml1-7
The Offline Feature Server provides Arrow Flight-based access to historical features from offline stores. It supports the same query patterns as the Python SDK but over a network protocol.
Offline Feature Server Architecture
Supported Operations:
- `get_historical_features()` - Point-in-time correct training data
- `pull_all_from_table_or_query()` - Full table retrieval
- `pull_latest_from_table_or_query()` - Latest values retrieval
- `offline_write_batch()` - Write features to offline store

CLI Usage:
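The CLI invocation itself is not reproduced here (the offline server is typically started with `feast serve_offline`). On the client side, the call is the same whether the offline store is local or remote; a sketch with placeholder feature names:

```python
# Client-side sketch: point-in-time correct retrieval via the SDK. Whether
# this runs locally or against a remote offline feature server is decided by
# the offline_store section of feature_store.yaml.
import pandas as pd

from feast import FeatureStore

store = FeatureStore(repo_path=".")

entity_df = pd.DataFrame(
    {
        "driver_id": [1001, 1002],
        "event_timestamp": pd.to_datetime(["2024-01-01", "2024-01-02"], utc=True),
    }
)

training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:avg_daily_trips",
    ],
).to_df()

print(training_df.head())
```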
Kubernetes Deployment:
Sources: docs/reference/feature-servers/offline-feature-server.md1-60 README.md231
Single-Service Deployment
Configuration Example:
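The original example is not captured here; as a sketch, a single-service configuration reduces to one `feature_store.yaml` read by every replica (values are placeholders):

```python
# Illustrative only: a minimal feature_store.yaml for a single feature server
# deployment backed by Redis, with a shared registry in object storage.
SINGLE_SERVICE_FEATURE_STORE_YAML = """
project: my_project
provider: local
registry: s3://my-bucket/registry.db
online_store:
  type: redis
  connection_string: redis.internal:6379
"""
```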
Sources: docs/how-to-guides/running-feast-in-production.md209-221
Multi-Service Deployment with Transformations
Helm Deployment:
Sources: infra/charts/feast/README.md13-58 infra/charts/feast/charts/feature-server/values.yaml1-141
The Feast Operator manages the full lifecycle of Feast deployments using Kubernetes Custom Resources.
FeatureStore CR Example:
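As a sketch of the resource shape (field names such as `feastProject` follow the operator samples but may differ by operator version):

```python
# Illustrative only: a minimal FeatureStore custom resource for the Feast
# Operator. apiVersion and spec fields are assumptions, not verified defaults.
FEATURE_STORE_CR = """
apiVersion: feast.dev/v1alpha1
kind: FeatureStore
metadata:
  name: example
spec:
  feastProject: my_project
"""
```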
Components Created:
Sources: infra/feast-operator/README.md1-166 docs/how-to-guides/feast-on-kubernetes.md14-66
The Registry Server exposes the Feast registry as a gRPC/REST service, enabling remote access to feature metadata. This is useful when the registry is stored in a location not directly accessible to all services.
Registry Server Architecture
Configuration:
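A sketch of a client configuration pointing the SDK at a remote registry server; the `registry_type: remote` form and the port are assumptions and may differ by Feast version:

```python
# Illustrative only: client-side feature_store.yaml resolving feature metadata
# from a registry server instead of a local or object-store registry file.
REMOTE_REGISTRY_FEATURE_STORE_YAML = """
project: my_project
provider: local
registry:
  registry_type: remote
  path: registry.internal:6570
online_store:
  type: redis
  connection_string: redis.internal:6379
"""
```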
Sources: README.md232 docs/roadmap.md67
Feature servers enforce Role-Based Access Control (RBAC) when configured. Permissions are validated at request time.
| Endpoint | Resource Type | Permission | Description |
|---|---|---|---|
| get_online_features | FeatureView | Read Online | Retrieve online features |
| push | FeatureView | Write Online | Push features to online store |
| get_historical_features | FeatureView | Read Offline | Retrieve historical features |
| materialize | FeatureView | Write Online | Materialize features |
Configuration Example:
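The original example is not captured here; as a hedged sketch, permissions are declared as Python objects in the feature repository (class names and import paths follow the Feast permissions model but may vary by version):

```python
# Illustrative only: grant read-online access on feature views to callers
# holding the "reader" role. Import paths and enum names are assumptions.
from feast import FeatureView
from feast.permissions.action import AuthzedAction
from feast.permissions.permission import Permission
from feast.permissions.policy import RoleBasedPolicy

online_reader = Permission(
    name="online-reader",
    types=[FeatureView],
    policy=RoleBasedPolicy(roles=["reader"]),
    actions=[AuthzedAction.READ_ONLINE],
)
```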
Sources: docs/reference/feature-servers/offline-feature-server.md44-60 CHANGELOG.md11-12
Feature servers can be scaled horizontally by increasing replica count. Key considerations:
- Registry cache TTL (`cache_ttl_seconds`)

Scaling Commands:
Sources: infra/charts/feast/charts/feature-server/values.yaml1-2 docs/how-to-guides/running-feast-in-production.md1-240
Registry Caching:
Connection Pooling:
JVM Tuning (Java Server):
Sources: infra/charts/feast/charts/feature-server/values.yaml34-35 docs/how-to-guides/running-feast-in-production.md18
Feature servers expose metrics for monitoring:
Go Server OTEL Integration: The Go feature server includes OpenTelemetry instrumentation for distributed tracing.
Sources: CHANGELOG.md103 docs/reference/registries/metadata.md13
Liveness Probe:
Readiness Probe:
Sources: infra/charts/feast/charts/feature-server/values.yaml43-69
Feature servers support environment variable interpolation in feature_store.yaml:
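A sketch of what that looks like, assuming the `${ENV_VAR}` substitution syntax:

```python
# Illustrative only: feature_store.yaml pulling connection details from
# environment variables so one file serves dev, staging, and production.
ENV_INTERPOLATED_FEATURE_STORE_YAML = """
project: my_project
provider: local
registry: ${REGISTRY_PATH}
online_store:
  type: redis
  connection_string: ${REDIS_HOST}:${REDIS_PORT}
"""
```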
This enables the same configuration to work across multiple environments (dev, staging, production).
Sources: docs/how-to-guides/running-feast-in-production.md209-221