**docs/reference/feature-servers/python-feature-server.md**

## Prometheus Metrics

The Python feature server can expose Prometheus-compatible metrics on a dedicated
HTTP endpoint (default port `8000`). Metrics are **opt-in** and carry zero overhead
when disabled.

### Enabling metrics

**Option 1 — CLI flag** (useful for one-off runs):

```bash
feast serve --metrics
```

**Option 2 — `feature_store.yaml`** (recommended for production):

```yaml
feature_server:
type: local
metrics:
enabled: true
```

Either option is sufficient. When both are set, metrics are enabled.

### Per-category control

By default, enabling metrics turns on **all** categories. You can selectively
disable individual categories within the same `metrics` block:

```yaml
feature_server:
type: local
metrics:
enabled: true
resource: true # CPU / memory gauges
request: false # disable endpoint latency & request counters
online_features: true # online feature retrieval counters
push: true # push request counters
materialization: true # materialization counters & duration
freshness: true # feature freshness gauges
```

Any category set to `false` will emit no metrics and start no background
threads (e.g., setting `freshness: false` prevents the registry polling
thread from starting). All categories default to `true`.
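The gating idea can be sketched in a few lines of Python. This is a hypothetical illustration of the behavior described above, not Feast's actual internals; the names `MetricsConfig` and `should_start_freshness_thread` are invented for the example:

```python
# Hypothetical sketch of per-category gating: a disabled category emits no
# metrics and starts no background work. Not Feast's actual implementation.
from dataclasses import dataclass

@dataclass
class MetricsConfig:
    enabled: bool = False   # metrics are opt-in
    freshness: bool = True  # all categories default to True

def should_start_freshness_thread(cfg: MetricsConfig) -> bool:
    # The registry polling thread starts only if metrics are enabled
    # overall AND the freshness category is not disabled.
    return cfg.enabled and cfg.freshness

print(should_start_freshness_thread(MetricsConfig(enabled=True, freshness=False)))  # False
```

The same check applies to every category: `enabled: true` opens the door, and each category flag decides whether its collectors and threads actually run.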

### Available metrics

| Metric | Type | Labels | Description |
|--------|------|--------|-------------|
| `feast_feature_server_cpu_usage` | Gauge | — | Process CPU usage % |
| `feast_feature_server_memory_usage` | Gauge | — | Process memory usage % |
| `feast_feature_server_request_total` | Counter | `endpoint`, `status` | Total requests per endpoint |
| `feast_feature_server_request_latency_seconds` | Histogram | `endpoint`, `feature_count`, `feature_view_count` | Request latency with p50/p95/p99 support |
| `feast_online_features_request_total` | Counter | — | Total online feature retrieval requests |
| `feast_online_features_entity_count` | Histogram | — | Entity rows per online feature request |
| `feast_push_request_total` | Counter | `push_source`, `mode` | Push requests by source and mode |
| `feast_materialization_total` | Counter | `feature_view`, `status` | Materialization runs (success/failure) |
| `feast_materialization_duration_seconds` | Histogram | `feature_view` | Materialization duration per feature view |
| `feast_feature_freshness_seconds` | Gauge | `feature_view`, `project` | Seconds since last materialization |

### Scraping with Prometheus

```yaml
scrape_configs:
- job_name: feast
static_configs:
- targets: ["localhost:8000"]
```

### Kubernetes / Feast Operator

Set `metrics: true` in your FeatureStore CR:

```yaml
spec:
services:
onlineStore:
server:
metrics: true
```

The operator automatically exposes port 8000 and creates the corresponding Service port so Prometheus can discover it.

> **Review comment:** Should the operator also create the PodMonitor / ServiceMonitor resource, owned by the FeatureStore resource, so the Prometheus Operator can discover the metrics endpoint and configure scraping?
>
> **Author:** That would be a good enhancement; we can add a CRD detection guard to avoid crashing on vanilla Kubernetes clusters without the Prometheus Operator. Will raise an issue for this.

### Multi-worker and multi-replica (HPA) support

Feast uses Prometheus **multiprocess mode** so that metrics are correct
regardless of the number of Gunicorn workers or Kubernetes replicas.

**How it works:**

* Each Gunicorn worker writes metric values to shared files in a
temporary directory (`PROMETHEUS_MULTIPROCESS_DIR`). Feast creates
this directory automatically; you can override it by setting the
environment variable yourself.
* The metrics HTTP server on port 8000 aggregates all workers'
metric files using `MultiProcessCollector`, so a single scrape
returns accurate totals.
* Gunicorn hooks clean up dead-worker files automatically
(`child_exit` → `mark_process_dead`).
* CPU and memory gauges use `multiprocess_mode=liveall` — Prometheus
shows per-worker values distinguished by a `pid` label.
* Feature freshness gauges use `multiprocess_mode=max` — Prometheus
shows the worst-case staleness (all workers compute the same value).
* Counters and histograms (request counts, latency, materialization)
are automatically summed across workers.

**Multiple replicas (HPA):** Each pod runs its own metrics endpoint.
Prometheus adds an `instance` label per pod, so there is no
duplication. Use `sum(rate(...))` or `histogram_quantile(...)` across
instances as usual.
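For example, assuming the metric names listed earlier, queries like these aggregate cleanly across workers and replicas:

```promql
# Total request rate across all pods and workers
sum(rate(feast_feature_server_request_total[5m]))

# p99 request latency across all instances
histogram_quantile(
  0.99,
  sum(rate(feast_feature_server_request_latency_seconds_bucket[5m])) by (le)
)
```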

## Starting the feature server in TLS (SSL) mode

Enabling TLS mode ensures that data between the Feast client and server is transmitted securely. It is recommended to start the feature server in TLS mode in production environments.
**docs/reference/feature-store-yaml.md**

An example configuration:
```yaml
feature_server:
type: local
metrics: # Prometheus metrics configuration. Also achievable via `feast serve --metrics`.
enabled: true # Enable Prometheus metrics server on port 8000
resource: true # CPU / memory gauges
request: true # endpoint latency histograms & request counters
online_features: true # online feature retrieval counters
push: true # push request counters
materialization: true # materialization counters & duration histograms
freshness: true # per-feature-view freshness gauges
offline_push_batching_enabled: true # Enables batching of offline writes processed by /push. Online writes are unaffected.
offline_push_batching_batch_size: 100 # Maximum number of buffered rows before writing to the offline store.
offline_push_batching_batch_interval_seconds: 5 # Maximum time rows may remain buffered before a forced flush.
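The `metrics` block is plain YAML, so it can be inspected like any other part of `feature_store.yaml`. A quick sketch using PyYAML (not a Feast API):

```python
# Parse a feature_store.yaml fragment and read the metrics settings.
import yaml

config = yaml.safe_load("""
feature_server:
  type: local
  metrics:
    enabled: true
    freshness: false
""")

metrics = config["feature_server"]["metrics"]
print(metrics["enabled"], metrics["freshness"])  # True False
```

Unspecified category flags simply won't appear in the parsed dict; they fall back to their default of `true` inside Feast.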