feast-dev · franciscojavierarceo · Aug 21, 2024 · Jun 21, 2024 · Jun 21, 2024 · Jun 21, 2024
diff --git a/Makefile b/Makefile
@@ -86,14 +86,14 @@ test-python-unit:
 	python -m pytest -n 8 --color=yes sdk/python/tests
 
 test-python-integration:
-	python -m pytest -n 8 --integration --color=yes --durations=10 --timeout=1200 --timeout_method=thread \
+	python -m pytest -n 4 --integration --color=yes --durations=10 --timeout=1200 --timeout_method=thread \
 		-k "(not snowflake or not test_historical_features_main)" \
 		sdk/python/tests
 
 test-python-integration-local:
 	FEAST_IS_LOCAL_TEST=True \
 	FEAST_LOCAL_ONLINE_CONTAINER=True \
-	python -m pytest -n 8 --color=yes --integration --durations=5 --dist loadgroup \
+	python -m pytest -n 4 --color=yes --integration --durations=10 --timeout=1200 --timeout_method=thread --dist loadgroup \
 		-k "not test_lambda_materialization and not test_snowflake_materialization" \
 		sdk/python/tests
 

@@ -16,6 +16,7 @@
   * [Feature retrieval](getting-started/concepts/feature-retrieval.md)
   * [Point-in-time joins](getting-started/concepts/point-in-time-joins.md)
   * [Registry](getting-started/concepts/registry.md)
+  * [Role-Based Access Control (RBAC)](getting-started/architecture/rbac.md)
   * [\[Alpha\] Saved dataset](getting-started/concepts/dataset.md)
 * [Architecture](getting-started/architecture/README.md)
   * [Overview](getting-started/architecture/overview.md)

@@ -23,3 +23,7 @@
 {% content-ref url="model-inference.md" %}
 [model-inference.md](model-inference.md)
 {% endcontent-ref %}
+
+{% content-ref url="rbac.md" %}
+[rbac.md](rbac.md)
+{% endcontent-ref %}
@@ -17,3 +17,7 @@ typically your Offline Store). We are exploring adding a default streaming engin
   write patterns](write-patterns.md) to your application
 
 * We recommend [using Python](language.md) for your Feature Store microservice. As mentioned in the document, precomputing features is the recommended optimal path to ensure low latency performance. Reducing feature serving to a lightweight database lookup is the ideal pattern, which means the marginal overhead of Python should be tolerable. Because of this we believe the pros of Python outweigh the costs, as reimplementing feature logic is undesirable. Java and Go Clients are also available for online feature retrieval.
+
+* [Role-Based Access Control (RBAC)](rbac.md) is a security mechanism that restricts access to resources based on the roles of individual users within an organization. In the context of Feast, RBAC ensures that only authorized users or groups can access or modify specific resources, thereby maintaining data security and operational integrity.
+
+
@@ -0,0 +1,56 @@
+# Role-Based Access Control (RBAC) in Feast
+
+## Introduction
+
+Role-Based Access Control (RBAC) is a security mechanism that restricts access to resources based on the roles of individual users within an organization. In the context of Feast, RBAC ensures that only authorized users or groups can access or modify specific resources, thereby maintaining data security and operational integrity.
+
+## Functional Requirements
+
+The RBAC implementation in Feast is designed to:
+
+- **Assign Permissions**: Allow administrators to assign permissions for various operations and resources to users or groups based on their roles.
+- **Seamless Integration**: Integrate smoothly with existing business code without requiring significant modifications.
+- **Backward Compatibility**: Maintain support for non-authorized models as the default to ensure backward compatibility.
+
+## Business Goals
+
+The primary business goals of implementing RBAC in Feast are:
+
+1. **Feature Sharing**: Enable multiple teams to share the feature store while ensuring controlled access. This allows for collaborative work without compromising data security.
+2. **Access Control Management**: Prevent unauthorized access to team-specific resources and spaces, governing the operations that each user or group can perform.
+
+## Reference Architecture
+
+Feast operates as a collection of connected services, each enforcing authorization permissions. The architecture is designed as a distributed microservices system with the following key components:
+
+- **Service Endpoints**: These enforce authorization permissions, ensuring that only authorized requests are processed.
+- **Client Integration**: Clients authenticate with feature servers by attaching an authorization token to each request.
+- **Service-to-Service Communication**: Communication between Feast services is always granted.
+
+![rbac.jpg](rbac.jpg)
+
+## Permission Model
+
+The RBAC system in Feast uses a permission model that defines the following concepts:
+
+- **Resource**: An object within Feast that needs to be secured against unauthorized access.
+- **Action**: A logical operation performed on a resource, such as Create, Describe, Update, Delete, Read, or Write.
+- **Policy**: A set of rules that enforce authorization decisions on resources. The default implementation uses role-based policies.
+
+
+
+## Authorization Architecture
+
+The authorization architecture in Feast is built with the following components:
+
+- **Token Extractor**: Extracts the authorization token from the request header.
+- **Token Parser**: Parses the token to retrieve user details.
+- **Policy Enforcer**: Validates the secured endpoint against the retrieved user details.
+- **Token Injector**: Adds the authorization token to each secured request header.
+
+
+
+
+
+
+
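The four authorization components listed above can be pictured as a small pipeline. The following is a minimal sketch only, not the Feast implementation: every name in it (`User`, `extract_token`, `parse_token`, `enforce`, `inject_token`) is hypothetical, and the token is assumed to be plain base64-encoded JSON with `sub` and `roles` fields, where a real deployment would validate a signed token against the configured authentication server.

```python
import base64
import json
from dataclasses import dataclass, field


@dataclass
class User:
    """Hypothetical container for the details recovered from a token."""
    username: str
    roles: set = field(default_factory=set)


def inject_token(headers: dict, token: str) -> dict:
    """Token Injector (client side): return a copy of the headers carrying the token."""
    return {**headers, "Authorization": f"Bearer {token}"}


def extract_token(headers: dict) -> str:
    """Token Extractor (server side): pull the bearer token out of the headers."""
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    if scheme.lower() != "bearer" or not token:
        raise PermissionError("missing or malformed Authorization header")
    return token


def parse_token(token: str) -> User:
    """Token Parser: decode the token into user details.

    Assumption: the token is unsigned base64-encoded JSON, for illustration only.
    """
    claims = json.loads(base64.urlsafe_b64decode(token.encode()))
    return User(username=claims["sub"], roles=set(claims.get("roles", [])))


def enforce(user: User, required_roles: set) -> None:
    """Policy Enforcer: allow the call if the user holds at least one required role."""
    if not user.roles & required_roles:
        raise PermissionError(
            f"{user.username} lacks any of the roles {sorted(required_roles)}"
        )
```

A client would call `inject_token` before sending a request; a server would chain `extract_token`, `parse_token`, and `enforce` before running the secured endpoint.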
@@ -19,3 +19,7 @@
 {% content-ref url="provider.md" %}
 [provider.md](provider.md)
 {% endcontent-ref %}
+
+{% content-ref url="authz_manager.md" %}
+[authz_manager.md](authz_manager.md)
+{% endcontent-ref %}
@@ -0,0 +1,102 @@
+# Authorization Manager
+An Authorization Manager is an instance of the `AuthManager` class that is plugged into one of the Feast servers to extract user details from the current request and inject them into the [permissions](../../getting-started/concepts/permission.md) framework.
+
+{% hint style="info" %}
+**Note**: Feast does not provide authentication capabilities; it is the client's responsibility to manage the authentication token and pass it to
+the Feast server, which then validates the token and extracts user details from the configured authentication server.
+{% endhint %}
+
+Two authorization managers are supported out-of-the-box:
+* One using a configurable OIDC server to extract the user details.
+* One using the Kubernetes RBAC resources to extract the user details.
+
+These instances are created when the Feast servers are initialized, according to the authorization configuration defined in
+their own `feature_store.yaml`.
+
+Feast servers and clients must have consistent authorization configuration, so that the client proxies can automatically inject
+the authorization tokens that the server can properly identify and use to enforce permission validations.
+
+
+## Design notes
+The server-side implementation of the authorization functionality is defined [here](./../../../sdk/python/feast/permissions/server).
+Some of the key models and classes needed to understand the client-side authorization implementation can be found [here](./../../../sdk/python/feast/permissions/client).
+
+## Configuring Authorization
+The authorization is configured using a dedicated `auth` section in the `feature_store.yaml` configuration.
+
+**Note**: As a consequence, when deploying the Feast servers with the Helm [charts](../../../infra/charts/feast-feature-server/README.md),
+the `feature_store_yaml_base64` value must include the `auth` section to specify the authorization configuration.
+
+### No Authorization
+This configuration applies the default `no_auth` authorization:
+```yaml
+project: my-project
+auth:
+  type: no_auth
+...
+```
+
+### OIDC Authorization
+With OIDC authorization, the Feast client proxies retrieve the JWT token from an OIDC server (or [Identity Provider](https://openid.net/developers/how-connect-works/))
+and attach it to every request sent to a Feast server, using an [Authorization Bearer Token](https://developer.mozilla.org/en-US/docs/Web/HTTP/Authentication#bearer).
+
+The server, in turn, uses the same OIDC server to validate the token and extract the user roles from the token itself.
+
+Some assumptions are made in the OIDC server configuration:
+* The OIDC token refers to a client with roles matching the RBAC roles of the configured `Permission`s (*)
+* The roles are exposed in the access token passed to the server
+
+(*) Please note that **the role match is case-sensitive**, e.g. the name of the role in the OIDC server and in the `Permission` configuration
+must be exactly the same.
+
+For example, the access token for a client `app` of a user with `reader` role should have the following `resource_access` section:
+```json
+{
+  "resource_access": {
+    "app": {
+      "roles": [
+        "reader"
+      ]
+    }
+  }
+}
+```
+
+An example of OIDC authorization configuration is the following: 
+```yaml
+project: my-project
+auth:
+  type: oidc
+  client_id: _CLIENT_ID_
+  client_secret: _CLIENT_SECRET_
+  realm: _REALM_
+  auth_server_url: _OIDC_SERVER_URL_
+  auth_discovery_url: _OIDC_SERVER_URL_/realms/master/.well-known/openid-configuration
+...
+```
+
+For the client configuration, the following settings must also be added to identify the current user:
+```yaml
+auth:
+  ...
+  username: _USERNAME_
+  password: _PASSWORD_
+```
+
+### Kubernetes RBAC Authorization
+With Kubernetes RBAC Authorization, the client uses the service account token as the authorization bearer token, and the
+server fetches the associated roles from the Kubernetes RBAC resources.
+
+An example of Kubernetes RBAC authorization configuration is the following: 
+{% hint style="info" %}
+**NOTE**: This configuration only works if you deploy Feast on OpenShift or another Kubernetes platform.
+{% endhint %}
+```yaml
+project: my-project
+auth:
+  type: kubernetes
+...
+```
+
+If the client cannot run on the same cluster as the servers, the client token can be injected using the `LOCAL_K8S_TOKEN`
+environment variable on the client side. The value must be the token of a service account created on the servers' cluster
+and linked to the desired RBAC roles.
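To illustrate the `LOCAL_K8S_TOKEN` behaviour described above, here is a minimal sketch of how a client might pick its bearer token. It is not the Feast client code: `resolve_k8s_token` and `auth_header` are hypothetical helpers, and the lookup order (environment variable first, then the token file that Kubernetes mounts into a pod) is an assumption.

```python
import os
from pathlib import Path

# Standard location of the service account token mounted into a pod.
IN_CLUSTER_TOKEN_PATH = Path("/var/run/secrets/kubernetes.io/serviceaccount/token")


def resolve_k8s_token(env=None, token_path=IN_CLUSTER_TOKEN_PATH) -> str:
    """Return the service account token a client would send to a Feast server.

    Hypothetical helper: prefers LOCAL_K8S_TOKEN (client outside the cluster),
    then falls back to the token file mounted inside the cluster.
    """
    env = os.environ if env is None else env
    token = env.get("LOCAL_K8S_TOKEN", "").strip()
    if token:
        return token
    path = Path(token_path)
    if path.is_file():
        return path.read_text().strip()
    raise RuntimeError("no service account token found; set LOCAL_K8S_TOKEN")


def auth_header(token: str) -> dict:
    """Build the bearer header that carries the token to the server."""
    return {"Authorization": f"Bearer {token}"}
```

For a client running outside the cluster, exporting `LOCAL_K8S_TOKEN` before starting the process would be enough for `resolve_k8s_token()` to return that value.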
@@ -28,3 +28,4 @@ A complete Feast deployment contains the following components:
 * **Batch Materialization Engine:** The [Batch Materialization Engine](batch-materialization-engine.md) component launches a process which loads data into the online store from the offline store. By default, Feast uses a local in-process engine implementation to materialize data. However, additional infrastructure can be used for a more scalable materialization process.
 * **Online Store:** The online store is a database that stores only the latest feature values for each entity. The online store is either populated through materialization jobs or through [stream ingestion](../../reference/data-sources/push.md).
 * **Offline Store:** The offline store persists batch data that has been ingested into Feast. This data is used for producing training datasets. For feature retrieval and materialization, Feast does not manage the offline store directly, but runs queries against it. However, offline stores can be configured to support writes if Feast configures logging functionality of served features.
+* **Authorization manager**: The authorization manager detects authentication tokens from client requests to Feast servers and uses this information to enforce permission policies on the requested services.
@@ -31,3 +31,7 @@
 {% content-ref url="dataset.md" %}
 [dataset.md](dataset.md)
 {% endcontent-ref %}
+
+{% content-ref url="permission.md" %}
+[permission.md](permission.md)
+{% endcontent-ref %}
@@ -0,0 +1,112 @@
+# Permission
+
+## Overview
+
+The Feast permissions model allows you to configure granular permission policies for all the resources defined in a feature store.
+
+The configured permissions are stored in the Feast registry and accessible through the CLI and the registry APIs.
+
+Permission enforcement is performed when requests are executed through one of the Feast (Python) servers:
+- The online feature server (REST)
+- The offline feature server (Arrow Flight)
+- The registry server (gRPC)
+
+Note that there is no permission enforcement when accessing the Feast API with a local provider.
+
+## Concepts
+
+The permission model is based on the following components:
+- A `resource` is a Feast object that we want to secure against unauthorized access.
+  - We assume that the resource has a `name` attribute and optional dictionary of associated key-value `tags`.
+- An `action` is a logical operation executed on the secured resource, like:
+  - `create`: Create an instance.
+  - `describe`: Access the instance state.
+  - `update`: Update the instance state.
+  - `delete`: Delete an instance.
+  - `read`: Read both online and offline stores.
+  - `read_online`: Read the online store.
+  - `read_offline`: Read the offline store.
+  - `write`: Write to any store.
+  - `write_online`: Write to the online store.
+  - `write_offline`: Write to the offline store.
+- A `policy` identifies the rule for enforcing authorization decisions on secured resources, based on the current user.
+  - A default implementation is provided for role-based policies, using the user roles to grant or deny access to the requested actions
+  on the secured resources.
+
+The `Permission` class identifies a single permission configured on the feature store and is identified by these attributes:
+- `name`: The permission name.
+- `types`: The list of protected resource types. Defaults to all managed types, i.e. the `ALL_RESOURCE_TYPES` alias. All sub-classes are included in the resource match.
+- `name_pattern`: A regex to match the resource name. Defaults to `None`, meaning that no name filtering is applied.
+- `required_tags`: Dictionary of key-value pairs that must match the resource tags. Defaults to `None`, meaning that no tags filtering is applied.
+- `actions`: The actions authorized by this permission. Defaults to `ALL_ACTIONS`, an alias defined in the `action` module.
+- `policy`: The policy to be applied to validate a client request.
+
+To simplify configuration, several constants are defined to streamline the permissions setup:
+- In module `feast.feast_object`:
+  - `ALL_RESOURCE_TYPES` is the list of all the `FeastObject` types.
+  - `ALL_FEATURE_VIEW_TYPES` is the list of all the feature view types, including those not inheriting from `FeatureView` type like 
+  `OnDemandFeatureView`.
+- In module `feast.permissions.action`:
+  - `ALL_ACTIONS` is the list of all managed actions.
+  - `READ` includes all the read actions for online and offline store.
+  - `WRITE` includes all the write actions for online and offline store.
+  - `CRUD` includes all the state management actions to create, describe, update or delete a Feast resource.
+
+Given the above definitions, the feature store can be configured with granular control over each resource, enabling partitioned access by 
+teams to meet organizational requirements for service and data sharing, and protection of sensitive information.
+
+The `feast` CLI includes a new `permissions` command to list the registered permissions, with options to identify the matching resources for each configured permission and the existing resources that are not covered by any permission.
+
+{% hint style="info" %}
+**Note**: Feast resources that do not match any of the configured permissions are not secured by any authorization policy, meaning any user can execute any action on such resources.
+{% endhint %}
+
+## Definition examples
+This permission definition grants access to the resource state and the ability to read all of the stores for any feature view or
+feature service to all users with the role `super-reader`:
+```py
+Permission(
+    name="feature-reader",
+    types=[FeatureView, FeatureService],
+    policy=RoleBasedPolicy(roles=["super-reader"]),
+    actions=[AuthzedAction.DESCRIBE, *READ],
+)
+```
+
+This example grants permission to write on all the data sources with `risk_level` tag set to `high` only to users with role `admin` or `data_team`:
+```py
+Permission(
+    name="ds-writer",
+    types=[DataSource],
+    required_tags={"risk_level": "high"},
+    policy=RoleBasedPolicy(roles=["admin", "data_team"]),
+    actions=[AuthzedAction.WRITE],
+)
+```
+
+{% hint style="info" %}
+**Note**: When using multiple roles in a role-based policy, the user must be granted at least one of the specified roles.
+{% endhint %}
+
+
+The following permission grants authorization to read the offline store of all the feature views including `risky` in the name, to users with role `trusted`:
+
+```py
+Permission(
+    name="reader",
+    types=[FeatureView],
+    name_pattern=".*risky.*",
+    policy=RoleBasedPolicy(roles=["trusted"]),
+    actions=[AuthzedAction.READ_OFFLINE],
+)
+```
+
+## Authorization configuration
+In order to leverage the permission functionality, the `auth` section is needed in the `feature_store.yaml` configuration.
+Currently, Feast supports OIDC and Kubernetes RBAC authorization protocols.
+
+The default configuration, if you don't specify the `auth` configuration section, is `no_auth`, indicating that no permission
+enforcement is applied.
+
+The `auth` section includes a `type` field specifying the actual authorization protocol, and protocol-specific fields that
+are specified in [Authorization Manager](../components/authz_manager.md).
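The matching rules described for `Permission` (resource type, `name_pattern`, `required_tags`, and a role-based policy) can be sketched without importing Feast. Everything below is a hypothetical stand-in: `Resource` and `SketchPermission` are not Feast classes, resource types and actions are plain strings rather than Feast types and `AuthzedAction` members, and treating `name_pattern` as a full-string match is an assumption.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Resource:
    """Hypothetical secured object: a type label, a name, and optional tags."""
    kind: str
    name: str
    tags: dict = field(default_factory=dict)


@dataclass
class SketchPermission:
    """Hypothetical, simplified stand-in for the documented Permission attributes."""
    name: str
    types: list
    roles: list
    actions: list
    name_pattern: Optional[str] = None
    required_tags: Optional[dict] = None

    def matches(self, resource: Resource) -> bool:
        """True if the resource falls under this permission."""
        if resource.kind not in self.types:
            return False
        # Assumption: the regex must match the whole resource name.
        if self.name_pattern is not None and not re.fullmatch(
            self.name_pattern, resource.name
        ):
            return False
        if self.required_tags is not None:
            return all(
                resource.tags.get(key) == value
                for key, value in self.required_tags.items()
            )
        return True

    def grants(self, resource: Resource, action: str, user_roles: list) -> bool:
        """True if the user holds at least one role and the action is covered."""
        return (
            self.matches(resource)
            and action in self.actions
            and bool(set(self.roles) & set(user_roles))
        )
```

With a `ds-writer` permission built this way (type `DataSource`, tag `risk_level: high`, roles `admin` and `data_team`), a user holding only `data_team` would be granted `write_online` on a high-risk data source, while a data source tagged `risk_level: low` would not match the permission at all.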