This installation guide will demonstrate three ways of installing Feast:
- Docker Compose (Quickstart): Fastest way to get Feast up and running. Provides a pre-installed Jupyter Notebook with the Feast Python SDK and sample code.
- Minikube: This installation has no external dependencies, but does not have a historical feature store installed. It allows users to quickly get a feel for Feast.
- Google Kubernetes Engine: This guide installs a single cluster Feast installation on Google's GKE. It has Google Cloud specific dependencies like BigQuery, Dataflow, and Google Cloud Storage.
A Docker Compose file is provided to quickly test Feast with the official Docker images. There is no hard dependency on GCP unless batch serving is required. Once you have set up Feast using Docker Compose, you will be able to:
- Create, register, and manage feature sets
- Ingest feature data into Feast
- Retrieve features for online serving
{% hint style="info" %} The docker compose setup uses Direct Runner for the Apache Beam jobs. Running Beam with the Direct Runner means it does not need a dedicated runner like Flink or Dataflow, but this comes at the cost of performance. We recommend the use of a full runner when running Feast with very large workloads. {% endhint %}
- Docker Compose should be installed.
- TCP ports 6565, 6566, 8888, and 9094 should not be in use. Otherwise, modify the port mappings in infra/docker-compose/docker-compose.yml to use unoccupied ports.
- (for batch serving only) A GCP service account key that has access to GCS and BigQuery. Port 6567 will be used for the batch serving endpoint.
- (for batch serving only) Google Cloud SDK installed, authenticated, and configured to the project you will use.
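The port requirement above can be checked before starting the containers. The following is a minimal sketch, not part of Feast: `conflicting_ports` is a hypothetical helper that reads a list of listening ports from standard input (one per line) and prints any of the required ports that are already taken. The list fed to it here is made up for illustration.

```shell
# Hypothetical helper (not part of Feast): print each port given as an
# argument that also appears in the list of listening ports on stdin.
conflicting_ports() {
  local listening port
  listening=$(cat)
  for port in "$@"; do
    # -x matches the whole line, so 888 does not match 8888
    if printf '%s\n' "$listening" | grep -qx "$port"; then
      echo "$port"
    fi
  done
}

# Made-up example: ports 22 and 8888 are listening, so 8888 conflicts.
printf '22\n8888\n' | conflicting_ports 6565 6566 8888 9094
# prints: 8888
```

On a Linux host, the listening ports could likely be supplied from something like `ss -ltn | awk 'NR>1 {n=split($4,a,":"); print a[n]}'`; treat that pipeline as an assumption about your system's tooling rather than a tested part of this guide. Add 6567 to the arguments if you plan to use batch serving.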
Clone the Feast repository and navigate to the docker-compose sub-directory:
git clone https://github.com/gojek/feast.git && \
cd feast && export FEAST_HOME_DIR=$(pwd) && \
cd infra/docker-compose
Make a copy of the .env.sample file:
cp .env.sample .env
Start Feast:
docker-compose up -d
A Jupyter notebook is now available to use Feast:
http://localhost:8888/notebooks/feast-notebooks/feast-quickstart.ipynb
Clone the Feast repository and navigate to the docker-compose sub-directory:
git clone https://github.com/gojek/feast.git && \
cd feast && export FEAST_HOME_DIR=$(pwd) && \
cd infra/docker-compose
Create a service account from the GCP console and copy its key file to the gcp-service-accounts folder:
cp my-service-account.json ${FEAST_HOME_DIR}/infra/docker-compose/gcp-service-accounts
Create a Google Cloud Storage bucket. Make sure that the service account above has read/write permissions to this bucket:
gsutil mb gs://my-feast-staging-bucket
Make a copy of the .env.sample file:
cp .env.sample .env
Customize the .env file based on your environment. At the very least you have to modify:
- FEAST_CORE_GCP_SERVICE_ACCOUNT_KEY: This should be your service account file name without the .json extension.
- FEAST_BATCH_SERVING_GCP_SERVICE_ACCOUNT_KEY: This should be your service account file name without the .json extension.
- FEAST_JUPYTER_GCP_SERVICE_ACCOUNT_KEY: This should be your service account file name without the .json extension.
- FEAST_JOB_STAGING_LOCATION: Google Cloud Storage bucket that Feast will use to stage data exports and batch retrieval requests.
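As an illustration of the four settings above, the sketch below edits a stand-in file rather than the real .env, whose full contents are not reproduced in this guide. `set_env_value` is a hypothetical helper, and the values (`my-service-account`, `gs://my-feast-staging-bucket`) simply reuse the example names from the earlier steps. The `gs://` form of the staging location is an assumption.

```shell
# Stand-in for the real .env file (hypothetical contents, for illustration).
env_file=$(mktemp)
cat > "$env_file" <<'EOF'
FEAST_CORE_GCP_SERVICE_ACCOUNT_KEY=placeholder
FEAST_BATCH_SERVING_GCP_SERVICE_ACCOUNT_KEY=placeholder
FEAST_JUPYTER_GCP_SERVICE_ACCOUNT_KEY=placeholder
FEAST_JOB_STAGING_LOCATION=placeholder
EOF

# Hypothetical helper: replace the value of KEY=... in the given file.
set_env_value() {
  local key=$1 value=$2 file=$3
  sed -i "s|^${key}=.*|${key}=${value}|" "$file"
}

key_name=my-service-account            # key file name without the .json extension
staging=gs://my-feast-staging-bucket   # assumed format for the staging bucket

set_env_value FEAST_CORE_GCP_SERVICE_ACCOUNT_KEY "$key_name" "$env_file"
set_env_value FEAST_BATCH_SERVING_GCP_SERVICE_ACCOUNT_KEY "$key_name" "$env_file"
set_env_value FEAST_JUPYTER_GCP_SERVICE_ACCOUNT_KEY "$key_name" "$env_file"
set_env_value FEAST_JOB_STAGING_LOCATION "$staging" "$env_file"

cat "$env_file"
```

Editing the file by hand works just as well; the helper only shows which keys change and what shape the values take.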
We will also need to customize the bq-store.yml file inside infra/docker-compose/serving/ to set the BigQuery storage configuration and the feature sets that the store subscribes to. At a minimum you will need to set:
- project_id: This is your GCP project ID.
- dataset_id: This is the name of the BigQuery dataset that tables will be created in. Each feature set will have one table in BigQuery.
Start Feast:
docker-compose -f docker-compose.yml -f docker-compose.batch.yml up -d
A Jupyter notebook is now available to use Feast:
http://localhost:8888/notebooks/feast-notebooks
This guide will install Feast into Minikube. Once Feast is installed you will be able to:
- Define and register features.
- Load feature data from both batch and streaming sources.
- Retrieve features for online serving.
{% hint style="warning" %} This Minikube installation guide is for demonstration purposes only. It is not meant for production use, and does not install a historical feature store. {% endhint %}
The following software should be installed prior to starting:
- Minikube should be installed.
- Kubectl installed and configured to work with Minikube.
- Helm (2.16.0 or greater). Helm 3 has not been tested yet.
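The Helm requirement above (2.16.0 or greater, Helm 3 untested) can be expressed as a small version check. This is a sketch under stated assumptions: `version_at_least` and `helm_version_ok` are hypothetical helpers, and the comparison relies on the `-V` (version sort) option of GNU `sort`.

```shell
# Hypothetical helper: succeed if version $1 is greater than or equal to $2.
# Relies on GNU sort's -V (version sort) option.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: succeed only for Helm 2 releases at or above 2.16.0.
helm_version_ok() {
  local v=${1#v}   # strip a leading "v", e.g. v2.16.1 -> 2.16.1
  version_at_least "$v" 2.16.0 && [ "${v%%.*}" = "2" ]
}

for v in v2.16.1 v2.14.3 v3.0.2; do
  if helm_version_ok "$v"; then
    echo "$v: ok"
  else
    echo "$v: not covered by this guide"
  fi
done
# prints:
# v2.16.1: ok
# v2.14.3: not covered by this guide
# v3.0.2: not covered by this guide
```

The installed client version would need to be read from the Helm CLI and passed to `helm_version_ok`; how that output is formatted depends on your Helm release, so it is left out of the sketch.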
Start Minikube. Note the minimum CPU and memory settings below:
minikube start --cpus=3 --memory=4096 --kubernetes-version='v1.15.5'
Set up your Feast environment variables:
export FEAST_IP=$(minikube ip)
export FEAST_CORE_URL=${FEAST_IP}:32090
export FEAST_SERVING_URL=${FEAST_IP}:32091
Clone the Feast repository and navigate to the charts sub-directory:
git clone https://github.com/gojek/feast.git && \
cd feast && export FEAST_HOME_DIR=$(pwd) && \
cd infra/charts/feast
Copy the values-demo.yaml file for your installation:
cp values-demo.yaml my-feast-values.yaml
Replace all occurrences of the domain feast.example.com in my-feast-values.yaml with your Minikube IP. This allows external access to the services in the cluster. You can find your Minikube IP by running minikube ip and edit the file by hand, or replace the text from the command line:
sed -i "s/feast.example.com/${FEAST_IP}/g" my-feast-values.yaml
Install Tiller:
helm init
Install the Feast Helm chart:
helm install --name feast -f my-feast-values.yaml .
Ensure that the system comes online. This will take a few minutes:
watch kubectl get pods
NAME READY STATUS RESTARTS AGE
pod/feast-feast-core-666fd46db4-l58l6 1/1 Running 0 5m
pod/feast-feast-serving-online-84d99ddcbd 1/1 Running 0 6m
pod/feast-kafka-0 1/1 Running 0 3m
pod/feast-kafka-1 1/1 Running 0 4m
pod/feast-kafka-2 1/1 Running 0 4m
pod/feast-postgresql-0 1/1 Running 0 5m
pod/feast-redis-master-0 1/1 Running 0 5m
pod/feast-zookeeper-0 1/1 Running 0 5m
pod/feast-zookeeper-1 1/1 Running 0 5m
pod/feast-zookeeper-2 1/1 Running 0 5m
Install the Python SDK using pip:
pip install -e ${FEAST_HOME_DIR}/sdk/python
Configure the Feast Python SDK:
feast config set core_url ${FEAST_CORE_URL}
feast config set serving_url ${FEAST_SERVING_URL}
That's it! You can now start to use Feast!
This guide will install Feast into a Kubernetes cluster on GCP. It assumes that all of your services will run within a single K8s cluster. Once Feast is installed you will be able to:
- Define and register features.
- Load feature data from both batch and streaming sources.
- Retrieve features for model training.
- Retrieve features for online serving.
{% hint style="info" %} This guide requires Google Cloud Platform for installation.
- BigQuery is used for storing historical features.
- Cloud Dataflow is used for running data ingestion jobs.
- Google Cloud Storage is used for intermediate data storage. {% endhint %}
- Google Cloud SDK installed, authenticated, and configured to the project you will use.
- Kubectl installed.
- Helm (2.16.0 or greater) installed on your local machine with Tiller installed in your cluster. Helm 3 has not been tested yet.
First define the environment variables that we will use throughout this installation. Please customize these to reflect your environment.
export FEAST_GCP_PROJECT_ID=my-gcp-project
export FEAST_GCP_REGION=us-central1
export FEAST_GCP_ZONE=us-central1-a
export FEAST_BIGQUERY_DATASET_ID=feast
export FEAST_GCS_BUCKET=${FEAST_GCP_PROJECT_ID}_feast_bucket
export FEAST_GKE_CLUSTER_NAME=feast
export FEAST_S_ACCOUNT_NAME=feast-sa
Create a Google Cloud Storage bucket for Feast to stage data during exports:
gsutil mb gs://${FEAST_GCS_BUCKET}
Create a BigQuery dataset for storing historical features:
bq mk ${FEAST_BIGQUERY_DATASET_ID}
Create the service account that Feast will run as:
gcloud iam service-accounts create ${FEAST_S_ACCOUNT_NAME}
gcloud projects add-iam-policy-binding ${FEAST_GCP_PROJECT_ID} \
--member serviceAccount:${FEAST_S_ACCOUNT_NAME}@${FEAST_GCP_PROJECT_ID}.iam.gserviceaccount.com \
--role roles/editor
gcloud iam service-accounts keys create key.json --iam-account \
${FEAST_S_ACCOUNT_NAME}@${FEAST_GCP_PROJECT_ID}.iam.gserviceaccount.com
Ensure that the Dataflow API is enabled:
gcloud services enable dataflow.googleapis.com
{% hint style="warning" %} Provisioning a GKE cluster can expose your services publicly. This guide does not cover securing access to the cluster. {% endhint %}
Create a GKE cluster:
gcloud container clusters create ${FEAST_GKE_CLUSTER_NAME} \
--machine-type n1-standard-4
Create a secret in the GKE cluster based on your local key file key.json:
kubectl create secret generic feast-gcp-service-account --from-file=key.json
For this guide we will use NodePort for exposing Feast services. In order to do so, we must find the external IP of at least one GKE node:
export FEAST_IP=$(kubectl describe nodes | grep ExternalIP | awk '{print $2}' | head -n 1)
export FEAST_CORE_URL=${FEAST_IP}:32090
export FEAST_ONLINE_SERVING_URL=${FEAST_IP}:32091
export FEAST_BATCH_SERVING_URL=${FEAST_IP}:32092
Confirm that you are able to access this node (please make sure that no firewall rules are preventing access to these ports):
ping $FEAST_IP
PING 10.123.114.11 (10.123.114.11) 56(84) bytes of data.
64 bytes from 10.123.114.11: icmp_seq=1 ttl=63 time=54.2 ms
64 bytes from 10.123.114.11: icmp_seq=2 ttl=63 time=51.2 ms
Run the following command to provide Tiller with authorization to install Feast:
kubectl apply -f - <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: tiller
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: tiller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: tiller
    namespace: kube-system
EOF
Install Tiller:
helm init --service-account tiller
Clone the Feast repository and navigate to the charts sub-directory:
git clone https://github.com/gojek/feast.git && cd feast && \
git checkout 0.3-dev && \
export FEAST_HOME_DIR=$(pwd) && \
cd infra/charts/feast
Make a copy of the Helm values.yaml so that it can be customized for your Feast deployment:
cp values.yaml my-feast-values.yaml
Update my-feast-values.yaml based on your GCP and GKE environment.
- Required fields are paired with comments which indicate whether they need to be replaced.
- All occurrences of feast.example.com should be replaced with either your domain name or the IP stored in $FEAST_IP.
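The domain replacement described above can be sketched with `sed`, in the same way as in the Minikube section. The example below runs against a stand-in file with made-up keys rather than the real values file, and `example_ip` is a hypothetical placeholder for the value of `$FEAST_IP` (or your domain name). The dots are escaped so that only the literal domain is matched.

```shell
# Stand-in for my-feast-values.yaml (hypothetical keys, for illustration).
values_file=$(mktemp)
cat > "$values_file" <<'EOF'
host: feast.example.com
url: http://feast.example.com:32090
EOF

example_ip=192.0.2.10   # placeholder; use $FEAST_IP or your domain name

# Replace every occurrence of the example domain with the chosen address.
sed -i "s/feast\.example\.com/${example_ip}/g" "$values_file"

cat "$values_file"
# prints:
# host: 192.0.2.10
# url: http://192.0.2.10:32090
```

Running `grep feast.example.com my-feast-values.yaml` afterwards is a quick way to confirm that no occurrence was missed.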
Install the Feast Helm chart:
helm install --name feast -f my-feast-values.yaml .
Ensure that the system comes online. This will take a few minutes:
watch kubectl get pods
NAME READY STATUS RESTARTS AGE
pod/feast-feast-core-666fd46db4-l58l6 1/1 Running 0 5m
pod/feast-feast-serving-online-84d99ddcbd 1/1 Running 0 6m
pod/feast-kafka-0 1/1 Running 0 3m
pod/feast-kafka-1 1/1 Running 0 4m
pod/feast-kafka-2 1/1 Running 0 4m
pod/feast-postgresql-0 1/1 Running 0 5m
pod/feast-redis-master-0 1/1 Running 0 5m
pod/feast-zookeeper-0 1/1 Running 0 5m
pod/feast-zookeeper-1 1/1 Running 0 5m
pod/feast-zookeeper-2 1/1 Running 0 5m
Install the Python SDK using pip:
pip install -e ${FEAST_HOME_DIR}/sdk/python
Configure the Feast Python SDK:
feast config set core_url ${FEAST_CORE_URL}
feast config set serving_url ${FEAST_ONLINE_SERVING_URL}
That's it! You can now start to use Feast!