
GKE Ingestion Jobs Failing  #427

@lgvital

Description

Expected Behavior

Ingestion jobs run successfully and the BigQuery table (feast.customer_project_customer_transactions_v1) is populated with data.

Current Behavior

>>> client.ingest("customer_transactions", customer_features)
Waiting for feature set to be ready for ingestion...
  0%|                                                                                                | 0/155 [04:48<?, ?rows/s]
Ingestion complete!

Ingestion statistics:
Success: 0/155
Removing temporary file(s)...

The BQ table is empty:

SELECT COUNT(*) FROM `feast.customer_project_customer_transactions_v1`;

returns 0.
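For context, the client in the snippet above was constructed roughly like this (a sketch against the 0.4.x Python SDK; the exact Client keyword names should be checked against the installed SDK version):

```python
import os

from feast import Client

# Core and batch-serving endpoints exposed via NodePort (see the
# firewall rules under "Steps to reproduce"); the env vars are
# assumed to hold "<node-ip>:<nodeport>" strings.
client = Client(
    core_url=os.environ["FEAST_CORE_URL"],              # e.g. <node-ip>:32090
    serving_url=os.environ["FEAST_BATCH_SERVING_URL"],  # e.g. <node-ip>:32092
)
```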

Steps to reproduce

Follow the latest GKE setup docs with the following differences:

  • my-project as project ID
  • Use image 0.4.2
  • Use NodePort to expose service to local client
  • In the basic example, use $FEAST_CORE_URL and $FEAST_BATCH_SERVING_URL
  • Open NodePorts 32090, 32091, and 32092 in the GCP firewall:
gcloud compute firewall-rules create feast-core-port --allow tcp:32090
gcloud compute firewall-rules create feast-online-port --allow tcp:32091
gcloud compute firewall-rules create feast-batch-port --allow tcp:32092
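With the NodePorts open, the two env vars used in the basic example can point at any node's external IP. A sketch (the IP below is a placeholder):

```shell
# Placeholder: substitute the ExternalIP of any GKE node,
# e.g. from `kubectl get nodes -o wide`.
NODE_IP="203.0.113.10"

# NodePorts opened by the firewall rules above
export FEAST_CORE_URL="${NODE_IP}:32090"
export FEAST_BATCH_SERVING_URL="${NODE_IP}:32092"

echo "$FEAST_CORE_URL" "$FEAST_BATCH_SERVING_URL"
```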

my-feast-values.yaml ends up looking like:

feast-core:
  enabled: true
  image:
    tag: "0.4.2"
  jvmOptions:
  - -Xms1024m
  - -Xmx1024m
  resources:
    requests:
      cpu: 1000m
      memory: 1024Mi
  service:
    type: NodePort
    grpc:
      nodePort: 32090
  gcpServiceAccount:
    useExistingSecret: true
feast-serving-online:
  enabled: true
  redis:
    enabled: true
  image:
    tag: "0.4.2"
  jvmOptions:
  - -Xms1024m
  - -Xmx1024m
  resources:
    requests:
      cpu: 500m
      memory: 1024Mi
  service:
    type: NodePort
    grpc:
      nodePort: 32091
  store.yaml:
    name: redis
    type: REDIS
    redis_config:
      port: 6379
    subscriptions:
    - name: "*"
      project: "*"
      version: "*"
feast-serving-batch:
  enabled: true
  redis:
    enabled: false
  image:
    tag: "0.4.2"
  jvmOptions:
  - -Xms1024m
  - -Xmx1024m
  resources:
    requests:
      cpu: 500m
      memory: 1024Mi
  service:
    type: NodePort
    grpc:
      nodePort: 32092
  gcpServiceAccount:
    useExistingSecret: true
  application.yaml:
    feast:
      jobs:
        staging-location: gs://my-project_feast_bucket/serving/batch
        store-type: REDIS
        store-options:
          host: localhost
          port: 6379
  store.yaml:
    name: bigquery
    type: BIGQUERY
    bigquery_config:
      project_id: my-project
      dataset_id: feast
    subscriptions:
    - name: "*"
      project: "*"
      version: "*"
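For completeness, the values file was applied with Helm v2 along these lines (a sketch; the release name and chart path are assumptions based on the GKE guide, not commands from it):

```shell
# Helm v2 syntax (helm v2.16.1 per the specs below);
# the chart path ./charts/feast is hypothetical
helm install --name feast -f my-feast-values.yaml ./charts/feast
```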

Specifications

  • Version: 0.4.2; Python SDK from latest master (2a33f7b)
  • Platform: GKE, deployed from a local macOS machine
  • Subsystem: Python 3.7.6, helm v2.16.1, kubectl v1.17.0

Possible Solution

The default Kafka configs might need adjusting? Or it may be related to the NodePort config?
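If it is topic-related, one sanity check would be to describe the topics from inside the broker pod and confirm each partition has a leader and in-sync replicas (a sketch; the feast-zookeeper service name is an assumption based on the chart's feast-kafka naming):

```shell
# cp-kafka 5.0.1-era tooling still addresses ZooKeeper directly
kubectl exec feast-kafka-0 -- \
  kafka-topics --zookeeper feast-zookeeper:2181 --describe --topic feast

# __consumer_offsets is the topic named in the feast-kafka-0 errors below
kubectl exec feast-kafka-0 -- \
  kafka-topics --zookeeper feast-zookeeper:2181 --describe --topic __consumer_offsets
```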

kubectl logs for feast-feast-core appear to show that the Kafka ingestion jobs were successfully created and submitted to the DirectRunner:

22:57:52 [pool-5-thread-1] INFO  feast.ingestion.ImportJob - Starting import job with settings:
Current Settings:
  appName: DirectRunnerJobManager
  blockOnRun: false
  enforceEncodability: true
  enforceImmutability: true
  featureSetJson: [{
  "name": "customer_transactions",
  "version": 1,
  "entities": [{
    "name": "customer_id",
    "valueType": "INT64"
  }],
  "features": [{
    "name": "total_transactions",
    "valueType": "INT64"
  }, {
    "name": "daily_transactions",
    "valueType": "DOUBLE"
  }],
  "maxAge": "86400s",
  "source": {
    "type": "KAFKA",
    "kafkaSourceConfig": {
      "bootstrapServers": "feast-kafka:9092",
      "topic": "feast"
    }
  },
  "project": "customer_project"
}]
  gcsPerformanceMetrics: false
  optionsId: 0
  project:
  runner: class org.apache.beam.runners.direct.DirectRunner
  stableUniqueNames: WARNING
  storeJson: [{
  "name": "bigquery",
  "type": "BIGQUERY",
  "subscriptions": [{
    "name": "*",
    "version": "*",
    "project": "*"
  }],
  "bigqueryConfig": {
    "projectId": "my-project",
    "datasetId": "feast"
  }
}]

22:57:54 [pool-5-thread-1] INFO  feast.ingestion.utils.StoreUtil - Writing to existing BigQuery table 'my-project:feast.customer_project_customer_transactions_v1'
22:57:54 [pool-6-thread-1] INFO  org.apache.beam.sdk.io.kafka.KafkaUnboundedSource - Partitions assigned to split 0 (total 1): feast-0
2020-01-09 22:57:54.984 AUDIT feast-feast-core-dc485b44d-qg75w --- [pool-6-thread-1] f.c.l.AuditLogger                        : {action=STATUS_CHANGE, detail=Job submitted to runner DirectRunner with ext id kafka-to-redis1578610671583., id=kafka-to-redis1578610671583, resource=JOB, timestamp=Thu Jan 09 22:57:54 UTC 2020}
22:57:55 [pool-5-thread-1] INFO  org.apache.beam.sdk.io.kafka.KafkaUnboundedSource - Partitions assigned to split 0 (total 1): feast-0
2020-01-09 22:57:55.373 AUDIT feast-feast-core-dc485b44d-qg75w --- [pool-5-thread-1] f.c.l.AuditLogger                        : {action=STATUS_CHANGE, detail=Job submitted to runner DirectRunner with ext id kafka-to-bigquery1578610671583., id=kafka-to-bigquery1578610671583, resource=JOB, timestamp=Thu Jan 09 22:57:55 UTC 2020}
22:57:55 [pool-2-thread-1] INFO  feast.core.service.JobCoordinatorService - Updating feature set status
22:57:58 [direct-runner-worker] INFO  org.apache.beam.sdk.io.kafka.KafkaUnboundedSource - Reader-0: reading from feast-0 starting at offset 0
22:57:58 [direct-runner-worker] INFO  org.apache.beam.sdk.io.kafka.KafkaUnboundedSource - Reader-0: reading from feast-0 starting at offset 0

And a feast topic was successfully created according to the kafka-config pod:

Waiting for Zookeeper...
Waiting for Kafka...
Applying runtime configuration using confluentinc/cp-kafka:5.0.1
Created topic "feast".
Configs for topic 'feast' are 

But the feast-kafka-0 logs show topic-related errors:

[2020-01-09 22:57:57,799] INFO [Log partition=__consumer_offsets-2, dir=/opt/kafka/data/logs] Truncating to 0 has no effect as the largest offset in the log is -1 (kafka.log.Log)
[2020-01-09 22:57:57,804] ERROR [ReplicaFetcher replicaId=0, leaderId=1, fetcherId=0] Error for partition __consumer_offsets-8 at offset 0 (kafka.server.ReplicaFetcherThread)
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition.
[2020-01-09 22:57:57,804] ERROR [ReplicaFetcher replicaId=0, leaderId=1, fetcherId=0] Error for partition __consumer_offsets-35 at offset 0 (kafka.server.ReplicaFetcherThread)
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition.
[2020-01-09 22:57:57,805] ERROR [ReplicaFetcher replicaId=0, leaderId=1, fetcherId=0] Error for partition __consumer_offsets-41 at offset 0 (kafka.server.ReplicaFetcherThread)

Happy to share any other relevant logs. I'm admittedly not familiar with Kafka, so I could be off here. I'm just trying to get the GKE Feast guide + basic example working end to end. Once it works, I'm happy to put up a PR to update the guide for 0.4.X.
