Bytewax materialization can run infinitely #3788

@james-crabtree-sp

Expected Behavior

Bytewax materialization should run every pod once to successful completion and then set the job status to success.

Current Behavior

If a node crashes, the records of pods that already succeeded can be lost, and the job reruns all of those lost pods. If node crashes occur often enough, the job keeps rerunning successful pods and never completes.

Steps to reproduce

Run a materialization job against a multi-node Kubernetes cluster. Terminate one of the nodes and observe that pods are lost and rerun.

Specifications

  • Version: 0.31
  • Platform: Fedora Linux
  • Subsystem: bytewax batch_engine

Possible Solution

For safety, the job should have a configurable activeDeadlineSeconds. It should also be possible to split the larger job into smaller batches to limit the effect a node crash can have.
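
A rough sketch of both ideas, not the batch engine's actual code: it builds plain Kubernetes `batch/v1` Job manifests as Python dicts, sets `activeDeadlineSeconds` on each, and splits the work into smaller jobs. The function names, the `paths` input, the job naming scheme, and the container name are assumptions made for illustration; only the Job spec fields (`activeDeadlineSeconds`, `completions`, `parallelism`, `completionMode`) come from the Kubernetes API.

```python
from typing import Any, Dict, List


def chunk(items: List[str], batch_size: int) -> List[List[str]]:
    """Split items into consecutive batches of at most batch_size entries."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def build_job_manifest(
    name: str, image: str, pod_count: int, active_deadline_seconds: int
) -> Dict[str, Any]:
    """Build one indexed Job manifest with a hard wall-clock deadline."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # Kubernetes marks the Job as failed once it has been active
            # this long, so it cannot keep rerunning pods forever.
            "activeDeadlineSeconds": active_deadline_seconds,
            "completions": pod_count,
            "parallelism": pod_count,
            "completionMode": "Indexed",
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    # Hypothetical container name; the image is caller-supplied.
                    "containers": [{"name": "worker", "image": image}],
                }
            },
        },
    }


def build_batched_jobs(
    base_name: str,
    image: str,
    paths: List[str],
    batch_size: int,
    active_deadline_seconds: int,
) -> List[Dict[str, Any]]:
    """Build one Job per batch of input paths (one pod per path)."""
    return [
        build_job_manifest(
            name=f"{base_name}-{index}",
            image=image,
            pod_count=len(batch),
            active_deadline_seconds=active_deadline_seconds,
        )
        for index, batch in enumerate(chunk(paths, batch_size))
    ]
```

If the batches were submitted one after another, a node crash would presumably force a rerun of at most the pods in the in-flight batch rather than the whole job, and the deadline would bound how long any single batch can keep retrying.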
