Use memory efficiently in feature materialization #2071

@judahrand

Is your feature request related to a problem? Please describe.
Currently, the materialization process loads all the data from the Offline Store into an Arrow table, then converts all of it to Protobuf, then writes all of it to the Online Store. This requires holding the entire dataset in memory at once, which is impractical for large datasets.

Describe the solution you'd like
Instead of returning an Arrow table, yield a series of RecordBatches from the OfflineStore and process each batch individually: convert it to Protobuf and write it to the Online Store before moving on to the next batch.
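
To make the proposal concrete, here is a rough, dependency-free sketch of the batch-at-a-time flow. It is not the existing materialization code: `read_batches`, `to_online_format`, and `materialize` are hypothetical names, plain lists of dict rows stand in for Arrow RecordBatches, a dict stands in for the Online Store, and the `entity_id`/`value` columns are made up for illustration.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, object]


def read_batches(rows: Iterable[Row], batch_size: int) -> Iterator[List[Row]]:
    """Hypothetical offline-store reader: lazily yields batches of bounded size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def to_online_format(batch: List[Row]) -> List[Tuple[object, object]]:
    """Stand-in for the per-batch Arrow -> Protobuf conversion step."""
    return [(row["entity_id"], row["value"]) for row in batch]


def materialize(rows: Iterable[Row], online_store: dict, batch_size: int = 1000) -> int:
    """Convert and write one batch at a time.

    Returns the largest number of rows held in memory at once,
    which never exceeds batch_size.
    """
    peak = 0
    for batch in read_batches(rows, batch_size):
        peak = max(peak, len(batch))
        for key, value in to_online_format(batch):
            online_store[key] = value
    return peak


# The source is a generator, so rows are only produced as batches are consumed.
source = ({"entity_id": i, "value": i * i} for i in range(10))
store: dict = {}
peak_rows = materialize(source, store, batch_size=4)
```

With this shape, peak memory is bounded by the batch size rather than by the size of the dataset, provided the offline store can itself stream results instead of building the full table first.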

Describe alternatives you've considered

Additional context
