Modularize ingestion distributed compute engine support #444

@ches

Description

This is a companion to #402 and the larger topic of storage engine modularization which was realized in #529 and subsequent PRs that implemented the new interfaces.

Just as adding support for new storage engines tends to cause a dependency explosion for Feast ingestion & serving, the same is true for the Beam Runner / job management adapter glue in core (all of this could move to serving under future plans, but that won't change the fundamental problem this issue is about).

So for both storage and compute engines, I feel that some modularity strategy is needed for loose binding at build time, configurable at runtime. The goals would be to:

  • Minimize the dependency pains that developers and contributors to Feast need to deal with when they are not actively working on a particular stack. The dependency trees are often large and fragile, especially in the Hadoop ecosystem (e.g. Hive and Spark).
  • Reduce deployment bloat for operators who wish to package Feast internally with only the module JARs needed to support their organization's stack. (IIRC last I checked, hadoop-common or hadoop-client alone leave you with close to 200 MB of JARs, and beam-runners-spark and beam-sdks-java-io-hcatalog, among others, pull in these dependencies. They are provided scope, but I believe the point stands.)

Possibilities might be OSGi or java.util.ServiceLoader (and Spring integration or alternatives thereof). Open to other ideas!
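To make the ServiceLoader idea concrete, here is a minimal sketch of how a module JAR could expose an engine implementation that core discovers at runtime without a compile-time dependency. The `StorageEngineProvider` interface and its methods are hypothetical names for illustration, not Feast's actual API:

```java
import java.util.Map;
import java.util.ServiceLoader;

public class PluginLoaderSketch {
    // Hypothetical SPI that each storage/compute engine module would implement.
    // A module JAR registers its implementation class in
    // META-INF/services/PluginLoaderSketch$StorageEngineProvider.
    public interface StorageEngineProvider {
        String type(); // e.g. "bigquery", "redis" (illustrative values)
        void configure(Map<String, String> options);
    }

    public static void main(String[] args) {
        // Core discovers whichever provider JARs are on the classpath at
        // runtime; build-time it only depends on the SPI interface.
        ServiceLoader<StorageEngineProvider> loader =
                ServiceLoader.load(StorageEngineProvider.class);
        for (StorageEngineProvider provider : loader) {
            System.out.println("Discovered engine: " + provider.type());
        }
    }
}
```

With this shape, operators control which engines are available simply by choosing which module JARs to deploy alongside core, which addresses both goals above. OSGi would add stronger isolation (per-module classloaders, versioned dependencies) at the cost of considerably more machinery.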

Relates to #362
