This is a companion to #402 and to the larger topic of storage engine modularization, which was realized in #529 and the subsequent PRs that implemented the new interfaces.
Just as adding support for new storage engines tends to cause a dependency explosion for Feast ingestion & serving, the same is true for the Beam Runner / job management adapter glue in `core` (this could all move to `serving` under future plans, but that won't change the fundamental problem this issue is about).
So for both storage and compute engines, I feel that some modularity strategy is needed: loose binding at build time, configurable at runtime. The goals would be to:
- Minimize the dependency pains that Feast developers and contributors have to deal with when they are not actively working on a particular stack. The dependency trees are often large and fragile, especially in the Hadoop ecosystem (e.g. Hive and Spark).
- Reduce deployment bloat if operators wish to package Feast internally with only the module JARs they need to support their organization's stack. (IIRC, last I checked, `hadoop-common` or `hadoop-client` leave you with close to 200MB of JARs, and `beam-runners-spark` and `beam-sdks-java-io-hcatalog`, among others, have these deps. They are in `provided` scope, but I believe the point stands.)
Possibilities might be OSGi or `java.util.ServiceLoader` (and Spring integration or alternatives thereof). Open to other ideas!
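To make the `java.util.ServiceLoader` option concrete, here is a minimal sketch. It assumes a hypothetical `JobManagerProvider` SPI (not an existing Feast interface) that each runner module would implement. A runner JAR such as a Spark adapter would list its implementation class in a `META-INF/services/` file named after the interface's fully qualified name. `core` would then select the runner by a configured name at runtime and would compile against none of the runner-specific dependencies.

```java
import java.util.Optional;
import java.util.ServiceLoader;

// Hypothetical SPI that a runner module JAR (e.g. a Spark or Dataflow adapter) would implement.
interface JobManagerProvider {
    // Name used in runtime configuration to select this runner, e.g. "spark".
    String runnerName();
}

public class RunnerRegistry {
    // Scan the classpath for META-INF/services entries registering JobManagerProvider
    // implementations, and return the one matching the configured runner name, if any.
    static Optional<JobManagerProvider> find(String configuredRunner) {
        ServiceLoader<JobManagerProvider> loader = ServiceLoader.load(JobManagerProvider.class);
        for (JobManagerProvider provider : loader) {
            if (provider.runnerName().equals(configuredRunner)) {
                return Optional.of(provider);
            }
        }
        // No module JAR on the classpath provides this runner.
        return Optional.empty();
    }

    public static void main(String[] args) {
        String runner = args.length > 0 ? args[0] : "spark";
        Optional<JobManagerProvider> provider = find(runner);
        System.out.println(provider.isPresent()
                ? "Loaded runner " + runner
                : "No module on the classpath provides runner " + runner);
    }
}
```

With no runner JARs on the classpath, `find` simply returns an empty `Optional`, so an operator who only deploys the modules for their own stack never pays for the others.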
Relates to #362