Use bzip2 compressed feature set json as pipeline option #466
feast-ci-bot merged 3 commits into feast-dev:master
There are a lot of formatting changes in this. Which IDE and settings are you using? I have imported the IntelliJ settings as described here, but I don't think they match what you've submitted.
It's the Maven Spotless plugin.
public class ProtoUtil {
  public static String toJson(List<FeatureSetProto.FeatureSet> featureSets) throws IOException {
Can we avoid creating non-generic utility methods? ProtoUtil and toJson look like a generic class and method, but the implementation is specific to FeatureSetProto messages.
Either we rename this to be more specific and generalize it later, or we move this functionality out.
Renaming seems more like a band-aid solution, so I refactored my commits so that this code is no longer under util and can be extended to other compression strategies.
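To illustrate the kind of extensible design described in the reply above, here is a minimal sketch of a pluggable compression strategy. It is not the PR's actual code: CompressionStrategy, GzipCompressionStrategy and FeatureSetJsonCompressor are hypothetical names. GZIP from the JDK is swapped in for bzip2 only because the JDK has no bzip2 codec; a bzip2 strategy could wrap BZip2CompressorOutputStream and BZip2CompressorInputStream from Apache Commons Compress in the same way.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical strategy interface: callers depend on this, not on a specific codec.
interface CompressionStrategy {
  byte[] compress(byte[] input) throws IOException;

  byte[] decompress(byte[] input) throws IOException;
}

// Illustrative JDK-only implementation using GZIP as a stand-in for bzip2.
class GzipCompressionStrategy implements CompressionStrategy {
  @Override
  public byte[] compress(byte[] input) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (OutputStream out = new GZIPOutputStream(buffer)) {
      out.write(input);
    } // closing the GZIP stream writes the trailer into the buffer
    return buffer.toByteArray();
  }

  @Override
  public byte[] decompress(byte[] input) throws IOException {
    try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(input))) {
      return in.readAllBytes(); // Java 9+
    }
  }
}

// Hypothetical feature-set-specific wrapper: knowledge of the JSON payload
// lives here, so the strategy interface itself stays generic.
class FeatureSetJsonCompressor {
  private final CompressionStrategy strategy;

  FeatureSetJsonCompressor(CompressionStrategy strategy) {
    this.strategy = strategy;
  }

  byte[] compress(String featureSetJson) throws IOException {
    return strategy.compress(featureSetJson.getBytes(StandardCharsets.UTF_8));
  }

  String decompress(byte[] compressed) throws IOException {
    return new String(strategy.decompress(compressed), StandardCharsets.UTF_8);
  }
}
```

Swapping codecs then means passing a different CompressionStrategy to the wrapper, without renaming or touching any feature set code.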
ingestion/src/main/java/feast/ingestion/utils/CompressionUtil.java (outdated)
551652a to 0fe5654
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED.
This pull request has been approved by: khorshuheng, woop.
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
* Use bzip2 compressed feature set json as pipeline option
* Make decompressor and compressor more generic and extensible
* Avoid code duplication in test
What this PR does / why we need it:
The Dataflow runner has a 256 KB limit on pipeline options. Because we store feature sets as a JSON string in a pipeline option, its size grows in proportion to the number of feature set versions. Compressing the feature set JSON lets us support more feature sets before hitting that limit.
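A minimal sketch of how a compressed payload could be carried in a text-valued option and checked against the limit before job submission. Several things here are assumptions, not the PR's implementation: the option is treated as a plain string, so the compressed bytes are Base64-encoded; "256 KB" is read as 256 * 1024 bytes; GZIP stands in for bzip2 because it ships with the JDK; and PipelineOptionEncoding with its methods is a hypothetical helper.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class PipelineOptionEncoding {
  // Assumption: the limit is interpreted as 256 * 1024 bytes of option text.
  static final int ASSUMED_OPTION_LIMIT_BYTES = 256 * 1024;

  // Hypothetical helper: compress the JSON (GZIP as a stand-in for bzip2),
  // then Base64-encode it so the binary payload can travel as option text.
  static String encodeForPipelineOption(String featureSetJson) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
      out.write(featureSetJson.getBytes(StandardCharsets.UTF_8));
    }
    // Base64 output is ASCII, so length() equals its size in bytes.
    String encoded = Base64.getEncoder().encodeToString(buffer.toByteArray());
    if (encoded.length() > ASSUMED_OPTION_LIMIT_BYTES) {
      // Fail fast locally rather than at job submission.
      throw new IllegalArgumentException(
          "Encoded feature set option is " + encoded.length() + " bytes, over the assumed limit");
    }
    return encoded;
  }

  // Reverse path on the worker side: Base64-decode, then decompress back to JSON.
  static String decodeFromPipelineOption(String encoded) throws IOException {
    byte[] compressed = Base64.getDecoder().decode(encoded);
    try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
      return new String(in.readAllBytes(), StandardCharsets.UTF_8); // Java 9+
    }
  }
}
```

Base64 inflates the compressed bytes by roughly a third, so if the option type can carry binary data directly, skipping the encoding step leaves more headroom.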
Which issue(s) this PR fixes:
None
Does this PR introduce a user-facing change?:
Users will be able to have more feature sets before Dataflow job submission fails. However, how many more depends on the compression ratio, which in turn depends on how much repetition exists in the feature set JSON.
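The dependence on repetition can be seen with a small experiment. This is an illustrative sketch, not a measurement from the PR: it uses the JDK's GZIP as a stand-in for bzip2 (absolute ratios differ between codecs), and both JSON documents are hypothetical. One repeats the same feature name for every feature; the other gives each feature a random 16-letter name while keeping the same structure.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.zip.GZIPOutputStream;

class CompressionRatioDemo {
  // Size in bytes of the GZIP-compressed UTF-8 encoding of text.
  static int compressedSize(String text) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
      out.write(text.getBytes(StandardCharsets.UTF_8));
    }
    return buffer.size();
  }

  // Compressed size divided by original size: lower means better compression.
  static double ratio(String text) throws IOException {
    return (double) compressedSize(text) / text.getBytes(StandardCharsets.UTF_8).length;
  }

  // Hypothetical feature set JSON in which every feature reuses the same name.
  static String repetitiveJson(int features) {
    StringBuilder sb = new StringBuilder("[");
    for (int i = 0; i < features; i++) {
      if (i > 0) sb.append(',');
      sb.append("{\"name\":\"trip_count_total\",\"valueType\":\"INT64\"}");
    }
    return sb.append(']').toString();
  }

  // Same structure, but each feature gets a random 16-letter name.
  static String uniqueNameJson(int features, long seed) {
    Random random = new Random(seed);
    StringBuilder sb = new StringBuilder("[");
    for (int i = 0; i < features; i++) {
      if (i > 0) sb.append(',');
      sb.append("{\"name\":\"");
      for (int j = 0; j < 16; j++) {
        sb.append((char) ('a' + random.nextInt(26)));
      }
      sb.append("\",\"valueType\":\"INT64\"}");
    }
    return sb.append(']').toString();
  }

  public static void main(String[] args) throws IOException {
    System.out.println("repetitive ratio:  " + ratio(repetitiveJson(1000)));
    System.out.println("unique-name ratio: " + ratio(uniqueNameJson(1000, 42L)));
  }
}
```

The repetitive document should show a much lower ratio; the actual numbers depend on the codec and on the real contents of the feature sets.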