Skip to content

Use bzip2 compressed feature set json as pipeline option#466

Merged
feast-ci-bot merged 3 commits intofeast-dev:masterfrom
khorshuheng:compressed-pipeline
Feb 14, 2020
Merged

Use bzip2 compressed feature set json as pipeline option#466
feast-ci-bot merged 3 commits intofeast-dev:masterfrom
khorshuheng:compressed-pipeline

Conversation

@khorshuheng
Copy link
Copy Markdown
Collaborator

What this PR does / why we need it:
Dataflow runner has a limit of 256kb for pipeline option. As we are storing feature sets as json string in pipeline option, the size will grow proportionally to the number of feature set versions. Compressing the feature set json will help us to support more feature sets.

Which issue(s) this PR fixes:
None

Does this PR introduce a user-facing change?:
Users will be able to have more feature set before dataflow job submission fails. However, this depends on the compression ratio, which in turn depends on how much repetition exists in ithe feature set json.

@Yanson
Copy link
Copy Markdown
Contributor

Yanson commented Feb 7, 2020

There are a lot of formatting changes in this. Which IDE + settings are you using?

I have imported the IntelliJ settings as described here:
https://github.com/gojek/feast/blob/master/docs/contributing.md#code-conventions

but I don't think it matches what you've submitted.

@khorshuheng
Copy link
Copy Markdown
Collaborator Author

There are a lot of formatting changes in this. Which IDE + settings are you using?

I have imported the IntelliJ settings as described here:
https://github.com/gojek/feast/blob/master/docs/contributing.md#code-conventions

but I don't think it matches what you've submitted.

It's the maven spotless plugin.


public class ProtoUtil {

public static String toJson(List<FeatureSetProto.FeatureSet> featureSets) throws IOException {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we avoid creating non-generic utility methods. ProtoUtil and toJson seem like a generic class and method, but the implementation is specific to FeatureSetProtos.

Either we need to rename this to be more specific and generalize later, or move this functionality out.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename seems more of a band aid solution, so i refactored my commits such that it is no longer under util, and can be extended for other compression strategies.

@woop
Copy link
Copy Markdown
Member

woop commented Feb 14, 2020

/lgtm
/approve

@feast-ci-bot
Copy link
Copy Markdown
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: khorshuheng, woop

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@feast-ci-bot feast-ci-bot merged commit bbea7c2 into feast-dev:master Feb 14, 2020
khorshuheng added a commit to khorshuheng/feast that referenced this pull request Feb 14, 2020
* Use bzip2 compressed feature set json as pipeline option

* Make decompressor and compressor more generic and extensible

* Avoid code duplication in test
khorshuheng added a commit that referenced this pull request Feb 14, 2020
* Use bzip2 compressed feature set json as pipeline option

* Make decompressor and compressor more generic and extensible

* Avoid code duplication in test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants