Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use bzip2 compressed feature set json as pipeline option #466

Merged
merged 3 commits into from
Feb 14, 2020

Conversation

khorshuheng
Copy link
Collaborator

What this PR does / why we need it:
Dataflow runner has a limit of 256kb for pipeline option. As we are storing feature sets as json string in pipeline option, the size will grow proportionally to the number of feature set versions. Compressing the feature set json will help us to support more feature sets.

Which issue(s) this PR fixes:
None

Does this PR introduce a user-facing change?:
Users will be able to have more feature set before dataflow job submission fails. However, this depends on the compression ratio, which in turn depends on how much repetition exists in ithe feature set json.

@Yanson
Copy link
Contributor

Yanson commented Feb 7, 2020

There are a lot of formatting changes in this. Which IDE + settings are you using?

I have imported the IntelliJ settings as described here:
https://github.com/gojek/feast/blob/master/docs/contributing.md#code-conventions

but I don't think it matches what you've submitted.

@khorshuheng
Copy link
Collaborator Author

There are a lot of formatting changes in this. Which IDE + settings are you using?

I have imported the IntelliJ settings as described here:
https://github.com/gojek/feast/blob/master/docs/contributing.md#code-conventions

but I don't think it matches what you've submitted.

It's the maven spotless plugin.


public class ProtoUtil {

public static String toJson(List<FeatureSetProto.FeatureSet> featureSets) throws IOException {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we avoid creating non-generic utility methods. ProtoUtil and toJson seem like a generic class and method, but the implementation is specific to FeatureSetProtos.

Either we need to rename this to be more specific and generalize later, or move this functionality out.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename seems more of a band aid solution, so i refactored my commits such that it is no longer under util, and can be extended for other compression strategies.

@woop
Copy link
Member

woop commented Feb 14, 2020

/lgtm
/approve

@feast-ci-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: khorshuheng, woop

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@feast-ci-bot feast-ci-bot merged commit bbea7c2 into feast-dev:master Feb 14, 2020
khorshuheng added a commit to khorshuheng/feast that referenced this pull request Feb 14, 2020
* Use bzip2 compressed feature set json as pipeline option

* Make decompressor and compressor more generic and extensible

* Avoid code duplication in test
khorshuheng added a commit that referenced this pull request Feb 14, 2020
* Use bzip2 compressed feature set json as pipeline option

* Make decompressor and compressor more generic and extensible

* Avoid code duplication in test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants