Encode feature row before storing in Redis #530
feast-ci-bot merged 2 commits into feast-dev:master
Conversation
ingestion/src/main/java/feast/store/serving/redis/FeatureRowToRedisMutationDoFn.java
```java
private byte[] getValue(FeatureRow featureRow) {
  FeatureRowEncoder encoder =
      new FeatureRowEncoder(featureSets.get(featureRow.getFeatureSet()).getSpec());
  return encoder.encode(featureRow).toByteArray();
}
```
Would this make more sense as a static method encode(featureRow, featureSet)?
Yup, a static method does make more sense, unless we need to support multiple encoding mechanisms, which is unlikely.
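A static form of the helper might look like the sketch below. The types here are simplified stand-ins for the Feast protos, not the real `FeatureRow`/`FeatureSetSpec` classes:

```java
import java.util.*;

// Sketch of the static-method refactor suggested above:
// encode(featureRow, featureSet) instead of constructing an encoder per
// row. The record types are simplified stand-ins for the Feast protos.
public class StaticEncodeSketch {
    record FeatureRow(Map<String, String> fields) {}
    record FeatureSetSpec(List<String> featureNames) {}

    // Static and stateless: no instance is needed unless multiple
    // encoding mechanisms must be supported later.
    static List<String> encode(FeatureRow row, FeatureSetSpec spec) {
        List<String> values = new ArrayList<>();
        for (String name : spec.featureNames()) {
            values.add(row.fields().get(name)); // names dropped, spec order kept
        }
        return values;
    }

    public static void main(String[] args) {
        FeatureRow row = new FeatureRow(Map.of("age", "30", "income", "900"));
        FeatureSetSpec spec = new FeatureSetSpec(List.of("age", "income"));
        System.out.println(encode(row, spec)); // [30, 900]
    }
}
```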
```java
/**
 * @return boolean
 */
public Boolean isEncodingValid(FeatureRow featureRow) {
  return featureRow.getFieldsList().size() == spec.getFeaturesList().size();
}
```
If you discover an old feature row (that is still within max age), how will you interpret it? Are we using the versions here to be able to interpret old feature rows?
The Redis key contains both the feature set name and version. Since both ingestion and serving encode/decode based on the feature set name and version specified in the Redis key, the spec should be consistent, and therefore isEncodingValid will always return true.
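The key-based spec resolution described above could be sketched as follows (the key layout here is illustrative, not Feast's actual Redis key format):

```java
// Sketch of how the Redis key pins the encoding: the key embeds the
// feature set name and version, so both ingestion and serving resolve
// the same spec and therefore agree on the field ordering used to
// encode and decode. Names here are illustrative, not the real format.
public class RedisKeySketch {
    static String redisKey(String featureSetName, int version, String entityKey) {
        return featureSetName + ":" + version + ":" + entityKey;
    }

    public static void main(String[] args) {
        System.out.println(redisKey("customer_features", 2, "customer_id=42"));
    }
}
```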
Originally I added this method to partially address the concern raised in #515:

> If field values are only going to be associated with field names at runtime by external configuration has any thought been given to a method for ensuring that the same configuration that was used to write the data is the configuration used to read the data? Something such as a checksum/fingerprint of the FeatureSet configuration stored alongside the data in Redis (or in the key) will help prevent a mismatch of configuration and data due to a bug somewhere else in the system.

Perhaps I should remove this method from this PR?
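The fingerprint idea quoted from #515 could be sketched like this (a hypothetical helper, not part of the Feast codebase):

```java
import java.security.MessageDigest;
import java.util.*;

// Sketch of the checksum idea from #515: fingerprint the feature set's
// field names (sorted) and store it alongside the encoded row, so a
// reader can detect a spec mismatch. Hypothetical helper, not Feast API.
public class SpecFingerprintSketch {
    static String fingerprint(List<String> featureNames) throws Exception {
        List<String> sorted = new ArrayList<>(featureNames);
        Collections.sort(sorted);
        byte[] digest = MessageDigest.getInstance("SHA-256")
                .digest(String.join(",", sorted).getBytes(java.nio.charset.StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) hex.append(String.format("%02x", b));
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        // The same set of names in any order yields the same fingerprint.
        String a = fingerprint(List.of("age", "income"));
        String b = fingerprint(List.of("income", "age"));
        System.out.println(a.equals(b)); // true
    }
}
```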
For this thread and the PR as a whole, especially the part quoted below: does this approach fall down and need to be reworked as versions get removed?
In cases where the Feature Rows come from an external Kafka source, where it is possible for a feature row to be malformed, there are a few checks in place:
- It's fine for the feature row to have feature fields in a different order than that specified in the feature set spec, as the encoding process will sort the features in alphabetical order based on name.
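The order invariance described in this check can be sketched as follows (a simplified stand-in for the real encoder, not Feast's actual implementation):

```java
import java.util.*;

// Sketch of the ordering guarantee above: fields arriving in any order
// encode to the same value sequence, because encoding sorts by field
// name first (a TreeMap iterates its keys in sorted order).
public class OrderInvarianceSketch {
    static List<String> encodeValues(Map<String, String> fields) {
        return new ArrayList<>(new TreeMap<>(fields).values());
    }

    public static void main(String[] args) {
        Map<String, String> inOrder = new LinkedHashMap<>();
        inOrder.put("a_feat", "1");
        inOrder.put("b_feat", "2");
        Map<String, String> reversed = new LinkedHashMap<>();
        reversed.put("b_feat", "2");
        reversed.put("a_feat", "1");
        // Same encoding regardless of arrival order.
        System.out.println(encodeValues(inOrder).equals(encodeValues(reversed))); // true
    }
}
```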
Oh yeah, @zhilingc has brought this up in a comment thread on her versions removal RFC. Ouch 🤕
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: khorshuheng, zhilingc.
* Encode feature row before storing in Redis
* Include encoding as part of RedisMutationDoFn

Co-authored-by: Khor Shu Heng <khor.heng@gojek.com>
What this PR does / why we need it:
Currently, feature names are stored together with the values in Redis, which results in an excessive memory footprint when feature names are long strings.
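To make the overhead concrete, a rough back-of-the-envelope sketch (the numbers are illustrative, not measured against the actual proto encoding):

```java
// Rough illustration of the footprint claim above: with long feature
// names, the name bytes can dominate the value bytes for each stored
// field. Illustrative only, not Feast's actual serialized sizes.
public class FootprintSketch {
    public static void main(String[] args) {
        String name = "driver_trip_completed_conversion_rate_7d"; // 40 chars
        String value = "0.87";
        int withNames = name.length() + value.length();
        int valuesOnly = value.length();
        System.out.println(withNames + " vs " + valuesOnly + " bytes per field");
    }
}
```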
This PR encodes the Feature Row prior to storing it in Redis. Encoding strategy:
In cases where the Feature Rows come from an external Kafka source, where it is possible for a feature row to be malformed, there are a few checks in place:
Good to have, but out of scope for this PR:
Which issue(s) this PR fixes:
Fixes #515.
Does this PR introduce a user-facing change?:
After this patch, the ingestion job will start storing feature rows in an encoded format. However, the serving job is still able to read non-encoded feature rows, so it is not necessary to re-ingest data written prior to applying the patch, unless the user wishes to reduce the memory footprint of the existing keys.
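The fallback behaviour described above could be sketched as follows (the format-detection heuristic here is hypothetical, not Feast's actual decoding logic):

```java
import java.util.*;

// Sketch of the backward compatibility described above: serving can
// detect whether a stored row is in the new encoded format (values
// only) or the old format (name/value pairs) and decode accordingly.
// The "contains '='" check is a hypothetical stand-in for the real test.
public class DecodeFallbackSketch {
    static Map<String, String> decode(List<String> stored, List<String> specNames) {
        Map<String, String> out = new LinkedHashMap<>();
        boolean legacy = !stored.isEmpty() && stored.get(0).contains("=");
        if (legacy) {
            // Old rows carry "name=value" entries.
            for (String entry : stored) {
                String[] kv = entry.split("=", 2);
                out.put(kv[0], kv[1]);
            }
        } else {
            // New rows carry bare values in spec order.
            for (int i = 0; i < specNames.size(); i++) {
                out.put(specNames.get(i), stored.get(i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> spec = List.of("age", "income");
        Map<String, String> fromLegacy = decode(List.of("age=30", "income=900"), spec);
        Map<String, String> fromEncoded = decode(List.of("30", "900"), spec);
        System.out.println(fromLegacy.equals(fromEncoded)); // true
    }
}
```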