-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Delete entity key from Redis only when all attached feature views are gone #2240
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: pyalex <moskalenko.alexey@gmail.com>
… gone Signed-off-by: pyalex <moskalenko.alexey@gmail.com>
Signed-off-by: pyalex <moskalenko.alexey@gmail.com>
c59dd35 to
9124605
Compare
Signed-off-by: pyalex <moskalenko.alexey@gmail.com>
Signed-off-by: pyalex <moskalenko.alexey@gmail.com>
Codecov Report
@@ Coverage Diff @@
## master #2240 +/- ##
===========================================
- Coverage 85.24% 60.28% -24.96%
===========================================
Files 107 107
Lines 8605 8631 +26
===========================================
- Hits 7335 5203 -2132
- Misses 1270 3428 +2158
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if, for example, you have two different FeatureViews both with the same join_key. Now you want to delete one of those FeatureViews but not the other.
In the Redis data model the features from both feature views are stored as keys under the same HASH. With these changes the extra features won't be deleted because the 'Entity' is still used.
I think there needs to be a method which is able to delete certain Features (from a FeatureView) when that FeatureView is changed or deleted. This method would have to use HDEL.
I guess this does fix #2150 but it introduces a new bug (which is admittedly less system breaking).
|
@judahrand |
Why not? The signature of the method you've renamed to Then once you have that you can also determine the fields for that fields = [_mmh3(f"{table.name}:{feature.name}") for feature in table.features ]Once you've done that you just need to loop over the HASHs which match your prefix and call HDEL for each field. I was thinking that def delete_table_values(self, config: RepoConfig, table: FeatureView):
client = self._get_client(config.online_store)
deleted_count = 0
pipeline = client.pipeline()
prefix = _redis_key_prefix(table.entities)
fields = [_mmh3(f"{table.name}:{feature.name}") for feature in table.features]
for _k in client.scan_iter(
b"".join([prefix, b"*", config.project.encode("utf8")])
):
pipeline.hdel(_k, *fields)
deleted_count += 1
pipeline.execute()
logger.debug(f"Deleted {deleted_count} keys for {table.name}")This would at least delete the correct features when an entire FeatureView was deleted.
Yeah, this is true. And arguably is a bigger problem. Letting the OnlineStore just get less and less efficient/bloated etc should not be acceptable. 🤔 I can see, however, that my proposed solution does still have issues. If the FeatureView which has been deleted has been edited a lot in the past (eg. Feature renames, partial removals, etc.) then it will leave some garbage behind. But that feels better to me than what sort of amounts to a reference counter on the whole key - and the key is only released when there are 0 references (even though there could be thousands of unused fields under that key). |
do you think |
|
@felixwang9817 yes, I hope that with new design and APIs objects like online store will have more control and will receive all information, like about renaming a view or renaming a feature, whereas right now options are very limited. |
felixwang9817
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: felixwang9817, pyalex The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
* Add backticks to left_table_query_string (#2250) Signed-off-by: david <davidmiller252@gmail.com> Signed-off-by: sfc-gh-madkins <miles.adkins@snowflake.com> * Delete entity key from Redis only when all attached feature views are gone (#2240) * Delete entity from redis when the last attached feature view is deleted Signed-off-by: pyalex <moskalenko.alexey@gmail.com> * Delete entity key from Redis only when all attached feature views are gone Signed-off-by: pyalex <moskalenko.alexey@gmail.com> * make lint happy Signed-off-by: pyalex <moskalenko.alexey@gmail.com> * make lint happy Signed-off-by: pyalex <moskalenko.alexey@gmail.com> * one more try with mypy Signed-off-by: pyalex <moskalenko.alexey@gmail.com> Signed-off-by: sfc-gh-madkins <miles.adkins@snowflake.com> * historical_field_mappings2 merge for one sign off commit (#2252) Signed-off-by: Michelle Rascati <michelle.rascati@sailpoint.com> Signed-off-by: sfc-gh-madkins <miles.adkins@snowflake.com> * Correct inconsistent dependency (#2255) Signed-off-by: Judah Rand <17158624+judahrand@users.noreply.github.com> Signed-off-by: sfc-gh-madkins <miles.adkins@snowflake.com> * Add snowflake environment variables to allow testing on snowflake infra (#2258) * add snowflake environment vars to test framework Signed-off-by: sfc-gh-madkins <miles.adkins@snowflake.com> * add snowflake environment vars to test framework Signed-off-by: sfc-gh-madkins <miles.adkins@snowflake.com> * Return `UNIX_TIMESTAMP` as Python `datetime` (#2244) * Refactor `UNIX_TIMESTAMP` conversion Signed-off-by: Judah Rand <17158624+judahrand@users.noreply.github.com> * Return `UNIX_TIMESTAMP` types as `datetime` to user Signed-off-by: Judah Rand <17158624+judahrand@users.noreply.github.com> * Fix linting errors Signed-off-by: Judah Rand <17158624+judahrand@users.noreply.github.com> * Rename variable to something more sensible Signed-off-by: Judah Rand <17158624+judahrand@users.noreply.github.com> Signed-off-by: sfc-gh-madkins <miles.adkins@snowflake.com> * Feast plan clean up (#2256) * Run validation and inference on views and entities during plan Signed-off-by: Felix Wang <wangfelix98@gmail.com> * Do not log objects that are unchanged Signed-off-by: Felix Wang <wangfelix98@gmail.com> * Rename Fco to FeastObject Signed-off-by: Felix Wang <wangfelix98@gmail.com> * Remove useless method Signed-off-by: Felix Wang <wangfelix98@gmail.com> * Lint Signed-off-by: Felix Wang <wangfelix98@gmail.com> * Always initialize registry during feature store initialization Signed-off-by: Felix Wang <wangfelix98@gmail.com> * Fix usage test Signed-off-by: Felix Wang <wangfelix98@gmail.com> * Remove print statements Signed-off-by: Felix Wang <wangfelix98@gmail.com> Signed-off-by: sfc-gh-madkins <miles.adkins@snowflake.com> * Squash commits Signed-off-by: Danny Chiao <danny@tecton.ai> Signed-off-by: sfc-gh-madkins <miles.adkins@snowflake.com> * Add error type and refactor query execution to have retries Signed-off-by: Danny Chiao <danny@tecton.ai> Signed-off-by: sfc-gh-madkins <miles.adkins@snowflake.com> * Handle more snowflake errors Signed-off-by: Danny Chiao <danny@tecton.ai> Signed-off-by: sfc-gh-madkins <miles.adkins@snowflake.com> * Fix lint errors Signed-off-by: Danny Chiao <danny@tecton.ai> Signed-off-by: sfc-gh-madkins <miles.adkins@snowflake.com> * Fix lint errors Signed-off-by: Danny Chiao <danny@tecton.ai> Signed-off-by: sfc-gh-madkins <miles.adkins@snowflake.com> * Fix lint errors Signed-off-by: Danny Chiao <danny@tecton.ai> Signed-off-by: sfc-gh-madkins <miles.adkins@snowflake.com> * Fix wrong import Signed-off-by: Danny Chiao <danny@tecton.ai> Signed-off-by: sfc-gh-madkins <miles.adkins@snowflake.com> * modify registry.db s3 object initialization to work in S3 subdirectory with Java Feast Server (#2259) Signed-off-by: NalinGHub <nalinm01@gmail.com> Signed-off-by: sfc-gh-madkins <miles.adkins@snowflake.com> * clean up docs Signed-off-by: sfc-gh-madkins <miles.adkins@snowflake.com> * lint-python Signed-off-by: sfc-gh-madkins <miles.adkins@snowflake.com> * fixed historical test Signed-off-by: sfc-gh-madkins <miles.adkins@snowflake.com> * fixed historical test Signed-off-by: sfc-gh-madkins <miles.adkins@snowflake.com> Co-authored-by: David Miller <david@patagona.ca> Co-authored-by: Oleksii Moskalenko <moskalenko.alexey@gmail.com> Co-authored-by: Michelle Rascati <44408275+michelle-rascati-sp@users.noreply.github.com> Co-authored-by: Judah Rand <17158624+judahrand@users.noreply.github.com> Co-authored-by: Felix Wang <wangfelix98@gmail.com> Co-authored-by: Danny Chiao <danny@tecton.ai> Co-authored-by: Nalin Mehra <37969183+NalinGHub@users.noreply.github.com>
What this PR does / why we need it:
This PR adds the test for the use case described in #2150 as well as the fix for it.
Which issue(s) this PR fixes:
Fixes #2150
Does this PR introduce a user-facing change?: