feat: Only call relevant registry endpoint when getting feature views…#289
Conversation
… by feature refs or service
…' into range-query-latency-improvements
go/internal/feast/featurestore.go
Outdated
| if featureService != nil { | ||
| requestedFeatureViews, requestedSortedFeatureViews, _, err = | ||
| onlineserving.GetFeatureViewsToUseByService(featureService, fs.registry, fs.config.Project) | ||
| // TODO: When sorted feature views are supported by feature services, we'll need to implement |
There was a problem hiding this comment.
It works currently with feature services. Only caveat is that feature service should only have Sorted Feature View features as far as I know.
There was a problem hiding this comment.
I re-added the support for feature services by adding the GetSortedFeatureViewsToUseByService method.
| assert.Equal(t, `{"error":"GetOnlineFeaturesRange does not support standard feature views [all_dtypes]","status_code":400}`, responseRecorder.Body.String(), "Response body does not match expected error message") | ||
| } | ||
|
|
||
| func TestGetOnlineFeaturesRange_Http_withInvalidFeatureView(t *testing.T) { |
There was a problem hiding this comment.
Can we change this test to a invalid sorted feature view?
| fv, fvErr := registry.GetFeatureView(projectName, featureViewName) | ||
|
|
||
| if fvErr == nil { |
There was a problem hiding this comment.
| fv, fvErr := registry.GetFeatureView(projectName, featureViewName) | |
| if fvErr == nil { | |
| if fv, fvErr := registry.GetFeatureView(projectName, featureViewName); fvErr == nil { |
| addFeaturesToValidationMap(fv.Base.Name, fv.Base.Features, viewToFeaturesValidationMap) | ||
| if !viewToFeaturesValidationMap[fv.Base.Name][featureName] { | ||
| invalidFeatures = append(invalidFeatures, featureRef) | ||
| } else { | ||
| validFeatures := make([]string, 0) | ||
| seenFeatures := make(map[string]bool) | ||
| for _, featureName := range requestedFeatureNames { | ||
| if !viewToFeaturesValidationMap[fv.Base.Name][featureName] { | ||
| invalidFeatures = append(invalidFeatures, fmt.Sprintf("%s:%s", featureViewName, featureName)) | ||
| } else if !seenFeatures[featureName] { | ||
| validFeatures = append(validFeatures, featureName) | ||
| seenFeatures[featureName] = true | ||
| } | ||
| } | ||
|
|
||
| if len(validFeatures) > 0 { | ||
| if viewAndRef, ok := viewNameToViewAndRefs[fv.Base.Name]; ok { | ||
| viewAndRef.FeatureRefs = addStringIfNotContains(viewAndRef.FeatureRefs, featureName) | ||
| for _, feat := range validFeatures { | ||
| viewAndRef.FeatureRefs = addStringIfNotContains(viewAndRef.FeatureRefs, feat) | ||
| } |
There was a problem hiding this comment.
all this logic can be moved to a different function and can be reused across.
Function with two args - requested features and feature view features. Validate and fail if invalid feature exists. We can fail fast instead of collecting all invalid features and throw error.
There was a problem hiding this comment.
Same comments apply in ODFV, SortedFeatureView logic below
There was a problem hiding this comment.
I'm now validating features in the serarate validateFeatures method.
| for _, feat := range validFeatures { | ||
| viewAndRef.FeatureRefs = addStringIfNotContains(viewAndRef.FeatureRefs, feat) | ||
| } |
There was a problem hiding this comment.
I don't understand why this logic is for.
There was a problem hiding this comment.
EDIT: Remove duplicates in viewNameToFeatureNames construction and use it in else block directly.
There was a problem hiding this comment.
I think the addStringIfNotContains use in the GetFeatureViewsToUseByFeatureRefs was motivated by the desire to match the style to how it's used in the GetFeatureViewsToUseByService method (where it actually makes sense). But yeah doing the dedupe directly within viewNameToFeatureNames makes more sense, so we dont need to dedupe it within the for loop.
| viewNameToViewAndRefs[featureProjection.NameToUse()] = &FeatureViewAndRefs{ | ||
| View: view, | ||
| FeatureRefs: []string{}, | ||
| if _, exists := viewNameToProjections[featureViewName]; !exists { |
There was a problem hiding this comment.
can we reuse code in GetFeatureViewsToUseByFeatureRefs() instead of rewriting most of the same logic?
go/internal/feast/integration_tests/scylladb/http/http_integration_test.go
Show resolved
Hide resolved
| func TestGetOnlineFeaturesRange_Http_withFeatureService(t *testing.T) { | ||
| requestJson := []byte(`{ | ||
| "feature_service": "test_service", | ||
| "feature_service": "test_sorted_service", |
There was a problem hiding this comment.
You can also have the previous test case which is invalid feature service passed to range query endpoint.
There was a problem hiding this comment.
You mean adding TestGetOnlineFeaturesRange_Http_withInvalidFeatureService right?
| Field(name="array_int_val", dtype=Array(Int32)), | ||
| Field(name="array_long_val", dtype=Array(Int64)), | ||
| Field(name="array_float_val", dtype=Array(Float32)), | ||
| Field(name="array_double_val", dtype=Array(Float64)), | ||
| Field(name="array_string_val", dtype=Array(String)), | ||
| Field(name="array_boolean_val", dtype=Array(Bool)), | ||
| Field(name="array_byte_val", dtype=Array(Bytes)), | ||
| Field(name="array_timestamp_val", dtype=Array(UnixTimestamp)), | ||
| Field(name="null_array_timestamp_val", dtype=Array(UnixTimestamp)), |
There was a problem hiding this comment.
why do we need this shuffle? If not needed, please revert this change.
There was a problem hiding this comment.
The reason for this shuffle is to align with the order that the valid_response.json expects the features to be in. The GetSortedFeatureViewsToUseByFeatureRefs and GetFeatureViewsToUseByFeatureRefs methods return the sfv's FeatureRefs in the order of their appearance in the projection. The order of the projection is the order in which the features appear in the schema when defining the sfv.
There was a problem hiding this comment.
Ok. I would change valid_response.json instead of this.
| return viewNameToProjections, viewNameOrder | ||
| } | ||
|
|
||
| func buildDeduplicatedFeatureNamesMap(features []string) (map[string][]string, []string, error) { |
There was a problem hiding this comment.
If I understand that correctly, viewNameOrder is helping you preserve the order. You can get rid of map and create a type object with feature View Name and List of Features. You can create a List of type object and avoid the problem you are facing and It simplifies the code as well. You can add a method to find if feature exists or not for that object to quickly check as well.
There was a problem hiding this comment.
Okay I'll add this type instead.
| return ×tamppb.Timestamp{} | ||
| } | ||
|
|
||
| func buildViewNameToProjectionsMap(featureService *model.FeatureService) (map[string][]*model.FeatureViewProjection, []string) { |
There was a problem hiding this comment.
This can return just list of FeatureRefs. That can be passed as argument to GetFeatureViewsToUseByFeatureRefs() method to get the output. This reuses the code.
There was a problem hiding this comment.
So you mean extract the feature refs from the feature service, then call Get{Sorted}FeatureViewsToUseByFeatureRefs from within the Get{Sorted}FeatureViewsToUseByService method?
There was a problem hiding this comment.
We had offline discussion. Feature Service doesn't return the list of features directly. We can revert Get{Sorted}FeatureViewsToUseByService function changes and only do split between FV and SFV for now.
go/internal/test/go_test_utils.go
Outdated
| return model.NewFeatureServiceFromProto(fsProto) | ||
| } | ||
|
|
||
| func CreateFeatureServiceWithOrderedProjections(name string, orderedProjections []OrderedProjection) *model.FeatureService { |
There was a problem hiding this comment.
I assume this shouldn't be fixed at test level.
There was a problem hiding this comment.
Can you clarify this point? I'm using this util in the TestGetSortedFeatureViewsToUseByService_MaintainsOrder test, so that the feature service is constructed with the specific projection order. That way we can ensure the TestGetSortedFeatureViewsToUseByService method returns the feature views and their references in the order we expect.
There was a problem hiding this comment.
Switching the data structure from Map to Go struct should fix the problem. Is that a right assumption?
|
|
||
| odFvsToUse := make([]*model.OnDemandFeatureView, 0) | ||
| fvsToUse := make([]*FeatureViewAndRefs, 0, len(viewNameToViewAndRefs)) | ||
| seen := make(map[string]bool) |
There was a problem hiding this comment.
seen variable is not needed.
| for name, viewAndRef := range viewNameToViewAndRefs { | ||
| if !seen[name] { | ||
| fvsToUse = append(fvsToUse, viewAndRef) | ||
| } | ||
| } |
There was a problem hiding this comment.
This code is not required.
| for _, vf := range viewFeatures { | ||
| if viewAndRef, ok := viewNameToViewAndRefs[vf.ViewName]; ok && !seen[vf.ViewName] { | ||
| fvsToUse = append(fvsToUse, viewAndRef) | ||
| seen[vf.ViewName] = true | ||
| } | ||
| } |
There was a problem hiding this comment.
Instead of constructing viewNameToViewAndRefs, directly construct fvsToUse.
|
|
||
| existed in the registry | ||
| */ | ||
| func GetSortedFeatureViewsToUseByFeatureRefs( |
There was a problem hiding this comment.
Directly construct sortedFvsToUse list instead of using map
There was a problem hiding this comment.
We discussed this offline. We're still using a map to avoid adding duplicate views.
| @@ -92,87 +102,167 @@ existed in the registry | |||
| func GetFeatureViewsToUseByService( | |||
There was a problem hiding this comment.
Revert to previous logic and fix the ordering issue which is caused by Map.
There was a problem hiding this comment.
We discussed this offline. We're still using a map to avoid adding duplicate views.
cf522e1
into
feature/range-query-improvements
* fix: Update Docker images to use official sources for integration tests (#260) * fix: Update Docker images to use official sources and improve workflow triggers * docs: Update README to include instructions for running integration tests --------- Co-authored-by: Bhargav Dodla <bdodla@expediagroup.com> * fix: Only add sort key filter models to list if they meet the conditi… (#261) * fix: Only add sort key filter models to list if they meet the conditions. Split read range int tests into its own file. * add pauses in int test to allow for apply/materialize to compelete * set int tests to not run in parallel * fix: Adding Snake Case to Include Metadata Check (#264) * adding snake case to include metadata check * fixing formatting * added trim space to string function * fix: Move use_write_time_for_ttl config to sfv instead of an online store config (#267) * fix: Move use_write_time_for_ttl config to SortedFeatureView instead of an online store config * re-order lines in proto so indexes are in order * fix: Pack and unpack repeated list values into and out of arrow array… (#263) * fix: Pack and unpack repeated list values into and out of arrow arrays. Restructure integration tests to properly separate concerns * Throw error when requested feature service includes both view types * Re-use CopyProtoValuesToArrowArray instead of duplicating switch logic. * fix: fix misplaced reset indexes (#266) * feat: Adding Changes to Check if FV and SFV have Valid Updates (#254) * Adding update changes * fixing tests * fixing linting * addressing PR comments * fixing unit test * Added in sort order check * using feast object * removing return type from validation * Adding another exception type * Addressing PR comments to add an entity check * fixing tests * Addressing PR comments * fixing test assertion * fixing formatting * fix: Validate requested features exist in view. (#269) * fix: Validate requested features exist in view. * add test cases for invalid features in feature service * reduce time complexity and duplicate checks for feature validation * use if-else blocks * fix: Clean Up Error Messages (#274) * writing clearer error messages * addressing PR comments: fixing tests * feat: Adding Alter Table to Support Schema Updates for Cassandra (#262) * Adding update changes to alter table if fv schema changes * addressing PR comments * removing redundant lines * addressing PR comments * editing makefile to get python tests to run * fixing go integration tests * fixing formatting * fixing failing tests * changing number of workers to 1 * removing the igmore to run all python tests * adding back the ignore statement, seems to be running tests twice * reverting back Makefile * removing ignore statement * reverting back * aadding to unit_tests.yml file * aadding to unit_tests.yml file * aadding to unit_tests.yml file * MacOS tests removed due to MacOS runners not supporting docker * feat: Separate entities from features in Range Query response (#265) * feat: seperate entities into their own field in range query response * feat: seperate entities from features in http range query response * change range query response name back to results instead of features * fix http server unit test * add debugging logs * modify getOnlineFeaturesRange logic * fix grpc range query code and integration tests * fix: Javadoc errors (#270) * update the function description to fix javdoc errors. * fix: Formatting --------- Co-authored-by: vbhagwat <vbhagwat@expediagroup.com> * fix: dont add entities to feature vectors * fix: Formatting javadocs (#271) * update the function description to fix javdoc errors. * fix: Formatting * formatting * fix: formatting * fix linting errors * fix params --------- Co-authored-by: vbhagwat <vbhagwat@expediagroup.com> * fix: Formatting (#272) * update the function description to fix javdoc errors. * fix: Formatting * formatting * fix: formatting * fix linting errors * fix params * fix javadoc error --------- Co-authored-by: vbhagwat <vbhagwat@expediagroup.com> * fix: Formatting (#273) * update the function description to fix javdoc errors. * fix: Formatting * formatting * fix: formatting * fix linting errors * fix params * fix javadoc error * Update FeastClient.java --------- Co-authored-by: vbhagwat <vbhagwat@expediagroup.com> * feat: Add getOnlineFeature and getOnlineFeaturesRange overloaded methods wi… (#275) * feat: add getOnlineFeature and getOnlineFeaturesRange overloaded methods without project Co-authored-by: vbhagwat <vbhagwat@expediagroup.com> * update java client for separate entities in response * Update tests to reflect changes * fix processFeatureVector tests * throw exception when more rows than entity values * change based on pr feedback * pr feedback --------- Co-authored-by: omiranda <omiranda@expediagroup.com> Co-authored-by: vanitabhagwat <92561664+vanitabhagwat@users.noreply.github.com> Co-authored-by: vbhagwat <vbhagwat@expediagroup.com> * fix: Keep duplicate requested features in response for range query. (#276) * fix: Http range timestamp values should return in rfc3339 format (#277) * fix: Http range timestamp values should return in rfc3339 format to match how get features timestamps are returned * use the exact timestamp format arrow marshalling uses * fix: Exception Handling in Updates Feature (#278) * added another exception type * raising exceptions in the http methods instead * calling handle_exception method instead of a raise * Added a TODO comment * fix: Timestamps should be seconds percision. Return null status for found null values. (#279) * fix: Timestamps should be seconds percision. Return null status for found null values. * convert UnixTimestamp sort filters as time.Time * use consistent time for type conversion test * fix materializing timestamps as seconds and java converting time values into epoch seconds * fix linting * fix test * fix test * fix: Go server errors should be status errors with proper status codes. (#281) * fix: Go server errors should be status errors with proper status codes. FeastClient handles status exceptions as differentiated FeastExceptions. * fix and expand tests * add more test cases * fix: Validation order for multiple missing end keys should pass (#283) * fix: Validation order for multiple missing end keys should pass * fix int test and create test for only an equals filter * add conversions of int32 to higher values for http * fix: Handle null values in Arrow list conversion to Proto values (#280) * fix: Handle null values in Arrow list conversion to Proto values * fix: Fixed the nil representation in http * fix: Correct variable name for null check in ArrowListToProtoList function * fix: Null value handling for types.Values * fix: Fixed the nil representation in http * fix: Fixed the issue with NOT_FOUND values representation and repeated null values * fix: Fixed the nil representation * fix: Fixed the nil representation in http * fix: Removed duplicate test * fix: Added status handling for feature values and ensured consistency in value-status mapping * fix: Fixed issue with timestamps in HTTP * fix: Fixed failing tests * fix: Simplified null value handling in type conversion --------- Co-authored-by: Bhargav Dodla <bdodla@expediagroup.com> * fix: When given a string for a bytes entity/sort key filter, attempt to convert as base64 string (#285) * fix: Improve Arrow to proto conversion to handle null and empty arrays (#286) Co-authored-by: Bhargav Dodla <bdodla@expediagroup.com> * feat: Add versioning information to the server and expose it via a ne… (#288) * feat: Add versioning information to the server and expose it via a new endpoint 1. Version information is exposed via endpoints 2. It published information to datadog as well 3. Version information is printed on pod startup * fix: fixed test failures * fix: fixed test failures * fix: added comment * fix: update GetVersionInfo method and related proto definitions * fix: set server type in version information for HTTP, gRPC, and hybrid servers * fix: refactor Datadog version info publishing to a separate function * fix: improve error handling for statsd client flush and close operations --------- Co-authored-by: Bhargav Dodla <bdodla@expediagroup.com> * fix: Add http int tests and fix http error codes (#287) * fix: Update Feature View Not Catching SortedFeatureView/FeatureView Errors (#291) * adding a check to see if HTTPStatusError is being raised within _handle_exception and if it is, to return the same. * addressing feedback * feat: Only call relevant registry endpoint when getting feature views… (#289) * feat: Only call relevant registry endpoint when getting feature views by feature refs or service * update reference to old method signatures * fix integration tests * dedupe feature refs like methods used to * deleting test that tries to reject fvs passed to rq, no longer doing this check * small changes based on PR feedback * fix test * maintain order of fvs and features * define sorted feature service for integration tests * reuse regular fv reponse json for sorted fv integration test * add sort key to expected features in integration test * reorder feature schema to match order in expected response * add unit tests for GetSortedFeatureView methods * add util methods needed to construct fs with ordering * changes based on pr feedback * fix response json order * reorder request jsoons * reuse common feature variables * fix other json objects * refactor old code to use list instead of map * add odfvs to fvsToUse more explicitly * refactor again --------- Co-authored-by: omiranda <omiranda@expediagroup.com> * feat: Streaming ingestion latency improvements (#292) * feat: Streaming ingestion latency improvements * feat: remove blank lines * refactor process method * feat: leaving 1 core is enough for multiprocessing --------- Co-authored-by: Bhargav Dodla <13788369+EXPEbdodla@users.noreply.github.com> Co-authored-by: Bhargav Dodla <bdodla@expediagroup.com> Co-authored-by: Manisha Sudhir <30449541+Manisha4@users.noreply.github.com> Co-authored-by: kpulipati29 <kpulipati29@gmail.com> Co-authored-by: omirandadev <136642003+omirandadev@users.noreply.github.com> Co-authored-by: omiranda <omiranda@expediagroup.com> Co-authored-by: vanitabhagwat <92561664+vanitabhagwat@users.noreply.github.com> Co-authored-by: vbhagwat <vbhagwat@expediagroup.com>
What this PR does / why we need it:
This PR adds a new method GetSortedFeatureViewsToUseByFeatureRefs to the onlineserving package. This method is called within the GetOnlineFeaturesRange method of the Go Feature Store. Adding this new method is needed because previously the GetOnlineFeatures and GetOnlineFeaturesRange methods called the same GetFeatureViewsToUseByFeatureRefs helper method. The issue with this was that that method first attempted to find the feature views referenced in the feature references by calling the registry's method GetFeatureView and if that failed it called GetSortedFeatureView. This guaranteed at least one additional unnecessary call to the registry that is guaranteed to return an error when /get-online-features-range endpoint is called. This PR ensures that the GetOnlineFeatures method only attempts to find features views by calling the GetFeatureViews registry endpoint and the GetOnlineFeaturesRange method only attempts to find the sorted feature views by the GetSortedFeatureViews method.
Long term, we may need to find an analogous solution for On Demand Feature Views, which currently also are called only after the GetFeatureView method fails within the GetFeatureViewsToUseByFeatureRefs method.
Additionally, this PR groups the (sorted) feature views referenced in the provided feature references, so that only 1 registry call is made per referenced feature view, as opposed to making N calls to the registry where N is the number of feature references passed.
Which issue(s) this PR fixes:
Misc