-
Notifications
You must be signed in to change notification settings - Fork 1.2k
feat: Add Set as feature type #5888
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add Set as feature type #5888
Conversation
franciscojavierarceo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@copilot we should not update the protobuf version
| _ = Array(Array(String)) | ||
|
|
||
|
|
||
| def test_set_feast_type(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@copilot can we add a test for it in the On Demand Feature Views test for the set type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added comprehensive Set type test in test_on_demand_python_transformation.py. The test verifies:
- Set types in On Demand Feature Views with both Python mode transformations
- Automatic deduplication when creating sets from data with duplicates
- Conversion from lists to sets (e.g.,
["Boston", "Boston", "Seattle"]→{"Boston", "Seattle"}) - Online feature retrieval returning native Python sets
- Type validation for Set[String] and Set[Int64]
See commit ba4c68e.
|
|
||
| DESCRIPTOR: google.protobuf.descriptor.FileDescriptor | ||
|
|
||
| @typing_extensions.final |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@copilot why do we need this typing extension?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This typing_extensions pattern was already present in the original file before my changes. The conditional import (lines 11-14) provides backward compatibility - it uses the built-in typing module for Python 3.8+ and falls back to the typing_extensions package for older versions.
The only change from my recompilation is the addition of the @typing_extensions.final decorator (line 18), which was generated by mypy-protobuf 3.5.0. This decorator marks the class as final (cannot be subclassed) for better type safety.
If this pattern is causing issues, I can investigate using a different mypy-protobuf version that doesn't add these decorators.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@copilot please revert all changes with the @typing_extensions.final decorator, it shouldn't be necessary. probably there's skew between your local development environment and the CI. Check the Makefile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed all @typing_extensions.final decorators. The issue was using mypy-protobuf 3.5.0 instead of 3.3.0 (which matches the CI requirements). Reverted all proto files and recompiled with the correct versions: protobuf 4.25.1 and mypy-protobuf 3.3.0. The Set types are preserved and no @typing_extensions.final decorators remain. See commit 105b399.
|
@copilot linter failed, please run |
|
@copilot linter and unit tests are still failing, please fix. |
|
@copilot look at the failed test: FAILED sdk/python/tests/unit/test_on_demand_python_transformation.py::test_python_transformation_with_set_types - TypeError: 'NoneType' object is not iterable |
Fixed two bugs that were causing failures:
All ruff checks pass. See commit 055457b. |
|
@copilot the tests are still failing: |
... Fixed the NoneType iteration error. The issue was in the bytes/JSON-based set value conversion code (lines 634-643) where list comprehensions were not checking for None values before attempting to:
Added None checks to both operations and proper handling to return empty ProtoValue() for None values. See commit 0a20462. Note: The |
- Add 8 Set value types (BYTES_SET, STRING_SET, INT32_SET, INT64_SET, DOUBLE_SET, FLOAT_SET, BOOL_SET, UNIX_TIMESTAMP_SET) - Implement Set class with base type validation (no nested Sets/Maps) - Add type conversion logic with duplicate removal - Generate protobuf bindings and update type stubs - Add comprehensive tests and documentation for Set types Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
8c02e44 to
821b1f2
Compare
- Fix IndexParams len() check by tracking indices_added flag - Fix Unicode decode error for Set types by including them in numeric_vector_list_types - Fix Unicode decode error for complex types by using base64 encoding instead of decode() - Fix duplicate timestamp fields in Milvus insertion by filtering them from values_dict - Add missing timestamp field names to exclusion list in schema creation Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
The test was providing string values for item_id field which is defined as Int64 in the schema, causing protobuf serialization issues that could manifest as Unicode decode errors. Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
Extract the long set type handling code from _python_value_to_proto_value into a dedicated _python_set_to_proto_values function for better readability and maintainability. Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
| ) | ||
| documents_df = pd.DataFrame( | ||
| { | ||
| "item_id": [str(i) for i in range(n)], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was a bug
| numeric_vector_list_types = [ | ||
| k | ||
| for k in PROTO_VALUE_TO_VALUE_TYPE_MAP.keys() | ||
| if k is not None and "list" in k and "string" not in k |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
special handling because milvus is difficult
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
franciscojavierarceo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall this took a long time with Copilot but Claude was able to clean it up right away.
Copilot was close with some good prompting so that's a fun lesson.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| supported_set_types = [t for t in SUPPORTED_BASE_TYPES if t != Map] | ||
| if base_type not in supported_set_types: | ||
| raise ValueError( | ||
| f"Type {type(base_type)} is currently not supported as a base type for Set." | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔴 Set type allows Invalid, PdfBytes, and ImageBytes base types which have no corresponding ValueType entries
The Set class allows Invalid, PdfBytes, and ImageBytes as base types but these don't have corresponding SET variants in the ValueType enum, causing a KeyError at runtime.
Click to expand
Root Cause
In types.py:191-197, the Set.__init__ method only excludes Map from SUPPORTED_BASE_TYPES:
supported_set_types = [t for t in SUPPORTED_BASE_TYPES if t != Map]
if base_type not in supported_set_types:
raise ValueError(...)However, SUPPORTED_BASE_TYPES at line 122-135 includes Invalid, PdfBytes, and ImageBytes.
The Bug
When to_value_type() is called at line 201-205:
value_type_name = PRIMITIVE_FEAST_TYPES_TO_VALUE_TYPES[self.base_type.name]
value_type_set_name = value_type_name + "_SET"
return ValueType[value_type_set_name]For these types:
Invalid→UNKNOWN_SET(doesn't exist in ValueType)PdfBytes→PDF_BYTES_SET(doesn't exist in ValueType)ImageBytes→IMAGE_BYTES_SET(doesn't exist in ValueType)
Impact
A user creating Set(Invalid), Set(PdfBytes), or Set(ImageBytes) will successfully construct the object but get a KeyError when to_value_type() is called (e.g., during schema validation or serialization).
Recommendation: Exclude Invalid, PdfBytes, and ImageBytes in addition to Map from the supported set types:
supported_set_types = [t for t in SUPPORTED_BASE_TYPES if t not in (Map, Invalid, PdfBytes, ImageBytes)]Was this helpful? React with 👍 or 👎 to provide feedback.
| * distributed under the License is distributed on an "AS IS" BASIS, | ||
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
| * See the License for the specific language governing permissions and | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the changes here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comes from regenerating the proto files.
|
|
||
| class FeatureViewProjection(google.protobuf.message.Message): | ||
| """A projection to be applied on top of a FeatureView. | ||
| """A projection to be applied on top of a FeatureView. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here. What are changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again, just from regenerating the file, it added an extra space.
|
|
||
| # Check if it's a set (Set type) | ||
| if isinstance(value, set): | ||
| if not recurse: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why if not recurse, should it be if recurse?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't think we should support sets of sets yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, what is if recurse? Is it if recursion is true?
What this PR does / why we need it:
This PR adds support for Set as a feature type in Feast, allowing users to store collections of unique values with automatic deduplication.
Changes Made
1. Protobuf Definitions (Value.proto)
INT32_SET,INT64_SET,FLOAT_SET,DOUBLE_SETSTRING_SET,BYTES_SET,BOOL_SET,UNIX_TIMESTAMP_SETInt32Set,Int64Set, etc.)Valuemessage to include set fields2. Python SDK Updates
ValueTypeenum and definedSetTypeunionSetclass (ComplexFeastType) similar toArrayVALUE_TYPES_TO_FEAST_TYPESMapPYTHON_SET_VALUE_TYPE_TO_PROTO_VALUEmappingPROTO_VALUE_TO_VALUE_TYPE_MAPwith set field mappingspython_type_to_feast_value_typeto properly handle Python sets passed in entity_rows3. Tests
test_types.pyfor Set type creation and round-trip conversiontest_type_map.pyfor:test_python_transformation_with_set_typesintest_on_demand_python_transformation.pyfor:4. Documentation
docs/reference/type-system.mdwith:Usage Example
Benefits
Misc
make format-pythonandmake lint-python)Original prompt
✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.