@ntkathole ntkathole commented Aug 22, 2025

What this PR does / why we need it:

Added support for image search combining text and image queries and fixed Milvus binary data handling with base64 encoding.
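As a rough illustration of the base64 handling mentioned above, the sketch below round-trips binary data through an ASCII-safe string, which is the general technique for storing bytes in a text field. The helper names `encode_image_bytes` and `decode_image_bytes` are hypothetical and are not taken from this PR; the sample payload only starts with JPEG magic bytes and is not a real image.

```python
import base64


def encode_image_bytes(raw: bytes) -> str:
    """Encode raw binary data as an ASCII base64 string (hypothetical helper)."""
    return base64.b64encode(raw).decode("ascii")


def decode_image_bytes(encoded: str) -> bytes:
    """Decode a base64 string back to the original bytes (hypothetical helper)."""
    # validate=True rejects characters outside the base64 alphabet
    return base64.b64decode(encoded, validate=True)


# JPEG magic bytes followed by an arbitrary payload; not a real image
sample = b"\xff\xd8\xff\xe0" + bytes(range(16))
stored = encode_image_bytes(sample)
restored = decode_image_bytes(stored)
```

The encoded form is roughly a third larger than the raw bytes, which is the usual trade-off for making binary data safe to store as text.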

Users can now define Field(name="image_bytes", dtype=ImageBytes)

This PR also fixes the vector field inconsistency issue in online_write_batch with Milvus.

  • IMAGE_BYTES Type Support
      sdk/python/feast/value_type.py: Added IMAGE_BYTES = 21
      sdk/python/feast/types.py: Added ImageBytes type and mappings
      sdk/python/feast/on_demand_feature_view.py: Added sample JPEG data for testing
  • Image Search Functionality
      sdk/python/feast/feature_store.py: Enhanced retrieve_online_documents_v2 with image search parameters
      sdk/python/feast/vector_store.py: Added query_image_bytes support
      sdk/python/feast/image_utils.py: New file with ImageFeatureExtractor and combine_embeddings

Which issue(s) this PR fixes:

Fixes #5372 and Fixes #5551

@ntkathole ntkathole self-assigned this Aug 22, 2025
@ntkathole ntkathole requested a review from a team as a code owner August 22, 2025 17:47
@ntkathole ntkathole force-pushed the image_support branch 2 times, most recently from 5ca64fc to e1f19f2 Compare August 22, 2025 18:14
query: Optional[List[float]] = None,
query_string: Optional[str] = None,
distance_metric: Optional[str] = "L2",
query_image_bytes: Optional[bytes] = None,

@franciscojavierarceo franciscojavierarceo Aug 22, 2025

Is this the right way to do this? I would have updated the proto values to support ImageBytes, like we support PDFBytes, and then just queried the embedding the standard way.

You can then enrich the image vector embeddings with text and semantic embeddings and then allow a composite search of both. In that sense, you still would use 1 query but searching across multiple vectors.

Actually, I'm realizing my mistake here. ImageBytes is required for retrieval; what you have here is appropriate for image search when the image is passed in for the query.

Member Author

Added IMAGE_BYTES as well 👍

@ntkathole ntkathole force-pushed the image_support branch 2 times, most recently from b88d3a6 to e252cdf Compare August 27, 2025 05:13

for output in outputs:
    feature_vector = output.numpy()
    normalized = normalize(feature_vector.reshape(1, -1), norm="l2")

feel like l2 should be configurable but probably not needed yet

Member Author

Right, most use cases are covered by L2; we can expand as needed.
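For reference, a configurable norm could look roughly like the sketch below. It is a pure-Python illustration under stated assumptions, not the code in this PR: `normalize_vector` is a hypothetical helper, and the `"l1"`, `"l2"`, and `"max"` options are chosen to mirror the `norm` values accepted by scikit-learn's `normalize`, which the reviewed snippet appears to call.

```python
import math
from typing import List


def normalize_vector(vec: List[float], norm: str = "l2") -> List[float]:
    """Scale a vector to unit norm (hypothetical helper, not part of the PR)."""
    if norm == "l2":
        scale = math.sqrt(sum(x * x for x in vec))
    elif norm == "l1":
        scale = sum(abs(x) for x in vec)
    elif norm == "max":
        scale = max((abs(x) for x in vec), default=0.0)
    else:
        raise ValueError(f"Unsupported norm: {norm}. Supported norms: l1, l2, max")
    # Leave all-zero (or empty) vectors unchanged instead of dividing by zero
    if scale == 0.0:
        return list(vec)
    return [x / scale for x in vec]


unit_l2 = normalize_vector([3.0, 4.0])             # default L2 norm
unit_l1 = normalize_vector([1.0, -3.0], norm="l1")
```

Keeping `"l2"` as the default preserves the current behavior while leaving room to expose the option later.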

else:
    raise ValueError(
        f"Unknown combination strategy: {strategy}. "
        f"Supported strategies: weighted_sum, concatenate, average"

Small nit: technically you should just make the strategy an enum so the ValueError message is easier to maintain.

Member Author

Made the changes
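The enum suggestion in this thread might be sketched as follows. This is an assumption-laden illustration rather than the merged code: `CombinationStrategy`, `parse_strategy`, and `combine` are hypothetical names, only the three strategy values come from the error message quoted above, and the weighting scheme for `weighted_sum` is a guess at one reasonable definition.

```python
from enum import Enum
from typing import List, Union


class CombinationStrategy(str, Enum):
    """Strategy names taken from the error message under review."""
    WEIGHTED_SUM = "weighted_sum"
    CONCATENATE = "concatenate"
    AVERAGE = "average"


def parse_strategy(value: Union[str, CombinationStrategy]) -> CombinationStrategy:
    """Convert a string to the enum, listing valid values on failure."""
    try:
        return CombinationStrategy(value)
    except ValueError:
        # The supported list is derived from the enum, so it cannot go stale
        supported = ", ".join(s.value for s in CombinationStrategy)
        raise ValueError(
            f"Unknown combination strategy: {value}. "
            f"Supported strategies: {supported}"
        ) from None


def combine(
    text_vec: List[float],
    image_vec: List[float],
    strategy: Union[str, CombinationStrategy],
    text_weight: float = 0.5,  # assumed meaning: weight on the text embedding
) -> List[float]:
    """Combine two embeddings (hypothetical stand-in for combine_embeddings)."""
    strategy = parse_strategy(strategy)
    if strategy is CombinationStrategy.CONCATENATE:
        return list(text_vec) + list(image_vec)
    if len(text_vec) != len(image_vec):
        raise ValueError("Embeddings must have the same dimension")
    if strategy is CombinationStrategy.AVERAGE:
        return [(t + i) / 2 for t, i in zip(text_vec, image_vec)]
    # Remaining case: WEIGHTED_SUM
    return [text_weight * t + (1 - text_weight) * i
            for t, i in zip(text_vec, image_vec)]
```

Because the error message is built from the enum members, adding a new strategy only requires a new member and a new branch, which is the maintenance benefit the review comment points at.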

@franciscojavierarceo franciscojavierarceo left a comment

some small nits but this lgtm and i'd like to include it in the next release as this is a killer new feature. great work as always @ntkathole!

Signed-off-by: ntkathole <nikhilkathole2683@gmail.com>
@franciscojavierarceo franciscojavierarceo merged commit 56c5910 into feast-dev:master Sep 7, 2025
16 checks passed
franciscojavierarceo pushed a commit that referenced this pull request Sep 30, 2025
# [0.54.0](v0.53.0...v0.54.0) (2025-09-30)

### Bug Fixes

* Column quoting in query of `PostgreSQLOfflineStore.pull_all_from_table_or_query` ([#5621](#5621)) ([e8eae71](e8eae71))
* Correct column list polars materialization engine ([#5595](#5595)) ([39aeb0c](39aeb0c))
* Fix Go feature server entitykey serialization for version 3 ([#5622](#5622)) ([5ab18a6](5ab18a6))
* Fix hostname resolution for spark tests ([#5610](#5610)) ([8f0e22d](8f0e22d))
* Fixed filtering based on data_source for ODFVs ([#5593](#5593)) ([c3e6c56](c3e6c56))
* Fixed project_description to set in registry and UI ([#5602](#5602)) ([02c3006](02c3006))
* Fixed Registry Cache Refresh Issues ([#5604](#5604)) ([3c7a022](3c7a022))
* Fixed tls issue when running both grpc and rest servers ([#5617](#5617)) ([51c16b1](51c16b1))
* Fixed transaction handling with SQLite registry ([#5588](#5588)) ([0052754](0052754))
* Update the deprecated functions in Go feature server. ([#5632](#5632)) ([a24e06e](a24e06e))
* Updated python packages conflicting with kserve dependencies ([#5580](#5580)) ([d56baf4](d56baf4))

### Features

* Add 'featureView' in global search api result for features. ([#5626](#5626)) ([76590bf](76590bf))
* Add aggregation in OnDemandFeatureView ([#5629](#5629)) ([8715ae8](8715ae8))
* Added codeflare-sdk to requirements ([#5640](#5640)) ([51a0ee6](51a0ee6))
* Added RemoteDatasetProxy that executes Ray Data operations remotely ([7128024](7128024))
* Added support for image search ([#5577](#5577)) ([56c5910](56c5910))
* Enable ingestion without event timestamp ([#5625](#5625)) ([eb51f00](eb51f00))
* Feast dataframe phase1 ([#5611](#5611)) ([2ce4198](2ce4198))
* Feast dataframe phase2 ([#5612](#5612)) ([1d08786](1d08786))
* Feast Namespaces registry for client ConfigMaps availability ([#5599](#5599)) ([728589a](728589a))
* Support hdfs:// uris in to_remote_storage for Spark offline store ([#5635](#5635)) ([5e4b9fd](5e4b9fd))
Development

Successfully merging this pull request may close these issues.

Milvus Online Store Dimension Mismatch Error in Push API and Materialization
Support Image Search

2 participants