Skip to content

Conversation

@franciscojavierarceo
Copy link
Member

@franciscojavierarceo franciscojavierarceo commented Mar 21, 2025

What this PR does / why we need it:

This PR updates feature transformations, online store handling, and support for PDFs during transformation.

  • sdk/python/feast/infra/online_stores/milvus_online_store/milvus.py

    • Updated online_read to include a new condition for updating the feature_name_feast_primitive_type_map based on the table.schema.
    • Added support for string_val in the list of value types.
    • Modified error message to correctly reflect the unsupported value type and feature.
  • sdk/python/feast/feature_store.py

    • Enhanced _get_feature_view_and_df_for_online_write to handle feature view transformations differently for singleton and non-singleton cases.
    • Improved handling of transformed data and input data frame creation.
  • sdk/python/feast/on_demand_feature_view.py

    • Added support for PDF_BYTES in the random input construction.
  • sdk/python/feast/transformation/python_transformation.py

    • Improved type inference for nested lists and singleton feature views.
  • sdk/python/feast/types.py

    • Introduced PDF_BYTES to the primitive feast types and updated related mappings and enums.
  • sdk/python/feast/value_type.py

    • Added PDF_BYTES to the ValueType enum.
  • sdk/python/tests/unit/online_store/test_online_retrieval.py

    • Added a new test test_milvus_stored_writes_with_explode to validate storing and retrieving exploded document embeddings with Milvus online store.
    • Renamed test_milvus_lite_get_online_documents_v2 to test_milvus_lite_retrieve_online_documents_v2.
  • sdk/python/tests/unit/test_on_demand_python_transformation.py

    • Added platform and version-specific skip condition for tests.
    • Introduced new tests for PDF handling and document transformation with chunking and embedding.

Which issue(s) this PR fixes:

#5173

Misc:

N/A

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
…t unique chunk-id

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
…ieval

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
@franciscojavierarceo franciscojavierarceo changed the title Enable pdf transforms feat: Enable transformations on PDFs Mar 21, 2025
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
@franciscojavierarceo franciscojavierarceo marked this pull request as ready for review March 21, 2025 02:54
@franciscojavierarceo franciscojavierarceo requested a review from a team as a code owner March 21, 2025 02:54
) -> dict[str, Union[list[Any], Any]]:
rand_dict_value: dict[ValueType, Union[list[Any], Any]] = {
ValueType.BYTES: [str.encode("hello world")],
ValueType.PDF_BYTES: [
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it necessary to have a new type, maybe just use BYTES?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, running type inference on raw bytes will fail when using a transformation expecting PDF content.

Copy link
Collaborator

@HaoXuAI HaoXuAI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good, just not sure if PDF_BYTES is a primitive value or not.

@franciscojavierarceo
Copy link
Member Author

We need it for validation of ODFV. Using raw bytes fails when you test with PDF otherwise.

@HaoXuAI
Copy link
Collaborator

HaoXuAI commented Mar 21, 2025

We need it for validation of ODFV. Using raw bytes fails when you test with PDF otherwise.

sounds good

@franciscojavierarceo franciscojavierarceo merged commit 3674971 into master Mar 21, 2025
36 of 37 checks passed
franciscojavierarceo pushed a commit that referenced this pull request Apr 7, 2025
# [0.48.0](v0.47.0...v0.48.0) (2025-04-07)

### Bug Fixes

* Enhance integration logos display and styling in the UI ([#5221](#5221)) ([5799257](5799257))
* Fix space typo in push.md docs ([#5184](#5184)) ([81677b2](81677b2))
* Fixed integration tests for qdrant and milvus ([#5224](#5224)) ([d6b080d](d6b080d))
* Formatting trino ([760ec0e](760ec0e))
* Multiple fixes in retrieval of online documents ([#5168](#5168)) ([66ddd3e](66ddd3e))
* Operator route creation for Feast UI in OpenShift ([e3946b4](e3946b4))
* Remove entity_rows parameter from retrieve_online_documents_v2 call ([#5225](#5225)) ([2a2e304](2a2e304))
* Styling ([#5222](#5222)) ([34c393c](34c393c))
* typo in the chart ([bd3448b](bd3448b))
* Update milvus-quickstart and feature_store.yaml with correct Milvus Config ([#5200](#5200)) ([306acca](306acca))
* Update Qdrant online store paths in repo_config.py ([#5207](#5207)) ([ab35b0b](ab35b0b)), closes [#5206](#5206)
* Update the doc ([#5194](#5194)) ([726464e](726464e))
* Updated the operator-rabc example to test RBAC from a Kubernete pod ([#5147](#5147)) ([d23a1a5](d23a1a5))

### Features

* add `real`(float32) type for trino offline store ([#4749](#4749)) ([0947f96](0947f96))
* Add async DynamoDB timeout and retry configuration ([#5178](#5178)) ([2f3bcf5](2f3bcf5))
* Add CronJob capability to the Operator (feast apply & materialize-incremental) ([#5217](#5217)) ([285c0dc](285c0dc))
* Add RAG tutorial and Use Cases documentation ([#5226](#5226)) ([99f4004](99f4004))
* Added CLI for features, get historical and online features ([#5197](#5197)) ([4ab9f74](4ab9f74))
* Added export support in feast UI ([#5198](#5198)) ([b079553](b079553))
* Added global registry search support in Feast UI ([#5195](#5195)) ([f09ea49](f09ea49))
* Added UI for Features list ([#5192](#5192)) ([cc7fd47](cc7fd47))
* Adding blog on RAG with Milvus ([#5161](#5161)) ([b9e2e6c](b9e2e6c))
* Adding Docling RAG demo ([#5109](#5109)) ([569404b](569404b))
* Allow transformations on writes to output list of entities ([#5209](#5209)) ([955521a](955521a))
* Cache get_any_feature_view results ([#5175](#5175)) ([924b8a3](924b8a3))
* Clickhouse offline store ([#4725](#4725)) ([86794c2](86794c2))
* Enable keyword search for Milvus ([#5199](#5199)) ([ac44967](ac44967))
* Enable transformations on PDFs ([#5172](#5172)) ([3674971](3674971))
* Enable users to use Entity Query as CTE during historical retrieval ([#5202](#5202)) ([fe69eaf](fe69eaf))
* helm support more deployment config ([d575372](d575372))
* Improved CLI file structuring ([#5201](#5201)) ([972ed34](972ed34))
* Kickoff Transformation implementationtransformation code base ([#5181](#5181)) ([0083303](0083303))
* Make keep-alive timeout configurable for async DynamoDB connections ([#5167](#5167)) ([7f3e528](7f3e528))
* Operator mounts the odh-trusted-ca-bundle configmap when deployed on RHOAI or ODH ([d4d7b0d](d4d7b0d))
* Spark Transformation ([#5185](#5185)) ([be3d85c](be3d85c))
jfw-ppi pushed a commit to jfw-ppi/feast that referenced this pull request Jun 7, 2025
Signed-off-by: Jacob Weinhold <29459386+j-wine@users.noreply.github.com>
jfw-ppi pushed a commit to jfw-ppi/feast that referenced this pull request Jun 7, 2025
# [0.48.0](feast-dev/feast@v0.47.0...v0.48.0) (2025-04-07)

### Bug Fixes

* Enhance integration logos display and styling in the UI ([feast-dev#5221](feast-dev#5221)) ([5799257](feast-dev@5799257))
* Fix space typo in push.md docs ([feast-dev#5184](feast-dev#5184)) ([81677b2](feast-dev@81677b2))
* Fixed integration tests for qdrant and milvus ([feast-dev#5224](feast-dev#5224)) ([d6b080d](feast-dev@d6b080d))
* Formatting trino ([760ec0e](feast-dev@760ec0e))
* Multiple fixes in retrieval of online documents ([feast-dev#5168](feast-dev#5168)) ([66ddd3e](feast-dev@66ddd3e))
* Operator route creation for Feast UI in OpenShift ([e3946b4](feast-dev@e3946b4))
* Remove entity_rows parameter from retrieve_online_documents_v2 call ([feast-dev#5225](feast-dev#5225)) ([2a2e304](feast-dev@2a2e304))
* Styling ([feast-dev#5222](feast-dev#5222)) ([34c393c](feast-dev@34c393c))
* typo in the chart ([bd3448b](feast-dev@bd3448b))
* Update milvus-quickstart and feature_store.yaml with correct Milvus Config ([feast-dev#5200](feast-dev#5200)) ([306acca](feast-dev@306acca))
* Update Qdrant online store paths in repo_config.py ([feast-dev#5207](feast-dev#5207)) ([ab35b0b](feast-dev@ab35b0b)), closes [feast-dev#5206](feast-dev#5206)
* Update the doc ([feast-dev#5194](feast-dev#5194)) ([726464e](feast-dev@726464e))
* Updated the operator-rabc example to test RBAC from a Kubernete pod ([feast-dev#5147](feast-dev#5147)) ([d23a1a5](feast-dev@d23a1a5))

### Features

* add `real`(float32) type for trino offline store ([feast-dev#4749](feast-dev#4749)) ([0947f96](feast-dev@0947f96))
* Add async DynamoDB timeout and retry configuration ([feast-dev#5178](feast-dev#5178)) ([2f3bcf5](feast-dev@2f3bcf5))
* Add CronJob capability to the Operator (feast apply & materialize-incremental) ([feast-dev#5217](feast-dev#5217)) ([285c0dc](feast-dev@285c0dc))
* Add RAG tutorial and Use Cases documentation ([feast-dev#5226](feast-dev#5226)) ([99f4004](feast-dev@99f4004))
* Added CLI for features, get historical and online features ([feast-dev#5197](feast-dev#5197)) ([4ab9f74](feast-dev@4ab9f74))
* Added export support in feast UI ([feast-dev#5198](feast-dev#5198)) ([b079553](feast-dev@b079553))
* Added global registry search support in Feast UI ([feast-dev#5195](feast-dev#5195)) ([f09ea49](feast-dev@f09ea49))
* Added UI for Features list ([feast-dev#5192](feast-dev#5192)) ([cc7fd47](feast-dev@cc7fd47))
* Adding blog on RAG with Milvus ([feast-dev#5161](feast-dev#5161)) ([b9e2e6c](feast-dev@b9e2e6c))
* Adding Docling RAG demo ([feast-dev#5109](feast-dev#5109)) ([569404b](feast-dev@569404b))
* Allow transformations on writes to output list of entities ([feast-dev#5209](feast-dev#5209)) ([955521a](feast-dev@955521a))
* Cache get_any_feature_view results ([feast-dev#5175](feast-dev#5175)) ([924b8a3](feast-dev@924b8a3))
* Clickhouse offline store ([feast-dev#4725](feast-dev#4725)) ([86794c2](feast-dev@86794c2))
* Enable keyword search for Milvus ([feast-dev#5199](feast-dev#5199)) ([ac44967](feast-dev@ac44967))
* Enable transformations on PDFs ([feast-dev#5172](feast-dev#5172)) ([3674971](feast-dev@3674971))
* Enable users to use Entity Query as CTE during historical retrieval ([feast-dev#5202](feast-dev#5202)) ([fe69eaf](feast-dev@fe69eaf))
* helm support more deployment config ([d575372](feast-dev@d575372))
* Improved CLI file structuring ([feast-dev#5201](feast-dev#5201)) ([972ed34](feast-dev@972ed34))
* Kickoff Transformation implementationtransformation code base ([feast-dev#5181](feast-dev#5181)) ([0083303](feast-dev@0083303))
* Make keep-alive timeout configurable for async DynamoDB connections ([feast-dev#5167](feast-dev#5167)) ([7f3e528](feast-dev@7f3e528))
* Operator mounts the odh-trusted-ca-bundle configmap when deployed on RHOAI or ODH ([d4d7b0d](feast-dev@d4d7b0d))
* Spark Transformation ([feast-dev#5185](feast-dev#5185)) ([be3d85c](feast-dev@be3d85c))

Signed-off-by: Jacob Weinhold <29459386+j-wine@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants