-
Notifications
You must be signed in to change notification settings - Fork 1.2k
feat: Enable transformations on PDFs #5172
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
…t unique chunk-id Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
…ieval Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
| ) -> dict[str, Union[list[Any], Any]]: | ||
| rand_dict_value: dict[ValueType, Union[list[Any], Any]] = { | ||
| ValueType.BYTES: [str.encode("hello world")], | ||
| ValueType.PDF_BYTES: [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it necessary to have a new type, maybe just use BYTES?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, running type inference on raw bytes will fail when using a transformation expecting PDF content.
HaoXuAI
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good, just not sure if PDF_BYTES is a primitive value or not.
|
We need it for validation of ODFV. Using raw bytes fails when you test with PDF otherwise. |
sounds good |
# [0.48.0](v0.47.0...v0.48.0) (2025-04-07) ### Bug Fixes * Enhance integration logos display and styling in the UI ([#5221](#5221)) ([5799257](5799257)) * Fix space typo in push.md docs ([#5184](#5184)) ([81677b2](81677b2)) * Fixed integration tests for qdrant and milvus ([#5224](#5224)) ([d6b080d](d6b080d)) * Formatting trino ([760ec0e](760ec0e)) * Multiple fixes in retrieval of online documents ([#5168](#5168)) ([66ddd3e](66ddd3e)) * Operator route creation for Feast UI in OpenShift ([e3946b4](e3946b4)) * Remove entity_rows parameter from retrieve_online_documents_v2 call ([#5225](#5225)) ([2a2e304](2a2e304)) * Styling ([#5222](#5222)) ([34c393c](34c393c)) * typo in the chart ([bd3448b](bd3448b)) * Update milvus-quickstart and feature_store.yaml with correct Milvus Config ([#5200](#5200)) ([306acca](306acca)) * Update Qdrant online store paths in repo_config.py ([#5207](#5207)) ([ab35b0b](ab35b0b)), closes [#5206](#5206) * Update the doc ([#5194](#5194)) ([726464e](726464e)) * Updated the operator-rabc example to test RBAC from a Kubernete pod ([#5147](#5147)) ([d23a1a5](d23a1a5)) ### Features * add `real`(float32) type for trino offline store ([#4749](#4749)) ([0947f96](0947f96)) * Add async DynamoDB timeout and retry configuration ([#5178](#5178)) ([2f3bcf5](2f3bcf5)) * Add CronJob capability to the Operator (feast apply & materialize-incremental) ([#5217](#5217)) ([285c0dc](285c0dc)) * Add RAG tutorial and Use Cases documentation ([#5226](#5226)) ([99f4004](99f4004)) * Added CLI for features, get historical and online features ([#5197](#5197)) ([4ab9f74](4ab9f74)) * Added export support in feast UI ([#5198](#5198)) ([b079553](b079553)) * Added global registry search support in Feast UI ([#5195](#5195)) ([f09ea49](f09ea49)) * Added UI for Features list ([#5192](#5192)) ([cc7fd47](cc7fd47)) * Adding blog on RAG with Milvus ([#5161](#5161)) ([b9e2e6c](b9e2e6c)) * Adding Docling RAG demo ([#5109](#5109)) ([569404b](569404b)) * Allow transformations on writes to output list of entities ([#5209](#5209)) ([955521a](955521a)) * Cache get_any_feature_view results ([#5175](#5175)) ([924b8a3](924b8a3)) * Clickhouse offline store ([#4725](#4725)) ([86794c2](86794c2)) * Enable keyword search for Milvus ([#5199](#5199)) ([ac44967](ac44967)) * Enable transformations on PDFs ([#5172](#5172)) ([3674971](3674971)) * Enable users to use Entity Query as CTE during historical retrieval ([#5202](#5202)) ([fe69eaf](fe69eaf)) * helm support more deployment config ([d575372](d575372)) * Improved CLI file structuring ([#5201](#5201)) ([972ed34](972ed34)) * Kickoff Transformation implementationtransformation code base ([#5181](#5181)) ([0083303](0083303)) * Make keep-alive timeout configurable for async DynamoDB connections ([#5167](#5167)) ([7f3e528](7f3e528)) * Operator mounts the odh-trusted-ca-bundle configmap when deployed on RHOAI or ODH ([d4d7b0d](d4d7b0d)) * Spark Transformation ([#5185](#5185)) ([be3d85c](be3d85c))
Signed-off-by: Jacob Weinhold <29459386+j-wine@users.noreply.github.com>
# [0.48.0](feast-dev/feast@v0.47.0...v0.48.0) (2025-04-07) ### Bug Fixes * Enhance integration logos display and styling in the UI ([feast-dev#5221](feast-dev#5221)) ([5799257](feast-dev@5799257)) * Fix space typo in push.md docs ([feast-dev#5184](feast-dev#5184)) ([81677b2](feast-dev@81677b2)) * Fixed integration tests for qdrant and milvus ([feast-dev#5224](feast-dev#5224)) ([d6b080d](feast-dev@d6b080d)) * Formatting trino ([760ec0e](feast-dev@760ec0e)) * Multiple fixes in retrieval of online documents ([feast-dev#5168](feast-dev#5168)) ([66ddd3e](feast-dev@66ddd3e)) * Operator route creation for Feast UI in OpenShift ([e3946b4](feast-dev@e3946b4)) * Remove entity_rows parameter from retrieve_online_documents_v2 call ([feast-dev#5225](feast-dev#5225)) ([2a2e304](feast-dev@2a2e304)) * Styling ([feast-dev#5222](feast-dev#5222)) ([34c393c](feast-dev@34c393c)) * typo in the chart ([bd3448b](feast-dev@bd3448b)) * Update milvus-quickstart and feature_store.yaml with correct Milvus Config ([feast-dev#5200](feast-dev#5200)) ([306acca](feast-dev@306acca)) * Update Qdrant online store paths in repo_config.py ([feast-dev#5207](feast-dev#5207)) ([ab35b0b](feast-dev@ab35b0b)), closes [feast-dev#5206](feast-dev#5206) * Update the doc ([feast-dev#5194](feast-dev#5194)) ([726464e](feast-dev@726464e)) * Updated the operator-rabc example to test RBAC from a Kubernete pod ([feast-dev#5147](feast-dev#5147)) ([d23a1a5](feast-dev@d23a1a5)) ### Features * add `real`(float32) type for trino offline store ([feast-dev#4749](feast-dev#4749)) ([0947f96](feast-dev@0947f96)) * Add async DynamoDB timeout and retry configuration ([feast-dev#5178](feast-dev#5178)) ([2f3bcf5](feast-dev@2f3bcf5)) * Add CronJob capability to the Operator (feast apply & materialize-incremental) ([feast-dev#5217](feast-dev#5217)) ([285c0dc](feast-dev@285c0dc)) * Add RAG tutorial and Use Cases documentation ([feast-dev#5226](feast-dev#5226)) ([99f4004](feast-dev@99f4004)) * Added CLI for features, get historical and online features ([feast-dev#5197](feast-dev#5197)) ([4ab9f74](feast-dev@4ab9f74)) * Added export support in feast UI ([feast-dev#5198](feast-dev#5198)) ([b079553](feast-dev@b079553)) * Added global registry search support in Feast UI ([feast-dev#5195](feast-dev#5195)) ([f09ea49](feast-dev@f09ea49)) * Added UI for Features list ([feast-dev#5192](feast-dev#5192)) ([cc7fd47](feast-dev@cc7fd47)) * Adding blog on RAG with Milvus ([feast-dev#5161](feast-dev#5161)) ([b9e2e6c](feast-dev@b9e2e6c)) * Adding Docling RAG demo ([feast-dev#5109](feast-dev#5109)) ([569404b](feast-dev@569404b)) * Allow transformations on writes to output list of entities ([feast-dev#5209](feast-dev#5209)) ([955521a](feast-dev@955521a)) * Cache get_any_feature_view results ([feast-dev#5175](feast-dev#5175)) ([924b8a3](feast-dev@924b8a3)) * Clickhouse offline store ([feast-dev#4725](feast-dev#4725)) ([86794c2](feast-dev@86794c2)) * Enable keyword search for Milvus ([feast-dev#5199](feast-dev#5199)) ([ac44967](feast-dev@ac44967)) * Enable transformations on PDFs ([feast-dev#5172](feast-dev#5172)) ([3674971](feast-dev@3674971)) * Enable users to use Entity Query as CTE during historical retrieval ([feast-dev#5202](feast-dev#5202)) ([fe69eaf](feast-dev@fe69eaf)) * helm support more deployment config ([d575372](feast-dev@d575372)) * Improved CLI file structuring ([feast-dev#5201](feast-dev#5201)) ([972ed34](feast-dev@972ed34)) * Kickoff Transformation implementationtransformation code base ([feast-dev#5181](feast-dev#5181)) ([0083303](feast-dev@0083303)) * Make keep-alive timeout configurable for async DynamoDB connections ([feast-dev#5167](feast-dev#5167)) ([7f3e528](feast-dev@7f3e528)) * Operator mounts the odh-trusted-ca-bundle configmap when deployed on RHOAI or ODH ([d4d7b0d](feast-dev@d4d7b0d)) * Spark Transformation ([feast-dev#5185](feast-dev#5185)) ([be3d85c](feast-dev@be3d85c)) Signed-off-by: Jacob Weinhold <29459386+j-wine@users.noreply.github.com>
What this PR does / why we need it:
This PR updates feature transformations, online store handling, and support for PDFs during transformation.
sdk/python/feast/infra/online_stores/milvus_online_store/milvus.pyonline_readto include a new condition for updating thefeature_name_feast_primitive_type_mapbased on thetable.schema.string_valin the list of value types.sdk/python/feast/feature_store.py_get_feature_view_and_df_for_online_writeto handle feature view transformations differently for singleton and non-singleton cases.sdk/python/feast/on_demand_feature_view.pyPDF_BYTESin the random input construction.sdk/python/feast/transformation/python_transformation.pysdk/python/feast/types.pyPDF_BYTESto the primitive feast types and updated related mappings and enums.sdk/python/feast/value_type.pyPDF_BYTESto theValueTypeenum.sdk/python/tests/unit/online_store/test_online_retrieval.pytest_milvus_stored_writes_with_explodeto validate storing and retrieving exploded document embeddings with Milvus online store.test_milvus_lite_get_online_documents_v2totest_milvus_lite_retrieve_online_documents_v2.sdk/python/tests/unit/test_on_demand_python_transformation.pyWhich issue(s) this PR fixes:
#5173
Misc:
N/A