@franciscojavierarceo franciscojavierarceo commented Mar 1, 2025

What this PR does / why we need it:

This PR introduces enhancements to the RAG (Retrieval-Augmented Generation) demo, adds support for PDF transformation with Docling, and updates various components for better feature handling and testing.

  • examples/rag-docling/*

    • Updated the quickstart guide to include Docling and Milvus usage for transforming PDFs and storing/retrieving embeddings.
    • Added a new notebook (docling-demo.ipynb) demonstrating the use of Docling for text extraction from PDFs.
    • Added a new notebook (docling-quickstart.ipynb) showing how to use Feast to ingest and retrieve text data from the online store.
    • Introduced new functions for embedding text and generating chunk IDs.
    • Added support for Docling and Milvus in feature definitions and transformations.
    • Enhanced feature view configurations for PDF handling.
    • Updated the configuration to use Milvus as the online store with vector search capabilities.
  • sdk/python/feast/feature_store.py

    • Enhanced _get_feature_view_and_df_for_online_write to handle feature view transformations differently for singleton and non-singleton cases.
  • sdk/python/tests/unit/online_store/test_online_retrieval.py

    • Uncommented a previously skipped test for local Milvus to ensure it runs in CI.
  • sdk/python/tests/unit/online_store/test_online_writes.py

    • Added a new test class TestOnlineWritesWithTransform to validate PDF handling and feature transformations during online writes.
  • sdk/python/tests/unit/test_on_demand_python_transformation.py

    • Updated the test for docling_transform_docs to handle multiple inputs and validate the transformation logic.
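
The description mentions new functions for generating chunk IDs but does not show them. As a rough illustration only, deterministic chunk IDs could be derived as sketched below. Every name here (`generate_chunk_id`, `split_into_chunks`, `build_chunk_rows`, and the `document_id`, `chunk_id`, `chunk_text` fields) is hypothetical and not taken from the PR, and the word-based splitter is a simple stand-in for a real document chunker; the sketch does not call Docling, Feast, or Milvus.

```python
import hashlib
from typing import Dict, List


def generate_chunk_id(file_name: str, raw_chunk: str) -> str:
    """Derive a deterministic ID from the file name and the chunk text.

    Hypothetical helper: hashing the content means re-ingesting the same
    file yields the same IDs, so repeated writes overwrite rather than
    duplicate rows.
    """
    digest = hashlib.sha256(f"{file_name}-{raw_chunk}".encode("utf-8")).hexdigest()
    return f"chunk-{digest}"


def split_into_chunks(text: str, max_words: int = 50) -> List[str]:
    """Naive fixed-size word chunker (a stand-in for a real document chunker)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def build_chunk_rows(file_name: str, text: str, max_words: int = 50) -> List[Dict[str, str]]:
    """Build one row per chunk, ready to be embedded and written to a store."""
    return [
        {
            "document_id": file_name,
            "chunk_id": generate_chunk_id(file_name, chunk),
            "chunk_text": chunk,
        }
        for chunk in split_into_chunks(text, max_words)
    ]


rows = build_chunk_rows("paper.pdf", "a b c d e", max_words=2)
```

Under this scheme, two chunks with identical text in the same file would share an ID; appending the chunk index to the hashed string is one way to avoid that if it matters.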

Which issue(s) this PR fixes:

N/A

Misc:

N/A

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
@franciscojavierarceo franciscojavierarceo marked this pull request as ready for review April 2, 2025 03:40
@franciscojavierarceo franciscojavierarceo requested a review from a team as a code owner April 2, 2025 03:40
@franciscojavierarceo franciscojavierarceo enabled auto-merge (squash) April 2, 2025 15:15
@franciscojavierarceo franciscojavierarceo merged commit 569404b into master Apr 2, 2025
46 of 47 checks passed
franciscojavierarceo pushed a commit that referenced this pull request Apr 7, 2025
# [0.48.0](v0.47.0...v0.48.0) (2025-04-07)

### Bug Fixes

* Enhance integration logos display and styling in the UI ([#5221](#5221)) ([5799257](5799257))
* Fix space typo in push.md docs ([#5184](#5184)) ([81677b2](81677b2))
* Fixed integration tests for qdrant and milvus ([#5224](#5224)) ([d6b080d](d6b080d))
* Formatting trino ([760ec0e](760ec0e))
* Multiple fixes in retrieval of online documents ([#5168](#5168)) ([66ddd3e](66ddd3e))
* Operator route creation for Feast UI in OpenShift ([e3946b4](e3946b4))
* Remove entity_rows parameter from retrieve_online_documents_v2 call ([#5225](#5225)) ([2a2e304](2a2e304))
* Styling ([#5222](#5222)) ([34c393c](34c393c))
* typo in the chart ([bd3448b](bd3448b))
* Update milvus-quickstart and feature_store.yaml with correct Milvus Config ([#5200](#5200)) ([306acca](306acca))
* Update Qdrant online store paths in repo_config.py ([#5207](#5207)) ([ab35b0b](ab35b0b)), closes [#5206](#5206)
* Update the doc ([#5194](#5194)) ([726464e](726464e))
* Updated the operator-rabc example to test RBAC from a Kubernetes pod ([#5147](#5147)) ([d23a1a5](d23a1a5))

### Features

* add `real`(float32) type for trino offline store ([#4749](#4749)) ([0947f96](0947f96))
* Add async DynamoDB timeout and retry configuration ([#5178](#5178)) ([2f3bcf5](2f3bcf5))
* Add CronJob capability to the Operator (feast apply & materialize-incremental) ([#5217](#5217)) ([285c0dc](285c0dc))
* Add RAG tutorial and Use Cases documentation ([#5226](#5226)) ([99f4004](99f4004))
* Added CLI for features, get historical and online features ([#5197](#5197)) ([4ab9f74](4ab9f74))
* Added export support in feast UI ([#5198](#5198)) ([b079553](b079553))
* Added global registry search support in Feast UI ([#5195](#5195)) ([f09ea49](f09ea49))
* Added UI for Features list ([#5192](#5192)) ([cc7fd47](cc7fd47))
* Adding blog on RAG with Milvus ([#5161](#5161)) ([b9e2e6c](b9e2e6c))
* Adding Docling RAG demo ([#5109](#5109)) ([569404b](569404b))
* Allow transformations on writes to output list of entities ([#5209](#5209)) ([955521a](955521a))
* Cache get_any_feature_view results ([#5175](#5175)) ([924b8a3](924b8a3))
* Clickhouse offline store ([#4725](#4725)) ([86794c2](86794c2))
* Enable keyword search for Milvus ([#5199](#5199)) ([ac44967](ac44967))
* Enable transformations on PDFs ([#5172](#5172)) ([3674971](3674971))
* Enable users to use Entity Query as CTE during historical retrieval ([#5202](#5202)) ([fe69eaf](fe69eaf))
* helm support more deployment config ([d575372](d575372))
* Improved CLI file structuring ([#5201](#5201)) ([972ed34](972ed34))
* Kickoff Transformation implementationtransformation code base ([#5181](#5181)) ([0083303](0083303))
* Make keep-alive timeout configurable for async DynamoDB connections ([#5167](#5167)) ([7f3e528](7f3e528))
* Operator mounts the odh-trusted-ca-bundle configmap when deployed on RHOAI or ODH ([d4d7b0d](d4d7b0d))
* Spark Transformation ([#5185](#5185)) ([be3d85c](be3d85c))
jfw-ppi pushed a commit to jfw-ppi/feast that referenced this pull request Jun 7, 2025
* feat: Adding Docling RAG demo

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>

* updated demo

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>

* cleaned up notebook

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>

* adding chunk id

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>

* adding quickstart demo that is WIP and updating docling-demo to export unique chunk-id

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>

* adding current tentative example repo

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>

* adding current temporary work

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>

* updating demo script to rename things

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>

* updated quickstart

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>

* added comment

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>

* checking in progress

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>

* checking in progress for now, still have some issues with vector retrieval

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>

* okay think i have most things working

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>

* removing commenting and unnecessary code

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>

* uploading demo

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>

* uploading other files

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>

* updated repo example

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>

* checking in current notebook, almost there

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>

* fixed linter

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>

* fixed transformation logic:

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>

* removed print

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>

* added README with description

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>

* removing print

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>

* updating

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>

* updating metadata file

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>

* updated readme and adding dataset

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: Jacob Weinhold <29459386+j-wine@users.noreply.github.com>
jfw-ppi pushed a commit to jfw-ppi/feast that referenced this pull request Jun 7, 2025
Signed-off-by: Jacob Weinhold <29459386+j-wine@users.noreply.github.com>