@jyejare
Contributor

@jyejare jyejare commented Jul 21, 2025

What this PR does / why we need it:

  • Non-entity retrieval with optional entity_df (starting with the Postgres offline store; other offline stores will follow soon)
    • Allows for providing start and end dates for feature retrieval.
    • Point-in-time LATERAL JOINs for accurate temporal data
    • TTL-based automatic date range calculation if start_date is not given
    • Smart date defaulting (end_date = now())
    • SQL optimization with TTL filtering
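
For readers unfamiliar with point-in-time lookups, here is a minimal plain-Python sketch of the idea behind the LATERAL JOIN and TTL filtering listed above. It is not the SQL this PR generates; `latest_row_as_of` and its arguments are hypothetical names used only for illustration. For a given timestamp it returns the most recent feature value at or before that timestamp, and treats the value as missing when it is older than the TTL.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def latest_row_as_of(rows, as_of, ttl=None):
    """Return the newest value with event_timestamp <= as_of, or None.

    rows: list of (event_timestamp, value) tuples sorted ascending by timestamp.
    ttl:  optional timedelta; values older than as_of - ttl are treated as missing.
    """
    timestamps = [ts for ts, _ in rows]
    # Index of the last row whose timestamp is <= as_of.
    idx = bisect_right(timestamps, as_of) - 1
    if idx < 0:
        return None  # nothing had been observed yet at as_of
    ts, value = rows[idx]
    if ttl is not None and as_of - ts > ttl:
        return None  # latest observation is stale under the TTL
    return value


rows = [
    (datetime(2025, 7, 1, 0), 0.1),
    (datetime(2025, 7, 1, 1), 0.2),
    (datetime(2025, 7, 1, 3), 0.3),
]
value = latest_row_as_of(rows, datetime(2025, 7, 1, 2))
```

Rows after `as_of` are never considered, which is what keeps future data from leaking into training rows.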

Which issue(s) this PR fixes:

Fixes #5474

Misc

All of the following combinations now work out of the box:

Possibility 1 (returns data within the given time range):

training_df = store.get_historical_features(
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
        "driver_hourly_stats:avg_daily_trips",
    ],
    start_date=datetime(2025, 7, 1, 1, 00, 00),
    end_date=datetime(2025, 7, 2, 3, 30, 00),
).to_df()

Possibility 2 (returns data from end_date minus the feature view's TTL up to end_date):

training_df = store.get_historical_features(
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
        "driver_hourly_stats:avg_daily_trips",
    ],
    end_date=datetime(2025, 7, 2, 3, 30, 00),
).to_df()

Possibility 3 (returns data from start_date up to the current time):

training_df = store.get_historical_features(
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
        "driver_hourly_stats:avg_daily_trips",
    ],
    start_date=datetime(2025, 7, 1, 1, 00, 00),
).to_df()

Possibility 4 (returns data from the current time minus the feature view's TTL up to the current time):

training_df = store.get_historical_features(
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
        "driver_hourly_stats:avg_daily_trips",
    ],
).to_df()
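
The four entity-less calls above differ only in which date bounds are supplied. A minimal sketch of how the missing bounds could be filled in, assuming a single TTL value and a hypothetical helper named `resolve_date_range` (the actual implementation in the Postgres offline store may differ, for example in how it handles several feature views or a missing TTL):

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


def resolve_date_range(
    start_date: Optional[datetime],
    end_date: Optional[datetime],
    ttl: Optional[timedelta],
    now: Optional[datetime] = None,
) -> Tuple[datetime, datetime]:
    """Hypothetical helper: fill in whichever bound the caller left out."""
    if end_date is None:
        # Smart date defaulting: end_date = now()
        end_date = now if now is not None else datetime.now(tz=timezone.utc)
    if start_date is None:
        if ttl is None:
            # Assumption for this sketch only: without a TTL there is
            # nothing to derive the start of the window from.
            raise ValueError("start_date is required when no TTL is set")
        # TTL-based range: look back one TTL from the end of the window.
        start_date = end_date - ttl
    return start_date, end_date


now = datetime(2025, 7, 2, 3, 30, tzinfo=timezone.utc)
ttl = timedelta(days=1)
window = resolve_date_range(None, None, ttl, now=now)  # Possibility 4
```

The `now` parameter exists only so the sketch can be exercised with a fixed clock.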

Possibility 5 (the existing entity_df-based approach still works, e.g. for on-demand feature views):

entity_df = pd.DataFrame.from_dict(
    {
        "driver_id": [1005],
        "label_driver_reported_satisfaction": [5],
        "val_to_add": [5],
        "val_to_add_2": [20],
        "event_timestamp": [
            datetime(2025, 6, 29, 23, 00, 00),
        ],
    }
)

training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
        "driver_hourly_stats:avg_daily_trips",
        "transformed_conv_rate:conv_rate_plus_val1",
        "transformed_conv_rate:conv_rate_plus_val2",
    ],
).to_df()

@jyejare jyejare requested review from a team as code owners July 21, 2025 12:19
@jyejare jyejare changed the title Non entity retrieval Offline Store historical features retrieval without entity df, but based on datatime range Jul 21, 2025
@jyejare jyejare changed the title Offline Store historical features retrieval without entity df, but based on datatime range feat: Offline Store historical features retrieval without entity df, but based on datatime range Jul 21, 2025
@jyejare jyejare force-pushed the non_entity_retrieval branch from 9a60999 to d510325 Compare July 22, 2025 08:13
jyejare added 2 commits July 22, 2025 13:44
Signed-off-by: jyejare <jyejare@redhat.com>
Signed-off-by: jyejare <jyejare@redhat.com>
@jyejare jyejare force-pushed the non_entity_retrieval branch from d510325 to 85f2126 Compare July 22, 2025 08:14
Fixed linting and unit tests

Signed-off-by: jyejare <jyejare@redhat.com>
@jyejare jyejare force-pushed the non_entity_retrieval branch 2 times, most recently from cf3d3bd to 723f8ad Compare July 22, 2025 14:48
if entity_df is None:
    # Default to current time if end_date not provided
    if end_date is None:
        end_date = datetime.now(tz=timezone.utc)
Member

from feast.utils import _utc_now
_utc_now()

+1

Contributor Author

TBH I don't like one-line functions that don't save even one line of code :) We should completely avoid them.

Generally this pattern is useful if things change for some reason (e.g., some libraries die). I'm not overly fond of it either, but it was the choice the original authors made, and I tend to prefer to respect the original design.
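
To illustrate the trade-off discussed in this thread, here is a small sketch of why a one-line clock wrapper can still pay off: callers depend on one function, so the time source can be swapped in a single place or replaced in tests. The names `utc_now` and `default_end_date` are hypothetical stand-ins, not the Feast helpers themselves.

```python
from datetime import datetime, timezone
from typing import Callable, Optional


def utc_now() -> datetime:
    """Hypothetical stand-in for a project-wide 'current UTC time' helper."""
    return datetime.now(tz=timezone.utc)


def default_end_date(
    end_date: Optional[datetime] = None,
    now: Callable[[], datetime] = utc_now,
) -> datetime:
    """Return end_date if given, otherwise the current time from the clock helper."""
    return end_date if end_date is not None else now()


# A fixed clock makes the defaulting behaviour deterministic in a test.
fixed = datetime(2025, 7, 2, 3, 30, tzinfo=timezone.utc)
result = default_end_date(now=lambda: fixed)
```

If the underlying time API ever changes, only `utc_now` needs to be touched.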

@ntkathole
Member

check path https://github.com/feast-dev/feast/tree/master/sdk/python/feast/infra/offline_stores

@jyejare
Contributor Author

jyejare commented Jul 24, 2025

@ntkathole I am not covering all offline stores in this PR. This is a targeted PR for @itay1551's request in issue #5474, which uses Postgres. The rollout to all other offline stores is required, but it can happen gradually; we don't need to handle it in this PR, especially since the request from @itay1551 is very urgent.

@jyejare
Contributor Author

jyejare commented Jul 24, 2025

✅ Compute Engine Compatibility Analysis

Regarding @ntkathole's question about compute engine changes - no additional compute engine modifications are needed for this PostgreSQL implementation. Here's why:

🏗️ Architecture Analysis:

The PassthroughProvider routes get_historical_features() calls in two ways:

  • Default route (most common): Directly calls OfflineStore.get_historical_features() ← This PR changes apply here
  • Compute engine route: Only when BatchFeatureView has custom batch_engine config
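
The two routes described above can be pictured with a toy model. This is only a sketch of the decision as stated in this comment, not the PassthroughProvider code; `FeatureViewStub` and `pick_route` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FeatureViewStub:
    """Toy stand-in for a feature view; not a Feast class."""
    name: str
    # Hypothetical field: custom batch engine config, if the view defines one.
    batch_engine: Optional[dict] = None


def pick_route(feature_view: FeatureViewStub) -> str:
    """Return which path would serve the request in this toy model."""
    if feature_view.batch_engine:
        # Compute engine route: only for views with a custom batch_engine config.
        return "compute_engine"
    # Default route: delegate straight to the offline store.
    return "offline_store"


route = pick_route(FeatureViewStub("driver_hourly_stats"))
```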

🎯 Current Support Status:

| Compute Engine | Non-Entity Support | Notes |
| --- | --- | --- |
| Default (no engine) | Full support | Direct offline store delegation |
| LocalComputeEngine | Indirect support | Calls offline_store.pull_all_from_table_or_query() with start_date/end_date |
| SparkComputeEngine | Indirect support | Same delegation pattern via utility functions |
| SnowflakeComputeEngine | ❌ N/A | get_historical_features() raises NotImplementedError |
| LambdaComputeEngine | ❌ N/A | get_historical_features() raises NotImplementedError |

🔧 Key Insight: Indirect Delegation

Local and Spark compute engines actually delegate back to your offline store implementation:

# In compute engines:
retrieval_job = offline_store.pull_all_from_table_or_query(
    start_date=start_time,  # ← Your changes work here!
    end_date=end_time,      # ← Your changes work here!
    # ... other params
)

🚀 Recommendation:

Your PostgreSQL-first approach is architecturally sound. The base interface updates (Provider, OfflineStore abstract classes) ensure compatibility across the ecosystem. When you roll out to other offline stores later, the compute engine layer will automatically inherit the functionality.

No blocking compute engine work required for this PR! 🎉

@jyejare jyejare force-pushed the non_entity_retrieval branch from 723f8ad to 199b632 Compare July 24, 2025 14:33
@jyejare
Contributor Author

jyejare commented Jul 24, 2025

@ntkathole All comments addressed:

  • All Offline Stores: Will be rolled out gradually, outside of this PR. No impact on existing stores.
  • FAQ : Updated
  • Compute Engine changes: Not required, details here.

Member

@ntkathole ntkathole left a comment

LGTM

Member

@franciscojavierarceo franciscojavierarceo left a comment

a couple of small nits that would be good to fix but otherwise this lgtm

@jyejare jyejare force-pushed the non_entity_retrieval branch from 199b632 to 23ca6a0 Compare July 28, 2025 17:52
… FAQ update

Signed-off-by: jyejare <jyejare@redhat.com>
@jyejare jyejare force-pushed the non_entity_retrieval branch from 23ca6a0 to 8ff71c0 Compare July 29, 2025 11:57
@franciscojavierarceo franciscojavierarceo merged commit df942b9 into feast-dev:master Jul 29, 2025
18 checks passed
HaoXuAI pushed a commit that referenced this pull request Aug 11, 2025
…but based on datatime range (#5527)

* Non entity based feature retrieval

Signed-off-by: jyejare <jyejare@redhat.com>

* Point in time joins and TTS based start date

Signed-off-by: jyejare <jyejare@redhat.com>

* Tests added for non empty retrieval , postgres only

Fixed linting and unit tests

Signed-off-by: jyejare <jyejare@redhat.com>

* API, CLI changes for historical features retrieval without entity_df, FAQ update

Signed-off-by: jyejare <jyejare@redhat.com>

---------

Signed-off-by: jyejare <jyejare@redhat.com>
franciscojavierarceo pushed a commit that referenced this pull request Aug 14, 2025
# [0.52.0](v0.51.0...v0.52.0) (2025-08-14)

### Bug Fixes

* Correct entity value type mapping for aliased feature views ([#5492](#5492)) ([bdf20bb](bdf20bb))
* Correct namespace reference in remote Feast project setup for operator upgrade and previous version tests ([df391ec](df391ec))
* dell pydantic v1 ([1189512](1189512))
* Fixed the entity to on-demand feature view relationship ([1c59bba](1c59bba))
* Make transformers optional ([#5544](#5544)) ([a4eef38](a4eef38))
* Push Source inherits the timestamp fields from Data Source ([#5550](#5550)) ([b7ea5cc](b7ea5cc))
* Remove the devcontainer folder. ([a9815c2](a9815c2))

### Features

* Added API for discovering Feature Views by popular tags ([#5558](#5558)) ([2e5f564](2e5f564))
* Added filtering support for featureView and featureServices api ([#5552](#5552)) ([897b3f3](897b3f3))
* Added global search api and necessary unit tests ([#5532](#5532)) ([dd3061f](dd3061f))
* Added Ray Compute Engine and Ray Offline Store Support ([#5526](#5526)) ([72de088](72de088))
* Added recent visit logging api for registry server ([#5545](#5545)) ([2adcf2c](2adcf2c))
* **auth:** support client-credentials & static token for OIDC client auth ([fc44222](fc44222))
* **auth:** support client-credentials & static token for OIDC client auth ([795fc06](795fc06))
* Implement and enhance remote document retrieval functionality ([#5487](#5487)) ([d095b96](d095b96))
* Implemented consistent error handling ([7f10151](7f10151))
* Offline Store historical features retrieval without entity df, but based on datatime range ([#5527](#5527)) ([df942b9](df942b9))
@jyejare jyejare changed the title feat: Offline Store historical features retrieval without entity df, but based on datatime range feat: Entity-less Offline Store historical features retrieval based on datatime range Oct 17, 2025
Development

Successfully merging this pull request may close these issues.

Get all historical features - without specifying particular IDs

3 participants