
Conversation

@ntkathole (Member)

What this PR does / why we need it:

This PR implements tiling for computing time-windowed aggregations efficiently: data is pre-aggregated into small, time-bucketed units (tiles) that can be reused across multiple queries. Instead of scanning all raw events for each window, the implementation:

  • Pre-aggregates data into small time buckets (tiles)
  • Computes windowed aggregations by subtracting tiles: window_at_T = tile_at_T - tile_at_(T-window_size)
  • Stores intermediate representations (IRs) for holistic aggregations (avg, std, var) to enable correct merging
  • Integrates tiling into Spark and Ray compute engines for StreamFeatureView

Sawtooth Window Tiling

  • Tile Generation: Events bucketed into hop-sized intervals (e.g., 1-minute tiles)
  • Cumulative Aggregation: Each tile contains cumulative sum/count from start
  • Window Computation: Windowed aggregation = tile_at_T - tile_at_(T-window_size)
  • IR Handling: For holistic aggs, IRs are subtracted separately, then the final value is recomputed (see the sketch below)
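
To make the subtraction rule concrete, here is a minimal Python sketch. The Tile class and window_from_tiles helper are hypothetical names used only for illustration; they are not the API introduced by this PR.

from dataclasses import dataclass


@dataclass
class Tile:
    """Cumulative intermediate representation (IR) up to a tile boundary."""
    sum: float
    count: int

    @property
    def avg(self) -> float:
        # Holistic aggregations (avg, std, var) are recomputed from IRs,
        # never subtracted directly as final values.
        return self.sum / self.count if self.count else 0.0


def window_from_tiles(tile_at_t: Tile, tile_at_t_minus_w: Tile) -> Tile:
    """window_at_T = tile_at_T - tile_at_(T - window_size)."""
    return Tile(
        sum=tile_at_t.sum - tile_at_t_minus_w.sum,
        count=tile_at_t.count - tile_at_t_minus_w.count,
    )


# Cumulative tiles at T - 5min and at T; the 5-minute window is their difference.
older, newer = Tile(sum=100.0, count=10), Tile(sum=160.0, count=14)
window = window_from_tiles(newer, older)
print(window.sum, window.count, window.avg)  # 60.0 4 15.0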

Enable Tiling in StreamFeatureView

from datetime import timedelta

from feast import Aggregation, StreamFeatureView

# `customer` (an Entity) and `kafka_source` (a KafkaSource) are assumed to be defined elsewhere.
sales_features = StreamFeatureView(
    name="sales_features",
    entities=[customer],
    source=kafka_source,
    aggregations=[
        Aggregation(
            column="amount",
            function="avg",
            time_window=timedelta(minutes=5),
        ),
        Aggregation(
            column="amount",
            function="sum",
            time_window=timedelta(minutes=5),
        ),
    ],
    timestamp_field="event_timestamp",
    enable_tiling=True,                      # Enable tiling optimization
    tiling_hop_size=timedelta(minutes=1),    # Generate tiles every 1 minute
)

Architecture

┌─────────────────────────────────────────────────────────────┐
│  StreamFeatureView (enable_tiling=True)                     │
└────────────────────────┬────────────────────────────────────┘
                         │
         ┌───────────────┴───────────────┐
         │                               │
    ┌────▼────┐                    ┌────▼────┐
    │  Spark  │                    │   Ray   │
    │  Node   │                    │  Node   │
    └────┬────┘                    └────┬────┘
         │                               │
         │  Convert to pandas            │
         ▼                               ▼
┌─────────────────────────────────────────────────────────────┐
│  Tiling Logic                                               │
│  ┌──────────────────────────────────────────────────────┐   │
│  │  orchestrator.apply_sawtooth_window_tiling()         │   │
│  │  - Group by entity + hop interval                    │   │
│  │  - Compute cumulative aggregations (tiles)           │   │
│  │  - Output: Cumulative tiles at each hop              │   │
│  └──────────────────────┬───────────────────────────────┘   │
│                         │                                    │
│  ┌──────────────────────▼───────────────────────────────┐   │
│  │  tile_subtraction.convert_cumulative_to_windowed()   │   │
│  │  - Subtract tiles: window = tile_T - tile_(T-W)      │   │
│  │  - Recompute holistic aggs from IRs                  │   │
│  │  - Output: Windowed aggregations                     │   │
│  └──────────────────────┬───────────────────────────────┘   │
│                         │                                    │
│  ┌──────────────────────▼───────────────────────────────┐   │
│  │  tile_subtraction.deduplicate_keep_latest()          │   │
│  │  - Keep latest window per entity                     │   │
│  └──────────────────────────────────────────────────────┘   │
└─────────────────────────┬───────────────────────────────────┘
                          │  Convert back to engine format
                          ▼
                  ┌───────────────┐
                  │ Online Store  │
                  │               │
                  └───────────────┘
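
The diagram names three tiling calls in the new infra/tiling modules. As a rough illustration only, a compute-engine node might chain them on a pandas DataFrame as sketched below; the module and function names come from this PR, but the argument lists and the apply_tiling wrapper are assumptions, not the actual implementation.

import pandas as pd

# Module path as listed in this PR (sdk/python/feast/infra/tiling/); a later review
# comment moves the package under feast/aggregation/tiling.
from feast.infra.tiling import orchestrator, tile_subtraction


def apply_tiling(df: pd.DataFrame, feature_view) -> pd.DataFrame:
    # Argument lists below are assumed for illustration.
    # 1. Group by entity + hop interval and build cumulative tiles.
    tiles = orchestrator.apply_sawtooth_window_tiling(df, feature_view)
    # 2. Subtract tiles: window = tile_T - tile_(T-W); recompute holistic aggs from IRs.
    windowed = tile_subtraction.convert_cumulative_to_windowed(tiles, feature_view)
    # 3. Keep only the latest window per entity before writing to the online store.
    return tile_subtraction.deduplicate_keep_latest(windowed, feature_view)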

Copilot AI (Contributor) left a comment:

Pull Request Overview

This PR implements tiling support for efficient time-windowed aggregations in Feast's streaming feature views. The implementation uses a sawtooth window tiling algorithm with intermediate representations (IRs) to enable correct merging of holistic aggregations (avg, std, var) while providing performance benefits for streaming scenarios.

Key Changes:

  • Adds core tiling logic with IR-based aggregation in new infra/tiling/ module
  • Extends StreamFeatureView with enable_tiling and tiling_hop_size configuration options
  • Integrates tiling into Spark and Ray compute engines with pandas-based processing
  • Updates protobuf definitions and documentation
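
The overview notes that holistic aggregations (avg, std, var) are merged via intermediate representations rather than final values. Below is a minimal sketch of why that works, using a hypothetical (count, sum, sum-of-squares) IR for variance; the actual IR structures defined in infra/tiling/base.py may differ.

from dataclasses import dataclass


@dataclass
class VarIR:
    """Hypothetical variance IR: count, sum, and sum of squares."""
    count: int
    total: float
    total_sq: float

    def __sub__(self, other: "VarIR") -> "VarIR":
        # Subtracting cumulative IRs yields the IR for the window between them.
        return VarIR(self.count - other.count,
                     self.total - other.total,
                     self.total_sq - other.total_sq)

    def variance(self) -> float:
        # Population variance recomputed from the windowed IR: E[x^2] - E[x]^2.
        if self.count == 0:
            return 0.0
        mean = self.total / self.count
        return self.total_sq / self.count - mean * mean


# Cumulative IRs at T and at T - window_size; the window's variance comes from their difference.
ir_t = VarIR(count=14, total=160.0, total_sq=2052.0)
ir_t_minus_w = VarIR(count=10, total=100.0, total_sq=1100.0)
print((ir_t - ir_t_minus_w).variance())  # 13.0 for the 4 in-window events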

Reviewed Changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 23 comments.

Summary per file:

  • sdk/python/feast/infra/tiling/base.py: Defines IR metadata structures for algebraic and holistic aggregations
  • sdk/python/feast/infra/tiling/orchestrator.py: Implements cumulative tile generation using the sawtooth window algorithm
  • sdk/python/feast/infra/tiling/tile_subtraction.py: Converts cumulative tiles to windowed aggregations via tile subtraction
  • sdk/python/feast/infra/tiling/__init__.py: Exports the tiling module's public API
  • sdk/python/feast/stream_feature_view.py: Adds tiling configuration to the StreamFeatureView class
  • protos/feast/core/StreamFeatureView.proto: Adds protobuf fields for tiling configuration
  • sdk/python/feast/protos/feast/core/StreamFeatureView_pb2.py: Generated protobuf Python code for the tiling fields
  • sdk/python/feast/protos/feast/core/StreamFeatureView_pb2.pyi: Generated protobuf type stubs for the tiling fields
  • sdk/python/feast/infra/compute_engines/spark/nodes.py: Implements the tiling execution path for the Spark engine
  • sdk/python/feast/infra/compute_engines/spark/feature_builder.py: Passes tiling config to Spark aggregation nodes
  • sdk/python/feast/infra/compute_engines/ray/nodes.py: Implements the tiling execution path for the Ray engine
  • sdk/python/feast/infra/compute_engines/ray/feature_builder.py: Passes tiling config to Ray aggregation nodes
  • sdk/python/feast/utils.py: Extracts input columns from aggregations for feature views
  • sdk/python/tests/unit/infra/compute_engines/spark/test_nodes.py: Updates a test to pass the required spark_session parameter
  • docs/getting-started/concepts/tiling.md: Comprehensive documentation of tiling concepts and usage
  • docs/getting-started/concepts/stream-feature-view.md: Adds a reference to the tiling documentation
  • docs/getting-started/concepts/README.md: Adds tiling.md to the concepts index


@@ -0,0 +1,27 @@
"""
Tiling for efficient time-windowed aggregations.

Collaborator commented:

Really cool. Is there a way to merge with the Aggregation interface? Or something like:

aggregation/
  • Aggregation -- base aggregation interface
  • tiling/

@ntkathole (Member Author) replied:

Yeah, this makes sense. Changed now to:

feast/
  - aggregation/
    - __init__.py (Aggregation class)
    - tiling/

@ntkathole ntkathole force-pushed the aggregation_tiling branch 2 times, most recently from 7189d22 to 839c7a4 Compare November 21, 2025 15:51
Signed-off-by: ntkathole <nikhilkathole2683@gmail.com>
@franciscojavierarceo (Member) left a comment:

Happy Thanksgiving! 🚀🚀🚀

@franciscojavierarceo franciscojavierarceo merged commit 7a99166 into feast-dev:master Nov 28, 2025
19 checks passed
jfw-ppi pushed a commit to jfw-ppi/feast that referenced this pull request Nov 30, 2025
…t-dev#5724)

Signed-off-by: Jacob Weinhold <29459386+jfw-ppi@users.noreply.github.com>
franciscojavierarceo pushed a commit that referenced this pull request Dec 16, 2025
# [0.58.0](v0.57.0...v0.58.0) (2025-12-16)

### Bug Fixes

* Add java proto ([#5719](#5719)) ([fc3ea20](fc3ea20))
* Add possibility to force full features names for materialize ops ([#5728](#5728)) ([55c9c36](55c9c36))
* Fixed file registry cache sync ([09505d4](09505d4))
* Handle hyphen in sqlite project name ([#5575](#5575)) ([#5749](#5749)) ([b8346ff](b8346ff))
* Pinned substrait to fix protobuf issue ([d0ef4da](d0ef4da))
* Set TLS certificate annotation only on gRPC service ([#5715](#5715)) ([75d13db](75d13db))
* SQLite online store deletes tables from other projects in shared registry scenarios ([#5766](#5766)) ([fabce76](fabce76))
* Validate not existing entity join keys for preventing panic ([0b93559](0b93559))

### Features

* Add annotations for pod templates ([534e647](534e647))
* Add Pytorch template ([#5780](#5780)) ([6afd353](6afd353))
* Add support for extra options for stream source ([#5618](#5618)) ([18956c2](18956c2))
* Added matched_tag field to search API results with fuzzy search capabilities ([#5769](#5769)) ([4a9ffae](4a9ffae))
* Added support for enabling metrics in Feast Operator ([#5317](#5317)) ([#5748](#5748)) ([a8498c2](a8498c2))
* Configure CacheTTLSeconds, CacheMode for file-based registry in Feast Operator ([#5708](#5708)) ([#5744](#5744)) ([f25f83b](f25f83b))
* Implemented Tiling Support for Time-Windowed Aggregations ([#5724](#5724)) ([7a99166](7a99166))
* Offline Store historical features retrieval based on datetime range for spark ([#5720](#5720)) ([27ec8ec](27ec8ec))
* Offline Store historical features retrieval based on datetime range in dask ([#5717](#5717)) ([a16582a](a16582a))
* Production ready feast operator with v1 apiversion ([#5771](#5771)) ([49359c6](49359c6))
* Support for Map value data type ([#5768](#5768)) ([#5772](#5772)) ([b99a8a9](b99a8a9))