
Experiment with merge policies #2124

Description

@blp

Some possible merge policies:

  • If we always merge the smallest batches in a slot, then, with new batches of size k continually arriving, we will pile up batches of size 2k in that slot: the fresh size-k batches are always the smallest, so the 2k merge results are never chosen for merging.
  • If we always merge the largest batches, that's inefficient: merge cost grows with the sizes of the inputs, so we'd repeatedly rewrite the biggest batches.
  • We currently always merge the batches most recently added. The merge result (if it goes to the same slot) becomes the most recently added batch. So, if we keep getting new batches of size k, we'll end up doing merges of kn + k => k(n+1) until the result overfills the slot. Not ideal either.
  • Another policy would be to merge the two least recently added batches. Then we'll do k+k=>2k, ..., k+k=>2k, 2k+2k=>4k, ..., 2k+2k=>4k, 4k+4k=>8k, ... and so on. It could also be good for GC to work with older batches (as you observed). That might be a good policy. (It could be bad for cache locality, since we're working with the oldest data.)
  • Another variation would be to merge the least recently added batch with the other batch closest in size. I don't have an intuition about this.
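The doubling behavior of the "merge the two least recently added batches" policy is easy to check with a toy model. The sketch below (hypothetical names, not the actual DBSP trace code) tracks only batch sizes in a queue: new batches and merge results are pushed to the back, so popping from the front always picks the two least recently added. Merging eight size-k batches down to one reproduces the sequence from the bullet above: four k+k=>2k merges, then two 2k+2k=>4k, then 4k+4k=>8k.

```rust
use std::collections::VecDeque;

/// Toy model of one slot: batches are represented only by their sizes,
/// ordered from least recently added (front) to most recently added (back).
/// Repeatedly merges the two least recently added batches until one remains,
/// returning the sequence of (input, input) merge pairs.
fn merge_down(k: u64, n: usize) -> Vec<(u64, u64)> {
    let mut slot: VecDeque<u64> = (0..n).map(|_| k).collect();
    let mut merges = Vec::new();
    while slot.len() > 1 {
        let a = slot.pop_front().unwrap(); // least recently added
        let b = slot.pop_front().unwrap(); // second least recently added
        merges.push((a, b));
        slot.push_back(a + b); // merge result becomes the most recently added
    }
    merges
}

fn main() {
    // With k = 1 and 8 initial batches:
    // 1+1 => 2 (four times), 2+2 => 4 (twice), 4+4 => 8.
    for (a, b) in merge_down(1, 8) {
        println!("{}+{} => {}", a, b, a + b);
    }
}
```

The key detail is that the merge result gets a *new* (most recent) position rather than inheriting the age of its inputs; that is what makes the policy pair up equal-sized batches and produce power-of-two growth instead of the kn + k pattern of the current most-recently-added policy.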

Originally posted by @blp in #2115 (comment)

Metadata

Labels

  • DBSP core — Related to the core DBSP library
  • RFC — Request for Comments
  • storage — Persistence for internal state in DBSP operators
