
Conversation

@thrau (Member) commented Oct 17, 2025

Motivation

While fixing an issue with the way AWS request events are batched in the Pro version, I cleaned up some of the implementation by pulling out a generic batching utility and reusing it across both the GlobalAnalyticsBus and the ProServiceRequestAggregator. I also noticed that the CLI tests would break when importing the batcher because of the pydantic requirement, which turned out to be unnecessary if we just use a regular constructor for Batcher. Basically just cleaning up the implementation and de-duplicating some code, nothing major.

Changes

  • no functional changes, just refactoring
  • generalized the PublisherBuffer concept from the analytics client into an AsyncBatcher utility
  • renamed batch_policy.py to batching.py to be more consistent with our utils naming
  • simplified construction of Batcher and removed the pydantic dependency (see the sketch after this list)
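
A minimal sketch of how the two utilities could be used after this change; the handler, the event payloads, and the Batcher keyword arguments below are illustrative assumptions pieced together from the test names and review examples later in this thread, not the exact API:

from localstack.utils.batching import AsyncBatcher, Batcher  # renamed from batch_policy.py

def publish(batch: list[dict]) -> None:
    # illustrative handler: AsyncBatcher invokes this itself whenever a batch is flushed
    print(f"publishing {len(batch)} events")

# AsyncBatcher flushes on its own once a size or time limit triggers
async_batcher = AsyncBatcher(publish, max_batch_size=20, max_flush_interval=10)
async_batcher.add({"event": "aws_request"})

# Batcher only collects; add() reports when a policy triggers, and the caller flushes
batcher = Batcher(max_count=10)  # plain constructor, no pydantic model
if batcher.add({"event": "aws_request"}):
    publish(batcher.flush())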

@thrau added the labels semver: minor, docs: skip, and notes: skip on Oct 17, 2025

github-actions bot commented Oct 17, 2025

Test Results - Preflight, Unit

22 363 tests ±0   1 suites ±0   1 files ±0
20 613 ✅ ±0   1 750 💤 ±0   0 ❌ ±0
16m 15s ⏱️ +29s

Results for commit b89dd37. ± Comparison against base commit b4c510a.

This pull request removes 14 and adds 14 tests. Note that renamed tests count towards both.
tests.unit.utils.analytics.test_publisher.TestPublisherBuffer ‑ test_basic
tests.unit.utils.analytics.test_publisher.TestPublisherBuffer ‑ test_interval
tests.unit.utils.test_batch_policy.TestBatcher ‑ test_add_multple_items
tests.unit.utils.test_batch_policy.TestBatcher ‑ test_add_single_item
tests.unit.utils.test_batch_policy.TestBatcher ‑ test_deep_copy
tests.unit.utils.test_batch_policy.TestBatcher ‑ test_flush
tests.unit.utils.test_batch_policy.TestBatcher ‑ test_max_count_limit
tests.unit.utils.test_batch_policy.TestBatcher ‑ test_max_count_partial_flush
tests.unit.utils.test_batch_policy.TestBatcher ‑ test_max_window_limit
tests.unit.utils.test_batch_policy.TestBatcher ‑ test_multiple_policies
…
tests.unit.utils.test_batching.TestAsyncBatcher ‑ test_basic
tests.unit.utils.test_batching.TestAsyncBatcher ‑ test_interval
tests.unit.utils.test_batching.TestBatcher ‑ test_add_multiple_items
tests.unit.utils.test_batching.TestBatcher ‑ test_add_single_item
tests.unit.utils.test_batching.TestBatcher ‑ test_deep_copy
tests.unit.utils.test_batching.TestBatcher ‑ test_flush
tests.unit.utils.test_batching.TestBatcher ‑ test_max_count_limit
tests.unit.utils.test_batching.TestBatcher ‑ test_max_count_partial_flush
tests.unit.utils.test_batching.TestBatcher ‑ test_max_window_limit
tests.unit.utils.test_batching.TestBatcher ‑ test_multiple_policies
…


@thrau force-pushed the refactor-analytics-batching branch from a9fd893 to 3037d97 on October 17, 2025 13:07

github-actions bot commented Oct 17, 2025

Test Results (amd64) - Acceptance

7 tests ±0   1 suites ±0   1 files ±0
5 ✅ ±0   2 💤 ±0   0 ❌ ±0
3m 19s ⏱️ -3s

Results for commit b89dd37. ± Comparison against base commit b4c510a.


github-actions bot commented Oct 17, 2025

Test Results (amd64) - Integration, Bootstrap

5 files   5 suites   2h 39m 23s ⏱️
5 252 tests: 4 737 ✅   515 💤   0 ❌
5 258 runs: 4 737 ✅   521 💤   0 ❌

Results for commit b89dd37.


github-actions bot commented Oct 17, 2025

LocalStack Community integration with Pro

2 files ±0   2 suites ±0   2h 5m 33s ⏱️ +3m 55s
4 878 tests ±0: 4 523 ✅ ±0   355 💤 ±0   0 ❌ ±0
4 880 runs ±0: 4 523 ✅ ±0   357 💤 ±0   0 ❌ ±0

Results for commit b89dd37. ± Comparison against base commit b4c510a.


@thrau force-pushed the refactor-analytics-batching branch from 3037d97 to 080a8b4 on October 20, 2025 16:12

@gregfurman (Contributor) left a comment:

Thanks for the addition and refactor here!

Just some questions regarding usage of the utility, suggestions for how we could integrate the current Batcher into the AsyncBatcher, and some additional context around the initial implementation.

    return list(filter(lambda _: condition, items))


def iter_chunks(items: list[_E], chunk_size: int) -> Generator[list[_E], None, None]:

@gregfurman (Contributor) commented:

Does this not have identical behaviour to itertools.batched?

(sidenote: now that we're on Python 3.13 we can finally use this utility which makes me so happy 🤓)
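
A quick illustration of the comparison being made here (itertools.batched was added in Python 3.12 and yields tuples, while iter_chunks is annotated to yield lists, so the two are close but not byte-for-byte identical); the sample values are illustrative:

from itertools import batched

list(batched([1, 2, 3, 4, 5], 2))        # [(1, 2), (3, 4), (5,)] -- tuples
# list(iter_chunks([1, 2, 3, 4, 5], 2))  # [[1, 2], [3, 4], [5]]  -- lists, per its annotation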

@thrau (Member, Author) commented Oct 21, 2025:

cool, i did not know this existed. i added this before 3.13 :-) i think it's worth keeping until the only thing we support is 3.13+ or until we use this code exclusively for the runtime

    max_batch_size: int
    handler: BatchHandler[T]

    _buffer: list[T]

@gregfurman (Contributor) commented:
Could we not use the original Batcher utility to collect the batches instead of re-implementing the logic? There are quite a few edge cases that the original batcher handles that could be useful here IMO.

i.e.:

    batcher: Batcher[T]

    def add(self, item: T):
        with self._flush_lock:
            if self._closed:
                raise ValueError("Batcher is stopped, can no longer add items")

            # This call triggers the batcher's policy, meaning we should flush.
            if self.batcher.add(item):
                self._flush_lock.notify_all()

@thrau (Member, Author) replied:

my main reason was to not entangle the two implementations. the batching logic is ultimately relatively simple and this way it's clearer when reading the code what exactly happens.

        self._flush_lock = threading.Condition()
        self._closed = False

    def add(self, item: T):

@gregfurman (Contributor) commented Oct 21, 2025:

suggestion: I think it'd be useful to know whether a call has triggered a batch, or perhaps some signal indicating that a batch has been sent. Else, how would I know whether the function has been called?

vals = []
buffer = AsyncBatcher(lambda x: vals.append(x), max_batch_size=2, max_flush_interval=1000)

for data in stream_generator():
    if not (is_triggered := buffer.add(data)):
        continue

    # The BatchHandler should have been called
    do_stuff(vals)

@thrau (Member, Author) replied:

some signal indicating that a batch has been sent

the invocation of the handler is the signal. this is fundamentally the difference from Batcher: AsyncBatcher calls the handler for you.

in your case, i would just pass do_stuff as a handler, as it seems that's what you want. no need to collect values first.

AsyncBatcher(do_stuff, max_batch_size=2, max_flush_interval=1000)

# we can call the processor outside the lock so we can continue adding items into the next batch without
# waiting on the processor to return.
try:
    self.handler(batch)

@gregfurman (Contributor) commented:

If the handler fails, that means we lose the entire batch of data. Is this desirable?

@thrau (Member, Author) commented Oct 21, 2025:

i suppose an exception handler would be a nice mechanism to give more flexibility on what should happen with batches that fail. this is not so easy to generalize in a way that is flexible, and would introduce a lot of additional complexity that there's currently no use case for. the assumption therefore is currently that fault tolerance is the handler's responsibility.
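
To make that contract concrete, a minimal sketch of a handler that takes care of its own fault tolerance; the logger, dead-letter list, and publish_events call are illustrative assumptions, not part of the PR:

import logging

logger = logging.getLogger(__name__)
failed_batches: list[list[dict]] = []  # hypothetical dead-letter store

def resilient_handler(batch: list[dict]) -> None:
    # handler passed to AsyncBatcher: it catches its own exceptions, so a
    # failed flush is logged and kept for a later retry instead of being lost
    try:
        publish_events(batch)  # hypothetical downstream call
    except Exception:
        logger.exception("failed to publish batch of %d events", len(batch))
        failed_batches.append(batch)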

@gregfurman (Contributor) left a comment:

LGTM!

I'm going to see if I can use this batcher for some of our ESM pollers. Since it's quite a complicated use case, it'll aid some of our discussions on usage of the new utility 🙂


Labels

docs: skip (Pull request does not require documentation changes)
notes: skip (Pull request does not have to be mentioned in the release notes)
semver: minor (Non-breaking changes which can be included in minor releases, but not in patch releases)
