Batch get_stat_all calls if more than QUERY_BATCH_SIZE places. by tjann · Pull Request #155 · datacommonsorg/api-python

tjann · 2020-09-08T17:19:49Z

Before code changes:

After code changes:

datacommons/stat_vars.py

beets · 2020-09-10T17:08:46Z

datacommons/stat_vars.py

+        res.update(dict(place_statvar_series))
+
+    if no_data:
+        raise ValueError('No data in responses.')


This is interesting - so we throw an error even if one out of many batches fails. Is that the right response (perhaps we can add more info to the error)? We should also throw earlier instead of doing this outside the loop.

Also, thinking about the behavior: shall we just return an empty result for the entry with no data instead of throwing an error? For example, if someone calls get_stat_all for crime for city, county, and state, it seems a bit intrusive to throw an error because we don't have data for county.

I think it's actually only throwing an error if placeData was not in any of the responses. no_data is initially set to True; each time placeData exists (it is the top-level key of the REST response), no_data is set to False. It is never set to True again. I could optimize it so that we only assign False if it is currently True, but the logic is the same. This is also why the check is outside the loop: as long as one batch has data, we return a response.

I will add an else statement that continues the for loop so that we don't hit a KeyError. I will also add a test for this.

@shifucun -- I removed the raise ValueError. We may want to come back and comb through other cases in the python/REST libs where we don't want to raise errors. Thanks to your suggestion, I realize we don't actually need another test: the current cases cover this, since REST always returns a placeData, even if the place DCIDs are bogus.
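For reference, the flag logic debated in this thread (as it stood before the raise was removed) can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: `fetch_batch`, `get_all_batched`, and the batch size of 3 are hypothetical stand-ins, and the only detail taken from the discussion is that `placeData` is the top-level key of the REST response.

```python
from typing import Callable, Dict, List

# Hypothetical value; stands in for utils._QUERY_BATCH_SIZE.
QUERY_BATCH_SIZE = 3


def get_all_batched(places: List[str],
                    fetch_batch: Callable[[List[str]], Dict]) -> Dict:
    """Merge 'placeData' from every batch; raise only if no batch had any.

    fetch_batch is a hypothetical callable that takes a list of place
    DCIDs and returns the decoded REST response as a dict.
    """
    res = {}
    has_data = False
    for start in range(0, len(places), QUERY_BATCH_SIZE):
        response = fetch_batch(places[start:start + QUERY_BATCH_SIZE])
        if 'placeData' not in response:
            # Skip this batch rather than hitting a KeyError below.
            continue
        has_data = True
        res.update(response['placeData'])
    # Checked after the loop: one batch with data is enough to return.
    if not has_data:
        raise ValueError('No data in responses.')
    return res
```

With this shape, a single empty batch among many does not raise; the error fires only when every batch lacks `placeData`.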

shifucun · 2020-09-10T17:42:40Z

datacommons/stat_vars.py

+    # Ceil hack to get # of batches.
+    batches = -(-len(places) // utils._QUERY_BATCH_SIZE)
+    res = {}
+    no_data = True


it might be better to use has_data here?

Let me know if this comment still holds after my response here: #155 (comment)

it would be a logic reversal with no behavioral changes.

Here I actually mean a wording change: use has_data as opposed to no_data for the variable name. Just a readability nit :)

Done, thanks Bo!
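As a side note on the "ceil hack" in the hunk under discussion: negating before and after floor division yields ceiling division without floats. The sketch below checks the idiom against `math.ceil` and uses it to split a list; `num_batches` and `split_batches` are hypothetical helper names for illustration, not functions from the library.

```python
import math
from typing import List, Sequence


def num_batches(n_items: int, batch_size: int) -> int:
    """Ceiling division using only integer arithmetic."""
    return -(-n_items // batch_size)


def split_batches(items: Sequence, batch_size: int) -> List[list]:
    """Split items into consecutive chunks of at most batch_size."""
    return [
        list(items[i * batch_size:(i + 1) * batch_size])
        for i in range(num_batches(len(items), batch_size))
    ]


# The idiom agrees with math.ceil for small non-negative sizes.
for n in range(0, 50):
    assert num_batches(n, 7) == math.ceil(n / 7)
```

Staying in integers avoids the rounding surprises that `math.ceil(n / size)` can show for very large `n`.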

shifucun · 2020-09-10T17:45:58Z

datacommons/stat_vars.py

+        res.update(dict(place_statvar_series))
+
+    if no_data:
+        raise ValueError('No data in responses.')


Also, thinking about the behavior: shall we just return an empty result for the entry with no data instead of throwing an error? For example, if someone calls get_stat_all for crime for city, county, and state, it seems a bit intrusive to throw an error because we don't have data for county.

tjann

PTAL. This shows the changes since the last review: https://github.com/datacommonsorg/api-python/pull/155/files/d24bf0597c8b3f2c857770ae04b083e75c4899d3..a8fad527090045e073589ec68c2fec26ca964505

datacommons/stat_vars.py

tjann · 2020-09-15T16:57:48Z

datacommons/stat_vars.py

+    # Ceil hack to get # of batches.
+    batches = -(-len(places) // utils._QUERY_BATCH_SIZE)
+    res = {}
+    no_data = True


Done, thanks Bo!

tjann · 2020-09-15T16:58:34Z

datacommons/stat_vars.py

+        res.update(dict(place_statvar_series))
+
+    if no_data:
+        raise ValueError('No data in responses.')


@shifucun -- I removed the raise ValueError. We may want to come back and comb through other cases in the python/REST libs where we don't want to raise errors. Thanks to your suggestion, I realize we don't actually need another test: the current cases cover this, since REST always returns a placeData, even if the place DCIDs are bogus.

tjann · 2020-09-15T17:34:13Z

I was on the fence between handling missing keys leniently and throwing errors when the REST response doesn't match expectations.

The former would give us more time to fix issues when REST changes, but it also makes it harder to notice when something is wrong. So I opted for the latter in the spirit of failing fast.

beets · 2020-09-15T18:04:37Z

The error handling here seems reasonable to me, since those are truly errors that should get looked at (and aren't a matter of one missing data point out of 1000 calls).
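The trade-off discussed above can be made concrete with a small sketch. The exact checks added in this PR are not shown in the thread, so both functions below are hypothetical illustrations of the two options; only the `placeData` key comes from the discussion.

```python
def extract_place_data_strict(response: dict) -> dict:
    """Fail fast: raise as soon as the response shape is unexpected."""
    if 'placeData' not in response:
        raise ValueError('Unexpected response format: missing "placeData".')
    place_data = response['placeData']
    if not isinstance(place_data, dict):
        raise ValueError('Unexpected response format: "placeData" is not a dict.')
    return place_data


def extract_place_data_lenient(response: dict) -> dict:
    """Lenient: silently treat a missing key as 'no data'."""
    return response.get('placeData', {})
```

The lenient version keeps callers running if the REST format changes, but a format change then looks the same as an empty result; the strict version surfaces the change immediately.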

shifucun

thanks for the update!

tjann · 2020-09-15T21:22:15Z

Thanks all for the review!

Batch get_stat_all calls if more than QUERY_BATCH_SIZE places.

91c4424

tjann requested a review from beets September 8, 2020 17:19

Newline at eof.

d24bf05

tjann requested a review from shifucun September 8, 2020 17:22

beets requested changes Sep 10, 2020

shifucun reviewed Sep 10, 2020

tjann added 2 commits September 15, 2020 10:25

Remove the has_data check to just pass through what REST gives. Add some checks to make sure the REST return format is as expected.

d0545c0

Replace mixer reference with REST API reference.

a8fad52

tjann commented Sep 15, 2020

tjann requested review from beets and shifucun September 15, 2020 17:32

beets approved these changes Sep 15, 2020

shifucun approved these changes Sep 15, 2020

tjann merged commit 477977f into datacommonsorg:master Sep 15, 2020

tjann deleted the batch_stats_clean branch September 15, 2020 21:22

tjann mentioned this pull request Sep 16, 2020

Version bump for python and pandas API for get_stat_all batching. #158

Merged

3 participants
