Batch get_stat_all calls if more than QUERY_BATCH_SIZE places.#155
Batch get_stat_all calls if more than QUERY_BATCH_SIZE places.#155tjann merged 4 commits intodatacommonsorg:masterfrom
Conversation
datacommons/stat_vars.py
Outdated
| res.update(dict(place_statvar_series)) | ||
|
|
||
| if no_data: | ||
| raise ValueError('No data in responses.') |
There was a problem hiding this comment.
This is interesting - so we throw an error even if one out of many batches fail. Is that the right response (perhaps we can add more info in the error)? We should also throw earlier instead of doing this outside the loop.
There was a problem hiding this comment.
Also thinking about the behavior, shall we just return empty result for the entry with no data in stead of throwing error? For example, if someone get stats all for crime for city, county, state, it seems a bit intrusive to throw error because we don't have data for county?
There was a problem hiding this comment.
I think it's actually only throwing error if placeData was not in any of the responses. no_data is initally set to True, then each time placeData exists (the most top level key of REST response), then no_data is set to False. It is never set to True again. I could optimize it s.t. we only assign False if it is currently True, but the logic is the same. This is also why it is outside the loop--so that as long as one batch has data, we return a response.
I will add an else statement that continues the for loop so that we don't have a key error. Will also add a test for this.
There was a problem hiding this comment.
@shifucun -- I removed the raise ValueError. We may want to come back and come through other cases in the python/REST libs that we don't want to raise errors. Thanks to your suggestion, I realize we don't actually need another test--the current cases cover, since REST is always returning a placeData, even if place DCIDs are bogus.
datacommons/stat_vars.py
Outdated
| # Ceil hack to get # of batches. | ||
| batches = -(-len(places) // utils._QUERY_BATCH_SIZE) | ||
| res = {} | ||
| no_data = True |
There was a problem hiding this comment.
it might be better to use has_data here?
There was a problem hiding this comment.
Let me know if this comment still holds after my response here: #155 (comment)
it would be a logic reversal with no behavioral changes.
There was a problem hiding this comment.
here i actually mean a wording change, use has_data as oppose to no_data for variable naming, just readability nit :)
datacommons/stat_vars.py
Outdated
| res.update(dict(place_statvar_series)) | ||
|
|
||
| if no_data: | ||
| raise ValueError('No data in responses.') |
There was a problem hiding this comment.
Also thinking about the behavior, shall we just return empty result for the entry with no data in stead of throwing error? For example, if someone get stats all for crime for city, county, state, it seems a bit intrusive to throw error because we don't have data for county?
…ome checks to make sure the REST return format is as expected.
tjann
left a comment
There was a problem hiding this comment.
PTAL. This shows the changes since the last review: https://github.com/datacommonsorg/api-python/pull/155/files/d24bf0597c8b3f2c857770ae04b083e75c4899d3..a8fad527090045e073589ec68c2fec26ca964505
datacommons/stat_vars.py
Outdated
| # Ceil hack to get # of batches. | ||
| batches = -(-len(places) // utils._QUERY_BATCH_SIZE) | ||
| res = {} | ||
| no_data = True |
datacommons/stat_vars.py
Outdated
| res.update(dict(place_statvar_series)) | ||
|
|
||
| if no_data: | ||
| raise ValueError('No data in responses.') |
There was a problem hiding this comment.
@shifucun -- I removed the raise ValueError. We may want to come back and come through other cases in the python/REST libs that we don't want to raise errors. Thanks to your suggestion, I realize we don't actually need another test--the current cases cover, since REST is always returning a placeData, even if place DCIDs are bogus.
|
I was on the fence between doing lenient key error handling or throwing errors if REST response didn't follow the expectations. The former might be nice for us to have more time to fix issues when REST changes, but also makes it difficult for us to know when something is wrong. So I opted for the latter in the spirit of failing fast. |
|
The error handling here seems reasonable to me, since those are truly errors that should get looked at (and isn't a matter of a 1 missing data out of 1000 calls). |
|
Thanks all for the review! |
Before code changes:
After code changes:
