Skip to content

Batch get_stat_all calls if more than QUERY_BATCH_SIZE places.#155

Merged
tjann merged 4 commits intodatacommonsorg:masterfrom
tjann:batch_stats_clean
Sep 15, 2020
Merged

Batch get_stat_all calls if more than QUERY_BATCH_SIZE places.#155
tjann merged 4 commits intodatacommonsorg:masterfrom
tjann:batch_stats_clean

Conversation

@tjann
Copy link
Contributor

@tjann tjann commented Sep 8, 2020

Before code changes:

Screen Shot 2020-09-08 at 10 03 14 AM

After code changes:
Screen Shot 2020-09-08 at 10 04 51 AM

@tjann tjann requested a review from beets September 8, 2020 17:19
@tjann tjann requested a review from shifucun September 8, 2020 17:22
res.update(dict(place_statvar_series))

if no_data:
raise ValueError('No data in responses.')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is interesting - so we throw an error even if one out of many batches fail. Is that the right response (perhaps we can add more info in the error)? We should also throw earlier instead of doing this outside the loop.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also thinking about the behavior, shall we just return empty result for the entry with no data in stead of throwing error? For example, if someone get stats all for crime for city, county, state, it seems a bit intrusive to throw error because we don't have data for county?

Copy link
Contributor Author

@tjann tjann Sep 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's actually only throwing error if placeData was not in any of the responses. no_data is initally set to True, then each time placeData exists (the most top level key of REST response), then no_data is set to False. It is never set to True again. I could optimize it s.t. we only assign False if it is currently True, but the logic is the same. This is also why it is outside the loop--so that as long as one batch has data, we return a response.

I will add an else statement that continues the for loop so that we don't have a key error. Will also add a test for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shifucun -- I removed the raise ValueError. We may want to come back and come through other cases in the python/REST libs that we don't want to raise errors. Thanks to your suggestion, I realize we don't actually need another test--the current cases cover, since REST is always returning a placeData, even if place DCIDs are bogus.

# Ceil hack to get # of batches.
batches = -(-len(places) // utils._QUERY_BATCH_SIZE)
res = {}
no_data = True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it might be better to use has_data here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know if this comment still holds after my response here: #155 (comment)

it would be a logic reversal with no behavioral changes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here i actually mean a wording change, use has_data as oppose to no_data for variable naming, just readability nit :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks Bo!

res.update(dict(place_statvar_series))

if no_data:
raise ValueError('No data in responses.')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also thinking about the behavior, shall we just return empty result for the entry with no data in stead of throwing error? For example, if someone get stats all for crime for city, county, state, it seems a bit intrusive to throw error because we don't have data for county?

Copy link
Contributor Author

@tjann tjann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

# Ceil hack to get # of batches.
batches = -(-len(places) // utils._QUERY_BATCH_SIZE)
res = {}
no_data = True
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks Bo!

res.update(dict(place_statvar_series))

if no_data:
raise ValueError('No data in responses.')
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shifucun -- I removed the raise ValueError. We may want to come back and come through other cases in the python/REST libs that we don't want to raise errors. Thanks to your suggestion, I realize we don't actually need another test--the current cases cover, since REST is always returning a placeData, even if place DCIDs are bogus.

@tjann tjann requested review from beets and shifucun September 15, 2020 17:32
@tjann
Copy link
Contributor Author

tjann commented Sep 15, 2020

I was on the fence between doing lenient key error handling or throwing errors if REST response didn't follow the expectations.

The former might be nice for us to have more time to fix issues when REST changes, but also makes it difficult for us to know when something is wrong. So I opted for the latter in the spirit of failing fast.

@beets
Copy link
Contributor

beets commented Sep 15, 2020

The error handling here seems reasonable to me, since those are truly errors that should get looked at (and isn't a matter of a 1 missing data out of 1000 calls).

Copy link
Contributor

@shifucun shifucun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the update!

@tjann
Copy link
Contributor Author

tjann commented Sep 15, 2020

Thanks all for the review!

@tjann tjann merged commit 477977f into datacommonsorg:master Sep 15, 2020
@tjann tjann deleted the batch_stats_clean branch September 15, 2020 21:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants