53 changes: 53 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1 +1,54 @@
# Changelog

## 1.0.1

**Date** - 10/2/2019

**Release Tag** - [v1.0.1](https://github.com/datacommonsorg/api-python/releases/tag/v1.0.1)

**Release Status** - Current head of branch [`stable-1.x`](https://github.com/datacommonsorg/api-python/tree/stable-1.x)

New features added to the Python Client API

- Added two new functions `get_pop_obs` and `get_place_obs`
- SPARQL query is now supported as a function `query` instead of a class.
- Added documentation on how to provision an API key and provide it to the API

Bugs fixed in new release

- Fixed various typos and formatting issues in the documentation.
- Fixed index misalignment in functions such as `get_populations` and `get_observations`. When the `pandas.Series` passed in had a non-contiguous index, the `pandas.Series` returned by the function carried a different index, so assigning the result back did not align the returned values with the right rows. The returned series now reuses the index of the given series.
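
The index alignment issue can be reproduced with pandas alone. The sketch below is illustrative only: `lookup_default_index` and `lookup_same_index` are hypothetical stand-ins for the old and fixed return paths, not functions in the library, and the upper-casing merely takes the place of a real API lookup.

```python
import pandas as pd

# A frame whose index is not contiguous, e.g. after rows were filtered out.
frame = pd.DataFrame(
    {"dcid": ["geoId/05", "geoId/06", "geoId/21"]}, index=[0, 2, 5])


def lookup_default_index(dcids):
    """Hypothetical stand-in for the old behaviour: result gets a fresh 0..n-1 index."""
    return pd.Series([d.upper() for d in dcids])


def lookup_same_index(dcids):
    """Hypothetical stand-in for the fixed behaviour: result reuses the caller's index."""
    return pd.Series([d.upper() for d in dcids], index=dcids.index)


# Column assignment aligns on index labels, not on position.
frame["old"] = lookup_default_index(frame["dcid"])
frame["new"] = lookup_same_index(frame["dcid"])
print(frame)
```

With the default index, the row labelled `2` receives the value computed for the third dcid and the row labelled `5` receives `NaN`. Passing `index=dcids.index` keeps each value on the row it was computed for.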

## 1.0.0

**Date** - 8/9/2019

**Release Tag** - [v1.0.0](https://github.com/datacommonsorg/api-python/releases/tag/v1.0.0)

New release of the Python Client API.

- New functions in the API built on top of the [Data Commons REST API](https://github.com/datacommonsorg/mixer).
- `get_property_labels`
- `get_property_values`
- `get_triples`
- `get_populations`
- `get_observations`
- `get_places_in`
- New tests and examples checked into `datacommons/test` and `datacommons/examples`
- Full documentation released on [readthedocs](https://datacommons.readthedocs.io/en/latest/)

## 0.4.3

**Date** - 8/13/2019

**Release Tag** - [v0.4.3](https://github.com/datacommonsorg/api-python/releases/tag/v0.4.3)

**Release Status** - Latest on [PyPI](https://pypi.org/project/datacommons/). Current head of branch [`stable-0.x`](https://github.com/datacommonsorg/api-python/tree/stable-0.x).

Patch release that fixes bugs in `datacommons.Client`.

- Functions `get_cities` and `get_states` now provide `typeOf` constraints in their datalog queries.

## 0.x

Initial release of the Data Commons API.
1 change: 0 additions & 1 deletion Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,6 @@ RUN apt-get -q update && \

# Install python
RUN python setup.py -q install
RUN pip3 install --upgrade requests

# Run the tests
RUN ./build.sh
4 changes: 2 additions & 2 deletions datacommons/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -13,12 +13,12 @@
# limitations under the License.

# Data Commons SPARQL query support
from datacommons.query import Query
from datacommons.query import query

# Data Commons Python Client API
from datacommons.core import get_property_labels, get_property_values, get_triples
from datacommons.places import get_places_in
from datacommons.populations import get_populations, get_observations
from datacommons.populations import get_populations, get_observations, get_pop_obs, get_place_obs

# Other utilities
from .utils import set_api_key, clean_frame, flatten_frame
2 changes: 1 addition & 1 deletion datacommons/core.py
Original file line number Diff line number Diff line change
Expand Up @@ -207,7 +207,7 @@ def get_property_values(dcids,

# Format the results as a Series if a Pandas Series is provided.
if isinstance(dcids, pd.Series):
return pd.Series([results[dcid] for dcid in dcids])
return pd.Series([results[dcid] for dcid in dcids], index=dcids.index)
return results


Expand Down
6 changes: 6 additions & 0 deletions datacommons/examples/populations.py
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@

import datacommons as dc
import pandas as pd
import pprint

import datacommons.utils as utils

Expand Down Expand Up @@ -84,5 +85,10 @@ def main():
print(pd_frame)


# Get all population and observation data of Mountain View.
utils._print_header('Get Mountain View population and observation')
popobs = dc.get_pop_obs("geoId/0649670")
pprint.pprint(popobs)

if __name__ == '__main__':
main()
7 changes: 4 additions & 3 deletions datacommons/places.py
Original file line number Diff line number Diff line change
Expand Up @@ -29,7 +29,8 @@


def get_places_in(dcids, place_type):
""" Returns :obj:`Place`'s contained in :code:`dcids` of type `place_type`.
""" Returns :obj:`Place`s contained in :code:`dcids` of type
:code:`place_type`.

Args:
dcids (Union[:obj:`list` of :obj:`str`, :obj:`pandas.Series`]): Dcids to get
Expand All @@ -55,7 +56,7 @@ def get_places_in(dcids, place_type):
Examples:
We would like to get all Counties contained in
`California <https://browser.datacommons.org/kg?dcid=geoId/06>`_. Specifying
the :code:`dcids` as a :obj:`list` resulst in the following.
the :code:`dcids` as a :obj:`list` results in the following.

>>> get_places_in(["geoId/06"], "County")
{
Expand Down Expand Up @@ -90,5 +91,5 @@ def get_places_in(dcids, place_type):
# Create the results and format it appropriately
result = utils._format_expand_payload(payload, 'place', must_exist=dcids)
if isinstance(dcids, pd.Series):
return pd.Series([result[dcid] for dcid in dcids])
return pd.Series([result[dcid] for dcid in dcids], index=dcids.index)
return result
212 changes: 210 additions & 2 deletions datacommons/populations.py
Original file line number Diff line number Diff line change
Expand Up @@ -115,7 +115,7 @@ def get_populations(dcids, population_type, constraining_properties={}):
payload, 'population', must_exist=dcids)
if isinstance(dcids, pd.Series):
flattened = utils._flatten_results(result, default_value="")
return pd.Series([flattened[dcid] for dcid in dcids])
return pd.Series([flattened[dcid] for dcid in dcids], index=dcids.index)

# Drop empty results while flattening
return utils._flatten_results(result)
Expand Down Expand Up @@ -223,7 +223,7 @@ def get_observations(dcids,
payload, 'observation', must_exist=dcids)
if isinstance(dcids, pd.Series):
flattened = utils._flatten_results(result, default_value="")
series = pd.Series([flattened[dcid] for dcid in dcids])
series = pd.Series([flattened[dcid] for dcid in dcids], index=dcids.index)
return series.apply(pd.to_numeric, errors='coerce')

# Drop empty results by calling _flatten_results without default_value, then
Expand All @@ -235,3 +235,211 @@ def get_observations(dcids,
except ValueError:
typed_results[k] = v
return typed_results


def get_pop_obs(dcid):
""" Returns all :obj:`StatisticalPopulation` and :obj:`Observation` \
of a :obj:`Thing`.

Args:
dcid (:obj:`str`): Dcid of the thing.

Returns:
A :obj:`dict` of :obj:`StatisticalPopulation` and :obj:`Observation` that
are associated to the thing identified by the given :code:`dcid`. The given
dcid is linked to the returned :obj:`StatisticalPopulation`,
which are the :obj:`observedNode` of the returned :obj:`Observation`.
See example below for more detail about how the returned :obj:`dict` is
structured.

Raises:
ValueError: If the payload returned by the Data Commons REST API is
malformed.

Examples:
We would like to get all :obj:`StatisticalPopulation` and
:obj:`Observations` of
`Santa Clara <https://browser.datacommons.org/kg?dcid=geoId/06085>`_.

>>> get_pop_obs("geoId/06085")
{
'name': 'Santa Clara',
'placeType': 'County',
'populations': {
'dc/p/zzlmxxtp1el87': {
'popType': 'Household',
'numConstraints': 3,
'propertyValues': {
'householderAge': 'Years45To64',
'householderRace': 'USC_AsianAlone',
'income': 'USDollar35000To39999'
},
'observations': [
{
'marginOfError': 274,
'measuredProp': 'count',
'measuredValue': 1352,
'measurementMethod': 'CensusACS5yrSurvey',
'observationDate': '2017'
},
{
'marginOfError': 226,
'measuredProp': 'count',
'measuredValue': 1388,
'measurementMethod': 'CensusACS5yrSurvey',
'observationDate': '2013'
}
],
},
},
'observations': [
{
'meanValue': 4.1583,
'measuredProp': 'particulateMatter25',
'measurementMethod': 'CDCHealthTracking',
'observationDate': '2014-04-04',
'observedNode': 'geoId/06085'
},
{
'meanValue': 9.4461,
'measuredProp': 'particulateMatter25',
'measurementMethod': 'CDCHealthTracking',
'observationDate': '2014-03-20',
'observedNode': 'geoId/06085'
}
]
}

Notice that the return value is a multi-level :obj:`dict`. The top level
contains the following keys.

- :code:`name` and :code:`placeType` provide the name and type of the
:obj:`Place` identified by the given :code:`dcid`.
- :code:`populations` maps to a :obj:`dict` containing all
:obj:`StatisticalPopulation` that have the given :code:`dcid` as their
:obj:`location`.
- :code:`observations` maps to a :obj:`list` containing all
:obj:`Observation` that have the given :code:`dcid` as their
:obj:`observedNode`.

The :code:`populations` dictionary is keyed by the dcid of each
:obj:`StatisticalPopulation`. The mapped dictionary contains the following
keys.

- :code:`popType` which gives the population type of the
:obj:`StatisticalPopulation` identified by the key.
- :code:`numConstraints` which gives the number of constraining properties
defined for the identified :obj:`StatisticalPopulation`.
- :code:`propertyValues` which gives a :obj:`dict` mapping a constraining
property to its value for the identified :obj:`StatisticalPopulation`.
- :code:`observations` which gives a list of all :obj:`Observation`'s that
have the identified :obj:`StatisticalPopulation` as their
:obj:`observedNode`.

Each :obj:`Observation` is represented by a :code:`dict` that has the keys:

- :code:`measuredProp`: The property measured by the :obj:`Observation`.
- :code:`observationDate`: The date when the :obj:`Observation` was made.
- :code:`observationPeriod` (optional): The period over which the
:obj:`Observation` was made.
- :code:`measurementMethod` (optional): A field providing additional
information on how the :obj:`Observation` was collected.
- Additional fields that denote values measured by the :obj:`Observation`.
These may include the following: :code:`measuredValue`, :code:`meanValue`,
:code:`medianValue`, :code:`maxValue`, :code:`minValue`, :code:`sumValue`,
:code:`marginOfError`, :code:`stdError`, :code:`meanStdError`, and others.
"""
url = utils._API_ROOT + utils._API_ENDPOINTS['get_pop_obs'] + '?dcid={}'.format(dcid)
return utils._send_request(url, compress=True, post=False)

def get_place_obs(place_type, observation_date, population_type, constraining_properties={}):
""" Returns all :obj:`Observation`'s for all places given the place type,
observation date and the :obj:`StatisticalPopulation` constraints.

Args:
place_type (:obj:`str`): The type of places to query
:obj:`StatisticalPopulation`'s and :obj:`Observation`'s for.
observation_date (:obj:`str`): The observation date in ISO-8601 format.
population_type (:obj:`str`): The population type of the
:obj:`StatisticalPopulation`
constraining_properties (:obj:`map` from :obj:`str` to :obj:`str`, optional):
A map from constraining property to the value that the
:obj:`StatisticalPopulation` should be constrained by.

Returns:
A list of dictionaries, with each dictionary containing *all*
:obj:`Observation`'s of a place that conform to the :obj:`StatisticalPopulation`
constraints. See examples for more details on how the format of the
return value is structured.

Raises:
ValueError: If the payload is malformed.

Examples:
We would like to get all :obj:`StatisticalPopulation` and
:obj:`Observations` for all places of type :obj:`City` in year 2017 where
the populations have a population type of :obj:`Person` and are specified by the
following constraining properties.

- Persons should have `age <https://browser.datacommons.org/kg?dcid=age>`_
with value `Years5To17 <https://browser.datacommons.org/kg?dcid=Years5To17>`_
- Persons should have `placeOfBirth <https://browser.datacommons.org/kg?dcid=placeOfBirth>`_
with value BornInOtherStateInTheUnitedStates.

>>> props = {
... 'age': 'Years5To17',
... 'placeOfBirth': 'BornInOtherStateInTheUnitedStates'
... }
>>> get_place_obs('City', '2017', 'Person', constraining_properties=props)
[
{
'name': 'Marcus Hook borough',
'place': 'geoId/4247344',
'populations': {
'dc/p/pq6frs32sfvk': {
'observations': [
{
'marginOfError': 39,
'measuredProp': 'count',
'measuredValue': 67,
'type': 'Observation'
},
# More observations...
],
}
}
},
# Entries for more cities...
]

The value returned by :code:`get_place_obs` is a :obj:`list` of
:obj:`dict`'s. Each dictionary corresponds to a :obj:`StatisticalPopulation`
matching the given :code:`population_type` and
:code:`constraining_properties` for a single place of the given
:code:`place_type`. The dictionary contains the following keys.

- :code:`name`: The name of the place being described.
- :code:`place`: The dcid associated with the place being described.
- :code:`populations`: A :obj:`dict` mapping :code:`StatisticalPopulation`
dcids to a :obj:`dict` with a list of :code:`observations`.

Each :obj:`Observation` is represented by a :obj:`dict` with the following
keys.

- :code:`measuredProp`: The property measured by the :obj:`Observation`.
- :code:`measurementMethod` (optional): A field identifying how the
:obj:`Observation` was made.
- Additional fields that denote values measured by the :obj:`Observation`.
These may include the following: :code:`measuredValue`, :code:`meanValue`,
:code:`medianValue`, :code:`maxValue`, :code:`minValue`, :code:`sumValue`,
:code:`marginOfError`, :code:`stdError`, :code:`meanStdError`, and others.
"""
# Create the json payload and send it to the REST API.
pv = [{'property': k, 'value': v} for k, v in constraining_properties.items()]
url = utils._API_ROOT + utils._API_ENDPOINTS['get_place_obs']
payload = utils._send_request(url, req_json={
'place_type': place_type,
'observation_date': observation_date,
'population_type': population_type,
'pvs': pv,
}, compress=True)
return payload['places']
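
The nested list returned by `get_place_obs` can be awkward to work with directly. As a minimal sketch, assuming the return value has exactly the shape shown in the docstring example, the hypothetical helper `flatten_place_obs` (not part of this change) turns it into one flat row per observation. The sample data is copied from the docstring so the block runs without calling the API.

```python
def flatten_place_obs(places):
    """Hypothetical helper: one flat dict per observation in a get_place_obs result."""
    rows = []
    for place in places:
        # Each place maps population dcids to a dict holding an observations list.
        for pop_dcid, pop in place.get("populations", {}).items():
            for obs in pop.get("observations", []):
                row = {
                    "place": place["place"],
                    "name": place["name"],
                    "population": pop_dcid,
                }
                row.update(obs)  # measuredProp, measuredValue, marginOfError, ...
                rows.append(row)
    return rows


# Sample entry taken from the get_place_obs docstring example.
sample = [{
    "name": "Marcus Hook borough",
    "place": "geoId/4247344",
    "populations": {
        "dc/p/pq6frs32sfvk": {
            "observations": [{
                "marginOfError": 39,
                "measuredProp": "count",
                "measuredValue": 67,
                "type": "Observation",
            }],
        },
    },
}]

rows = flatten_place_obs(sample)
print(rows)
```

The resulting list of flat dictionaries can be passed to `pandas.DataFrame` if a tabular view is preferred.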