python json.loads to pandas dataframe

Question

I have a URL that returns JSON data as follows:

{
    u 'fields': [{
            u 'keyField': False,
            u 'name': u '_blockid',
            u 'fieldType': u 'long'
        }, {
            u 'keyField': False,
            u 'name': u '_collector',
            u 'fieldType': u 'string'
        }, {
            u 'keyField': False,
            u 'name': u '_collectorid',
            u 'fieldType': u 'long'
        }, {
            u 'keyField': False,
            u 'name': u '_messageid',
            u 'fieldType': u 'long'
        }
    ],
    u 'messages': [{
            u 'map': {
                u '_messageid': u '-9223368783568280026',
                u '_collectorid': u '135927517',
                u '_blockid': u '-9223372036519990555',
                u '_collector': u 'collector1',
            }
        }, {
            u 'map': {
                u '_messageid': u '-92233645345280026',
                u '_collectorid': u '13545342517',
                u '_blockid': u '-92234254242343219990555',
                u '_collector': u 'collector2',
            }
        }
    ]
}

That's a snippet. The real JSON contains thousands of values under ['messages']['map']

I have a script that runs as follows

rJSON = requests.get(JsonURL, auth=(username, password))
DATA = json.loads(rJSON.text)
for x in DATA[u'messages']:
    print type(x[u'map'])
    for i in x[u'map']:
        print np.isscalar(x[u'map'][i])

    df = pd.DataFrame.from_dict(x[u'map'])
    break ### TESTING ###

This outputs the following

<type 'dict'>
True
True
True
True
True
True
True
True
True
True
True
True
True
True
True

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-151-1b71c28d4d83> in <module>()
     11     for i in x[u'map']:
     12         print np.isscalar(q[i])
---> 13     df = pd.DataFrame.from_dict(x[u'map'])
     14 
     15     #if isinstance(msgData, pd.DataFrame): # If the variable is a dataframe, append to it...

C:\Users\USERID\AppData\Local\Continuum\Anaconda2\lib\site-packages\pandas\core\frame.pyc in from_dict(cls, data, orient, dtype)
    849             raise ValueError('only recognize index or columns for orient')
    850 
--> 851         return cls(data, index=index, columns=columns, dtype=dtype)
    852 
    853     def to_dict(self, orient='dict'):

C:\Users\USERID\AppData\Local\Continuum\Anaconda2\lib\site-packages\pandas\core\frame.pyc in __init__(self, data, index, columns, dtype, copy)
    273                                  dtype=dtype, copy=copy)
    274         elif isinstance(data, dict):
--> 275             mgr = self._init_dict(data, index, columns, dtype=dtype)
    276         elif isinstance(data, ma.MaskedArray):
    277             import numpy.ma.mrecords as mrecords

C:\Users\USERID\AppData\Local\Continuum\Anaconda2\lib\site-packages\pandas\core\frame.pyc in _init_dict(self, data, index, columns, dtype)
    409             arrays = [data[k] for k in keys]
    410 
--> 411         return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
    412 
    413     def _init_ndarray(self, values, index, columns, dtype=None, copy=False):

C:\Users\USERID\AppData\Local\Continuum\Anaconda2\lib\site-packages\pandas\core\frame.pyc in _arrays_to_mgr(arrays, arr_names, index, columns, dtype)
   5494     # figure out the index, if necessary
   5495     if index is None:
-> 5496         index = extract_index(arrays)
   5497     else:
   5498         index = _ensure_index(index)

C:\Users\USERID\AppData\Local\Continuum\Anaconda2\lib\site-packages\pandas\core\frame.pyc in extract_index(data)
   5533 
   5534         if not indexes and not raw_lengths:
-> 5535             raise ValueError('If using all scalar values, you must pass'
   5536                              ' an index')
   5537 

ValueError: If using all scalar values, you must pass an index

I understand it's mad because the dictionary contains scalar values, but I can't figure out why they are being loaded into the dictionary by json.loads() as a scalar, or how to convert them from scalar to strings.

My end goal is to take all of the ['messages']['map'] data and pd.concat them in the loop into 1 large dataframe that I can analyze.

Is it possible to stop json.loads from loading them as scalars? Or is there a way to convert them from scalars to something else that can be loaded into a data frame?

Try the orient='index' parameter ?

ako
– ako

2017-09-26 00:05:34 +00:00
Commented Sep 26, 2017 at 0:05 — ako
– ako, Commented Sep 26, 2017 at 0:05

akuiper · Accepted Answer · 2017-09-26 00:47:12Z

2

The messages in the data is a list of dictionaries, you can load it with DataFrame.from_records and then use apply(pd.Series) to convert the inner dictionaries to rows of the final data frame:

pd.DataFrame.from_records(data['messages']).map.apply(pd.Series)

#                   _blockid  _collector _collectorid            _messageid
#0      -9223372036519990555  collector1    135927517  -9223368783568280026
#1  -92234254242343219990555  collector2  13545342517    -92233645345280026

answered Sep 26, 2017 at 0:47

akuiper

216k33 gold badges363 silver badges380 bronze badges

Sign up to request clarification or add additional context in comments.

Collectives™ on Stack Overflow

python json.loads to pandas dataframe

1 Answer 1

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Related