1

I am trying to normalize a nested JSON file in pandas.
I am unable to get the ab_id column in the beginning as observed in the current output screenshot.
Additionally, since if I remove record prefix from my code, I am seeing an error and if I add it it generates a couple of columns which are empty.

The current and expected output is attached in the screenshot below:
enter image description here

Code used currently:

df=pd.json_normalize(data=response['val'],record_path=['activity'],meta=['msn','iis','ica','iada'],errors='ignore', record_prefix='_')

JSON file:

{
   "id":"ijewiofn23441312223",
   "val":[
      {
         "ab_id":"ab_123",
         "activity":[
            {
               "msn":"acpfile_source_conn",
               "iia":true,
               "ica":false,
               "iada":false
            },
            {
               "msn":"adefile_source_conn",
               "iia":true,
               "ica":false,
               "iada":false
            }
            }
         ]
      },
      {
         "ab_id":"ab_421",
         "activity":[
            {
               "msn":"adbfile_source_conn",
               "iia":true,
               "ica":true,
               "iada":false
            },
            {
               "msn":"aile_source_conn",
               "iia":true,
               "ica":false,
               "iada":false
            }
            }
         ]
      }
   ]
}

Can someone please help out?
Thanks so much in advance.

3 Answers 3

2
import pandas as pd
from pandas import json_normalize

data = {
    "id": "ijewiofn23441312223",
    "val": [
        {
            "ab_id": "ab_123",
            "activity": [
                {
                    "msn": "acpfile_source_conn",
                    "iia": True,
                    "ica": False,
                    "iada": False
                },
                {
                    "msn": "adefile_source_conn",
                    "iia": True,
                    "ica": False,
                    "iada": False
                }
            ]
        },
        {
            "ab_id": "ab_421",
            "activity": [
                {
                    "msn": "adbfile_source_conn",
                    "iia": True,
                    "ica": True,
                    "iada": False
                },
                {
                    "msn": "aile_source_conn",
                    "iia": True,
                    "ica": False,
                    "iada": False
                }
            ]
        }
    ]
}
df = pd.json_normalize(data['val'], record_path=['activity'], meta=['ab_id'])
df = df[['ab_id'] + [col for col in df.columns if col != 'ab_id']]
df.columns = ['id'] + df.columns[1:].tolist()

print(df)
Sign up to request clarification or add additional context in comments.

Comments

1

You can try to use json module and construct the DataFrame manually:

import json

with open("data.json", "r") as f_in:
    data = json.load(f_in)

df = pd.DataFrame(
    [{"ab_id": v["ab_id"], **a} for v in data["val"] for a in v["activity"]]
)

print(df)

Prints:

    ab_id                  msn   iia    ica   iada
0  ab_123  acpfile_source_conn  True  False  False
1  ab_123  adefile_source_conn  True  False  False
2  ab_421  adbfile_source_conn  True   True  False
3  ab_421     aile_source_conn  True  False  False

Contents of data.json:

{
   "id":"ijewiofn23441312223",
   "val":[
      {
         "ab_id":"ab_123",
         "activity":[
            {
               "msn":"acpfile_source_conn",
               "iia":true,
               "ica":false,
               "iada":false
            },
            {
               "msn":"adefile_source_conn",
               "iia":true,
               "ica":false,
               "iada":false
            }

         ]
      },
      {
         "ab_id":"ab_421",
         "activity":[
            {
               "msn":"adbfile_source_conn",
               "iia":true,
               "ica":true,
               "iada":false
            },
            {
               "msn":"aile_source_conn",
               "iia":true,
               "ica":false,
               "iada":false
            }

         ]
      }
   ]
}

Comments

1

There is two extra } in your json example. Also, this one doesn't match the I/O images.

But you can still try this :

df = pd.json_normalize(response["val"], "activity", "ab_id")
​
# if columns-order is important
df = df[np.roll(df.columns, 1)]

​ Output :

print(df)

    ab_id                  msn   iia    ica   iada
0  ab_123  acpfile_source_conn  True  False  False
1  ab_123  adefile_source_conn  True  False  False
2  ab_421  adbfile_source_conn  True   True  False
3  ab_421     aile_source_conn  True  False  False

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.