I am trying to convert nested JSON into columns of my current data frame. How can I accomplish this? Note: the "Functions" list in the JSON contains 600+ dictionaries with the same structure.

{
"Functions": [
    {
        "CodeSha256": "",
        "CodeSize": 
        "Description": "",
        "Environment": {
            "Variables": {

                "COMMIT_HASH": ",
                "CodeSha256": "",
                "ECS_LOG_STREAM": "",
                "ELASTIC_SEARCH_DOMAIN_ENDPOINT": "",
                "ENVIRONMENT": "prod",
                "SERVICE_NAME": "testingservicename",
                "SERVICE_OWNER": "testingserviceowner",

            }
        },
        
        "FunctionName": "demofunctionname",
        "Timeout": ,
        "TracingConfig": {
            "Mode": 
        },
        "Version": "",
        "VpcConfig": {
            "SecurityGroupIds": [
                ""
            ],
            "SubnetIds": [
                "",
                "",
                ""
            ],
            "VpcId": ""
        }
    }

] }

How I load the JSON:

import json
import pandas as pd
data = json.load(open('../fileservice.json'))
df = pd.DataFrame(data["Functions"])

How it appears

       FunctionName                                                                    Environment
0  demofunctionname  {'Variables': {'COMMIT_HASH': 'djkdkd', 'SERVICE_OWNER': 'serviceownertest'}}

How I need it to appear

       FunctionName COMMIT_HASH     SERVICE_OWNER
0  demofunctionname      djkdkd  serviceownertest

I have been trying the explode method, but it does not get the job done. Any suggestions or guidance would be much appreciated.

Take a look at pd.json_normalize(). (Commented Jul 19, 2021 at 4:40)

2 Answers

Take a look at pd.json_normalize(), which flattens nested dictionaries into dot-separated columns. In your case, with s holding the parsed JSON:

pd.json_normalize(s["Functions"])

will give the following output (transposed, first row only):

CodeSha256                                                             
CodeSize                                                               
Description                                                            
FunctionName                                           demofunctionname
Timeout                                                                
Version                                                                
Environment.Variables.COMMIT_HASH                                  test
Environment.Variables.CodeSha256                                       
Environment.Variables.ECS_LOG_STREAM                                   
Environment.Variables.ELASTIC_SEARCH_DOMAIN_END...                     
Environment.Variables.ENVIRONMENT                                  prod
Environment.Variables.SERVICE_NAME                   testingservicename
Environment.Variables.SERVICE_OWNER                 testingserviceowner
TracingConfig.Mode                                                     
VpcConfig.SecurityGroupIds                                           []
VpcConfig.SubnetIds                                              [, , ]
VpcConfig.VpcId                                                        
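
To get only the three columns asked for in the question, the flattened frame can be trimmed and renamed afterwards. This is a minimal sketch, not a definitive implementation: the two-record sample and the names flat, prefix and wanted are made up for illustration, and the second record deliberately lacks SERVICE_OWNER to show what happens with a missing key.

```python
import pandas as pd

# Hypothetical sample shaped like the question's "Functions" list.
data = {
    "Functions": [
        {
            "FunctionName": "demofunctionname",
            "Environment": {
                "Variables": {
                    "COMMIT_HASH": "djkdkd",
                    "SERVICE_OWNER": "serviceownertest",
                }
            },
        },
        {
            "FunctionName": "otherfunction",
            "Environment": {"Variables": {"COMMIT_HASH": "abc123"}},
        },
    ]
}

# Flatten every nested dict into dot-separated column names.
flat = pd.json_normalize(data["Functions"])

prefix = "Environment.Variables."
wanted = ["FunctionName", prefix + "COMMIT_HASH", prefix + "SERVICE_OWNER"]

# reindex keeps only the wanted columns (NaN-filled if one is absent everywhere),
# then rename strips the nesting prefix from the column names.
df = flat.reindex(columns=wanted).rename(columns=lambda c: c.replace(prefix, ""))
print(df)
```

Because json_normalize works on the whole list at once, the same call should handle 600+ entries without an explicit loop; records that lack a key end up with NaN in that column.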

I don't think this is the best way to do it, but here is a solution:

s={
    "Functions": [
    {
        "CodeSha256": "",
        "CodeSize": "",
        "Description": "",
        "Environment": {
            "Variables": {
                "COMMIT_HASH": "test",
                "CodeSha256": "",
                "ECS_LOG_STREAM": "",
                "ELASTIC_SEARCH_DOMAIN_ENDPOINT": "",
                "ENVIRONMENT": "prod",
                "SERVICE_NAME": "testingservicename",
                "SERVICE_OWNER": "testingserviceowner",

            }
        },
        
        "FunctionName": "demofunctionname",
        "Timeout": "" ,
        "TracingConfig": {
            "Mode": ""
        },
        "Version": "",
        "VpcConfig": {
            "SecurityGroupIds": [
                ""
            ],
            "SubnetIds": [
                "",
                "",
                ""
            ],
            "VpcId": ""
        }
    }
] }

import pandas as pd

# s is already a Python dict, so no JSON round trip is needed
data = s

# Pick the fields of interest from the first entry in "Functions"
first = data["Functions"][0]
variables = first["Environment"]["Variables"]
result = {
    'FunctionName': first["FunctionName"],
    'COMMIT_HASH': variables["COMMIT_HASH"],
    'SERVICE_OWNER': variables["SERVICE_OWNER"],
}

df = pd.DataFrame([result])
print(df)

Note: the JSON in your question is malformed (several values are missing); I have fixed that in the sample data used in my solution.

Edited version

Here is the code for multiple entries in "Functions":

s={
    "Functions": [
        {
            "CodeSha256": "",
            "CodeSize": "",
            "Description": "",
            "Environment": {
                "Variables": {
                    "COMMIT_HASH": "test",
                    "CodeSha256": "",
                    "ECS_LOG_STREAM": "",
                    "ELASTIC_SEARCH_DOMAIN_ENDPOINT": "",
                    "ENVIRONMENT": "prod",
                    "SERVICE_NAME": "testingservicename",
                    "SERVICE_OWNER": "testingserviceowner",
    
                }
            },
            
            "FunctionName": "demofunctionname",
            "Timeout": "" ,
            "TracingConfig": {
                "Mode": ""
            },
            "Version": "",
            "VpcConfig": {
                "SecurityGroupIds": [
                    ""
                ],
                "SubnetIds": [
                    "",
                    "",
                    ""
                ],
                "VpcId": ""
            }
        },
        {
            "CodeSha256": "",
            "CodeSize": "",
            "Description": "",
            "Environment": {
                "Variables": {
                    "COMMIT_HASH": "test",
                    "CodeSha256": "",
                    "ECS_LOG_STREAM": "",
                    "ELASTIC_SEARCH_DOMAIN_ENDPOINT": "",
                    "ENVIRONMENT": "prod",
                    "SERVICE_NAME": "testingservicename",
                    "SERVICE_OWNER": "testingserviceowner",
    
                }
            },
            
            "FunctionName": "demofunctionname 1",
            "Timeout": "" ,
            "TracingConfig": {
                "Mode": ""
            },
            "Version": "",
            "VpcConfig": {
                "SecurityGroupIds": [
                    ""
                ],
                "SubnetIds": [
                    "",
                    "",
                    ""
                ],
                "VpcId": ""
            }
        }
    ]
}

import pandas as pd

# s is already a Python dict, so no JSON round trip is needed
data = s

# Build one row per entry in "Functions"
rows = []
for fn in data["Functions"]:
    variables = fn["Environment"]["Variables"]
    rows.append({
        'FunctionName': fn["FunctionName"],
        'COMMIT_HASH': variables["COMMIT_HASH"],
        'SERVICE_OWNER': variables["SERVICE_OWNER"],
    })

df = pd.DataFrame(rows, columns=['FunctionName', 'COMMIT_HASH', 'SERVICE_OWNER'])
print(df)
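
The loop above raises a KeyError if any entry is missing "Environment", "Variables", or one of the two variables. Whether that can happen in the real 600+ entry file is an assumption, not something stated in the question, but if it does, a tolerant variant might look like the sketch below. The helper name extract_row and the sample data are hypothetical.

```python
import pandas as pd

# Hypothetical sample: the second entry has no "Environment" block at all.
data = {
    "Functions": [
        {
            "FunctionName": "demofunctionname",
            "Environment": {
                "Variables": {
                    "COMMIT_HASH": "test",
                    "SERVICE_OWNER": "testingserviceowner",
                }
            },
        },
        {"FunctionName": "demofunctionname 1"},
    ]
}


def extract_row(fn):
    """Pull the three fields of interest, using None where a key is absent."""
    variables = fn.get("Environment", {}).get("Variables", {})
    return {
        "FunctionName": fn.get("FunctionName"),
        "COMMIT_HASH": variables.get("COMMIT_HASH"),
        "SERVICE_OWNER": variables.get("SERVICE_OWNER"),
    }


# One dict per function becomes one row in the frame.
df = pd.DataFrame([extract_row(fn) for fn in data["Functions"]])
print(df)
```

Using dict.get with an empty-dict default keeps the chained lookups safe, and the missing values show up as None in the resulting frame instead of stopping the loop.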

2 Comments

Thank you Liawi. This does seem to work for item [0]. How would this be done for a list with the same information 600+ times?
You can iterate over them; I have edited the answer.
