
How To Read Multiple JSON Files Into Pandas DataFrame?

I'm having a hard time loading multiple line-delimited JSON files into a single pandas DataFrame. This is the code I'm using:

import os, json
import pandas as pd
import numpy as np

Solution 1:

Change the last line to:

temp = temp.append(data, ignore_index = True)

We have to do this because the append does not happen in place: the append method does not modify the original data frame. It returns a new data frame containing the result of the append operation.
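For illustration, a minimal sketch of that behaviour (note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so this only runs on older versions):

import pandas as pd

df = pd.DataFrame({"a": [1]})
# append returns a NEW data frame; df itself is left untouched
result = df.append({"a": 2}, ignore_index=True)
print(len(df), len(result))  # 1 2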

Edit:

Since writing this answer I have learned that you should never use DataFrame.append inside a loop because it leads to quadratic copying (see this answer).

What you should do instead is first create a list of data frames and then use pd.concat to concatenate them all in a single operation. Like this:

dfs = []  # an empty list to store the data frames
for file in file_list:
    data = pd.read_json(file, lines=True)  # read data frame from json file
    dfs.append(data)  # append the data frame to the list

temp = pd.concat(dfs, ignore_index=True) # concatenate all the data frames in the list.

This alternative should be considerably faster.
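Note that file_list is assumed to already exist. Since the question already imports os, one way to build it could look like this (the directory name "json_dir" is just a placeholder):

import os

json_dir = "json_dir"  # placeholder: directory containing the .json files
file_list = [os.path.join(json_dir, name)
             for name in os.listdir(json_dir)
             if name.endswith(".json")]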

Solution 2:

If you need to flatten the JSON, Juan Estevez’s approach won’t work as is. Here is an alternative:

import json
import pandas as pd

dfs = []
for file in file_list:
    with open(file) as f:
        json_data = pd.json_normalize(json.loads(f.read()))
    dfs.append(json_data)
df = pd.concat(dfs, sort=False) # or sort=True depending on your needs

Or, if your JSON files are line-delimited (not tested):

import json
import pandas as pd

dfs = []
for file in file_list:
    with open(file) as f:
        for line in f.readlines():
            json_data = pd.json_normalize(json.loads(line))
            dfs.append(json_data)
df = pd.concat(dfs, sort=False) # or sort=True depending on your needs
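For reference, this is roughly what pd.json_normalize does with a nested record; the field names below are made up purely for illustration:

import pandas as pd

record = {"id": 1, "user": {"name": "Ann", "city": "Oslo"}}  # made-up nested record
flat = pd.json_normalize(record)
print(flat.columns.tolist())  # ['id', 'user.name', 'user.city']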

Solution 3:

I combined Juan Estevez's answer with glob. Thanks a lot.

import pandas as pd
import glob

def readFiles(path):
    files = glob.glob(path)
    dfs = []  # an empty list to store the data frames
    for file in files:
        data = pd.read_json(file, lines=True)  # read data frame from json file
        dfs.append(data)  # append the data frame to the list

    df = pd.concat(dfs, ignore_index=True)  # concatenate all the data frames in the list
    return df
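A usage example, assuming the JSON files live in a hypothetical data/ directory:

df = readFiles("data/*.json")  # "data/*.json" is a placeholder glob pattern
print(df.shape)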

Solution 4:

Maybe you should state whether the JSON files were created with pandas' pd.to_json() or in some other way. I used data that was not created with pd.to_json(), and I think it was not possible to use pd.read_json() in my case. Instead, I programmed a customized for-each loop to write everything into the DataFrame.
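The answer does not show the loop itself, but a minimal sketch of such a manual approach could look like this (the directory, keys, and field names are hypothetical):

import glob
import json

import pandas as pd

records = []
for path in glob.glob("data/*.json"):  # hypothetical location of the JSON files
    with open(path) as f:
        obj = json.load(f)             # parse the raw JSON yourself
    # pick out whichever fields you need; "id" and "value" are made-up examples
    records.append({"id": obj.get("id"), "value": obj.get("value")})

df = pd.DataFrame(records)             # build the DataFrame in one step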
