How To Read Multiple Json Files Into Pandas Dataframe?
Solution 1:
Change the last line to:
temp = temp.append(data, ignore_index = True)
This is necessary because append does not happen in place. The append method does not modify the data frame; it returns a new data frame containing the result of the append operation.
Edit:
Since writing this answer I have learned that you should never use DataFrame.append inside a loop, because it leads to quadratic copying (see this answer). DataFrame.append was also deprecated in pandas 1.4 and removed in pandas 2.0.
What you should do instead is first create a list of data frames and then use pd.concat to concatenate them all in a single operation. Like this:
import pandas as pd

dfs = []  # an empty list to store the data frames
for file in file_list:
    data = pd.read_json(file, lines=True)  # read data frame from json file
    dfs.append(data)  # append the data frame to the list
temp = pd.concat(dfs, ignore_index=True)  # concatenate all the data frames in the list
This alternative should be considerably faster.
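To try the list-then-concat pattern end to end, here is a minimal sketch. The temporary directory, the file names and the "file"/"row" fields are made up for illustration; only pd.read_json(..., lines=True) and pd.concat come from the answer above.

```python
import json
import os
import tempfile

import pandas as pd

# Hypothetical sample data: two small line-delimited JSON files.
tmp_dir = tempfile.mkdtemp()
file_list = []
for i in range(2):
    path = os.path.join(tmp_dir, f"part_{i}.json")
    with open(path, "w") as f:
        for j in range(3):
            f.write(json.dumps({"file": i, "row": j}) + "\n")
    file_list.append(path)

# Read each file into its own data frame, then concatenate once at the end.
dfs = [pd.read_json(path, lines=True) for path in file_list]
temp = pd.concat(dfs, ignore_index=True)

print(temp.shape)
```

Because the frames are only copied once, in the final pd.concat call, the cost grows roughly linearly with the number of files instead of quadratically.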
Solution 2:
If you need to flatten the JSON, Juan Estevez’s approach won’t work as is. Here is an alternative:
import json
import pandas as pd

dfs = []
for file in file_list:
    with open(file) as f:
        json_data = pd.json_normalize(json.loads(f.read()))
    dfs.append(json_data)
df = pd.concat(dfs, sort=False)  # or sort=True depending on your needs
Or, if your JSON files are line-delimited (not tested):
import json
import pandas as pd

dfs = []
for file in file_list:
    with open(file) as f:
        for line in f.readlines():
            json_data = pd.json_normalize(json.loads(line))
            dfs.append(json_data)
df = pd.concat(dfs, sort=False)  # or sort=True depending on your needs
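For anyone unsure what "flatten" means here, the following sketch shows the effect of pd.json_normalize on nested records. The sample records, field names and temporary files are invented for the example and assume one JSON document per file.

```python
import json
import os
import tempfile

import pandas as pd

# Hypothetical nested records, one JSON document per file.
records = [
    {"id": 1, "user": {"name": "Ann", "city": "Oslo"}},
    {"id": 2, "user": {"name": "Bob", "city": "Lima"}},
]
tmp_dir = tempfile.mkdtemp()
file_list = []
for i, record in enumerate(records):
    path = os.path.join(tmp_dir, f"doc_{i}.json")
    with open(path, "w") as f:
        json.dump(record, f)
    file_list.append(path)

dfs = []
for path in file_list:
    with open(path) as f:
        # json_normalize expands nested dicts into dotted column names
        dfs.append(pd.json_normalize(json.load(f)))

df = pd.concat(dfs, ignore_index=True, sort=False)
print(sorted(df.columns))  # ['id', 'user.city', 'user.name']
```

The nested "user" object ends up as the separate columns user.name and user.city rather than as a single column holding dictionaries.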
Solution 3:
I combined Juan Estevez's answer with glob. Thanks a lot.
import pandas as pd
import glob

def readFiles(path):
    files = glob.glob(path)
    dfs = []  # an empty list to store the data frames
    for file in files:
        data = pd.read_json(file, lines=True)  # read data frame from json file
        dfs.append(data)  # append the data frame to the list
    df = pd.concat(dfs, ignore_index=True)  # concatenate all the data frames in the list
    return df
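Two things are worth knowing when using glob this way: glob.glob returns paths in arbitrary order, and pd.concat raises a ValueError when given an empty list. The sketch below is one possible variant of readFiles that handles both. The name read_files_sorted, the sample files and the "value" field are hypothetical.

```python
import glob
import json
import os
import tempfile

import pandas as pd


def read_files_sorted(pattern):
    """Hypothetical variant of readFiles: sorted paths, empty frame if no match."""
    files = sorted(glob.glob(pattern))
    if not files:
        # pd.concat raises ValueError on an empty list, so return early
        return pd.DataFrame()
    dfs = [pd.read_json(path, lines=True) for path in files]
    return pd.concat(dfs, ignore_index=True)


# Hypothetical sample data in a temporary directory.
tmp_dir = tempfile.mkdtemp()
for name, value in [("b.json", 2), ("a.json", 1)]:
    with open(os.path.join(tmp_dir, name), "w") as f:
        f.write(json.dumps({"value": value}) + "\n")

df = read_files_sorted(os.path.join(tmp_dir, "*.json"))
empty = read_files_sorted(os.path.join(tmp_dir, "*.missing"))
```

Sorting the paths makes the row order reproducible between runs, which matters if you rely on ignore_index=True to number the rows.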
Solution 4:
Maybe you should state whether the JSON files were themselves created with pandas (DataFrame.to_json()) or in another way. I used data which was not created with to_json(), and I think it is not possible to use pd.read_json() in my case. Instead, I programmed a customized for-each loop approach to write everything to the DataFrames.
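The answer does not show its loop, so the following is only a guess at what such an approach might look like, not the author's actual code. It assumes a made-up file layout in which each file holds a "source" string and a "results" list, parses the files with the standard json module, and builds the DataFrame from a list of plain dictionaries.

```python
import json
import os
import tempfile

import pandas as pd

# Hypothetical input layout, chosen only for illustration:
# each file is one JSON object with a "source" name and a "results" list.
tmp_dir = tempfile.mkdtemp()
file_list = []
for i in range(2):
    path = os.path.join(tmp_dir, f"export_{i}.json")
    payload = {
        "source": f"export_{i}",
        "results": [{"x": i, "y": i * 10}, {"x": i + 5, "y": 0}],
    }
    with open(path, "w") as f:
        json.dump(payload, f)
    file_list.append(path)

# Custom loop: pull out exactly the pieces we want, one row per result.
rows = []
for path in file_list:
    with open(path) as f:
        payload = json.load(f)
    for item in payload["results"]:
        rows.append({"source": payload["source"], **item})

# Build the DataFrame once from the collected rows.
df = pd.DataFrame(rows)
print(df.shape)
```

Collecting rows in a list and constructing the DataFrame once at the end avoids the same repeated-copy problem described in Solution 1.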