Pandas Restacking Repeated Values To Columns
The DataFrame below needs to be restacked so that all values for each region are on one line. In the example below, the new df would have only 3 lines, one for each region.
Solution 1:
You could groupby on 'Area' and apply list:
In [75]:
df.groupby('Area')['value'].apply(list).reset_index()
Out[75]:
       Area     value
0  AMERICAS  [37, 24]
1      ASIA  [51, 22]
2    EUROPE  [47, 39]
This will handle a variable number of values per group.
If you want to split the values out into separate columns, you can call apply and pass the pd.Series constructor:
In [90]:
df1 = df.groupby('Area')['value'].apply(lambda x: list(x)).reset_index()
df1[['val1', 'val2']] = df1['value'].apply(pd.Series)
df1
Out[90]:
       Area     value  val1  val2
0  AMERICAS  [37, 24]    37    24
1      ASIA  [51, 22]    51    22
2    EUROPE  [47, 39]    47    39
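The question's original DataFrame isn't shown here, so as a minimal self-contained sketch of the same idea, the snippet below assumes a six-row frame with the 'Area' and 'value' columns seen in the outputs. It also swaps apply(pd.Series) for building a DataFrame directly from the lists, which is a substitution on my part rather than the code from the answer; the names val1 and val2 are simply carried over from above.

```python
import pandas as pd

# Assumed sample data, reconstructed from the outputs shown above
df = pd.DataFrame({
    "Area": ["EUROPE", "ASIA", "AMERICAS", "EUROPE", "ASIA", "AMERICAS"],
    "value": [47, 51, 37, 39, 22, 24],
})

# Collect the values of each region into one list per row
df1 = df.groupby("Area")["value"].apply(list).reset_index()

# Expand the list column into named columns, aligned on df1's index
expanded = pd.DataFrame(df1["value"].tolist(),
                        index=df1.index,
                        columns=["val1", "val2"])
df1 = df1.join(expanded)
print(df1)
```

This only works as written when every region has exactly two values, since the column names are fixed up front.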
EDIT
For a variable number of values you can't assign the column names up front, because you don't know what the maximum number of values will be, but you can still use the approach above:
In [94]:
import io
import pandas as pd
t="""index Area value
0 EUROPE 47
1 ASIA 51
2 AMERICAS 37
3 EUROPE 39
4 ASIA 22
5 AMERICAS 24
5 AMERICAS 50"""
df = pd.read_csv(io.StringIO(t), sep=r'\s+')
df
Out[94]:
   index      Area  value
0      0    EUROPE     47
1      1      ASIA     51
2      2  AMERICAS     37
3      3    EUROPE     39
4      4      ASIA     22
5      5  AMERICAS     24
6      5  AMERICAS     50
In [99]:
df1 = df.groupby('Area')['value'].apply(list).reset_index()
df1
Out[99]:
Area value
0 AMERICAS [37, 24, 50]
1 ASIA [51, 22]
2 EUROPE [47, 39]
In [102]:
df1 = pd.concat([df1, df1['value'].apply(pd.Series).fillna(0)], axis=1)
df1
Out[102]:
       Area         value   0   1   2
0  AMERICAS  [37, 24, 50]  37  24  50
1      ASIA      [51, 22]  51  22   0
2    EUROPE      [47, 39]  47  39   0
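If you don't need the intermediate list column, a different route (not the one used above) is to number the values within each region with cumcount and then pivot on that position. The sketch below is one way this could look, assuming the same seven rows as the EDIT example; the pos helper column and the val prefix are names I made up for illustration.

```python
import pandas as pd

# Assumed sample data: same rows as the EDIT example above
df = pd.DataFrame({
    "Area": ["EUROPE", "ASIA", "AMERICAS", "EUROPE",
             "ASIA", "AMERICAS", "AMERICAS"],
    "value": [47, 51, 37, 39, 22, 24, 50],
})

wide = (
    df.assign(pos=df.groupby("Area").cumcount())  # 0, 1, 2... within each region
      .pivot(index="Area", columns="pos", values="value")
      .fillna(0)          # regions with fewer values get 0, as above
      .astype(int)        # NaN forced floats, so convert back to int
      .add_prefix("val")  # columns 0, 1, 2 become val0, val1, val2
      .rename_axis(columns=None)
      .reset_index()
)
print(wide)
```

The number of value columns follows from the largest group, so nothing has to be known in advance. Filling with 0 mirrors the fillna(0) above; leave that step (and the astype) out if a missing value should stay NaN.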