Using Unstack In Python
I am trying to unstack a column in Python, but it isn't doing quite what I expect. My table (called df) looks similar to this: station_id year Day1 Day2 210018
Solution 1:
You need to make year an index level before you call unstack:
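try:
    # for Python 2
    from cStringIO import StringIO
except ImportError:
    # for Python 3
    from io import StringIO
import pandas as pd

text = '''\
station_id year Day1 Day2
210018 1916 4 7
210018 1917 3 9
256700 1916 NaN 8
256700 1917 6 9'''

# A raw string keeps the regex separator free of invalid-escape warnings
df = pd.read_table(StringIO(text), sep=r'\s+')
# Both station_id and year must be index levels for unstack(level='year')
df = df.set_index(['station_id', 'year'])
df2 = df.unstack(level='year')
# Put year on the outer column level, then order the columns
df2.columns = df2.columns.swaplevel(0, 1)
# DataFrame.sort was removed from pandas; sort_index is the replacement
df2 = df2.sort_index(axis=1)
print(df2)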
yields
year        1916       1917
            Day1 Day2  Day1 Day2
station_id
210018         4    7     3    9
256700       NaN    8     6    9
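With year on the outer column level, one year's block can be pulled out by indexing with that year. The following is a small sketch of that, reusing the sample data from above; the variable name block_1916 is only illustrative, and read_csv is used here in place of read_table.

```python
from io import StringIO
import pandas as pd

text = '''\
station_id year Day1 Day2
210018 1916 4 7
210018 1917 3 9
256700 1916 NaN 8
256700 1917 6 9'''

df = pd.read_csv(StringIO(text), sep=r'\s+')
df2 = df.set_index(['station_id', 'year']).unstack(level='year')
df2.columns = df2.columns.swaplevel(0, 1)
df2 = df2.sort_index(axis=1)

# Indexing by the outer column level returns that year's Day1/Day2 columns
# (block_1916 is an illustrative name, not part of the original answer)
block_1916 = df2[1916]
print(block_1916)
```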
whereas, if year is a column and not an index level, then
df = pd.read_table(StringIO(text), sep=r'\s+')
df = df.set_index(['station_id'])
df2 = df.unstack(level='year')
df2.columns = df2.columns.swaplevel(0, 1)
df2 = df2.sort_index(axis=1)
leads to AttributeError: 'Series' object has no attribute 'columns'.
The level='year' argument is ignored in df.unstack(level='year') when df does not have an index level named year (or even, say, blah):
In [102]: df
Out[102]:
            year  Day1  Day2
station_id
210018      1916     4     7
210018      1917     3     9
256700      1916   NaN     8
256700      1917     6     9

In [103]: df.unstack(level='blah')
Out[103]:
      station_id
year  210018        1916
      210018        1917
      256700        1916
      256700        1917
Day1  210018           4
      210018           3
      256700         NaN
      256700           6
Day2  210018           7
      210018           9
      256700           8
      256700           9
dtype: float64
This is the source of the surprising error.
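One way to guard against this is to check the index level names before calling unstack. The sketch below assumes a hypothetical helper, unstack_on, which is not part of pandas: it promotes the named column to an index level when it is not one already. As far as I know, newer pandas releases raise a KeyError for an unknown level name rather than silently ignoring it, so treat the exact failure mode as dependent on your installed version.

```python
from io import StringIO
import pandas as pd

text = '''\
station_id year Day1 Day2
210018 1916 4 7
210018 1917 3 9
256700 1916 NaN 8
256700 1917 6 9'''

# year is still an ordinary column here, as in the failing example
df = pd.read_csv(StringIO(text), sep=r'\s+').set_index('station_id')


def unstack_on(frame, level):
    """Hypothetical helper: unstack `level`, promoting it from a column first if needed."""
    if level not in frame.index.names:
        if level not in frame.columns:
            raise KeyError('%r is neither an index level nor a column' % level)
        # Append the column to the existing index so it becomes an index level
        frame = frame.set_index(level, append=True)
    return frame.unstack(level=level)


df2 = unstack_on(df, 'year')
print(df2)
```

Because the helper returns a DataFrame with two column levels, the swaplevel and sort_index steps from the answer can be applied to its result unchanged.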