
Using Unstack In Python

I am trying to unstack a column in python but it isn't quite doing what I am expecting. My table (called df) looks similar to this: station_id year Day1 Day2 210018

Solution 1:

You need to make year an index before you call unstack:

try:
    # Python 2
    from cStringIO import StringIO
except ImportError:
    # Python 3
    from io import StringIO

import pandas as pd


text = '''\
station_id   year     Day1   Day2 
 210018       1916      4        7
 210018       1917      3        9 
 256700       1916     NaN       8
 256700       1917      6        9'''

df = pd.read_table(StringIO(text), sep=r'\s+')
df = df.set_index(['station_id', 'year'])
df2 = df.unstack(level='year')
df2.columns = df2.columns.swaplevel(0,1)
df2 = df2.sort_index(axis=1)
print(df2)

yields

year        1916      1917
            Day1 Day2 Day1 Day2
station_id
210018         4    7    3    9
256700       NaN    8    6    9
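For comparison, here is a minimal sketch of the same reshaping done with DataFrame.pivot, which sets the index and unstacks in a single call. This is a swapped-in alternative, not the answer's method; it assumes current pandas and reuses the sample data from the answer.

```python
from io import StringIO

import pandas as pd

text = """\
station_id   year     Day1   Day2
 210018       1916      4        7
 210018       1917      3        9
 256700       1916     NaN       8
 256700       1917      6        9"""

df = pd.read_csv(StringIO(text), sep=r"\s+")

# pivot() puts station_id in the row index and year in the columns;
# with values omitted, every remaining column (Day1, Day2) is kept.
df2 = df.pivot(index="station_id", columns="year")

# The column MultiIndex comes out as (DayN, year); swap it to (year, DayN)
# and sort so each year's Day1/Day2 columns sit side by side.
df2 = df2.swaplevel(0, 1, axis=1).sort_index(axis=1)
print(df2)
```

pivot requires each (station_id, year) pair to be unique; with duplicates, pivot_table and an aggregation function would be needed instead.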

whereas, if year is a column, and not an index, then

df = pd.read_table(StringIO(text), sep=r'\s+')
df = df.set_index(['station_id'])
df2 = df.unstack(level='year')
df2.columns = df2.columns.swaplevel(0,1)
df2 = df2.sort_index(axis=1)

fails on the swaplevel line with AttributeError: 'Series' object has no attribute 'columns', because unstack returned a Series rather than a DataFrame.
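If year has already ended up as an ordinary column, one way to recover is to append it to the existing index with set_index(..., append=True), so the frame gets the two-level (station_id, year) index that unstack(level='year') needs. A minimal sketch, assuming current pandas and the answer's sample data:

```python
from io import StringIO

import pandas as pd

text = """\
station_id   year     Day1   Day2
 210018       1916      4        7
 210018       1917      3        9
 256700       1916     NaN       8
 256700       1917      6        9"""

df = pd.read_csv(StringIO(text), sep=r"\s+")
df = df.set_index(["station_id"])  # year is still an ordinary column here

# Append year to the existing index, giving a (station_id, year) MultiIndex.
df = df.set_index("year", append=True)

df2 = df.unstack(level="year")  # now a DataFrame, not a Series
df2.columns = df2.columns.swaplevel(0, 1)
df2 = df2.sort_index(axis=1)
print(df2)
```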


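In the pandas version used here, the level='year' argument in df.unstack(level='year') is silently ignored when df has no index level named year (or even, say, blah). With only a single-level index to work with, unstack moves every column label into the index and returns a Series: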

In [102]: df
Out[102]:
            year  Day1  Day2
station_id
210018      1916     4     7
210018      1917     3     9
256700      1916   NaN     8
256700      1917     6     9

In [103]: df.unstack(level='blah')
Out[103]:
      station_id
year  210018        1916
      210018        1917
      256700        1916
      256700        1917
Day1  210018           4
      210018           3
      256700         NaN
      256700           6
Day2  210018           7
      210018           9
      256700           8
      256700           9
dtype: float64

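This Series, which has no columns attribute, is the source of the surprising error. (Newer pandas releases validate the level name against the index and raise a KeyError at the unstack call instead.)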
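To catch this mistake early, regardless of pandas version, a small guard can check the index before unstacking. The helper name unstack_checked below is hypothetical, not part of pandas; it is a sketch that refuses to unstack unless the frame has a MultiIndex containing the requested level.

```python
import pandas as pd


def unstack_checked(df: pd.DataFrame, level: str) -> pd.DataFrame:
    """Hypothetical helper: unstack `level`, failing loudly if it is not an index level."""
    # A single-level index would make unstack return a Series, so require
    # a MultiIndex that actually contains the requested level name.
    if df.index.nlevels < 2 or level not in df.index.names:
        raise KeyError(
            f"{level!r} is not a level of a MultiIndex "
            f"(index levels: {list(df.index.names)}); "
            f"try df.set_index({level!r}, append=True) first"
        )
    return df.unstack(level=level)


df = pd.DataFrame(
    {
        "station_id": [210018, 210018, 256700, 256700],
        "year": [1916, 1917, 1916, 1917],
        "Day1": [4, 3, float("nan"), 6],
        "Day2": [7, 9, 8, 9],
    }
).set_index("station_id")  # year is still a column, as in the failing case

error_message = None
try:
    unstack_checked(df, "year")
except KeyError as exc:
    error_message = str(exc)
print(error_message)

# After appending year to the index, the guarded call succeeds.
wide = unstack_checked(df.set_index("year", append=True), "year")
print(wide)
```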
