How To Filter By Sub-level Index In Pandas

I have a 'df' which has a multilevel index (STK_ID, RPT_Date): sales cogs net_pft STK_ID RPT_Date 000876 200

Solution 1:

The "str.*" methods work on columns, so one option is to reset the index (which turns STK_ID and RPT_Date into ordinary columns), filter rows with a "str.*" method call on the RPT_Date column, and then re-create the index.

In [72]: x = df.reset_index(); x[x.RPT_Date.str.endswith("0630")].set_index(['STK_ID', 'RPT_Date'])
Out[72]: 
                      sales        cogs    net_pft
STK_ID RPT_Date                                   
000876 20060630   857483000   729541000   67157200
       20070630  1146245000  1050808000  113468500
       20080630  1932470000  1777010000  133756300
002254 20070630   501221000   289167000  118012200
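
The three steps can also be written out one at a time, which may be easier to follow than the one-liner. This is only a minimal sketch: the index values are copied from the output above, but the frame is a hypothetical miniature and the 'sales' numbers are placeholders, not the real data.

```python
import pandas as pd

# Hypothetical miniature of the frame in the question. The index values come
# from the output above; the 'sales' numbers are placeholders for illustration.
index = pd.MultiIndex.from_tuples(
    [
        ("000876", "20060331"),
        ("000876", "20060630"),
        ("002254", "20070331"),
        ("002254", "20070630"),
    ],
    names=["STK_ID", "RPT_Date"],
)
df = pd.DataFrame({"sales": [1, 2, 3, 4]}, index=index)

# Step 1: move both index levels into ordinary columns.
flat = df.reset_index()

# Step 2: build a boolean mask with a string method on the RPT_Date column.
mask = flat["RPT_Date"].str.endswith("0630")

# Step 3: keep the matching rows and rebuild the MultiIndex.
result = flat[mask].set_index(["STK_ID", "RPT_Date"])
print(result)
```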

However, this approach is not particularly fast.

In [73]: timeit x = df.reset_index(); x[x.RPT_Date.str.endswith("0630")].set_index(['STK_ID', 'RPT_Date'])
1000 loops, best of 3: 1.78 ms per loop

Another approach builds on the fact that a MultiIndex object behaves much like a list of tuples.

In [75]: df.index
Out[75]: 
MultiIndex
[('000876', '20060331') ('000876', '20060630') ('000876', '20060930')
 ('000876', '20061231') ('000876', '20070331') ('000876', '20070630')
 ('000876', '20070930') ('000876', '20071231') ('000876', '20080331')
 ('000876', '20080630') ('000876', '20080930') ('002254', '20061231')
 ('002254', '20070331') ('002254', '20070630') ('002254', '20070930')]
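
To make the "list of tuples" behaviour concrete, here is a small sketch using three of the index entries shown above. The variable names are just for illustration.

```python
import pandas as pd

# A few of the index entries shown above, rebuilt by hand for illustration.
index = pd.MultiIndex.from_tuples(
    [("000876", "20060331"), ("000876", "20060630"), ("002254", "20070630")],
    names=["STK_ID", "RPT_Date"],
)

# Indexing a MultiIndex with an integer gives back a plain tuple.
first_key = index[0]

# Iterating yields one tuple per row, so position 1 is the RPT_Date value.
report_dates = [key[1] for key in index]
ends_in_0630 = [date.endswith("0630") for date in report_dates]
print(first_key, report_dates, ends_in_0630)
```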

Building on that, you can create a boolean array from a MultiIndex with df.index.map() and use the result to filter the frame.

In [76]: df[df.index.map(lambda x: x[1].endswith("0630"))]
Out[76]: 
                      sales        cogs    net_pft
STK_ID RPT_Date                                   
000876 20060630   857483000   729541000   67157200
       20070630  1146245000  1050808000  113468500
       20080630  1932470000  1777010000  133756300
002254 20070630   501221000   289167000  118012200
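
The same filter can be spelled out as a self-contained sketch. As before, the frame is a hypothetical miniature with placeholder 'sales' values. The explicit conversion with np.asarray is an addition made here, on the assumption that newer pandas versions return an Index from map() rather than a plain array; it should be harmless either way.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the frame; 'sales' values are placeholders.
index = pd.MultiIndex.from_tuples(
    [
        ("000876", "20060331"),
        ("000876", "20060630"),
        ("002254", "20070331"),
        ("002254", "20070630"),
    ],
    names=["STK_ID", "RPT_Date"],
)
df = pd.DataFrame({"sales": [1, 2, 3, 4]}, index=index)

# Each element passed to the function is a (STK_ID, RPT_Date) tuple,
# so key[1] is the report date string.
flags = df.index.map(lambda key: key[1].endswith("0630"))

# Convert to a plain boolean NumPy array before using it as a row mask.
mask = np.asarray(flags, dtype=bool)
result = df[mask]
print(result)
```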

This is also quite a bit faster: roughly 7x in the timing below (240 us versus 1.78 ms per loop).

In [77]: timeit df[df.index.map(lambda x: x[1].endswith("0630"))]
1000 loops, best of 3: 240 us per loop
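
A variation that is not part of the answer above: the values of one index level can be pulled out with get_level_values() and filtered with the "str.*" methods directly, without resetting the index or writing a lambda. The sketch below again assumes a hypothetical miniature frame with placeholder 'sales' values, and it makes no claim about how its speed compares with the timings above.

```python
import pandas as pd

# Hypothetical miniature of the frame; 'sales' values are placeholders.
index = pd.MultiIndex.from_tuples(
    [
        ("000876", "20060331"),
        ("000876", "20060630"),
        ("002254", "20070331"),
        ("002254", "20070630"),
    ],
    names=["STK_ID", "RPT_Date"],
)
df = pd.DataFrame({"sales": [1, 2, 3, 4]}, index=index)

# Extract the RPT_Date level as a flat Index of strings.
report_dates = df.index.get_level_values("RPT_Date")

# The .str accessor is available on an Index as well as on a column.
mask = report_dates.str.endswith("0630")
result = df[mask]
print(result)
```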
