Calculate Mean Sales Per Minute Across 24-hour Cycles As Per Local-time (hh:mm)
In this example we have two days of data sampled at a resolution of 1min, giving us 2880 measurements. The measurements are collected across multiple timezones sequentially: the fi
Solution 1:
Here is an approach that gets what I think you want. This requires pandas 0.17.0.
Create the data as you have above:
import pandas as pd
import numpy as np
pd.options.display.max_rows=12
np.random.seed(1234)
df=pd.DataFrame(index=pd.DatetimeIndex(pd.date_range('2015-03-29 00:00','2015-03-30 23:59',freq='1min',tz='UTC')))
df.loc['2015-03-29 00:00':'2015-03-29 04:00','timezone']='Europe/London'
df.loc['2015-03-29 04:00':'2015-03-30 23:59','timezone']='America/Los_Angeles'
df['sales1']=np.random.randint(1,101,size=len(df))  # ints in [1, 100]; replaces the deprecated np.random.random_integers(100, ...)
df['sales2']=np.random.randint(1,11,size=len(df))   # ints in [1, 10]
In [79]: df
Out[79]:
                                      timezone  sales1  sales2
2015-03-29 00:00:00+00:00        Europe/London      48       6
2015-03-29 00:01:00+00:00        Europe/London      84       1
2015-03-29 00:02:00+00:00        Europe/London      39       1
2015-03-29 00:03:00+00:00        Europe/London      54      10
2015-03-29 00:04:00+00:00        Europe/London      77       5
2015-03-29 00:05:00+00:00        Europe/London      25       9
...                                        ...     ...     ...
2015-03-30 23:54:00+00:00  America/Los_Angeles      77       8
2015-03-30 23:55:00+00:00  America/Los_Angeles      16       4
2015-03-30 23:56:00+00:00  America/Los_Angeles      55       3
2015-03-30 23:57:00+00:00  America/Los_Angeles      18       1
2015-03-30 23:58:00+00:00  America/Los_Angeles       3       2
2015-03-30 23:59:00+00:00  America/Los_Angeles      52       2

[2880 rows x 3 columns]
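The two `.loc` assignments that build the `timezone` column rely on label-based slicing of the DatetimeIndex, which includes both endpoints: the 04:00 row is first tagged as London and then overwritten by the second assignment. A minimal sketch that rebuilds only the `timezone` column on the same minute grid (no sales data) and counts the rows per zone:

```python
import pandas as pd

# Same UTC minute grid as above: two days at 1-minute resolution
df = pd.DataFrame(index=pd.date_range('2015-03-29 00:00', '2015-03-30 23:59',
                                      freq='1min', tz='UTC'))

# Label slices on a DatetimeIndex are inclusive at both ends,
# so 04:00 is tagged London here...
df.loc['2015-03-29 00:00':'2015-03-29 04:00', 'timezone'] = 'Europe/London'
# ...and re-tagged Los Angeles here, leaving London with 00:00-03:59 only
df.loc['2015-03-29 04:00':'2015-03-30 23:59', 'timezone'] = 'America/Los_Angeles'

counts = df['timezone'].value_counts()
print(counts)
```

So London contributes four hours of UTC minutes and Los Angeles the remaining 44 hours.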
Pivot on the timezone; this creates a column MultiIndex in which each timezone has its own sales1/sales2 columns.
x=pd.pivot_table(df.reset_index(),values=['sales1','sales2'],index='index',columns='timezone').swaplevel(0,1,axis=1)
x.columns.names=['timezone','sales']
In [82]: x
Out[82]:
timezone                  America/Los_Angeles Europe/London America/Los_Angeles Europe/London
sales                                  sales1        sales1              sales2        sales2
index
2015-03-29 00:00:00+00:00                 NaN            48                 NaN             6
2015-03-29 00:01:00+00:00                 NaN            84                 NaN             1
2015-03-29 00:02:00+00:00                 NaN            39                 NaN             1
2015-03-29 00:03:00+00:00                 NaN            54                 NaN            10
2015-03-29 00:04:00+00:00                 NaN            77                 NaN             5
2015-03-29 00:05:00+00:00                 NaN            25                 NaN             9
...                                       ...           ...                 ...           ...
2015-03-30 23:54:00+00:00                  77           NaN                   8           NaN
2015-03-30 23:55:00+00:00                  16           NaN                   4           NaN
2015-03-30 23:56:00+00:00                  55           NaN                   3           NaN
2015-03-30 23:57:00+00:00                  18           NaN                   1           NaN
2015-03-30 23:58:00+00:00                   3           NaN                   2           NaN
2015-03-30 23:59:00+00:00                  52           NaN                   2           NaN

[2880 rows x 4 columns]
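To see what the pivot does to the column layout without 2880 rows of output, here is a minimal sketch that applies the same `pivot_table`/`swaplevel` calls to a hypothetical four-row frame (two London minutes followed by two Los Angeles minutes):

```python
import pandas as pd

# Hypothetical tiny input: UTC index, the zone each minute came from, two sales columns
idx = pd.date_range('2015-03-29 03:58', periods=4, freq='1min', tz='UTC')
df = pd.DataFrame({'timezone': ['Europe/London'] * 2 + ['America/Los_Angeles'] * 2,
                   'sales1': [1, 2, 3, 4],
                   'sales2': [5, 6, 7, 8]}, index=idx)

# reset_index() exposes the unnamed DatetimeIndex as a column called 'index'
x = pd.pivot_table(df.reset_index(), values=['sales1', 'sales2'],
                   index='index', columns='timezone').swaplevel(0, 1, axis=1)
x.columns.names = ['timezone', 'sales']

# Each zone gets its own pair of columns; a row is NaN in the other zone's columns
print(x)
```

Each timestamp belongs to exactly one zone, so every row is populated in one zone's columns and NaN in the other's. The later mask step relies on this.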
Create the groupers we want to use, namely the hour and minute in the local zone. We populate them according to a mask: in other words, where both sales1 and sales2 are non-null for a zone, we use the hours/minutes from that (local) zone.
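hours = pd.Series(index=x.index, dtype=float)    # local hour of each row (NaN until filled)
minutes = pd.Series(index=x.index, dtype=float)  # local minute of each row
for tz in ['America/Los_Angeles', 'Europe/London']:
    # the UTC index read as wall-clock time in this zone
    local = x.index.tz_convert(tz)
    x[(tz,'tz')] = local
    # rows that belong to this zone, i.e. where its sales columns are populated
    mask = x[(tz,'sales1')].notnull() & x[(tz,'sales2')].notnull()
    hours.iloc[mask.values] = local.hour[mask.values]
    minutes.iloc[mask.values] = local.minute[mask.values]
# sort_index(axis=1) replaces sortlevel, which was deprecated and later removed
x = x.sort_index(axis=1)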
After the above, x looks like this. (This could be simplified a bit: we don't actually need to record the local times in x, only compute the hours/minutes.)
Out[84]:
timezone                   America/Los_Angeles                        Europe/London
sales                      sales1  sales2                         tz  sales1  sales2                         tz
index
2015-03-29 00:00:00+00:00     NaN     NaN  2015-03-28 17:00:00-07:00      48       6  2015-03-29 00:00:00+00:00
2015-03-29 00:01:00+00:00     NaN     NaN  2015-03-28 17:01:00-07:00      84       1  2015-03-29 00:01:00+00:00
2015-03-29 00:02:00+00:00     NaN     NaN  2015-03-28 17:02:00-07:00      39       1  2015-03-29 00:02:00+00:00
2015-03-29 00:03:00+00:00     NaN     NaN  2015-03-28 17:03:00-07:00      54      10  2015-03-29 00:03:00+00:00
2015-03-29 00:04:00+00:00     NaN     NaN  2015-03-28 17:04:00-07:00      77       5  2015-03-29 00:04:00+00:00
2015-03-29 00:05:00+00:00     NaN     NaN  2015-03-28 17:05:00-07:00      25       9  2015-03-29 00:05:00+00:00
...                           ...     ...                        ...     ...     ...                        ...
2015-03-30 23:54:00+00:00      77       8  2015-03-30 16:54:00-07:00     NaN     NaN  2015-03-31 00:54:00+01:00
2015-03-30 23:55:00+00:00      16       4  2015-03-30 16:55:00-07:00     NaN     NaN  2015-03-31 00:55:00+01:00
2015-03-30 23:56:00+00:00      55       3  2015-03-30 16:56:00-07:00     NaN     NaN  2015-03-31 00:56:00+01:00
2015-03-30 23:57:00+00:00      18       1  2015-03-30 16:57:00-07:00     NaN     NaN  2015-03-31 00:57:00+01:00
2015-03-30 23:58:00+00:00       3       2  2015-03-30 16:58:00-07:00     NaN     NaN  2015-03-31 00:58:00+01:00
2015-03-30 23:59:00+00:00      52       2  2015-03-30 16:59:00-07:00     NaN     NaN  2015-03-31 00:59:00+01:00

[2880 rows x 6 columns]
The tz columns use the timezone-aware datetime dtype, datetime64[ns, tz], which is new in 0.17.0.
In [85]: x.dtypes
Out[85]:
timezone             sales
America/Los_Angeles  sales1                              float64
                     sales2                              float64
                     tz      datetime64[ns, America/Los_Angeles]
Europe/London        sales1                              float64
                     sales2                              float64
                     tz            datetime64[ns, Europe/London]
dtype: object
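A minimal sketch of that dtype on its own, assuming a recent pandas: converting a UTC DatetimeIndex to a zone and storing it as a column keeps the zone in the column's dtype, and the `.dt` accessor then reads wall-clock fields in that zone. The three hourly timestamps (hypothetical, chosen around the UK clock change on 2015-03-29) show why local hours cannot be derived from UTC with a fixed offset:

```python
import pandas as pd

utc = pd.date_range('2015-03-29 00:00', periods=3, freq='60min', tz='UTC')
frame = pd.DataFrame(index=utc)

# Assigning a tz-aware DatetimeIndex stores the zone in the column dtype
frame['london'] = utc.tz_convert('Europe/London')
frame['la'] = utc.tz_convert('America/Los_Angeles')
print(frame.dtypes)

# .dt reads wall-clock fields in each column's own zone;
# London jumps from 0 to 2 because BST starts at 01:00 UTC that day
print(frame['london'].dt.hour.tolist())  # [0, 2, 3]
print(frame['la'].dt.hour.tolist())      # [17, 18, 19]
```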
Results
x.groupby([hours,minutes]).mean()
timezone  America/Los_Angeles          Europe/London
sales                  sales1  sales2         sales1  sales2
0  0                     62.5     5.5             48       6
   1                     52.0     7.0             84       1
   2                     89.0     3.5             39       1
   3                     67.5     6.5             54      10
   4                     41.0     5.5             77       5
   5                     81.0     5.5             25       9
...                       ...     ...            ...     ...
23 54                    76.5     4.5            NaN     NaN
   55                    37.5     5.0            NaN     NaN
   56                    60.5     8.0            NaN     NaN
   57                    87.5     7.0            NaN     NaN
   58                    77.5     6.0            NaN     NaN
   59                    31.0     5.5            NaN     NaN

[1440 rows x 4 columns]
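As noted earlier, storing the local timestamps in x is not strictly necessary. A minimal sketch of that simplification, assuming the same layout as df (UTC index, a `timezone` column, `sales1`/`sales2`): convert each zone's rows to local time, format the wall-clock time as an 'hh:mm' string, and group on it directly without the pivot. The `hhmm` name and the four-row input are hypothetical, and this is a swapped-in variant rather than the approach above:

```python
import pandas as pd

# Hypothetical tiny input shaped like df above
idx = pd.date_range('2015-03-29 03:58', periods=4, freq='1min', tz='UTC')
df = pd.DataFrame({'timezone': ['Europe/London'] * 2 + ['America/Los_Angeles'] * 2,
                   'sales1': [1, 2, 3, 4],
                   'sales2': [5, 6, 7, 8]}, index=idx)

# Local wall-clock 'hh:mm' for every row, computed zone by zone
hhmm = pd.Series(index=df.index, dtype=object)
for tz, rows in df.groupby('timezone'):
    hhmm.loc[rows.index] = list(rows.index.tz_convert(tz).strftime('%H:%M'))

# Mean sales per (zone, local minute of day); zero-padded 'hh:mm' strings sort chronologically
result = df.groupby(['timezone', hhmm])[['sales1', 'sales2']].mean()
print(result)
```

Like the answer, this keeps the zones in separate groups; `result.unstack('timezone')` would put them side by side as columns. Dropping `'timezone'` from the group keys would instead pool all zones into a single 24-hour profile keyed by local time.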