
How To Apply Euclidean Distance Function To A Groupby Object In Pandas Dataframe?

I have a set of objects and their positions over time. I would like to get the average distance between objects for each time point. An example dataframe is as follows: time = [0,

Solution 1:

You could pass an array of the points to scipy.spatial.distance.pdist, which calculates the pairwise distances between Xi and Xj for every pair i < j, so each pair is counted once. Then take the mean.

import numpy as np
from scipy import spatial

df.groupby('time').apply(lambda x: spatial.distance.pdist(np.array(list(zip(x.x, x.y)))).mean())

Outputs:

time
0     1.550094
1    10.049876
2    53.037722
dtype: float64
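For readers who want to run this end to end, here is a minimal self-contained sketch of the same pdist approach. The sample frame is made up for illustration (the question's own data is truncated above), and the helper name mean_pairwise_distance is hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist

# Hypothetical sample data: three objects at time 0, two at time 1, two at time 2.
df = pd.DataFrame({
    "time": [0, 0, 0, 1, 1, 2, 2],
    "x": [216, 218, 217, 280, 290, 130, 132],
    "y": [13, 12, 12, 110, 109, 3, 56],
})

def mean_pairwise_distance(group):
    """Mean Euclidean distance over all unordered pairs of rows in `group`."""
    points = group[["x", "y"]].to_numpy(dtype=float)
    if len(points) < 2:
        # A single point has no pairs; return NaN instead of averaging nothing.
        return np.nan
    return pdist(points).mean()

# Selecting the coordinate columns keeps the grouping column out of each group.
result = df.groupby("time")[["x", "y"]].apply(mean_pairwise_distance)
print(result)
```

With this made-up frame the three group means come out at roughly 1.55, 10.05 and 53.04.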

Solution 2:

For me, using apply or a for loop does not make much difference:

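import numpy as np
import pandas as pd
from scipy import spatial

times = []
means = []

for t, g in df.groupby('time'):
    pts = g[['x','y']].values
    # full distance matrix; keep only the upper triangle so each pair appears once
    v = np.triu(spatial.distance.cdist(pts, pts), k=1)
    # mask the zeros left by triu so they do not count towards the mean
    # (note: this also masks genuine zero distances between coincident points)
    v = np.ma.masked_equal(v, 0)
    means.append(np.mean(v))
    times.append(t)

pd.DataFrame({'ave': means}, index=times)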

Out[250]: 
         ave
0   1.550094
1  10.049876
2  53.037722
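One caveat with masking zeros is that a genuine zero distance (two objects at the same position) is dropped from the average as well. A possible variant, sketched here on made-up data, picks the upper-triangle entries by index with np.triu_indices instead of masking. The helper name mean_upper_triangle is hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import cdist

# Made-up data: at time 0 two of the three objects sit on the same spot.
df = pd.DataFrame({
    "time": [0, 0, 0, 1, 1],
    "x": [0, 0, 3, 1, 4],
    "y": [0, 0, 4, 1, 5],
})

def mean_upper_triangle(points):
    """Mean of the strictly upper-triangular entries of the distance matrix."""
    d = cdist(points, points)
    rows, cols = np.triu_indices(len(points), k=1)  # index pairs with row < col
    return d[rows, cols].mean()

means = {}
for t, g in df.groupby("time"):
    means[t] = mean_upper_triangle(g[["x", "y"]].to_numpy(dtype=float))

ave = pd.DataFrame({"ave": pd.Series(means)})
print(ave)
```

Here the group at time 0 averages the distances 0, 5 and 5 to about 3.33, whereas masking zeros would drop the coincident pair and report 5.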

Solution 3:

Building this up from first principles:

For each point at index n, it is necessary to compute the distance with all the points with index > n.

If the distance between two points is given by the formula:

np.sqrt((x0 - x1)**2 + (y0 - y1)**2)

then, for an array of points in a dataframe, we can compute all the distances and take their mean:

distances = []
for i in range(len(df)-1):
    distances += np.sqrt( (df.x[i+1:] - df.x[i])**2 + (df.y[i+1:] - df.y[i])**2 ).tolist()

np.mean(distances)

Expressing the same logic using pd.concat and a couple of helper functions:

def diff_sq(x, i):
    return (x.iloc[i+1:] - x.iloc[i])**2

def dist_df(x, y, i):
    d_sq = diff_sq(x, i) + diff_sq(y, i)
    return np.sqrt(d_sq)

def avg_dist(df):
    return pd.concat([dist_df(df.x, df.y, i) for i in range(len(df)-1)]).mean()

Then it is possible to use the avg_dist function with groupby:

df.groupby('time').apply(avg_dist)
# outputs:
time
0     1.550094
1    10.049876
2    53.037722
dtype: float64
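The same "every point against every later point" idea can also be written without an explicit loop by letting NumPy broadcasting build the full difference matrices. This is only a sketch under assumptions: the sample frame is made up, and avg_dist_broadcast is a hypothetical name, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with columns time, x, y.
df = pd.DataFrame({
    "time": [0, 0, 0, 1, 1, 2, 2],
    "x": [216, 218, 217, 280, 290, 130, 132],
    "y": [13, 12, 12, 110, 109, 3, 56],
})

def avg_dist_broadcast(group):
    """Mean of sqrt((x_i - x_j)**2 + (y_i - y_j)**2) over all pairs with j > i."""
    x = group["x"].to_numpy(dtype=float)
    y = group["y"].to_numpy(dtype=float)
    # dx[i, j] = x[i] - x[j]; broadcasting expands both arrays to n x n.
    dx = x[:, None] - x[None, :]
    dy = y[:, None] - y[None, :]
    dist = np.sqrt(dx**2 + dy**2)
    # Keep each unordered pair once: entries strictly above the diagonal.
    i, j = np.triu_indices(len(x), k=1)
    # Assumes at least two points per group; a single point would give NaN.
    return dist[i, j].mean()

result = df.groupby("time")[["x", "y"]].apply(avg_dist_broadcast)
print(result)
```

The n x n intermediate arrays make this memory-hungry for large groups, so the loop or pdist may be the better choice there.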

Solution 4:

You could also use the itertools package to define your own function as follows:

import itertools
import numpy as np
import pandas as pd

def combinations(series):
    # squared difference for every unordered pair of values in the series
    l = list()
    for item in itertools.combinations(series, 2):
        l.append((item[0] - item[1])**2)
    return l

df2 = df.groupby('time').agg(combinations)
# column 0 holds the squared x differences, column 1 the squared y differences
df2['avg_distance'] = [np.mean(np.sqrt(pd.Series(df2.iloc[k,0]) +
                                       pd.Series(df2.iloc[k,1]))) for k in range(len(df2))]

df2.avg_distance.to_frame()

Then, the output is:

      avg_distance
time
0         1.550094
1        10.049876
2        53.037722
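A somewhat shorter take on the itertools idea is to pair up whole (x, y) points rather than each coordinate column separately. The sketch below assumes Python 3.8+ for math.dist and uses made-up sample data. The helper name mean_pair_distance is hypothetical.

```python
import itertools
import math

import pandas as pd

# Hypothetical sample data with columns time, x, y.
df = pd.DataFrame({
    "time": [0, 0, 0, 1, 1, 2, 2],
    "x": [216, 218, 217, 280, 290, 130, 132],
    "y": [13, 12, 12, 110, 109, 3, 56],
})

def mean_pair_distance(group):
    """Average Euclidean distance over every unordered pair of points."""
    points = list(zip(group["x"], group["y"]))
    dists = [math.dist(p, q) for p, q in itertools.combinations(points, 2)]
    # A group with fewer than two points has no pairs to average.
    return sum(dists) / len(dists) if dists else float("nan")

avg = (
    df.groupby("time")[["x", "y"]]
    .apply(mean_pair_distance)
    .rename("avg_distance")
    .to_frame()
)
print(avg)
```

This avoids storing lists inside the aggregated frame, at the cost of a pure-Python loop over pairs.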
