Compute Matrix Of Pairwise Angles Between Two Arrays Of Points
Solution 1:
There are multiple ways to do this:
import numpy.linalg as la
from scipy.spatial import distance as dist
# Manuallydefmethod0(x, y):
dotprod_mat = np.dot(x, y.T)
costheta = dotprod_mat / la.norm(x, axis=1)[:, np.newaxis]
costheta /= la.norm(y, axis=1)
return np.arccos(costheta)
# Using einsumdefmethod1(x, y):
dotprod_mat = np.einsum('ij,kj->ik', x, y)
costheta = dotprod_mat / la.norm(x, axis=1)[:, np.newaxis]
costheta /= la.norm(y, axis=1)
return np.arccos(costheta)
# Using scipy.spatial.cdist (one-liner)defmethod2(x, y):
costheta = 1 - dist.cdist(x, y, 'cosine')
return np.arccos(costheta)
# Realize that your arrays `x` and `y` are already normalized, meaning you can# optimize method1 even moredefmethod3(x, y):
costheta = np.einsum('ij,kj->ik', x, y) # Directly gives costheta, since# ||x|| = ||y|| = 1return np.arccos(costheta)
Timing results for (n, m) = (1212, 252):
>>>%timeit theta = method0(x, y)
100 loops, best of 3: 11.1 ms per loop
>>>%timeit theta = method1(x, y)
100 loops, best of 3: 10.8 ms per loop
>>>%timeit theta = method2(x, y)
100 loops, best of 3: 12.3 ms per loop
>>>%timeit theta = method3(x, y)
100 loops, best of 3: 9.42 ms per loop
The difference in timing reduces as the number of elements increases. For (n, m) = (6252, 1212):
>>>%timeit -n10 theta = method0(x, y)
10 loops, best of 3: 365 ms per loop
>>>%timeit -n10 theta = method1(x, y)
10 loops, best of 3: 358 ms per loop
>>>%timeit -n10 theta = method2(x, y)
10 loops, best of 3: 384 ms per loop
>>>%timeit -n10 theta = method3(x, y)
10 loops, best of 3: 314 ms per loop
However, if you leave out the np.arccos
step, i.e., suppose you could manage with just costheta
, and didn't needtheta
itself, then:
>>>%timeit costheta = np.einsum('ij,kj->ik', x, y)
10 loops, best of 3: 61.3 ms per loop
>>>%timeit costheta = 1 - dist.cdist(x, y, 'cosine')
10 loops, best of 3: 124 ms per loop
>>>%timeit costheta = dist.cdist(x, y, 'cosine')
10 loops, best of 3: 112 ms per loop
This is for the case of (6252, 1212). So actually np.arccos
is taking up 80% of the time. In this case I find that np.einsum
is much faster than dist.cdist
. So you definitely want to be using einsum
.
Summary: Results for theta
are largely similar, but np.einsum
is fastest for me, especially when you're not extraneously computing the norms. Try to avoid computing theta
and working with just costheta
.
Note: An important point I didn't mention is that finiteness of floating-point precision can cause np.arccos
to give nan
values. method[0:3]
worked for values of x
and y
that hadn't been properly normalized, naturally. But method3
gave a few nan
s. I fixed this with pre-normalization, which naturally destroys any gain in using method3
, unless you need to do this computation many many times for a small set of pre-normalized matrices (for whatever reason).
Post a Comment for "Compute Matrix Of Pairwise Angles Between Two Arrays Of Points"