Extremely Slow Sum Row Operation In Sparse Lil Matrix In Python
Solution 1:
First a demo with a very small matrix
In [523]: idx=np.arange(0,8,2)
In [526]: D=np.arange(24).reshape(8,3)
In [527]: Dll=sparse.lil_matrix(D)
In [528]: D[idx,:].sum(axis=0)
Out[528]: array([36, 40, 44])
In [529]: Dll[idx,:].sum(axis=0)
Out[529]: matrix([[36, 40, 44]], dtype=int32)
In [530]: timeit D[idx,:].sum(axis=0)
100000 loops, best of 3: 17.3 µs per loop
In [531]: timeit Dll[idx,:].sum(axis=0)
1000 loops, best of 3: 1.16 ms per loop
In [532]: score=np.zeros(3) # your looping version
In [533]: for i in idx:
.....: score = score + Dll[i,:]
In [534]: score
Out[534]: matrix([[ 36., 40., 44.]])
In [535]: %%timeit
.....: score=np.zeros(3)
.....: for i in idx:
score = score + Dll[i,:]
.....:
100 loops, best of 3: 2.76 ms per loop
For some operations the csr
format is faster:
In [537]: timeit Dll.tocsr()[idx,:].sum(axis=0)
1000 loops, best of 3: 955 µs per loop
or if I preconvert to csr:
In [538]: Dcsr=Dll.tocsr()
In [539]: timeit Dcsr[idx,:].sum(axis=0)
1000 loops, best of 3: 724 µs per loop
Still slow relative to dense.
I was going to talk about working with the data attributes of the sparse matrix as a way of selecting rows faster. But if the only purpose for selecting these rows is to sum their values we don't need to do that.
Sparse matrices sum on rows or columns by doing a matrix product with a column or row matrix of ones. And I just answered another question with the same answer.
https://stackoverflow.com/a/37120235/901925Efficiently compute columnwise sum of sparse array where every non-zero element is 1
For example:
In [588]: I=np.asmatrix(np.zeros((1,Dll.shape[0])))
In [589]: I[:,idx]=1
In [590]: I
Out[590]: matrix([[ 1., 0., 1., 0., 1., 0., 1., 0.]])
In [591]: I*Dll
Out[591]: matrix([[ 36., 40., 44.]])
In [592]: %%timeit
I=np.asmatrix(np.zeros((1,Dll.shape[0])))
I[:,idx]=1
I*Dll
.....:
1000 loops, best of 3: 919 µs per loop
For this small matrix it did not help the speed, but with the Dcsr
time drops to 215 µs
(it's much better for math). With large matrices this product version will improve.
=================
I just found out, in another question, that a A_csr[[1,1,0,3],:]
row selection is actually done with a matrix product. It constructs an 'extractor' csr matrix that looks like
matrix([[0, 1, 0, 0],
[0, 1, 0, 0],
[1, 0, 0, 0],
[0, 0, 0, 1]])
Post a Comment for "Extremely Slow Sum Row Operation In Sparse Lil Matrix In Python"