This forum has been archived. All content is frozen. Please use KDE Discuss instead.

product of a dense matrix by a sparse vector too slow

martin_IM
Registered Member
Hello

I want to find a fast method to multiply a dense matrix of size 50 by 1000 by a vector of size 1000 by 1 in which 80% of the coefficients are zero. I tried several approaches (methods 2 to 4 in the code below) but could not get any of them to run faster than the plain matrix multiplication (method 1 in the code below), while I would expect a speedup of up to 80% to be achievable.
I managed to get a very small speedup using pointer arithmetic, but the speedup is not as good as expected, probably because the use of pointer arithmetic prevents optimizations based on SSE instructions (Streaming SIMD Extensions).
How could I get closer to an 80% speedup using Eigen's accelerated functions?

thanks

Code:
#include <Eigen/Core>
#include <time.h>
#include <iostream>
#define EIGEN_YES_I_KNOW_SPARSE_MODULE_IS_NOT_STABLE_YET
#include <Eigen/Sparse>


int main(){

  Eigen::MatrixXd M=Eigen::MatrixXd::Random(64,1000);
  Eigen::VectorXd B=Eigen::VectorXd::Random(1000);
  B=(B.array()>0).select(B,0);// about 50% of B's coefficients are now zero
  Eigen::VectorXd A;

  clock_t start;
  double duration;   
  int niter=10000;


  // method 1: does not take advantage of the fact that B contains 50% zeros;
  // takes 450 ms for 10000 loops using the SSE instruction set
  start=clock();
  for (int i=0;i<niter;i++)
  {A=M*B;}
  duration=double(clock()-start)/(double)CLOCKS_PER_SEC;
  std::cout<<"method 1 took "<<double(1000*duration)<<"ms\n";


  // method 2: aims to take advantage of the fact that B contains 50% zeros,
  // but is actually slower (506 ms for 10000 loops using the SSE instruction set),
  // perhaps because the col() method makes copies
  start=clock();
  for (int i=0;i<niter;i++){
    A.resize(M.rows()); // A must match M's row count (64 here)
    A.setZero();
    for(int i=0;i<B.rows();i++)   
    { if (B[i]!=0)
    {
      A+=M.col(i)*B[i] ;           
    }
    }}
  duration=double(clock()-start)/(double)CLOCKS_PER_SEC;
  std::cout<<"method 2 took "<<double(1000*duration)<<"ms\n";


  // method 3: using a Map instead of the col() method to get a single column;
  // as slow as method 2 despite the use of Map, so it seems there is
  // still a copy occurring...
  start=clock();
  for (int i=0;i<niter;i++){
    A.resize(M.rows()); // A must match M's row count (64 here)
    A.setZero();
    int nrows=M.rows();
    for(int i=0;i<B.rows();i++)   
    { if (B[i]!=0)
    { const Eigen::Map<const Eigen::VectorXd> columnMap(M.data()+i*nrows,nrows);
    A+=columnMap*B[i]  ;           
    }
    }}
  duration=double(clock()-start)/(double)CLOCKS_PER_SEC;
  std::cout<<"method 3 took "<<double(1000*duration)<<"ms\n";



  // method 4 : using a sparse representation of the vector B; takes about 530 ms using the SSE instruction set
  Eigen::SparseVector<double> Bsparse(B.rows());
  for(int i=0;i<B.rows();i++)
  { if (B[i]!=0){Bsparse.insert(i)=B[i];}}
  start=clock(); // time only the products, not the construction of Bsparse
  for (int i=0;i<niter;i++){   
    A=M*Bsparse;
  }
  duration=double(clock()-start)/(double)CLOCKS_PER_SEC;
  std::cout<<"method 4 took "<<double(1000*duration)<<"ms\n";   


  // method 5 : using pointer arithmetic; 430 ms, the fastest, but not as fast as expected,
  // maybe because it does not benefit from SSE instruction speedups
  start=clock();
  for (int i=0;i<niter;i++){
    int nrows=M.rows();
    A.resize(nrows); // A must match M's row count (64 here)
    A.setZero();
    double *Aptr= A.data();   
    const double *Mptr= M.data();
    const double *Bptr= B.data();
    for(int j=M.cols();j>0;j--)   
    {
      double tmp =(*Bptr++);
      if (tmp!=0){
        double *Aptr2=Aptr;
        for(int i=nrows;i>0;i--)
          *(Aptr2++)+=(*(Mptr++))*tmp;       
      }
      else   
        Mptr+=nrows;     
    }
  }
  duration=double(clock()-start)/(double)CLOCKS_PER_SEC;
  std::cout<<"method 5 took "<<double(1000*duration)<<"ms\n";   
  getchar();
  return 0;

}

   

Last edited by martin_IM on Wed Feb 01, 2012 11:10 am, edited 2 times in total.
ggael
Moderator
This is because the default dense matrix*vector product is highly optimized, and your vector is not sparse enough to compensate for a more naive matrix-vector product algorithm. You might try to compress the matrix M with respect to the sparsity of B such that:

A = M*B = M' * B'

where B' contains no zeros. But I'm afraid the extra copy will cost too much; try it and see!


Btw, M.col(j) returns a proxy, so no copy occurs. In fact, Map and M.col(i) are based on the same MapBase class; they are the same thing!
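For what it's worth, the compression idea above can be sketched with plain column-major arrays (no Eigen types; the function name and layout here are just for illustration): gather B' and the matching columns of M into M', then run a dense product on the compressed data, i.e. A = M' * B'.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A = M * B computed as M' * B', where M' keeps only the columns of M
// matching the nonzero coefficients of B. M is column-major, rows x cols.
// Plain arrays stand in for Eigen types; names are illustrative only.
std::vector<double> compressedProduct(const std::vector<double>& M,
                                      std::size_t rows, std::size_t cols,
                                      const std::vector<double>& B) {
    // Gather B' and M' (the compressed vector and matrix).
    std::vector<double> Bsub, Msub;
    for (std::size_t j = 0; j < cols; ++j)
        if (B[j] != 0.0) {
            Bsub.push_back(B[j]);
            Msub.insert(Msub.end(), M.begin() + j * rows,
                        M.begin() + (j + 1) * rows);
        }

    // Dense product A = M' * B' on the compressed data.
    std::vector<double> A(rows, 0.0);
    for (std::size_t k = 0; k < Bsub.size(); ++k)
        for (std::size_t i = 0; i < rows; ++i)
            A[i] += Msub[k * rows + i] * Bsub[k];
    return A;
}
```

Whether this wins depends on how much the gather copy costs relative to the columns it lets you skip.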
martin_IM
Registered Member
I tried the solution proposed by ggael, but my implementation is even slower, due to copies I guess...


Code:
  // method 6 : using the submatrix of M corresponding to the nonzero values of B;
  // takes 1700 ms
  start=clock();
  for (int i=0;i<niter;i++)
  {
    // constructing the submatrix and the subvector
    int nnonzeros=0;
    for(int i=0;i<B.rows();i++)   
      if (B[i]!=0){nnonzeros++;}   
    Eigen::MatrixXd Msub;
    Msub.resize(64,nnonzeros);
    Eigen::VectorXd Bsub;
    Bsub.resize(nnonzeros);   
    int j=0;
    for(int i=0;i<B.rows();i++)
    { if (B[i]!=0)
      {
        Msub.col(j)=M.col(i);
        Bsub(j)=B[i]; // copy the matching nonzero coefficient into the subvector
        j++;
      }
    }
    // making the product of the submatrix time the sub vector
    A=Msub*Bsub;
  }
 
  duration=double(clock()-start)/(double)CLOCKS_PER_SEC;
  std::cout<<"method 6 took "<<double(1000*duration)<<"ms\n";
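Much of the 1700 ms here is spent rebuilding Msub on every iteration. If the nonzero pattern of B were fixed across products, the gather could be hoisted out of the timing loop, leaving only the compressed product inside it. An Eigen-free sketch of that idea (the struct and names are hypothetical, not part of the code above):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Precomputed compressed operator: gathers the columns of M selected by the
// nonzero pattern of B once, so repeated products touch only compressed data.
// Plain column-major arrays stand in for Eigen types; names are illustrative.
struct CompressedOp {
    std::size_t rows;
    std::vector<std::size_t> idx; // indices of the nonzero coefficients of B
    std::vector<double> Msub;     // gathered columns of M, column-major

    CompressedOp(const std::vector<double>& M, std::size_t rows_,
                 const std::vector<double>& pattern)
        : rows(rows_) {
        for (std::size_t j = 0; j < pattern.size(); ++j)
            if (pattern[j] != 0.0) {
                idx.push_back(j);
                Msub.insert(Msub.end(), M.begin() + j * rows,
                            M.begin() + (j + 1) * rows);
            }
    }

    // A = Msub * Bsub, gathering Bsub on the fly (only idx.size() reads of B).
    std::vector<double> apply(const std::vector<double>& B) const {
        std::vector<double> A(rows, 0.0);
        for (std::size_t k = 0; k < idx.size(); ++k) {
            const double b = B[idx[k]];
            const double* col = &Msub[k * rows];
            for (std::size_t i = 0; i < rows; ++i)
                A[i] += col[i] * b;
        }
        return A;
    }
};
```

In the benchmark above the pattern changes with B, so the gather cost cannot be amortized this way; the sketch only shows where the 1700 ms would go if it could.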
   

