This forum has been archived. All content is frozen. Please use KDE Discuss instead.

product of a dense matrix by a sparse vector too slow

martin_IM
Registered Member
Hello

I want to find a fast method to multiply a dense matrix of size 50 by 1000 by a vector of size 1000 by 1 in which 80% of the coefficients are zero. I tried several approaches (methods 2 to 4 in the code below) but could not get any of them to run faster than the plain matrix multiplication (method 1 in the code below), while I would expect a speedup of up to 80% to be achievable.
I managed to get a very small speedup using pointer arithmetic, but the speedup is not as good as expected, probably because the use of pointer arithmetic prevents optimizations based on SSE instructions (Streaming SIMD Extensions).
How could I get closer to an 80% speedup using Eigen's accelerated functions?

thanks

Code:
#include <Eigen/Core>
#include <time.h>
#include <iostream>
#define EIGEN_YES_I_KNOW_SPARSE_MODULE_IS_NOT_STABLE_YET
#include <Eigen/Sparse>


int main(){

  Eigen::MatrixXd M=Eigen::MatrixXd::Random(64,1000);
  Eigen::VectorXd B=Eigen::VectorXd::Random(1000);
  B=(B.array()>0).select(B,0);// about 50% of B's coefficients are now zero
  Eigen::VectorXd A;

  clock_t start;
  double duration;   
  int niter=10000;


  // method 1: does not take advantage of the fact that B contains 50% zeros;
  // takes 450 ms for 10000 loops using the SSE instruction set
  start=clock();
  for (int i=0;i<niter;i++)
  {A=M*B;}
  duration=double(clock()-start)/(double)CLOCKS_PER_SEC;
  std::cout<<"method 1 took "<<double(1000*duration)<<"ms\n";


  // method 2: aims to take advantage of the fact that B contains 50% zeros,
  // but is actually slower (506 ms for 10000 loops using the SSE instruction set),
  // perhaps because the col() method makes copies
  start=clock();
  for (int i=0;i<niter;i++){
    A.resize(M.rows()); // A must match M's row count (64 here)
    A.setZero();
    for(int i=0;i<B.rows();i++)   
    { if (B[i]!=0)
    {
      A+=M.col(i)*B[i] ;           
    }
    }}
  duration=double(clock()-start)/(double)CLOCKS_PER_SEC;
  std::cout<<"method 2 took "<<double(1000*duration)<<"ms\n";


  // method 3: using a Map instead of the col() method to get a single column;
  // as slow as method 2 despite the use of Map, so it seems there is
  // still a copy occurring...
  start=clock();
  for (int i=0;i<niter;i++){
    A.resize(M.rows()); // A must match M's row count (64 here)
    A.setZero();
    int nrows=M.rows();
    for(int i=0;i<B.rows();i++)   
    { if (B[i]!=0)
    { const Eigen::Map<const Eigen::VectorXd> columnMap(M.data()+i*nrows,nrows);
    A+=columnMap*B[i]  ;           
    }
    }}
  duration=double(clock()-start)/(double)CLOCKS_PER_SEC;
  std::cout<<"method 3 took "<<double(1000*duration)<<"ms\n";



  // method 4 : using a sparse representation of the vector B; takes about 530 ms using the SSE instruction set
  Eigen::SparseVector<double> Bsparse(B.rows());
  for(int i=0;i<B.rows();i++)
  { if (B[i]!=0){Bsparse.insert(i)=B[i];}}
  start=clock(); // time only the products, not the construction of Bsparse
  for (int i=0;i<niter;i++){   
    A=M*Bsparse;
  }
  duration=double(clock()-start)/(double)CLOCKS_PER_SEC;
  std::cout<<"method 4 took "<<double(1000*duration)<<"ms\n";   


  // method 5 : using pointer arithmetic; 430 ms, the fastest, but not as fast as expected,
  // maybe because it does not benefit from SSE instruction speedups
  start=clock();
  for (int i=0;i<niter;i++){
    int nrows=M.rows();
    A.resize(nrows); // A must match M's row count (64 here)
    A.setZero();
    double *Aptr= A.data();   
    const double *Mptr= M.data();
    const double *Bptr= B.data();
    for(int j=M.cols();j>0;j--)   
    {
      double tmp =(*Bptr++);
      if (tmp!=0){
        double *Aptr2=Aptr;
        for(int i=nrows;i>0;i--)
          *(Aptr2++)+=(*(Mptr++))*tmp;       
      }
      else   
        Mptr+=nrows;     
    }
  }
  duration=double(clock()-start)/(double)CLOCKS_PER_SEC;
  std::cout<<"method 5 took "<<double(1000*duration)<<"ms\n";   
  getchar();
  return 0;

}

   

Last edited by martin_IM on Wed Feb 01, 2012 11:10 am, edited 2 times in total.
ggael
Moderator
This is because the default dense matrix*vector product is highly optimized, and your vector is not sparse enough to compensate for a more naive matrix-vector product algorithm. You might try to compress the matrix M with respect to the sparsity of B such that:

A = M*B = M' * B'

where B' contains no zeros. But I'm afraid the extra copy will cost too much; try it and see!


Btw, M.col(j) returns a proxy, so no copy occurs. In fact, Map and M.col(i) are based on the same MapBase class; they are the same thing!
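For what it's worth, the compression idea above can be sketched with plain column-major arrays (no Eigen types; the function name and layout here are just for illustration): gather B' and the matching columns of M into M', then run a dense product on the compressed data, i.e. A = M' * B'.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A = M * B computed as M' * B', where M' keeps only the columns of M
// matching the nonzero coefficients of B. M is column-major, rows x cols.
// Plain arrays stand in for Eigen types; names are illustrative only.
std::vector<double> compressedProduct(const std::vector<double>& M,
                                      std::size_t rows, std::size_t cols,
                                      const std::vector<double>& B) {
    // Gather B' and M' (the compressed vector and matrix).
    std::vector<double> Bsub, Msub;
    for (std::size_t j = 0; j < cols; ++j)
        if (B[j] != 0.0) {
            Bsub.push_back(B[j]);
            Msub.insert(Msub.end(), M.begin() + j * rows,
                        M.begin() + (j + 1) * rows);
        }

    // Dense product A = M' * B' on the compressed data.
    std::vector<double> A(rows, 0.0);
    for (std::size_t k = 0; k < Bsub.size(); ++k)
        for (std::size_t i = 0; i < rows; ++i)
            A[i] += Msub[k * rows + i] * Bsub[k];
    return A;
}
```

Whether this wins depends on how much the gather copy costs relative to the columns it lets you skip.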
martin_IM
Registered Member
I tried the solution proposed by ggael, but my implementation is even slower, due to copies I guess...


Code:
  // method 6 : using the submatrix of M corresponding to the nonzero values of B;
  // takes 1700 ms
  start=clock();
  for (int i=0;i<niter;i++)
  {
    // constructing the submatrix and the subvector
    int nnonzeros=0;
    for(int i=0;i<B.rows();i++)   
      if (B[i]!=0){nnonzeros++;}   
    Eigen::MatrixXd Msub;
    Msub.resize(64,nnonzeros);
    Eigen::VectorXd Bsub;
    Bsub.resize(nnonzeros);   
    int j=0;
    for(int i=0;i<B.rows();i++)
    { if (B[i]!=0)
      {
        Msub.col(j)=M.col(i);
        Bsub(j)=B[i]; // copy the matching nonzero coefficient into the subvector
        j++;
      }
    }
    // making the product of the submatrix time the sub vector
    A=Msub*Bsub;
  }
 
  duration=double(clock()-start)/(double)CLOCKS_PER_SEC;
  std::cout<<"method 6 took "<<double(1000*duration)<<"ms\n";
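Much of the 1700 ms here is spent rebuilding Msub on every iteration. If the nonzero pattern of B were fixed across products, the gather could be hoisted out of the timing loop, leaving only the compressed product inside it. An Eigen-free sketch of that idea (the struct and names are hypothetical, not part of the code above):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Precomputed compressed operator: gathers the columns of M selected by the
// nonzero pattern of B once, so repeated products touch only compressed data.
// Plain column-major arrays stand in for Eigen types; names are illustrative.
struct CompressedOp {
    std::size_t rows;
    std::vector<std::size_t> idx; // indices of the nonzero coefficients of B
    std::vector<double> Msub;     // gathered columns of M, column-major

    CompressedOp(const std::vector<double>& M, std::size_t rows_,
                 const std::vector<double>& pattern)
        : rows(rows_) {
        for (std::size_t j = 0; j < pattern.size(); ++j)
            if (pattern[j] != 0.0) {
                idx.push_back(j);
                Msub.insert(Msub.end(), M.begin() + j * rows,
                            M.begin() + (j + 1) * rows);
            }
    }

    // A = Msub * Bsub, gathering Bsub on the fly (only idx.size() reads of B).
    std::vector<double> apply(const std::vector<double>& B) const {
        std::vector<double> A(rows, 0.0);
        for (std::size_t k = 0; k < idx.size(); ++k) {
            const double b = B[idx[k]];
            const double* col = &Msub[k * rows];
            for (std::size_t i = 0; i < rows; ++i)
                A[i] += col[i] * b;
        }
        return A;
    }
};
```

In the benchmark above the pattern changes with B, so the gather cost cannot be amortized this way; the sketch only shows where the 1700 ms would go if it could.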
   

