This forum has been archived. All content is frozen. Please use KDE Discuss instead.

Speed issue with operations on columns of a matrix

martin_IM
Registered Member
Hello,

I want to do operations that involve columns of a matrix, and I consistently find that Eigen is about 5 times slower than a loop-based implementation using pointers.
What am I doing wrong? Is the method .col() making a copy? If so, why?
Using the code below, I get these timings:

b=A.col(idcol)*5.0

Eigen : 50ms
Loop : 8ms




Code:
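#include <Eigen/Core>
#include <cstdio>    // std::getchar
#include <ctime>     // std::clock, CLOCKS_PER_SEC
#include <iostream>

int main()
{
    Eigen::MatrixXd A(100, 100);
    Eigen::VectorXd b(100);
    A.setRandom();   // initialize A so the benchmark does not read garbage values
    int idcol = 10;
    std::clock_t start;
    double duration;

    std::cout << "b=A.col(idcol)*5.0 \n";

    // Eigen implementation: scale one column of A into b
    start = std::clock();
    for (int t = 0; t < 100000; t++)
    {
        b = A.col(idcol) * 5.0;
    }
    duration = double(std::clock() - start) / double(CLOCKS_PER_SEC);
    std::cout << "   Eigen : " << 1000.0 * duration << "ms\n";

    // Loop implementation: walk the column with raw pointers
    // (valid because Eigen stores matrices column-major by default)
    start = std::clock();
    for (int t = 0; t < 100000; t++)
    {
        int nb = static_cast<int>(b.size());
        const double* A_ptr = &A(0, idcol);
        double* b_ptr = &b(0);
        for (int i = 0; i < nb; i++)
            (*b_ptr++) = (*A_ptr++) * 5.0;
    }
    duration = double(std::clock() - start) / double(CLOCKS_PER_SEC);
    std::cout << "   Loop :  " << 1000.0 * duration << "ms\n";

    std::getchar();   // keep the console window open
    return 0;
}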
ggael
Moderator
.col(i) does not make any copy. You should get similar performance. Make sure you compiled with optimizations enabled.
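Here is a minimal sketch of why a column view needs no copy. It assumes column-major storage (Eigen's default), where the view is just a pointer into the matrix's own buffer plus a length. `ColumnView`, `ColMajorMatrix` and `scale_into` are hypothetical names chosen for illustration. This is not how Eigen's `Block` class is actually written.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical non-owning view of one column of a column-major matrix:
// just a pointer into the matrix's storage plus a length.
struct ColumnView {
    double* data;
    std::size_t size;
    double& operator[](std::size_t i) const { return data[i]; }
};

// Column-major storage: element (i, j) is at data[i + j * rows],
// so the entries of one column are contiguous in memory.
struct ColMajorMatrix {
    std::size_t rows, cols;
    std::vector<double> data;

    ColMajorMatrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}

    double& operator()(std::size_t i, std::size_t j) { return data[i + j * rows]; }

    // O(1): builds a view over existing memory, no element is copied.
    ColumnView column(std::size_t j) { return ColumnView{data.data() + j * rows, rows}; }
};

// b = col * factor: the same pointer walk as the hand-written loop in the first post.
void scale_into(const ColumnView& col, double factor, std::vector<double>& b) {
    assert(b.size() >= col.size);  // caller must size b beforehand
    for (std::size_t i = 0; i < col.size; ++i) b[i] = col[i] * factor;
}
```

Once `A` is filled in, `A.column(10).data == &A(0, 10)` holds, and writing through the view changes `A` itself. Creating the view therefore costs the same regardless of the number of rows.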
martin_IM
Registered Member
ggael wrote: .col(i) does not make any copy. You should get similar performance. Make sure you compiled with optimizations enabled.

I had optimizations enabled (Maximize Speed (/O2) in Visual Studio 2010).

I compiled with SSE2 instructions enabled and increased the number of iterations to 10000000 (100x more), and I get these timings:

Eigen : 686ms
Loop : 531ms

Eigen is still slower than the loop implementation, even though it uses SSE2 instructions (the loop implementation does not).

Am I doing something wrong?
ggael
Moderator
Here Eigen is faster (gcc):
Eigen : 5.488ms
Loop : 7.081ms

Note that your example is memory bound (one load and one store for a single arithmetic operation). This is the worst-case scenario for getting a significant gain from vectorization. Try, for instance, X = X*3.3 + Y*2.9: the speedup will be much higher, probably close to 2 for doubles with in-cache matrices/vectors.
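To put the memory-bound remark into numbers, here is a rough per-element cost model for the two kernels mentioned above, plus plain-loop versions of both. The model rests on a few assumptions: each double is 8 bytes, only explicit loads and stores are counted, and caches, write-allocate traffic and prefetching are ignored. `KernelCost`, `scale` and `axpby` are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Per-element cost of a streaming kernel on doubles (8 bytes each).
// Simplified model (an assumption): count explicit loads and stores only,
// ignoring caches, write-allocate traffic and prefetching.
struct KernelCost {
    int flops;   // floating-point operations per element
    int loads;   // doubles read per element
    int stores;  // doubles written per element

    constexpr int bytes() const { return 8 * (loads + stores); }
    constexpr double intensity() const { return double(flops) / bytes(); }  // flops per byte
};

// b = a * 5.0: 1 multiply, 1 load, 1 store -> 1/16 flop per byte.
constexpr KernelCost kScale{1, 1, 1};

// X = X*3.3 + Y*2.9: 2 multiplies + 1 add, 2 loads, 1 store -> 3/24 = 1/8 flop per byte.
constexpr KernelCost kAxpby{3, 2, 1};

// Plain loop for b = a * s (b must already have a.size() elements).
void scale(const std::vector<double>& a, double s, std::vector<double>& b) {
    for (std::size_t i = 0; i < a.size(); ++i) b[i] = a[i] * s;
}

// Plain loop for x = x * alpha + y * beta (y must have at least x.size() elements).
void axpby(std::vector<double>& x, double alpha, const std::vector<double>& y, double beta) {
    for (std::size_t i = 0; i < x.size(); ++i) x[i] = x[i] * alpha + y[i] * beta;
}
```

Under this model, the second kernel does twice as much arithmetic per byte moved (1/8 vs 1/16 flop per byte). That leaves vectorization more room to help before memory bandwidth becomes the limit. Actual speedups still depend on the compiler, the CPU and whether the data fits in cache.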

