
Significant slowdown due to SSE2

deng

Sat Aug 13, 2011 10:32 am
Without further ado, here's the code:

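#include <Eigen/Dense>
#include <bench/BenchTimer.h>   // BenchTimer lives in the bench/ directory of the Eigen source tree
#include <iostream>
using namespace Eigen;
using namespace std;

int main()
{
   // Two sets of 20000 points, one point per column; the row count (3) is a run-time value.
   MatrixXd A = MatrixXd::Random(3,20000);
   MatrixXd B = MatrixXd::Random(3,20000);
   double dist, dAB=0.0;
   BenchTimer t;
   t.start();
   // Accumulate the squared distance between every column of A and every column of B.
   for(int i=0;i<A.cols();++i) {
      for(int j=0;j<B.cols();++j) {
         dist = (A.col(i) - B.col(j)).squaredNorm();
         dAB += dist;
      }
   }
   t.stop();
   cout << "dAB: " << dAB << endl;
   cout << "Time: " << t.value() << endl;
}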


Simply compiling with

g++ -lrt -DNDEBUG -O3 test.cpp

gives a run time of about 2.3 seconds, but if I compile with "-msse2" it takes about 5 seconds, so it is more than twice as slow. I'm guessing this has something to do with the low dimensionality of the column vectors, so that using SSE2 actually introduces too much overhead. How can I make Eigen detect that and not use SSE2 in this case? Unfortunately, I cannot use fixed-size matrices here, since the dimensionality isn't known at compile time.
ggael
Here I get 4 s versus 4.5 s, which is not that significant. Is the number of rows really not known at compile time, with 3 being just one example among many possibilities? I'm asking because you could use the Matrix3Xd type, which would lead to much better performance. The following version seems to be faster too:

for(int i=0;i<A.cols();++i)
   dAB += (A.col(i).rowwise().replicate(B.cols()) - B).squaredNorm();

If the number of rows is always small, an even faster solution would be to use row-major matrices. In the future, the above code with replicate should vectorize well in such cases. In the meantime, you can help Eigen vectorize by writing:

for(int i=0;i<A.cols();++i)
   for(int k=0;k<A.rows();++k)
      dAB += (A(k,i) - B.row(k).array()).square().sum();

where B has to be a row-major matrix; otherwise this version should be slower.
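
To illustrate why the storage order matters in the last loop, here is a minimal plain C++ sketch that does not use Eigen at all. The function names and the flat std::vector storage are assumptions made for illustration only. With column-major storage the innermost loop runs over just `rows` elements (3 in the test case), which leaves little for SIMD to work on. With row-major storage the innermost loop walks a long contiguous row of B, which is the access pattern the B.row(k) expression above relies on.

```cpp
#include <cstddef>
#include <vector>

// Sum over all columns j of ||a - B(:,j)||^2, with B stored column-major:
// element (k, j) lives at b[j * rows + k].
double sumSqDistColMajor(const std::vector<double>& a,
                         const std::vector<double>& b,
                         std::size_t rows, std::size_t cols)
{
    double sum = 0.0;
    for (std::size_t j = 0; j < cols; ++j) {
        // Short inner loop: only `rows` iterations per column.
        for (std::size_t k = 0; k < rows; ++k) {
            const double d = a[k] - b[j * rows + k];
            sum += d * d;
        }
    }
    return sum;
}

// Same quantity with B stored row-major: element (k, j) lives at b[k * cols + j].
double sumSqDistRowMajor(const std::vector<double>& a,
                         const std::vector<double>& b,
                         std::size_t rows, std::size_t cols)
{
    double sum = 0.0;
    for (std::size_t k = 0; k < rows; ++k) {
        const double ak = a[k];            // one scalar of a, reused for the whole row
        const double* row = &b[k * cols];  // contiguous row k of B
        // Long inner loop over contiguous memory.
        for (std::size_t j = 0; j < cols; ++j) {
            const double d = ak - row[j];
            sum += d * d;
        }
    }
    return sum;
}
```

Both functions compute the same sum; only the traversal order differs. Whether the row-major form is actually faster depends on the compiler, the flags and the sizes involved, so it is worth timing both.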
deng

Re: Significant slowdown due to SSE2

Wed Aug 17, 2011 11:21 am
Thanks again, Gael! Unfortunately, I do not know the dimensions at compile time; it could be something other than 3. Also, my snippet was just a heavily reduced test case. I have to go through the column vectors one by one, since for each one I have to do calculations that depend on the distances I've calculated with the vectors that came before, so I cannot do it all in one sweep.
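
One possible way to reconcile a run-time dimension with fixed-size code, sketched here in plain C++ without Eigen, is to compile kernels for the few dimensions that occur often and pick one at run time, with a generic fallback for everything else. This technique is not proposed anywhere in the thread. The function names, the choice of specialised sizes (2, 3, 4) and the flat column-major array are all assumptions made for illustration. Each call handles one column against all earlier columns, so it fits a loop that processes the columns one by one.

```cpp
#include <cstddef>

// Sum of squared distances from column i to columns 0..i-1.
// `data` is column-major: element (k, j) lives at data[j * Dim + k].
// Dim is a compile-time constant, so the inner loop can be fully unrolled.
template <std::size_t Dim>
double sumSqDistToEarlierFixed(const double* data, std::size_t i)
{
    const double* ci = data + i * Dim;
    double sum = 0.0;
    for (std::size_t j = 0; j < i; ++j) {
        const double* cj = data + j * Dim;
        for (std::size_t k = 0; k < Dim; ++k) {
            const double d = ci[k] - cj[k];
            sum += d * d;
        }
    }
    return sum;
}

// Generic fallback for dimensions without a specialised kernel.
double sumSqDistToEarlierDynamic(const double* data, std::size_t dim, std::size_t i)
{
    const double* ci = data + i * dim;
    double sum = 0.0;
    for (std::size_t j = 0; j < i; ++j) {
        const double* cj = data + j * dim;
        for (std::size_t k = 0; k < dim; ++k) {
            const double d = ci[k] - cj[k];
            sum += d * d;
        }
    }
    return sum;
}

// Run-time dispatch: one switch per column, not per pair of columns.
double sumSqDistToEarlier(const double* data, std::size_t dim, std::size_t i)
{
    switch (dim) {
        case 2:  return sumSqDistToEarlierFixed<2>(data, i);
        case 3:  return sumSqDistToEarlierFixed<3>(data, i);
        case 4:  return sumSqDistToEarlierFixed<4>(data, i);
        default: return sumSqDistToEarlierDynamic(data, dim, i);
    }
}
```

In Eigen terms, the fixed kernels would correspond to code written against types with a compile-time row count, such as the Matrix3Xd mentioned above, and the fallback to MatrixXd. Whether the extra code is worth it depends on how many distinct dimensions actually occur.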

