This forum has been archived. All content is frozen. Please use KDE Discuss instead.

Unexpectedly slow m.cwiseAbs().maxCoeff(&y,&x)

Tags: None
(comma "," separated)
SAnton
Registered Member
Posts
16
Karma
0
Hi.

I need to find position of element having maximal absolute value inside a matrix.

My testing code:

Code: Select all
Eigen::MatrixXd m( Eigen::MatrixXd::Random(5000, 5000) );

auto start = std::chrono::steady_clock::now();

Eigen::MatrixXd::Index y, x;
double const t = m.cwiseAbs().maxCoeff(&y, &x);

auto end = std::chrono::steady_clock::now();

std::cout << std::chrono::duration<double, std::milli>(end - start).count() << " ms" << std::endl;

The time taken of the code to run is 110ms.

But let us look at times of other (simpler) operations:
Code: Select all
m.maxCoeff()           : 29ms
m.maxCoeff(&y,&x)      : 35ms
m.cwiseAbs().maxCoeff(): 36ms

That is my reasoning:
We need 29ms to find value of maximal item in a matrix; we need 6ms more if we also need to locate that item; we need 7ms more if we need to take absolute value of each item but do not need to locate the maximal item. So, we will need 29 + 6 + 7 = 42ms to take absolute value and to locate maximal item.

What is going wrong?
User avatar
ggael
Moderator
Posts
3447
Karma
19
OS
Your problem is memory bound, therefore the absolute value should add zero overhead. This is what I observe on my system using either clang or gcc. I get 20ms when querying the x,y coordinates and 14.5ms otherwise. The speedup is due to vectorization and better pipelining.

Self contained example:
Code: Select all
#include <Eigen/Core>
#include <chrono>
#include <iostream>
int main() {
  Eigen::MatrixXd m( Eigen::MatrixXd::Random(5000, 5000) );

  auto start = std::chrono::steady_clock::now();

  Eigen::MatrixXd::Index y, x;
  double const t = m.cwiseAbs().maxCoeff(&y, &x);
//   double const t = m.cwiseAbs().maxCoeff();
//   double const t = m.maxCoeff();
//   double const t = m.maxCoeff(&y, &x);

  auto end = std::chrono::steady_clock::now();

  std::cout << std::chrono::duration<double, std::milli>(end - start).count() << " ms (" << t << "," << x << "," << y << ")" << std::endl;
}
SAnton
Registered Member
Posts
16
Karma
0
I just cannot understand why 2 iterations over matrix take 72ms:
Code: Select all
Eigen::MatrixXd m( Eigen::MatrixXd::Random(5000, 5000) );

auto start = std::chrono::steady_clock::now();

Eigen::MatrixXd::Index y, x;
double t;
   
Eigen::MatrixXd::Index y1, x1;
double const t1 = std::abs( m.minCoeff(&y1, &x1) );
Eigen::MatrixXd::Index y2, x2;
double const t2 = std::abs( m.maxCoeff(&y2, &x2) );
if (t1 > t2) {
   t = t1; x = x1; y = y1;
} else {
   t = t2; x = x2; y = y2;
}

auto end = std::chrono::steady_clock::now();

std::cout << std::chrono::duration<double, std::milli>(end - start).count() << " ms (" << t << "," << x << "," << y << ")" << std::endl;


while single iteration takes 108ms:
Code: Select all
Eigen::MatrixXd m( Eigen::MatrixXd::Random(5000, 5000) );

   auto start = std::chrono::steady_clock::now();

   Eigen::MatrixXd::Index y, x;
   double t;
   
   t = m.cwiseAbs().maxCoeff(&y, &x);

   auto end = std::chrono::steady_clock::now();

   std::cout << std::chrono::duration<double, std::milli>(end - start).count() << " ms (" << t << "," << x << "," << y << ")" << std::endl;


Another problem. This one takes 55ms:
Code: Select all
Eigen::MatrixXd m( Eigen::MatrixXd::Random(5000, 5000) );
Eigen::VectorXd    col( Eigen::VectorXd   ::Random(5000) );
Eigen::RowVectorXd row( Eigen::RowVectorXd::Random(5000) );

auto start = std::chrono::steady_clock::now();

for (int c = 0; c < 5000; ++c)
   for (int r = 0; r < 5000; ++r)
      m(r, c) -= col(r) * row(c);

auto end = std::chrono::steady_clock::now();

std::cout << std::chrono::duration<double, std::milli>(end - start).count() << " ms. " << m(2500, 2500) << std::endl;


while this one takes 210ms:
Code: Select all
Eigen::MatrixXd m( Eigen::MatrixXd::Random(5000, 5000) );
Eigen::VectorXd    col( Eigen::VectorXd   ::Random(5000) );
Eigen::RowVectorXd row( Eigen::RowVectorXd::Random(5000) );

auto start = std::chrono::steady_clock::now();

m -= col * row;

auto end = std::chrono::steady_clock::now();

std::cout << std::chrono::duration<double, std::milli>(end - start).count() << " ms. " << m(2500, 2500) << std::endl;


Why highly-optimized Eigen is 4x slower than 2 nested loops?
SAnton
Registered Member
Posts
16
Karma
0
Loop-based version of abs-max takes 40ms:
Code: Select all
Eigen::MatrixXd m( Eigen::MatrixXd::Random(5000, 5000) );

auto start = std::chrono::steady_clock::now();

Eigen::MatrixXd::Index y, x;
double t;
   
for (int c = 0; c < 5000; ++c) {
   for (int r = 0; r < 5000; ++r) {
      double const v = std::abs( m(r, c) );
      if (v > t) {
         t = v; y = r; x = c;
      }
   }
}
auto end = std::chrono::steady_clock::now();

std::cout << std::chrono::duration<double, std::milli>(end - start).count() << " ms (" << t << "," << x << "," << y << ")" << std::endl;


So, Eigen always looses to loop-based versions of code. :(
User avatar
ggael
Moderator
Posts
3447
Karma
19
OS
What's your compiler and compiler flags? I cannot reproduce any of your observations.
User avatar
ggael
Moderator
Posts
3447
Karma
19
OS
Here are my numbers:

** abs maxcoeff **

two iterations: 32ms
one iteration (t = m.cwiseAbs().maxCoeff(&y, &x);) 16ms
manual : 16ms

** for the outer product: **

manual: 22ms
m.noalias() -= col * row; : 21ms

Again, vectorization does not help because your pb is memory bound.
SAnton
Registered Member
Posts
16
Karma
0
I use Microsoft Visual Studio 12.0.30110.00 Update 1.
Compiler options are default for Release 32-bit Intel build.

By the way, noalias() makes Eigen 3x faster (but still slower shan simple nested loops).


Bookmarks



Who is online

Registered users: Baidu [Spider], Bing [Bot], Google [Bot]