Registered Member
|
Hi.
I need to find the position of the element with the largest absolute value in a matrix. My test code:
This code takes 110ms to run. But let us look at the timings of other (simpler) operations:
Here is my reasoning: we need 29ms to find the value of the maximal item in a matrix; we need 6ms more if we also need to locate that item; we need 7ms more if we take the absolute value of each item but do not locate the maximal item. So we should need 29 + 6 + 7 = 42ms to take the absolute value and locate the maximal item, not 110ms. What is going wrong? |
Moderator
|
Your problem is memory bound, so taking the absolute value should add essentially zero overhead. This is what I observe on my system using either clang or gcc. I get 20ms when querying the x,y coordinates and 14.5ms otherwise. The speedup without the coordinates is due to vectorization and better pipelining.
Self-contained example:
|
Registered Member
|
I just cannot understand why two iterations over the matrix take 72ms:
while a single iteration takes 108ms:
Another problem. This one takes 55ms:
while this one takes 210ms:
Why is highly optimized Eigen 4x slower than two nested loops? |
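For readers following along, here is a minimal sketch of a two-nested-loop rank-1 update, the plain-loop counterpart of the `m.noalias() -= col * row;` expression quoted further down the thread. It is not the poster's code. The column-major `std::vector<double>` storage and the name `rank1Subtract` are assumptions made for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: computes m -= col * row (outer product) in place.
// Assumes m is stored column-major with col.size() rows and row.size()
// columns, so m.size() == col.size() * row.size().
void rank1Subtract(std::vector<double>& m,
                   const std::vector<double>& col,
                   const std::vector<double>& row) {
    const std::size_t rows = col.size();
    const std::size_t cols = row.size();
    for (std::size_t c = 0; c < cols; ++c) {
        const double scale = row[c];            // one factor per column
        double* column = m.data() + c * rows;   // contiguous block of memory
        for (std::size_t r = 0; r < rows; ++r) {
            column[r] -= col[r] * scale;        // single pass, no temporary
        }
    }
}
```

The loop touches each matrix element exactly once and never materializes the product. Eigen's documentation states that matrix products are assumed to alias by default and are evaluated into a temporary unless `noalias()` is used, which may account for part of the gap reported here.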
Registered Member
|
The loop-based version of abs-max takes 40ms:
So Eigen always loses to the loop-based versions of the code. |
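A loop-based abs-max of the kind being compared here could look like the sketch below. It is not the poster's code. The column-major `std::vector<double>` storage and the names `AbsMaxResult` and `absMaxLoop` are assumptions made for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical result type: the largest |value| and where it was found.
struct AbsMaxResult {
    double value;
    std::size_t row;
    std::size_t col;
};

// Single pass over a column-major rows x cols matrix stored in `data`.
// Assumes data.size() == rows * cols and that the matrix is non-empty.
AbsMaxResult absMaxLoop(const std::vector<double>& data,
                        std::size_t rows, std::size_t cols) {
    AbsMaxResult best{std::fabs(data[0]), 0, 0};
    for (std::size_t c = 0; c < cols; ++c) {
        const double* column = data.data() + c * rows;  // contiguous in memory
        for (std::size_t r = 0; r < rows; ++r) {
            const double a = std::fabs(column[r]);
            if (a > best.value) {  // strict '>' keeps the first maximum found
                best = {a, r, c};
            }
        }
    }
    return best;
}
```

This does the same job as the `m.cwiseAbs().maxCoeff(&y, &x)` call quoted later in the thread: one traversal that takes the absolute value and tracks the coordinates together.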
Moderator
|
What are your compiler and compiler flags? I cannot reproduce any of your observations.
|
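When timings disagree between machines, it helps to measure every variant with the same harness. The sketch below is one possible way to do that with `std::chrono`. The helper name `bestOfMs` and the best-of-N policy are assumptions, not something used by either poster.

```cpp
#include <algorithm>
#include <chrono>
#include <limits>

// Hypothetical helper: runs `work` `repeats` times and returns the fastest
// run in milliseconds. Taking the minimum reduces noise from cold caches
// and other processes.
template <typename Work>
double bestOfMs(Work&& work, int repeats = 10) {
    using Clock = std::chrono::steady_clock;
    double best = std::numeric_limits<double>::infinity();
    for (int i = 0; i < repeats; ++i) {
        const auto start = Clock::now();
        work();
        const std::chrono::duration<double, std::milli> elapsed =
            Clock::now() - start;
        best = std::min(best, elapsed.count());
    }
    return best;
}
```

The callable passed in should store its result somewhere the compiler cannot ignore (for example a `volatile` variable), otherwise an optimizing build may remove the measured work entirely.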
Moderator
|
Here are my numbers:
** abs maxcoeff **
two iterations: 32ms
one iteration (t = m.cwiseAbs().maxCoeff(&y, &x);): 16ms
manual: 16ms
** for the outer product **
manual: 22ms
m.noalias() -= col * row; : 21ms
Again, vectorization does not help because your problem is memory bound. |
Registered Member
|
I use Microsoft Visual Studio 12.0.30110.00 Update 1.
Compiler options are the defaults for a Release 32-bit Intel build. By the way, noalias() makes Eigen 3x faster (but still slower than simple nested loops). |