
[SOLVED] [newbie] speed of At*A on a dense matrix

rbossart
Hi again,

Another thread to ask for help on a simple task: I'm trying to use Eigen to beat Matlab (x86_64) on AA=A'*A for a dense matrix A=rand(4000).

My small experiments show that I'm slower. Given the very nice benchmark results on the website, I assume (hope) I'm misusing Eigen.

In Matlab, this is what I run:
A=rand(4000);
tic; AA=A'*A; toc =>7.1s

With Eigen, the following was compiled with g++-4.3.2, -O3 and -msse2 (using svn 898458):
...
const int size=4000;
MatrixXf A = MatrixXf::Random(size,size);
MatrixXf AA = A; // initialization
timer.start();
AA = A.transpose() * A;
timer.stop();
std::cout << "product:\t" << timer.value() << std::endl;


I get 11s.
Could you give me some hints? I think I'm missing something basic.

BTW, I have similar performance problems using CHOLMOD through Eigen compared to the CHOLMOD backend in Matlab. Of course, if I'm doing something wrong here, I'm probably doing it wrong with CHOLMOD too.

Thanks in advance,
Romain

Last edited by bjacob on Mon Dec 22, 2008 5:28 am, edited 1 time in total.


rbossart, proud to be a member of KDE forums since 2008-Dec.
bjacob
First of all, our benchmark only goes up to size 1000, not 4000.

Looking at your code, it seems that you are using Eigen correctly (the AA = A line is unnecessary, but it's not part of what you're timing anyway).

Your results (11s for Eigen versus 7s for Matlab) are compatible with our benchmark: look at INTEL_MKL and GOTO curves on the A^T x A benchmark, they are above us by a similar margin.

So it seems that Matlab performs similarly to INTEL_MKL and GOTO; perhaps Matlab actually uses one of these two libraries.

rbossart wrote:
BTW, I have similar performance problems using CHOLMOD through Eigen compared to the CHOLMOD backend in Matlab. Of course, if I'm doing something wrong here, I'm probably doing it wrong with CHOLMOD too.


That sounds more worrying: just using the same library as a backend should give roughly the same performance.


Join us on Eigen's IRC channel: #eigen on irc.freenode.net
Have a serious interest in Eigen? Then join the mailing list!
rbossart
bjacob wrote:
BTW, I have similar performance problems using CHOLMOD through Eigen compared to the CHOLMOD backend in Matlab. Of course, if I'm doing something wrong here, I'm probably doing it wrong with CHOLMOD too.


That sounds more worrying: just using the same library as a backend should give roughly the same performance.


Thanks Benoît,

I'll give more details on this problem. I don't understand either why I don't get the same performance yet.
Regards,
Romain


ggael
bjacob wrote:
First of all, our benchmark only goes up to size 1000, not 4000.

Looking at your code, it seems that you are using Eigen correctly (the AA = A is useless but that's not part of what you're timing anyway).

Your results (11s for Eigen versus 7s for Matlab) are compatible with our benchmark: look at INTEL_MKL and GOTO curves on the A^T x A benchmark, they are above us by a similar margin.

So it seems that Matlab performs similarly to INTEL_MKL and GOTO; perhaps Matlab actually uses one of these two libraries.


Exactly, I'm pretty sure Matlab uses one of these libraries. Moreover, 11s is not bad: it means a rate of about 12 GFlops, which is OK with respect to our benchmark. Nevertheless, it would be nice to implement an even smarter matrix product algorithm; e.g., GOTO's algorithm is explained in a paper (ACM Transactions on Mathematical Software). If someone is interested...
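The GFlops figure follows from a simple count. A sketch, assuming the conventional 2*n^3 flop count for a dense n x n product (n^2 output entries, each needing n multiplies and about n adds); it ignores that A^T*A is symmetric, which a specialized routine could exploit to roughly halve the work. The helper names `matmul_flops` and `gflops` are hypothetical.

```cpp
#include <cassert>

// Conventional flop count of a dense n x n by n x n product:
// n*n output entries, each needing n multiplies and (about) n adds.
double matmul_flops(double n) {
    assert(n > 0);
    return 2.0 * n * n * n;
}

// Sustained rate in GFlops for a product of size n that took `seconds`.
double gflops(double n, double seconds) {
    assert(seconds > 0);
    return matmul_flops(n) / seconds / 1e9;
}
```

With the thread's numbers, `gflops(4000, 11.0)` is about 11.6 (the "about 12 GFlops" above) and `gflops(4000, 7.1)` is about 18.0 for the Matlab run.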

rbossart wrote:
BTW, I have similar performance problems using CHOLMOD through Eigen compared to the CHOLMOD backend in Matlab. Of course, if I'm doing something wrong here, I'm probably doing it wrong with CHOLMOD too.


Can you be more specific? The performance should be almost the same. I can see two explanations:
1) my sparse triangular solver is much slower; perhaps you could try to compare only that part?
2) CHOLMOD provides a lot of options and many different algorithms, so perhaps my default settings for configuring CHOLMOD are not optimal...
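To compare only the triangular-solve phase, it helps to time it separately from the factorization. A minimal, library-agnostic sketch, assuming the factorization is done once before timing; `best_time_seconds` is a hypothetical helper, not part of Eigen or CHOLMOD. Taking the best of several runs reduces noise from first-call effects such as page faults and cache warm-up.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <limits>

// Hypothetical helper: runs `work` `repeats` times and returns the best
// (smallest) wall-clock time in seconds for a single run.
template <class Work>
double best_time_seconds(Work work, int repeats) {
    assert(repeats > 0);
    double best = std::numeric_limits<double>::infinity();
    for (int i = 0; i < repeats; ++i) {
        auto start = std::chrono::steady_clock::now();
        work();  // only the phase under test, e.g. the triangular solves
        auto stop = std::chrono::steady_clock::now();
        best = std::min(best, std::chrono::duration<double>(stop - start).count());
    }
    return best;
}
```

For example, factorize once, then pass a lambda that only performs the solve; timing the equivalent solve-only step in Matlab gives two numbers that isolate the solver.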
rbossart
ggael wrote:
[...]
Can you be more specific? The performance should be almost the same. I can see two explanations:
1) my sparse triangular solver is much slower; perhaps you could try to compare only that part?
2) CHOLMOD provides a lot of options and many different algorithms, so perhaps my default settings for configuring CHOLMOD are not optimal...


I'll try to compare only the triangular solver and get more familiar with Eigen. I'll keep this thread updated.

Thanks for the support!

Romain


bjacob
By the way, if you want to tune Eigen for optimal performance here, you could #define EIGEN_TUNE_FOR_L2_CACHE_SIZE to some appropriate value. The default value is (sizeof(float)*256*256), which means that the matrix product code works on 256x256 blocks of floats, i.e. 256 KB. Increase this value to whatever fits in your CPU's L2 cache. Actually, use somewhat less than that: I guess 3 such blocks need to fit in the cache simultaneously, and your benchmark doesn't get 100% of the cache. The default value is probably fine for CPUs with 1 MB of L2 cache.
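The sizing rule in the post can be made concrete. A sketch, assuming (as the post guesses) that three b x b blocks must fit in the cache budget at once; `max_block_side` is a hypothetical helper. The macro name is taken from the post, which refers to the Eigen development version of that time; current Eigen releases may not honor it. Such a macro would be defined before including any Eigen header.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// From the post (2008 development version of Eigen; may not apply today):
//   #define EIGEN_TUNE_FOR_L2_CACHE_SIZE (sizeof(float)*256*256)
// i.e. one 256x256 block of floats = 256 KB.
constexpr std::size_t kDefaultBlockBytes = sizeof(float) * 256 * 256;

// Hypothetical helper: largest square block side b such that `blocks`
// b x b blocks of `scalar_bytes`-byte scalars fit in `budget_bytes`.
std::size_t max_block_side(std::size_t budget_bytes, std::size_t scalar_bytes,
                           std::size_t blocks) {
    assert(scalar_bytes > 0 && blocks > 0);
    std::size_t b = static_cast<std::size_t>(
        std::sqrt(static_cast<double>(budget_bytes) / (scalar_bytes * blocks)));
    // Correct any floating-point rounding in either direction.
    while (b > 0 && blocks * scalar_bytes * b * b > budget_bytes) --b;
    while (blocks * scalar_bytes * (b + 1) * (b + 1) <= budget_bytes) ++b;
    return b;
}
```

For a 1 MB L2 cache and three float blocks, `max_block_side(1 << 20, sizeof(float), 3)` returns 295, so the default 256x256 blocks sit below that limit, consistent with the post's remark about 1 MB caches.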



