
performance of matrix multiplication

realschorfi (Registered Member)
Hello,
I have a problem with the performance of Eigen's matrix multiplication.

I need to multiply two big matrices (N ~ 4000), and it takes about 60 seconds to complete (Intel(R) Core(TM)2 Duo CPU L7500 @ 1.60GHz).

This is quite slow: the same matrix multiplication on the same machine under Matlab takes around 9 seconds or less to complete. (Is Eigen using a naive O(n^3) approach? I am not sure what Matlab uses, but it is better :( )

Similar performance differences show up when I try to solve a regression problem of similar matrix dimensions
(A.lu().solve(b, &x), where A is an NxN matrix and b is Nx3; in Matlab: x = A\b;)

How can I improve the performance of this multiplication (or of the solving)?
Is there any trick that improves Eigen's performance, or am I doing something very awkward?

I was using g++ 4.3.4 with -O2, on Gentoo Linux, with Eigen 2.0.11.

In fact, I tried the following snippet of code (matrix multiplication).
(Don't be confused by the smart pointers; it doesn't really make a difference whether the matrix variable is created on the stack or not,
and the source of the problem is not the multiplication of the matrix with itself, I tested that;
beware: it's memory intensive.)

Code:
#include <Eigen/Core>
#include <Eigen/Array>
#include <boost/shared_ptr.hpp>
#include <boost/scoped_ptr.hpp>
#include <boost/progress.hpp>
#include <cstdlib>   /* EXIT_SUCCESS */
#include <iostream>

int main()
{
   /* timer; boost::progress_timer prints the elapsed seconds on destruction */
   boost::scoped_ptr<boost::progress_timer> timer_progress;
   /* create a 4096 x 5000 matrix */
   boost::shared_ptr<Eigen::MatrixXd> testMatrix(new Eigen::MatrixXd(4096, 5000));
   /* randomize matrix */
   testMatrix->setRandom();
   /* start timer */
   timer_progress.reset(new boost::progress_timer);
   /* this takes quite long // slow multiplication? */
   Eigen::MatrixXd result = (*testMatrix) * testMatrix->transpose();
   /* the timer is destroyed last and prints the elapsed time */
   std::cout << "multiplication took: ";
   return EXIT_SUCCESS;
}


I hope you can help :)

kind regards
schorfi
ggael (Moderator)
So overall we have a theoretical factor of 6, which is what you observed.


2^3=8, even ;)
bjacob (Registered Member)
aaaaaaargh!!!

I wanted to reply to Gael's post; instead I edited it!!!

Sorry guys, I'll see how to restore Gael's post.


Join us on Eigen's IRC channel: #eigen on irc.freenode.net
Have a serious interest in Eigen? Then join the mailing list!
bjacob (Registered Member)
Sorry guys, I killed Gael's post, and the kde-forum staff tell me that it really can't be undone.

Gael was giving you several reasons why Eigen could be slower:
- first, are you using 2.0? The devel branch can be 2x faster.
- second, Matlab is probably using your 2 CPU cores, while Eigen uses only one (for now). Again, a 2x factor.
- ...

Are you sure you have enabled SSE2?

g++ -O2 -msse2


ggael (Moderator)
Yes, and the other reason is that when you compute r = x * x.transpose(), only half of the result has to be computed, and Matlab is likely to detect that, so one more factor of 2.

With the devel branch, on my 2GHz computer, r = x * x.transpose() took 20 seconds, which corresponds to 6 GFLOPS and 3 operations per cycle, which is very good.

Still with the devel branch, you can compute r = x * x.transpose() even more efficiently:

r.setZero();
r.selfadjointView<Upper>().rankUpdate(x);

and if you want the full matrix r, then do:

r.triangularView<StrictlyLower>() = r.transpose();

but usually you only need to store one half of a symmetric matrix. Here that took 10sec (for 4k x 4k matrices of doubles).
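
(Putting the pieces together: a minimal, self-contained sketch of the rank-update approach, assuming the Eigen 3 / devel-branch API, with sizes shrunk so it runs quickly:)

Code:
#include <Eigen/Dense>
#include <iostream>

int main()
{
   const int n = 1000;
   /* x plays the role of testMatrix from the first post, just smaller */
   Eigen::MatrixXd x = Eigen::MatrixXd::Random(n, n + 200);

   /* compute only the upper half of r = x * x.transpose() */
   Eigen::MatrixXd r = Eigen::MatrixXd::Zero(n, n);
   r.selfadjointView<Eigen::Upper>().rankUpdate(x);

   /* if the full matrix is needed, mirror the upper half into the lower one */
   r.triangularView<Eigen::StrictlyLower>() = r.transpose();

   std::cout << "r(0,0) = " << r(0, 0) << std::endl;
   return 0;
}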

Also, the solvers are much much faster in the devel branch.
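
(For the lu().solve() question from the first post, the devel-branch call would look roughly like this; a sketch assuming the Eigen 3 decomposition API:)

Code:
#include <Eigen/Dense>
#include <iostream>

int main()
{
   const int n = 1000;
   Eigen::MatrixXd A = Eigen::MatrixXd::Random(n, n);
   Eigen::MatrixXd b = Eigen::MatrixXd::Random(n, 3);  /* N x 3 right-hand side */

   /* the devel branch returns the solution by value instead of
      the Eigen 2 style A.lu().solve(b, &x) */
   Eigen::MatrixXd x = A.partialPivLu().solve(b);

   std::cout << "residual: " << (A * x - b).norm() << std::endl;
   return 0;
}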
realschorfi (Registered Member)
hey,
thank you for the reply.
I'll try the devel branch and report my results soon. It should also cover another need of mine, the product of all coefficients of a matrix/vector; I rolled my own for that until now ;) (see the sketch below).
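
(For reference, the devel branch has that reduction built in; a minimal sketch, assuming the Eigen 3 prod() API:)

Code:
#include <Eigen/Dense>
#include <iostream>

int main()
{
   Eigen::Vector3d v(2.0, 3.0, 4.0);
   std::cout << v.prod() << std::endl;  /* product of all coefficients: 24 */
   return 0;
}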

Matlab is not using parallel computing by default, and/or it was using just one core for my experiments, as I could easily observe through gkrellm. (If my memory is correct, an extra toolbox was needed for parallel computing...) I tried OpenMP to parallelise computation elsewhere in my Eigen code, but failed to use both cores; it seems to start multiple threads, but they all run on one core only. Strange, I guess it's my fault, but that's another topic :).

I am using -march=native, which I thought includes SSE2 (see the check below). I will state it explicitly for a test.
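
(One way to double-check which -m flags -march=native actually turns on is to ask the GCC driver to print the expanded compiler invocation; a quick check, assuming a GCC recent enough to support -march=native:)

Code:
g++ -march=native -E -v - </dev/null 2>&1 | grep cc1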

I wasn't aware of the symmetric-matrix optimisation, but true, that could be a reason :)

Thank you, I will check.

kind regards
schorfi
ggael (Moderator)
realschorfi wrote: Matlab is not using parallel computing by default, and/or it was using just one core for my experiments


Sorry, but that cannot be true. Let me explain.

Your product requires 168e9 flops (one multiply and one add for each of the 4096 * 4096 * 5000 coefficient pairs, i.e. 2 * 4096 * 4096 * 5000 ≈ 168e9). If we assume that only half of the result is computed, 84e9 flops remain. If Matlab performs it in 9 seconds, that means 9.3 GFLOPS, while the peak performance of your 1.6GHz Core 2 is 1.6 * 2 (SSE) * 2 (one mul and one add per cycle) = 6.4 GFLOPS.

So Matlab must have used both of your CPU cores.

The other explanation is that you used single-precision floating-point matrices. (I guess the default is double precision, right?)
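
(The same arithmetic, spelled out as a tiny check program; the 4096 x 5000 size is taken from the code in the first post:)

Code:
#include <cstdio>

int main()
{
   double flops = 2.0 * 4096 * 4096 * 5000;  /* one mul + one add per coefficient pair: ~168e9 */
   double half  = flops / 2.0;               /* symmetric result => ~84e9 */
   double rate  = half / 9.0 / 1e9;          /* Matlab's 9-second run => ~9.3 GFLOPS */
   double peak  = 1.6 * 2.0 * 2.0;           /* GHz * SSE width * (mul+add per cycle) = 6.4 */
   std::printf("%g flops, %g halved, %.1f GFLOPS vs %.1f GFLOPS peak\n",
               flops, half, rate, peak);
   return 0;
}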
Hauke (Registered Member)
Hi,

over here, Matlab is definitely using two cores, and it does not use or require the Parallel Computing Toolbox for this (you would need that for things like parfor). I verified via 'license inuse'.

Looking at the threads consuming most of the time (via 'Process Explorer'), one can observe that libguide40.dll!_kmp_launch_worker is (at least over here) hard at work, which means OpenMP is being used. The run-time is the same, 9 seconds.

Regards,
Hauke
realschorfi (Registered Member)
hey,
you are right, it is indeed using two cores;
I am not sure what I observed before.

My fault, sorry :/
realschorfi (Registered Member)
Sorry for the mix-up, I think I confused my own recollections. My model uses MEX, which runs on a single core, and due to lack of memory on my laptop I cannot run "big" experiments producing matrices of size 4000+. So what I observed was quasi-single-core behaviour: one core was busy and the other mostly idle, apart from little peaks that seem to be related to the matrix multiplication and solving. The explanation is that my Matlab model spends most of its time generating random data (matrix multiplication and solving are only a small fraction), and that seems to occupy only one core; I forgot that fact :-/


However, the devel branch is indeed faster: the "usual" matrix multiplication (matrix * matrix.transpose()) now takes around 35 seconds, and computing only half of the symmetric result takes 17 seconds.

That's a good improvement. I hope parallel computing comes soon.

best regards
schorfi

