Registered Member
|
I have been benchmarking execution times for threads in Mac OS 10.5.8 (Leopard). Language is Objective-C++. Compiler is gcc 4.0.1.
Typically, parallel threads provide speed improvement proportional to the number of cores, at least up to four, but this is NOT TRUE when the computation involves Eigen 2.0.5. Here is such a computation:

    // Matrices are row-major. Data arrays (g, w) are fixed globals.
    -(void)main
    {
        Matrix3d GtWGi;
        const int NREPEATS = 10000;
        for (int j = 0; j < NREPEATS; j++) {
            MatrixXd G = Map<MatrixXd>(g, 12, 3);
            MatrixXd W = Map<MatrixXd>(w, 12, 12);
            GtWGi = (G.transpose()*W*G).inverse();
        }
    }

And here are two benchmarks, using 20 replicates, with two CPUs (at 2 GHz):

    Execution time (ms) with 1 thread --> mean = 345, sigma = 6.8
    Execution time with 2 threads --> mean = 463, sigma = 8.5 [SLOWER!!]

I am guessing that the culprit is dynamic allocation via malloc() which, to be threadsafe, is serialized. Have you experienced anything similar and, most especially, is there some fix for this compatible with Eigen? I would very much like to use the latter but speed is important for my application.

Thanks.
Registered Member
|
Oops, this compiler (gcc 4.0.1) gives very poor results with Eigen. Try upgrading to at least GCC 4.2, which is available on Mac.
Yes, I agree that serialized malloc() calls are probably the explanation. If that hypothesis is true, then your code spends a lot of time waiting for malloc() to return. The quickest way to at least check whether that is the problem is to replace MatrixXd with fixed-size matrix types, which are guaranteed not to cause mallocs, and see if that improves performance.
It seems that you do know the value "3" at compile time, but perhaps not the 12? Then at least declare G as Map<Matrix<double,Dynamic,3> >(g,12,3); that alone will avoid most of the mallocs.