Registered Member
|
I am working on the code of a program that uses a two-step mixed model. The current problem is that the second step is the slowest part of the workflow. This step calculates B = (t(X)*W*X)^-1 * t(X)*W*Y. The problem is that this step has to be executed ~30 million times per analysis. To make this more environmentally friendly, I fiddled around with the original code and now have the following setup. Are there possibilities for a speed-up, such as making use of the symmetry of matrix W, or other smart (mathematical) tricks?
I also added representative sizes of the matrices used, to give better insight into the problem, and I estimated the number of flops each operation costs.
Thanks in advance for an answer! I really enjoy working with the Eigen library. |
Moderator
|
Y is a vector, so use a VectorXd; beta is a Vector3d; tXW * X is 3x3, so use a Matrix3d; and use partially fixed-size matrices for X and tXW:
This should be enough to get a huge speedup. I'm not sure that exploiting the symmetry of xWx will speed up the algorithm, but you can try (in addition to the above "optimizations"):
|
Registered Member
|
Thanks for the hint on the partially fixed matrices. I read that completely fixed-size matrices are faster than dynamic matrices (at http://eigen.tuxfamily.org/dox/classEigen_1_1Matrix.html), but nothing about partially fixed matrices. Is this an undocumented part of Eigen, or should I search harder?
I tried the approach given, but I did not see any big improvement. 95% of the flops are taken by:
And using a partially fixed matrix here does not speed up the process in my case. Is this as expected, or is this an erroneous implementation on my side? Thanks in advance for the answer. |
Moderator
|
The following might be faster:
[CODE]
Matrix<double,Dynamic,3> tXW = W.transpose() * X;
[/CODE]
Note that you have to use tXW.transpose() in the rest of the code. |
Registered Member
|
Nice! This made the algorithm more than 2 times faster! (W is symmetric, so we can skip the transpose on it.) Can you explain why this is faster? In terms of flops it is no different. |
Moderator
|
When computing X.transpose() * W, the left-hand side has only 3 rows, which is not enough to enable vectorization. With W * X, the left-hand side has as many rows as W.
|