This forum has been archived. All content is frozen. Please use KDE Discuss instead.

optimise speed of step 2 of a two step mixed model

Tags: None
maarten
Registered Member — Posts: 6, Karma: 0
I am working on the code of a program that uses a two-step mixed model. The current problem is that the second step is the slowest part of the workflow. This step computes B = (t(X)*W*X)^-1 * t(X)*W*Y, and it has to be executed ~30 million times per analysis. To make this cheaper I fiddled around with the original code and now have the following setup. Are there possibilities for a speed-up, such as exploiting the symmetric structure of the matrix W, or other smart (mathematical) tricks?

I also added representative sizes for the matrices used, to give better insight into the problem, and I calculated/estimated the number of flops each operation costs.


Code: Select all
//set up variables
MatrixXd X = MatrixXd::Random(1000,3);
MatrixXd Y = MatrixXd::Random(1000,1);
//W should be a symmetric matrix
MatrixXd W = MatrixXd::Random(1000,1000);
W.diagonal().setOnes(); 
W.triangularView<Lower>() = W.transpose();
W=W.array().abs();

// linear mixed model
// B = (t(X)*W*X)^-1 * t(X)*W*Y
MatrixXd tXW = X.transpose() * W;              // ~6 Mflops
LDLT<MatrixXd> Ch(tXW * X);                    // 17991 + ??? flops
// calculate beta
MatrixXd beta = Ch.solve(tXW * Y);             // 5997 flops
// now compute residual variance
double sigma2 = (Y - tXW.transpose() * beta).squaredNorm(); // flops: 1000 + 5000 + 3000


Thanks in advance for an answer! I really enjoy working with the Eigen library.
ggael
Moderator — Posts: 3447, Karma: 19
Y is a vector, so use a VectorXd; beta is a Vector3d; tXW * X is 3x3, so use a Matrix3d; and use partially fixed-size matrices for X and tXW:
Code: Select all
Matrix<double,Dynamic,3> X = ...;
VectorXd Y = ...;

// linear mixed model
Matrix<double,3,Dynamic> tXW = X.transpose() * W;
Matrix3d xWx = tXW * X;
LDLT<Matrix3d> Ch(xWx);
Vector3d beta = Ch.solve(tXW * Y);
// now compute residual variance
double sigma2 = (Y - tXW.transpose() * beta).squaredNorm() ;


This should be enough to get a huge speedup. I'm not sure that exploiting the symmetry of xWx will speed up the algorithm, but you can try (in addition to the above "optimizations"):
Code: Select all
xWx.triangularView<Lower>() = tXW * X;
maarten
Registered Member — Posts: 6, Karma: 0
Thanks for the hint on the partially fixed-size matrices. I read (at http://eigen.tuxfamily.org/dox/classEigen_1_1Matrix.html) that fully fixed-size matrices are faster than dynamic ones, but nothing about partially fixed-size matrices. Is this an undocumented part of Eigen, or should I try to search harder?

I tried the given approach but did not see any big improvement. 95% of the flops are taken by:
Code: Select all
Matrix<double,3,Dynamic> tXW = X.transpose()*W;


And setting a partially fixed-size matrix here would not speed up the process in my case. Is this expected, or is it an erroneous implementation on my side?

Thanks in advance for the answer!
ggael
Moderator — Posts: 3447, Karma: 19
The following might be faster:
Code: Select all
Matrix<double,Dynamic,3> tXW = W.transpose() * X;

Note that you then have to use tXW.transpose() in the rest of the code.
maarten
Registered Member — Posts: 6, Karma: 0
ggael wrote:The following might be faster:
Code: Select all
Matrix<double,Dynamic,3> tXW = W.transpose() * X;

Note that you then have to use tXW.transpose() in the rest of the code.


Nice! This made the algorithm more than 2 times faster! (W is symmetric, so we can skip the transpose on it.) Can you explain why this is faster? In terms of flops it is no different.
ggael
Moderator — Posts: 3447, Karma: 19
When doing X.transpose() * W, the left-hand side has only 3 rows, which is not enough to enable vectorization.

