optimise speed of step 2 of a two step mixed model

Board index

Page 1 of 1 (6 posts)

Tags:

maarten Registered Member Posts 6 Karma 0	optimise speed of step 2 of a two step mixed model Wed Feb 26, 2014 10:08 pm I'm am working on the code of a program that use a 2 step mixed model. The current problems is that the second step is the slowest part of work flow. This step is calculating B=(t(X)WX)-1t(X)WY . The problem is to execute this step ~30 million times per analysis. To do this more environmental friendly I fiddled around with the original code and have now the following set up. Are there possibilities for speed-up like making use of the symmetric properties of matrix W or other smart (mathematical ) tricks. I added also representative numbers of the matrices used to give a better inside view of the problem. I also calculated/estimated the amount of flops each operation costs. Code: Select all //set up variables MatrixXd X = MatrixXd::Random(1000,3); MatrixXd Y = MatrixXd::Random(1000,1); //W should be an symmetric matrix MatrixXd W = MatrixXd::Random(1000,1000); W.diagonal().setOnes(); W.triangularView<Lower>() = W.transpose(); W=W.array().abs(); # linear mixed model //B=(t(X)WX)-1t(X)WY MatrixXd tXW = X.transpose()W;//flops ~6mflops LDLT <MatrixXd> Ch = LDLT <MatrixXd>(tXW X ); // 17991+??? flops //calculate beta MatrixXd beta; = Ch.solve(tXW * Y);//5997 flops // now compute residual variance double sigma2 = (Y - tXW.transpose() * beta).squaredNorm() ; //flops: 1000+5000+3000 Thanks a advance for a answer! I really enjoy working with the EIGEN library.
ggael Moderator Posts 3447 Karma 19 OS	Re: optimise speed of step 2 of a two step mixed model Thu Feb 27, 2014 1:47 pm Y is a vector so use a VectorXd, beta is a Vectro3d, tXW * X is 3x3, so use a Matrix3d, and use partially fixed size matrices for X and tXW: Code: Select all `Matrix<double,Dynamic,3> X = ...; VectorXd Y = ...; # linear mixed model Matrix<double,3,Dynamic> tXW = X.transpose()W; Matrix3d xWx = tXW X; LDLT <Matrix3d> Ch(xWx ); Vector3d beta = Ch.solve(tXW * Y); // now compute residual variance double sigma2 = (Y - tXW.transpose() * beta).squaredNorm() ;` this should be enough to get a huge speedup. I'm not sure that exploiting the symmetry of xWx will speed up the algorithm, but you can try (in addition to the above "optimizations"): Code: Select all `xWx.triangularView<Lower>() = tXW * X;`
maarten Registered Member Posts 6 Karma 0	Re: optimise speed of step 2 of a two step mixed model Sat Mar 08, 2014 8:25 pm Thanks for the hint on the partial fixed matrices. I read about complete fixed matrices (at http://eigen.tuxfamily.org/dox/classEigen_1_1Matrix.html)were faster then dynamic matrices, but nothing about partial fixed matrices. Is this undocumented part of EIGEN or should I try to search harder. I tried the approach given but I had not any big improvements. 95% percent of the flops is taken by: Code: Select all `Matrix<double,3,Dynamic> tXW = X.transpose()*W;` And setting a partial fixed matrix here would not speed-up the process in my case. Is this as expected or is this a erroneous implementation at my site? Thanks for the answer(in advance)
ggael Moderator Posts 3447 Karma 19 OS	Re: optimise speed of step 2 of a two step mixed model Sat Mar 08, 2014 8:56 pm The following might be faster: [CODE] Matrix<double,Dynamic,3> tXW = W.transpose() * X; [CODE] Note that you have to use tXW.transpose() is the rest of the code.
maarten Registered Member Posts 6 Karma 0	Re: optimise speed of step 2 of a two step mixed model Mon Mar 10, 2014 6:35 pm ggael wrote:The following might be faster: [CODE] Matrix<double,Dynamic,3> tXW = W.transpose() * X; [CODE] Note that you have to use tXW.transpose() is the rest of the code. Nice! this made the algorithm more then 2 times faster! (W is symmetric so we can skip the transpose on it). Can you explain why this is faster? In terms of flops it is not any different.
ggael Moderator Posts 3447 Karma 19 OS	Re: optimise speed of step 2 of a two step mixed model Mon Mar 10, 2014 8:20 pm when doing X.transpose() * W, the left-hand-side does not have enough rows to enable vectorization.