
Declaration and optimization in Eigen

jr001 (Registered Member):
I am working with a function that is called many times, and depends on efficient computations. I wanted to make sure I was getting the most performance out of Eigen.

The function uses 3x3 matrices to store intermediate results, and to build the large matrix Y.

-Is it more efficient to declare the 3x3 matrices as class variables, so they are not re-instantiated every time the function is called, or can the compiler "optimize away" the instantiations?

- Will the compiler exploit the fact that some of the matrices are constant?

- Can the compiler exploit the special properties of Y, when multiplying with P_map?
(Less than half of the elements of Y are nonzero, and only about half of the non-zero elements change each time the function is called.)



Note: this code is only the computation-intensive part of the function, so some of it may not make sense out of context.


Code: Select all
// this part of the code builds the 3x3 matrices that are needed to construct the 15x15 matrix Y

  // capital letters are for 3x3 matrices, lowercase for 3x1 vectors



  // crossprodmat(v), where v is a 3x1 vector, is the matrix

  // [  0    -v(2)   v(1);

  //    v(2)   0    -v(0);

  //   -v(1)   v(0)   0   ]
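
  // One possible definition of this helper (assumed for context; the real
  // crossprodmat is defined elsewhere in the class):
  //
  //   inline Matrix3d crossprodmat(const Vector3d& v) {
  //     Matrix3d m;
  //     m <<    0., -v(2),  v(1),
  //          v(2),    0., -v(0),
  //         -v(1),  v(0),    0.;
  //     return m;
  //   }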



   Matrix3d A = B * C.transpose();

   Matrix3d D = B * E.transpose();

   Matrix3d F = B * G.transpose();

   Matrix3d H = B * I.transpose();

 

   Matrix3d J = 1./24.*( Matrix3d::Identity() + 4.*A + D);

   Matrix3d K = 1./12.*( Matrix3d::Identity() + 4.*D + F);

   Matrix3d L = 1./6.*( Matrix3d::Identity() + 4.*F + H);


   L = 1./6.*( 4.*K + L);



   Vector3d t1 = 0.5*(a + b) - c;

   Vector3d t2 = a - c;

   Matrix3d T1 = crossprodmat(t1) * B.transpose() * K;

   Matrix3d T2 = crossprodmat(t2) * B.transpose() * L;

   Matrix3d M = 1./6.*( 4.*T2 + T1);


   Vector3d t3 = (0.75*a + 0.25*b) - c;

   Matrix3d T4 = crossprodmat(t3) * B.transpose() * J;

   Matrix3d M_mid = 1./12.*( 4.*T1 + T4);





   Matrix3d N = 1./6.*( 4.*M_mid + M);

   

   Vector3d o = e - f - d - (0.5*g) ;

   Vector3d p = d - d2 - g;





  // here R and Z are matrices whose values are known at compile time

  Matrix3d R;

  R << const1  , const2, 0,

        const3, const4, 0,

        0, 0, 1;

 

  Vector3d z;

  z << 0 , 0, const5;

  Matrix3d Z = crossprodmat(z);





   Matrix3d Q = -R*crossprodmat(o)*B.transpose();

   Matrix3d S = Z*R*crossprodmat(o)*B.transpose() - R*crossprodmat(p)*B.transpose();

            

   Matrix3d T = R + Z*R*Z;

   Matrix3d U = -Z*w*R*Z;

   Matrix3d V = -Z*w*R + R;



   Matrix3d M_e = -Z*R*N + R*M;



   Matrix3d W = -R * B.transpose() * L;

   Matrix3d X =  Z*R*B.transpose()*L - R*B.transpose()* L;



 

   //Matrix<double,15,15> Y ;   //  initialized at the constructor

   Y.block(0,0,3,3) = H.transpose();

   Y.block(0,9,3,3) = -(H.transpose())*( L);

   Y.block(3,0,3,3) = Q;

   Y.block(3,3,3,3) = T;

   Y.block(3,6,3,3) = U;

    Y.block(3,9,3,3) = R*N;

   Y.block(3,12,3,3) = W;

   Y.block(6,0,3,3) = S;

   Y.block(6,3,3,3) = V;

   Y.block(6,9,3,3) = M_e;

   Y.block(6,12,3,3) = X;

  Y.block(9,9,3,3) = Matrix3d::Identity();

  Y.block(12,12,3,3) = Matrix3d::Identity();



   // Q_diag holds the diagonal coefficients of a 15x15 diagonal matrix, whose values are known at compile time

  Matrix<double,15,1> Q_diag;

  Q_diag << const5, const5, const5, 0, 0, 0, const6, const6, const6, const7, const7, const7, const8, const8, const8;



 

   // P_map_buffer is a double*, which is a class member. It stores the elements of the

  // matrix P_map in row-major format, with a stride of LDIM



  Eigen::Map<Matrix<double,15,LDIM,Eigen::RowMajor> > P_map(P_map_buffer,15,LDIM);



  P_map.block<15,15>(0,0) += ( 0.5 * Q_diag.asDiagonal());

  P_map.block<15,15>(0,0) = Y * P_map.block<15,15>(0,0);

  P_map.block<15,15>(0,0) = P_map.block<15,15>(0,0) * Y.transpose();

  P_map.block<15,15>(0,0) += ( 0.5 * Q_diag.asDiagonal());

Thank you for any help.
ggael (Moderator):
jr001 wrote: Is it more efficient to declare the 3x3 matrices as class variables, so they are not re-instantiated every time the function is called, or can the compiler "optimize away" the instantiations?


Since they are fixed-size matrices, they are allocated on the stack, so declaring them as class variables won't help. On the other hand, you can declare them directly as a sub-matrix of Y:
Code: Select all
Block<YType,3,3> Q(Y,3,0);


Likewise, it is faster to rewrite:
Code: Select all
Y.block(3,3,3,3) = T;

as
Code: Select all
Y.block<3,3>(3,3) = T;
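
For instance, here is a minimal sketch of the named-block idea (YType, fillQ and the arguments are only placeholders standing in for your own types and variables):
Code: Select all
#include <Eigen/Core>
using namespace Eigen;

typedef Matrix<double,15,15> YType;

void fillQ(YType& Y, const Matrix3d& R, const Matrix3d& B)
{
    // Q is a named 3x3 view of Y starting at row 3, column 0: assigning to Q
    // writes the result into Y directly, instead of going through a named
    // Matrix3d that is then copied into Y.
    Block<YType,3,3> Q(Y, 3, 0);
    Q = -R * B.transpose();
}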


jr001 wrote: Will the compiler exploit the fact that some of the matrices are constant? Can the compiler exploit the special properties of Y, when multiplying with P_map?
(Less than half of the elements of Y are nonzero, and only about half of the non-zero elements change each time the function is called.)


Unfortunately the answer is no. For your use case, the best would be to store Y in a sparse block matrix of fixed size blocks, but we don't have that in Eigen yet. We do have generic sparse matrices, but in your case I don't think that's worth it (your matrices are too small).

Make sure you compile your program with a recent compiler, with -DNDEBUG, and with optimizations (-O2) and SSE2 (-msse2) enabled. Also, since your matrices are 15x15, you could try to define:
Code: Select all
#define EIGEN_CACHEFRIENDLY_PRODUCT_THRESHOLD 15

just before including Eigen/Core. This constant controls whether a naive or a cache-friendly product implementation is used (the default is 16).
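
For instance, at the top of the translation unit this would look like:
Code: Select all
// must be defined before the first Eigen include
#define EIGEN_CACHEFRIENDLY_PRODUCT_THRESHOLD 15
#include <Eigen/Core>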

Also:
Code: Select all
P_map.block<15,15>(0,0) += ( 0.5 * Q_diag.asDiagonal());

can be optimized as:
Code: Select all
P_map.block<15,15>(0,0).diagonal() += 0.5 * Q_diag.asDiagonal();
bjacob (Registered Member):
ggael wrote: Also:
Code: Select all
P_map.block<15,15>(0,0) += ( 0.5 * Q_diag.asDiagonal());

can be optimized as:
Code: Select all
P_map.block<15,15>(0,0).diagonal() += 0.5 * Q_diag.asDiagonal();


You mean:
Code: Select all
P_map.block<15,15>(0,0).diagonal() += 0.5 * Q_diag;
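
(.diagonal() returns a vector expression here, so you add the plain coefficient vector Q_diag rather than the diagonal matrix built from it.)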


Join us on Eigen's IRC channel: #eigen on irc.freenode.net
Have a serious interest in Eigen? Then join the mailing list!
jr001 (Registered Member):
Thanks for the suggestions. I have another performance question with a simple example.

I tested this in a loop against a C version using static arrays, and the tvmet library.

The Eigen version always seems much slower. I did notice an improvement when I forced lazy evaluation, though.

Here are some average times:

for 100,000,000 loops:
static: 4 seconds
tvmet: 7 seconds
eigen: 25 seconds

for 1,000,000,000 loops:
static: 36 seconds
tvmet: 1 minute,10 seconds
eigen: 4 minutes, 9 seconds


Is there something else I could do to increase speed?



(All matrices A, B, C, ... are Matrix3d)
Code: Select all
      A = (B * C.transpose()).lazy();

      D = (B * E.transpose() ).lazy();
      F = ( B * G.transpose() ).lazy();
      H = ( B * P.transpose() ).lazy();
      
      I = ( a/24.*( Matrix3d::Identity() + 4.*A + D) ).lazy();
      J = ( a/12.*( Matrix3d::Identity() + 4.*D + F) ).lazy();
      K = ( a/6.*( Matrix3d::Identity() + 4.*F + H) ).lazy();

      L = ( a/6.*(4.*J + K) ).lazy();
ggael (Moderator):
Hm, that's strange. Can you post the static and tvmet versions, as well as the surrounding code? I suspect the compiler detects that some parts can be factored out of the loop (or removed completely), which would make your benchmark meaningless.

Indeed, here you have about 337 floating-point operations per iteration, so 36 s for 1e9 iterations means 337e9 / 36 ≈ 9 GFlops => a 9 GHz CPU! (Well, in some cases current CPUs are able to perform one + and one * per cycle, so let's say this implies a 4.5 GHz CPU.)

So, could you also tell us what your CPU and compiler are?

Here it takes ~10 s for 1e8 iterations on a 2.66 GHz Core2 (only one core used). I checked the assembly and nothing has been hoisted out of the loop. This means ~3.37 GFlops; compared to the theoretical peak of 5.32 GFlops, that is not bad at all.

Also, the .lazy() calls are only needed around the matrix products. The other expressions are always evaluated lazily.
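
In other words, with the names from your snippet, something like this (just a sketch of which calls matter):
Code: Select all
A = (B * C.transpose()).lazy();                  // matrix product: .lazy() avoids the temporary
I = a/24.*( Matrix3d::Identity() + 4.*A + D );   // sums and scalings are already lazy, no .lazy() needed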
ggael (Moderator):
To give you an example, here is a fair benchmark that wraps the key piece of code in a function which is forced not to be inlined:

http://pastebin.com/m12b8aa87

This way you guarantee the compiler won't optimize too much.
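
The general shape is something like the sketch below (only an illustration of the idea, not the exact pastebin code; the function name and GCC's noinline attribute are my own choices here):
Code: Select all
#include <iostream>
#include <Eigen/Core>
using namespace Eigen;

// Keeping the kernel out-of-line stops the compiler from hoisting the work
// out of the timing loop or removing it altogether.
__attribute__((noinline))
void kernel(Matrix3d& A, const Matrix3d& B, const Matrix3d& C)
{
    A = (B * C.transpose()).lazy();   // Eigen 2 style, as in the snippets above
}

int main()
{
    Matrix3d A, B = Matrix3d::Random(), C = Matrix3d::Random();
    for (long i = 0; i < 100000000L; ++i)
        kernel(A, B, C);
    std::cout << A.sum() << std::endl;   // keep the result observable
    return 0;
}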
jr001 (Registered Member):
Sorry, I forgot to mention the compiler. Those estimates were with icpc and an Intel i7 processor, using the -O3 flag. I know it isn't a proper benchmark; I just wanted to roughly recreate what my main program is dealing with.

Recently, I tried the tests with g++ and the results were the opposite: the static C version was slowest, while Eigen and tvmet got faster.

Are there any ways I can improve performance when using icpc?

I will post the rest of the tests soon if that helps.

