![]() Registered Member ![]()
|
Hi - I have the following function, that performs a specialized block-matrix to vector multiply, where the block matrices are distributed into a diagonal vector and a sparse off-diagonal mapping. The vector is a variable-size, but it is always a multiple of 6 doubles.
I'm looking for advice on how to optimize this. Are loops being unrolled? Is there alignment so that SSE can be used? I have no real experience with the segment<>() operator, and whether the compiler can understand that it preserves alignment. Finally, I need to multiply a vector by the transpose of a matrix, and was wondering is M.transpose()*v is the fastest way. Cheers --Kurt void mMV(vector< Matrix<double,6,6>, aligned_allocator<Matrix<double,6,6> > > &diag, vector< map<int,Matrix<double,6,6>, less<int>, aligned_allocator<Matrix<double,6,6> > > > &cols, VectorXd &vin, VectorXd &vout) { // loop over diag entries for (int i=0; i<(int)diag.size(); i++) vout.segment<6>(i*6) = diag[i]*vin.segment<6>(i*6); // loop over off-diag entries if (cols.size() > 0) for (int i=0; i<(int)cols.size(); i++) { map<int,Matrix<double,6,6>, less<int>, aligned_allocator<Matrix<double,6,6> > > &col = cols[i]; if (col.size() > 0) { map<int,Matrix<double,6,6>, less<int>, aligned_allocator<Matrix<double,6,6> > >::iterator it; for (it = col.begin(); it != col.end(); it++) { int ri = (*it).first; // get row index Matrix<double,6,6> &M = (*it).second; // matrix const Matrix<double,6,1> &vvin = vin.segment<6>(ri*6); vout.segment<6>(i*6) += M.transpose()*vin.segment<6>(ri*6); vout.segment<6>(ri*6) += M*vin.segment<6>(i*6); } } } } |
![]() Registered Member ![]()
|
Apologies for the formatting, here's a better one...kk
|
![]() Registered Member ![]()
|
Hi, let me first answer your general questions and then I'll look at your code more closely. I'm answering with Eigen3 in mind, answers mostly applicable to Eigen2.
You've definitely done all what was needed to preserve the compile-time knowledge about sizes. However, there is another precondition for loops to be unrolled: the estimated 'complexity' of the operation must be below a certain threshold. I am almost certain that your operations here will get unrolled, since 6 is a small enough size, but you can always adjust the threshold if they aren't, by defining EIGEN_UNROLLING_LIMIT to a larger value (the default is 100). Notice that in the special case of matrix products like here, there is potentially a second constant to adjust: EIGEN_CACHEFRIENDLY_PRODUCT_THRESHOLD is the matrix size where we decide to switch to the cache-friendly approach to matrix products. The default value is 8, so in your case (6) this won't disturb you. But this is also something that would prevent unrolling, since the cache-friendly code, which is mean for large matrices, isn't unrolled.
Very good question: in your code, this information is lost. Indeed, the 'start' parameter is unknown at compile time; since sizeof(Scalar)==8 here, and SSE requires 16 byte alignment, there only will be alignment if 'start' is even. Which is the case in your program. Now here's a big difference between Eigen 2 and 3. In Eigen 2, there's no way at all that you can specify that your segment is aligned. In Eigen 3, we have working 'Aligned Map', which is something you can use here. Just add these typedefs to your function:
Now you can replace your code
by this:
Yes it is: the transpose() call gets inlined and has zero cost.
Join us on Eigen's IRC channel: #eigen on irc.freenode.net
Have a serious interest in Eigen? Then join the mailing list! |
![]() Registered Member ![]()
|
Looking quickly at your code, I can see only 2 things to say:
Don't forget to make your input parameters as 'const', for example vin. In your inner loop code:
a couple of things. In Eigen3, the indexing type is no longer 'int', it is instead, for dense matrices, the signed integer type having the size of a pointer, aka std::ptrdiff_t. You don't have to know that, just use the typedef Index from your matrix type, e.g. MyMatrixType::Index. This avoids some integer conversions (granted, these are negligible in your case). More importantly, the two last lines might be faster (or not --- do measurements here) if merged into a single line, and marked as lazy. Here's how you would do it with Eigen3:
But since this combined expression has greater complexity, you might have to play again with EIGEN_UNROLLING_LIMIT.
Join us on Eigen's IRC channel: #eigen on irc.freenode.net
Have a serious interest in Eigen? Then join the mailing list! |
![]() Registered Member ![]()
|
Thanks, I'll try some of the Eigen3 alignments to see if that helps.
Cheers --Kurt |
![]() Registered Member ![]()
|
Ok, I switched to Eigen3 and am making some progress. But meanwhile I converted my code to be templated on the size of the vector, using N as the template parameter. I can no longer use:
Code: Select all typedef Matrix<double,6,1> Vector6d; typedef Vector6d::AlignedMapType AlignedMapVector6d; Now you can replace your code
by this:
I tried something like:
But this gives errors on
Any hints? |
![]() Registered Member ![]()
|
What's your error? Did you just forget a 'typename' keyword? When T is a template parameter, doing
sometemplate<T>::someMemberTypedef is illegal, you need to write typename sometemplate<T>::someMemberTypedef
Join us on Eigen's IRC channel: #eigen on irc.freenode.net
Have a serious interest in Eigen? Then join the mailing list! |
![]() Registered Member ![]()
|
Yup, that was it. I'm not that great with templates...kk
|
Registered users: Bing [Bot], blue_bullet, Google [Bot], rockscient, Yahoo [Bot]