This forum has been archived. All content is frozen. Please use KDE Discuss instead.

optimizing matrix-vector multiplies

Tags: None
(comma "," separated)
konolige
Registered Member
Posts
9
Karma
0
OS

optimizing matrix-vector multiplies

Sun Aug 29, 2010 4:14 am
Hi - I have the following function, that performs a specialized block-matrix to vector multiply, where the block matrices are distributed into a diagonal vector and a sparse off-diagonal mapping. The vector is a variable-size, but it is always a multiple of 6 doubles.

I'm looking for advice on how to optimize this. Are loops being unrolled? Is there alignment so that SSE can be used? I have no real experience with the segment<>() operator, and whether the compiler can understand that it preserves alignment. Finally, I need to multiply a vector by the transpose of a matrix, and was wondering is M.transpose()*v is the fastest way.

Cheers --Kurt

void
mMV(vector< Matrix<double,6,6>, aligned_allocator<Matrix<double,6,6> > > &diag,
vector< map<int,Matrix<double,6,6>, less<int>, aligned_allocator<Matrix<double,6,6> > > > &cols,
VectorXd &vin,
VectorXd &vout)
{
// loop over diag entries
for (int i=0; i<(int)diag.size(); i++)
vout.segment<6>(i*6) = diag[i]*vin.segment<6>(i*6);

// loop over off-diag entries
if (cols.size() > 0)
for (int i=0; i<(int)cols.size(); i++)
{
map<int,Matrix<double,6,6>, less<int>,
aligned_allocator<Matrix<double,6,6> > > &col = cols[i];
if (col.size() > 0)
{
map<int,Matrix<double,6,6>, less<int>,
aligned_allocator<Matrix<double,6,6> > >::iterator it;
for (it = col.begin(); it != col.end(); it++)
{
int ri = (*it).first; // get row index
Matrix<double,6,6> &M = (*it).second; // matrix
const Matrix<double,6,1> &vvin = vin.segment<6>(ri*6);
vout.segment<6>(i*6) += M.transpose()*vin.segment<6>(ri*6);
vout.segment<6>(ri*6) += M*vin.segment<6>(i*6);
}
}
}

}
konolige
Registered Member
Posts
9
Karma
0
OS
Apologies for the formatting, here's a better one...kk

Code: Select all
mMV(vector< Matrix<double,6,6>, aligned_allocator<Matrix<double,6,6> > > &diag,
    vector< map<int,Matrix<double,6,6>, less<int>, aligned_allocator<Matrix<double,6,6> > > > &cols,
    VectorXd &vin,
    VectorXd &vout)
  {
    // loop over diag entries
    for (int i=0; i<(int)diag.size(); i++)
      vout.segment<6>(i*6) = diag[i]*vin.segment<6>(i*6);

    // loop over off-diag entries
    if (cols.size() > 0)
    for (int i=0; i<(int)cols.size(); i++)
      {
        map<int,Matrix<double,6,6>, less<int>,
          aligned_allocator<Matrix<double,6,6> > > &col = cols[i];
        if (col.size() > 0)
          {
            map<int,Matrix<double,6,6>, less<int>,
              aligned_allocator<Matrix<double,6,6> > >::iterator it;
            for (it = col.begin(); it != col.end(); it++)
              {
                int ri = (*it).first; // get row index
                Matrix<double,6,6> &M = (*it).second; // matrix
                const Matrix<double,6,1> &vvin = vin.segment<6>(ri*6);
                vout.segment<6>(i*6)  += M.transpose()*vin.segment<6>(ri*6);
                vout.segment<6>(ri*6) += M*vin.segment<6>(i*6);
              }
          }
      }

  }
User avatar
bjacob
Registered Member
Posts
658
Karma
3
Hi, let me first answer your general questions and then I'll look at your code more closely. I'm answering with Eigen3 in mind, answers mostly applicable to Eigen2.

Are loops being unrolled?


You've definitely done all what was needed to preserve the compile-time knowledge about sizes.

However, there is another precondition for loops to be unrolled: the estimated 'complexity' of the operation must be below a certain threshold. I am almost certain that your operations here will get unrolled, since 6 is a small enough size, but you can always adjust the threshold if they aren't, by defining EIGEN_UNROLLING_LIMIT to a larger value (the default is 100).

Notice that in the special case of matrix products like here, there is potentially a second constant to adjust: EIGEN_CACHEFRIENDLY_PRODUCT_THRESHOLD is the matrix size where we decide to switch to the cache-friendly approach to matrix products. The default value is 8, so in your case (6) this won't disturb you. But this is also something that would prevent unrolling, since the cache-friendly code, which is mean for large matrices, isn't unrolled.

Is there alignment so that SSE can be used? I have no real experience with the segment<>() operator, and whether the compiler can understand that it preserves alignment.


Very good question: in your code, this information is lost. Indeed, the 'start' parameter is unknown at compile time; since sizeof(Scalar)==8 here, and SSE requires 16 byte alignment, there only will be alignment if 'start' is even. Which is the case in your program.

Now here's a big difference between Eigen 2 and 3. In Eigen 2, there's no way at all that you can specify that your segment is aligned. In Eigen 3, we have working 'Aligned Map', which is something you can use here.

Just add these typedefs to your function:
Code: Select all
typedef Matrix<double,6,1> Vector6d;
typedef Vector6d::AlignedMapType AlignedMapVector6d;


Now you can replace your code
Code: Select all
vout.segment<6>(i*6) += ...

by this:
Code: Select all
AlignedMapVector6d(&vout(i*6)) += ...


Finally, I need to multiply a vector by the transpose of a matrix, and was wondering is M.transpose()*v is the fastest way.

Yes it is: the transpose() call gets inlined and has zero cost.


Join us on Eigen's IRC channel: #eigen on irc.freenode.net
Have a serious interest in Eigen? Then join the mailing list!
User avatar
bjacob
Registered Member
Posts
658
Karma
3
Looking quickly at your code, I can see only 2 things to say:

Don't forget to make your input parameters as 'const', for example vin.

In your inner loop code:

Code: Select all
            for (it = col.begin(); it != col.end(); it++)
              {
                int ri = (*it).first; // get row index
                Matrix<double,6,6> &M = (*it).second; // matrix
                const Matrix<double,6,1> &vvin = vin.segment<6>(ri*6);
                vout.segment<6>(i*6)  += M.transpose()*vin.segment<6>(ri*6);
                vout.segment<6>(ri*6) += M*vin.segment<6>(i*6);
              }


a couple of things. In Eigen3, the indexing type is no longer 'int', it is instead, for dense matrices, the signed integer type having the size of a pointer, aka std::ptrdiff_t. You don't have to know that, just use the typedef Index from your matrix type, e.g. MyMatrixType::Index. This avoids some integer conversions (granted, these are negligible in your case).

More importantly, the two last lines might be faster (or not --- do measurements here) if merged into a single line, and marked as lazy. Here's how you would do it with Eigen3:

Code: Select all
   vout.segment<6>(i*6)  += M.transpose().lazyProduct(vin.segment<6>(ri*6))
                            + M.lazyProduct(vin.segment<6>(i*6));


But since this combined expression has greater complexity, you might have to play again with EIGEN_UNROLLING_LIMIT.


Join us on Eigen's IRC channel: #eigen on irc.freenode.net
Have a serious interest in Eigen? Then join the mailing list!
konolige
Registered Member
Posts
9
Karma
0
OS
Thanks, I'll try some of the Eigen3 alignments to see if that helps.

Cheers --Kurt
konolige
Registered Member
Posts
9
Karma
0
OS
Ok, I switched to Eigen3 and am making some progress. But meanwhile I converted my code to be templated on the size of the vector, using N as the template parameter. I can no longer use:


Code: Select all
typedef Matrix<double,6,1> Vector6d;
typedef Vector6d::AlignedMapType AlignedMapVector6d;



Now you can replace your code
Code: Select all
    vout.segment<6>(i*6) += ...




by this:

Code: Select all
AlignedMapVector6d(&vout(i*6)) += ...




I tried something like:

Code: Select all
#define AV6d Matrix<double,N,1>::AlignedMapType



But this gives errors on

Code: Select all
  AV6d(&vout(i*N)) += ...


Any hints?
User avatar
bjacob
Registered Member
Posts
658
Karma
3
What's your error? Did you just forget a 'typename' keyword? When T is a template parameter, doing

sometemplate<T>::someMemberTypedef

is illegal, you need to write

typename sometemplate<T>::someMemberTypedef


Join us on Eigen's IRC channel: #eigen on irc.freenode.net
Have a serious interest in Eigen? Then join the mailing list!
konolige
Registered Member
Posts
9
Karma
0
OS
Yup, that was it. I'm not that great with templates...kk


Bookmarks



Who is online

Registered users: Bing [Bot], blue_bullet, Google [Bot], rockscient, Yahoo [Bot]