Colwise on Vec3 type is slow

la_feuille

Tue Oct 03, 2017 8:34 pm
Hi,

First of all thank you for your amazing library.

I'm trying to understand what's going on in the following code.

- I'm applying a 3D transformation (R|C) to some 3D point data.
- I'm using operator() to apply the transformation to either a vector or a matrix.

You will find below the code I used in order to reproduce the issue.
Calling operator() on Vec3 data in a loop is more than 10x slower than calling operator() once on the whole matrix.
I did not expect such a timing difference.

$ ./main
For loop (Pose with Vec3): 259ms
(Pose with Mat3X): 18ms

If I add the following function to the Pose class, the loop runs just as fast, but I would like to have a single transform operator rather than implementing two.

Code: Select all
inline Vec3 operator () ( const Vec3& p ) const
  {
    return rotation_ * ( p - center_ );
  }


Can you help me understand what I have done wrong?

Code: Select all
#include "Eigen/Dense"

#include <chrono>
#include <cstdlib>   // EXIT_SUCCESS
#include <iostream>
#include <ratio>
#include <type_traits>

using Vec3 = Eigen::Vector3d;
using Mat3 = Eigen::Matrix<double, 3, 3>;
using Mat3X = Eigen::Matrix<double, 3, Eigen::Dynamic>;

struct Pose
{
  Mat3 rotation_;
  Vec3 center_;

  Pose(const Mat3& r = Mat3::Identity(), const Vec3& c = Vec3::Zero())
  : rotation_( r ), center_( c ) {}

  inline Mat3X operator () ( const Mat3X& p ) const {
    return rotation_ * ( p.colwise() - center_ );
  }
};

int main()
{
  const int nb_elem = 900000;
  Mat3X test = Mat3X::Random(3, nb_elem);
  Mat3 rot = Mat3::Identity();
  Vec3 center;

  Pose pose;

  // warm up
  for (int warmup = 0; warmup < 5; ++warmup) {
    Mat3X testout = test;
    for (int i = 0; i < nb_elem; ++i) {
      testout.col(i) = pose(test.col(i));
    }
  }

  std::chrono::high_resolution_clock::time_point start;

  {
    start = std::chrono::high_resolution_clock::now();
    Mat3X testout = test;
    for (int i = 0; i < nb_elem; ++i)
    {
      testout.col(i) = pose(test.col(i));
    }
    const auto end = std::chrono::high_resolution_clock::now();
    std::cout << "For loop (Pose with Vec3): "
     << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count() << "ms" << std::endl;
  }

  {
    start = std::chrono::high_resolution_clock::now();
    const Mat3X testout = pose(test);
    const auto end = std::chrono::high_resolution_clock::now();
    std::cout << "(Pose with Mat3X): "
       << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count() << "ms" << std::endl;
  }
  return EXIT_SUCCESS;
}


Thank you for your help.
ggael
This is because for every call to pose(p.col(i)) the compiler has to create a temporary Mat3X object (which is allocated on the heap, so there is a costly malloc hidden there) and copy p.col(i) into this Mat3X object. The same goes for the returned value. The fix is as simple as writing a single generic template function, in C++14:

Code: Select all
template<typename T>
auto operator() (const T& p) const {
  return rotation_ * ( p.colwise() - center_ );
}


Note that the generic operator() above returns an expression rather than a matrix, so be careful with:

Code: Select all
auto res = pose(p);


See: https://eigen.tuxfamily.org/dox/TopicPitfalls.html

If you want to return the computed result (i.e., a Matrix), then write:

Code: Select all
template<typename T>
typename T::PlainObject operator() (const T& p) const {
  return rotation_ * ( p.colwise() - center_ );
}
la_feuille

Re: Colwise on Vec3 type is slow

Wed Oct 04, 2017 5:41 pm
Thank you very much. :<

Your detailed answer (both the modern C++14 version and the classic one) is appreciated.

I will choose the template solution for the moment and will try to learn how to better use Eigen from the TopicPitfalls doc webpage!

Before:
$ ./main
For loop (Pose with Vec3): 292ms
(Pose with Mat3X): 18ms

After:
$ ./main
For loop (Pose with Vec3): 5ms // Clearly better!
(Pose with Mat3X): 18ms

Is there any tool you would recommend, for runtime or static analysis, that can detect this kind of suboptimal parameter or return type?

