Registered Member
|
Our code uses a lot of 3D affine transformations, so I compared the speed of concatenating transformations using the Affine3d class vs. the Matrix4d class. I expected Affine3d to be faster because it can make additional assumptions. However, in my experiments Affine3d is slower by a factor of 3 or 4.
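The "additional assumptions" can be made concrete. An affine transform has [0 0 0 1] as its last row, so concatenating two of them only needs the top 3x4 block: 36 multiplications instead of the 64 of a general 4x4 product. The plain C++ sketch below (no Eigen) shows both variants side by side. `Mat4`, `mul_general`, `mul_affine`, and `has_affine_last_row` are hypothetical names. This is not the poster's hard-coded version, which the thread does not show.

```cpp
#include <array>
#include <cassert>

// Row-major 4x4 matrix; an affine transform has last row [0 0 0 1].
using Mat4 = std::array<double, 16>;

// General 4x4 product: 64 multiplications, no assumption about the last row.
Mat4 mul_general(const Mat4& a, const Mat4& b) {
    Mat4 c{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j) {
            double sum = 0.0;
            for (int k = 0; k < 4; ++k) sum += a[i * 4 + k] * b[k * 4 + j];
            c[i * 4 + j] = sum;
        }
    return c;
}

// Returns true if m has the affine last row [0 0 0 1]; used in a debug-only assert below.
bool has_affine_last_row(const Mat4& m) {
    return m[12] == 0.0 && m[13] == 0.0 && m[14] == 0.0 && m[15] == 1.0;
}

// Affine product: with a = [Ra ta; 0 1] and b = [Rb tb; 0 1], the result is
// [Ra*Rb  Ra*tb + ta; 0 1]. Only the top 3x4 block is computed
// (27 + 9 = 36 multiplications) and the last row is set directly.
Mat4 mul_affine(const Mat4& a, const Mat4& b) {
    assert(has_affine_last_row(a) && has_affine_last_row(b));  // compiled out with -DNDEBUG
    Mat4 c{};
    for (int i = 0; i < 3; ++i) {
        for (int j = 0; j < 3; ++j)
            c[i * 4 + j] = a[i * 4] * b[j] + a[i * 4 + 1] * b[4 + j] + a[i * 4 + 2] * b[8 + j];
        c[i * 4 + 3] = a[i * 4] * b[3] + a[i * 4 + 1] * b[7] + a[i * 4 + 2] * b[11] + a[i * 4 + 3];
    }
    c[15] = 1.0;
    return c;
}
```

Both functions return the same matrix for affine inputs. Whether the affine variant is actually faster in practice depends on how well the compiler inlines and vectorizes each one, which is what the rest of the thread turns on.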
Am I doing anything wrong? I don't think a specialized class should be slower than the general matrix class. I also compared the speed to hard-coding the concatenation, which was faster than both Affine3d and Matrix4d. I am using Visual Studio 2010, 32-bit. Here are the results for 100 million concatenations:

With SSE2: Affine3d: 9.3 s, Matrix4d: 3.1 s, Hard-coded: 1.8 s
Without SSE: Affine3d: 9.7 s, Matrix4d: 2.4 s, Hard-coded: 2.0 s

I am also confused by the fact that SSE2 slows Matrix4d down. Here is my code for the experiment:
|
Moderator
|
Your benchmark is broken, as it allows the compiler to completely or partially remove the loops. That's what happens with gcc. Here is a fixed version:
And with g++ -O2 -DNDEBUG I get:

With SSE: Affine3d: 2.22058 s, Matrix4d: 2.12294 s, Hard-coded: 2.13547 s
No SSE: Affine3d: 2.02658 s, Matrix4d: 3.95067 s, Hard-coded: 2.1355 s
|
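The moderator's fixed code is not included in this thread. As a rough illustration of the general idea only, the plain C++ sketch below (no Eigen) keeps the optimizer from discarding the work. Each iteration depends on the previous result, and the final matrix is reduced to a checksum that the caller has to use. `Mat4`, `affine_mul`, `run_benchmark`, and the step transform are hypothetical. The angle is passed in at run time so the step transform is not a compile-time constant.

```cpp
#include <array>
#include <cassert>
#include <chrono>
#include <cmath>

// Row-major 4x4 matrix holding an affine transform (last row assumed [0 0 0 1]).
using Mat4 = std::array<double, 16>;

// Hypothetical hard-coded affine product: only the top 3x4 block is computed.
static Mat4 affine_mul(const Mat4& a, const Mat4& b) {
    Mat4 c{};
    for (int i = 0; i < 3; ++i) {
        for (int j = 0; j < 3; ++j)
            c[i * 4 + j] = a[i * 4] * b[j] + a[i * 4 + 1] * b[4 + j] + a[i * 4 + 2] * b[8 + j];
        c[i * 4 + 3] = a[i * 4] * b[3] + a[i * 4 + 1] * b[7] + a[i * 4 + 2] * b[11] + a[i * 4 + 3];
    }
    c[15] = 1.0;
    return c;
}

// Concatenates n copies of a step transform (rotation by `angle` about z plus a
// small translation along x). The loop carries a dependency through `acc`, and
// the result is returned as a checksum the caller should print or otherwise use,
// so the loop cannot be dropped as dead code. The loop's wall-clock time is
// written to *seconds.
double run_benchmark(long n, double angle, double* seconds) {
    assert(n >= 0 && seconds != nullptr);
    const double cs = std::cos(angle), sn = std::sin(angle);
    const Mat4 step = {cs,  -sn, 0.0, 0.1,
                       sn,   cs, 0.0, 0.0,
                       0.0, 0.0, 1.0, 0.0,
                       0.0, 0.0, 0.0, 1.0};
    Mat4 acc = {1.0, 0.0, 0.0, 0.0,
                0.0, 1.0, 0.0, 0.0,
                0.0, 0.0, 1.0, 0.0,
                0.0, 0.0, 0.0, 1.0};

    const auto t0 = std::chrono::steady_clock::now();
    for (long i = 0; i < n; ++i)
        acc = affine_mul(acc, step);  // each iteration needs the previous acc
    const auto t1 = std::chrono::steady_clock::now();
    *seconds = std::chrono::duration<double>(t1 - t0).count();

    double checksum = 0.0;
    for (double v : acc) checksum += v;
    return checksum;
}
```

In a `main`, one would call something like `run_benchmark(100000000L, 0.001, &secs)` and print both the time and the checksum. If the checksum is never used, the compiler is again free to remove the loop once the function is inlined.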
Registered Member
|
Thanks for correcting my benchmark. With your changes I get results similar to yours when using gcc. Here are my results with gcc on a 64-bit Linux machine with the same CPU as my Windows machine:
Affine3d: 1.49 s, Matrix4d: 1.42 s, Hard-coded: 1.33 s

However, with Visual Studio my results haven't changed much. In fact, the difference between Affine3d and Matrix4d has become even larger. Here are my results using your code with Visual Studio. I also tried VS 2012 and 64-bit compilation: 64-bit seems to improve the situation, while VS 2012 makes it a lot worse. For now I am going to assume the problem is caused by VS not optimizing as well as GCC does.

VS 2010, 32-bit, SSE2: Affine3d: 9.6 s, Matrix4d: 2.0 s, Hard-coded: 1.5 s
VS 2010, 32-bit, No SSE: Affine3d: 10.0 s, Matrix4d: 2.9 s, Hard-coded: 1.5 s
VS 2010, 64-bit: Affine3d: 3.3 s, Matrix4d: 2.2 s, Hard-coded: 1.4 s
VS 2012, 32-bit, SSE2: Affine3d: 20.9 s, Matrix4d: 1.6 s, Hard-coded: 1.4 s
VS 2012, 64-bit: Affine3d: 17.8 s, Matrix4d: 1.9 s, Hard-coded: 1.4 s
|
Moderator
|
Make sure you compiled in "release" mode, i.e., with optimizations on and -DNDEBUG. If that was the case, then this is very likely an inlining issue. The generated assembly would help identify where MSVC is failing.
|
Registered Member
|
|
Registered Member
|
Is there a specific reason you are not using g++ -O3? With g++ -O3, Affine3d is still significantly slower than Matrix4d:

Affine3d: 1.43 s, Matrix4d: 1.11 s, Hard-coded: 1.27 s
|
Registered Member
|
I just checked the assembly. Quite a few function calls are not inlined, even with /O2 and /Ob2 (inline any suitable).
Here is a list of functions which are not inlined:

Eigen::internal::transform_transform_product_impl<Eigen::Transform<double,3,2,0>,Eigen::Transform<double,3,2,0>,0>::run
Eigen::MapBase<Eigen::Block<Eigen::Matrix<double,4,4,0,4,4> const ,3,3,0>,0>::coeff
Eigen::EigenBase<Eigen::Matrix<double,3,3,0,3,3> >::derived // also with other types
Eigen::DenseStorage<double,9,3,3,0>::rows
Eigen::DenseCoeffsBase<Eigen::Block<Eigen::Matrix<double,4,4,0,4,4> const ,3,3,0>,2>::rowStride
Eigen::DenseStorage<double,9,3,3,0>::data

I even tested with /Ox, but it did not bring much improvement.

Also, there are lots of 'inline' keywords in the code. We should keep in mind that these have little to do with inlining the code in the sense of copying and pasting it. Many of them are not required, or should be replaced by EIGEN_STRONG_INLINE if that is what we actually intend. The 'inline' keyword just helps us prevent linker errors for non-templated free functions in header-only code.

Btw, inserting a few EIGEN_STRONG_INLINE just shifts the inlining issue to other functions. That is, we would need to add it in a lot more places if we want one-liner functions to actually be inlined, which is probably what we want for almost all of them.

Regards,
Hauke
|