This forum has been archived. All content is frozen. Please use KDE Discuss instead.

Performance differences (32/64), and other things

bloodtoes
Hello,

I'm using Eigen for some 3D geometry processing. One of the things I'm doing is constructing a transform matrix from a translation, an Euler rotation and a scale, and I've noticed dramatic performance differences depending on whether I compile for 32-bit or 64-bit.

Here's the code:

Code:
typedef Eigen::Matrix<float,3,1> fvec3;
typedef Eigen::Matrix<float,4,1> fvec4;
typedef Eigen::Matrix<float,4,4> fmat4x4;
typedef std::vector<fvec4,Eigen::aligned_allocator<fvec4> > fvec4array;
typedef std::vector<fmat4x4,Eigen::aligned_allocator<fmat4x4> > fmat4x4array;

...

// These arrays will all have the exact same number of elements
fvec4array positions;
fvec4array rotations;
fvec4array scalings;
fmat4x4array matrices;

...

Eigen::Affine3f t;
for( size_t i = 0; i < positions.size(); ++i )
{
    t = Eigen::Translation3f( reinterpret_cast<fvec3 &>(positions[i]) ) *
        Eigen::AngleAxisf( rotations[i][0], fvec3::UnitZ() ) *
        Eigen::AngleAxisf( rotations[i][1], fvec3::UnitY() ) *
        Eigen::AngleAxisf( rotations[i][2], fvec3::UnitX() ) *
        Eigen::Scaling( reinterpret_cast<fvec3 &>(scalings[i]) );
    matrices[i] = t.matrix();
}


I am using MSVC2008 and Eigen 3.0.3. When compiled for 32-bit, the above code takes 3x as long as when compiled for 64-bit. I'm not sure how to go about finding out why.

Also, is this the optimal way to go about generating a transform matrix from these inputs?

Cheers.
ggael
One reason could be that SSE2 is not enabled in your 32-bit build, whereas it is enabled by default for 64-bit. I'm talking about the compiler options. If that changes nothing, then blame MSVC.
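
To rule out the compiler-options explanation, one quick sanity check is to print what the compiler itself reports. The sketch below is plain C++ with no Eigen dependency, and the function names are made up for illustration. It relies on the predefined macros _M_IX86_FP (set to 2 by MSVC for x86 builds with /arch:SSE2), _M_X64 (MSVC x64, where SSE2 is always available) and __SSE2__ (GCC/Clang). If Eigen is at hand, testing its own EIGEN_VECTORIZE_SSE2 macro after including the headers is probably the more direct check.

```cpp
#include <cstdio>

// Hypothetical helper: true when the compiler was allowed to emit SSE2 code.
//  - MSVC x86: _M_IX86_FP >= 2 means /arch:SSE2 (or higher) was given.
//  - MSVC x64: _M_X64 is defined and SSE2 is part of the baseline.
//  - GCC/Clang: __SSE2__ is defined when SSE2 is enabled.
inline bool sse2EnabledAtCompileTime()
{
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
    return true;
#else
    return false;
#endif
}

// Hypothetical helper: pointer width of the current build (32 or 64 on the
// platforms discussed in this thread).
inline int pointerBits()
{
    return static_cast<int>(sizeof(void*) * 8);
}

// Prints one line describing the build, e.g. to compare the two configurations.
inline void reportBuildConfig()
{
    std::printf("%d-bit build, SSE2 %s\n",
                pointerBits(),
                sse2EnabledAtCompileTime() ? "enabled" : "disabled");
}
```

Calling reportBuildConfig() once at startup in both the 32-bit and the 64-bit build shows whether the two configurations really differ in this respect.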

Note that doing:

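Code:
// head<3>() takes the xyz part of the 4-component vectors,
// since an Affine3f expects 3D position and scale.
t.fromPositionOrientationScale(
    positions[i].head<3>(),
    Eigen::AngleAxisf( rotations[i][0], fvec3::UnitZ() ) *
    Eigen::AngleAxisf( rotations[i][1], fvec3::UnitY() ) *
    Eigen::AngleAxisf( rotations[i][2], fvec3::UnitX() ),
    scalings[i].head<3>());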


might be faster.
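
A plausible reason for the difference, offered as an assumption rather than a measured fact: the product translation * Rz * Ry * Rx * scaling has a simple closed form, so there is no need to chain several general transform products. The rotation fills the upper-left 3x3 block, each column of that block is multiplied by one scale factor, and the translation goes straight into the last column. The sketch below writes this out in plain C++ without Eigen. Mat4 and composeTRS are hypothetical names, and the matrix is indexed m[row][col] purely for readability (Eigen's default storage is column-major).

```cpp
#include <cmath>

// Hypothetical helper type: a 4x4 float matrix indexed as m[row][col].
struct Mat4 { float m[4][4]; };

// Builds T * Rz(euler[0]) * Ry(euler[1]) * Rx(euler[2]) * S directly,
// using the same angle order as rotations[i][0..2] in the original post.
inline Mat4 composeTRS(const float t[3], const float euler[3], const float s[3])
{
    const float ca = std::cos(euler[0]), sa = std::sin(euler[0]); // about Z
    const float cb = std::cos(euler[1]), sb = std::sin(euler[1]); // about Y
    const float cc = std::cos(euler[2]), sc = std::sin(euler[2]); // about X

    // R = Rz * Ry * Rx written out term by term.
    const float r[3][3] = {
        { ca * cb, ca * sb * sc - sa * cc, ca * sb * cc + sa * sc },
        { sa * cb, sa * sb * sc + ca * cc, sa * sb * cc - ca * sc },
        { -sb,     cb * sc,                cb * cc                }
    };

    Mat4 out;
    for (int row = 0; row < 3; ++row) {
        for (int col = 0; col < 3; ++col) {
            // R * diag(s): column 'col' is scaled by s[col].
            out.m[row][col] = r[row][col] * s[col];
        }
        out.m[row][3] = t[row]; // translation column
    }
    // Last row of an affine transform.
    out.m[3][0] = 0.0f; out.m[3][1] = 0.0f; out.m[3][2] = 0.0f; out.m[3][3] = 1.0f;
    return out;
}
```

This is only meant to show the structure of the result; whether Eigen's routine gains its speed in exactly this way would need to be confirmed in the generated code.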
bloodtoes
I do have SSE2 enabled, so I'm not sure what it could be. I'll try digging deeper to see what's up.

Thanks for the tip about the transform matrix. Using that routine cut the time to compute the transform matrix nearly in half.
ggael
It could be that MSVC uses different inlining heuristics for 32-bit and 64-bit builds. The best approach is to check the generated assembly of your for loop and compare the 32-bit and 64-bit versions. In particular, compare the function calls. If you find something interesting, we could force the inlining of some functions.

