Performance degradation: Updating from eigen2 to devel

ggael
Moderator
Note that noalias or lazy are only useful for products. So here you should try to remove them as they might complicate the task of the compilers. Here the results are consistent:

        Eigen2  Eigen3
novec   1       1
vec     0.64    0.59
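
To illustrate why noalias/lazy only matter for products, here is a minimal plain C++ sketch. It is an analogue, not Eigen code: the type Mat2 and the functions add_into, mul_into_unsafe and mul_into_safe are hypothetical names made up for this example. A coefficient-wise operation reads and writes the same index, so the destination can alias an operand without a temporary. A product reads whole rows and columns, so writing into an operand corrupts inputs that are still needed, which is why a temporary is used unless the caller promises there is no aliasing.

```cpp
#include <array>
#include <cstddef>

// Hypothetical 2x2 row-major matrix type, for illustration only.
using Mat2 = std::array<double, 4>;

// Coefficient-wise sum: dst[i] depends only on a[i] and b[i], so it is
// safe for dst to alias a or b. No temporary is required.
inline void add_into(Mat2& dst, const Mat2& a, const Mat2& b) {
    for (std::size_t i = 0; i < 4; ++i) dst[i] = a[i] + b[i];
}

// Product written straight into dst. Each output coefficient reads a whole
// row of a and a whole column of b, so this is only correct when dst does
// not alias a or b (the promise that noalias expresses).
inline void mul_into_unsafe(Mat2& dst, const Mat2& a, const Mat2& b) {
    for (std::size_t r = 0; r < 2; ++r)
        for (std::size_t c = 0; c < 2; ++c)
            dst[2 * r + c] = a[2 * r] * b[c] + a[2 * r + 1] * b[2 + c];
}

// Product evaluated into a temporary first, then copied. Correct even when
// dst aliases an operand, at the cost of the extra copy.
inline void mul_into_safe(Mat2& dst, const Mat2& a, const Mat2& b) {
    Mat2 tmp;
    mul_into_unsafe(tmp, a, b);
    dst = tmp;
}
```

Calling mul_into_unsafe(m, m, b) overwrites coefficients of m that later iterations still read, while add_into(m, m, b) stays correct. In a benchmark with no products, there is nothing for noalias or lazy to save.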
EamonNerbonne
Registered Member
ggael wrote: Note that noalias or lazy are only useful for products. So here you should try to remove them as they might complicate the task of the compilers. Here the results are consistent:


Yeah, that's what I expected, but then I saw a difference in the timings and assumed it might be more complicated. I re-ran them, and indeed noalias makes no difference to gcc. However, for MSC:
Code: Select all
Compiler Eigen2  Eigen3(noalias) Eigen3
Msc      2.20    3.24            1.67
Msc+v    1.37    1.97            2.09
Mingw    0.82    1.60       
Mingw+v  0.92    0.90


gcc 4.4.3 + eigen2 is still faster than the custom vectorization. The results above use -O3 and a few other flags; I also tried plain -O2, and it's just 0.03 seconds slower, so that's not a big issue. I'm assuming that represents working as intended: it's consistently a bit faster than v2's vectorization.

However, MSC isn't happy; I think inlining isn't working as expected (if it were, the profiler output shouldn't include any of the Eigen internals).
EamonNerbonne
Registered Member
It's indeed the inlining.

In assign.h, I replaced the lines near line 407:
Code: Select all
struct ei_assign_impl<Derived1, Derived2, LinearVectorizedTraversal, NoUnrolling>
{
  inline static void run(Derived1 &dst, const Derived2 &src)

with:
Code: Select all
struct ei_assign_impl<Derived1, Derived2, LinearVectorizedTraversal, NoUnrolling>
{
#ifdef _MSC_VER
  __forceinline
#else
  inline
#endif
  static void run(Derived1 &dst, const Derived2 &src)


And MSC now performs as expected: (benchmark version without lazy/noalias)
Code: Select all
Compiler Eigen2  Eigen3
Msc      2.20    1.67
Msc+v    1.37    0.99
Mingw    0.82    1.60       
Mingw+v  0.92    0.90

gcc+eigen2(novector) is still abnormally quick; it must be auto-vectorizing by itself.
bjacob
Registered Member
Great work!

We have a macro EIGEN_STRONG_INLINE exactly for that (see Macros.h).

Why don't you send us a patch? So you get credited in the hg history.

http://eigen.tuxfamily.org/index.php?ti ... ng_a_patch


Join us on Eigen's IRC channel: #eigen on irc.freenode.net
Have a serious interest in Eigen? Then join the mailing list!
bjacob
Registered Member
EamonNerbonne wrote: It's indeed the inlining.
gcc+eigen2(novector) is still abnormally quick; it must be auto-vectorizing by itself.


Try -O2 then. According to the gcc man page, -ftree-vectorize is only enabled at -O3.

And you can use -ftree-vectorizer-verbose=n to get info.
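
One way to test the auto-vectorization theory in isolation is to compile a small loop kernel with and without the vectorizer and compare timings or the vectorizer report. The sketch below is an assumption-laden example, not code from the benchmark: the function name axpy and the file name kernel.cpp are made up, and the exact report flag depends on the GCC version.

```cpp
#include <cstddef>

// Hypothetical kernel: y[i] += alpha * x[i].
// A simple counted loop with no reduction, the kind of loop GCC's tree
// vectorizer can usually handle.
//
// Possible builds to compare (file name is an assumption):
//   g++ -O2 -c kernel.cpp                      // vectorizer off in GCC 4.x
//   g++ -O3 -c kernel.cpp                      // -ftree-vectorize enabled
//   g++ -O3 -fno-tree-vectorize -c kernel.cpp  // -O3 without the vectorizer
//   g++ -O3 -ftree-vectorizer-verbose=2 -c kernel.cpp  // report, GCC 4.x era
void axpy(double alpha, const double* x, double* y, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        y[i] += alpha * x[i];
    }
}
```

If the -O3 and -O3 -fno-tree-vectorize builds time the same, auto-vectorization is probably not what makes the non-vectorized Eigen2 build fast.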

EamonNerbonne
Registered Member
I think I'm having trouble with the eigen mailing list daemon. Can someone verify whether or not they've received two messages on the mailing list? One concerns a minor patch to BenchTimer, and the second concerns another regression from Eigen2-->Eigen3 in code like this:

Code: Select all
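typedef Matrix<double,Dynamic,2> QMatrix;
QMatrix  Q = QMatrix::Random(DIMS,2);   // DIMS x 2
Vector2d v = Vector2d::Random();        // fixed size 2; a size argument of DIMS would not match
VectorXd r = VectorXd::Random(DIMS);    // DIMS x 1, receives Q * v

// Then loop this:
#if EIGEN3
r.noalias() = Q * v;    // Eigen3: evaluate the product directly into r
#else
r = (Q * v).lazy();     // Eigen2: same intent, no temporary
#endif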


Patches were attached to both.

