Registered Member
|
The speed of unoptimized (i.e. Debug-mode) Eigen code is killing me. Does anyone have techniques for getting around this? Eigen is used widely throughout my code, so I can't isolate it easily. The only idea I have so far is to create a separate library containing just the subset of Eigen I use, wrapped thinly under the same names, and to switch namespaces with an #ifdef...
Thoughts? |
Moderator
|
What about compiling with both debug symbols and minimal optimization, i.e. -O1 -g2?
|
Registered Member
|
What about adding an inlining macro to Eigen (e.g. EIGEN_INLINE)? This would allow mapping inline to __forceinline on the Intel compiler, setting the always_inline attribute on GCC, etc. It would also let you set inlining back to plain 'inline' if you actually wanted to debug Eigen itself. I believe the massive slowdown when debugging apps that use Eigen extensively is due to the deep call stacks (many small nested function calls) generated at runtime.
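A minimal sketch of what such a macro could look like. The names MYLIB_INLINE and MYLIB_PLAIN_INLINE and the dot3 helper are hypothetical, not Eigen API; the compiler branches mirror the Macros.h excerpt quoted later in this thread, plus GCC's always_inline attribute.

```cpp
// Hypothetical force-inline macro in the spirit of the proposed EIGEN_INLINE.
// Define MYLIB_PLAIN_INLINE before this point to fall back to ordinary
// 'inline', e.g. when you want to step into the library in a debugger.
#if defined(MYLIB_PLAIN_INLINE)
  #define MYLIB_INLINE inline
#elif defined(_MSC_VER) || defined(__INTEL_COMPILER)
  #define MYLIB_INLINE __forceinline
#elif defined(__GNUC__)
  // GCC and Clang inline always_inline functions even without optimization.
  #define MYLIB_INLINE inline __attribute__((always_inline))
#else
  #define MYLIB_INLINE inline
#endif

namespace mylib {

// A tiny helper of the kind that, in an unoptimized build, would otherwise
// be a real function call with its own stack frame every time.
MYLIB_INLINE double dot3(const double* a, const double* b) {
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

}  // namespace mylib
```

Building with -DMYLIB_PLAIN_INLINE would restore ordinary inline, so the helper shows up as a separate frame when stepping through in a debugger.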
|
Registered Member
|
I see there is an EIGEN_STRONG_INLINE macro. Unfortunately, it doesn't seem to be set up to be overridden before including the Eigen headers, and it currently expands to plain 'inline' unless you use MSVC or the Intel compiler. I'm unsure how much forcing every 'inline' to always_inline would help compared with forcing only EIGEN_STRONG_INLINE, but both options might be worth trying.
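One common way to make such a macro overridable is an #ifndef guard, so users can define it themselves before including the header. The sketch below uses hypothetical MYLIB_ names and a toy square function; it is an illustration of the pattern, not the Macros.h code discussed in this thread.

```cpp
#include <string>

// --- user code, placed before the library header is included ---
// Hypothetical opt-in: force inlining of the library's "strong" helpers on GCC/Clang.
#if defined(__GNUC__) && !defined(__INTEL_COMPILER)
  #define MYLIB_STRONG_INLINE inline __attribute__((always_inline))
#endif

// --- an overridable default, as the library header could provide it ---
#ifndef MYLIB_STRONG_INLINE
  #if defined(_MSC_VER) || defined(__INTEL_COMPILER)
    #define MYLIB_STRONG_INLINE __forceinline
  #else
    #define MYLIB_STRONG_INLINE inline
  #endif
#endif

// Two-step stringizing so the macro is expanded before being turned into text.
#define MYLIB_STR_(x) #x
#define MYLIB_STR(x) MYLIB_STR_(x)

namespace mylib {

MYLIB_STRONG_INLINE int square(int x) { return x * x; }

// Reports which definition of MYLIB_STRONG_INLINE won (for checking only).
inline std::string strong_inline_spelling() { return MYLIB_STR(MYLIB_STRONG_INLINE); }

}  // namespace mylib
```

Because the library only supplies a default when the macro is still undefined, the user's definition takes precedence without editing the header.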
|
Registered Member
|
Here are some timings with one instantiation of my app:
                                  dbg                rel
    no special inlining           00:03:38.412005    00:00:12.835160
    EIGEN_STRONG_INLINE forced    00:03:05.560514    00:00:12.861347
    all Eigen inlining forced     00:02:27.530674    00:00:12.824934

Release mode is more or less unaffected by this change (I think the differences are within statistical noise). Debug mode takes roughly a third less time (about 32%: 148 s vs. 218 s) when all functions are inlined, and about 15% less when only the EIGEN_STRONG_INLINE functions are inlined. It would be interesting to see how other apps are affected.

BTW, for the third test it was sufficient for me to do this:

    #define inline inline __attribute__((always_inline))
    #include <Eigen/Dense>
    #undef inline

The second test was accomplished by modifying Macros.h from:

    // EIGEN_FORCE_INLINE means "inline as much as possible"
    #if (defined _MSC_VER) || (defined __INTEL_COMPILER)
    #define EIGEN_STRONG_INLINE __forceinline
    #else
    #define EIGEN_STRONG_INLINE inline
    #endif

to:

    // EIGEN_FORCE_INLINE means "inline as much as possible"
    #if (defined _MSC_VER) || (defined __INTEL_COMPILER)
    #define EIGEN_STRONG_INLINE __forceinline
    #else
    #define EIGEN_STRONG_INLINE inline EIGEN_ALWAYS_INLINE_ATTRIB
    #endif
|
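The debug-mode percentages can be double-checked by converting the timings to seconds and comparing them with the no-inlining baseline. The helper names to_seconds and time_saved are illustrative only.

```cpp
#include <cassert>

// Convert an mm:ss.ffffff timing, as printed in the post, to seconds.
constexpr double to_seconds(int minutes, double seconds) {
  return minutes * 60.0 + seconds;
}

// Fraction of run time saved relative to a baseline (0.25 means 25% less time).
inline double time_saved(double baseline, double t) {
  assert(baseline > 0.0);
  return 1.0 - t / baseline;
}

// Debug timings from the post (the hours field was zero in every run).
constexpr double dbg_plain  = to_seconds(3, 38.412005);  // no special inlining
constexpr double dbg_strong = to_seconds(3, 5.560514);   // EIGEN_STRONG_INLINE forced
constexpr double dbg_all    = to_seconds(2, 27.530674);  // all Eigen inlining forced
```

With these inputs, time_saved(dbg_plain, dbg_strong) is about 0.150 and time_saved(dbg_plain, dbg_all) is about 0.325, i.e. 15% and roughly a third less debug run time.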
Moderator
|
Note that many trivial functions are not explicitly qualified with "inline" because they are defined inside the class declaration (which makes them implicitly inline). So an even better speedup might be achieved if really all functions were forced to be inlined. Perhaps we could add an EIGEN_FORCE_INLINING option to do that.
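A sketch of how such an opt-in switch could reach in-class member functions, which carry no inline keyword and are therefore untouched by the #define inline trick. MYLIB_FORCE_INLINING, MYLIB_MEMBER_INLINE and the Vec class are hypothetical names, not Eigen's.

```cpp
#include <cstddef>

// Hypothetical opt-in switch in the spirit of the proposed EIGEN_FORCE_INLINING.
// In-class member definitions are implicitly inline, so the attribute has to be
// attached to them directly.
#if defined(MYLIB_FORCE_INLINING) && defined(__GNUC__)
  #define MYLIB_MEMBER_INLINE __attribute__((always_inline))
#elif defined(MYLIB_FORCE_INLINING) && defined(_MSC_VER)
  #define MYLIB_MEMBER_INLINE __forceinline
#else
  #define MYLIB_MEMBER_INLINE  // expands to nothing: ordinary implicit inline
#endif

namespace mylib {

class Vec {
 public:
  explicit Vec(std::size_t n) : size_(n) {}

  // No 'inline' keyword here, yet both functions are implicitly inline.
  MYLIB_MEMBER_INLINE std::size_t size() const { return size_; }
  MYLIB_MEMBER_INLINE bool empty() const { return size_ == 0; }

 private:
  std::size_t size_;
};

}  // namespace mylib
```

Compiling with -DMYLIB_FORCE_INLINING would turn the trivial accessors into forced-inline functions, while the default build leaves them as ordinary debuggable members.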
|