Registered Member
|
Hi All,
I have observed some performance slowdowns using Eigen. I would have expected the library to be able to unroll away the overhead here (maybe with the exception of aliasing between in/out). Compiled with MSVC 2012 x64, the Eigen version is twice as slow as manually coding the matrix multiplication / max / min and cast. Are my expectations unreasonable here? What am I missing?
The full code can be viewed at http://pastebin.com/8Ssq1uwB Cheers, Michal |
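For readers without access to the pastebin, a hand-written version of this kind of computation might look roughly like the sketch below. It is not the original code: the name transform_manual, the clamp range [0, 255] and the uint8_t output type are assumptions made for illustration. Only the float 3x3 matrix and 3-vector shapes come from the thread.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical stand-in for the "manual" routine discussed in the thread.
// m   : 3x3 float matrix in column-major order (Eigen's default storage)
// in  : 3-vector
// out : result after clamping to [0, 255] (assumed range) and casting to bytes
inline void transform_manual(const float m[9], const float in[3], std::uint8_t out[3])
{
    for (int row = 0; row < 3; ++row) {
        // Column-major: element (row, col) lives at m[col * 3 + row].
        float acc = m[0 * 3 + row] * in[0]
                  + m[1 * 3 + row] * in[1]
                  + m[2 * 3 + row] * in[2];
        // The max/min step, followed by the cast (which truncates toward zero).
        acc = std::min(std::max(acc, 0.0f), 255.0f);
        out[row] = static_cast<std::uint8_t>(acc);
    }
}
```

With everything spelled out like this there are no temporaries or wrapper objects for the compiler to see through, which is why it makes a useful baseline for the expression-template version.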
Moderator
|
Make sure you benchmarked with optimization ON (release mode). Then, the "manual" function is broken and does not perform the right computation. After fixing it I get with gcc 4.8 -O3 -DNDEBUG:
eigen1 0.0616467
eigen2 0.0552015
eigen3 0.0575281
eigen4 0.0738412
manual 0.0728137
so perhaps MSVC fails to inline some function. Have a look at the generated assembly for the eigen1 function and search for "call". |
Registered Member
|
Hi Gael,
Thanks for taking a look! The code was compiled in release mode with /Ox (full optimization) and NDEBUG was defined. Even with the 'favor speed over size' option (/Ot) there is indeed a 'call' in the assembly listing (to something like ArrayWrapper::CoeffBasedProduct). I am guessing there's not much that the library can do to help the compiler inline here. Have you guys had any success getting the MSVC team to incorporate optimizations required by Eigen? Cheers, Michal |
Moderator
|
__forceinline can help MSVC. What is the precise call?
|
Registered Member
|
This is what I get in the assembly:
call ??0?$ArrayWrapper@$$CBV?$CoeffBasedProduct@AEBV?$Matrix@M$02$02$0A@$02$02@Eigen@@AEBV?$Matrix@M$02$00$0A@$02$00@2@$05@Eigen@@@Eigen@@QEAA@AEBV?$CoeffBasedProduct@AEBV?$Matrix@M$02$02$0A@$02$02@Eigen@@AEBV?$Matrix@M$02$00$0A@$02$00@2@$05@1@@Z ; Eigen::ArrayWrapper<Eigen::CoeffBasedProduct<Eigen::Matrix<float,3,3,0,3,3> const & __ptr64,Eigen::Matrix<float,3,1,0,3,1> const & __ptr64,6> const >::ArrayWrapper<Eigen::CoeffBasedProduct<Eigen::Matrix<float,3,3,0,3,3> const & __ptr64,Eigen::Matrix<float,3,1,0,3,1> const & __ptr64,6> const > |
Registered Member
|
|
Moderator
|
Yes, in file src/Core/ArrayWrapper.h, line 52, you might try adding EIGEN_STRONG_INLINE:
EIGEN_STRONG_INLINE ArrayWrapper(ExpressionType& matrix) : m_expression(matrix) {}
This will likely just defer the non-inlining issue further, so you might have to repeat this step. |
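The idea behind this suggestion can be sketched in isolation. The snippet below is only an illustration: MY_STRONG_INLINE and ThinWrapper are hypothetical names, not Eigen code, and the macro assumes that forcing inlining on MSVC via __forceinline (as mentioned earlier in the thread) is what a "strong inline" marker amounts to there. Storing the expression by reference is likewise an assumption of the sketch.

```cpp
// Hypothetical macro: force inlining on MSVC, fall back to a plain
// `inline` hint on other compilers.
#if defined(_MSC_VER)
#  define MY_STRONG_INLINE __forceinline
#else
#  define MY_STRONG_INLINE inline
#endif

// Hypothetical thin wrapper, loosely shaped like the constructor quoted above.
// It only stores a reference, so both members are trivial and should
// disappear entirely once inlined.
template <typename ExpressionType>
class ThinWrapper {
public:
    MY_STRONG_INLINE explicit ThinWrapper(ExpressionType& expr) : m_expression(expr) {}

    // Returns the wrapped object itself, not a copy.
    MY_STRONG_INLINE ExpressionType& expression() const { return m_expression; }

private:
    ExpressionType& m_expression;
};
```

If a 'call' to such a constructor still shows up in the assembly listing, the next wrapper or accessor in the chain is the likely candidate for the same treatment.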
Registered Member
|
Using EIGEN_STRONG_INLINE at:
1) src/Core/MatrixBase.h (line 322) and the other array() method
2) src/Core/MatrixBase.h (line 503) and all other constructors there (did not help)
3) src/Core/products/CoeffBasedProduct.h (line 148) and the other constructor there (did not help)
((2) and (3) may not be needed)
results in no call in the generated code and a substantial improvement. The code is still slower than the manual version when using Map.
eigen1 - 180 ms (Map for input and output)
eigen2 - 180 ms (no input Map)
eigen3 - 160 ms (no output Map)
eigen4 - 150 ms (manual implementation of min(), max(), array() +)
manual - 160 ms |
Moderator
|
So the one in src/Core/ArrayWrapper.h is not needed? Or do you need this one + (1)?
|
Registered Member
|
To clarify, I need both (0) and (1) but not the others to prevent the call:
0) src/Core/ArrayWrapper.h, line 52
1) src/Core/MatrixBase.h (line 322) and the other array() method |
Moderator
|
Thank you.
https://bitbucket.org/eigen/eigen/commits/5b2464f21d02/
Changeset: 5b2464f21d02
User: ggael
Date: 2014-03-04 17:24:00
Summary: Help MSVC to inline some trivial functions |