Registered Member
|
I am running a slightly modified version of the code posted by Hauke in the topic "Low efficiency using Eigen..." (viewtopic.php?f=74&t=87929&p=158729&hilit=map+efficiency#p158729), though my question is a bit different. In Hauke's code, vectors and matrices are initialized from C arrays using Maps, which are then copied into the vectors and matrices at initialization:
Vector4d VIE = Vector4d::MapAligned(VI);
where VI is a C array. I was wondering how efficiency would be affected if I instead wrote:
Map<Vector4d,Aligned> VIE(VI);
so that VIE is now an actual Map. This is very convenient if you are modifying existing C code and want to manage your own memory. However, I found the code with Maps to be 20% slower! Is there anything I am missing?
Find my version of the code below. I compiled it with gcc version 4.7.2 using:
g++ -Ofast -DNDEBUG -DEIGEN_NO_MALLOC -I/path_to_my_Eigen benchmark1.cpp -o benchmark1 -lrt
I used EIGEN_VECTORIZE to verify that vectorization is indeed on. I have an Intel i5 processor. Using Maps (#define USE_MAPS at the top), the code runs in 1370 ms. Without Maps (commenting out #define USE_MAPS), it runs in 1123 ms.
I would like to use Maps if possible so that I can manage the memory myself, especially because I can then use pointers within my code to different locations in the data.
Thank you very much in advance. This is the code (full and self-contained, so you can try it out if you can):
|
Registered Member
|
An interesting fact: If the arrays passed to the maps are allocated dynamically (with new) the program runs about 5-7% faster.
A. |
Moderator
|
There is not enough computation to get reliable performance measurements. The best approach in this case is still to look at the generated code:
Without Eigen::Map:
With Eigen::Map:
The only differences are the three movq instructions at the beginning, because you have to access the data through a pointer. However, in practice everything is inlined, so the overhead of this indirection will be amortised over the multiple operations, or even removed completely. Finally, you can explicitly avoid this indirection by casting your raw data to a Matrix4d or Vector4d through a reinterpret_cast, but that's rather ugly, and again, in real-world code I doubt you'll get any speedup by doing so. |
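To make the indirection mentioned above concrete, here is a minimal sketch in plain C++ that does not use Eigen at all. OwnedVec4 and ViewVec4 are hypothetical stand-ins for a fixed-size vector and for a Map over a C array; they are a simplified model, not Eigen's actual implementation. The owning type stores its coefficients inside the object, while the view stores only a pointer, so each access has to load that pointer first unless the compiler can inline it away.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a fixed-size vector such as Vector4d:
// the four coefficients live inside the object itself.
struct OwnedVec4 {
    double data[4];
    // Constructing from a C array copies the values in.
    explicit OwnedVec4(const double* src) {
        for (std::size_t i = 0; i < 4; ++i) data[i] = src[i];
    }
    double& operator[](std::size_t i) { return data[i]; }
    double operator[](std::size_t i) const { return data[i]; }
};

// Hypothetical stand-in for a Map over a C array: only a pointer is
// stored, so every access goes through one extra level of indirection.
struct ViewVec4 {
    double* ptr;
    explicit ViewVec4(double* src) : ptr(src) {}
    double& operator[](std::size_t i) { return ptr[i]; }
    double operator[](std::size_t i) const { return ptr[i]; }
};

// The same arithmetic works on either type.
template <typename Vec>
double sum_of_squares(const Vec& v) {
    double s = 0.0;
    for (std::size_t i = 0; i < 4; ++i) s += v[i] * v[i];
    return s;
}

// Returns true when a write through the view is visible in the C array
// while a write to the owning copy is not.
bool view_aliases_buffer() {
    double raw[4] = {1.0, 2.0, 3.0, 4.0};
    OwnedVec4 owned(raw);
    ViewVec4 view(raw);
    owned[0] = 10.0;  // changes only the copy
    bool copy_is_independent = (raw[0] == 1.0);
    view[1] = 20.0;   // changes raw[1] directly
    return copy_is_independent && raw[1] == 20.0;
}
```

Both types give the same numerical result. The difference is only where the data lives, which is why the cost shows up as a few extra pointer loads rather than as extra arithmetic.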
Registered Member
|
First of all, thank you so much for taking the time to look into this. You even looked into the assembly code!
What do you mean by "the overhead of this indirection will be amortised over the multiple operations"? This code calls the testing routine many times to get enough statistics. Wouldn't that simulate a real application where these operations are performed multiple times? If not, how would you make this test more realistic? And would you know why Eigen bothers to have a separate Map class to map C arrays instead of just having a Matrix constructor that does this? I had the idea that this was because Eigen didn't want to lose any efficiency, but that was just an uneducated guess. Finally, do you think it would be OK to use Maps instead of Matrices everywhere in my project without a final impact on performance? Having Maps instead of Matrices would allow me to keep C pointers in the background, as I want. Thank you very much again. |
Moderator
|
Your test function only does one matrix-vector product, and the Map objects are created outside the function. Is that really what you're going to do in your "real" code?
What you suggest (a Matrix constructor that maps a C array) would only be possible for dynamically allocated matrices, not for fixed, statically allocated ones.
Yes, using Maps throughout should be fine, especially if you declare your Map objects as you need them. |
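The remark about fixed versus dynamically allocated matrices can be sketched in plain C++ as follows. Fixed4 and Dynamic are hypothetical types invented for this illustration, and the code only models the storage layouts involved; it says nothing about how Eigen's own Matrix class is written. A fixed-size object keeps its coefficients in an array member, so a constructor taking a pointer has no choice but to copy, whereas a type that reaches its storage through a pointer could in principle refer to an existing buffer.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical fixed-size type: storage is an array member, so the
// object cannot refer to memory owned by someone else.
struct Fixed4 {
    double data[4];
};

// Hypothetical dynamic type: storage is reached through a pointer, so
// it could in principle be pointed at an existing buffer.
struct Dynamic {
    double* data;
    std::size_t size;
};

// The coefficients are the whole object; there is no room for a pointer.
static_assert(sizeof(Fixed4) == 4 * sizeof(double),
              "coefficients are stored inline");

// Building a Fixed4 from a C array necessarily copies the values.
Fixed4 copy_from(const double* src) {
    Fixed4 f;
    for (std::size_t i = 0; i < 4; ++i) f.data[i] = src[i];
    return f;
}

// Building a Dynamic from a C array only records the address.
Dynamic wrap(double* src, std::size_t n) {
    return Dynamic{src, n};
}
```

After `copy_from`, later changes to the C array are not seen by the Fixed4 object, while the object returned by `wrap` keeps referring to the caller's buffer. A separate view type such as Map is one way to get that second behaviour for fixed sizes as well.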