Registered Member
|
Hi there-
I've recently written a vector class that is as simple as can be underneath (*). An example function:
Looking at the generated assembly shows that the compiler cannot determine that a, b, and s refer to distinct memory areas. As a result, the loop contains unnecessary loads. If I change the function to the below, those loads disappear and it performs better in benchmarks:
In C you can fix issues like this by declaring variables with __restrict__, but I have not been able to figure out where to put that in my class or in the declaration of sum().
Has anybody looked at the assembly generated by Eigen to see if it is as fast as possible, or worked with __restrict__ and aliasing analysis to make sure the compiler understands how to make the code as fast as possible?
Thanks!
Hans
(*) My vector class has other features as well, but I can replicate my problem even with a super-simple class that does no more than wrap a heap-allocated float array.
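For readers without the listings at hand, here is a minimal sketch of the kind of code being discussed. It is not the poster's actual class: the layout of floatvec and the name sum_restrict are assumptions made for illustration. It shows one place the qualifier can go, namely on local pointers inside the function rather than on the class or on the declaration of sum(). __restrict__ is a GCC/Clang extension (MSVC spells it __restrict).

```cpp
#include <cstddef>

// Hypothetical stand-in for the poster's class: it does no more than
// wrap a heap-allocated float array.
struct floatvec {
    float*      data;
    std::size_t n;

    explicit floatvec(std::size_t size) : data(new float[size]()), n(size) {}
    ~floatvec() { delete[] data; }
    floatvec(const floatvec&) = delete;
    floatvec& operator=(const floatvec&) = delete;

    float&       operator[](std::size_t i)       { return data[i]; }
    const float& operator[](std::size_t i) const { return data[i]; }
    std::size_t  size() const { return n; }
};

// Straightforward version: every access goes through operator[].
void sum(floatvec& s, const floatvec& a, const floatvec& b) {
    for (std::size_t i = 0; i < s.size(); ++i)
        s[i] = a[i] + b[i];
}

// Variant with restrict-qualified local pointers (assumed name).
// The caller must guarantee that s, a and b refer to distinct arrays;
// calling this with overlapping arrays is undefined behaviour.
void sum_restrict(floatvec& s, const floatvec& a, const floatvec& b) {
    float* __restrict__       ps = s.data;
    const float* __restrict__ pa = a.data;
    const float* __restrict__ pb = b.data;
    const std::size_t n = s.size();  // read the size once, outside the loop
    for (std::size_t i = 0; i < n; ++i)
        ps[i] = pa[i] + pb[i];
}
```

Both functions compute the same result for non-overlapping inputs. Whether the second one actually produces a tighter loop depends on the compiler and flags, so it is worth comparing the assembly (for example with -O2 -S) rather than assuming.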
Moderator
|
Pointer aliasing cannot explain this. My bet is that floatvec::operator[] is expecting an int instead of a std::size_t. In Eigen we are careful to use the same type everywhere for indexes and sizes.
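One way to check the index-type hypothesis is to make the index type a template parameter and compare the generated assembly for each instantiation. The sketch below is only an illustration under that assumption; add_arrays and the Index parameter are hypothetical names, not part of Eigen or of the poster's code.

```cpp
#include <cstddef>

// Element-wise sum over raw arrays, with the loop index type left as a
// template parameter so that different choices can be compared.
template <typename Index>
void add_arrays(float* s, const float* a, const float* b, Index n) {
    for (Index i = 0; i < n; ++i)
        s[i] = a[i] + b[i];
}

// Explicit instantiations, so each variant shows up as a separate
// function in the assembly output (e.g. g++ -O2 -S).
template void add_arrays<int>(float*, const float*, const float*, int);
template void add_arrays<unsigned>(float*, const float*, const float*, unsigned);
template void add_arrays<std::size_t>(float*, const float*, const float*, std::size_t);
```

All three instantiations give the same results. Any difference would only be in the loop the compiler emits, and that can vary with compiler version and target.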
Registered Member
|
Hi Gaël,
And thank you for your reply. This was not the reason, but I did finally figure it out:
Takeaway:
Cheers,
Hans