improve aliasing analysis / restrict pointers? • KDE Community Forums


improve aliasing analysis / restrict pointers?


hansecke (Registered Member), Sat May 24, 2014 5:28 pm
improve aliasing analysis / restrict pointers?

Hi there-

I've recently written a simple vector class which is as simple as can be underneath (*). An example function:

```
void sum(floatvec a, floatvec b, floatvec s) {
    const size_t n=a.size();
    for(size_t i=0; i<n; i++) {
        s[i]=a[i]+b[i];
    }
}
```

Looking at the generated assembler shows that the compiler cannot determine that a, b, and s refer to distinct memory areas. As a result, the assembler loop shows unnecessary loads. If I change the function to the below, it does not have those loads and performs better in benchmarks:

```
void sum(myvec a, myvec b, myvec s) {
    float* __restrict__ va = a.memory();
    float* __restrict__ vb = b.memory();
    float* __restrict__ vs = s.memory();
    const size_t n=a.size();
    for(size_t i=0; i<n; i++) {
        vs[i]=va[i]+vb[i];
    }
}
```

In C you can fix issues like this by declaring variables with __restrict__, but I have not been able to figure out where to put that in my class or in the declaration of sum().

Has somebody looked at the assembler generated by Eigen to see if it is as fast as possible? Worked with __restrict__ and aliasing analysis to make sure the compiler understands how to make the code as fast as possible?

Thanks!
Hans

(*) My vector class has other features as well, but I can replicate my problem even with a super-simple class which does no more than wrap a heap-allocated float array.

ggael (Moderator), Sat May 24, 2014 8:38 pm
Re: improve aliasing analysis / restrict pointers?

Pointer aliasing cannot explain this. My bet is that floatvec::operator[] is expecting an int instead of a std::size_t. In Eigen we are careful to use the same type everywhere for indexes and sizes.

hansecke (Registered Member), Tue May 27, 2014 11:39 pm
Re: improve aliasing analysis / restrict pointers?

Hi Gaël,

And thank you for your reply. This was not the reason, but I did finally figure it out:

If I compile with -Os (optimize for size), the generated assembler is a very clear 27 lines (13 lines of instructions). As described above, the inner loop of the first version of the sum() function has those superfluous loads.

If I compile with -O3, the sum() function gets compiled to an assembler file of 94 lines, most of which I do not understand. GCC creates the correct short inner loops for both versions of sum().

If I compile with -O2, the sum() function gets compiled to an assembler file of 31 lines (14 lines of instructions). Both versions have the correct short inner loop and both assembler programs are very understandable.

Takeaway:
- -Os might generate nicer looking and shorter code, but even for simple programs it can optimize significantly worse than -O2 or -O3.
- -O2 might generate the best balance of brevity and correct optimization.

Cheers
Hans
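
For anyone who wants to repeat the comparison described in this thread, here is a minimal sketch of the two variants. It is not the poster's actual code: the `floatvec` class below is a hypothetical stand-in that only wraps a heap-allocated float array, the function names `sum_indexed` and `sum_restrict` are made up for illustration, and the arguments are passed by reference (an assumption, since the original class's copy behaviour is not shown). `__restrict__` is a GCC/Clang extension; other compilers may spell it differently.

```cpp
#include <cstddef>

// Hypothetical stand-in for the poster's class: a thin wrapper around
// a heap-allocated float array, with size_t used for sizes and indexes.
class floatvec {
public:
    explicit floatvec(std::size_t n) : data_(new float[n]()), size_(n) {}
    ~floatvec() { delete[] data_; }
    floatvec(const floatvec&) = delete;
    floatvec& operator=(const floatvec&) = delete;

    std::size_t size() const { return size_; }
    float* memory() { return data_; }
    const float* memory() const { return data_; }
    float& operator[](std::size_t i) { return data_[i]; }
    const float& operator[](std::size_t i) const { return data_[i]; }

private:
    float* data_;
    std::size_t size_;
};

// Variant 1: element access goes through operator[].
void sum_indexed(const floatvec& a, const floatvec& b, floatvec& s) {
    const std::size_t n = a.size();
    for (std::size_t i = 0; i < n; i++) {
        s[i] = a[i] + b[i];
    }
}

// Variant 2: raw pointers marked __restrict__, which promises the
// compiler that the three arrays do not overlap.
void sum_restrict(const floatvec& a, const floatvec& b, floatvec& s) {
    const float* __restrict__ va = a.memory();
    const float* __restrict__ vb = b.memory();
    float* __restrict__ vs = s.memory();
    const std::size_t n = a.size();
    for (std::size_t i = 0; i < n; i++) {
        vs[i] = va[i] + vb[i];
    }
}
```

Assuming the code is saved as `sum.cpp` (a placeholder name), the assembly for each optimization level can be written out with `g++ -Os -S sum.cpp -o sum_Os.s`, and likewise with `-O2` and `-O3`, and the inner loops of the two functions compared side by side. The exact line counts will depend on the compiler version and target, so they need not match the numbers reported above.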
