
Vectorization of strided data

corybrinck
Registered Member

Mon Aug 22, 2016 9:09 pm
I noticed that vectorization is lost for strided maps/blocks. Has adding packet load/store functions for strided data been considered? They would of course not be as fast as unstrided loads, but they could still offer a significant speedup.

I have an application that does a lot of sub-sampling and sub-blocking of complex<float> data using the Map class with a dynamic stride. If I switch to complex<double>, it actually runs significantly faster because the Packet1cd vectorization kicks in. Especially with the new AVX support, I would expect strided loads/stores to bring significant improvements for strided operations.
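A minimal sketch of what a strided packet load/store could look like for complex<float>, written in portable C++ rather than SSE/AVX intrinsics so the idea is easy to follow. All names here (`Packet2cfSketch`, `gather_strided`, `scatter_strided`, `pmul_sketch`, `mul_strided`) are hypothetical and are not Eigen's internal API; a 4-float `std::array` stands in for a 128-bit register holding two interleaved complex<float> values. The kernel gathers two elements `stride` apart into one packet, multiplies packet-wise, scatters the results back, and falls back to a scalar product for an odd element count.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>

// Hypothetical packet type: two complex<float> values stored interleaved as
// (re0, im0, re1, im1), standing in for one 128-bit SSE register.
using Packet2cfSketch = std::array<float, 4>;

// Strided load: gather two complex<float> elements that are `stride` elements apart.
Packet2cfSketch gather_strided(const std::complex<float>* from, std::ptrdiff_t stride) {
    return {from[0].real(), from[0].imag(), from[stride].real(), from[stride].imag()};
}

// Strided store: scatter the two complex values of `p` back, `stride` elements apart.
void scatter_strided(std::complex<float>* to, const Packet2cfSketch& p, std::ptrdiff_t stride) {
    to[0] = std::complex<float>(p[0], p[1]);
    to[stride] = std::complex<float>(p[2], p[3]);
}

// Lane-wise complex product of two packets using the textbook formula
// (no special Inf/NaN handling).
Packet2cfSketch pmul_sketch(const Packet2cfSketch& a, const Packet2cfSketch& b) {
    Packet2cfSketch r{};
    for (std::size_t k = 0; k < 4; k += 2) {
        r[k] = a[k] * b[k] - a[k + 1] * b[k + 1];      // real part
        r[k + 1] = a[k] * b[k + 1] + a[k + 1] * b[k];  // imaginary part
    }
    return r;
}

// out[i*stride] = a[i*stride] * b[i*stride] for i in [0, n):
// two elements per packet, then a scalar tail when n is odd.
void mul_strided(const std::complex<float>* a, const std::complex<float>* b,
                 std::complex<float>* out, std::ptrdiff_t n, std::ptrdiff_t stride) {
    assert(n >= 0);
    std::ptrdiff_t i = 0;
    for (; i + 2 <= n; i += 2) {
        const std::ptrdiff_t off = i * stride;
        const Packet2cfSketch pa = gather_strided(a + off, stride);
        const Packet2cfSketch pb = gather_strided(b + off, stride);
        scatter_strided(out + off, pmul_sketch(pa, pb), stride);
    }
    if (i < n) {
        out[i * stride] = a[i * stride] * b[i * stride];
    }
}
```

In a real SIMD version, each complex<float> is 8 bytes, so such a gather could plausibly be done with two 64-bit loads into one register; whether that pays off depends on how much arithmetic the expression does per loaded element.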
ggael
Moderator
If you are using GCC, compiling with -fcx-limited-range might help a lot, because the compiler will then inline non-vectorized complex scalar products. Rather than repeat myself, please see the following page for more details on std::complex speed issues: http://stackoverflow.com/questions/3765 ... erformance
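To illustrate what -fcx-limited-range changes: by default GCC follows C99 Annex G semantics for complex multiplication, checking whether the result came out as NaN + i·NaN and, if so, calling a libgcc helper (`__mulsc3` for float) to try to recover infinities. That check and call can get in the way of inlining. With the flag, the compiler may emit just the textbook formula, which the sketch below spells out by hand. This is an illustration of the semantics, not GCC's actual code generation, and `mul_limited_range` is a hypothetical name.

```cpp
#include <cassert>
#include <complex>

// Textbook complex product (a+bi)(c+di) = (ac - bd) + (ad + bc)i.
// Roughly what GCC may inline under -fcx-limited-range:
// no check for a NaN + NaN*i result and no rescue of infinities.
template <typename T>
std::complex<T> mul_limited_range(const std::complex<T>& x, const std::complex<T>& y) {
    const T a = x.real(), b = x.imag();
    const T c = y.real(), d = y.imag();
    return std::complex<T>(a * c - b * d, a * d + b * c);
}
```

For finite operands this gives the same result as `std::complex` multiplication; the two can only differ when infinities or NaNs are involved.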

Then, regarding strided loads/stores, I guess you would need a fairly complicated expression to get a visible speedup. Since a single complex<> multiplication is already quite involved, this could indeed be worth a try.
corybrinck
Registered Member

Re: Vectorization of strided data

Wed Aug 24, 2016 3:32 pm
Thanks for the link. I was wondering why operator* wasn't already vectorized for std::complex by the compiler.
