Registered Member
|
I noticed that vectorization is lost for strided maps/blocks. Has adding packet load/store functions for strided data been considered? It of course wouldn't be as fast as unstrided loads, but could still offer a significant improvement in speed.
I have an application that does lots of sub-sampling and sub-blocking of complex<float> data using the Map class with dynamic stride. If I switch to complex<double>, it actually runs significantly faster because the Packet1cd vectorization kicks in. Especially with the new AVX support, I would expect that adding strided loads/stores could bring significant improvements for strided operations. |
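To make the request concrete, here is a minimal sketch of what strided packet load/store helpers could look like, written in portable C++ without intrinsics. The names `load_strided`, `store_strided` and `scale_strided` are hypothetical and are not part of Eigen's API; whether a compiler turns the gather/scatter loops into actual SIMD instructions depends on the target and flags, so treat this as an illustration of the idea rather than a measured speedup.

```cpp
#include <array>
#include <cassert>
#include <complex>
#include <cstddef>

// Hypothetical helper (not Eigen API): gather N elements spaced `stride`
// apart into a contiguous array, emulating a strided packet load.
template <std::size_t N, typename T>
std::array<T, N> load_strided(const T* src, std::ptrdiff_t stride) {
    assert(src != nullptr);
    std::array<T, N> packet{};
    for (std::size_t i = 0; i < N; ++i)
        packet[i] = src[static_cast<std::ptrdiff_t>(i) * stride];
    return packet;
}

// Hypothetical counterpart: scatter a contiguous packet back to strided memory.
template <std::size_t N, typename T>
void store_strided(T* dst, std::ptrdiff_t stride, const std::array<T, N>& packet) {
    assert(dst != nullptr);
    for (std::size_t i = 0; i < N; ++i)
        dst[static_cast<std::ptrdiff_t>(i) * stride] = packet[i];
}

// Scale `count` strided elements in place, N at a time, with a scalar tail.
template <std::size_t N>
void scale_strided(std::complex<float>* data, std::ptrdiff_t count,
                   std::ptrdiff_t stride, std::complex<float> factor) {
    const std::ptrdiff_t width = static_cast<std::ptrdiff_t>(N);
    std::ptrdiff_t i = 0;
    for (; i + width <= count; i += width) {
        auto packet = load_strided<N>(data + i * stride, stride);
        for (auto& v : packet) v *= factor;  // contiguous work on the packet
        store_strided<N>(data + i * stride, stride, packet);
    }
    for (; i < count; ++i) data[i * stride] *= factor;  // leftover elements
}
```

The point of the design is that the arithmetic runs on a small contiguous packet, so only the load and store steps pay for the stride.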
Moderator
|
If you are using GCC, compiling with -fcx-limited-range might help a lot because then non-vectorized complex scalar products will be inlined by the compiler. I won't repeat myself, so please see the following page for more details on std::complex speed issues: http://stackoverflow.com/questions/3765 ... erformance
Then regarding strided loads/stores, I guess that you would need a quite complicated expression to get a visible speedup. That said, since a single complex<> multiplication is already quite involved, this could indeed be worth a try. |
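For readers wondering what -fcx-limited-range changes: per the GCC documentation, it lets the compiler skip the NaN/infinity recovery checks that a fully conforming complex multiplication performs, leaving only the textbook formula, which is short enough to inline. The sketch below writes that formula out by hand; `mul_limited_range` and `sum_of_products` are hypothetical names chosen for illustration, not GCC or Eigen functions, and the hand-written version gives up the special-value handling in the same way the flag does.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>

// Textbook product (a+bi)(c+di) = (ac - bd) + (ad + bc)i,
// with no recovery step for NaN or infinite intermediate results.
inline std::complex<float> mul_limited_range(std::complex<float> x,
                                             std::complex<float> y) {
    const float a = x.real(), b = x.imag();
    const float c = y.real(), d = y.imag();
    return {a * c - b * d, a * d + b * c};
}

// Accumulate x[i] * y[i] (unconjugated) over n elements using the
// limited-range product above.
inline std::complex<float> sum_of_products(const std::complex<float>* x,
                                           const std::complex<float>* y,
                                           std::size_t n) {
    assert(x != nullptr && y != nullptr);
    std::complex<float> acc(0.0f, 0.0f);
    for (std::size_t i = 0; i < n; ++i)
        acc += mul_limited_range(x[i], y[i]);
    return acc;
}
```

For finite inputs that do not overflow, this agrees with the conforming operator* up to rounding; the two differ only in how infinities and NaNs are treated.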
Registered Member
|
Thanks for the link. I was wondering why operator* wasn't already vectorized for std::complex by the compiler.
|