Registered Member
|
Eigen community and developers,
I've found a case where I expect auto-vectorized code to be generated, but am observing none. I'm trying to take a complex, dynamic matrix and use ArrayBase< Derived >::abs2() to get a real, dynamic matrix holding the squared magnitudes of the input. I'm observing that the resulting code is not auto-vectorized. Worse still, the generated code computes the complex absolute value and then squares it. This is wasteful, since the complex absolute value takes a square root that is immediately undone by the subsequent squaring. I'd expect 'array.abs2()' to evaluate to something nearly identical to: array.real().square() + array.imag().square()

My questions are:
- should the result of abs2 skip the square-root-then-square sequence?
- should the resulting code be auto-vectorized?

My processor is an Intel 64-bit Core 2 Duo. My OS is Ubuntu 11.04. My Eigen library version is 3.0.2. My compiler is g++-4.5. My compiler flags are: -Wall -O3 -DNDEBUG -march=native

My test code is here:
|
Moderator
|
The difficulty is that the result of abs2 is not complex, so we would need to consume two packets of complexes to produce one packet of reals. That's not impossible, but it is difficult to do in a general manner. In the future we'll support packets of various sizes for the same scalar type. We can then imagine meta-packet types that aggregate multiple atomic packets, which is exactly what we need for abs2 and for efficient loop peeling.
|
Registered Member
|
Gaël,
Thanks for the quick response. Okay, so I can't expect Array< MatrixXcf >::abs2() to be vectorized any time soon. Is my manual formulation of 'abs2' the best I can do with Eigen 3.0.x? |
Moderator
|
In the meantime the best is to use .abs2(). Indeed, std::norm(complex) returns the squared magnitude and is therefore implemented as re^2 + im^2. Yes, I know the name is very misleading; a mistake of C++ among many others...
|
Registered Member
|
Gaël,
I've discovered that, for my environment, it is better to use the "manual" evaluation of 'abs2' for a complex matrix type. Specifically, the resulting assembly makes SSE packet calls, and the run-time improves by roughly an order of magnitude. I suspect that, despite Eigen's current inability to auto-vectorize complex-to-real operations, this is still faster because:
[list]
[*] some auto-vectorized code is being created.
[*] the square-root-then-square sequence is not required.
[/list]
What do you think about these results? Below is my updated example:
My compiler was: g++ (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2 My compiler flags were: "-Wall -O2 -l rt" Below is a section of the resulting assembly code:
The resulting timing numbers are usually:
|
Moderator
|
The "manual" version is not vectorized either: it uses scalar SSE instructions, not the packet (vector) ones. The problem with .abs2() is that it calls the function cabsf and then squares the result. This clearly must be reimplemented in a more optimized way!
|
Registered Member
|
You're right, they are SIMD instructions, but not the vectorized/packet ones.
I agree that 'abs2' for complex types should be implemented in a different, optimized way. Should I file a bug report, or is this something that is already being addressed? |
Moderator
|
This is already fixed in the default branch.
|
Registered Member
|
I see your change at:
https://bitbucket.org/eigen/eigen/chang ... a04dc004e5 I just wrote a small unit test for std::norm vs. a manual 'magnitude squared', and it really does compute 'cabs' and then square it! I can't believe GNU's 'std::norm' has such a basic bug. |