ArrayBase< MatrixXcf >::abs2 not auto-vectorizing. • KDE Community Forums

This forum has been archived. All content is frozen. Please use KDE Discuss instead.

Board index

ArrayBase< MatrixXcf >::abs2 not auto-vectorizing.

Page 1 of 1 (9 posts)

Tags:

willo Registered Member Posts 14 Karma 0 OS	ArrayBase< MatrixXcf >::abs2 not auto-vectorizing. Fri Sep 23, 2011 5:29 pm Eigen community and developers, I've found a case where I expect some auto-vectorized code to result, but am observing none. I've trying to use a complex, dynamic matrix, and use the ArrayBase< Derived >::abs2() to get a real, dynamic matrix with the squared magnitude of the input. I'm observing that the resulting code is not auto-vectorized. Doubly worse, I'm observing code that gets the complex-absolute-value, then squares it. This is wasteful, as the complex-absolute-value does a square-root which is un-done by the subsequent square operation. I'd expect that 'array.abs2()' should evaluate to something nearly identical to: array.real().square() + array.imag().square() My question is, should I expect: - the result of abs2 to skip the square-root then square? - the resulting code to be auto-vectorized? My processor is Intel 64-bit Core 2 Duo. My OS is Ubuntu 11.04. My Eigen library version is 3.0.2. My compiler is g++-4.5. My compiler flags are: -Wall -O3 -DNDEBUG -march=native My test code is here: Code: Select all #include <Eigen/Dense> #include <iostream> int main( int argc, char* argv[] ) { Eigen::MatrixXcf in_matrix( 20, 40 ); Eigen::MatrixXf out_matrix( 20, 40 ); asm( "# BEGIN LIBRARY" ); out_matrix.array() = in_matrix.array().abs2(); asm( "# END LIBRARY" ); asm( "# BEGIN MANUAL" ); out_matrix.array() = in_matrix.array().real().square() + in_matrix.array().imag().square(); asm( "# END MANUAL" ); std::cout << "In matrix: " << in_matrix << std::endl; std::cout << "Out matrix: " << out_matrix << std::endl; return 0; }
ggael Moderator Posts 3447 Karma 19 OS	Re: ArrayBase< MatrixXcf >::abs2 not auto-vectorizing. Fri Sep 23, 2011 5:43 pm The difficulty is that the result of abs2 is not a complex, and so we would need to take two packets of complexes to return one of reals. That's not impossible, but difficult to do in a general manner. In the future we'll support packet of various sizes for the same scalar type. Therefore, we can imagine meta packet types that will aggregate multiple atomic packets that is exactly what we need for abs2 and efficient loop peeling.
willo Registered Member Posts 14 Karma 0 OS	Re: ArrayBase< MatrixXcf >::abs2 not auto-vectorizing. Fri Sep 23, 2011 6:04 pm Gaël, Thanks for the quick response. Okay, so I can't expect Array< MatrixXcf >::abs2() to be vectorized any time soon. Is my manual formulation of 'abs2' the best I can do with Eigen 3.0.x?
ggael Moderator Posts 3447 Karma 19 OS	Re: ArrayBase< MatrixXcf >::abs2 not auto-vectorizing. Fri Sep 23, 2011 6:19 pm in the meantime the best is to use .abs2(). Indeed, std::norm(complex) returns the squared magnitude and is therefore implemented as re^2+im^2. Yes, I know that's very misleading, a mistake of C++ among many others...
willo Registered Member Posts 14 Karma 0 OS	Re: ArrayBase< MatrixXcf >::abs2 not auto-vectorizing. Mon Nov 07, 2011 6:58 pm Gaël, I've discovered that, for my environment, it is better to use the "manual" evaluation of 'abs2' for a complex matrix type. Specifically, the resulting assembly code is making SSE packet-calls, and the resulting run-time is roughly an order of magnitude improved. I suspect that, despite the limitations of Eigen to auto-vectorize complex-to-real operations, this is still faster because: [list=] [] some auto-vectorized code is being created. [] the square-root then square operations are not required. [/list] What do you think about these results? Below is my updated example: Code: Select all #include <Eigen/Dense> #include <ctime> // timespec, clock_gettime, CLOCK_REALTIME #include <iostream> double get_time() { timespec timespec; int rc = -1; rc = clock_gettime( CLOCK_REALTIME, &timespec ); assert( rc == 0 ); const double time = static_cast< double >( timespec.tv_sec ) + static_cast< double >( timespec.tv_nsec ) * 1e-9; return time; } int main( int argc, char* argv[] ) { const size_t num_rows = 1000u; const size_t num_cols = 1000u; Eigen::MatrixXcf in_matrix = Eigen::MatrixXcf::Random( num_rows, num_cols ); Eigen::MatrixXf out_matrix = Eigen::MatrixXf::Random( num_rows, num_cols ); const double library_begin_time = get_time(); asm( "# BEGIN LIBRARY" ); out_matrix.array() = in_matrix.array().abs2(); asm( "# END LIBRARY" ); const double library_end_time = get_time(); const double manual_begin_time = get_time(); asm( "# BEGIN MANUAL" ); out_matrix.array() = in_matrix.array().real().square() + in_matrix.array().imag().square(); asm( "# END MANUAL" ); const double manual_end_time = get_time(); std::cout << "In matrix: " << in_matrix << std::endl; std::cout << "Out matrix: " << out_matrix << std::endl; std::cout << "Library elapsed-time: " << library_end_time - library_begin_time << "." << std::endl; std::cout << "Manual elapsed-time: " << manual_end_time - manual_begin_time << "." << std::endl; return 0; } My compiler was: g++ (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2 My compiler flags were: "-Wall -O2 -l rt" Below is a section of the resulting assembly code: Code: Select all call _Z8get_timev movsd %xmm0, (%rsp) #APP # 25 "test_eigen_complex_abs2_vectorization.cpp" 1 # BEGIN LIBRARY # 0 "" 2 #NO_APP movq 40(%rsp), %rax cmpq 72(%rsp), %rax jne .L722 movq 48(%rsp), %rbp cmpq 80(%rsp), %rbp jne .L722 imulq %rax, %rbp testq %rbp, %rbp jle .L724 movq 32(%rsp), %r13 movq 64(%rsp), %r12 xorl %ebx, %ebx .p2align 4,,10 .p2align 3 .L725: movq (%r12,%rbx,8), %xmm0 call cabsf mulss %xmm0, %xmm0 movss %xmm0, 0(%r13,%rbx,4) addq $1, %rbx cmpq %rbp, %rbx jne .L725 .L724: #APP # 27 "test_eigen_complex_abs2_vectorization.cpp" 1 # END LIBRARY # 0 "" 2 #NO_APP call _Z8get_timev movsd %xmm0, 8(%rsp) call _Z8get_timev movsd %xmm0, 16(%rsp) #APP # 31 "test_eigen_complex_abs2_vectorization.cpp" 1 # BEGIN MANUAL # 0 "" 2 #NO_APP movq 72(%rsp), %rax cmpq 40(%rsp), %rax movq 80(%rsp), %rcx jne .L726 cmpq 48(%rsp), %rcx jne .L726 imulq %rax, %rcx testq %rcx, %rcx jle .L728 movq 32(%rsp), %rsi movq 64(%rsp), %rdx xorl %eax, %eax .p2align 4,,10 .p2align 3 .L729: movss 4(%rdx), %xmm0 movss (%rdx), %xmm1 mulss %xmm0, %xmm0 addq $8, %rdx mulss %xmm1, %xmm1 addss %xmm1, %xmm0 movss %xmm0, (%rsi,%rax,4) addq $1, %rax cmpq %rcx, %rax jne .L729 .L728: #APP # 34 "test_eigen_complex_abs2_vectorization.cpp" 1 # END MANUAL # 0 "" 2 #NO_APP call _Z8get_timev movl $11, %edx The resuling timing numbers are usually: In matrix: (-0.211234,0.680375) Out matrix: 0.507531 Library elapsed-time: 0.0228312. Manual elapsed-time: 0.00304437.
ggael Moderator Posts 3447 Karma 19 OS	Re: ArrayBase< MatrixXcf >::abs2 not auto-vectorizing. Tue Nov 08, 2011 7:24 am The "manual" version is not vectorized. It uses scalar SSE instructions, not the vector ones. The problem of .abs2() is that it calls the function cabsf and then squares. This clearly must be reimplemented in a more optimized way!
willo Registered Member Posts 14 Karma 0 OS	Re: ArrayBase< MatrixXcf >::abs2 not auto-vectorizing. Tue Nov 08, 2011 3:18 pm You're right, they are SIMD instructions, but not the vectorized/packet ones. I agree 'abs2' for complex types should be implemented in a different, optimized way. Should I file a bug-report, or is this something that is already being addressed?
ggael Moderator Posts 3447 Karma 19 OS	Re: ArrayBase< MatrixXcf >::abs2 not auto-vectorizing. Tue Nov 08, 2011 9:43 pm this is already fixed in the default branch.
willo Registered Member Posts 14 Karma 0 OS	Re: ArrayBase< MatrixXcf >::abs2 not auto-vectorizing. Wed Nov 09, 2011 7:25 pm I see your change at: https://bitbucket.org/eigen/eigen/chang ... a04dc004e5 I just wrote a small unit-test for std::norm, vs. a manual 'magnitude squared', and it is doing 'cabs' then squaring it! I can't believe GNU's 'std::norm' has such a basic bug.

Page 1 of 1 (9 posts)

Bookmarks

Who is online

Registered users: Bing [Bot], Google [Bot], Yahoo [Bot]