Registered Member
I'm having trouble getting Eigen to use ARM NEON vector instructions. It detects that NEON instructions are available, but doesn't seem to actually use them; instead it generates four separate scalar operations. I'm not sure whether something isn't set up quite right, I'm missing some compiler options, or the compiler I'm using is simply not supported. When targeting an Intel processor with SSE enabled, it uses the vector instructions as expected. Any ideas how to get Eigen working with ARM NEON instructions?
Test Code
This is based on the suggested test operating on vectors of 4 floats at http://eigen.tuxfamily.org/index.php?ti ... bly_output. I've extended it to check what Eigen is detecting, to check that the compiler supports the NEON intrinsics, and to try arrays and vectors of both floats and ints:
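Mark's original test code isn't reproduced here. As a rough, hypothetical sketch of the "what is Eigen detecting" part, a translation unit can simply report the relevant predefined macros: `__ARM_NEON`/`__ARM_NEON__` and `__SSE2__` come from the compiler (GCC and Clang), while `EIGEN_VECTORIZE`, `EIGEN_VECTORIZE_NEON` and `EIGEN_VECTORIZE_SSE` are set by Eigen's own headers. The name `simd_report()` is made up for illustration, and the C++17 `__has_include` guard is only there so the sketch still compiles where Eigen isn't installed.

```cpp
#include <cassert>
#include <string>

#if __has_include(<Eigen/Core>)
#include <Eigen/Core>
#endif

// Hypothetical helper: lists the SIMD-related macros visible in this
// translation unit. __ARM_NEON / __ARM_NEON__ and __SSE2__ are predefined by
// GCC and Clang for suitable targets; the EIGEN_VECTORIZE* macros are
// defined by Eigen's headers when Eigen decides to vectorize.
std::string simd_report() {
    std::string r;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    r += "compiler: NEON; ";
#endif
#if defined(__SSE2__)
    r += "compiler: SSE2; ";
#endif
#if defined(EIGEN_VECTORIZE_NEON)
    r += "Eigen: NEON; ";
#endif
#if defined(EIGEN_VECTORIZE_SSE)
    r += "Eigen: SSE; ";
#endif
#if !defined(EIGEN_VECTORIZE)
    r += "Eigen: not vectorizing (or not included); ";
#endif
    if (r.empty()) {
        r = "no NEON/SSE macros detected";
    }
    return r;
}
```

Note that these macros only say what Eigen *could* use; as the rest of the thread shows, the generated assembly is still the real test.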
Targeting Intel
Compiler version and command:
The resulting assembly for foo() is similar to that given in the example, using vector instructions to operate on all 4 elements of the vector in one pass:
Targeting ARM
Compiler version and command:
The resulting assembly contains:
confirming that Eigen has detected that NEON instructions are available. Looking at the intrin() function confirms that NEON intrinsics are supported:
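For readers without the original code, here is a minimal sketch of what such an intrin() check might look like. The signature is an assumption; only the intrinsics themselves (`vld1q_f32`, `vaddq_f32`, `vst1q_f32` from `arm_neon.h`) are real APIs. If the compiler accepts this and emits a single vector add for the four floats, NEON code generation itself is working, which points the problem at Eigen's choices rather than the toolchain.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

// Hypothetical version of an intrin() test: out = a + b for four floats.
// On a NEON target this is one 128-bit load per input, one vector add and
// one 128-bit store; elsewhere a scalar loop keeps the sketch compiling.
void intrin(const float* a, const float* b, float* out) {
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    const float32x4_t va = vld1q_f32(a);
    const float32x4_t vb = vld1q_f32(b);
    vst1q_f32(out, vaddq_f32(va, vb));
#else
    for (std::size_t i = 0; i < 4; ++i) {
        out[i] = a[i] + b[i];
    }
#endif
}
```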
But the foo() function operates on a single element of the vector at a time, repeated 4 times, rather than using vector instructions:
Similar behaviour is seen with the other functions I added to operate on arrays and vectors of floats and ints.
Things I've already tried:
System Info
Any suggestions on how to get Eigen working with ARM NEON instructions would be very welcome.
Mark.
Moderator
On ARM, only dynamically sized vectors and matrices are vectorized. For small fixed-size vectors, vectorization is only worth the effort if the stack is 16-byte aligned, which cannot be guaranteed on ARM.
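To illustrate the alignment point in isolation: code that relies on 16-byte-aligned loads and stores needs addresses that are multiples of 16, and a stack object only gets that if the ABI guarantees it or the code explicitly asks for it. The sketch below (hypothetical function names; this is not how Eigen handles alignment internally) checks an address and requests 16-byte alignment for a local buffer with `alignas`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True if p is a multiple of `alignment` bytes (alignment: a power of two).
bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// A local buffer explicitly requested at 16-byte alignment. A plain
// `float v[4]` is only guaranteed alignof(float), typically 4 bytes, so
// without alignas there is no promise it starts on a 16-byte boundary.
bool aligned_stack_buffer_ok() {
    alignas(16) float v[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    return is_aligned(v, 16);
}
```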
Registered Member
Thanks, that makes sense. Changing the parameters from Vector4f to VectorXf does indeed produce vectorised instructions on groups of 4 elements, so it looks like my setup is working; it was just an unsuitable test in this case.
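A minimal sketch of the change that worked here: the same element-wise sum written against the dynamic-size type `Eigen::VectorXf` instead of `Vector4f`. `foo_dynamic()` and `add_dynamic()` are hypothetical names; `add_dynamic()` views raw float arrays as dynamic-size vectors through `Eigen::Map`, so no copies are made. The scalar fallback and the `__has_include` guard exist only so the snippet compiles without Eigen. Whether the sum actually comes out as NEON instructions still depends on the Eigen version and compiler flags, so checking the assembly remains the real test.

```cpp
#include <cassert>
#include <cstddef>

#if __has_include(<Eigen/Dense>)
#include <Eigen/Dense>
#define SKETCH_HAVE_EIGEN 1
#endif

#ifdef SKETCH_HAVE_EIGEN
// Dynamic-size counterpart of a Vector4f test function: same u = v + w,
// but with VectorXf (u is resized by the assignment if necessary).
void foo_dynamic(Eigen::VectorXf& u, const Eigen::VectorXf& v,
                 const Eigen::VectorXf& w) {
    u = v + w;
}
#endif

// Hypothetical helper: out[i] = a[i] + b[i] for i < n.
// With Eigen, the arrays are wrapped as dynamic-size vectors via Eigen::Map;
// without Eigen, a plain scalar loop is used instead.
void add_dynamic(const float* a, const float* b, float* out, std::size_t n) {
#ifdef SKETCH_HAVE_EIGEN
    const std::ptrdiff_t len = static_cast<std::ptrdiff_t>(n);
    Eigen::Map<const Eigen::VectorXf> va(a, len);
    Eigen::Map<const Eigen::VectorXf> vb(b, len);
    Eigen::Map<Eigen::VectorXf> vo(out, len);
    vo = va + vb;
#else
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
#endif
}
```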
The FAQs give the example using Vector4f as a test that vectorisation is working:
http://eigen.tuxfamily.org/index.php?ti ... ng_used.3F
http://eigen.tuxfamily.org/index.php?ti ... bly_output
Would it be worth noting in those sections that these fixed-size examples may not be vectorised at all on some architectures?