
Slow on Cortex M4

nghiaho
Tue Dec 15, 2015 7:06 am
Hi all,

I've got Eigen running on an STM32F4 (Cortex M4) discovery board with ARM gcc 4.9.3. I've compared it with another matrix library provided by the PX4 autopilot project. I did this simple test:

Matrix<float, 2, 2> A, B, C;

B.setIdentity();
C.setIdentity();

__asm("#start_here");
C = A + B;
__asm("#end_here");

Eigen takes about 30 µs to complete, whilst the PX4 matrix library takes 3 µs. The PX4 library just does two nested for loops with no error checking, nothing fancy. I did a quick inspection of the assembly between #start_here and #end_here, and Eigen generates an insane amount of code, like the excerpt below.

804eeb4:   48b7         ldr   r0, [pc, #732]   ; (804f194 <main+0x11ec>)
804eeb6:   4621         mov   r1, r4
804eeb8:   f035 fa26    bl   8084308 <__cyg_profile_func_enter>
804eebc:   4621         mov   r1, r4
804eebe:   48b6         ldr   r0, [pc, #728]   ; (804f198 <main+0x11f0>)
804eec0:   f035 fa22    bl   8084308 <__cyg_profile_func_enter>
804eec4:   4621         mov   r1, r4
804eec6:   48b4         ldr   r0, [pc, #720]   ; (804f198 <main+0x11f0>)
804eec8:   f035 fa30    bl   808432c <__cyg_profile_func_exit>
....
....
(for ~250 lines)
....
 804f168:       08049745        .word   0x08049745
 804f16c:       080467f9        .word   0x080467f9
 804f170:       0804b6fd        .word   0x0804b6fd
 804f174:       0804d729        .word   0x0804d729
 804f178:       0804ce69        .word   0x0804ce69
 804f17c:       0804b94d        .word   0x0804b94d
 804f180:       0804bd99        .word   0x0804bd99
 804f184:       0804db0d        .word   0x0804db0d
 804f188:       0804d275        .word   0x0804d275
 804f18c:       0804c1f9        .word   0x0804c1f9
...
(and more ldr, mov stuff)
...

I have no idea what it is doing, but it's probably why it's running so slowly: there are just a lot of instructions to go through. Any ideas?

Nghia
ggael
Re: Slow on Cortex M4

Thu Dec 17, 2015 8:49 am
Make sure that you compile with optimizations enabled (-O3 -DNDEBUG), and also make sure that C is really used afterwards so that the compiler does not remove useful code from the PX4 version. The best approach is usually to wrap the expression of interest in a non-inlined function:

EIGEN_DONT_INLINE void foo(const Matrix2f &A, const Matrix2f &B, Matrix2f &C) { C = A + B; }

and do the same for PX4. Then you can start looking at the assembly.
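
For the PX4 side, or any hand-written version, the same idea might look roughly like the sketch below. Everything in it is an assumption made for illustration: Mat2, add2x2 and run_add are hypothetical names, not PX4 or Eigen types, and __attribute__((noinline)) is the GCC/Clang attribute used here in place of EIGEN_DONT_INLINE. The point is only that the kernel stays out of line and that its result is read afterwards, so the optimizer has less room to throw the work away.

```cpp
#include <cassert>

// Hypothetical stand-in for a fixed-size 2x2 float matrix.
// This is NOT the PX4 or Eigen type, just a plain struct for illustration.
struct Mat2 {
    float v[4];
};

// Keep the kernel out of line (GCC/Clang attribute) so it shows up as its
// own symbol in the disassembly instead of being folded into the caller.
__attribute__((noinline)) void add2x2(const Mat2 &a, const Mat2 &b, Mat2 &c) {
    for (int i = 0; i < 4; ++i) {
        c.v[i] = a.v[i] + b.v[i];
    }
}

// Calls the kernel `iterations` times and returns the sum of the entries of
// the result. Returning a value derived from c means the result is really
// used, rather than being dead code the compiler could drop.
float run_add(int iterations) {
    assert(iterations > 0);
    Mat2 a = {{1.0f, 2.0f, 3.0f, 4.0f}};
    Mat2 b = {{1.0f, 0.0f, 0.0f, 1.0f}};  // identity
    Mat2 c = {{0.0f, 0.0f, 0.0f, 0.0f}};
    for (int i = 0; i < iterations; ++i) {
        add2x2(a, b, c);
    }
    return c.v[0] + c.v[1] + c.v[2] + c.v[3];
}
```

On the board you would wrap the call to run_add with whatever timer you already use, then disassemble add2x2 on its own and compare it against the Eigen version of foo.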

