Vectorization confirmation- Assembly Code

cpg42 (Registered Member), Thu Jun 23, 2011 9:10 pm

Hi,

I am running a sparse diagonalization routine (ARPACK++) whose only input is a function that computes the matrix-vector product. This is working beautifully. I was recently introduced to Eigen, and since most of the computation time is spent in that matrix-vector product, I hoped that Eigen's vectorization would make it much faster. However, after setting this up, I see a negligible effect, if any. Either a) my compiler was already vectorizing the code, or b) Eigen isn't.

I compile with g++ 4.2 on Mac OS X 10.6. I have checked that EIGEN_VECTORIZE is defined, and I am compiling with the -msse2 flag. One concern is that, for reasons I don't want to get into, I have to compile everything for the 32-bit architecture using the -m32 flag. Does that make a difference? Following the Eigen FAQ, I inserted asm comment markers around the line and used the -S flag to dump the assembly... but I have never looked at assembly code in my life and have no idea how to tell whether it is vectorized.

Here is the line that I am trying to vectorize (w and v are plain double arrays of size n, and Msparse is an n by n SparseMatrix that has already been initialized):

```cpp
asm("#it begins here!");
Map<VectorXd>(w, n) = Msparse*Map<VectorXd>(v, n);
asm("#it ends here!");
```

And here is the associated assembly code (sorry it is so long):

```asm
#it begins here!
    movl $1, %eax
    call __ZN5Eigen8internal12_GLOBAL__N_19copy_boolEb
    movb %al, -121(%ebp)
    testb %al, %al
    je L1204
L1112:
    movl 8(%ebp), %edi
    addl $68, %edi
    movl %edi, (%esp)
LEHB39:
    call __ZNK9index_map4sizeEv
LEHE39:
    addl %eax, %eax
    movl 12(%ebp), %edx
    movl %edx, -64(%ebp)
    movl %eax, -60(%ebp)
    notl %eax
    shrl $31, %eax
    call __ZN5Eigen8internal12_GLOBAL__N_19copy_boolEb
    testb %al, %al
    je L1205
L1114:
    cmpb $0, -121(%ebp)
    je L1206
L1116:
    leal -64(%ebp), %eax
    movl 8(%ebp), %edx
    addl $36, %edx
    movl %edx, -80(%ebp)
    movl %eax, -76(%ebp)
    movl $0, -72(%ebp)
    movl $0, -68(%ebp)
    movl 4(%edx), %eax
    cmpl -60(%ebp), %eax
    sete %al
    movzbl %al, %eax
    call __ZN5Eigen8internal12_GLOBAL__N_19copy_boolEb
    testb %al, %al
    je L1207
L1120:
L1118:
    cmpb $0, -121(%ebp)
    jne L1121
    movl $70, 12(%esp)
    leal LC53-"L00000000035$pb"(%ebx), %eax
    movl %eax, 8(%esp)
    leal __ZZN5Eigen6StrideILi0ELi0EEC4EvE19__PRETTY_FUNCTION__-"L00000000035$pb"(%ebx), %eax
    movl %eax, 4(%esp)
    leal LC54-"L00000000035$pb"(%ebx), %eax
    movl %eax, (%esp)
LEHB40:
    call __ZN5Eigen8internal11assert_failEPKcS2_S2_i
L1121:
    movl %edi, (%esp)
    call __ZNK9index_map4sizeEv
    addl %eax, %eax
    movl 16(%ebp), %edx
    movl %edx, -52(%ebp)
    movl %eax, -48(%ebp)
    notl %eax
    shrl $31, %eax
    call __ZN5Eigen8internal12_GLOBAL__N_19copy_boolEb
    testb %al, %al
    je L1208
L1123:
    cmpb $0, -121(%ebp)
    jne L1125
    movl $153, 12(%esp)
    leal LC44-"L00000000035$pb"(%ebx), %esi
    movl %esi, 8(%esp)
    leal __ZZN5Eigen7MapBaseINS_3MapINS_6MatrixIdLin1ELi1ELi0ELin1ELi1EEELi0ENS_6StrideILi0ELi0EEEEELi0EEC4EPdiE19__PRETTY_FUNCTION__-"L00000000035$pb"(%ebx), %eax
    movl %eax, 4(%esp)
    leal LC56-"L00000000035$pb"(%ebx), %eax
    movl %eax, (%esp)
    call __ZN5Eigen8internal11assert_failEPKcS2_S2_i
    movl $174, 12(%esp)
    movl %esi, 8(%esp)
    leal __ZZNK5Eigen7MapBaseINS_3MapINS_6MatrixIdLin1ELi1ELi0ELin1ELi1EEELi0ENS_6StrideILi0ELi0EEEEELi0EE11checkSanityEvE19__PRETTY_FUNCTION__-"L00000000035$pb"(%ebx), %eax
    movl %eax, 4(%esp)
    leal LC45-"L00000000035$pb"(%ebx), %eax
    movl %eax, (%esp)
    call __ZN5Eigen8internal11assert_failEPKcS2_S2_i
LEHE40:
L1125:
    leal -80(%ebp), %eax
    movl %eax, -148(%ebp)
    movl -80(%ebp), %eax
    movl 8(%eax), %esi
    leal 0(,%esi,8), %eax
    movl %eax, (%esp)
    call _malloc
    testl %eax, %eax
    je L1209
    movl %eax, -40(%ebp)
    movl %esi, -36(%ebp)
    movl -80(%ebp), %eax
    movl 8(%eax), %esi
    cmpb $0, -121(%ebp)
    je L1210
L1130:
    leal -40(%ebp), %edx
    movl %edx, -140(%ebp)
    cmpl -36(%ebp), %esi
    je L1132
    movl -40(%ebp), %eax
    movl %eax, (%esp)
    call _free
    testl %esi, %esi
    jne L1211
    movl $0, -40(%ebp)
L1132:
    movl -140(%ebp), %ecx
    movl %esi, 4(%ecx)
    movl %esi, -96(%ebp)
    movl $0, -88(%ebp)
    movl $0, -84(%ebp)
    xorl %eax, %eax
    testl %esi, %esi
    setns %al
    call __ZN5Eigen8internal12_GLOBAL__N_19copy_boolEb
    testb %al, %al
    je L1212
L1138:
    leal -96(%ebp), %edi
    movl -96(%ebp), %esi
    cmpb $0, -121(%ebp)
    je L1213
L1140:
    movl -140(%ebp), %eax
    cmpl 4(%eax), %esi
    je L1142
    movl (%eax), %eax
    movl %eax, (%esp)
    call _free
    testl %esi, %esi
    jne L1214
    movl -140(%ebp), %eax
    movl $0, (%eax)
L1142:
    movl -140(%ebp), %edx
    movl %esi, 4(%edx)
    movl %esi, -188(%ebp)
    xorl %eax, %eax
    cmpl (%edi), %esi
    sete %al
    call __ZN5Eigen8internal12_GLOBAL__N_19copy_boolEb
    testb %al, %al
    je L1215
L1148:
    movl -188(%ebp), %eax
    shrl $31, %eax
    addl -188(%ebp), %eax
    movl %eax, %ecx
    andl $-2, %ecx
    jle L1151
    xorl %edx, %edx
L1153:
    fldl 8(%edi)
    movl -140(%ebp), %esi
    movl (%esi), %eax
    fstl (%eax,%edx,8)
    fstpl 8(%eax,%edx,8)
    addl $2, %edx
    cmpl %edx, %ecx
    jg L1153
L1151:
    cmpl -188(%ebp), %ecx
    jge L1154
    leal 0(,%ecx,8), %edx
L1156:
    movl -140(%ebp), %esi
    movl (%esi), %eax
    fldl 8(%edi)
    fstpl (%eax,%edx)
    incl %ecx
    addl $8, %edx
    cmpl -188(%ebp), %ecx
    jne L1156
L1154:
    movl -148(%ebp), %edx
    movl (%edx), %eax
    movl 4(%eax), %eax
    testl %eax, %eax
    jle L1157
    movl $1, -112(%ebp)
    movl $0, -108(%ebp)
    leal LC44-"L00000000035$pb"(%ebx), %ecx
    movl %ecx, -152(%ebp)
    leal __ZZN5Eigen7MapBaseINS_5BlockINS_6MatrixIdLin1ELi1ELi0ELin1ELi1EEELi1ELi1ELb0ELb1EEELi0EEC4EPdiiE19__PRETTY_FUNCTION__-"L00000000035$pb"(%ebx), %esi
    movl %esi, -156(%ebp)
    leal LC61-"L00000000035$pb"(%ebx), %eax
    movl %eax, -160(%ebp)
    leal __ZZNK5Eigen7MapBaseINS_5BlockINS_6MatrixIdLin1ELi1ELi0ELin1ELi1EEELi1ELi1ELb0ELb1EEELi0EE11checkSanityEvE19__PRETTY_FUNCTION__-"L00000000035$pb"(%ebx), %edx
    movl %edx, -164(%ebp)
    leal LC45-"L00000000035$pb"(%ebx), %ecx
    movl %ecx, -168(%ebp)
    leal LC62-"L00000000035$pb"(%ebx), %esi
    movl %esi, -172(%ebp)
    leal __ZZN5Eigen5BlockINS_6MatrixIdLin1ELi1ELi0ELin1ELi1EEELi1ELi1ELb0ELb1EEC4ERS2_iE19__PRETTY_FUNCTION__-"L00000000035$pb"(%ebx), %eax
    movl %eax, -176(%ebp)
L1159:
    movl -148(%ebp), %edx
    movl 4(%edx), %eax
    movl (%eax), %eax
    movl -108(%ebp), %ecx
    movsd (%eax,%ecx,2), %xmm0
    movsd %xmm0, -136(%ebp)
    cmpb $0, -121(%ebp)
    je L1216
L1160:
    movl -140(%ebp), %ecx
    xorl %eax, %eax
    cmpl $0, 4(%ecx)
    setg %al
    call __ZN5Eigen8internal12_GLOBAL__N_19copy_boolEb
    testb %al, %al
    je L1217
L1163:
    movl -148(%ebp), %edx
    movl (%edx), %edx
    movl %edx, -188(%ebp)
    movl 16(%edx), %ecx
    movl %ecx, -116(%ebp)
    movl 20(%edx), %edi
    movl 12(%edx), %esi
    movl %esi, -220(%ebp)
    movl -108(%ebp), %eax
    movl (%eax,%esi), %edx
    movl 4(%eax,%esi), %ecx
    cmpl %ecx, %edx
    jge L1165
    .align 4,0x90
L1194:
    movl (%edi,%edx,4), %eax
    sall $3, %eax
    movl -140(%ebp), %esi
    addl (%esi), %eax
    movsd -136(%ebp), %xmm0
    movl -116(%ebp), %esi
    mulsd (%esi,%edx,8), %xmm0
    addsd (%eax), %xmm0
    movsd %xmm0, (%eax)
    incl %edx
    cmpl %ecx, %edx
    jne L1194
    movl -148(%ebp), %eax
    movl (%eax), %eax
    movl %eax, -188(%ebp)
L1165:
    movl -112(%ebp), %eax
    incl -112(%ebp)
    addl $4, -108(%ebp)
    movl -188(%ebp), %edx
    cmpl %eax, 4(%edx)
    jg L1159
L1157:
    leal -52(%ebp), %ecx
    movl %ecx, -144(%ebp)
    movl -48(%ebp), %eax
    cmpl -36(%ebp), %eax
    sete %al
    movzbl %al, %eax
    call __ZN5Eigen8internal12_GLOBAL__N_19copy_boolEb
    testb %al, %al
    je L1218
L1169:
    movl -144(%ebp), %esi
    movl 4(%esi), %esi
    movl %esi, -120(%ebp)
    movl -144(%ebp), %eax
    movl (%eax), %edi
    movl %esi, -28(%ebp)
    movl %esi, %ecx
    xorl %eax, %eax
    testl $7, %edi
    jne L1173
    movl %edi, %eax
    shrl $3, %eax
    andl $1, %eax
    movl %eax, -32(%ebp)
    leal -28(%ebp), %edx
    leal -32(%ebp), %ecx
    cmpl %esi, %eax
    cmovle %ecx, %edx
    movl (%edx), %ecx
    movl %esi, %edx
    subl %ecx, %edx
    movl %edx, %eax
    shrl $31, %eax
    addl %edx, %eax
    andl $-2, %eax
L1173:
    leal (%eax,%ecx), %ebx
    xorl %esi, %esi
    xorl %edx, %edx
    testl %ecx, %ecx
    jg L1179
    jmp L1177
    .align 4,0x90
L1193:
    movl -144(%ebp), %eax
    movl (%eax), %edi
L1179:
    movl -140(%ebp), %eax
    movl (%eax), %eax
    movl %eax, -204(%ebp)
    fldl (%eax,%edx)
    fstpl (%edx,%edi)
    incl %esi
    addl $8, %edx
    cmpl %ecx, %esi
    jne L1193
L1177:
    cmpl %ebx, %ecx
    jge L1180
    leal 0(,%ecx,8), %edx
L1182:
    movl %edx, %eax
    movl -140(%ebp), %esi
    addl (%esi), %eax
    movupd (%eax), %xmm0
    movl -144(%ebp), %esi
    movl (%esi), %eax
    movapd %xmm0, (%eax,%edx)
    addl $2, %ecx
    addl $16, %edx
    cmpl %ecx, %ebx
    jg L1182
L1180:
    cmpl %ebx, -120(%ebp)
    jle L1183
    leal 0(,%ebx,8), %ecx
L1185:
    movl -144(%ebp), %eax
    movl (%eax), %edx
    movl -140(%ebp), %esi
    movl (%esi), %eax
    fldl (%ecx,%eax)
    fstpl (%ecx,%edx)
    incl %ebx
    addl $8, %ecx
    cmpl -120(%ebp), %ebx
    jne L1185
L1183:
    movl -40(%ebp), %eax
    movl %eax, (%esp)
    call _free
    movl -72(%ebp), %eax
    movl %eax, (%esp)
    call _free
#it ends here!
```

Can anyone please tell me whether this is vectorized?

Thank you in advance!

-Carl

Re: Vectorization confirmation- Assembly Code

ggael (Moderator), Fri Jun 24, 2011 6:56 am

Not all operations are vectorized. Sparse matrix operations typically are not, because that is not possible in general. If your matrix has a special structure that would allow forming blocks of at least 2 consecutive doubles without introducing too many explicit zeros, then vectorization could theoretically be enabled, but such cases are pretty rare in practice.

Re: Vectorization confirmation- Assembly Code

cpg42 (Registered Member), Fri Jun 24, 2011 12:00 pm

Thank you ggael, that is very helpful. My system actually is one of those cases (it contains either 2 by 2 or 3 by 3 blocks of non-zero elements). In the past I have done the multiplication by saving these blocks as small 2 by 2 arrays and then doing a series of (2x2)*(2) matrix-vector multiplications. It sounds like this could be sped up by using Matrix2d objects instead. It also sounds, though, like this might not work when the blocks are 3 by 3. When you do a (3x3)*(3) operation, does Eigen vectorize two of the three double-by-double products that go into each row-vector product, leading to a possible 33% speedup? What if I use MatrixXd(3,3) instead of Matrix3d? Thanks again for your help.

-Carl

Re: Vectorization confirmation- Assembly Code

ggael (Moderator), Fri Jun 24, 2011 12:25 pm

Yes, using a SparseMatrix of Matrix2d and a vector of Vector2d for the right-hand side could work (I have never tried it) and would be fully vectorized. On the other hand, 3x3 matrices are not vectorized at all because of the overhead of the unaligned loads this would imply. A MatrixXd(3,3) would be vectorized, but it would also be much slower.
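
The remark about unaligned loads can be made concrete with a little arithmetic. This assumes column-major storage (Eigen's default), 8-byte doubles, a block whose first element is 16-byte aligned, and SSE2's 128-bit registers, which hold two doubles and whose aligned loads (such as `movapd`) require 16-byte-aligned addresses. It illustrates the layout argument only and is not a description of how Eigen chooses its code paths.

```latex
% Byte offset of column k in an m-by-m column-major block of doubles:
\[
  \mathrm{offset}(k) = 8\,m\,k \ \text{bytes},
  \qquad
  \mathrm{offset}(k) \bmod 16 =
  \begin{cases}
    0 & m = 2 \text{ (every column)},\\
    0 & m = 3,\ k \text{ even},\\
    8 & m = 3,\ k \text{ odd}.
  \end{cases}
\]
% Each length-m column splits into two-double packets plus leftovers:
\[
  m = 2:\ \lfloor 2/2 \rfloor = 1 \text{ packet},\ 2 \bmod 2 = 0 \text{ scalars};
  \qquad
  m = 3:\ \lfloor 3/2 \rfloor = 1 \text{ packet},\ 3 \bmod 2 = 1 \text{ scalar}.
\]
```

So in a 2x2 block every column is an aligned pair, whereas in a 3x3 block every odd-numbered column starts 8 bytes past a 16-byte boundary and each column leaves one double over. The whole 3x3 block also occupies 72 bytes, which is not a multiple of 16, unlike the 32 bytes of a 2x2 block.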

Re: Vectorization confirmation- Assembly Code

cpg42 (Registered Member), Fri Jun 24, 2011 1:38 pm

Ok, that makes sense. Thanks for the help.
