transposeInPlace() not vectorized for 4x4 or 8x8 matrices?


lplagne (Registered Member), Tue Jan 20, 2015 12:12 pm

Hi,

I wonder why the following method does not rely on a vectorized kernel like `_MM_TRANSPOSE4_PS`:

`Eigen::Matrix<float,4,4> mat; ... mat.transposeInPlace();`

I hand-coded the method using intrinsics, but that is a bit frustrating, since Eigen had allowed me to clean all the other assembly hacks out of my code.

Thank you for your help,
Laurent

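For readers unfamiliar with the macro, a hand-coded transpose of the kind described above might look like the following minimal sketch. It is not Laurent's actual code and not Eigen's implementation. The names `transpose4x4_in_place` and `SKETCH_HAS_SSE` are hypothetical, the matrix is assumed to be 16 contiguous floats, and unaligned loads and stores are used so no alignment is assumed. On targets without SSE the sketch falls back to a plain scalar swap loop.

```cpp
#include <utility>

// SKETCH_HAS_SSE is a hypothetical macro for this example only.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_HAS_SSE 1
#else
#define SKETCH_HAS_SSE 0
#endif

// Hypothetical helper: transposes a 4x4 float matrix in place.
// `m` points to 16 contiguous floats; the storage order does not matter,
// since transposing swaps element (i, j) with (j, i) either way.
inline void transpose4x4_in_place(float* m) {
#if SKETCH_HAS_SSE
    // Load the four 4-float lines (unaligned loads, no alignment assumed).
    __m128 r0 = _mm_loadu_ps(m + 0);
    __m128 r1 = _mm_loadu_ps(m + 4);
    __m128 r2 = _mm_loadu_ps(m + 8);
    __m128 r3 = _mm_loadu_ps(m + 12);
    // The SSE macro transposes the four registers in place.
    _MM_TRANSPOSE4_PS(r0, r1, r2, r3);
    _mm_storeu_ps(m + 0, r0);
    _mm_storeu_ps(m + 4, r1);
    _mm_storeu_ps(m + 8, r2);
    _mm_storeu_ps(m + 12, r3);
#else
    // Scalar fallback: swap each element above the diagonal with its mirror.
    for (int i = 0; i < 4; ++i) {
        for (int j = i + 1; j < 4; ++j) {
            std::swap(m[i * 4 + j], m[j * 4 + i]);
        }
    }
#endif
}
```

With an Eigen `Matrix4f`, such a helper would presumably be called on the pointer returned by `mat.data()`, which is the kind of hand-written workaround the question hopes to avoid.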
ggael (Moderator), Thu Jan 22, 2015 9:54 pm

Sure, we should definitely support that. Internally, we even have ptranspose(...) intrinsics for all architectures and supported packet types, so that should be very easy!

lplagne (Registered Member), Sat Jan 24, 2015 10:41 am

Cool! So I just have to wait now.

ggael (Moderator), Mon Jan 26, 2015 4:11 pm

Here you go:

https://bitbucket.org/eigen/eigen/commits/32a8021225d5/
Changeset: 32a8021225d5
User: ggael
Date: 2015-01-26 16:09:01+00:00
Summary: Enable vectorization of transposeInPlace for PacketSize x PacketSize matrices

As an example, the generated code for:

```cpp
Matrix4f m;
m *= 2.34;
m.transposeInPlace();
m = m+m;
```

is now:

```asm
movaps   LCPI0_0(%rip), %xmm0
movaps   (%rdi), %xmm1
mulps    %xmm0, %xmm1
movaps   16(%rdi), %xmm2
mulps    %xmm0, %xmm2
movaps   32(%rdi), %xmm3
mulps    %xmm0, %xmm3
mulps    48(%rdi), %xmm0
movaps   %xmm1, %xmm4
unpcklps %xmm2, %xmm4    ## xmm4 = xmm4[0],xmm2[0],xmm4[1],xmm2[1]
movaps   %xmm3, %xmm5
unpcklps %xmm0, %xmm5    ## xmm5 = xmm5[0],xmm0[0],xmm5[1],xmm0[1]
unpckhps %xmm2, %xmm1    ## xmm1 = xmm1[2],xmm2[2],xmm1[3],xmm2[3]
unpckhps %xmm0, %xmm3    ## xmm3 = xmm3[2],xmm0[2],xmm3[3],xmm0[3]
movaps   %xmm4, %xmm0
movlhps  %xmm5, %xmm0    ## xmm0 = xmm0[0],xmm5[0]
movhlps  %xmm4, %xmm5    ## xmm5 = xmm4[1],xmm5[1]
movaps   %xmm1, %xmm2
movlhps  %xmm3, %xmm2    ## xmm2 = xmm2[0],xmm3[0]
movhlps  %xmm1, %xmm3    ## xmm3 = xmm1[1],xmm3[1]
addps    %xmm0, %xmm0
movaps   %xmm0, (%rdi)
addps    %xmm5, %xmm5
movaps   %xmm5, 16(%rdi)
addps    %xmm2, %xmm2
movaps   %xmm2, 32(%rdi)
addps    %xmm3, %xmm3
movaps   %xmm3, 48(%rdi)
```

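To make the shuffle part of that listing easier to follow, here is a rough scalar emulation of the `unpcklps` / `unpckhps` / `movlhps` / `movhlps` sequence in portable C++. This is only an illustrative sketch, not Eigen's `ptranspose` code: the type alias `Vec4` and the functions `unpack_lo`, `unpack_hi`, `move_lh`, `move_hl`, and `transpose4x4` are hypothetical names invented here, and each `Vec4` stands in for one 4-float XMM register.

```cpp
#include <array>

// Hypothetical stand-in for one 128-bit register holding four floats.
using Vec4 = std::array<float, 4>;

// unpcklps: interleave the low halves of a and b.
inline Vec4 unpack_lo(const Vec4& a, const Vec4& b) {
    return Vec4{{a[0], b[0], a[1], b[1]}};
}

// unpckhps: interleave the high halves of a and b.
inline Vec4 unpack_hi(const Vec4& a, const Vec4& b) {
    return Vec4{{a[2], b[2], a[3], b[3]}};
}

// movlhps: low half of dst stays, low half of src goes into the high half.
inline Vec4 move_lh(const Vec4& dst, const Vec4& src) {
    return Vec4{{dst[0], dst[1], src[0], src[1]}};
}

// movhlps: high half of src goes into the low half, high half of dst stays.
inline Vec4 move_hl(const Vec4& dst, const Vec4& src) {
    return Vec4{{src[2], src[3], dst[2], dst[3]}};
}

// Transposes four "registers" using the same order of operations as the
// shuffle section of the assembly listing.
inline std::array<Vec4, 4> transpose4x4(const std::array<Vec4, 4>& c) {
    const Vec4 t0 = unpack_lo(c[0], c[1]);  // c0[0], c1[0], c0[1], c1[1]
    const Vec4 t1 = unpack_lo(c[2], c[3]);  // c2[0], c3[0], c2[1], c3[1]
    const Vec4 t2 = unpack_hi(c[0], c[1]);  // c0[2], c1[2], c0[3], c1[3]
    const Vec4 t3 = unpack_hi(c[2], c[3]);  // c2[2], c3[2], c2[3], c3[3]
    return {{
        move_lh(t0, t1),  // element 0 of every input
        move_hl(t1, t0),  // element 1 of every input
        move_lh(t2, t3),  // element 2 of every input
        move_hl(t3, t2),  // element 3 of every input
    }};
}
```

Each output register ends up holding the same-index element of all four inputs, which is exactly a 4x4 transpose. In the listing, the `mulps` instructions before this sequence and the `addps` instructions after it correspond to `m *= 2.34` and `m = m+m`.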
lplagne (Registered Member), Wed Jan 28, 2015 9:01 am

Wow... that was fast! Thank you very, very much.

Laurent