
transposeInPlace() not vectorized for 4x4 or 8x8 matrices?

lplagne
Registered Member
Posts
11
Karma
0
Hi,

I wonder why the following method does not rely on a vectorized kernel such as _MM_TRANSPOSE4_PS:

Eigen::Matrix<float,4,4> mat;
...
mat.transposeInPlace();

I hand-coded the method using intrinsics, but it is a bit frustrating, since Eigen had allowed me to clean away all the other assembly hacks from my code :'(
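
For context, a hand-coded 4x4 transpose of the kind described here might look like the sketch below. It assumes 16 contiguous, 16-byte-aligned floats and uses the `_MM_TRANSPOSE4_PS` macro from `<xmmintrin.h>` when SSE is available, with a scalar fallback otherwise. The helper name `transpose4x4_inplace` and the `SKETCH_HAVE_SSE` macro are made up for illustration and are not part of Eigen.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Assumption: SSE availability is detected with these common compiler macros.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_HAVE_SSE 1
#endif

// Hypothetical helper (not an Eigen function): transposes a 4x4 float
// matrix in place. `m` must point to 16 contiguous, 16-byte-aligned floats.
inline void transpose4x4_inplace(float* m) {
    assert(reinterpret_cast<std::uintptr_t>(m) % 16 == 0);
#ifdef SKETCH_HAVE_SSE
    // Load the four groups of four floats into SSE registers.
    __m128 r0 = _mm_load_ps(m + 0);
    __m128 r1 = _mm_load_ps(m + 4);
    __m128 r2 = _mm_load_ps(m + 8);
    __m128 r3 = _mm_load_ps(m + 12);
    // The macro rewrites r0..r3 so that they hold the transposed matrix.
    _MM_TRANSPOSE4_PS(r0, r1, r2, r3);
    _mm_store_ps(m + 0, r0);
    _mm_store_ps(m + 4, r1);
    _mm_store_ps(m + 8, r2);
    _mm_store_ps(m + 12, r3);
#else
    // Scalar fallback: swap each element below the diagonal with its mirror.
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < r; ++c)
            std::swap(m[4 * r + c], m[4 * c + r]);
#endif
}
```

A caller would declare the storage as `alignas(16) float m[16];` and then call `transpose4x4_inplace(m);`.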

Thank you for your help,

Laurent
ggael
Moderator
Posts
3447
Karma
19
Sure, we should definitely support that. Internally, we even have ptranspose(...) intrinsics for all architectures and supported packet types... So that should be very easy!
lplagne
Cool! So I just have to wait now ;)
ggael
Here you go:
https://bitbucket.org/eigen/eigen/commits/32a8021225d5/
Changeset: 32a8021225d5
User: ggael
Date: 2015-01-26 16:09:01+00:00
Summary: Enable vectorization of transposeInPlace for PacketSize x PacketSize matrices

As an example, the generated code for:
  Matrix4f m;
  m *= 2.34;
  m.transposeInPlace();
  m = m+m;

is now:
   movaps   LCPI0_0(%rip), %xmm0
   movaps   (%rdi), %xmm1
   mulps   %xmm0, %xmm1
   movaps   16(%rdi), %xmm2
   mulps   %xmm0, %xmm2
   movaps   32(%rdi), %xmm3
   mulps   %xmm0, %xmm3
   mulps   48(%rdi), %xmm0
   movaps   %xmm1, %xmm4
   unpcklps   %xmm2, %xmm4    ## xmm4 = xmm4[0],xmm2[0],xmm4[1],xmm2[1]
   movaps   %xmm3, %xmm5
   unpcklps   %xmm0, %xmm5    ## xmm5 = xmm5[0],xmm0[0],xmm5[1],xmm0[1]
   unpckhps   %xmm2, %xmm1    ## xmm1 = xmm1[2],xmm2[2],xmm1[3],xmm2[3]
   unpckhps   %xmm0, %xmm3    ## xmm3 = xmm3[2],xmm0[2],xmm3[3],xmm0[3]
   movaps   %xmm4, %xmm0
   movlhps   %xmm5, %xmm0            ## xmm0 = xmm0[0],xmm5[0]
   movhlps   %xmm4, %xmm5            ## xmm5 = xmm4[1],xmm5[1]
   movaps   %xmm1, %xmm2
   movlhps   %xmm3, %xmm2            ## xmm2 = xmm2[0],xmm3[0]
   movhlps   %xmm1, %xmm3            ## xmm3 = xmm1[1],xmm3[1]
   addps   %xmm0, %xmm0
   movaps   %xmm0, (%rdi)
   addps   %xmm5, %xmm5
   movaps   %xmm5, 16(%rdi)
   addps   %xmm2, %xmm2
   movaps   %xmm2, 32(%rdi)
   addps   %xmm3, %xmm3
   movaps   %xmm3, 48(%rdi)
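
The listing above can be mirrored with SSE intrinsics. The following is only a sketch of the same pattern (scale, then unpcklps/unpckhps followed by movlhps/movhlps, then add), not Eigen's actual ptranspose implementation. The function name `scale_transpose_double` and the `SKETCH_HAVE_SSE` macro are hypothetical, and the input is assumed to be 16 contiguous, 16-byte-aligned floats.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Assumption: SSE availability is detected with these common compiler macros.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_HAVE_SSE 1
#endif

// Hypothetical helper: computes m *= s; transpose(m); m = m + m;
// on a 4x4 float matrix stored as 16 contiguous, 16-byte-aligned floats.
inline void scale_transpose_double(float* m, float s) {
    assert(reinterpret_cast<std::uintptr_t>(m) % 16 == 0);
#ifdef SKETCH_HAVE_SSE
    const __m128 k = _mm_set1_ps(s);
    // Scale each group of four floats (the mulps instructions).
    __m128 c0 = _mm_mul_ps(_mm_load_ps(m + 0), k);
    __m128 c1 = _mm_mul_ps(_mm_load_ps(m + 4), k);
    __m128 c2 = _mm_mul_ps(_mm_load_ps(m + 8), k);
    __m128 c3 = _mm_mul_ps(_mm_load_ps(m + 12), k);
    // Interleave pairs (unpcklps / unpckhps).
    __m128 t0 = _mm_unpacklo_ps(c0, c1);  // c0[0], c1[0], c0[1], c1[1]
    __m128 t1 = _mm_unpacklo_ps(c2, c3);  // c2[0], c3[0], c2[1], c3[1]
    __m128 t2 = _mm_unpackhi_ps(c0, c1);  // c0[2], c1[2], c0[3], c1[3]
    __m128 t3 = _mm_unpackhi_ps(c2, c3);  // c2[2], c3[2], c2[3], c3[3]
    // Combine 64-bit halves (movlhps / movhlps).
    __m128 r0 = _mm_movelh_ps(t0, t1);    // c0[0], c1[0], c2[0], c3[0]
    __m128 r1 = _mm_movehl_ps(t1, t0);    // c0[1], c1[1], c2[1], c3[1]
    __m128 r2 = _mm_movelh_ps(t2, t3);    // c0[2], c1[2], c2[2], c3[2]
    __m128 r3 = _mm_movehl_ps(t3, t2);    // c0[3], c1[3], c2[3], c3[3]
    // Double each result and store it back (addps + movaps).
    _mm_store_ps(m + 0, _mm_add_ps(r0, r0));
    _mm_store_ps(m + 4, _mm_add_ps(r1, r1));
    _mm_store_ps(m + 8, _mm_add_ps(r2, r2));
    _mm_store_ps(m + 12, _mm_add_ps(r3, r3));
#else
    // Scalar fallback performing the same three steps.
    for (int i = 0; i < 16; ++i) m[i] *= s;
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < r; ++c)
            std::swap(m[4 * r + c], m[4 * c + r]);
    for (int i = 0; i < 16; ++i) m[i] += m[i];
#endif
}
```

Keeping all four groups in registers between the scale, the transpose, and the add is what lets the compiler emit a single load and a single store per group, as in the listing.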
lplagne
Wow... that was quick!

Thank you very very much.

Laurent

