I can't get SSE2 code in MSVC 2005, help! • KDE Community Forums

Page 2 of 2 (22 posts)

bjacob Registered Member Posts 658 Karma 3
Re: I can't get SSE2 code in MSVC 2005, help! Thu May 27, 2010 5:34 pm

It's pretty simple: with the development branch, when you say Aligned, you tell Eigen that it's safe to assume that your array start pointer is a multiple of 16 bytes. Eigen then uses SSE instructions that rely on that assumption, namely in ei_pload, which is why you get the crash there.

The cause of the crash is that you did something of the form

`Map<MatrixXd, Aligned> my_map(some_non_aligned_ptr, x, y)`

where some_non_aligned_ptr is not a multiple of 16 bytes. For example, on many platforms, malloc() and "new" return such pointers. When you need a pointer aligned to 16 bytes, you need to use Eigen's functions ei_aligned_malloc or ei_aligned_new.

Join us on Eigen's IRC channel: #eigen on irc.freenode.net Have a serious interest in Eigen? Then join the mailing list!
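To make the alignment requirement concrete, here is a minimal sketch in plain C++ (no Eigen) of how one might check whether a pointer is a multiple of 16 bytes before handing it to an Aligned map. The names `is_aligned` and `AlignedBuffer` are hypothetical helpers invented for this illustration; they are not part of Eigen.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not an Eigen function): returns true when the
// address stored in p is a multiple of `alignment` bytes.
inline bool is_aligned(const void* p, std::size_t alignment = 16) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Hypothetical demonstration type: alignas(16) forces the array to start
// on a 16-byte boundary, so data + 1 (8 bytes further) is misaligned
// and data + 2 (16 bytes further) is aligned again.
struct AlignedBuffer {
    alignas(16) double data[4];
};
```

A check like `is_aligned(ptr)` placed in a debug assertion just before constructing the map would catch the kind of stray allocation described above. For an `AlignedBuffer b`, `b.data` passes the check and `b.data + 1` does not.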
martinakos Registered Member Posts 53 Karma 0
Re: I can't get SSE2 code in MSVC 2005, help! Tue Jun 01, 2010 12:16 pm

Hi bjacob,

Thanks for your help again. I double-checked the memory allocation for the pointer I was mapping and found one place I wasn't controlling, which was producing the exception. Now I'm controlling all the allocations and making them aligned. So the code I referred to before is working again with the development branch. However, I think there is still something else wrong. The code is still slower than the hand-written one. And when I look at the disassembly for the line `eResults += eVector * eMatrix.transpose();` I see some instructions and an ei_unaligned_assign_impl, which makes me think it's not using the aligned version of the code?

I attach the disassembled code for the line:

eResults += eVector * eMatrix.transpose();
00837E80 lea eax,[esp+18h]
00837E84 push eax
00837E85 lea ecx,[esp+60h]
00837E89 lea edx,[esp+44h]
00837E8D push ecx
00837E8E lea ecx,[esp+38h]
00837E92 mov dword ptr [esp+20h],edx
00837E96 call Eigen::MatrixBase<Eigen::Map<Eigen::Matrix<double,33331,33331,0,33331,33331>,1,Eigen::Stride<0,0> > >::operator*<Eigen::Transpose<Eigen::Map<Eigen::Matrix<double,33331,33331,0,33331,33331>,1,Eigen::Stride<0,0> > > > (839CC0h)
00837E9B lea edx,[esp+20h]
00837E9F mov dword ptr [esp+18h],edx
00837EA3 lea ecx,[esp+17h]
00837EA7 lea edx,[esp+50h]
00837EAB mov dword ptr [esp+1Ch],ecx
00837EAF push edx
00837EB0 mov ecx,eax
00837EB2 mov byte ptr [esp+110h],1
00837EBA call Eigen::DenseBase<Eigen::GeneralProduct<Eigen::Map<Eigen::Matrix<double,33331,33331,0,33331,33331>,1,Eigen::Stride<0,0> >,Eigen::Transpose<Eigen::Map<Eigen::Matrix<double,33331,33331,0,33331,33331>,1,Eigen::Stride<0,0> > >,5> >::eval (839F80h)
00837EBF mov edi,dword ptr [esp+28h]
00837EC3 imul edi,dword ptr [esp+24h]
00837EC8 mov esi,eax
00837ECA mov eax,edi
00837ECC cdq
00837ECD sub eax,edx
00837ECF sar eax,1
00837ED1 add eax,eax
00837ED3 xor ecx,ecx
00837ED5 test eax,eax
00837ED7 jle ImageMetrics::DoubleMatrix::AddVecMatTransposeProduct+1A2h (837F02h)
00837ED9 lea esp,[esp]
00837EE0 mov edx,dword ptr [esi]
00837EE2 movapd xmm1,xmmword ptr [edx+ecx*8]
00837EE7 mov edx,dword ptr [esp+20h]
00837EEB movapd xmm0,xmmword ptr [edx+ecx*8]
00837EF0 lea edx,[edx+ecx*8]
00837EF3 add ecx,2
00837EF6 cmp ecx,eax
00837EF8 addpd xmm0,xmm1
00837EFC movapd xmmword ptr [edx],xmm0
00837F00 jl ImageMetrics::DoubleMatrix::AddVecMatTransposeProduct+180h (837EE0h)
00837F02 push edi
00837F03 push eax
00837F04 lea eax,[esp+20h]
00837F08 push eax
00837F09 push esi
00837F0A call Eigen::ei_unaligned_assign_impl<0>::run<Eigen::Matrix<double,33331,33331,0,33331,33331>,Eigen::SelfCwiseBinaryOp<Eigen::ei_scalar_sum_op<double>,Eigen::Map<Eigen::Matrix<double,33331,33331,0,33331,33331>,1,Eigen::Stride<0,0> > > > (839A30h)
00837F0F mov ecx,dword ptr [esp+60h]
00837F13 mov esi,dword ptr [__imp___aligned_free (9E86CCh)]
00837F19 push ecx
00837F1A call esi
00837F1C mov edx,dword ptr [esp+78h]
00837F20 push edx
00837F21 call esi
00837F23 add esp,18h

Call to operator*:

template<typename Derived>
template<typename OtherDerived>
inline const typename ProductReturnType<Derived,OtherDerived>::Type
MatrixBase<Derived>::operator*(const MatrixBase<OtherDerived> &other) const
{
00839CC0 push ecx
  // A note regarding the function declaration: In MSVC, this function will sometimes
  // not be inlined since ei_matrix_storage is an unwindable object for dynamic
  // matrices and product types are holding a member to store the result.
  // Thus it does not help tagging this function with EIGEN_STRONG_INLINE.
  enum {
    ProductIsValid = Derived::ColsAtCompileTime==Dynamic
                  || OtherDerived::RowsAtCompileTime==Dynamic
                  || int(Derived::ColsAtCompileTime)==int(OtherDerived::RowsAtCompileTime),
    AreVectors = Derived::IsVectorAtCompileTime && OtherDerived::IsVectorAtCompileTime,
    SameSizes = EIGEN_PREDICATE_SAME_MATRIX_SIZE(Derived,OtherDerived)
  };
  // note to the lost user:
  // * for a dot product use: v1.dot(v2)
  // * for a coeff-wise product use: v1.cwiseProduct(v2)
  EIGEN_STATIC_ASSERT(ProductIsValid || !(AreVectors && SameSizes),
    INVALID_VECTOR_VECTOR_PRODUCT__IF_YOU_WANTED_A_DOT_OR_COEFF_WISE_PRODUCT_YOU_MUST_USE_THE_EXPLICIT_FUNCTIONS)
  EIGEN_STATIC_ASSERT(ProductIsValid || !(SameSizes && !AreVectors),
    INVALID_MATRIX_PRODUCT__IF_YOU_WANTED_A_COEFF_WISE_PRODUCT_YOU_MUST_USE_THE_EXPLICIT_FUNCTION)
  EIGEN_STATIC_ASSERT(ProductIsValid || SameSizes, INVALID_MATRIX_PRODUCT)
  return typename ProductReturnType<Derived,OtherDerived>::Type(derived(), other.derived());
00839CC1 mov eax,dword ptr [esp+8]
00839CC5 mov dword ptr [eax],ecx
00839CC7 mov ecx,dword ptr [esp+0Ch]
00839CCB mov ecx,dword ptr [ecx]
00839CCD xor edx,edx
00839CCF mov dword ptr [eax+4],ecx
00839CD2 mov dword ptr [esp],edx
00839CD5 mov dword ptr [eax+8],edx
00839CD8 mov dword ptr [eax+0Ch],edx
00839CDB mov dword ptr [eax+10h],edx
}
00839CDE pop ecx
00839CDF ret 8

Call to product / eval:

/** \returns the matrix or vector obtained by evaluating this expression.
  *
  * Notice that in the case of a plain matrix or vector (not an expression) this function just returns
  * a const reference, in order to avoid a useless copy.
  */
EIGEN_STRONG_INLINE const typename ei_eval<Derived>::type eval() const
{
00839F80 push 0FFFFFFFFh
00839F82 push offset __ehhandler$?eval@?$DenseBase@V?$GeneralProduct@V?$Map@V?$Matrix@N$0ICDD@$0ICDD@$0A@$0ICDD@$0ICDD@@Eigen@@$00V?$Stride@$0A@$0A@@2@@Eigen@@V?$Transpose@V?$Map@V?$Matrix@N$0ICDD@$0ICDD@$0A@$0ICDD@$0ICDD@@Eigen@@$00V?$Stride@$0A@$0A@@2@@Eigen@@@2@$04@Eigen@@@Eigen@@QBE?BV?$Matrix@N$0ICDD@$0ICDD@$0A@$0ICDD@$0ICDD@@2@XZ (9DD278h)
00839F87 mov eax,dword ptr fs:[00000000h]
00839F8D push eax
00839F8E mov dword ptr fs:[0],esp
00839F95 push ecx
00839F96 push esi
00839F97 mov esi,ecx
  // Even though MSVC does not honor strong inlining when the return type
  // is a dynamic matrix, we desperately need strong inlining for fixed
  // size types on MSVC.
  return typename ei_eval<Derived>::type(derived());
00839F99 mov eax,dword ptr [esi+4]
00839F9C mov eax,dword ptr [eax+4]
00839F9F mov ecx,dword ptr [esi]
00839FA1 mov ecx,dword ptr [ecx+4]
00839FA4 push edi
00839FA5 mov edi,dword ptr [esp+1Ch]
00839FA9 push eax
00839FAA imul eax,ecx
00839FAD push ecx
00839FAE push eax
00839FAF mov ecx,edi
00839FB1 mov dword ptr [esp+14h],0
00839FB9 call Eigen::ei_matrix_storage<double,33331,33331,33331,0>::ei_matrix_storage<double,33331,33331,33331,0> (838720h)
00839FBE mov edx,dword ptr [esi+4]
00839FC1 mov eax,dword ptr [edx+4]
00839FC4 mov ecx,dword ptr [esi]
00839FC6 mov ecx,dword ptr [ecx+4]
00839FC9 push eax
00839FCA imul eax,ecx
00839FCD push ecx
00839FCE push eax
00839FCF mov ecx,edi
00839FD1 mov dword ptr [esp+20h],0
00839FD9 call Eigen::ei_matrix_storage<double,33331,33331,33331,1>::resize (8386B0h)
00839FDE mov edx,dword ptr [esi+4]
00839FE1 mov eax,dword ptr [edx+4]
00839FE4 mov ecx,dword ptr [esi]
00839FE6 mov ecx,dword ptr [ecx+4]
00839FE9 push eax
00839FEA imul eax,ecx
00839FED push ecx
00839FEE push eax
00839FEF mov ecx,edi
00839FF1 call Eigen::ei_matrix_storage<double,33331,33331,33331,1>::resize (8386B0h)
00839FF6 push edi
00839FF7 mov ecx,esi
00839FF9 call Eigen::ProductBase<Eigen::GeneralProduct<Eigen::Map<Eigen::Matrix<double,33331,33331,0,33331,33331>,1,Eigen::Stride<0,0> >,Eigen::Transpose<Eigen::Map<Eigen::Matrix<double,33331,33331,0,33331,33331>,1,Eigen::Stride<0,0> > >,5>,Eigen::Map<Eigen::Matrix<double,33331,33331,0,33331,33331>,1,Eigen::Stride<0,0> >,Eigen::Transpose<Eigen::Map<Eigen::Matrix<double,33331,33331,0,33331,33331>,1,Eigen::Stride<0,0> > > >::evalTo<Eigen::Matrix<double,33331,33331,0,33331,33331> > (839ED0h)
}
00839FFE mov ecx,dword ptr [esp+0Ch]
0083A002 mov eax,edi
0083A004 pop edi
0083A005 pop esi
0083A006 mov dword ptr fs:[0],ecx
0083A00D add esp,10h
0083A010 ret 4

Call to ei_unaligned_assign_impl:

template <>
struct ei_unaligned_assign_impl<false>
{
  // MSVC must not inline this functions. If it does, it fails to optimize the
  // packet access path.
#ifdef _MSC_VER
  template <typename Derived, typename OtherDerived>
  static EIGEN_DONT_INLINE void run(const Derived& src, OtherDerived& dst, int start, int end)
#else
  template <typename Derived, typename OtherDerived>
  static EIGEN_STRONG_INLINE void run(const Derived& src, OtherDerived& dst, int start, int end)
#endif
  {
    for (int index = start; index < end; ++index)
00839A30 mov edx,dword ptr [esp+0Ch]
00839A34 push edi
00839A35 mov edi,dword ptr [esp+14h]
00839A39 cmp edx,edi
00839A3B jge Eigen::ei_unaligned_assign_impl<0>::run<Eigen::Matrix<double,33331,33331,0,33331,33331>,Eigen::SelfCwiseBinaryOp<Eigen::ei_scalar_sum_op<double>,Eigen::Map<Eigen::Matrix<double,33331,33331,0,33331,33331>,1,Eigen::Stride<0,0> > > >+0C7h (839AF7h)
00839A41 mov eax,edi
00839A43 push ebp
00839A44 mov ebp,dword ptr [esp+10h]
00839A48 sub eax,edx
00839A4A cmp eax,4
00839A4D push esi
00839A4E jl Eigen::ei_unaligned_assign_impl<0>::run<Eigen::Matrix<double,33331,33331,0,33331,33331>,Eigen::SelfCwiseBinaryOp<Eigen::ei_scalar_sum_op<double>,Eigen::Map<Eigen::Matrix<double,33331,33331,0,33331,33331>,1,Eigen::Stride<0,0> > > >+95h (839AC5h)
00839A50 mov eax,dword ptr [esp+10h]
00839A54 mov ecx,dword ptr [ebp]
00839A57 mov ecx,dword ptr [ecx]
00839A59 push ebx
00839A5A mov ebx,dword ptr [eax]
};
bjacob Registered Member Posts 658 Karma 3
Re: I can't get SSE2 code in MSVC 2005, help! Tue Jun 01, 2010 12:42 pm

First of all, don't worry about ei_unaligned_assign_impl: it just takes care of the small parts of your matrix that can't be addressed by 16-byte packets (the beginning and end of a row/column...).

The most probable cause of bad performance is MSVC failing to correctly inline a function that crucially needs to be inlined. The reason why you'd be the first one to report this particular issue is that it's not too common to do row_vector * transpose_of_matrix, so can you just try replacing:

`eResults += eVector * eMatrix.transpose();`

by

`eResults.transpose() += eMatrix * eVector.transpose();`

and see if that makes any difference.

If that's still slow, then the best you can do is file a bug report on our issue tracker with a self-contained compilable test program, exhibiting poor performance with MSVC 2008.
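The split between the packet loop and the leftover elements can be sketched in plain C++ as follows. This is an illustration of the general pattern only, not Eigen's implementation: the function name `add_assign_split` and the `SplitResult` struct are hypothetical, the "packet" step is written as two scalar additions standing in for the movapd/addpd/movapd sequence in the disassembly, and, as in that listing, only a tail is handled (the data is assumed to start on an aligned boundary).

```cpp
#include <cstddef>

// Hypothetical result type: how many elements went through each path.
struct SplitResult {
    std::size_t packet_elems;  // elements handled two at a time
    std::size_t scalar_elems;  // leftover elements handled one at a time
};

// Sketch of dst[i] += src[i] over n doubles, split the way the
// disassembly does it: a main loop stepping by 2 (one 16-byte SSE2
// packet holds two doubles), then a scalar loop for the remainder.
inline SplitResult add_assign_split(double* dst, const double* src, std::size_t n) {
    const std::size_t packet_size = 2;
    // Same rounding as the `sar eax,1` / `add eax,eax` pair: (n / 2) * 2.
    const std::size_t packet_end = (n / packet_size) * packet_size;

    for (std::size_t i = 0; i < packet_end; i += packet_size) {
        // Stand-in for one aligned packet load, add and store.
        dst[i]     += src[i];
        dst[i + 1] += src[i + 1];
    }
    // Stand-in for the call into the scalar fallback (start = packet_end, end = n).
    for (std::size_t i = packet_end; i < n; ++i) {
        dst[i] += src[i];
    }
    return SplitResult{packet_end, n - packet_end};
}
```

With n = 5, four elements go through the packet loop and one through the scalar loop, which is why seeing a call to the scalar fallback in the listing does not by itself mean the vectorized path was skipped.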
martinakos Registered Member Posts 53 Karma 0
Re: I can't get SSE2 code in MSVC 2005, help! Tue Jun 01, 2010 3:21 pm

Hi bjacob,

I have tried the alternative syntax. I don’t get any performance improvement. I’ll make an entry on the issue tracker.

Thanks for your help.
Martin.
ggael Moderator Posts 3447 Karma 19
Re: I can't get SSE2 code in MSVC 2005, help! Tue Jun 01, 2010 9:08 pm

For the record, there were two issues:

1 - you really should use Map<RowVectorXd> for the vectors, otherwise Eigen uses the general matrix*matrix product, which is not very well suited for matrix*vector products...
2 - there was a bug in Eigen preventing Map<> objects from being fully optimized.

If you update your local copy and make change 1), then the Eigen version should really be significantly faster because of SSE and better cache use. Also, adding Aligned is not really useful here, but adding .noalias() avoids one useless memory alloc/copy.
bjacob Registered Member Posts 658 Karma 3
Re: I can't get SSE2 code in MSVC 2005, help! Tue Jun 01, 2010 9:49 pm

ggael wrote: 1 - you really should use Map<RowVectorXd> for the vectors, otherwise Eigen uses the general matrix*matrix product, which is not very well suited for matrix*vector products...

Wow, this is huge, how could I let that pass!! Great job for 2) too.

martinakos, what gael means with .noalias() is

`eResults.noalias() = otherstuff;`
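For reference, the operation under discussion (a row vector times the transpose of a matrix, accumulated into a result row vector) can be written without Eigen as the plain C++ sketch below. It is a hand-rolled illustration, not Eigen's kernel: the function name `add_vec_mat_transpose` and its parameters are hypothetical, and column-major storage is assumed because that is Eigen's default layout. The point it illustrates is the one behind .noalias(): the sums are accumulated straight into `results`, with no temporary vector allocated for the product and then copied.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: results (1 x rows) += v (1 x cols) * M^T,
// where M is rows x cols and stored column-major, so that
// M(i, j) lives at m_colmajor[i + j * rows].
inline void add_vec_mat_transpose(std::vector<double>& results,
                                  const std::vector<double>& v,
                                  const std::vector<double>& m_colmajor,
                                  std::size_t rows,
                                  std::size_t cols) {
    // Walk the matrix column by column so that memory is read
    // contiguously, and accumulate directly into the destination.
    for (std::size_t j = 0; j < cols; ++j) {
        const double vj = v[j];
        const double* column = &m_colmajor[j * rows];
        for (std::size_t i = 0; i < rows; ++i) {
            results[i] += vj * column[i];
        }
    }
}
```

A small baseline like this is also handy as a correctness check when comparing against the Eigen expression, for instance with a 2 x 3 matrix and a vector of ones.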
martinakos Registered Member Posts 53 Karma 0
Re: I can't get SSE2 code in MSVC 2005, help! Wed Jun 02, 2010 11:15 am

Now it's working as expected!!! Using doubles, it is twice as fast as the non-SSE hand-written code! I have tried it with and without Aligned and the speed is more or less the same, so I imagine that for this size of matrices/vectors I can get good performance without needing to align the memory.

Thanks very much ggael and bjacob for your help.
Martinakos
