This forum has been archived. All content is frozen. Please use KDE Discuss instead.

NaturalOrdering takes a third of total inversion time

haavind (Registered Member, Posts: 13, Karma: 0)
I have a sparse LDLT solver of this type:
Eigen::SimplicialLDLT<Eigen::SparseMatrix<double,Eigen::ColMajor>, Eigen::Upper, Eigen::NaturalOrdering<int>>

My matrix is already in nearly optimal form, so reordering it is just a waste of time. Yet when profiling my code, the 'ordering' step takes 32% of the total 'compute' time.
NaturalOrdering is supposed to be just a no-op, right?

Two steps consume the time inside 'ordering':
internal::permute_symm_to_symm and internal::permute_symm_to_fullsymm.

Is this as intended?
ggael (Moderator, Posts: 3447, Karma: 19)
Indeed, with NaturalOrdering the current code performs many useless operations. Here is a patch which should significantly speed up the computation. It works best if your input is column-major and its upper triangular part is stored:
diff --git a/Eigen/src/SparseCholesky/SimplicialCholesky.h b/Eigen/src/SparseCholesky/SimplicialCholesky.h
--- a/Eigen/src/SparseCholesky/SimplicialCholesky.h
+++ b/Eigen/src/SparseCholesky/SimplicialCholesky.h
@@ -602,29 +602,42 @@ public:
     bool m_LDLT;
 };
 
 template<typename Derived>
 void SimplicialCholeskyBase<Derived>::ordering(const MatrixType& a, CholMatrixType& ap)
 {
   eigen_assert(a.rows()==a.cols());
   const Index size = a.rows();
-  // Note that amd compute the inverse permutation
+  // Note that ordering methods compute the inverse permutation
+  if(!internal::is_same<OrderingType,NaturalOrdering<Index> >::value)
   {
-    CholMatrixType C;
-    C = a.template selfadjointView<UpLo>();
+    {
+      CholMatrixType C;
+      C = a.template selfadjointView<UpLo>();
+     
+      OrderingType ordering;
+      ordering(C,m_Pinv);
+    }
+
+    if(m_Pinv.size()>0) m_P = m_Pinv.inverse();
+    else                m_P.resize(0);
     
-    OrderingType ordering;
-    ordering(C,m_Pinv);
+    ap.resize(size,size);
+    ap.template selfadjointView<Upper>() = a.template selfadjointView<UpLo>().twistedBy(m_P);
   }
-
-  if(m_Pinv.size()>0)
-    m_P = m_Pinv.inverse();
   else
+  {
+    m_Pinv.resize(0);
     m_P.resize(0);
-
-  ap.resize(size,size);
-  ap.template selfadjointView<Upper>() = a.template selfadjointView<UpLo>().twistedBy(m_P);
+    if(UpLo==Lower)
+    {
+      ap.resize(size,size);
+      ap.template selfadjointView<Upper>() = a.template selfadjointView<UpLo>().twistedBy(m_P);
+    }
+    else
+      ap = a.template triangularView<Upper>();
+  } 
 }
 
 } // end namespace Eigen
 
 #endif // EIGEN_SIMPLICIAL_CHOLESKY_H


In theory we could even avoid a copy in some ideal cases, but I doubt that this copy is significant. Please tell us how well this patch performs for you.
haavind (Registered Member, Posts: 13, Karma: 0)
I tried the patch, and there was a notable speedup. The copying was not entirely insignificant, though, so it is worthwhile to remove that step as well. My intuition agrees with yours that the copy shouldn't be an issue, but this may be a borderline case: the matrix is roughly at the size where an unnecessary copy could push data out of the L2 cache.

The strange thing is that factorize_preordered saw a ~25% reduction in run time with this fix. I'm not sure how to interpret that result; the one change shouldn't really affect the other.
ggael (Moderator, Posts: 3447, Karma: 19)
Profiling is often very tricky. Could you export (see http://eigen.tuxfamily.org/dox-devel/gr ... tml#title3), zip, and send me your matrix so that I can try to reproduce the issue? Thank you.
haavind (Registered Member, Posts: 13, Karma: 0)
Test matrix sent by email.

I did another test where I skipped the ordering and copying entirely. My results, as obtained with the Very Sleepy profiler, are:

Original code:
- compute = 3.9 s
-- factorize = 2.2 s
-- ordering = 1.3 s
-- analyze = 0.43 s

fix:
- compute = 2.6 s
-- factorize = 1.6 s
-- ordering = 0.5 s
-- analyze = 0.54 s

fix v2 w/o copy:
- compute = 1.9 s
-- factorize = 1.4 s
-- ordering = NA
-- analyze = 0.56 s

Note that these figures are for several iterations with a profiler running in the background, not the time for a single inversion, so they are only useful for relative comparison.
I did some wall-time measurements as well, but it is hard to be very quantitative about them, since the LDLT inversion is a fairly small part of the entire runtime and only matters because it is a bottleneck in a highly parallel section. That said, the wall times agree roughly with the profiler.
ggael (Moderator, Posts: 3447, Karma: 19)
Yes, I also observed a non-negligible overhead from the deep copy. Fixed in the devel branch:

https://bitbucket.org/eigen/eigen/commits/dafda2a707af/

