This forum has been archived. All content is frozen. Please use KDE Discuss instead.

Segfault in Cholesky factorization

Tags: None
(comma "," separated)
TaMo
Registered Member
Posts
2
Karma
0

Segfault in Cholesky factorization

Sun Oct 04, 2009 11:01 am
Hi,

I am getting weird segfaults using the cholesky factorization. For example, even in this simple program:

Code: Select all

#define SIZE_P 230

int main(){
 
  MatrixXd HP = MatrixXd::Random(9,SIZE_P);

  Matrix<double,9,9> S = HP*HP.transpose()+30*Matrix<double,9,9>::Identity();
 
  Matrix<double,9,1> r = Matrix<double,9,1>::Ones();
     
  S.llt().solveInPlace(HP);


}



I get a segfault. gdb said the following:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb7d066c0 (LWP 19357)]
Eigen::ei_solve_triangular_selector<Eigen::Flagged<Eigen::Transpose<Eigen::NestByValue<Eigen::Matrix<double, 9, 9, 2, 9, 9> > >, 1024u, 0u>, Eigen::Matrix<double, 10000, 10000, 2, 10000, 10000>, 1024, 1>::run (lhs=@0xbfbd8598, other=@0xbfbd8b90)
at /usr/lib/gcc/i486-linux-gnu/4.3.2/include/emmintrin.h:150
150 *(__m128d *)__P = __A;
(gdb) up
#1 0x0804e805 in Eigen::LLT<Eigen::Matrix<double, 9, 9, 2, 9, 9> >::solveInPlace<Eigen::Matrix<double, 10000, 10000, 2, 10000, 10000> > (
this=0xbfbd8650, bAndX=@0xbfbd8b90) at ../../eigen2/Eigen/src/Core/SolveTriangular.h:48
48 ei_solve_triangular_selector<Flagged<Lhs,LhsMode,0>,Rhs>::run(lhs._expression(), other);
(gdb) up
#2 0x08049512 in main () at main2.cpp:46
46 S.llt().solveInPlace(HP);


I am compiling using g++ -O3 -msse3.

Am I doing something wrong?

Any help would be greatly appreciated.
User avatar
bjacob
Registered Member
Posts
658
Karma
3
Thanks for the report. After trying, I found that this crash exists only in the development version, not in 2.0. So my obvious advice would be to stick to the stable version if this is a major concern to you.

I found that the crash happens whenever SSE (2+) is enabled, and that optimization isn't playing a role here.

So I compiled with just -msse2 -g3 and no optimization, and got this backtrace:
Code: Select all
#0  0x0804e505 in _mm_load_pd (__P=0xbfffede4) at /usr/lib/gcc/i586-suse-linux/4.4/include/emmintrin.h:106
#1  Eigen::ei_pload<double> (__P=0xbfffede4) at eigen2/Eigen/src/Core/arch/SSE/PacketMath.h:176
#2  0x0804b91c in Eigen::ei_cache_friendly_product_colmajor_times_vector<false, false, double, Eigen::ei_gemv_selector<2, 0, true>::run(const ProductType&, Dest&, typename ProductType::Scalar) [with ProductType = Eigen::GeneralProduct<Eigen::Block<Eigen::Matrix<double, 9, 9, 0, 9, 9>, 33331, 33331, 1, 32>, Eigen::Transpose<Eigen::Block<Eigen::Matrix<double, 9, 9, 0, 9, 9>, 1, 33331, 1, 32> >, 3>, Dest = Eigen::Block<Eigen::Matrix<double, 9, 9, 0, 9, 9>, 33331, 1, 1, 32>]::ActualRhsType>(int, const double *, int, const Eigen::ei_gemv_selector<2, 0, true>::ActualRhsType &, double *, double) (size=3, lhs=0xbfffed54, lhsStride=9, rhs=..., res=0xbfffe6e0, alpha=-1)
    at eigen2/Eigen/src/Core/products/GeneralMatrixVector.h:147
#3  0x080591de in Eigen::ei_gemv_selector<2, 0, true>::run<Eigen::GeneralProduct<Eigen::Block<Eigen::Matrix<double, 9, 9, 0, 9, 9>, 33331, 33331, 1, 32>, Eigen::Transpose<Eigen::Block<Eigen::Matrix<double, 9, 9, 0, 9, 9>, 1, 33331, 1, 32> >, 3>, Eigen::Block<Eigen::Matrix<double, 9, 9, 0, 9, 9>, 33331, 1, 1, 32> > (prod=..., dest=..., alpha=-1)
    at eigen2/Eigen/src/Core/Product.h:331
#4  0x08058309 in Eigen::GeneralProduct<Eigen::Block<Eigen::Matrix<double, 9, 9, 0, 9, 9>, 33331, 33331, 1, 32>, Eigen::Transpose<Eigen::Block<Eigen::Matrix<double, 9, 9, 0, 9, 9>, 1, 33331, 1, 32> >, 3>::scaleAndAddTo<Eigen::Block<Eigen::Matrix<double, 9, 9, 0, 9, 9>, 33331, 1, 1, 32> > (this=0xbfffe870, dst=..., alpha=-1)
    at eigen2/Eigen/src/Core/Product.h:279
#5  0x08057133 in Eigen::ProductBase<Eigen::GeneralProduct<Eigen::Block<Eigen::Matrix<double, 9, 9, 0, 9, 9>, 33331, 33331, 1, 32>, Eigen::Transpose<Eigen::Block<Eigen::Matrix<double, 9, 9, 0, 9, 9>, 1, 33331, 1, 32> >, 3>, Eigen::Block<Eigen::Matrix<double, 9, 9, 0, 9, 9>, 33331, 33331, 1, 32>, Eigen::Transpose<Eigen::Block<Eigen::Matrix<double, 9, 9, 0, 9, 9>, 1, 33331, 1, 32> > >::scaleAndAddTo<Eigen::Block<Eigen::Matrix<double, 9, 9, 0, 9, 9>, 33331, 1, 1, 32> > (this=0xbfffe870, dst=..., alpha=-1)
    at eigen2/Eigen/src/Core/ProductBase.h:112
#6  0x08055898 in Eigen::ProductBase<Eigen::GeneralProduct<Eigen::Block<Eigen::Matrix<double, 9, 9, 0, 9, 9>, 33331, 33331, 1, 32>, Eigen::Transpose<Eigen::Block<Eigen::Matrix<double, 9, 9, 0, 9, 9>, 1, 33331, 1, 32> >, 3>, Eigen::Block<Eigen::Matrix<double, 9, 9, 0, 9, 9>, 33331, 33331, 1, 32>, Eigen::Transpose<Eigen::Block<Eigen::Matrix<double, 9, 9, 0, 9, 9>, 1, 33331, 1, 32> > >::subTo<Eigen::Block<Eigen::Matrix<double, 9, 9, 0, 9, 9>, 33331, 1, 1, 32> > (this=0xbfffe870, dst=...) at eigen2/Eigen/src/Core/ProductBase.h:109
#7  0x08054707 in Eigen::NoAlias<Eigen::Block<Eigen::Matrix<double, 9, 9, 0, 9, 9>, 33331, 1, 1, 32> >::operator-=<Eigen::GeneralProduct<Eigen::Block<Eigen::Matrix<double, 9, 9, 0, 9, 9>, 33331, 33331, 1, 32>, Eigen::Transpose<Eigen::Block<Eigen::Matrix<double, 9, 9, 0, 9, 9>, 1, 33331, 1, 32> >, 3>, Eigen::Block<Eigen::Matrix<double, 9, 9, 0, 9, 9>, 33331, 33331, 1, 32>, Eigen::Transpose<Eigen::Block<Eigen::Matrix<double, 9, 9, 0, 9, 9>, 1, 33331, 1, 32> > > (this=0xbfffe87c, other=...) at eigen2/Eigen/src/Core/NoAlias.h:71
#8  0x08051e79 in Eigen::ei_llt_inplace<2048>::unblocked<Eigen::Matrix<double, 9, 9, 0, 9, 9> > (mat=...) at eigen2/Eigen/src/Cholesky/LLT.h:144
#9  0x08050853 in Eigen::ei_llt_inplace<2048>::blocked<Eigen::Matrix<double, 9, 9, 0, 9, 9> > (m=...) at eigen2/Eigen/src/Cholesky/LLT.h:156
#10 0x08050099 in Eigen::LLT_Traits<Eigen::Matrix<double, 9, 9, 0, 9, 9>, 2048>::inplace_decomposition (m=...) at eigen2/Eigen/src/Cholesky/LLT.h:202
#11 0x0804f90b in Eigen::LLT<Eigen::Matrix<double, 9, 9, 0, 9, 9>, 2048>::compute (this=0xbfffed24, a=...) at eigen2/Eigen/src/Cholesky/LLT.h:228
#12 0x0804f18f in Eigen::LLT<Eigen::Matrix<double, 9, 9, 0, 9, 9>, 2048>::LLT (this=0xbfffed24, matrix=...) at eigen2/Eigen/src/Cholesky/LLT.h:85
#13 0x0804eae2 in Eigen::MatrixBase<Eigen::Matrix<double, 9, 9, 0, 9, 9> >::llt (this=0xbfffe9f4) at eigen2/Eigen/src/Cholesky/LLT.h:285
#14 0x0804b06e in main () at a.cpp:16


Join us on Eigen's IRC channel: #eigen on irc.freenode.net
Have a serious interest in Eigen? Then join the mailing list!
User avatar
bjacob
Registered Member
Posts
658
Karma
3
So the line that breaks is
Code: Select all
              _EIGEN_ACCUMULATE_PACKETS(d,du,d);

at line 147 in GMV.h, EvenAligned case.

I added couts to get the addresses of the packets:
Code: Select all
0xbfdd97c4 0xbfdd989c 0xbfdd9854 0xbfdd980c


So the problem is that the assumption, that the first is aligned, is broken.


Join us on Eigen's IRC channel: #eigen on irc.freenode.net
Have a serious interest in Eigen? Then join the mailing list!
User avatar
bjacob
Registered Member
Posts
658
Karma
3
Gael: leaving this one to you, as it seems that you have more cases to handle here? The first is not necessarily aligned but at the same time,there are at most 4x more cases to check...

TaMo: for now, the quickest is to disable vectorization, it'll fix your issue.


Join us on Eigen's IRC channel: #eigen on irc.freenode.net
Have a serious interest in Eigen? Then join the mailing list!
User avatar
bjacob
Registered Member
Posts
658
Karma
3
Ah! No, I know, sorry, the issue is elsewhere.

See my pointers starting in 0xbf ...... ?

That's _stack_ pointers. What is probably happening is that here, the size 9 is known at compile time, so the array is a static array, and since 9*9*sizeof(double) isn't a multiple of 16, the array is not aligned.

Great, we know that we don't vectorize fixed-size ops that are not fully vectorizable, but then why are SSE instructions used at all here ? That's the bug.


Join us on Eigen's IRC channel: #eigen on irc.freenode.net
Have a serious interest in Eigen? Then join the mailing list!
User avatar
bjacob
Registered Member
Posts
658
Karma
3
This is confirmed:

TaMo: just replacing all matrix types by MatrixXd, works around the bug for now (the arrays are then aligned).


Join us on Eigen's IRC channel: #eigen on irc.freenode.net
Have a serious interest in Eigen? Then join the mailing list!
User avatar
bjacob
Registered Member
Posts
658
Karma
3
ok one last note:

the other approach would be to get rid of the assumptions "first aligned" as anyway that doesn't apply when taking blocks...

actually, lately i've been wondering. Perhaps it always was stupid to align dynamic arrays ??? We have to treat the last few remaining scalars anyway, so why not treat the first few? (Which we already do whenever we want something that applies to blocks)


Join us on Eigen's IRC channel: #eigen on irc.freenode.net
Have a serious interest in Eigen? Then join the mailing list!
TaMo
Registered Member
Posts
2
Karma
0
Wow, that was quick. Thanks for the reply! I will try that and let you know if there are problems still.

In a related question: is there a way to pre-allocate the temporaries of cholesky, so that they are not re-allocated every time? For the particular example above with a 9x9 matrix it may not be too bad, but in other parts of my program I need to take many LLT's of large matrices (e.g. 200x200). Alternatively, is it possible to compute the LLT in place (i.e., to overwrite the matrix being decomposed)?

In fact, I am in the process of switching my code from MKL to eigen. In MKL you can pre-allocate all the temporaries, and it will also compute the decomposition destructively. I would like to be able to get as close as possible to that performance.

That said, I would also like to congratulate and thank you for this wonderful effort. Had this library existed a couple of years ago, I would have saved many many hours of programming.....

Thanks!!!
User avatar
bjacob
Registered Member
Posts
658
Karma
3
TaMo wrote:In a related question: is there a way to pre-allocate the temporaries of cholesky, so that they are not re-allocated every time? For the particular example above with a 9x9 matrix it may not be too bad, but in other parts of my program I need to take many LLT's of large matrices (e.g. 200x200). Alternatively, is it possible to compute the LLT in place (i.e., to overwrite the matrix being decomposed)?


For in-place decompositions, we don't yet have that API although that's planned (we even discussed exactly that a few days ago).

In fact, I am in the process of switching my code from MKL to eigen. In MKL you can pre-allocate all the temporaries, and it will also compute the decomposition destructively. I would like to be able to get as close as possible to that performance.


Yes, I understand. Not sure where we stand, performance wise, vs MKL, at the moment. If this is performance critical for you, I don't want to induce you into migrating too early. Do benchmarking before.

That said, I would also like to congratulate and thank you for this wonderful effort. Had this library existed a couple of years ago, I would have saved many many hours of programming.....

Thanks! we appreciate kind words.


Join us on Eigen's IRC channel: #eigen on irc.freenode.net
Have a serious interest in Eigen? Then join the mailing list!
User avatar
ggael
Moderator
Posts
3447
Karma
19
OS
bjacob wrote:
TaMo wrote:In a related question: is there a way to pre-allocate the temporaries of cholesky, so that they are not re-allocated every time? For the particular example above with a 9x9 matrix it may not be too bad, but in other parts of my program I need to take many LLT's of large matrices (e.g. 200x200). Alternatively, is it possible to compute the LLT in place (i.e., to overwrite the matrix being decomposed)?


For in-place decompositions, we don't yet have that API although that's planned (we even discussed exactly that a few days ago).



Well, at least you can declare a LLT object once, and use it several time:

LLT<MatrixXd> llt;

for() {
llt.compute(mymat);
llt.solveInPlace(x);
}


In fact, I am in the process of switching my code from MKL to eigen. In MKL you can pre-allocate all the temporaries, and it will also compute the decomposition destructively. I would like to be able to get as close as possible to that performance.


Yes, I understand. Not sure where we stand, performance wise, vs MKL, at the moment. If this is performance critical for you, I don't want to induce you into migrating too early. Do benchmarking before.



In the devel branch we have a blocked LLT decomposition and fast triangular solvers, so for large matrices the overall performance of Eigen's LLT should be very close to the one of MKL. The main difference is that in Eigen we don't generate multithreaded code yet... For small matrices, Eigen outperforms MKL.

I'm looking at this issue right now...
User avatar
ggael
Moderator
Posts
3447
Karma
19
OS
ok, the problem is that on a 32 bits linux system, objects lying on the stack are only aligned on a 4 bytes boundary while I assumed doubles were are least aligned on a 8 bytes boundary. Performance wise it is always a good idea to align doubles on 8 bytes, so the solutions are:

- on the user side: use this gcc's flag: -malign-double

- on the Eigen side: enforce the alignment of objects of doubles (or complex of double) on 8 bytes.
User avatar
bjacob
Registered Member
Posts
658
Karma
3
Performance wise it is always a good idea to align doubles on 8 bytes, so the solutions are:

- on the user side: use this gcc's flag: -malign-double

- on the Eigen side: enforce the alignment of objects of doubles (or complex of double) on 8 bytes.


I'm OK to enforce 8byte alignment of doubles, since this doesn't lead to additional memory consumption. We just need to document that well, e.g. for Map...

Then we are going to run into one of the issues with the UnalignedArrayAssert, namely: when passing-by-value. But precisely that's OK because that only matters when vectorization is on, and then that's already covered by the UnalignedArrayAssert.


Join us on Eigen's IRC channel: #eigen on irc.freenode.net
Have a serious interest in Eigen? Then join the mailing list!
User avatar
bjacob
Registered Member
Posts
658
Karma
3
ok, the changes are done.

TaMo: seems to be fixed now.

Gael: i align based only on total size, so e.g. Vector2f gets aligned. Do you prefer to only align to 8byte if sizeof(T) itself is multiple of 8? No big important reason for my choice, it was just easier to document.


Join us on Eigen's IRC channel: #eigen on irc.freenode.net
Have a serious interest in Eigen? Then join the mailing list!
yotam
Registered Member
Posts
4
Karma
0
Could this be the same bug affecting my inability to create an std::vector or std::valarray of SparseMatrix's? When I try, I get a bus error. I can create a raw array of SparseMatrix's just fine and a vector/valarray of DynamicSparseMatrix's just fine, too. Here is code to reproduce the problem (eigen 2.0.6):

Code: Select all
// g++ -Iext/eigen2 foo.c

#include "Eigen/Eigen"

typedef double real_t;
typedef Eigen::DynamicSparseMatrix< real_t, Eigen::RowMajor > assembly_matrix_t;
typedef Eigen::SparseMatrix< real_t, Eigen::RowMajor > solve_matrix_t;


and then one of

Code: Select all
#include <vector>

int main( int argc, char* argv[] )
{
    std::vector< solve_matrix_t > foo;
    foo.resize( 100 );
   
    return 0;
}


or

Code: Select all
#include <valarray>

int main( int argc, char* argv[] )
{
    std::valarray< solve_matrix_t > foo;
    foo.resize( 100 );
   
    return 0;
}


or

Code: Select all
int main( int argc, char* argv[] )
{
    solve_matrix_t* foo = new solve_matrix_t[ 100 ];
    delete [] foo;
   
    return 0;
}


The code with the new[] works fine, while the others crash. If I change solve_matrix_t to assembly_matrix_t it doesn't crash.
This is on Mac OS X, in case the problem is platform specific.

Thanks for making such a great library, by the way! I wouldn't have found this if I wasn't using Eigen so heavily. :-)

Yotam
User avatar
bjacob
Registered Member
Posts
658
Karma
3
No, this doesn't seem to be the same error. The error discussed in this thread was a very special issue about a specific function assuming double's to be 8 byte aligned.

Can you please open an issue on the tracker, and post a full back-trace of your crash (compile with -g3) ?


Join us on Eigen's IRC channel: #eigen on irc.freenode.net
Have a serious interest in Eigen? Then join the mailing list!


Bookmarks



Who is online

Registered users: Bing [Bot], daret, Google [Bot], sandyvee, Sogou [Bot]