Registered Member
|
I got the following error when running SparseMatrix<double> mat2 = mat.transpose()*mat.
Can anyone help me figure out what goes wrong? The error message is:
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Aborted
In my program, mat is a 129m*31m sparse matrix with 1.5b non-zero entries. My machine is a 64-bit Linux machine with 240G of free memory. When the program crashed, only 66G of memory was in use. I can use Matlab to compute mat'*mat without any problem, resulting in a 31m*31m sparse matrix. Thanks, David |
Moderator
|
Could you show the trace? Thanks.
Also, what's the number of non-zeros in the result computed by Matlab? |
Registered Member
|
1, There are 2.8409e+09 non-zero entries in mat'*mat;
2, When the error happened, I still had >100G free memory;
3, The code is compiled as 64 bit code;
4, The trace is here:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Program received signal SIGABRT, Aborted.
0x00000fffb7a080e0 in .raise () from /lib64/power7/libc.so.6
(gdb) where
#0 0x00000fffb7a080e0 in .raise () from /lib64/power7/libc.so.6
#1 0x00000fffb7a09da0 in .abort () from /lib64/power7/libc.so.6
#2 0x00000fffb7db9658 in ._ZN9__gnu_cxx27__verbose_terminate_handlerEv () from /usr/lib64/libstdc++.so.6
#3 0x00000fffb7db6944 in ?? () from /usr/lib64/libstdc++.so.6
#4 0x00000fffb7db6988 in ._ZSt9terminatev () from /usr/lib64/libstdc++.so.6
#5 0x00000fffb7db6b04 in .__cxa_throw () from /usr/lib64/libstdc++.so.6
#6 0x00000fffb7db7410 in ._Znwm () from /usr/lib64/libstdc++.so.6
#7 0x00000fffb7db7540 in ._Znam () from /usr/lib64/libstdc++.so.6
#8 0x0000000010051c4c in Eigen::internal::CompressedStorage<double, int>::reallocate (this=0xfffffffe950, size=18446744072339895854) at /users4/dwang/eigen/Eigen/src/SparseCore/CompressedStorage.h:207
#9 0x0000000010051dd8 in Eigen::internal::CompressedStorage<double, int>::reserve (this=0xfffffffe950, size=18446744072339895854) at /users4/dwang/eigen/Eigen/src/SparseCore/CompressedStorage.h:77
#10 0x0000000010051ea0 in Eigen::SparseMatrix<double, 0, int>::reserve (this=0xfffffffe930, reserveSize=-1369655762) at /users4/dwang/eigen/Eigen/src/SparseCore/SparseMatrix.h:259
#11 0x0000000010034568 in Eigen::internal::conservative_sparse_sparse_product_impl<Eigen::SparseMatrix<double, 0, int>, Eigen::SparseMatrix<double, 0, long>, Eigen::SparseMatrix<double, 0, int> > (lhs=..., rhs=..., res=...) at /users4/dwang/eigen/Eigen/src/SparseCore/ConservativeSparseSparseProduct.h:41
#12 0x0000000010056364 in Eigen::internal::conservative_sparse_sparse_product_selector<Eigen::SparseMatrix<double, 0, int>, Eigen::SparseMatrix<double, 0, long>, Eigen::SparseMatrix<double, 0, long>, 0, 0, 0>::run (lhs=..., rhs=..., res=...) at /users4/dwang/eigen/Eigen/src/SparseCore/ConservativeSparseSparseProduct.h:140
#13 0x000000001005649c in Eigen::SparseSparseProduct<Eigen::SparseMatrix<double, 0, int>, Eigen::SparseMatrix<double, 0, long> const&>::evalTo<Eigen::SparseMatrix<double, 0, long> > (this=0xfffffffece0, result=...) at /users4/dwang/eigen/Eigen/src/SparseCore/SparseProduct.h:118
#14 0x0000000010056540 in Eigen::SparseMatrixBase<Eigen::SparseMatrix<double, 0, long> >::operator=<Eigen::SparseMatrix<double, 0, int>, Eigen::SparseMatrix<double, 0, long> const&> (this=0xfffffffedf8, product=...) at /users4/dwang/eigen/Eigen/src/SparseCore/SparseProduct.h:162
#15 0x00000000100565a4 in Eigen::SparseMatrix<double, 0, long>::operator=<Eigen::SparseMatrix<double, 0, int>, Eigen::SparseMatrix<double, 0, long> const&> (this=0xfffffffedf8, product=...) at /users4/dwang/eigen/Eigen/src/SparseCore/SparseMatrix.h:728
#16 0x000000001005666c in Eigen::SparseMatrix<double, 0, long>::SparseMatrix<Eigen::SparseSparseProduct<Eigen::SparseMatrix<double, 0, int>, Eigen::SparseMatrix<double, 0, long> const&> > (this=0xfffffffedf8, other=...) at /users4/dwang/eigen/Eigen/src/SparseCore/SparseMatrix.h:652
#17 0x0000000010034cf4 in main (argc=6, argv=0xffffffff2e8) at ./src/Eigs.cpp:58 |
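As a rough cross-check of the figures quoted in this post, the storage needed for the result can be estimated from the reported non-zero count. The sketch below assumes a compressed column layout with one value and one inner index per non-zero, plus one outer offset per column. `compressedBytes` is a hypothetical helper written for this note, not an Eigen function, and it ignores temporaries and allocator overhead.

```cpp
#include <cassert>
#include <cstdint>

// Approximate bytes used by a compressed sparse matrix:
// nnz values + nnz inner indices + (outerSize + 1) outer offsets.
// Hypothetical helper for estimation only; not part of Eigen.
std::uint64_t compressedBytes(std::uint64_t nnz, std::uint64_t outerSize,
                              std::uint64_t scalarBytes, std::uint64_t indexBytes) {
    assert(scalarBytes > 0 && indexBytes > 0);
    return nnz * (scalarBytes + indexBytes) + (outerSize + 1) * indexBytes;
}
```

Taking 2.8409e+09 non-zeros and approximating "31m" columns as 31000000, `compressedBytes(2840900000ULL, 31000000ULL, 8, 8)` gives 45702400008 bytes (about 46 GB) with 8-byte indices, and 34214800004 bytes with 4-byte indices. Either figure is well below the free memory reported, which suggests the bad_alloc is not a genuine out-of-memory condition.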
Registered Member
|
I think the problem is in the following frame, where reserveSize is a negative number.
This triggered the subsequent problems. I did not check why this variable became negative in the first place.
#10 0x0000000010051ea0 in Eigen::SparseMatrix<double, 0, int>::reserve (this=0xfffffffe930, reserveSize=-1369655762) at /users4/dwang/eigen/Eigen/src/SparseCore/SparseMatrix.h:259 |
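The two numbers in frames #8 to #10 are related by plain integer conversion, which the following sketch illustrates. It uses only the standard library. Both function names are hypothetical, and the thread does not say which count overflowed, so the input 2925311534 used in the note below is just one value that truncates to the observed reserveSize.

```cpp
#include <cassert>
#include <cstdint>

// A negative 32-bit value converted to an unsigned 64-bit size, as happens
// when a negative int reaches a parameter of type std::size_t on a 64-bit
// platform. The conversion is modulo 2^64.
std::uint64_t asUnsignedSize(std::int32_t reserveSize) {
    return static_cast<std::uint64_t>(reserveSize);
}

// Keep only the low 32 bits of a non-negative 64-bit count and read them as
// a signed 32-bit value, written out explicitly to avoid relying on
// implementation-defined narrowing.
std::int32_t truncateToInt32(std::int64_t count) {
    assert(count >= 0);
    const std::int64_t kTwoPow32 = 4294967296LL;
    const std::int64_t low = count % kTwoPow32;
    return static_cast<std::int32_t>(low >= kTwoPow32 / 2 ? low - kTwoPow32 : low);
}
```

`asUnsignedSize(-1369655762)` returns 18446744072339895854, which is exactly the size shown in frames #8 and #9, so the huge allocation request is the negative reserveSize reinterpreted as unsigned. `truncateToInt32(2925311534LL)` returns -1369655762, showing how a count just under 3 billion would end up as that negative int.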
Moderator
|
oh, for such a big matrix you should use std::ptrdiff_t for the index type:
|
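To see why the index type matters at this size, here is a minimal standard-library sketch that checks whether a non-zero count is representable in a given index type. `fitsInIndex` is a hypothetical helper, not an Eigen function. The fixed-width types stand in for int and std::ptrdiff_t, assuming a platform where those are 32 and 64 bits wide respectively.

```cpp
#include <cstdint>
#include <limits>

// Returns true if `count` can be stored in the signed integer type Index.
// Hypothetical helper for illustration.
template <typename Index>
bool fitsInIndex(std::uint64_t count) {
    return count <= static_cast<std::uint64_t>(std::numeric_limits<Index>::max());
}
```

With the numbers from this thread, `fitsInIndex<std::int32_t>(1500000000ULL)` is true, so the input matrix alone is fine with a 32-bit index, while `fitsInIndex<std::int32_t>(2840900000ULL)` is false: the 2.8409e+09 non-zeros of the product need a 64-bit index.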
Registered Member
|
I also tried std::ptrdiff_t, but got the same error.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
typedef Eigen::SparseMatrix<double, 0, std::ptrdiff_t> SpMat;
// initialize the matrix mat
SpMat mat(matrix.numrow, matrix.numcol);
mat.setFromTriplets(matrix.triplelet.begin(), matrix.triplelet.end());
// compute mat'*mat
SpMat mat2 = mat.transpose()*mat; // std::bad_alloc ERROR
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Running mat.transpose() alone is fine. |
Moderator
|
This patch should do the job: (I cannot reproduce as I don't have enough memory on my computer)
|
Moderator
|
https://bitbucket.org/eigen/eigen/commits/663e2458f7f7/
Changeset: 663e2458f7f7
User: ggael
Date: 2014-02-13 23:58:28
Summary: Fix propagation of index type |
Registered Member
|
Thank you for the update.
This time I got much further, but it failed again. Should I also make changes to SparseMatrix.h? See #2 of the gdb message. David
This is the error message:
assertion failed: m_outerIndex[outer]==int(m_data.size()) && "You must call startVec for each inner vector sequentially" in function void Eigen::SparseMatrix<_Scalar, _Flags, _Index>::startVec(typename Eigen::internal::traits<Eigen::SparseMatrix<_Scalar, _Options, _Index> >::Index) [with _Scalar = double, int _Options = 0, _Index = long int] at /users/dwang/eigen/Eigen/src/SparseCore/SparseMatrix.h:405
Program received signal SIGABRT, Aborted.
0x00000fffb7a080e0 in .raise () from /lib64/power7/libc.so.6
This is the gdb message:
(gdb) where
#0 0x00000fffb7a080e0 in .raise () from /lib64/power7/libc.so.6
#1 0x00000fffb7a09da0 in .abort () from /lib64/power7/libc.so.6
#2 0x0000000010040f84 in Eigen::internal::assert_fail ( condition=0x1006d018 "m_outerIndex[outer]==int(m_data.size()) && \"You must call startVec for each inner vector sequentially\"", function=0x1006f0e0 "void Eigen::SparseMatrix<_Scalar, _Flags, _Index>::startVec(typename Eigen::internal::traits<Eigen::SparseMatrix<_Scalar, _Options, _Index> >::Index) [with _Scalar = double, int _Options = 0, _Index ="..., file=0x1006d080 "/users/dwang/eigen/Eigen/src/SparseCore/SparseMatrix.h", line=405) at /users/dwang/eigen/Eigen/src/Core/util/Macros.h:193
#3 0x00000000100414dc in Eigen::SparseMatrix<double, 0, long>::startVec (this=0xfffffffe910, outer=3325149) at /users/dwang/eigen/Eigen/src/SparseCore/SparseMatrix.h:405
#4 0x00000000100310c8 in Eigen::internal::conservative_sparse_sparse_product_impl<Eigen::SparseMatrix<double, 0, long>, Eigen::SparseMatrix<double, 0, long>, Eigen::SparseMatrix<double, 0, long> > (lhs=..., rhs=..., res=...) at /users/dwang/eigen/Eigen/src/SparseCore/ConservativeSparseSparseProduct.h:46
#5 0x00000000100507a8 in Eigen::internal::conservative_sparse_sparse_product_selector<Eigen::SparseMatrix<double, 0, long>, Eigen::SparseMatrix<double, 0, long>, Eigen::SparseMatrix<double, 0, long>, 0, 0, 0>::run (lhs=..., rhs=..., res=...) at /users/dwang/eigen/Eigen/src/SparseCore/ConservativeSparseSparseProduct.h:140
#6 0x00000000100508e0 in Eigen::SparseSparseProduct<Eigen::SparseMatrix<double, 0, long> const&, Eigen::SparseMatrix<double, 0, long> const&>::evalTo<Eigen::SparseMatrix<double, 0, long> > (this=0xfffffffecc0, result=...) at /users/dwang/eigen/Eigen/src/SparseCore/SparseProduct.h:119
#7 0x0000000010050984 in Eigen::SparseMatrixBase<Eigen::SparseMatrix<double, 0, long> >::operator=<Eigen::SparseMatrix<double, 0, long> const&, Eigen::SparseMatrix<double, 0, long> const&> (this=0xfffffffedf8, product=...) at /users/dwang/eigen/Eigen/src/SparseCore/SparseProduct.h:163
#8 0x00000000100509e8 in Eigen::SparseMatrix<double, 0, long>::operator=<Eigen::SparseMatrix<double, 0, long> const&, Eigen::SparseMatrix<double, 0, long> const&> (this=0xfffffffedf8, product=...) at /users/dwang/eigen/Eigen/src/SparseCore/SparseMatrix.h:728
#9 0x0000000010050ab0 in Eigen::SparseMatrix<double, 0, long>::SparseMatrix<Eigen::SparseSparseProduct<Eigen::SparseMatrix<double, 0, long> const&, Eigen::SparseMatrix<double, 0, long> const&> > (this=0xfffffffedf8, other=...) at /users/dwang/eigen/Eigen/src/SparseCore/SparseMatrix.h:652
#10 0x0000000010031840 in main (argc=6, argv=0xffffffff2e8) at ./src/Eigs.cpp:63 |
Moderator
|
Indeed, the assertion should be replaced by:
m_outerIndex[outer]==Index(m_data.size()) |
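The effect of the int(...) cast in the original assertion can be illustrated without Eigen. The sketch below compares a 64-bit offset against a size that is first narrowed to 32 bits, and against the same size kept at 64 bits. Both function names are hypothetical, the narrowing is spelled out as a two's-complement wraparound (an assumption about the platform's behavior for the real cast), and the value 2200000000 in the note is chosen only because it exceeds the 32-bit signed range.

```cpp
#include <cstdint>

// Mimics `offset == int(size)`: keep the low 32 bits of the size and read
// them as a signed value before comparing.
bool matchesAfterNarrowing(std::int64_t outerOffset, std::uint64_t dataSize) {
    const std::int64_t kTwoPow32 = 4294967296LL;
    std::int64_t low = static_cast<std::int64_t>(dataSize & 0xFFFFFFFFULL);
    if (low >= kTwoPow32 / 2) low -= kTwoPow32;  // wrap into the int32 range
    return outerOffset == low;
}

// Mimics `offset == Index(size)` with a 64-bit Index: no information is lost.
bool matchesWithWideIndex(std::int64_t outerOffset, std::uint64_t dataSize) {
    return outerOffset == static_cast<std::int64_t>(dataSize);
}
```

For small sizes both checks agree. Once the stored data grows past 2147483647 entries, for example an offset and size of 2200000000, `matchesAfterNarrowing` returns false although the values are equal, while `matchesWithWideIndex` returns true. That matches an assertion that only starts failing partway through a very large product.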
Moderator
|
https://bitbucket.org/eigen/eigen/commits/70173f2a5116/
Changeset: 70173f2a5116
User: ggael
Date: 2014-02-15 09:35:23
Summary: Fix a few Index to int buggy conversions |
Registered Member
|
I believe we are very close to getting the correct answer.
This may be one of the last errors:
~~~~~~~~~~~~~Error message:~~~~~~~~~~~~~~~~~~~~
Program received signal SIGSEGV, Segmentation fault.
0x000000001004e060 in Eigen::SparseMatrix<double, 1, long>::operator=<Eigen::SparseMatrix<double, 0, long> > ( this=0xfffffffe958, other=...) at /users/dwang/eigen/Eigen/src/SparseCore/SparseMatrix.h:1101
1101 dest.m_data.index(pos) = j;
~~~~~~~~~~~~~GDB message:~~~~~~~~~~~~~~~~~~~~~
(gdb) where
#0 0x000000001004e060 in Eigen::SparseMatrix<double, 1, long>::operator=<Eigen::SparseMatrix<double, 0, long> > ( this=0xfffffffe958, other=...) at /users/dwang/eigen/Eigen/src/SparseCore/SparseMatrix.h:1101
#1 0x000000001004e268 in Eigen::SparseMatrix<double, 1, long>::SparseMatrix<Eigen::SparseMatrix<double, 0, long> > ( this=0xfffffffe958, other=...) at /users/dwang/eigen/Eigen/src/SparseCore/SparseMatrix.h:652
#2 0x0000000010050898 in Eigen::internal::conservative_sparse_sparse_product_selector<Eigen::SparseMatrix<double, 0, long>, Eigen::SparseMatrix<double, 0, long>, Eigen::SparseMatrix<double, 0, long>, 0, 0, 0>::run (lhs=..., rhs=..., res=...) at /users/dwang/eigen/Eigen/src/SparseCore/ConservativeSparseSparseProduct.h:142
#3 0x00000000100509bc in Eigen::SparseSparseProduct<Eigen::SparseMatrix<double, 0, long>, Eigen::SparseMatrix<double, 0, long> const&>::evalTo<Eigen::SparseMatrix<double, 0, long> > (this=0xfffffffecd0, result=...) at /users/dwang/eigen/Eigen/src/SparseCore/SparseProduct.h:119
#4 0x0000000010050a60 in Eigen::SparseMatrixBase<Eigen::SparseMatrix<double, 0, long> >::operator=<Eigen::SparseMatrix<double, 0, long>, Eigen::SparseMatrix<double, 0, long> const&> (this=0xfffffffedf0, product=...) at /users/dwang/eigen/Eigen/src/SparseCore/SparseProduct.h:163
#5 0x0000000010050ac4 in Eigen::SparseMatrix<double, 0, long>::operator=<Eigen::SparseMatrix<double, 0, long>, Eigen::SparseMatrix<double, 0, long> const&> (this=0xfffffffedf0, product=...) at /users/dwang/eigen/Eigen/src/SparseCore/SparseMatrix.h:728
#6 0x0000000010050b8c in Eigen::SparseMatrix<double, 0, long>::SparseMatrix<Eigen::SparseSparseProduct<Eigen::SparseMatrix<double, 0, long>, Eigen::SparseMatrix<double, 0, long> const&> > (this=0xfffffffedf0, other=...) at /users/dwang/eigen/Eigen/src/SparseCore/SparseMatrix.h:652
#7 0x000000001003185c in main (argc=6, argv=0xffffffff2e8) at ./src/Eigs.cpp:61
~~~~~~~~~~Code:~~~~~~~~~~~~~~~~~
SpMat mat3 = mat.transpose()*mat; // Line 61 of Eigs.cpp
SpMat is defined as:
typedef Eigen::SparseMatrix<double, 0, std::ptrdiff_t> SpMat;
mat is also a matrix of type SpMat. |
Moderator
|
Hm, that one also looks trickier. What are the values of 'pos', dest.m_data.size() and dest.m_data.allocatedSize()? Thanks.
|
Registered Member
|
(gdb) print pos
$1 = -2144616846
It will take longer to get the other two values. |
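One way to read a negative position like this is to ask which non-negative value it would correspond to if it had wrapped around a signed 32-bit range. The sketch below does that arithmetic with the standard library only. `unwrap32` and `exceedsInt32` are hypothetical helpers, and the 32-bit wraparound itself is an assumption about what happened, not something the thread confirms.

```cpp
#include <cstdint>
#include <limits>

// Smallest non-negative value that is congruent to `wrapped` modulo 2^32.
std::int64_t unwrap32(std::int32_t wrapped) {
    const std::int64_t kTwoPow32 = 4294967296LL;
    const std::int64_t wide = static_cast<std::int64_t>(wrapped);
    return wide < 0 ? wide + kTwoPow32 : wide;
}

// True if `value` is too large for a signed 32-bit integer.
bool exceedsInt32(std::int64_t value) {
    return value > static_cast<std::int64_t>(std::numeric_limits<std::int32_t>::max());
}
```

`unwrap32(-2144616846)` returns 2150350450, which is just above the 32-bit signed maximum of 2147483647 and below the 2.8409e+09 non-zeros of the result. That is at least consistent with a position into the non-zero array having been computed or stored in a 32-bit int somewhere along the way.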
Moderator
|
Another negative number. That's weird: I cannot reproduce it even when reaching the memory limit of a computer equipped with 96GB. Here is my testing program, run with 10000 as argument:
|