Registered Member
Hey all,
my problem is that MATLAB solves exactly the same linear system about three times faster than Eigen 3.2.4 with the SuiteSparse library. For the solution within my C++ code, I use one of the following two schemes.
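A minimal sketch of two such schemes, assuming the SuiteSparse-backed wrappers CholmodSupernodalLLT and UmfPackLU (the original snippets were not preserved):

Code:
#include <Eigen/Dense>
#include <Eigen/Sparse>
#include <Eigen/CholmodSupport>   // CholmodSupernodalLLT (CHOLMOD)
#include <Eigen/UmfPackSupport>   // UmfPackLU (UMFPACK)

// Scheme 1: supernodal Cholesky via CHOLMOD (requires an SPD matrix).
Eigen::VectorXd solveCholmod(const Eigen::SparseMatrix<double>& A,
                             const Eigen::VectorXd& b)
{
    Eigen::CholmodSupernodalLLT<Eigen::SparseMatrix<double> > solver(A);
    return solver.solve(b);
}

// Scheme 2: sparse LU via UMFPACK (general square matrix).
Eigen::VectorXd solveUmfpack(const Eigen::SparseMatrix<double>& A,
                             const Eigen::VectorXd& b)
{
    Eigen::UmfPackLU<Eigen::SparseMatrix<double> > solver(A);
    return solver.solve(b);
}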
Does anyone have an idea why MATLAB could solve it ~2-3 times faster? Here is the link to the linear system: LinSystem.tar.bz2
Registered Member
What compiler are you using?
Also, are you sure all the compiler optimizations are enabled?
Registered Member
compiler is gcc 4.8.2 with optimization flags: -O3 -msse2 -DNDEBUG
Moderator
In this scenario, Eigen is only a very thin layer between your code and SuiteSparse, so there is not much we can do on Eigen's side. However, the SuiteSparse solvers have to be linked to a BLAS/LAPACK library for efficient dense algebra, and MATLAB uses the highly optimized and parallelized MKL for that. If you don't have a fast BLAS installed, you can compile Eigen's own BLAS yourself:
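Something along these lines (a sketch; assumes an out-of-source CMake build, with the eigen_blas targets coming from Eigen's blas/ subdirectory):

Code:
# configure an out-of-source build of the Eigen sources
mkdir build && cd build
cmake /path/to/eigen

# build Eigen's BLAS; produces libeigen_blas and libeigen_blas_static
make eigen_blas_static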
and link your code with build/blas/libeigen_blas_static instead of the default BLAS. If your CPU supports AVX, then download the devel version of Eigen and compile with -DEIGEN_TEST_AVX=ON (x2 speedup). Another possibility is that your matrix is extremely sparse, in which case the built-in SimplicialLDLT<> solver will be significantly faster than Cholmod.
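For reference, switching to the built-in solver is a small change (a sketch, assuming A is symmetric positive definite):

Code:
#include <Eigen/Dense>
#include <Eigen/Sparse>
#include <Eigen/SparseCholesky>

// Built-in sparse LDLT; self-contained, no external BLAS needed.
Eigen::VectorXd solveSimplicial(const Eigen::SparseMatrix<double>& A,
                                const Eigen::VectorXd& b)
{
    Eigen::SimplicialLDLT<Eigen::SparseMatrix<double> > solver(A);
    return solver.solve(b);
}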
Registered Member
Thank you for your help. I had not realized that MATLAB does not use stock SuiteSparse solvers. That should be the point.
Anyway, is there by any chance a straightforward way to use, for instance, PETSc solvers with Eigen?
Moderator
I'm not sure I follow you, so to make it clear: AFAIK, MATLAB does use the SuiteSparse solvers, but SuiteSparse itself is not self-contained and has to be linked to a BLAS/LAPACK library. The speed of SuiteSparse will thus depend heavily on the performance of the chosen BLAS/LAPACK library.
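For illustration, the link line might look like the following; the exact library names are an assumption and vary across distributions:

Code:
# link against SuiteSparse plus a fast BLAS/LAPACK (here OpenBLAS)
g++ -O3 -DNDEBUG main.cpp -o solve \
    -lcholmod -lumfpack -lopenblas -llapack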
Registered Member
Dear ggael,
I have similar questions about the sparse solvers in Eigen. Their performance is not as good as it should be for an Ax=b problem. Could you please help me clarify this? My system: a Xeon E5-2609 v2 CPU with 16 GB of DDR3-1600. Currently I use MKL with Eigen, SuiteSparse, and SuperLU; I will try to compile BLAS with the AVX option as you suggested. In the first test, I run a loop over several different matrices A of size 10000x10000, each row with 25 non-zero elements. They may be symmetric but are not positive definite. Here is the time performance I got:
I don't know why the BiCGSTAB solvers are extremely slow and give me very bad accuracy, so I don't use them. In the second test, I use matrices A of size 40000x40000, each row with 25 non-zero elements.
I am not sure if the results I got are reasonable, but I expected better results from Eigen, as suggested by http://eigen.tuxfamily.org/dox/group__TopicSparseSystems.html . Or maybe I am missing something in the configuration? Could you please share the matrices you used (vector_graphics, poisson_SPD, sherman2, ...) for the benchmark routine? I think it would be great if you provided these matrices to Eigen users who want to benchmark/check the time performance.

Indeed, the UmfPackLU solver is the best one here, but as far as I know, UMFPACK works only when the matrix A is small. Could you suggest which solver I should use for a bigger and very sparse matrix A (size 500000x500000 with about 50-100 non-zero elements in each row), especially when it is not symmetric positive definite?

In my demo, sometimes I need to solve a single Ax=b problem with a very big and sparse A, and sometimes I need to solve many small dense linear problems inside an OMP parallel pragma. The question is: do I need to disable/enable multi-threading in Eigen, and how do I do it?
Moderator
Which Eigen version are you using? Please try 3.2.5, especially regarding SparseLU, for which supernodes were wrongly disabled in some 3.2.x versions.
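For reference, a typical SparseLU setup looks like this (a sketch; the COLAMD ordering is the usual choice):

Code:
#include <Eigen/Dense>
#include <Eigen/Sparse>
#include <Eigen/SparseLU>

// Built-in supernodal LU for general square sparse matrices.
Eigen::VectorXd solveSparseLU(const Eigen::SparseMatrix<double>& A,
                              const Eigen::VectorXd& b)
{
    Eigen::SparseLU<Eigen::SparseMatrix<double>,
                    Eigen::COLAMDOrdering<int> > solver;
    solver.analyzePattern(A);  // reusable for matrices with the same structure
    solver.factorize(A);
    return solver.solve(b);
}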
How are you generating the matrices? Random matrices are usually not representative of real-world problems: for sparse problems, the structure matters a lot. If BiCGSTAB is so slow, then maybe you did not use the right preconditioner. For non-diagonally-dominant problems, try IncompleteLUT<>. BTW, "50-100 non-zero elements in each row" is not "very sparse"; it is rather dense, and for such problems BiCGSTAB, LeastSquaresConjugateGradient, or ConjugateGradient are usually much faster than direct solvers. In the devel branch, these iterative solvers can even exploit OpenMP. I'll post an archive with our testbed matrices soon. Could you share yours so that we can include it in our tests? It is clearly much different from what we currently have, and a 10000x10000 one would be good enough, as what matters most for testing purposes is the structure. For OpenMP, see: http://eigen.tuxfamily.org/dox/TopicMultiThreading.html
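A minimal sketch of the suggested BiCGSTAB + IncompleteLUT<> combination (the diagnostics are illustrative):

Code:
#include <iostream>
#include <Eigen/Dense>
#include <Eigen/Sparse>
#include <Eigen/IterativeLinearSolvers>

// BiCGSTAB preconditioned with an incomplete LU factorization,
// for non-diagonally-dominant systems.
Eigen::VectorXd solveBiCGSTAB(const Eigen::SparseMatrix<double>& A,
                              const Eigen::VectorXd& b)
{
    Eigen::BiCGSTAB<Eigen::SparseMatrix<double>,
                    Eigen::IncompleteLUT<double> > solver;
    solver.compute(A);
    Eigen::VectorXd x = solver.solve(b);
    if (solver.info() != Eigen::Success)
        std::cerr << "no convergence after " << solver.iterations()
                  << " iterations (error " << solver.error() << ")\n";
    return x;
}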
Registered Member
Hi ggael,
Thanks for your support. I am using Eigen 3.2.5. In my problem, the matrix A is an approximation of the Laplace-Beltrami operator over big point sets. The number of non-zero elements in row i equals the number of neighbors within a sphere centered at point p_i. These elements are defined as A[i,j] = w_ij*f(p_i), where the w_ij are weights, and A[i,i] = -sum_of_all(A[i,j]) for j != i (thus the matrix A is not diagonally dominant, and it is not symmetric because of double precision). I tried BiCGSTAB with IncompleteLUT<double>, but it still takes a long time to complete, with bad convergence. It should work faster than the other solvers. Here is a piece of my code with the Eigen solvers.
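(That snippet was not preserved; what follows is a hypothetical sketch of the assembly described above, where neighbors, w, and f stand in for the application's own data.)

Code:
#include <vector>
#include <Eigen/Sparse>

// Hypothetical assembly of the operator described above:
// A(i,j) = w_ij * f(p_i) for each neighbor j of point p_i,
// A(i,i) = -sum_j A(i,j).
Eigen::SparseMatrix<double>
assembleOperator(int n,
                 const std::vector<std::vector<int> >& neighbors,
                 const std::vector<std::vector<double> >& w,
                 const std::vector<double>& f)
{
    std::vector<Eigen::Triplet<double> > coeffs;
    for (int i = 0; i < n; ++i) {
        double diag = 0.0;
        for (std::size_t k = 0; k < neighbors[i].size(); ++k) {
            int j = neighbors[i][k];
            double aij = w[i][k] * f[i];
            coeffs.push_back(Eigen::Triplet<double>(i, j, aij));
            diag -= aij;  // accumulate the negated row sum for the diagonal
        }
        coeffs.push_back(Eigen::Triplet<double>(i, i, diag));
    }
    Eigen::SparseMatrix<double> A(n, n);
    A.setFromTriplets(coeffs.begin(), coeffs.end());
    return A;
}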
Since the number of non-zero elements in each row is an input parameter, it can range from 25 (for small data sets with around 10k elements) to 500 (for big data sets with millions of elements). I said my matrices are sparse because fewer than 1% of their entries are non-zero. I guess I am not the only one who misunderstands the term "sparse matrix" in the Eigen documentation. Anyway, the dense solvers are still much slower than the sparse solvers in my case. Maybe the time performance I got is reasonable, and it is slow because of the big number of non-zero elements. There are some weird rules in our company, so I have to ask my boss before downloading or sharing anything from our servers. It could take time to get his answer; I will PM you later.
Moderator
Hm, I guess it is not symmetric because f(p_i) != f(p_j), but if w_ij == w_ji, then your A is just F.asDiagonal() times the symmetric matrix W with W(i,j) = w_ij, where F is a VectorXd with F[i] = f(p_i). So you can form A as A(i,j) = w_ij and solve the equivalent problem A * x = F.asDiagonal().inverse() * b instead. Now A is symmetric, and a ConjugateGradient should do the job pretty well. Again, to get the best of ConjugateGradient, use the devel branch and: ConjugateGradient<SparseMatrix<double>, Lower|Upper> cg(A); x = cg.solve(F.asDiagonal().inverse() * b); You might even compile with -fopenmp to get multithreading.
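Spelled out, the suggested reformulation looks roughly like this (a sketch; Lower|Upper and the multithreaded path require the devel branch, built with -fopenmp):

Code:
#include <Eigen/Dense>
#include <Eigen/Sparse>
#include <Eigen/IterativeLinearSolvers>

// Solve the symmetrized system A*x = F^{-1}*b, where A(i,j) = w_ij
// is symmetric and F[i] = f(p_i). Lower|Upper makes CG traverse the
// full matrix, which is also what enables the OpenMP-parallel path.
Eigen::VectorXd solveSymmetrized(const Eigen::SparseMatrix<double>& A,
                                 const Eigen::VectorXd& F,
                                 const Eigen::VectorXd& b)
{
    Eigen::ConjugateGradient<Eigen::SparseMatrix<double>,
                             Eigen::Lower | Eigen::Upper> cg(A);
    return cg.solve(F.asDiagonal().inverse() * b);
}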