Registered Member
|
Hi, this is my first post here, and I would like to use this opportunity to thank everyone who contributed to Eigen.
Now, back to my question.
a. Problem: solving a system of linear equations with a sparse SPD coefficient matrix (H)
b. Dimensions: H is currently precisely 104x104
c. Sparsity:
    c1. H_non_zero_entries_cnt/H_all_entries_cnt = ca. 0.45
    c2. the sparsity pattern is fixed for the entire scenario
d. Usage scenario: solving the linear system many (hundreds of thousands -> millions -> ...) times; for each factorization there are ca. 180 "solves" [note: the points above describe the actual situation that made me create this thread, but in general H can be arbitrarily large and the ratio of factorizations to solves can differ, which does not change the fact that the solves are the bottleneck]
e. Approach: I used to treat H as a dense matrix and apply dense L(D)LT solvers, but I needed MUCH shorter execution times => upon noticing (c1) I switched to a sparse H and simplicial L(D)LT. To exploit (c2) I call analyzePattern() a single time and then only factorize() and solve(). The same sparse matrix object is used the entire time - only the non-zero entries (via InnerIterator) are updated before each factorization.
f. Question: after compiling and testing I was happy to see correct results but a bit frustrated with the performance:
    + sparse factorize() turned out to be comparable to dense compute()
    + sparse solve() turned out to be comparable to (actually a bit slower than) dense solve()
Was I being naive in hoping for a performance boost? Do you know a better way I could exploit the sparsity of H to get one? Thanks! |
Registered Member
|
Your matrix is actually very small.
I'm afraid you won't see any difference between dense and sparse solves. In general, you should call analyzePattern() only once as long as the structural pattern does not change, then call factorize() once as long as the values in the matrix do not change, and then call solve() for all right-hand sides. I assume that is what you are already doing. For your particular case, since your matrix is almost 50% full, it's better to call a dense solver. |
Registered Member
|
Dee33 - thanks for the speedy reply.
So 50% sparsity is not all that sparse, huh? This is bad news. The thing that originally got my hopes up was this article by Roy Featherstone - it is about LTL/LTDL factorizations with no fill-in. I know this is a specific case, but I imagined (mistakenly, as it seems) that "generic" sparse solvers would also provide some speed-up. Nevertheless, my H matrix is precisely the one described in this paper, so all of Featherstone's assumptions hold. It would be great if I could pick your brains on the following: is it worth spending time pursuing the aforementioned article in the hope of getting a substantial speed-up over Eigen's built-in L(D)LT solvers? I am asking because I imagine that Eigen's implementations (both dense and sparse) are already highly optimized, while the ideas presented in the article are abstracted away from implementation reality and may not provide any improvement once implemented within Eigen. To be honest, I had a (brief) look at Eigen's code for these solvers and my eyes soon began to bleed (although I know they are far less complex than other modules). Please let me know - is it worth further investigation, or are the chances way too slim and I should look for optimizations elsewhere? |
Registered Member
|
I see. With a careful implementation of the Cholesky-like decomposition on your matrix, it will produce no fill-in.
Sounds good, but I remain skeptical about the performance on such a small matrix. If your application will deal with larger matrices in the future, then you can consider spending time implementing their proposed approach. For the current case, it's better to keep the dense solver: it benefits from vectorization and is more cache-friendly, unlike the sparse implementation with its multiple levels of indirect addressing. |
Moderator
|
I also don't believe you'll get any performance gain with only 50% non-zeros, even with no fill-in. The fully dense case has the advantage of being vectorized (4x for floats with SSE), of enabling direct and coherent memory access, etc.
|
Registered Member
|
I get it - thanks a lot for your suggestions!
|