Registered Member
|
I've recently started using Eigen, and I really like it.
That said, I've been having performance problems when building a moderately large, sparse matrix as the sum of various sparser ones. My profiler (Shark on OS X, a sampling profiler) shows that 95%+ of my time is being spent in operator= inside operator+=. I've been using the DynamicSparseMatrix class. Any ideas? It's *much* faster when I simply store a list of (i,j,value) tuples and then build a matrix out of those at the end, but that approach takes too much memory. Thanks, Yotam |
Moderator
|
Hi, I had a look, and even though a DynamicSparseMatrix is a bit slower than a compact SparseMatrix for that operation (because of the many memory allocations), it is still faster than other libraries (like GMM++), and I don't know how to make it faster.
I guess your use case is to accumulate several sparser matrices into a main_matrix with operator+=. If so, then maybe you can try using a compact SparseMatrix for the main_matrix (the two types are compatible) to reduce the number of dynamic memory allocations, or you can also try to fill the main_matrix directly (still declared as a DynamicSparseMatrix) with coeffRef(i,j) += value.
and finally you can also have a look at this page explaining how to fill a sparse matrix in the most efficient way: http://eigen.tuxfamily.org/dox-devel/Tu ... rseFilling. When you say it is much faster to build a list of (i,j,value) tuples and then build a matrix, do you mean that at the end you have to sort all the tuples while accumulating the duplicated values? How do you do this sort? Do you simply use the DynamicSparseMatrix mechanisms? And how much faster is it? |
Registered Member
|
Thanks, I refactored my code so that all my += operations happen in one place; now I will switch the += to a fill according to the tutorial. It would be nice to have a way to reserve the number of non-zeros *per row* rather than for the whole matrix. PETSc allows one to do this. In the case of addition, I can even calculate exactly what the resulting sparsity pattern will be (the union of the two patterns). I wish I could preallocate a matrix with the sparsity pattern of all the matrix additions I'm about to do; then there would be no memory allocation in the loop.
What I did was simply store pointers to every matrix I wanted to add, iterate over those to create a giant list of tuples, copy those to Python, and then build a sparse matrix there (scipy.sparse.coo_matrix). That was very fast (I don't remember exactly how much, ~5x?), but the memory use was unbearable for large problems. |
Moderator
|
Very interesting.
About a reserve per row (or per column): it's very easy to add a function for that, so I'll do it; the API will be mat.row(j).reserve(XXX);. Then you can create a pattern by doing .insert(i,j) = 0;, but I still don't really understand how you can benefit from it, because it seems to me that computing the pattern and doing the actual addition are basically the same amount of work, unless you have some additional knowledge that is specific to your case. About scipy, well, I guess I'll have to have a look at how scipy implements that operation! |
Moderator
|
OK, I ran some benchmarks again for the matrix assembly process, and if you cannot take advantage of any ordering (random filling), then from the fastest to the slowest method we have:
1 - Use Google's dense hash map:
SparseMatrix<Scalar> mat(SIZE,SIZE);
{
  RandomSetter<SparseMatrix<Scalar>, GoogleDenseHashMapTraits> setter(mat);
  for (...) setter(i,j) += value(i,j);
} // mat is ready for use
2 - Directly fill a single DynamicSparseMatrix using mat.coeffRef(i,j) += value(i,j);
3 - Build a tuple list, then sort it and accumulate the duplicates (this is sometimes faster than method 2, but it requires a lot of memory). This mode is currently not in Eigen, but it could be implemented as a specialization of RandomSetter.
4 - Build intermediate matrices and accumulate them (2 to 4 times slower than the other methods).
Actually, I was in the process of removing the RandomSetter mechanism to make DynamicSparseMatrix the unique way to assemble a sparse matrix. Indeed, the DynamicSparseMatrix class has a lot of advantages:
- even for purely random accesses it is still pretty fast
- it allows higher-level operations during the matrix assembly process (references to rows, blocks, etc.)
- no external dependency (unlike the Google dense hash map mode)
- the matrix is always valid (no duplicates, data sorted, etc.) and ready for use
- keeping only it makes the library simpler to learn
But I'm still a bit puzzled, because a RandomSetter based on tuples, or better on Google's dense hash map, can still be slightly faster (x1.5). Do you have an opinion? |