Registered Member
|
Hello,
I am working on an application where I need to store a (relatively large) number of (really) sparse matrices. At some point I noticed that my application was consuming too much memory, and I thought that it was because I was not reserving the appropriate memory for the sparse matrices. However, given that the matrices are really, really sparse, I tried to allocate a number of empty matrices just to have a sense of the overhead. Follows bellow my test code:
Surprisingly my memory profiler (I am using Intel Inspector XE 2015) reports that the above test consumes more than 1 GB. I wonder if I am doing something super wrong or perhaps Eigen is already (pre)allocating some memory, even if I say that all rows have no non-zero entries. Any help is immensely appreciated! |
Moderator
|
Each matrix has to allocate the "outer-index-vector" which is of size numberOfVariables*sizeof(int). Since you have numberOfVariables of such matrices, this represents:
numberOfVariables^2 * 4 = 400 000 000 Bytes |
Registered Member
|
Thanks for the reponse ggael!
So, according to you, I should rethink my design, given that I'll be allocating a considerable amout of memory for nothing. Is that right? What I am actually trying to implementing is a sparse Tensor class by stacking several Sparse matrices. This is convenient in my case because all operations (including some linear system solutions) I have to perform are in the "direction" of this several matrix-slices of my tensor. I know Eigen has an implementation of Tensor. How mature is it? Last time I read about it there were many features still pending implementation. Should I consider to use it? Or perharps there is a way to workaround this "outer-index-vector" allocation? Thanks and regards! |
Moderator
|
The current Tensor implementation is for dense tensors only.
Cannot you work slice by slice so that only once has to be stored? What about the sparsity pattern of the different slice? Are they all the same or all completely different? What about storage[i].col(j).nonZeros()? Is it approximately constant for all i? Are there many empty slices? |
Registered Member
|
Hi ggael,
Follow below my answers:
That would be a possibility, however with many drawbacks in terms of design.
Each row that has entries different than zero, which has number of neighbors in the discretization stencil, has as many non-zero entries as there are neighbors in the discretization. In other words, you can think of an hepta-diagonal tensor in the 3D case.
Yes, it is. Only the entries respective to the cells in the boundaries will have different nonZeros size. But mind the answer above. The only non-zero entries are with respect to the cell and its neighbors. Everything else is 0.
Although very sparse, no slice wil be empty. Thanks for looking into it! |
Moderator
|
Alright, so if I follow you correctly, you would have about 7 nonzeros per row of each slice (neglecting boundaries), and so the total number of nonzeros would be: 7 * numberOfVariables^2, so about 5.2GB to store only the values of the nonzeros anyways... So much more than the storage of the "outer-indices".
|
Registered Member
|
I can see I was not very clear in my previous reply. Let me try to clarify things. For the sake of simplicity, let's consider a 1D case first. The first slice of my tensor should look like (I wish I had an equation editor here... I will try my best):
[ x x 0 ... 0 x x 0 ... 0 0 0 0 ... 0 (....) 0 0 0 ... 0] So only the two first entries of the first two rows are non-zero. Now if we look at the second slice: [ x x 0 0 0 ... 0 x x x 0 0 ... 0 0 x x x 0 ... 0 0 0 0 0 0 ... 0 (....) 0 0 0 0 0 ... 0] So, now I have only three non-zero rows and each row has three non-zero entries (disconsider the first row, because it is a boundary). The pattern goes on and the final tensor could be interprerted as a tri-diagonal tensor, but the diagonals are defined in the direction of the slices (in the direction of the screen if you will). We can make an estimate of the required memory using the following formula: Mem = 8*(2d+1)^2*Nslices where d is the dimension of my problem (1D, 2D, 3D) and Nslices the number of matrices in the third dimension of the tensor. So, for the case I posted first, I would expect an memory allocation of Mem = 8*(2*3+1)^2*10000 = 3.92 MB and not 400 MB as reported before. Of course my estimate is an upper-limit of only the memory required by the individual entries, given that I am disconsidering boundaries and not taking taking into account the auxiliary data structures of Eigen::SparseMatrix - which here seems to be occupying most of the allocated memory if compared to matrix entries memory. Hope it was clearer now. One thing I could do is to store only the triplets for each slice. I believe this would have a memory requirement closer to what I was expecting. However I would lose in execution time given that I would have to assemble an Eigen::SparseMatrix every time I had to operate with a given slice. |
Moderator
|
Alright, then you would need: http://eigen.tuxfamily.org/bz/show_bug.cgi?id=1179
|
Registered Member
|
Wow, that seems to be exactly what I need!!!
Given that it is a very recent feature request (just a couple of days ago) would you have an estimate of when this would become available? I am eager to have access to it! |
Moderator
|
You should ask Dmitry who proposed to implement this feature. Perhaps, he could already share his current implementation...
BTW, you said that you have to perform some linear solving per slice, but since numerous rows and columns are completely empty, your problems are highly singular.... |
Registered Member
|
I've registered to Eigen Bugzilla and subscribed to the feature request. I added a comment making myself available to perform tests with his implementation!
|
Registered Member
|
Oh, and about the singularity of the system. Indeed, it would be highly singular if I didn't operate with it before actually solving it with a given RHS. But I actually do some operations with a given slice before sending it to the linear solver. But thanks for the warning anyways!
|
Registered Member
|
Update:
I've been testing the implementation ggael metioned above (http://eigen.tuxfamily.org/bz/show_bug.cgi?id=1179). If I run the test I posted before using Eigen::SparseMatrix the total memory allocation is 1113 MB according to Intel Inspector. However, if I use the HyperSparse matrix implementation suggested by Dmitry, the memory allocation goes down to 400 MB. It is a considerable memory allocation requirement, but much better than before. The allocation process seems to be faster as well. Note: in the algorithm above depicted, the line
must be changed to
because the currrent implemenation does not support such type of reserve method. I'm not even sure if it would make sense to support it though... |
Registered users: Bing [Bot], Evergrowing, Google [Bot], rockscient