Moderator
Can you be more specific? In which case?
Ok, let me be clear: currently the SelfAdjoint and UpperTriangular/LowerTriangular flags are only used to tell Eigen it can optimize some computations, and managing the coefficients is left up to the user. For instance, if you set SelfAdjoint only, then the product will use this information to only traverse one half of the matrix, but that's it. If you set SelfAdjoint|UpperTriangular, then Eigen knows that the strict lower part is empty. If that's not the case, then you'll likely get an error. Also note that with a sparse matrix, doing mat(i,j) is inherently slow (it involves a binary search), might be broken if the coefficient does not exist, and should be avoided as much as possible in favor of higher-level algorithms built on top of efficient building blocks.
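To illustrate why per-coefficient access is slow, here is a plain C++ sketch of column-major compressed storage (illustrative only, not Eigen's actual code): looking up mat(i,j) must binary-search the row indices stored for column j.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Illustrative sketch of compressed column storage -- not Eigen's actual
// implementation. Each coefficient lookup is a binary search over the
// row indices of one column, hence O(log nnz(j)) per access.
struct CompressedColMatrix {
    std::vector<int> outer;      // outer[j] .. outer[j+1]: entry range of column j
    std::vector<int> rows;       // row index of each stored entry (sorted per column)
    std::vector<double> values;  // the stored coefficients

    // Returns 0.0 for a coefficient that is not stored.
    double coeff(int i, int j) const {
        std::vector<int>::const_iterator begin = rows.begin() + outer[j];
        std::vector<int>::const_iterator end   = rows.begin() + outer[j + 1];
        std::vector<int>::const_iterator it = std::lower_bound(begin, end, i);
        if (it == end || *it != i) return 0.0;  // coefficient absent
        return values[it - rows.begin()];
    }
};
```

Calling coeff() in a tight assembly loop multiplies this logarithmic cost by the number of insertions, which is why bulk strategies are preferred.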
I guess that matrix and source_matrix are both sparse, so the best is to simply do matrix = source_matrix; and let Eigen find the best algorithm to do the copy (e.g., if matrix is row-major and source_matrix is column-major, then the copy is not trivial). Again, I'm not sure I understand your issue.
Again, I think it is better to keep these flags only as hints for Eigen's high-level algorithms, and to let the user honor these flags the way he wants. So currently, the result won't be doubled.
Let's recap. First you have a sparse matrix A which is symmetric. Then, in order to optimize the product A * v, you built a matrix B, upper triangular, such that B + B' = A. So in that case you had to take the upper part of A and scale the diagonal by 0.5. With the current state of Eigen, what you can do is: during the computation of A, discard all computations of the strict lower part (if (j < i) continue;), and then compute the product from the upper part marked with the SelfAdjoint|UpperTriangular flags, i.e., A.marked<SelfAdjoint|UpperTriangular>() * v. From such a matrix A where only the upper part is stored, if you want to assemble the full matrix, you cannot do A = A + A'; because this will double the diagonal. Does that answer your concerns?
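The B + B' = A trick can be sketched in plain C++ (illustrative triplet-based code, not Eigen's implementation): store only the upper part with the diagonal scaled by 0.5, and each stored entry then contributes once for B and once for B' in the product.

```cpp
#include <cassert>
#include <vector>

struct Triplet { int i, j; double v; };

// Compute A*x where A is symmetric, given only B = upper part of A with
// the diagonal scaled by 0.5, so that B + B' = A. Each stored entry is
// applied twice: once for B and once for B'. A diagonal entry thus gets
// counted twice, which is exactly why it carries the 0.5 factor.
std::vector<double> sym_matvec(const std::vector<Triplet>& B,
                               const std::vector<double>& x) {
    std::vector<double> y(x.size(), 0.0);
    for (const Triplet& t : B) {   // t.i <= t.j by construction
        y[t.i] += t.v * x[t.j];    // contribution of B
        y[t.j] += t.v * x[t.i];    // contribution of B'
    }
    return y;
}
```

For example, for the symmetric matrix [[2,1],[1,4]], B holds (0,0,1.0), (0,1,1.0), (1,1,2.0).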
Moderator
ok, thanks for the details.
Which uBlas matrix type do you use? I thought that for random access, uBlas also provided a matrix type based on a map? A common strategy is also to accumulate all elements per column and, at the end, do a big sort per column and sum all entries which correspond to the same coefficient. Do you use such a strategy? It looks quite expensive.
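The accumulate-then-sort strategy described above can be sketched in plain C++ (illustrative code, not any library's actual implementation): collect all contributions unsorted, sort them column-major, then merge entries addressing the same coefficient.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Entry { int i, j; double v; };

// Accumulate-then-sort assembly: gather all contributions in arbitrary
// order, sort column-major (outer index j, then inner index i), and sum
// duplicates in a single linear merge pass.
std::vector<Entry> assemble(std::vector<Entry> entries) {
    std::sort(entries.begin(), entries.end(),
              [](const Entry& a, const Entry& b) {
                  return a.j != b.j ? a.j < b.j : a.i < b.i;
              });
    std::vector<Entry> merged;
    for (const Entry& e : entries) {
        if (!merged.empty() && merged.back().i == e.i && merged.back().j == e.j)
            merged.back().v += e.v;  // same coefficient: accumulate
        else
            merged.push_back(e);
    }
    return merged;
}
```

The cost is dominated by the O(n log n) sort over all contributions, which is the "quite expensive" part mentioned above.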
1 - you need to have it installed (they are just a couple of headers, and I know Debian has a package for them)
2 - include the Google hash_map headers before Eigen's Sparse header
3 - declare the setter with the corresponding traits, e.g.:
RandomSetter<SparseMatrixType, GoogleDenseHashMapTraits> setter(...);
RandomSetter<SparseMatrixType, GoogleSparseHashMapTraits> setter(...);
Registered Member
Yes! One more bug report: SparseMatrix += SparseMatrixUpperTriangular does not work properly. I counted the nonzeros and there are fewer than there are supposed to be. The segfault problem has been solved: it was a wrong element access on my side. And some funny benchmark: I replaced the ublas::vector by its Eigen2 counterpart. During the assembly process I often use the relation myvector(i) += value; It seems that uBlas does this considerably faster (this replacement added a third to the complete assembly runtime). I even tried coeffRef - no change. Any ideas what is going wrong here?
Moderator
Sorry, but I don't understand what the problem could be. I mean, the fact that the matrix is upper triangular does not change the way the sum is performed. If you actually meant: generic_sparsemat += symmetric_sparse_mat_where_only_upper_part_is_stored; then, yes, that won't work out of the box, and I don't think it's a good idea to make it work transparently, because it would hide expensive memory allocation and computation. On the other hand, it would be a good idea to add static assertions and a convenient method to transform such a matrix to a matrix with full storage.
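Such a conversion helper could look like this plain C++ sketch (hypothetical code over a simple triplet list, not Eigen's actual storage): every stored off-diagonal upper coefficient is mirrored into the lower part, while the diagonal is kept once.

```cpp
#include <cassert>
#include <vector>

struct Coef { int i, j; double v; };

// Expand a symmetric matrix whose upper part only is stored into full
// storage: mirror every off-diagonal entry, keep the diagonal once
// (so nothing on the diagonal gets doubled).
std::vector<Coef> expandSymmetric(const std::vector<Coef>& upper) {
    std::vector<Coef> full;
    for (const Coef& c : upper) {  // c.i <= c.j by assumption
        full.push_back(c);
        if (c.i != c.j)
            full.push_back({c.j, c.i, c.v});  // mirror into the lower part
    }
    return full;
}
```

This doubles the memory for the off-diagonal entries, which is exactly the hidden cost that makes a transparent += undesirable.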
This does not make sense, since myvector(i) is equivalent to myvector[i] where myvector is a plain C vector (double* myvector;). Are you sure you compile with optimizations (-O2) and with NDEBUG defined (-DNDEBUG)?
Moderator
I also wanted to add a benchmark about the filling of a sparse matrix.
Here is what I did:
1 - precompute 24M random coefficients (coordinates + values)
2 - insert these 24M values into a 1M x 1M sparse matrix in random order.

I benched:
- Eigen SparseMatrix + RandomSetter backed by std::map
- Eigen SparseMatrix + RandomSetter backed by google::dense_hash_map
- Eigen SparseMatrix + RandomSetter backed by google::sparse_hash_map
- ublas mapped_matrix (+ copy to a compressed_matrix)
- ublas compressed_matrix
- ublas coordinate_matrix

The results are (in sec):
Eigen std::map        66.1888
Eigen google::dense   17.3848
Eigen google::sparse  43.7848
ublas mapped          63.9181
ublas compressed      hours...
ublas coordinate      days...

About the memory consumption, the worst are ublas mapped and Eigen+std::map, then comes Eigen+google::dense, and far better is Eigen+google::sparse. So honestly, I really don't know what to do to provide something faster than ublas, since Eigen is already much better according to that benchmark... The benchmark is in eigen/bench/sparse_setter.h
Registered Member
I am not in the office, so I will try to recover the details from memory:
For random filling one usually takes generalized_vector_of_vector<double, row_major, vector<coordinate_vector<double> > > (or similar - I hope I remember correctly) and copies the final matrix into a compressed_matrix. This ansatz is much faster than those you have tested. On my example, the filling time is about the same as with RandomSetter (6.5 vs. 6.2 sec). The memory is almost the same as storing a compressed_matrix, whereas I observed significant memory growth using Eigen2. I have not yet tried the Google hash (I had some conflicts with gcc 4.3 - but I found a solution this morning) and hope that it will solve most of my issues. I will look into the benchmark you provided next week in order to apply the uBlas types I have used here.
Yes, this is exactly what I meant. Currently I help myself using RandomSetter. I am a little sorry for waking sleeping dogs. Eigen2 is great and already provides features that make me want to use it now, even though Eigen2 is not ready yet. It is my library of choice now - I am just accustomed to the logic behind boost and run into problems whenever both libs behave differently.
Moderator
Thanks to you for beta testing our sparse module and for your helpful feedback. It's good for my motivation, and it has already helped to improve the module a lot.
New results using ublas::vector_of_vector, 2.4M nonzeros in a 100k^2 matrix:

                      time in sec    % of memory
Eigen std::map        4.71893        35.5
Eigen google dense    1.08547        27.2
Eigen google sparse   3.36835        14.0
ublas mapped          4.72259        35.4
ublas vecofvec        3.76116        19.1

so clearly Google's hash_map really rocks! Note that in all cases I perform a conversion to a compressed format. This conversion is included in the time, but the memory only includes the wrapper object AND the precomputed data stored as a std::vector of struct {int i; int j; double v;}, which represents 5.8%.
Last edited by ggael on Sat Jan 17, 2009 4:55 pm, edited 1 time in total.
Moderator
One more result: if you manage to fill your matrix one row (or column) after the other, i.e., using fillrand(), then it is 10 times faster than the fastest previous method, with zero memory overhead:

Eigen fillrand: 0.132481 sec, 11.4% of memory

That makes me think that perhaps there exists an in-between solution, and I have a few ideas...
Registered Member
You mean, when assembling a column-wise m x n matrix, creating a vector of n sparse vectors of dimension m, which are then assembled using fillrand?
Registered Member
Now that we have talked about matrix assembly, some ideas came to my mind. From my understanding, the sparse data is stored in vectors. That means a random modification (say, adding some element) needs a shift of all subsequent data in memory, making it rather expensive. A dynamically linked list would solve this problem, but would introduce new memory overhead and decrease the performance of linalg operations. This is what I know from uBlas: generalized_vector_of_vector (the equivalent of Eigen's RandomSetter, with the only difference that generalized_vector_of_vector is a real matrix class and RandomSetter is not) is used whenever the matrix is going to be modified, and may even be used in linalg operations, where it is not that much slower compared with compressed_matrix.
During the assembly process in FEM apps I usually have the following operations:
(1) matrix(i,j) += scalar
(2) matrix += matrix_B (or blockwise)
(3) matrix = mat_A'*mat_B*mat_A
(4) eliminate elements which are close to zero (optimizes linalg operations and memory)
(5) (1..4) with mixed symmetric/general matrices
I wonder if an efficient RandomSetter could be used to efficiently perform at least (1..3)?
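As a plain C++ sketch (a hypothetical class, not Eigen or uBlas API), a vector of sorted maps already supports operations (1) and (2) with logarithmic-cost random insertion; (3) could then be layered on top of the same structure.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical dynamic sparse matrix: one sorted map per column gives
// O(log nnz(j)) random insertion, enough for FEM-style assembly.
struct DynSparse {
    std::vector<std::map<int, double> > cols;
    DynSparse(int rows, int ncols) : cols(ncols) { (void)rows; }

    // (1) matrix(i,j) += scalar
    void add(int i, int j, double v) { cols[j][i] += v; }

    // (2) matrix += other (same dimensions assumed)
    DynSparse& operator+=(const DynSparse& other) {
        for (std::size_t j = 0; j < other.cols.size(); ++j)
            for (const auto& kv : other.cols[j])
                cols[j][kv.first] += kv.second;
        return *this;
    }

    double coeff(int i, int j) const {
        std::map<int, double>::const_iterator it = cols[j].find(i);
        return it == cols[j].end() ? 0.0 : it->second;
    }
};
```

Because each column stays sorted, converting the result to compressed storage needs no extra sort, unlike a hash-based setter.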
Registered Member
Where I want to head is this:
When RandomSetter is applied to a matrix, it copies the contents of this matrix to itself and copies them back when the destructor is called. If you are really considering adding some specialized functions, I ask myself why RandomSetter is not a matrix type itself, which would only need some fast copy functions. When applying points (1..3) I will need:
o efficient random access, probably configurable here as well (such as the built-in fillrand() per vector, or the Google SparseHashmap instead for each vector)
o an iterator concept allowing random insertion/deletion/modification of elements
o basic operations like +=, *=, = scalar*mat
o other operations, particularly bindings, should not be required and should be deferred to SparseMatrix
I believe this is more flexible than the current concept. I am happy that my input is welcome. I was worried that my many misunderstandings (and incomplete reading of the documentation) would lead to some annoyance. Thanks for the support so far - it's like having a commercial license
Registered Member
Updates:

GoogleHash: The provided rpm seems to be optimized for Debian - I cannot use it on openSUSE with gcc 4.3. The provided tar (version 1.3) of GoogleHash compiles fine with the built-in example files, but the symbol 'GoogleSparseHashMapTraits' is not defined. Could you please check your version? I would love to test GoogleHash as well...

UpperTriangular: On January 15th I tried the new types and was impressed:
Without the A*v multiplication the assembly took 0.11 sec
With vec += A*v where A was general it took 0.16 sec
With vec += A*v where A was UpperTri it took only 0.13 sec
On Friday I sent you a call for help because the performance dropped to 0.19 sec, but I had only changed the type of vec from uBlas to Eigen2::VectorXd - hence my first idea was that it was the assembly process itself. This morning I tried the whole assembly without vec += A*v and obtained 0.11 sec again. That means: the product became slower. I guess either the product got slower or the contents of A changed. Did you change anything since 01/15 that may be the source of this?
Registered Member
I now know what happened: (1) First I assemble matrix A. (2) Then I assemble vector v1. (3) Then I add v1 += A*v2. Before, I used uBlas to assemble A and copied it to Eigen2 storage. Now I use Eigen2's RandomSetter to assemble A from the beginning, i.e. the memory usage grew from 24% to 45% for the whole application. I guess the computer was caching and cleaning up memory while I was already performing and testing steps (2..3). Repeating (2..3) gave the old timings again. I had never experienced such a behaviour before. I beg your pardon and hope this observation may be interesting for your own later benchmarks...
Moderator
Hi,
For Google's sparse hash stuff, you actually need to include the headers yourself before Eigen's Sparse header. Also, I agree with you about the need for a more flexible true sparse matrix representation. Actually, this was my initial idea, and then I dropped it in favor of something much simpler: a basic RandomSetter. The reason why RandomSetter is not a matrix is that it relies on a hash_map implementation. Since the elements of a hash_map are not sorted (in contrast to std::map), it is not possible to perform any matrix operation on it. This is also why it is so fast. You probably think that because the elements are not sorted, the copy from the RandomSetter to the sparse matrix should be very expensive (because you would have to sort all the coeffs)... The trick is to store the RandomSetter in the opposite storage order to the sparse matrix, so that during the copy we have to transpose the data, which has the effect of naturally sorting the data without any explicit sort! Anyway, I'll add a DynamicSparseMatrix class that will provide all the needed flexibility; it will basically be equivalent to a std::vector of SparseVector. My initial tests show that it is almost as efficient as google::dense_hash_map (assuming the number of nonzero entries per vector is not too high) with much lower memory overhead. Moreover, such a DynamicSparseMatrix is almost the same as a SparseMatrix, so all basic matrix operations should be as efficient.

gael.
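The "transpose instead of sort" trick can be demonstrated in plain C++ (illustrative sketch, not the actual RandomSetter code): bucket entries per ROW in unsorted hash maps, then copy row by row into a COLUMN-major target. Each column receives its row indices in increasing order, so every column ends up sorted with no explicit sort.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

struct ColEntry { int row; double v; };

// Copy from a row-bucketed, unsorted setter into column-major storage.
// Rows are visited in increasing order; within a row the column order is
// arbitrary (hash order), but that only decides WHICH column we append to.
// Hence each column's row indices arrive already sorted.
std::vector<std::vector<ColEntry> >
transposeCopy(const std::vector<std::unordered_map<int, double> >& rowSetter,
              int ncols) {
    std::vector<std::vector<ColEntry> > colMajor(ncols);
    for (int i = 0; i < (int)rowSetter.size(); ++i)        // rows in order
        for (const auto& kv : rowSetter[i])                // cols in hash order
            colMajor[kv.first].push_back({i, kv.second});  // rows arrive sorted
    return colMajor;
}
```

The whole copy is linear in the number of stored entries, which is why avoiding the sort matters.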
Registered Member
Hi,
I just found out that GoogleSparseHashMapTraits is part of Eigen's namespace! Though I do not know about the 'Bits' that you used in the benchmarks. It compiles even without it anyway.
Great! This is wonderful news. Sebastian