Moderator
|
Actually, the current sparse*sparse product automatically switches between a sparse accumulator (a dynamic linked list) and a dense vector to accumulate the result. So whatever the density, you should always get good performance.
Right, currently B + B.transpose() is not directly doable: you have to manually force the evaluation of the expression B.transpose() so that B and B' have the same storage order, e.g.: A = B + TypeOfB(B.transpose()); I'm still not sure I want to auto-magically evaluate the transpose expression, since this would hide a copy.
I think that with current Eigen you can simply write: DenseVector f = B * v + B.transpose() * v; and it should compile to exactly what you did, without any memory overhead. As I said, computing A = B + B'; requires a temporary to store B' plus the storage for A itself, so the peak memory overhead should be up to 2 * . |
Registered Member
|
Good to hear!
Ah, I think I expressed myself in a confusing way. I meant: I converted A into B in order to get efficient storage and matrix-vector products.
Good to hear - I will test this! Btw: I couldn't resist the temptation: I had to test the new release this morning! The speedup is - I repeat myself - impressive: I see a speedup of a factor of 3 for a sparse_matrix*dense_vector product compared with uBlas! What is of concern is the time (more important) and the memory usage during the assembly process (random fill). I still have to test the assembly efficiency using random access in detail, but at first sight there is not much speedup over uBlas. Regarding memory usage, I found that Eigen2 takes more during random access. Consider a uBlas matrix taking 6% of RAM: while simply copying it with Eigen2, an additional 16% is taken (with 4% used in the end). uBlas requires only negligibly more RAM during the filling process, ending with an additional 6%. One may argue that using a hash map for temporary storage always creates some overhead, but I believe it should be limited to about a factor of 2. |
Registered Member
|
There is another thing that is unclear to me:
What is the most efficient way to iterate over the nonzero elements of a sparse matrix? |
Registered Member
|
And a small bug report in between:
This code causes a segfault. Valgrind finds an uninitialized value. The code works perfectly when at least 1 element is allocated in the sparse matrix (commented lines). |
Registered Member
|
And the next benchmark:
Using a large triangular sparse matrix A and a dense vector v, I compute f = Av + A'v applied to uBlas objects like:
The time is of the same order of magnitude as (but a little slower than) the uBlas method, i.e. iterating over all nonzeros of A and performing the Av + A'v updates for each nonzero (as described in detail somewhere above). |
Moderator
|
thanks, bug fixed in rev 911464 |
Moderator
|
here is an example:
we could add a NonzeroIterator which would integrate the outer for loop, but well... |
Registered Member
|
Thank you! By replacing uBlas with Eigen2 in the operation Av = Bv + B'v, using this iterator, I am even faster than uBlas (though I wonder why - I hope the result won't be different?). |
Registered Member
|
Because Gael is very clever
Join us on Eigen's IRC channel: #eigen on irc.freenode.net
Have a serious interest in Eigen? Then join the mailing list! |
Registered Member
|
And his implementation speed leaves even the performance of Eigen2 behind! :star:
Last edited by Seb on Thu Jan 15, 2009 3:54 pm, edited 1 time in total.
|
Moderator
|
I have even better. Instead of decomposing A to B + B', what about:
DenseVector x; if you only store the upper part of the symmetric sparse matrix A, then do: DenseVector y = A.marked<UpperTriangular|SelfAdjoint>() * x; if you only store the lower part: DenseVector y = A.marked<LowerTriangular|SelfAdjoint>() * x; You can even declare A like that: SparseMatrix<double,UpperTriangular|SelfAdjoint> A; fill only the upper part, and simply do: y = A * x; hope that works for you. |
Moderator
|
Hi,
now I would like to address the issue of assembling the matrix, and I would be very interested in your use case. In Eigen, we currently offer 3 ways to fill a sparse matrix:
1 - use startFill()/fill()/endFill() if you can set the coefficients in a perfectly sorted order, i.e., if your matrix is col-major, you must first fill the 1st column, then the 2nd one, etc., and you must fill each column in increasing row-index order.
2 - use startFill()/fillrand()/endFill(): like above, but you can fill each column in a random order. This approach is OK if you don't have too many nonzeros per column/row, since it involves a sorted insertion into a sequential array for each call to fillrand().
3 - use RandomSetter for completely random accesses.
If I'm right, in your case you have a set of N nodes, and each nonzero corresponds to some neighborhood relation, such that you could assemble your matrix using the 2nd method:
you could even use the 1st method if you can sort your neighbors yourself. Actually, I'm really surprised that you have to use the random setter; to be honest, I thought that this pattern would never be used in practice. Now, if you really need the random setter, I strongly recommend using one of Google's hash maps: either dense_hash_map, which is the fastest with reasonable memory usage (like ext::hash_map), or, if memory consumption is a problem, Google's sparse_hash_map. |
Registered Member
|
Gael, you rock. Performance increased again over my own iterator method with an upper-triangular-filled SparseMatrix. There are a few things open:
(1) bug: when iterating over the nonzeros, I obtain a segfault.
(2) questions, questions, questions:
(a) Do matrix(i,j) and matrix(j,i) refer to the same storage element?
(b) When copying a full matrix, I suspect that matrix(i,j) = matrix_source(i,j) will be okay if I do this for every element?
(c) When assembling a matrix from "full symmetric data", I suspect that matrix(i,j) += data_to_be_assembled(i,j) will be wrong, since I would assemble the lower AND the upper triangular part into the same storage (= double the result)?
(d) What happens with the diagonal? If B is defined as the upper triangular part of A, should I say B(i,i) = 0.5*A(i,i)?
Thanks! |
Registered Member
|
Unfortunately, it is not that simple. In the general case, our Hessian matrix of the strain energy is sparse and unsymmetric. A computationally intensive part is the evaluation of the material points. These are points in space where we evaluate constitutive material relations, say stress = stress(strain), with strain = strain(displacements) and displacements = interpolation of the coordinate displacements of the geometrical nodes of some "finite element" (e.g. a brick or tetrahedron). The degrees of freedom (= the indices in the Hessian matrix) are the 3 translatory displacements and the 3 rotations of each geometrical node. The mentioned functions may be highly nonlinear. To speed up the material evaluation, we go once through each material evaluation point and add the corresponding portions to the Hessian. Moreover, the Hessian of a single material point depends more or less on all surrounding geometrical nodes and their degrees of freedom. Your idea leads to the question of whether it is possible to sort the material points in such a way that one could use a more efficient fill order, but this is a new and rather complex topic which would restrict my freedom in implementing new mechanical models in a simple way. At the moment I still use uBlas for the general assembly process and Eigen2 for the storage of the constant and symmetric parts of the Hessian. I tried the RandomSetter in a - say - unfair setting, that is, copying uBlas data to Eigen2 where the matrix structure is, indeed, already known, such that even fill() could be used. I still have to investigate how I can make use of Google's hash maps. I have seen so far that Eigen's cmake driver looks for some library. Is there an example where it is actually used? |
Registered Member
|
RandomSetter does not yet work with an UpperTriangular matrix (the result A*u is wrong by orders of magnitude).
Update: forget everything! I forgot that the svn version requires the RowMajor flag to be set! No wonder something went wrong! In the online documentation, "RowMajor" is the only flag defined in the SparseMatrix class header; having read that over and over again, I kept believing that sparse storage is row-major by default... :-O sorry!
Last edited by Seb on Fri Jan 16, 2009 1:48 pm, edited 1 time in total.
|