This forum has been archived. All content is frozen. Please use KDE Discuss instead.

[SOLVED] Are there bindings to other libs?

ggael (Moderator):
Seb wrote:(1) bug:
When iterating over the nonzeros, I obtain a segfault.


Can you be more specific? In which case?

(a) Do matrix(i,j) and matrix(j,i) refer to the same storage element?


OK, let me be clear: currently SelfAdjoint and UpperTriangular/LowerTriangular are only used to tell Eigen it can optimize some computations, and managing the coefficients is left up to the user. For instance, if you set SelfAdjoint only, then the product will use this information to visit only one half of the matrix, but that's it. If you set SelfAdjoint|UpperTriangular, then Eigen knows that the strict lower part is empty. If that's not the case, then you'll likely get an error. Also note that with a sparse matrix, calling mat(i,j) is inherently slow (it involves a binary search), might be broken if the coefficient does not exist, and should be avoided as much as possible in favor of higher-level algorithms built on top of efficient bricks.

(b) When copying a full matrix, I suspect that matrix(i,j) = matrix_source(i,j) will be okay if I do this for every element?


I guess that matrix and matrix_source are both sparse, so the best is to simply do matrix = matrix_source; and let Eigen find the best algorithm to do the copy (e.g., if matrix is row-major and matrix_source is col-major, then the copy is not trivial). Again, I'm not sure I understand your issue.

(c) When assembling a matrix from "full symmetric data", I suspect that matrix(i,j) += data_to_be_assembled(i,j) will be wrong since I would assemble the lower AND upper triangular part to the same storage? (= double the result?)


Again, I think it is better to keep these flags only as hints for Eigen's high-level algorithms, and let the user honor these flags the way he wants. So currently, the result won't be doubled.

(d) what happens with the diagonal? If B is defined being the upper triangular of A, should I say B(i,i) = 0.5*A(i,i) ???


Let's recap. First you have a sparse matrix A which is symmetric. Then, in order to optimize the product A * v, you build a matrix B, upper triangular, such that B + B' = A. So in that case you have to take the upper part of A and scale the diagonal by 0.5.

With the current state of Eigen, what you can do is: during the computation of A, discard all computations of the strict lower part (i.e., skip a coefficient whenever j < i), flag the matrix as SelfAdjoint|UpperTriangular, and then evaluate the product of the flagged matrix with v.

From such a matrix A (with only the upper part actually stored), if you want to assemble the full matrix, you cannot simply do A = A + A'; because this will double the diagonal.

does that answer your concerns ?
ggael (Moderator):
Seb wrote:Unfortunately, it is not that simple. In the general case, our Hessian matrix of the strain energy is sparse unsymmetric. A computationally intensive part is the evaluation of the material points. These are points in space where we evaluate constitutive material relations, say stress=stress(strain) with strain=strain(displacements) and displacements=interpolation of the coordinate displacement of the geometrical nodes of some "finite element" (e.g. a brick or tetrahedron). The degrees of freedom (=the indices in the Hessian matrix) are the 3 translatory displacements and 3 rotations of each geometrical node. The mentioned functions may be highly nonlinear. To speed up the material evaluation we go once through each material evaluation point and add the corresponding portions to the Hessian. Moreover, the Hessian of a single material point is more or less dependent on all surrounding geometrical nodes and their degrees of freedom.
Your idea leads to the question of whether it is possible to sort the material points in such a way that one could use a more efficient fill order, but this is a new and rather complex topic which restricts my freedom in implementing new mechanical models in a simple way.

ok, thanks for the details.

At the moment I still use uBlas for the general assemble process and Eigen2 for the storage of constant and symmetric parts of the Hessian. I tried the RandomSetter at a - say - unfair position, that is copying uBlas data to Eigen2 where the matrix structure, indeed, is supposed to be known such that even fill() could be used.


Which uBlas matrix type do you use? I thought that for random access uBlas also provided a matrix type based on a map? A common way is also to accumulate all elements per column and, at the end, do a big sort per column and sum all entries which correspond to the same coefficient. Do you use such a strategy? That looks quite expensive.

I still have to investigate how I can make use of Google's Hash. I have seen so far that Eigen's cmake driver is looking for some library. Is there an example where it is actually used?


1 - you need to have it installed (they are just a couple of headers, and I know debian has a package for them)

2 - include google hash_map headers before eigen's Sparse header

3 - do:
RandomSetter<SparseMatrix<double>, GoogleDenseHashMapTraits> setter(...);
or
RandomSetter<SparseMatrix<double>, GoogleSparseHashMapTraits> setter(...);
Seb (Registered Member):
ggael wrote:does that answer your concerns ?


Yes!


One more bug report:

SparseMatrix += SparsematrixUpperTriangular

does not work properly. I counted the nonzeros and there are fewer than there should be.

The segfault problem has been solved: wrong element access on my part.


And some funny benchmark:

I replaced the ublas::vector by its Eigen2 counterpart. During the assembly process I often use the relation
myvector(i) += value;
It seems that uBlas does this considerably faster (this replacement added a third to the complete assembly runtime). I even tried coeffRef - no change. Any ideas what is going wrong here?
ggael (Moderator):
Seb wrote:SparseMatrix += SparsematrixUpperTriangular

does not work properly. I counted the nonzeros and there are fewer than there should be.


sorry, but I don't understand what the problem could be. I mean, the fact that the matrix is upper triangular does not change the way the sum is performed. If you actually meant:

generic_sparsemat += symmetric_sparse_mat_where_only_upper_part_is_stored;

then yes, that won't work out of the box, and I don't think it's a good idea to make it work transparently, because it would hide expensive memory allocation and computation. On the other hand, it'd be a good idea to add static assertions, and to add a convenient method to transform such a matrix into a matrix with full storage.




And some funny benchmark:

I replaced the ublas::vector by its Eigen2 counterpart. During the assembly process I often use the relation
myvector(i) += value;
It seems that uBlas does this considerably faster (this replacement added a third to the complete assembly runtime). I even tried coeffRef - no change. Any ideas what is going wrong here?


this does not make sense, since myvector(i) is equivalent to myvector[i] where myvector is a plain C vector (double* myvector;). Are you sure you compile with optimizations (-O2) and with NDEBUG defined (-DNDEBUG)?
ggael (Moderator):
I also wanted to add a benchmark about the filling of a sparse matrix.
Here is what I did:
1 - precompute 24M of random coefficients (coordinates + values)
2 - insert these 24M values into a 1Mx1M sparse matrix in a random order.

I benched:
Eigen SparseMatrix + RandomSetter<..., StdMapTraits>
Eigen SparseMatrix + RandomSetter<..., GoogleDenseHashMapTraits>
Eigen SparseMatrix + RandomSetter<..., GoogleSparseHashMapTraits>
ublas mapped_matrix (+ copy to a compressed_matrix)
ublas compressed_matrix
ublas coordinate_matrix

the results are (in sec):

Eigen std::map 66.1888
Eigen google::dense 17.3848
Eigen google::sparse 43.7848
ublas mapped 63.9181
ublas compressed hours...
ublas coordinate days...

About the memory consumption, the worst are ublas mapped and Eigen+std::map, then comes Eigen+google::dense, and far better is Eigen+google::sparse.

So honestly, I really don't know what more could be done here, since according to that benchmark Eigen is already much better than ublas...

The benchmark is in eigen/bench/sparse_setter.h
Seb (Registered Member):
I am not in office, so I will try to recover the details from memory:

For random filling one usually takes
generalized_vector_of_vector<double, row_major, vector<coordinate_vector<double> > > (or similar - I hope I remember correctly)
and copies the final matrix into a compressed_matrix. This approach is much faster than those you have tested.

In my example, the filling time is about the same as RandomSetter (6.5 vs. 6.2 sec). The memory is almost the same as storing a compressed_matrix, whereas I observed significant memory growth using Eigen2. I did not yet try the Google hash (I had some conflicts with gcc 4.3 - but I found a solution this morning) and hope that it will solve most of my issues.

I will look into the benchmark you provided next week in order to apply the uBlas types I have used here.

generic_sparsemat += symmetric_sparse_mat_where_only_upper_part_is_stored;

Yes, this is exactly what I meant. Currently I help myself using RandomSetter

I am a little sorry for waking sleeping dogs. Eigen2 is great and provides features that make me want to use it now, even though Eigen2 is not ready yet. It is my library of choice now - I am just accustomed to the logic behind boost and run into problems whenever the two libs behave differently.
ggael (Moderator):
Thank you for beta testing our sparse module and for your helpful feedback. It's good for my motivation and it has already helped to improve the module a lot.

New results using ublas::vector_of_vector, 2.4M nonzeros in a 100k^2 matrix:

                       time (sec)    % of memory
Eigen std::map          4.71893       35.5
Eigen google dense      1.08547       27.2
Eigen google sparse     3.36835       14.0
ublas mapped            4.72259       35.4
ublas vecofvec          3.76116       19.1

so clearly google's hash_map really rocks! Note that in all cases I perform a conversion to a compressed format. This conversion is included in the time, but the memory only includes the wrapper object AND the precomputed data stored as a std::vector of struct { int i; int j; double v; }, which represents 5.8%.

Last edited by ggael on Sat Jan 17, 2009 4:55 pm, edited 1 time in total.
ggael (Moderator):
One more result: if you manage to fill your matrix one row (or column) after the other, i.e., using fillrand(), then it is 10 times faster than the fastest previous method, with zero memory overhead:

Eigen fillrand          0.132481      11.4

That makes me think that perhaps there exists a solution in-between, and I have a few ideas...
Seb (Registered Member):
ggael wrote:One more result: if you manage to fill your matrix one row (or column) after the other, i.e., using fillrand(), then it is 10 times faster than the fastest previous method, with zero memory overhead:

Eigen fillrand          0.132481      11.4

That makes me think that perhaps there exists a solution in-between, and I have a few ideas...


You mean, when assembling a column-wise m x n matrix, creating a vector of n sparse vectors of dimension m which are then assembled using fillrand?
Seb (Registered Member):
Now that we have talked about matrix assembly, some ideas came to my mind. From my understanding, the sparse data is stored in vectors. That means a random modification (say, adding some element) needs a shift of all subsequent data in memory, making it rather expensive. A dynamically linked list would solve this problem, but would introduce memory overhead and decrease the performance of linalg operations. This is what I know from uBlas: generalized_vector_of_vector (the equivalent of Eigen's RandomSetter, with the only difference that generalized_vector_of_vector is a real matrix class and RandomSetter is not) is used whenever the matrix is going to be modified, and may even be used in linalg operations, where it is not that much slower compared with compressed_matrix.

During assemble process in FEM apps I usually have the following operations:

(1) matrix(i,j) += scalar
(2) matrix += matrix_B (or blockwise)
(3) matrix = mat_A'*mat_B*mat_A
(4) eliminate elements which are close to zero (optimizes linalg operations and memory)
(5) (1..4) with mixed symmetric/general matrices

I wonder if an efficient RandomSetter could be used to perform at least (1..3)?
Seb (Registered Member):
Where I want to head to is:
When RandomSetter is applied to a matrix, it copies the contents of this matrix into itself, and copies them back when the destructor is called. If you are really considering adding some specialized functions, I ask myself why RandomSetter is not a matrix type itself, which would only need some fast copy functions.

When applying points (1..3) I will need:
o efficient random access, probably with configurability here too (such as the built-in fillrand() per vector, or the Google SparseHashmap instead for each vector)
o an iterator concept allowing random insertion/deletion/modification of elements
o basic operations like +=, *=, = scalar*mat
o other operations, particularly bindings, should not be required and should be deferred to SparseMatrix

I believe this is more flexible than the current concept.

I am happy that my input is welcome. I was worried that my many misunderstandings (and incomplete reading of the documentation) might lead to some annoyance. Thanks for the support so far - it's like having a commercial license ;)
Seb (Registered Member):
updates:

GoogleHash:
The provided rpm seems to be optimized for Debian - I cannot use it on openSUSE with gcc 4.3.
The provided tar (version 1.3) of GoogleHash compiles fine with the built-in example files. But the symbol 'GoogleSparseHashMapTraits' is not defined. Could you please check your version? I would love to test GoogleHash as well...

UpperTriangular:
On January 15th I tried the new types and was impressed:
Without A*v multiplication the assembly took 0.11sec
With vec+=A*v where A was general it took 0.16 sec
With vec+=A*v where A was UpperTri it took only 0.13 sec
On Friday I sent you a call for help because the performance had dropped to 0.19 sec, but I had only changed the type of vec from uBlas to Eigen2::VectorXd - hence my first idea was that it was the assembly process itself. This morning I tried the whole assembly without vec += A*v and obtained 0.11 sec again. That means: the product became slower. I guess either the product itself got slower or the contents of A changed. Did you change anything since 01/15 that may be the source of this?
Seb (Registered Member):
Seb wrote:UpperTriangular:
[..]
without vec += A*v and obtained 0.11 sec again. That means: the product became slower. I guess either the product itself got slower or the contents of A changed. Did you change anything since 01/15 that may be the source of this?


I now know what happened:
(1) First I assemble matrix A.
(2) Then assemble vector v1;
(3) Then I add v1 += A*v2;

Before, I used uBlas to assemble A and copied it to Eigen2 storage.
Now I use Eigen2's RandomSetter to assemble A from the beginning, i.e. the memory usage grew from 24% to 45% for the whole application. I guess the computer was caching and cleaning up memory while I was already performing and testing steps (2..3). Repeating (2..3) gave the old times again.

I had never experienced such behaviour before. I beg your pardon, and hope this observation may be interesting for your own later benchmarks...
ggael (Moderator):
Hi,

for Google's sparse hash stuff, you actually need to include it yourself before Eigen's Sparse header.

Also, I agree with you about the need for a more flexible true sparse matrix representation. Actually, this was my initial idea, and then I dropped it in favor of something much simpler: a basic RandomSetter. The reason why RandomSetter is not a matrix is that it relies on a hash_map implementation. Since the elements of a hash_map are not sorted (in contrast to std::map), it is not possible to perform any matrix operation on it. This is also why it is so fast. You probably think that, because the elements are not sorted, the copy from the RandomSetter to the sparse matrix should be very expensive (because you would have to sort all coeffs)... The trick is to store the RandomSetter in the opposite storage order to the sparse matrix, such that during the copy we have to transpose the data, which has the effect of naturally sorting the data without any explicit sort!

Anyway, I'll add a DynamicSparseMatrix class that will provide all the needed flexibility. After some tests, it will basically be equivalent to a std::vector of SparseVector. My initial tests show that it is almost as efficient as google::dense_hash_map (assuming the number of nonzero entries per vector is not too large) with much lower memory overhead. Moreover, such a DynamicSparseMatrix is almost the same as a SparseMatrix, so all basic matrix operations should be as efficient.

gael.
Seb (Registered Member):
Hi,

ggael wrote:for Google's sparse hash stuff, you actually need to include it yourself before Eigen's Sparse header.

I just found out that GoogleSparseHashMapTraits is part of Eigen's namespace :lightbulb:! Though I do not know about the 'Bits' that you used in the benchmarks. It compiles even without it anyway.

ggael wrote:Anyway, I'll add a DynamicSparseMatrix class that will provide all the needed flexibility.

Great! This is wonderful news.

Sebastian

