This forum has been archived. All content is frozen. Please use KDE Discuss instead.

[SOLVED] Are there bindings to other libs?

ggael (Moderator)
Seb wrote:I was thinking about a comment of yours on sparse*sparse products. When developing finite element solvers, one often has to deal with constraint equations. As an example, a 2nd derivative of the energy function with, say, dimension 1e6 and many nonzeros is paired with a set of constraint equations whose gradients are very sparse, say each constraint depends on 5..50 variables. It would then indeed be very interesting to have at least the choice between two sparse=sparse*sparse products, one optimized using dense vectors and one using sparse arrays. The default method ('*') may be tuned towards rather dense vectors, which sounds reasonable for most cases.


Actually, the current sparse*sparse product automatically switches between a sparse (dynamically linked list) and a dense vector to accumulate the result. So whatever the density is, you should always get good performance.

To come back to the symmetric sparse matrices: right now I implement the assembly of a matrix A by assembling B with A = B + B' using uBlas.


right, currently B + B.transpose() is not directly doable; you have to manually force the evaluation of the expression B.transpose() such that B and B' have the same storage order, e.g.:

A = B + TypeOfB(B.transpose());

I'm still not sure I want to auto-magically evaluate the transpose expression, since this would hide a copy.

The product f = A*v using simple uBlas routines (B*v + B'*v) is considerably slower than iterating over the nonzeros and doing, for each iterate 'it', f(it.i) += it.value*v(it.j); f(it.j) += it.value*v(it.i); which is not slower than the product with the full matrix. Memory usage decreased significantly, though.

I think that with current eigen, you can simply write:

DenseVector f = B * v + B.transpose() * v;

and it should compile to exactly what you did without any memory overhead. As I said, computing A = B + B' requires a temporary to store B' and another storage overhead to store A, so the peak memory overhead should be up to 2 *.
Seb (Registered Member)
ggael wrote:Actually, the current sparse*sparse product automatically switches between a sparse (dynamically linked list) and a dense vector to accumulate the result. So whatever the density is, you should always get good performance.

Good to hear!

right, currently B + B.transpose() is not directly doable, you have to manually force the evaluation of the expression B.transpose() such that B and B' have the same storage order, eg:

A = B + TypeOfB(B.transpose());

I'm still not sure I want to auto-magically evaluate the transpose expression, since this would hide a copy.

ah, I think I expressed myself in a misunderstandable way. I meant: I converted A into B in order to get efficient storage and matrix-vector products.

DenseVector f = B * v + B.transpose() * v;
and it should compile to exactly what you did without any memory overhead. As I said, computing A = B + B' requires a temporary to store B' and another storage overhead to store A, so the peak memory overhead should be up to 2 *.

Good to hear - I will test this!


Btw:

I couldn't resist the temptation: I had to test the new release this morning! The speedup is, I repeat myself, impressive: I see a speedup of a factor of 3 for a sparse_matrix*dense_vector product compared with uBlas!

What is of concern is the time (more important) and the memory usage during the assembly process (random fill). I still have to test the assembly efficiency using random access in detail, but at first sight there does not seem to be much speedup over uBlas. Regarding memory usage, I found that Eigen2 takes more during random access: consider a uBlas matrix taking 6% of RAM; while simply copying it with Eigen2, an additional 16% is taken (finally 4% is used). uBlas requires only negligibly more RAM during the filling process, ending with an additional 6%. One may argue that using a hash map for temporary storage always creates some overhead, but I believe it should be limited to a factor of around 2 as a target.
Seb (Registered Member)
There is another thing that is unclear to me:

What is the most efficient way to iterate over the nonzero elements of a sparse matrix?
Seb (Registered Member)
And a small bug report in between:

Code: Select all
   linear_stiffness_matrix_tri2 = TSparseMatrix( source.size1(),source.size2() );
//   linear_stiffness_matrix_tri2.startFill(1);
//   linear_stiffness_matrix_tri2.endFill();
   {
      Eigen::RandomSetter<TSparseMatrix> setter(linear_stiffness_matrix_tri2);


This code causes a segfault. Valgrind finds an uninitialized value. The code works perfectly when allocating at least one element in the sparse matrix (commented lines).
Seb (Registered Member)
And the next benchmark:

Using a large triangular sparse matrix A and a dense vector v.

I perform
f = Av + A'v
applied to uBlas objects like
Code: Select all
Map(ublas -> f.data) = A* Map (ublas -> v.data) + A.transpose()*Map (ublas -> v.data)


The time is of the same order of magnitude (but a little slower) compared to the uBlas method, i.e.:
iterating over all nonzeros of A and performing Av + A'v for each iterate (as described in detail somewhere above)
ggael (Moderator)
Seb wrote:And a small bug report in between:

Code: Select all
   linear_stiffness_matrix_tri2 = TSparseMatrix( source.size1(),source.size2() );
//   linear_stiffness_matrix_tri2.startFill(1);
//   linear_stiffness_matrix_tri2.endFill();
   {
      Eigen::RandomSetter<TSparseMatrix> setter(linear_stiffness_matrix_tri2);


This code causes a segfault. Valgrind finds an uninitialized value. The code works perfectly when allocating at least one element in the sparse matrix (commented lines).


thanks, bug fixed in rev 911464
ggael (Moderator)
Seb wrote:There is another thing that is unclear to me:

What is the most efficient way to iterate over the nonzero elements of a sparse matrix?


here is an example:

Code: Select all
for (int k = 0; k < m1.outerSize(); ++k)
  for (SparseMatrix<Scalar>::InnerIterator it(m1,k); it; ++it)
  {
    it.value();
    it.row();   // row index
    it.col();   // column index
    it.index(); // inner index: it.col() if row-major, it.row() otherwise
  }


we could add a NonzeroIterator which would integrate the outer for loop, but well...
Seb (Registered Member)
ggael wrote:here is an example:

Code: Select all
for (int k = 0; k < m1.outerSize(); ++k)
  for (SparseMatrix<Scalar>::InnerIterator it(m1,k); it; ++it)
  {
    it.value();
    it.row();   // row index
    it.col();   // column index
    it.index(); // inner index: it.col() if row-major, it.row() otherwise
  }


we could add a NonzeroIterator which would integrate the outer for loop, but well...


Thank you! By replacing uBlas with Eigen2 in the operation Av = Bv + B'v using this iterator, I am even faster than uBlas (though I wonder why; I hope the result won't be different?).
bjacob (Registered Member)
Seb wrote:Thank you! By replacing uBlas with Eigen2 in the operation Av = Bv + B'v using this iterator, I am even faster than uBlas (though I wonder why; I hope the result won't be different?).


Because Gael is very clever :)


Join us on Eigen's IRC channel: #eigen on irc.freenode.net
Have a serious interest in Eigen? Then join the mailing list!
Seb (Registered Member)
And his implementation speed leaves even the performance of Eigen2 behind! :star:

Last edited by Seb on Thu Jan 15, 2009 3:54 pm, edited 1 time in total.
ggael (Moderator)
I have even better: instead of decomposing A into B + B', what about:

DenseVector x;
DenseVector y = A.marked<SelfAdjoint>() * x;

if you only store the upper part of the symmetric sparse matrix A, then do:

DenseVector y = A.marked<SelfAdjoint|UpperTriangular>() * x;

if you only store the lower part:

DenseVector y = A.marked<SelfAdjoint|LowerTriangular>() * x;

You can even declare A like that:

SparseMatrix<Scalar,SelfAdjoint|UpperTriangular> A;

fill only the upper part, and simply do:

y = A * x;

hope that works for you.
ggael (Moderator)
Hi,

now I would like to address the issue of assembling the matrix, and I would be very interested in your use case. In Eigen, we currently offer three ways to fill a sparse matrix:

1 - use startFill()/fill()/endFill() if you can set the coefficients in a perfectly sorted order, i.e., if your matrix is col-major you must first fill the 1st column, then the 2nd one, etc., and you must fill each column in increasing row index order.

2 - use startFill()/fillrand()/endFill(): like above, but you can fill each column in a random order. This approach is OK if you don't have too many nonzeros per column/row, since it involves a sorted insertion into a sequential array for each call to fillrand().

3 - use RandomSetter for completely random accesses.

If I'm right, in your case you have a set of N nodes, and each nonzero corresponds to some neighborhood relation, such that you could assemble your matrix using the 2nd method:
Code: Select all
for each node j
 nei = neighbors of node j
 for each i in nei
   if (i<j)
    A.fillrand(i,j) = ...;

you could even use the 1st method if you can sort your neighbors yourself.

Actually, I'm really surprised that you have to use the random setter. To be honest, I thought that this pattern would never be used in practice.

Now, if you really need the random setter, I strongly recommend using one of Google's hash maps: dense_hash_map is the fastest with reasonable memory usage (like ext::hash_map). If memory consumption is a problem, then Google's sparse_hash_map is what you need.
Seb (Registered Member)
ggael wrote:I have even better. Instead of decomposing A to B + B', what about:

[..]

SparseMatrix<Scalar,SelfAdjoint|UpperTriangular> A;
fill only the upper part, and simply do:
y = A * x;
hope that works for you.


Gael, you rock. Performance increased again over my own iterator method with an upper triangular filled SparseMatrix. There are a few things open:

(1) bug:
When iterating over the nonzeros, I obtain a segfault.

(2) questions, questions, questions:
(a) Do matrix(i,j) and matrix(j,i) refer to the same storage element?
(b) When copying a full matrix, I suspect that matrix(i,j) = matrix_source(i,j) will be okay if I do this for every element?
(c) When assembling a matrix from "full symmetric data", I suspect that matrix(i,j) += data_to_be_assembled(i,j) will be wrong, since I would assemble the lower AND the upper triangular part into the same storage (= double the result)?
(d) What happens with the diagonal? If B is defined as the upper triangle of A, should I set B(i,i) = 0.5*A(i,i)?

Thanks!
Seb (Registered Member)
ggael wrote:If I'm right in your case you have a set of N nodes, and each non zero corresponds to some neighborhood relation such that you could assemble your matrix using the 2nd method:
Code: Select all
for each node j
 nei = neighbors of node j
 for each i in nei
   if (i<j)
    A.fillrand(i,j) = ...;

you could even use the 1st method if you can sort your neighbors yourself.

Actually, I'm really surprised that you have to use the random setter. To be honest, I thought that this pattern would never be used in practice.


Unfortunately, it is not that simple. In the general case, our Hessian matrix of the strain energy is sparse and unsymmetric. A computationally intensive part is the evaluation of the material points. These are points in space where we evaluate constitutive material relations, say stress = stress(strain) with strain = strain(displacements), where the displacements are interpolated from the coordinate displacements of the geometrical nodes of some "finite element" (e.g. a brick or tetrahedron). The degrees of freedom (= the indices in the Hessian matrix) are the 3 translational displacements and 3 rotations of each geometrical node. The mentioned functions may be highly nonlinear. To speed up the material evaluation we go through each material evaluation point once and add the corresponding portions to the Hessian. Moreover, the Hessian of a single material point depends more or less on all surrounding geometrical nodes and their degrees of freedom.
Your idea leads to the question of whether it is possible to sort the material points in such a way that one could use a more efficient fill order, but this is a new and rather complex topic which would restrain my freedom in implementing new mechanical models in a simple way.

At the moment I still use uBlas for the general assembly process and Eigen2 for the storage of constant and symmetric parts of the Hessian. I tried the RandomSetter at a, say, unfair position: copying uBlas data to Eigen2, where the matrix structure is indeed supposed to be known, such that even fill() could be used.

I still have to investigate how I can make use of Google's hash maps. I have seen so far that Eigen's CMake driver looks for some library. Is there an example where it is actually used?
Seb (Registered Member)

another bug

Fri Jan 16, 2009 9:31 am
RandomSetter does not yet work with an UpperTriangular matrix. (The result A*u is wrong by orders of magnitude.)



Update
Forget everything! I forgot that the svn version requires the RowMajor flag to be set! No wonder something went wrong!

In the online documentation, "RowMajor" is the only flag defined in the SparseMatrix class header. Having read this over and over again, I will always believe sparse storage is row major by default... :-O sorry!

Last edited by Seb on Fri Jan 16, 2009 1:48 pm, edited 1 time in total.

