This forum has been archived. All content is frozen. Please use KDE Discuss instead.

Time to assign a triangular view

Tags: None
(comma "," separated)
stephane.rouvillois
Registered Member
Posts
2
Karma
0
OS

Time to assign a triangular view

Tue May 10, 2011 2:27 pm
Hello,

I am new to Eigen and have been experimenting with the triangular view in parts of my code and I am a little surprised at the time it takes to compute a triangular part compared to the full matrix. I came with the following example:
Code: Select all
#include <iostream>
#include "Eigen/Dense"
#include <boost/timer.hpp>

using namespace std;


int main() {
  int sizes[6] = {3, 6, 50, 100, 150, 200 };
  int nRun[6] = {35000000, 22000000, 170000, 25000, 7000, 3000};

  for (int i=0; i<6; i++) {
    Eigen::MatrixXd S1(sizes[i],sizes[i]), M1(Eigen::MatrixXd::Random(sizes[i],sizes[i])), M2(Eigen::MatrixXd::Random(sizes[i],sizes[i]));

    int num=nRun[i];

    boost::timer t;

    while (num--)
      S1 = M1.transpose()*M1;

    cout << "Size = " << sizes[i] << "\n\tfull M1^t*M1                 = " << t.elapsed()/double(nRun[i]) << "s";

    num=nRun[i];

    t.restart();

    while (num--)
      S1.triangularView<Eigen::Upper>() = M1.transpose()*M1;

    cout << "\n\ttriangular M1^t*M1           = " << t.elapsed()/double(nRun[i]) << "s";

    num=nRun[i];

    t.restart();

    while (num--)
      S1 = M1.transpose()*M2 + M2.transpose()*M1;

    cout << "\n\tfull M1^t*M2 + M2^t*M1       = " << t.elapsed()/double(nRun[i]) << "s";

    num=nRun[i];

    t.restart();

    while (num--)
      S1.triangularView<Eigen::Upper>() = M1.transpose()*M2 + M2.transpose()*M1;

    cout << "\n\ttriangular M1^t*M2 + M2^t*M1 = " << t.elapsed()/double(nRun[i]) << "s" << endl;
  }

  return 0;
}

leading to this output:
Code: Select all
Size = 3
    full M1^t*M1                 = 5.36857e-07s
    triangular M1^t*M1           = 5.21429e-07s
    full M1^t*M2 + M2^t*M1       = 1.05029e-06s
    triangular M1^t*M2 + M2^t*M1 = 1.05629e-06s
Size = 6
    full M1^t*M1                 = 9.40909e-07s
    triangular M1^t*M1           = 8.89545e-07s
    full M1^t*M2 + M2^t*M1       = 1.70955e-06s
    triangular M1^t*M2 + M2^t*M1 = 1.75455e-06s
Size = 50
    full M1^t*M1                 = 0.000132941s
    triangular M1^t*M1           = 7.97647e-05s
    full M1^t*M2 + M2^t*M1       = 0.000263882s
    triangular M1^t*M2 + M2^t*M1 = 0.000260529s
Size = 100
    full M1^t*M1                 = 0.0010512s
    triangular M1^t*M1           = 0.0005612s
    full M1^t*M2 + M2^t*M1       = 0.0021012s
    triangular M1^t*M2 + M2^t*M1 = 0.0020744s
Size = 150
    full M1^t*M1                 = 0.00333286s
    triangular M1^t*M1           = 0.00181571s
    full M1^t*M2 + M2^t*M1       = 0.00686143s
    triangular M1^t*M2 + M2^t*M1 = 0.00698s
Size = 200
    full M1^t*M1                 = 0.00779s
    triangular M1^t*M1           = 0.00421667s
    full M1^t*M2 + M2^t*M1       = 0.01623s
    triangular M1^t*M2 + M2^t*M1 = 0.0163667s

I can buy that my example does not really make sense on small matrices. However, for larger matrices, I am confused to see that computing the upper triangular part of M1^t*M1 leads to a time benefit close to what is expected (over computing the full result), when it does not for M1^t*M2 + M2^t*M1.
Am I missing something fundamental here?
I am running Eigen 3.0, GCC 4.1.2 (Redhat 4.1.2-50) and compiling with -O3 and -msse2
Thanks!
User avatar
ggael
Moderator
Posts
3447
Karma
19
OS
this is because the products have to be evaluated into temporaries thus discarding the efficient triangular = general_matrix_matrix_product path. For best efficiency you should write:

S1.triangularView<Eigen::Upper>() = M1.transpose()*M2;
S1.triangularView<Eigen::Upper>() += M2.transpose()*M1;

In the future we should be able to do such expression level optimization for you, but currently you have to understand that matrix products are a bit special and it is better to write expressions containing no more than a single matrix product.

Here is an updated benchmark:

Code: Select all
#include <iostream>
#include "Eigen/Dense"
#include <bench/BenchTimer.h>

using namespace std;


int main() {
  int sizes[6] = {3, 6, 50, 100, 150, 200 };
  int nRun[6] = {350000, 220000, 1700, 250, 70, 30};
  int tries = 10;
  for (int i=0; i<6; i++) {
    Eigen::MatrixXd S1(sizes[i],sizes[i]), M1(Eigen::MatrixXd::Random(sizes[i],sizes[i])), M2(Eigen::MatrixXd::Random(sizes[i],sizes[i]));

    int num = nRun[i] / 10;

    Eigen::BenchTimer t;

    BENCH(t,tries,num, S1.noalias() = M1.transpose()*M1);

    cout << "Size = " << sizes[i] << "\n\tfull M1^t*M1                 = " << t.best()/double(num) << "s";

    t.reset();

    BENCH(t,tries,num, S1.triangularView<Eigen::Upper>() = M1.transpose()*M1);

    cout << "\n\ttriangular M1^t*M1           = " << t.best()/double(num) << "s";

    t.reset();

    BENCH(t,tries,num, S1.noalias() = M1.transpose()*M2;
                       S1.noalias() += M2.transpose()*M1);

    cout << "\n\tfull M1^t*M2 + M2^t*M1       = " << t.best()/double(num) << "s";

    t.reset();

    BENCH(t,tries,num, S1.triangularView<Eigen::Upper>() = M1.transpose()*M2;
                       S1.triangularView<Eigen::Upper>() += M2.transpose()*M1);

    cout << "\n\ttriangular M1^t*M2 + M2^t*M1 = " << t.best()/double(num) << "s" << endl;
  }

  return 0;
}

stephane.rouvillois
Registered Member
Posts
2
Karma
0
OS
I effectively overlooked that one.
Thanks!


Bookmarks



Who is online

Registered users: bartoloni, Bing [Bot], Evergrowing, Google [Bot], q.ignora