
Strange speed changes when using Eigen through an interface

maarten
I am replacing our home-brew (dense) matrix class with Eigen in our main program for genome-wide association studies (http://en.wikipedia.org/wiki/Genome-wid ... tion_study), called ProbABEL (http://www.genabel.org/packages/ProbABEL). This has been a successful attempt to speed up the program significantly, and it is also a programming exercise for me personally. So far, replacing the matrix class with Eigen has delivered a 4x speedup on a common problem we have.

I wanted to keep the function calls the same as in the old home-brew matrix class, so that I do not have to rewrite every part of the code that uses them. I succeeded: understanding the old class functions was harder than writing the class that forwards the calls to the Eigen objects.

The members of the new interface class look like this:
Code: Select all
public:
    int nrow;
    int ncol;
    int nelements;
    Matrix<DT, Dynamic, Dynamic, RowMajor> data;//this is the eigen matrix object


However, I benchmarked the implementation that uses the class against calling Eigen directly, to check that my implementation was all right. I found odd results in two cases, where the speed of the two implementations is not roughly equal. The code of the interface can be found at https://r-forge.r-project.org/scm/viewv ... ot=genabel and https://r-forge.r-project.org/scm/viewv ... ot=genabel

I pasted the samples discussed below into a compilable file at http://pastebin.com/gwDvBrk4. The matrix sizes are representative of real use. Only the number of repetitions is limited to 200 to prevent long waits (in reality it can run to millions, but I do not have the time to wait for that).

1. Filling a matrix value by value
When filling a matrix with Eigen directly (repeated 915,064,200 times):
Code: Select all
symMat(i,j)=putvalue;

the direct Eigen version takes 10.16 seconds.

However, if I use our interface
Code: Select all
probSym.put(putvalue,i,j);

where probSym.put() is:
Code: Select all
  template<class DT>
  void mematrix<DT>::put(DT value, int nr, int nc)
  {
    data(nr, nc) = value;
  }

the same operation takes 1.60 seconds with this extra function in between (a speedup of more than 5 times!). This does not make sense to me. Can someone enlighten me?
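One way to make this kind of comparison less sensitive to what the optimizer does is to use the stored values after the timed loop in both variants. The sketch below is only an illustration under stated assumptions: it uses a plain std::vector-backed row-major matrix as a stand-in for the Eigen object, and `DenseStandIn`, `Wrapper`, `checksum`, `fill_direct` and `fill_wrapped` are hypothetical names that appear in neither ProbABEL nor Eigen.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the Eigen matrix: dense, row-major storage.
struct DenseStandIn {
    std::size_t rows, cols;
    std::vector<double> v;
    DenseStandIn(std::size_t r, std::size_t c) : rows(r), cols(c), v(r * c, 0.0) {}
    double& operator()(std::size_t i, std::size_t j) {
        assert(i < rows && j < cols);  // compiled out with -DNDEBUG
        return v[i * cols + j];
    }
};

// Thin wrapper with the same shape as mematrix<DT>::put.
struct Wrapper {
    DenseStandIn data;
    Wrapper(std::size_t r, std::size_t c) : data(r, c) {}
    void put(double value, std::size_t nr, std::size_t nc) { data(nr, nc) = value; }
};

// Sum of all elements. Checking or printing this after the timed loop
// makes the final matrix contents observable, so the last round of
// stores cannot be discarded as dead code.
double checksum(const DenseStandIn& m) {
    double s = 0.0;
    for (double x : m.v) s += x;
    return s;
}

// Fill every element `repeats` times through operator(); returns seconds.
double fill_direct(DenseStandIn& m, double value, int repeats) {
    auto t0 = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r)
        for (std::size_t i = 0; i < m.rows; ++i)
            for (std::size_t j = 0; j < m.cols; ++j)
                m(i, j) = value;
    std::chrono::duration<double> dt = std::chrono::steady_clock::now() - t0;
    return dt.count();
}

// Same loop, but going through the wrapper's put(); returns seconds.
double fill_wrapped(Wrapper& w, double value, int repeats) {
    auto t0 = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r)
        for (std::size_t i = 0; i < w.data.rows; ++i)
            for (std::size_t j = 0; j < w.data.cols; ++j)
                w.put(value, i, j);
    std::chrono::duration<double> dt = std::chrono::steady_clock::now() - t0;
    return dt.count();
}
```

Calling `fill_direct` and `fill_wrapped` on matrices of the same size and printing both checksums next to the timings at least confirms that both variants did the same work. It does not stop the compiler from merging repeated stores of the same value, so the generated assembly is still worth a look.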

2. Multiplying t(a)*b*a
Code: Select all
MatrixXd randomNumbers = MatrixXd::Random(2139, 3);
// symMat should be symmetric, since it is a covariance matrix --> is this a target for optimization?
MatrixXd symMat = MatrixXd::Ones(2139, 2139);
MatrixXd tx = randomNumbers.transpose();
tx=tx*symMat;
newmat=tx*randomNumbers;

This takes 2.82 seconds.

If I use our own interface (the matrices have the same size and contain the same data):
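For reference, the quantity being timed here is the small k x k matrix X^T S X, with X of size n x k and S of size n x n. The sketch below spells it out with plain loops, assuming row-major std::vector storage. It is not how Eigen evaluates the product, and `xtsx` is a hypothetical name. It only makes the operation count visible: about n*n*k multiply-adds for the first product and n*k*k for the second, which for n = 2139 and k = 3 is 13,725,963 and 19,251 per evaluation.

```cpp
#include <cstddef>
#include <vector>

// X is n x k and S is n x n, both stored row-major.
// Returns the k x k row-major matrix X^T * S * X.
std::vector<double> xtsx(const std::vector<double>& X,
                         const std::vector<double>& S,
                         std::size_t n, std::size_t k) {
    // First product: T = X^T * S, a k x n matrix (n*n*k multiply-adds).
    std::vector<double> T(k * n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t a = 0; a < k; ++a) {
            const double x = X[i * k + a];
            for (std::size_t j = 0; j < n; ++j)
                T[a * n + j] += x * S[i * n + j];
        }

    // Second product: R = T * X, a k x k matrix (n*k*k multiply-adds).
    std::vector<double> R(k * k, 0.0);
    for (std::size_t a = 0; a < k; ++a)
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t b = 0; b < k; ++b)
                R[a * k + b] += T[a * n + j] * X[j * k + b];
    return R;
}
```

Almost all of the work is in the first product, so any difference between the two timed variants most likely comes from how that large product and its temporary are handled.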
Code: Select all
newmat=probRandom.data.transpose()*probSym.data*probRandom.data;

where
Code: Select all
template<class DT>
mematrix<DT> mematrix<DT>::operator*(const mematrix<DT> &M)
{
    if (ncol != M.nrow)
    {
        fprintf(stderr, "mematrix*: ncol != nrow (%d,%d) and (%d,%d)", nrow,
                ncol, M.nrow, M.ncol);
    }

    mematrix<DT> temp;
    temp.data = data * M.data;
    temp.ncol = temp.data.cols();
    temp.nrow = temp.data.rows();
    temp.nelements = temp.nrow * temp.ncol;
    return temp;
}

It runs in 3.45 seconds. This difference is not as big as in the other case, but the program spends most of its time (90%+) on this operation.

I compiled with:
g++ -Wall -O2 -I. -I/tmp/eigen-eigen-43d9075b23ef/ -DNDEBUG eigen_direct_vs_interface.cpp -o benchmark
with Eigen 3.1.2 and g++ (Ubuntu/Linaro 4.7.2-2ubuntu1) 4.7.2 under 64-bit Ubuntu 12.10
ggael
Do you observe these differences in your production code? On a simple benchmark like this, the compiler might do many evil things that could explain such differences. The best way to check what's going on is to add stuff like:

EIGEN_ASM_COMMENT("BEGIN_SET_I_J");

before calling, e.g., symMat(i,j)=putvalue; or probSym.put(putvalue,i,j);, and then compile with -S and look at the differences between the generated assembly. There is no reason why adding an intermediate call would speed things up.
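A minimal sketch of that workflow without Eigen, assuming GCC or Clang on x86: `ASM_MARK` is a hypothetical stand-in macro, not Eigen's own definition, and `set_ij` is just an illustrative function. The macro emits an assembler comment line, so the marked region is easy to find in the output.

```cpp
// Hypothetical marker macro: emits a comment line into the generated
// assembly. Restricted to x86 with GCC/Clang, where '#' starts an
// assembler comment; expands to nothing elsewhere.
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define ASM_MARK(text) __asm__ __volatile__("# " text)
#else
#define ASM_MARK(text) ((void)0)
#endif

// Store v at row i, column j of a row-major matrix with `cols` columns,
// with markers around the store so it can be located in the .s file.
void set_ij(double* m, int cols, int i, int j, double v) {
    ASM_MARK("BEGIN_SET_I_J");
    m[i * cols + j] = v;
    ASM_MARK("END_SET_I_J");
}
```

Compiling with something like g++ -O2 -S file.cpp and searching the resulting .s file for BEGIN_SET_I_J shows the instructions the compiler kept between the two markers. Doing this for both variants lets you compare them side by side.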

