
Efficient manipulation of columns of a matrix

Mitmischer (Registered Member)
Hello,

we are currently introducing Eigen into a project that formerly used primitive arrays for matrix/vector manipulation.

While that decision was great in terms of readability, performance could not quite keep up.
After profiling, I believe the following scenario causes a heavy performance hit.

We have a class that does some encapsulation around a MatrixXd; let's call that class A.
A provides a function getCol() that returns a given column of its internal matrix, called values:

Map<const VectorXd> A::getCol(int idx) {
    return Map<const VectorXd>(values.data() + offset + idx*size, size);
}
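
For context, here is a stripped-down sketch of the class (the real class has more members; the constructor and the exact types of offset and size are simplified here):

#include <Eigen/Dense>
using namespace Eigen;

class A {
public:
    A(int rows, int cols) : values(rows, cols), offset(0), size(rows) {}

    // Maps one column of the (column-major) MatrixXd without copying any data.
    Map<const VectorXd> getCol(int idx) const {
        return Map<const VectorXd>(values.data() + offset + idx*size, size);
    }

private:
    MatrixXd values;  // column-major by default, so each column is contiguous
    int offset;       // start offset into the raw storage
    int size;         // number of rows, i.e. the length of one column
};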

This function is efficient as long as the vector is manipulated directly. Ignore the constness for a moment and consider:

Map<VectorXd> col = A.getCol(4);
col[0] = 24;

==> That is not slower than primitive arrays.

However, this column is then used differently, like this:

MatrixXd B;
MatrixXd C;
VectorXd v;
/////
v = A.getCol(i)-A.getCol(j);
C = B * v;
///////

where the last two lines are called very often and therefore need to be efficient.

When profiling the function containing this statement, I see memory allocations and deallocations that were not there in the former, primitive version, which looked something like this:

double* v;
double** B;
double** C;
v = minus(A.getCol(i), A.getCol(j), /*...*/);
C = times(B, v, /*...*/);

It looks to me as though the version using Eigen creates temporaries or copies although there is no need for them, or that some type conversion is going on. Is that right, and if so, how can I speed up that part?

To make my use case clearer: I think an array of VectorXd would suit my needs best, with getCol implemented like this:

const VectorXd& A::getCol(int idx) {
    return values.at(offset + idx);
}

- but it feels wrong to me not to use a matrix when I want to store a bunch of column vectors.
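
For comparison, I am aware that a plain MatrixXd already offers col(), which gives a copy-free view of a column; here is a quick sketch (ignoring the offset handling of my class):

MatrixXd values(3, 10);

values.col(4)[0] = 24;                       // writes straight into values, no copy
VectorXd v = values.col(1) - values.col(2);  // column views behave like vectors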



Strangely enough (this might be a different issue), rewriting the expression as follows is even slower - why is this?

C = B * (A.getCol(i)-A.getCol(j));
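
One variant I still want to benchmark (only a sketch; it assumes the sizes do not change between calls, and result is just a placeholder name) preallocates the operands and uses .noalias() on the product so that no temporary is created for it:

// Allocated once, outside the hot loop.
VectorXd v(size);            // size = length of one column of A
VectorXd result(B.rows());   // B * v is a vector, so a VectorXd suffices

// Inside the hot loop:
v = A.getCol(i) - A.getCol(j);    // no reallocation, v already has the right size
result.noalias() = B * v;         // .noalias() skips the temporary Eigen would
                                  // otherwise evaluate the product into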
Mitmischer (Registered Member)
Hello,

I have news on my problem: I am now convinced that an implicit conversion from Map<VectorXd> to VectorXd makes the program slow. Consider the following test case:

Code:
#define EIGEN_DEFAULT_TO_ROW_MAJOR

#include <Eigen/Dense>
#include <iostream>
#include <chrono>
#include <functional>
#include <cmath>      // for fabs

using namespace Eigen;
using namespace std;
using namespace chrono;
using hrc = std::chrono::high_resolution_clock;

const size_t ITERATIONS = 100000;

Map<VectorXd> returnColFromMatrix(MatrixXd& mat, const int col)
{
    Map<VectorXd> column(mat.data()+3*col, 3);
    return column;
}

double workOnCol(double* loadVector, double* vec, const int i)
{
    double* result = new double[3];
    for(size_t k = 0; k < 3; ++k)
        result[k] = loadVector[k]-vec[k];

    double res = 0;
    res += result[0]/i;
    res += result[1]/i;
    res += result[2]/i;

    delete[] result;
    return res;
}

double workOnCol(const VectorXd& loadVector, const VectorXd& vec, const int i)
{
    VectorXd result = loadVector-vec;

    double res = 0;
    res += result[0]/i;
    res += result[1]/i;
    res += result[2]/i;

    return res;
}

Map<VectorXd> returnFirstColFromMatrix(MatrixXd& mat)
{
    Map<VectorXd> column(mat.data(), 3);
    return column;
}

double secSpan(hrc::time_point tstart, hrc::time_point tend)
{
    return (duration_cast<duration<double>>(tend-tstart)).count();
}

double tc_slow(MatrixXd& testCpy){
    VectorXd loadVector(3);
    loadVector[0] = loadVector[1] = loadVector[2] = 1;
    double resultIndirect = 0;
    for(size_t i = 1; i < ITERATIONS; ++i)
        resultIndirect += workOnCol(loadVector, returnFirstColFromMatrix(testCpy), i);

    return resultIndirect;
}

double tc_fast(double* &arr){
    double* loadVector = new double[3];
    loadVector[0] = loadVector[1] = loadVector[2] = 1;
    double resultDirect = 0;
    for(size_t i = 1; i < ITERATIONS; ++i)
        resultDirect += workOnCol(loadVector, arr, i);

    return resultDirect;
}


int main()
{
    high_resolution_clock::time_point tstart, tend;

    MatrixXd test(3,3);
    test(0,0) = 1;
    test(1,0) = 4;
    test(2,0) = 2;
    test(0,1) = 5;
    test(1,1) = 4;
    test(2,1) = 8;
    test(0,2) = 5;
    test(1,2) = 6;
    test(2,2) = 8;

    double* arr = new double[9];
    for(size_t i= 0; i < 9; ++i)
        arr[i] = *(test.data()+i);

    cout << "performance test:" << endl;

    tstart = hrc::now();   
    double resultDirect = tc_fast(arr);
    tend = hrc::now();

    cout << "direct: " << secSpan(tstart, tend) << "s" << endl;

    MatrixXd testCpy = test;
    tstart = hrc::now();
    double resultIndirect = tc_slow(testCpy);
    tend = hrc::now();

    cout << "indirect: " << secSpan(tstart, tend) << "s" << endl;

    if(fabs(resultDirect - resultIndirect) < 1e-10)
        cout << "passed" << endl;
    else
        cout << "failed" << endl;

    cout << resultDirect;
    cout << " " << resultIndirect << endl;

    return 0;
}


Here, the primitive version is about twice as fast as the Eigen version.

If, however, you change the signature of the second workOnCol function (the one called by tc_slow) to
Code:
double workOnCol(const VectorXd& loadVector, const Map<VectorXd> vec, const int i)


then the two versions perform similarly.
ggael (Moderator)
I did not read the entire message, so sorry if I'm off (ping me if I missed something important), but if you want to accept a VectorXd, a column of a MatrixXd, or a Map<VectorXd>, then declare the argument as Ref<const VectorXd>:

void foo(Ref<const VectorXd> vec) { /* do something with vec */}

See the respective doc for further details.
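
Applied to the test case above, the second workOnCol overload could then look something like this (just a sketch following that suggestion, with the body copied from the post above):

// With Ref<const VectorXd>, both a VectorXd and a Map<VectorXd> (e.g. the
// return value of returnFirstColFromMatrix) bind to the argument without the
// implicit Map -> VectorXd conversion and its heap allocation.
double workOnCol(Ref<const VectorXd> loadVector, Ref<const VectorXd> vec, const int i)
{
    VectorXd result = loadVector - vec;

    double res = 0;
    res += result[0]/i;
    res += result[1]/i;
    res += result[2]/i;

    return res;
}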

