Registered Member
Hi,
I'm converting some functions of our library that operate on matrices to use Eigen. I was expecting speed-ups, since Eigen uses SSE2 instructions. However, when I compare the Eigen implementations with the old ones, which are handwritten code using pointer arithmetic but no SSE, the old code is up to 30% faster. Is there something I can change in my Eigen code to speed it up? I use Visual Studio 2010 with SSE2 instructions and full optimization enabled.

For example, here is a function I have converted to Eigen that is still about 30% slower than the old handwritten version. m_Transform is an Eigen::Transform<double, 3, Eigen::Affine>, and m_f1, m_f2, m_Scale, m_c1 and m_c2 are doubles:

    void projection(const Matrix3Xd &pointsWorld, Matrix2Xd &pointsImage)
    {
        // m_Transform * pointsWorld has 3 rows (camera-space x, y, z)
        Matrix3Xd pointsCamera;
        pointsCamera.noalias() = m_Transform * pointsWorld;

        // divide by depth and scale by m_f1, m_f2
        Matrix2Xd pointsIdeal(2, pointsCamera.cols());
        pointsIdeal.row(0).array() = m_f1 * pointsCamera.row(0).array() / pointsCamera.row(2).array();
        pointsIdeal.row(1).array() = m_f2 * pointsCamera.row(1).array() / pointsCamera.row(2).array();

        // scale by m_Scale and offset by m_c1, m_c2
        pointsImage.resize(2, pointsWorld.cols());
        pointsImage.row(0).array() = m_Scale * (- pointsIdeal.row(0).array() + m_c1);
        pointsImage.row(1).array() = m_Scale * (pointsIdeal.row(1).array() + m_c2);
    }

How could I speed this up? I wonder whether the Eigen code is slower because I'm trying to write vectorized code when a plain loop would sometimes be better, mainly when working with Arrays as opposed to Matrices. For example, the function above traverses the columns twice, once for row(0) and again for row(1), whereas the old code traverses the columns only once, processing the row(0) and row(1) elements of each column inside a single loop over the number of columns.
In another case, the vectorized form of my code traverses several arrays of the same size 12 times, whereas a single loop over the number of elements that does all the operations on each element in turn is 15% faster, and that without SSE2.

So what is the recommended way of coding with Eigen for maximum speed? Use vectorized code for matrices and loops for arrays? The vectorized form is much more readable to me, but I need the speed too.

Thanks for your help,
Martin
Moderator
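The slowdown might be caused by MSVC not being very good at handling the abstraction layers. Nevertheless, your current code cannot be vectorized because you explicitly broke the contiguity of the elements: with the default column-major storage, consecutive coefficients of row(0), row(1) or row(2) of a 3 x N matrix are 3 doubles apart in memory, so the row-wise expressions cannot be evaluated with SSE2 packets. There is also some odd usage of Eigen.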
First, m_Transform should be a Transform<double,2,Projective>. Second, you should factor all the transformations into a single projective object:

    Transform<double,2,Projective> T;
    T = Scaling(-m_Scale, m_Scale) * Translation2d(-m_c1, m_c2) * Scaling(m_f1, m_f2) * m_Transform;

and then you can directly do:

    pointsImage = (T * pointsWorld).colwise().hnormalized();

and let Eigen do the job for you. For even better performance you could use row-major matrices (Matrix<double,3,Dynamic,RowMajor>) and do the final homogeneous normalization as you did.
Registered Member
Hi ggael,
Thanks very much for your answer. I didn't realise I was breaking the contiguity of the data in my code. I copied the points into RowMajor matrices and the same code ran faster; I guess it is now able to vectorize.

Just one more question. If in some function I want to change the data from ColMajor to RowMajor, because that particular operation vectorizes better with RowMajor, what I do is assign the ColMajor matrix to the RowMajor one, i.e.

    RowMajorMatrix = ColMajorMatrix;

Is this a correct way of doing it?

Thanks,
Martin
Moderator
Yes, that's correct, but the operation is costly (every coefficient is copied into the new storage order), so the conversion is not always worth doing.