Pure C solution is faster and I don't know why...

danielb
Dear guys.

The problem I am working on has (let's say) 3 arrays of some length (arr1, arr2, arr3) and some scalars (s1, s2, s3), and the objective is to calculate s1*arr1 + s2*arr2 + s3*arr3. When I do this in plain C without Eigen, the code is approximately 60 percent faster than when I use Eigen with SSE2 (measured via runtime in VTune). What could be wrong? The sample code I use is:

Code:
#include <stdio.h>
#include <iostream>
#include <Eigen/Dense>

#define MULTIS         4
#define ARR_LENs   100000
#define REP_COUNT   100000

#define PURE_C

using namespace Eigen;
using namespace std;

int main(void)
{
   double multis[ MULTIS ];      
   for ( int i = 0; i < MULTIS; i++ )
      multis[i] = (double)i;

#ifdef PURE_C
   /*   Normal C version   */
   double ** arrays = new double * [MULTIS];   
   for ( int i = 0; i < MULTIS; i++ )
   { 
      arrays[i] = new double [ARR_LENs];
      for ( int j = 0; j < ARR_LENs; j++ )
         arrays[i][j] = (double).1235 * (double)j;
   }
   
   double * res = new double [ARR_LENs];
   double val;

   for ( int rep = 0; rep < REP_COUNT; rep ++ )
   {
      for ( int bar = 0; bar < ARR_LENs; bar ++ )
      {
         val = 0;

         for ( int imulti = 0; imulti < MULTIS; imulti++ )
            val += multis[imulti] * arrays[imulti][bar];

         res[bar] = val;
      }
      
   }
#else
   ArrayXd * arrays = new ArrayXd [ MULTIS ];
   for ( int i = 0; i < MULTIS; i++ )
   {
      arrays[i] = ArrayXd( ARR_LENs );
      for ( int j = 0; j < ARR_LENs; j++ )
         (arrays[i])(j) = .1235 * j;
   }

   ArrayXd res = ArrayXd(ARR_LENs);

   for ( int rep = 0; rep < REP_COUNT; rep ++ )
   {
      res = multis[0] * arrays[0];
      for ( int imulti = 1; imulti < MULTIS; imulti++ )
         res += multis[imulti] * arrays[imulti];
   }
#endif
}


Any hint is much appreciated.
Daniel
ggael
First, make sure to compile with -O2 -DNDEBUG. Second, if MULTIS is known at compile time and small enough, you should really write:

res = multis[0] * arrays[0] + multis[1] * arrays[1] + multis[2] * arrays[2] + multis[3] * arrays[3];

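Otherwise you cannot take advantage of expression templates, so the loop over imulti performs many more memory loads and stores to the res array: each `res += ...` statement is evaluated as its own full pass over res, whereas the single expression is evaluated in one pass.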
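
To make the difference in memory traffic concrete, here is a minimal sketch in plain C++ (no Eigen) of roughly what the two forms amount to as scalar loops, ignoring vectorization. The names `Counted`, `accumulate_per_statement` and `fused_single_pass` are hypothetical and exist only for this illustration; the `stores` counter is added purely to count writes to res.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type: the result plus a count of writes to res.
struct Counted {
    std::vector<double> res;
    std::size_t stores = 0;
};

// Roughly what
//    res = s[0] * a[0];  for (k = 1 ...) res += s[k] * a[k];
// amounts to: each statement is one full pass over res, so res is
// written a.size() times per element (and re-read on every pass after
// the first).
Counted accumulate_per_statement(const std::vector<double>& s,
                                 const std::vector<std::vector<double>>& a)
{
    assert(!a.empty() && s.size() == a.size());
    const std::size_t n = a[0].size();
    Counted out;
    out.res.resize(n);
    for (std::size_t j = 0; j < n; ++j) {          // res = s[0] * a[0]
        out.res[j] = s[0] * a[0][j];
        ++out.stores;
    }
    for (std::size_t k = 1; k < a.size(); ++k) {   // res += s[k] * a[k]
        assert(a[k].size() == n);
        for (std::size_t j = 0; j < n; ++j) {
            out.res[j] += s[k] * a[k][j];
            ++out.stores;
        }
    }
    return out;
}

// Roughly what the single expression
//    res = s[0]*a[0] + s[1]*a[1] + ... ;
// amounts to: one pass, the partial sum stays in a local variable,
// and res is written once per element.
Counted fused_single_pass(const std::vector<double>& s,
                          const std::vector<std::vector<double>>& a)
{
    assert(!a.empty() && s.size() == a.size());
    const std::size_t n = a[0].size();
    Counted out;
    out.res.resize(n);
    for (std::size_t j = 0; j < n; ++j) {
        double val = 0.0;
        for (std::size_t k = 0; k < a.size(); ++k)
            val += s[k] * a[k][j];
        out.res[j] = val;
        ++out.stores;
    }
    return out;
}
```

With 4 arrays, the first function writes every element of res 4 times and the second writes it once, which is the same access pattern as the hand-written C loop in the question.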

