New to Eigen3, simple example not vectorizing at all.

sayguh (Wed Feb 01, 2017 2:38 am):

I apologize for the simple example/question. I've looked around Stack Overflow and the forum a little for similar examples but haven't had much luck.

I am just playing around with Eigen3 and wanted to test an FIR filter using the fir_double_h method from dspguru. Since most of the work is dot products, I expected it to vectorize really well; however, my speed tests show a slowdown when compiled with -march=native.

My code is at the bottom; here is the output with the different compiler options.

    ylb@Atlas:~/tmp$ g++ -std=c++11 -O3 -I/usr/include/eigen3 test.cc -o speedtest
    ylb@Atlas:~/tmp$ ./speedtest
    163.656ms
    I'm outputing this so that the compiler doesn't outsmart me: 1.94693e+16

Then with -march=native, where I would expect a significant speedup:

    ylb@Atlas:~/tmp$ g++ -std=c++11 -O3 -I/usr/include/eigen3 -march=native test.cc -o speedtest
    ylb@Atlas:~/tmp$ ./speedtest
    173.98ms
    I'm outputing this so that the compiler doesn't outsmart me: 1.94693e+16

Clearly I am misunderstanding something about Eigen, gcc, or the vectorization process. Any tips before I start making more complicated Eigen3-based libraries?

Code:

    #include <Eigen/Dense>
    #include <vector>
    #include <numeric>
    #include <iostream>
    #include <chrono>

    using namespace std;
    using namespace Eigen;

    int main()
    {
        int numTaps = 1024;
        int numSamples = 10000000;

        // Create random input
        vector<float> input(numSamples);
        generate(input.begin(), input.end(), rand);

        // Generate taps, then create double taps, a vector of taps twice.
        VectorXf taps = VectorXf::Random(numTaps);
        VectorXf doubleTaps;
        doubleTaps.resize(2*numTaps);
        doubleTaps.head(numTaps) = taps;
        doubleTaps.tail(numTaps) = taps;

        // The delay line
        VectorXf delay = VectorXf::Zero(numTaps);

        float tot = 0;
        int state = 0;
        sleep(0.0);

        auto begin = chrono::high_resolution_clock::now();

        // I would expect this to vectorize really well.
        // The bulk of the computation is a dot product.
        for (const float &i : input)
        {
            delay[state] = i;
            tot += doubleTaps.segment(numTaps - state, numTaps).dot(delay);
            if (--state < 0)
                state += numTaps;
        }

        auto end = chrono::high_resolution_clock::now();
        cerr << chrono::duration_cast<chrono::nanoseconds>(end-begin).count()/(10e6) << "ms" << endl;
        cout << "I'm outputing this so that the compiler doesn't outsmart me: " << tot << endl;
        return 0;
    }
sayguh (Wed Feb 01, 2017 11:58 am):

I just wanted to add that my original post was using Eigen 3.2.5 installed from the Ubuntu 14.04 repository. Installing Eigen 3.3 from source yields more positive results.

Using Eigen 3.3 without -march=native:

    111.006ms
    I'm outputing this so that the compiler doesn't outsmart me: 1.94693e+16

Using Eigen 3.3 with -march=native:

    71.2423ms
    I'm outputing this so that the compiler doesn't outsmart me: 1.94694e+16

Using Eigen 3.2 without -march=native:

    162.678ms
    I'm outputing this so that the compiler doesn't outsmart me: 1.94693e+16

Using Eigen 3.2 with -march=native:

    175.795ms
    I'm outputing this so that the compiler doesn't outsmart me: 1.94693e+16
ggael, Moderator (Wed Feb 01, 2017 1:15 pm):

The version without -march=native is already vectorized using SSE2, so at best, if your CPU supports AVX and you are using Eigen 3.3, you could expect a 2x gain with -march=native.

Moreover, with gcc, it is a bad idea to benchmark code within the main() function. For some unknown reason, gcc usually does weird things there.

Here is a more proper version with better guarantees on reproducibility:

Code:

    #include <iostream>
    #include <vector>
    #include <Eigen/Dense>
    #include <bench/BenchTimer.h>

    using namespace Eigen;
    using namespace std;

    EIGEN_DONT_INLINE
    int foo(vector<float> &input, VectorXf &delay, VectorXf& doubleTaps)
    {
        int numTaps = doubleTaps.size()/2;
        float tot = 0;
        int state = 0;

        // I would expect this to vectorize really well.
        // The bulk of the computation is a dot product.
        for (const float &i : input)
        {
            delay[state] = i;
            tot += doubleTaps.segment(numTaps - state, numTaps).dot(delay);
            if (--state < 0)
                state += numTaps;
        }
        return tot;
    }

    int main()
    {
        int tries = 2;
        int rep = 1;
        BenchTimer t;

        int numTaps = 1024;
        int numSamples = 10000000;

        // Create random input
        vector<float> input(numSamples);
        generate(input.begin(), input.end(), rand);

        // Generate taps, then create double taps, a vector of taps twice.
        VectorXf taps = VectorXf::Random(numTaps);
        VectorXf doubleTaps;
        doubleTaps.resize(2*numTaps);
        doubleTaps.head(numTaps) = taps;
        doubleTaps.tail(numTaps) = taps;

        VectorXf delay = VectorXf::Zero(numTaps);
        float tot = 0;

        BENCH(t, tries, rep, tot += foo(input, delay, doubleTaps));

        std::cout << "Time: " << t.best() << "s (" << tot << ")" << std::endl;
    }
ggael, Moderator (Wed Feb 01, 2017 1:18 pm):

BTW, 10e6 == 10^7, not 10^6.
sayguh (Wed Feb 01, 2017 10:18 pm):

ggael wrote:
    BTW, 10e6 == 10^7 not 10^6

What's a factor of 10 between friends! Haha.

Thanks for the quick reply. That makes a lot more sense now and explains the roughly 2x speedup I saw when using Eigen 3.3. I'll have to take a look at the bench utility you used; it seems very helpful and less error-prone than anything I'd come up with.

Thanks again for the info.