Registered Member
|
I wrote a minimal program in which I multiply a VectorXcd v1 by a scalar g, which is either a double or a std::complex<double>, and store the result in a VectorXcd v2. Each vector has 1024 elements. I ran this operation 2E+7 times and found that multiplying by a double takes about 10 seconds, whereas multiplying by a std::complex<double> takes about 25 seconds on a 4 GHz processor. I would like to know why multiplying by a std::complex<double> is so much more expensive, and whether there is a way to improve this part of the code.
I am compiling with a number of optimization flags turned on. For clarity, here is the full g++ command line:
g++ -DNDEBUG -O3 -pg -o prog_name prog_name.cpp -I./ -I/usr/include/eigen3 -L/usr/lib64/openmpi/lib -lgsl -lgslcblas -ftree-vectorize -ffast-math -fassociative-math -funroll-loops -mfpmath=sse
Thanks in advance,
Athreya
Moderator
|
Multiplying a complex by a real requires 2 multiplications, whereas multiplying two complex numbers requires 4 multiplications and 2 additions.
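To make the operation counts concrete, here is a minimal sketch that writes out both products by hand. The helper names mul_real and mul_complex are hypothetical, chosen only for illustration; this is not how Eigen implements the products internally, just the per-element arithmetic involved.

```cpp
#include <complex>

using cd = std::complex<double>;

// Hypothetical helper: complex times real.
// (a + ib) * g = (a*g) + i(b*g)  ->  2 multiplications, 0 additions.
cd mul_real(const cd& z, double g) {
    return cd(z.real() * g, z.imag() * g);
}

// Hypothetical helper: complex times complex.
// (a + ib)(c + id) = (ac - bd) + i(ad + bc)
//   ->  4 multiplications and 2 additions (one of them a subtraction).
cd mul_complex(const cd& z, const cd& g) {
    const double a = z.real(), b = z.imag();
    const double c = g.real(), d = g.imag();
    return cd(a * c - b * d, a * d + b * c);
}
```

For example, mul_real applied to 1 + 2i and 3 gives 3 + 6i, while mul_complex applied to 1 + 2i and 3 + 4i gives -5 + 10i.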
|
Registered Member
|
Thank you for your reply.
I was wondering whether the slow-down comes from the number of times the elements of the VectorXcd are accessed during complex multiplication, rather than from the number of arithmetic operations in the multiplication itself. I ask because if I perform the same number of complex multiplications on a single complex variable (instead of a VectorXcd) and store the result in another complex variable (instead of a VectorXcd), the code is pretty much as fast as the equivalent multiplications with a double. In any case, I would still like to know whether there is any way to speed this up: a better expression, optimization flags, or anything else you think might help.
Thanks in advance,
Athreya
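For reference, the two cases under discussion can be sketched as plain loops. This is only an illustration: std::vector<std::complex<double>> stands in for VectorXcd (an assumption, so that the example needs only the standard library), the function names are hypothetical, and Eigen's own vectorized kernels are not reproduced here. Both loops read each element of v1 once and write each element of v2 once, so in this sketch they differ only in the arithmetic per element.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;

// Hypothetical stand-in for "v2 = v1 * g" with a real scalar g.
// Per element: 1 complex load, 2 multiplications, 1 complex store.
void scale_by_real(const std::vector<cd>& v1, double g, std::vector<cd>& v2) {
    v2.resize(v1.size());
    for (std::size_t i = 0; i < v1.size(); ++i) {
        v2[i] = cd(v1[i].real() * g, v1[i].imag() * g);
    }
}

// Hypothetical stand-in for "v2 = v1 * g" with a complex scalar g.
// Per element: 1 complex load, 4 multiplications, 2 additions, 1 complex store.
void scale_by_complex(const std::vector<cd>& v1, cd g, std::vector<cd>& v2) {
    v2.resize(v1.size());
    const double c = g.real(), d = g.imag();  // scalar parts read once, outside the loop
    for (std::size_t i = 0; i < v1.size(); ++i) {
        const double a = v1[i].real(), b = v1[i].imag();
        v2[i] = cd(a * c - b * d, a * d + b * c);
    }
}
```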