Strange dot product performance with gcc

hijokpayne
I am getting unusually slow performance with gcc 4.8.2 compared to clang 3.4.2 for the following piece of code:

Code:
#include <Eigen/Geometry>
#include <iostream>

int main(int argc, char *argv[]) {
  unsigned long int acc = 0;

  // Accumulate the dot product over a 2000^3 grid of integer points.
  for (int z = 0; z < 2000; ++z)
    for (int y = 0; y < 2000; ++y)
      for (int x = 0; x < 2000; ++x) {
        acc += Eigen::Vector3i(1, 1337, 345637).dot(Eigen::Vector3i(x, y, z));
      }

  // Compare against the expected total.
  if (acc != 2774412100000000000ul)
    std::cout << "Incorrect\n";
  return 0;
}


Compile Flags: -O3 -DNDEBUG -march=native

Clang 3.4.2 runtime : real 0m0.003s user 0m0.003s sys 0m0.000s
Gcc 4.8.2 runtime : real 0m2.247s user 0m2.248s sys 0m0.000s

Any insight into why there is such a big difference?
ggael
This has nothing to do with Eigen. It is simply that in your example the compiler can aggressively eliminate the loops. For instance, you can remove the inner loop by hand as follows:
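Code:
for (unsigned long z = 0; z < 2000; ++z)
{
  // z term summed over all (x, y) pairs, plus the x term summed over all (x, y) pairs
  acc += (345637ul*2000ul*2000ul) * z + 1999000ul * 2000ul;
  // y term, already summed over the 2000 values of x
  for (unsigned long y = 0; y < 2000; ++y)
    acc += (1337ul*2000ul) * y;
}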

This version runs 2000 times fewer iterations, so it is obviously about 2000x faster! gcc 4.8 does not perform this optimization. Of course, you can go even further and remove all the loops, but apparently gcc 4.9 does not go that far and only removes the innermost one. Clang does go that far and computes 'acc' at compile time, which leads to the following generated code, essentially a no-op:
Code:
   .section   __TEXT,__text,regular,pure_instructions
   .globl   _main
   .align   4, 0x90
_main:                                  ## @main
   .cfi_startproc
## BB#0:
   pushq   %rbp
Ltmp2:
   .cfi_def_cfa_offset 16
Ltmp3:
   .cfi_offset %rbp, -16
   movq   %rsp, %rbp
Ltmp4:
   .cfi_def_cfa_register %rbp
   xorl   %eax, %eax
   popq   %rbp
   ret
   .cfi_endproc


.subsections_via_symbols
hijokpayne
Thanks ggael.

