Registered Member
Hi, I am trying to speed up matrix multiplication using multiple cores. I enabled OpenMP and ran some benchmarks. I found that Eigen's default multiplication speeds up well with 4 cores, but with 8 or 12 cores it is only marginally faster than with 4 (on rare occasions even slower). There must be some way to improve the efficiency.
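One way to make "speeds up well" versus "only marginally better" concrete is to report the speedup S(p) = T(1)/T(p) and the parallel efficiency E(p) = S(p)/p for each thread count p. Below is a minimal, standard-library-only sketch of such a measurement. The Matrix type and all function names are hypothetical, and it times a naive row-partitioned product rather than Eigen's kernel. With Eigen itself, one would instead call Eigen::setNbThreads(p) before timing `C.noalias() = A * B;`.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

// Minimal row-major dense matrix (hypothetical helper, not an Eigen type).
struct Matrix {
    std::size_t rows, cols;
    std::vector<double> data;
    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
    double& operator()(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    double operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// C = A * B with the rows of C split into contiguous chunks, one chunk per
// thread. Threads write disjoint rows of C, so no locking is needed.
void multiply_parallel(const Matrix& A, const Matrix& B, Matrix& C, unsigned nthreads) {
    assert(A.cols == B.rows && C.rows == A.rows && C.cols == B.cols && nthreads > 0);
    auto worker = [&](std::size_t row_begin, std::size_t row_end) {
        for (std::size_t i = row_begin; i < row_end; ++i) {
            for (std::size_t j = 0; j < B.cols; ++j) C(i, j) = 0.0;
            for (std::size_t k = 0; k < A.cols; ++k) {
                const double a = A(i, k);
                for (std::size_t j = 0; j < B.cols; ++j) C(i, j) += a * B(k, j);
            }
        }
    };
    const std::size_t chunk = (A.rows + nthreads - 1) / nthreads;
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < nthreads; ++t) {
        const std::size_t begin = t * chunk;
        if (begin >= A.rows) break;  // more threads than rows: stop early
        pool.emplace_back(worker, begin, std::min(A.rows, begin + chunk));
    }
    for (std::thread& th : pool) th.join();
}

// Wall-clock seconds for one multiply_parallel call with nthreads threads.
double time_multiply(const Matrix& A, const Matrix& B, Matrix& C, unsigned nthreads) {
    const auto start = std::chrono::steady_clock::now();
    multiply_parallel(A, B, C, nthreads);
    const std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

// Speedup S(p) = T(1) / T(p); efficiency E(p) = S(p) / p.
double speedup(double t1, double tp) { return t1 / tp; }
double efficiency(double t1, double tp, unsigned p) { return speedup(t1, tp) / p; }
```

For example, if T(1) = 8 s and T(4) = 2 s (illustrative numbers, not measurements), then S(4) = 4 and E(4) = 1.0; the same 2 s at p = 8 would give E(8) = 0.5, which is the kind of drop described above.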
I read some literature and decided to try implementing the OpenMP threading myself. Double buffering can be used to hide the latency of moving data from SDRAM to local memory. My questions are: 1. Does Eigen already implement double buffering (and if so, why is the efficiency not better)? 2. Is it possible to control the low-level buffering behavior and still call Eigen to benefit from it? BTW, I am not an expert on optimization; I have only just read a few papers. Thanks.
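The double-buffering idea can be sketched as follows. This is an illustration under assumptions, not a description of how Eigen works. B is consumed in panels of kc rows, and while the current panel is used for computation, the next one is copied into a second buffer on another thread via std::async. The hypothetical load_panel function stands in for the SDRAM-to-local-memory transfer. On an ordinary multi-core CPU there is no explicitly managed local memory, so this copy only mimics what a DMA engine would do on hardware that has one. All names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <future>
#include <vector>

// Copies rows [k0, k0 + kc) of row-major B (K x N) into buf.
// Stands in for the "SDRAM -> local memory" transfer (hypothetical helper).
void load_panel(const std::vector<double>& B, std::size_t N, std::size_t k0,
                std::size_t kc, std::vector<double>& buf) {
    assert(buf.size() >= kc * N);
    std::copy(B.begin() + k0 * N, B.begin() + (k0 + kc) * N, buf.begin());
}

// C (M x N) += A (M x K) * B (K x N), all row-major. B is processed in panels
// of kc rows; while panel k is used, panel k+1 is loaded into the other buffer.
void multiply_double_buffered(const std::vector<double>& A, const std::vector<double>& B,
                              std::vector<double>& C, std::size_t M, std::size_t K,
                              std::size_t N, std::size_t kc) {
    assert(kc > 0 && A.size() == M * K && B.size() == K * N && C.size() == M * N);
    std::vector<double> buf[2] = {std::vector<double>(kc * N), std::vector<double>(kc * N)};
    load_panel(B, N, 0, std::min(kc, K), buf[0]);  // the first load is not overlapped
    int cur = 0;
    for (std::size_t k0 = 0; k0 < K; k0 += kc) {
        const std::size_t kcur = std::min(kc, K - k0);
        const std::size_t k1 = k0 + kcur;
        std::future<void> next;
        if (k1 < K)  // start filling the other buffer with the next panel
            next = std::async(std::launch::async, load_panel, std::cref(B), N, k1,
                              std::min(kc, K - k1), std::ref(buf[1 - cur]));
        // Compute with the current panel while the next load is in flight.
        const std::vector<double>& P = buf[cur];
        for (std::size_t i = 0; i < M; ++i)
            for (std::size_t p = 0; p < kcur; ++p) {
                const double a = A[i * K + k0 + p];
                for (std::size_t j = 0; j < N; ++j) C[i * N + j] += a * P[p * N + j];
            }
        if (next.valid()) next.wait();  // the next panel must be ready before swapping
        cur = 1 - cur;
    }
}
```

The two buffers are never touched by both threads at once: the computation reads buf[cur] while the loader writes buf[1 - cur], and the roles swap only after the load has finished.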
Moderator
In my experience, Eigen's matrix product scales well up to 16 cores. However, you should not count on hyper-threading. For instance, if you have 6 physical cores (12 logical cores with hyper-threading), then you should not run your code with more than 6 threads. This is because matrix products already keep the arithmetic units almost 100% busy, so a second hardware thread on the same core has little left to use.
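A minimal sketch of picking the thread count along these lines follows. It assumes the caller knows the SMT factor (2-way hyper-threading in the example), because the C++ standard library only reports logical processors through std::thread::hardware_concurrency(). The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>

// One worker thread per physical core for a compute-bound kernel such as a
// matrix product. smt_ways is an ASSUMPTION supplied by the caller
// (2 for typical hyper-threading, 1 if SMT is disabled).
unsigned physical_core_threads(unsigned logical, unsigned smt_ways) {
    assert(smt_ways > 0);
    return std::max(1u, logical / smt_ways);  // never return 0 threads
}

// Convenience wrapper for the current machine, assuming 2-way SMT.
// hardware_concurrency() may return 0 when the value cannot be determined.
unsigned default_matmul_threads() {
    return physical_core_threads(std::thread::hardware_concurrency(), 2);
}
```

The result could then be passed to Eigen::setNbThreads(n), or exported as OMP_NUM_THREADS before launching the program.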
Regarding double buffering, I'm not sure that it is useful for matrix products, which already have to implement advanced cache management, even in the sequential case. Nevertheless, feel free to give it a try, and ask here if something in the code is unclear. Also, I'd be interested in a good reference to learn more about double buffering. EDIT: here is a measurement plot with 8 real cores: https://plafrim.bordeaux.inria.fr/lib/e ... _16_02.png
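To give an idea of what cache management means for a sequential product, here is a simple loop-tiling sketch: C += A * B for square row-major matrices, processed in bs x bs tiles so that the tiles currently being touched can stay in cache. This shows only the basic idea, with hypothetical names. Eigen's actual kernel is considerably more elaborate; for example, it also packs blocks of the operands into contiguous buffers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// C (n x n) += A (n x n) * B (n x n), row-major, traversed in bs x bs tiles.
void multiply_blocked(const std::vector<double>& A, const std::vector<double>& B,
                      std::vector<double>& C, std::size_t n, std::size_t bs) {
    assert(bs > 0 && A.size() == n * n && B.size() == n * n && C.size() == n * n);
    for (std::size_t ii = 0; ii < n; ii += bs)
        for (std::size_t kk = 0; kk < n; kk += bs)
            for (std::size_t jj = 0; jj < n; jj += bs) {
                // Edge tiles may be smaller than bs x bs.
                const std::size_t i_end = std::min(ii + bs, n);
                const std::size_t k_end = std::min(kk + bs, n);
                const std::size_t j_end = std::min(jj + bs, n);
                for (std::size_t i = ii; i < i_end; ++i)
                    for (std::size_t k = kk; k < k_end; ++k) {
                        const double a = A[i * n + k];
                        for (std::size_t j = jj; j < j_end; ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
            }
}
```

The tile size bs would be tuned so that roughly three bs x bs tiles of doubles fit in the targeted cache level.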