Registered Member
|
I'm new to Eigen and wrote two test cases to compare the time taken by double and float matrix multiplication. I found that the float version is slower than the double version.
The calculation is:
If the calculation is replaced with:
My question is: why is the float version slower than the double version, and how can I resolve it?
The link command looks like this (it seems there is no optimization):
Last edited by kde-roderick on Thu Oct 29, 2020 3:04 am, edited 1 time in total.
|
Registered Member
|
In another case, the float version is slower than the double version too.
Test Result:
gcc compile command:
CPU information:
processor : 31
vendor_id : GenuineIntel
cpu family : 6
model : 85
model name : Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
stepping : 4
microcode : 0x2006906
cpu MHz : 2999.998
cache size : 25344 KB
physical id : 0
siblings : 36
core id : 13
cpu cores : 18
apicid : 27
initial apicid : 27
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 ida arat
bogomips : 5999.99
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
|
Registered Member
|
You say you're using matrix multiplication, but your code performs element-wise multiplication. Is this intended?
Also, there's no reason why the float implementation would be consistently faster or slower than the double implementation in your tests. There are some interesting things to say about doubles and floats in high-performance computing, but the difference in your tests is more likely random noise. With a difference of 783 ms, or a factor of 1.3, measured from a single profiled run, I don't think you can really say it's slower. The variability I get when running the same code (similar to yours) over and over is about that large anyway. |
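To make the distinction concrete, here is a minimal sketch in plain C++ (standard library only, not the original poster's code) that contrasts the two operations on 2x2 matrices. The type `Mat2` and the functions `elementwise_product` and `matrix_product` are hypothetical names chosen for illustration. In Eigen itself, `*` between two matrix objects is the matrix product, while `cwiseProduct()` or `*` on `.array()` expressions is coefficient-wise.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper type for illustration: a 2x2 matrix stored row by row.
using Mat2 = std::array<std::array<double, 2>, 2>;

// Element-wise (coefficient-wise) product: c(i,j) = a(i,j) * b(i,j).
Mat2 elementwise_product(const Mat2& a, const Mat2& b) {
    Mat2 c{};
    for (std::size_t i = 0; i < 2; ++i) {
        for (std::size_t j = 0; j < 2; ++j) {
            c[i][j] = a[i][j] * b[i][j];
        }
    }
    return c;
}

// Matrix product: c(i,j) = sum over k of a(i,k) * b(k,j).
Mat2 matrix_product(const Mat2& a, const Mat2& b) {
    Mat2 c{};  // value-initialized, so every entry starts at 0.0
    for (std::size_t i = 0; i < 2; ++i) {
        for (std::size_t j = 0; j < 2; ++j) {
            for (std::size_t k = 0; k < 2; ++k) {
                c[i][j] += a[i][k] * b[k][j];
            }
        }
    }
    return c;
}
```

The two operations generally give different results and do different amounts of work (n^2 multiplications versus n^3 multiply-adds for n x n matrices), so a benchmark of one says little about the other.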
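One way to check whether a difference like this is noise is to time the same work several times and look at the spread rather than a single number. The following is a rough sketch under that assumption, using only `std::chrono` from the standard library. The names `time_runs_ms`, `summarize` and `TimingSummary` are hypothetical and not part of Eigen or of the code in this thread.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical summary of repeated timings, in milliseconds.
struct TimingSummary {
    double min_ms;
    double median_ms;
    double max_ms;
};

// Run `work` `repeats` times and record the wall-clock time of each run.
template <typename F>
std::vector<double> time_runs_ms(F&& work, std::size_t repeats) {
    std::vector<double> samples;
    samples.reserve(repeats);
    for (std::size_t r = 0; r < repeats; ++r) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto stop = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::milli>(stop - start).count());
    }
    return samples;
}

// Sort a copy of the samples and report the minimum, median and maximum.
// For an even number of samples the upper of the two middle values is used.
TimingSummary summarize(std::vector<double> samples) {
    if (samples.empty()) {
        return {0.0, 0.0, 0.0};
    }
    std::sort(samples.begin(), samples.end());
    return {samples.front(), samples[samples.size() / 2], samples.back()};
}
```

If the min-to-max spread of the float runs overlaps the spread of the double runs, a gap seen in one measurement is not good evidence that either type is slower. A call would look like `summarize(time_runs_ms([&] { /* code under test */ }, 20))`, with the repeat count of 20 being an arbitrary choice.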