
[v3.3.5] Performance issue with denormal (& inf) floats

zineks
I ran into a performance issue when using dot() (or prod().sum()) in Eigen 3.3.5. After some digging, I found that it is caused by denormal floating-point values.
Here is the benchmark I used:
Code:
#include <iostream>
#include <iomanip>
#include <chrono>
#include <cmath>
#include <limits>  // std::numeric_limits
#include <ratio>   // std::micro
using duration = std::chrono::duration<long double>;
template<typename E>
static inline duration measure_duration(E const& e)
{
    auto tp1 = std::chrono::steady_clock::now();
    e();
    auto tp2 = std::chrono::steady_clock::now();
    return tp2 - tp1;
}
#include <Eigen/Eigen>
template<typename F, unsigned D>
using vec = Eigen::Matrix<F, 1, D>;

template<typename F, unsigned D>
vec<F,D> gen_random_weights()
{
    vec<F,D> r = vec<F,D>::Random().array().abs(); return r / r.array().sum();
}
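// Values 10^(f*range + min_exponent10) with f in [-1, 1]: they straddle the
// smallest normal value of F, so many of them are denormal.
template<typename F, unsigned D>
vec<F,D> gen_very_small(F range = 8)
{
    return vec<F,D>::Random().unaryExpr([&](F f){return std::pow(F(10), f*range + std::numeric_limits<F>::min_exponent10);});
}
// Values 10^(f*range + max_exponent10) with f in [-1, 1]: they straddle the
// largest finite value of F, so many of them overflow to inf.
template<typename F, unsigned D>
vec<F,D> gen_very_large(F range = 8)
{
    return vec<F,D>::Random().unaryExpr([&](F f){return std::pow(F(10), f*range + std::numeric_limits<F>::max_exponent10);});
}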
template<typename F, unsigned D>
void benchmark()
{
    auto loop_in_eigen = [](duration& large_dur, duration& small_dur) {
        auto weights = gen_random_weights<F, D>();
        auto large = gen_very_large<F, D>();
        auto small = gen_very_small<F, D>();
        large_dur += measure_duration([&]{ return large.dot(weights); });
        small_dur += measure_duration([&]{ return small.dot(weights); });
    };
    auto loop_ex_eigen = [](duration& large_dur, duration& small_dur) {
        auto weights = gen_random_weights<F, D>();
        auto large = gen_very_large<F, D>();
        auto small = gen_very_small<F, D>();
        large_dur += measure_duration([&]{ F sum = 0; for (unsigned i = 0; i < D; ++i) sum += large(i) * weights(i); return sum; });
        small_dur += measure_duration([&]{ F sum = 0; for (unsigned i = 0; i < D; ++i) sum += small(i) * weights(i); return sum; });
    };
    duration large_dur_in_eigen{};
    duration small_dur_in_eigen{};
    duration large_dur_ex_eigen{};
    duration small_dur_ex_eigen{};
    for (int i = 0; i < std::micro::den; ++i) {
        loop_in_eigen(large_dur_in_eigen, small_dur_in_eigen);
        loop_ex_eigen(large_dur_ex_eigen, small_dur_ex_eigen);
    }
    std::cout << std::setw(9) << large_dur_in_eigen.count() << '|' << std::setw(9) << large_dur_ex_eigen.count() << '\t';
    std::cout << std::setw(9) << small_dur_in_eigen.count() << '|' << std::setw(9) << small_dur_ex_eigen.count() << '\n';
}

int main()
{
    std::cout << std::left;
    std::cout << std::setw(9) << "dot-large" << '|' << std::setw(9) << "raw-large" << '\t';
    std::cout << std::setw(9) << "dot-small" << '|' << std::setw(9) << "raw-small" << '\n';
    std::cout << std::setfill('0');
    benchmark<float, 128>();
    benchmark<double, 128>();
    benchmark<long double, 128>();
}
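
To make it concrete what the two generators feed into dot(), here is a minimal sketch without Eigen that applies the same formula to one explicit f and names the resulting floating-point category with std::fpclassify. The helper names category, very_small and very_large are hypothetical and exist only for this illustration.

```cpp
#include <cmath>
#include <limits>
#include <string>

// Name the floating-point category of x as reported by std::fpclassify.
template <typename F>
std::string category(F x)
{
    switch (std::fpclassify(x)) {
        case FP_SUBNORMAL: return "subnormal";
        case FP_ZERO:      return "zero";
        case FP_INFINITE:  return "inf";
        case FP_NAN:       return "nan";
        default:           return "normal";
    }
}

// Same formula as gen_very_small, but for a single explicit f in [-1, 1]
// instead of a random one (hypothetical helper).
template <typename F>
F very_small(F f, F range = 8)
{
    return std::pow(F(10), f * range + std::numeric_limits<F>::min_exponent10);
}

// Same formula as gen_very_large, for a single explicit f in [-1, 1]
// (hypothetical helper).
template <typename F>
F very_large(F f, F range = 8)
{
    return std::pow(F(10), f * range + std::numeric_limits<F>::max_exponent10);
}
```

For float, min_exponent10 is -37 and max_exponent10 is 38, so f = -0.5 gives 10^-41 (below the smallest normal float, hence subnormal) and f = 0.5 in very_large gives 10^42 (above the largest finite float, hence inf). Note that the same formula also yields subnormals for double, for example 10^-311 at f = -0.5.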


My computer is a new MacBook Pro. I build with CMake 3.12.2 in Release mode, with no additional flags.

Built with Clang 10.0.0, the console output is (total seconds over 10^6 iterations; rows are float, double, long double):
Code:
dot-large|raw-large     dot-small|raw-small
0.0699465|0.0446215     1.4991900|0.0448934
0.0444431|0.0445968     0.0441127|0.0443784
0.0438283|0.0452616     0.0445319|0.0455182

Built with GCC 8.2.0, the console output is:
Code:
dot-large|raw-large     dot-small|raw-small
0.0306550|0.0311240     0.0304260|0.0307660
0.0318250|0.0314800     0.0310810|0.0309640
0.0328490|0.0332040     0.0320400|0.0324230

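One caveat when reading these numbers: measure_duration discards the value returned by the timed lambda, and an optimizer is allowed to drop work whose result is never observed. Whether that happened here is not established. A hedged variant that hands the result back to the caller, so it can be accumulated and printed at the end, might look like this (timed_call and seconds_ld are hypothetical names, not part of the benchmark above):

```cpp
#include <chrono>
#include <utility>

using seconds_ld = std::chrono::duration<long double>;

// Time a callable and return both the elapsed time and the computed value,
// so the caller can keep the value observable (e.g. sum it and print it).
// Hypothetical helper for illustration only.
template <typename E>
auto timed_call(E const& e) -> std::pair<seconds_ld, decltype(e())>
{
    auto tp1 = std::chrono::steady_clock::now();
    auto result = e();
    auto tp2 = std::chrono::steady_clock::now();
    return {seconds_ld(tp2 - tp1), result};
}
```

A caller would add the second member of the returned pair to a running total and print that total after the loop. This does not guarantee that the compiler keeps the computation between the two clock reads, but it removes the simplest reason for the work to disappear.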

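So the combination Eigen 3.3.5 + Clang 10.0.0 + float + denormal (& inf) values shows a performance issue: dot() on the small values takes about 1.5 s, versus about 0.045 s for the hand-written loop.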
But why?

