Registered Member
|
Hi All,
After profiling my code, I found that the function GaussianLogsumRows (shown below in code) is called many times inside the EM algorithm and is where most of the time is spent. After reading the Eigen docs, I understand that writing it this way should give Eigen more opportunities for optimization through expression templates and lazy evaluation. My question is whether there is a still better way of writing such expressions that would give Eigen even more opportunities for optimization. Any advice is much appreciated!
Sincerely, Parmeet |
Moderator
|
Something like this should be faster:
m_Sumik = (v_logPiek_-0.5*v_Rl_.transpose()*((m_Sigma2kl_.array().log() + m_Mukl2_.array()/m_Sigma2kl_.array()).matrix())).eval().transpose().replicate(nbSample_,1)
    - (0.5*m_Uil2_)*(m_Sigma2kl_.array().inverse()).matrix().transpose()
    + m_Uil1_*((m_Mukl_.array()/m_Sigma2kl_.array()).matrix().transpose());

Also make sure to use a Vector* type whenever possible instead of a Matrix* type with 1 column or 1 row. Knowing the types of v_Rl_, m_Mukl2_, m_Uil1_, etc. would help to refine the expression further.

You can also decompose this expression using a few temporaries, which are created anyway:

tmp1 = m_Sigma2kl_.array().log() + m_Mukl2_.array()/m_Sigma2kl_.array();
tmp2 = v_logPiek_;
tmp2 -= 0.5*v_Rl_.transpose()*tmp1;
m_Sumik = tmp2.transpose().colwise().replicate(nbSample_)
    - (0.5*m_Uil2_)*(m_Sigma2kl_.array().inverse()).matrix().transpose()
    + m_Uil1_*((m_Mukl_.array()/m_Sigma2kl_.array()).matrix().transpose()); |
Registered Member
|
Thanks for your answer. Yes, I gained a good speedup by removing unnecessary and repeated operations from my expression.
To my surprise, "replicate" performs worse than multiplying the vector by an appropriate vector of ones. That is, v_sumikmax*VectorReal::Ones(nbRowCluster_).transpose() is much faster than v_sumikmax.replicate(1,nbRowCluster_), where v_sumikmax is a vector. Any ideas? |
Moderator
|
Very strange. Is it nested inside a more complex expression?
|
Registered Member
|
The expressions are shown below in the code. The second expression is quite expensive to compute.
|
Moderator
|
Indeed, the fix is to use:
v_sumikmax.rowwise().replicate(nbRowCluster_)
With this it becomes faster than the product. This version avoids a very expensive modulo when addressing the elements. |
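To illustrate the modulo remark, here is a toy model in plain C++ of how the two kinds of replication could look up an element. It is not Eigen's implementation; the function names and the std::vector representation are assumptions made for illustration only. A general tiling has to wrap the output index back into the source block, while a row-wise replication of a column vector can use the row index directly.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only, not Eigen code. v plays the role of a column vector.

// General tiling: the output may have more rows than the source, so the row
// index is wrapped with an integer modulo on every element access.
double generalReplicateAt(const std::vector<double>& v, std::size_t i, std::size_t j) {
    assert(!v.empty());
    (void)j;  // single source column: every output column maps to column 0
    return v[i % v.size()];
}

// Row-wise replication: the row count is unchanged, so the row index is used
// as is and no modulo is needed.
double rowwiseReplicateAt(const std::vector<double>& v, std::size_t i, std::size_t j) {
    assert(i < v.size());
    (void)j;  // every output column is a copy of v
    return v[i];
}
```

When the output has the same number of rows as the source, both functions return the same value for every (i, j); the second simply skips the per-element division that the modulo implies.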
Registered Member
|
Yes, it's better than the previous version, but it still performs worse than the product version.
|
Moderator
|
Oh indeed, the replicate expression is not vectorized yet. This has to be fixed.
|