
log sum exp

marton
Registered Member
Posts: 10
Karma: 0

log sum exp

Wed Apr 29, 2009 11:34 am
Hi,

I just started using Eigen recently and am really impressed by it, thanks a lot for this great piece of software! I use Eigen for a machine learning project, in which I need the log-sum-exp function quite often. Before coming to the question, please let me give a short introduction:

In Bayesian statistics, one often has to deal with very low probabilities. In order to avoid numerical underflow, the usual trick is to perform all calculations in the log domain. Let's say we've just calculated some discretized probability density and want to normalize it in order to make it sum up to 1:
Code: Select all
logp  = log(p);
p    /= sum(p);                //this is how you would normalize the pdf the usual way
logp -= log(sum(exp(logp)));   //...and this is how to do it in the log domain


This is one of the situations where the logsumexp function appears. Unfortunately, the calculation cannot be performed directly: the small coefficients in logp would underflow to 0 under exp(). Instead, one uses the following trick:
Code: Select all
function logsumexp(x):
  m = max(x);
  return log(sum(exp(x-m)))+m;
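
For what it's worth, a direct translation of that pseudocode to C++ with Eigen could look roughly like the sketch below (untested; it uses the array-based calls x.array(), .exp(), .maxCoeff(), so adjust to whatever API your Eigen version provides):
Code: Select all
#include <Eigen/Dense>
#include <cmath>

// Stable log-sum-exp of a vector: shift by the maximum so that the
// largest term in the sum is exp(0) = 1 and nothing underflows to 0.
double logsumexp(const Eigen::VectorXd& x)
{
    const double m = x.maxCoeff();
    return std::log((x.array() - m).exp().sum()) + m;
}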


Using this algorithm, the largest coefficient in the sum is exp(0)=1, so the sum never underflows and we get meaningful results. So far so good. Now for the question: what if we are interested in logsumexp(x+y), where x and y are huge vectors? Could Eigen's cost model determine which one of the following two possibilities is better:
Code: Select all
// 1. Calculate m on the fly and save memory
  m = max(x+y);
  return log(sum(exp(x+y-m)))+m;
// 2. Waste some memory on z but calculate it only once
  z = x+y;
  m = max(z);
  return log(sum(exp(z-m)))+m;


Maybe it would even make sense to take a chunk of the data that is slightly smaller than the cache, find and remember its maximum m1, compute s1 = sum(exp(chunk - m1)), then do the next chunk, and if the new maximum m2 is greater, rescale s1 accordingly, and so on. Just thinking...
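
That chunked idea could be sketched like this (again untested, just an illustration; the chunk size, the function name and the rescaling step are my own choices, nothing Eigen provides out of the box):
Code: Select all
#include <Eigen/Dense>
#include <algorithm>
#include <cmath>
#include <limits>

// Streaming log-sum-exp: process the vector in cache-sized chunks,
// keeping a running maximum m and a running sum s of exp(x_i - m).
// Whenever a later chunk raises the maximum, rescale s before adding to it.
double logsumexp_chunked(const Eigen::VectorXd& x, int chunk = 1024)
{
    double m = -std::numeric_limits<double>::infinity();
    double s = 0.0;
    const int n = static_cast<int>(x.size());
    for (int start = 0; start < n; start += chunk)
    {
        const int len = std::min(chunk, n - start);
        const double m2 = std::max(m, x.segment(start, len).maxCoeff());
        s = s * std::exp(m - m2)                               // rescale old terms
          + (x.segment(start, len).array() - m2).exp().sum();  // add the new chunk
        m = m2;
    }
    return std::log(s) + m;
}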
bjacob
Registered Member
Posts: 658
Karma: 3

RE: log sum exp

Wed Apr 29, 2009 12:19 pm
No, Eigen won't do this transformation for you. Actually, all reductions (like taking the sum or the max of all coeffs of a vector) are evaluated immediately; they are not lazy.

On the plus side, Eigen makes it easy to do precisely what you have in mind, for example this cache-friendly optimization. Indeed, once you have coded your logsumexp() function, making it work only on a segment of a vector is just a matter of doing .segment(start, size). You could even write a logsumexp() function that takes an Eigen expression instead of just a plain vector, and pass .segment(start, size) directly to it. And finally, if you take a small enough size, you could even make it a compile-time constant and use .segment<size>(start), and Eigen would automatically unroll the loop for you.
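
For example, a logsumexp() that accepts an arbitrary Eigen expression could look roughly like this (a sketch only, adapt the exact calls to your Eigen version; the MatrixBase signature is the usual way to accept expressions):
Code: Select all
#include <Eigen/Dense>
#include <cmath>

// Accepts any Eigen vector expression (x + y, a segment of it, ...) without
// forcing it into a temporary; its coefficients are recomputed on the fly.
template <typename Derived>
typename Derived::Scalar logsumexp(const Eigen::MatrixBase<Derived>& expr)
{
    const typename Derived::Scalar m = expr.maxCoeff();
    return std::log((expr.array() - m).exp().sum()) + m;
}

// Usage:
//   double a = logsumexp(x + y);                         // whole expression, no temporary
//   double b = logsumexp((x + y).segment(start, size));  // one chunk at a time
//   double c = logsumexp((x + y).segment<64>(start));    // compile-time size, unrollable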


Join us on Eigen's IRC channel: #eigen on irc.freenode.net
Have a serious interest in Eigen? Then join the mailing list!

