With Matlab I am trying to evaluate differential entropies. These are integrals like
$$\int_\mathbb{R} p(x) \log (p(x)) \mathrm{d}x$$
where $p(x)$ is a probability density function. My $p(x)$ is computed by an external function. I tried to define
g = @(x) x.*log(x);and compute the entropy using
integral(@(x) g(p(x)),0,inf)(my $p(x)$ is defined on $[0;+\infty)$). But this fails due to NaN issues. However, even if I take the precaution to define g as an external function and set NaN outputs to zero by a conditional statement, the computation fails. Why?
Consider the following simple example:
I define g as an external function to avoid the $0 \log 0$ issue as follows:
function y = g(x)
if isnan(x.*log(x)) y = 0;
else y = x.*log(x);
endNow let's try to compute the entropy integral for $p(x) = e^{-x}$, which should obviously return $-1$. But it doesn't:
>> p = @(x) exp(-x);
>> integral(@(x) g(p(x)),0,inf)
Warning: Infinite or Not-a-Number value encountered.
> In funfun/private/integralCalc>iterateScalarValued at 349 In funfun/private/integralCalc>vadapt at 133 In funfun/private/integralCalc at 84 In integral at 89What is Matlab's problem?
$\endgroup$2 Answers
$\begingroup$The reason is because you've not defined your function g correctly in Matlab. The way it's written will not trap NaN cases. As per the documentation for the integral function:
For scalar-valued problems, the function y = fun(x) must accept a vector argument, x, and return a vector result, y. This generally means that fun must use array operators instead of matrix operators. For example, use .* (times) rather than * (mtimes). If you set the 'ArrayValued' option to true, then fun must accept a scalar and return an array of fixed size.
The integral function is vectorized and passes multiple values to your integrand function on each iteration. You've correctly vectorized x.*log(x), but your if statement that checks for NaN will only be triggered if the first element happens to be NaN. Using so-called logical indexing, you can rewrite g concisely as
function y = g(x)
y = x.*log(x);
y(isnan(y)) = 0;Then the following code
p = @(x)exp(-x);
integral(@(x)g(p(x)),0,Inf)now returns -1.000000000000000.
Also, though it may not be numerically scaled as well, you could use the mathematically equivalent g = @(x)log(x.^x); instead, which cannot evaluate to NaN.
MATLAB does not simplify your integrand - it treats the integrand exactly as specified. So in your first example, MATLAB tries to build a table of the function
$$ \exp(-x)\exp(-x)\log(\exp(-x)) $$
rather than $$ -x \exp(-x)\exp(-x) $$
as you would expect.
The first integrand will cause trouble as $x$ becomes very large due to floating point issues. For instance, if you get rid of all the function handles and just try
integral(@(x) exp(-2.*x).*log(exp(-x)),0, inf)
you get a Nan and the same message.
However, considering how rapidly your integral dies off,
integral(@(x) exp(-2.*x).*log(exp(-x)),0, 20)
does the job just fine. In fact, using an upper limit of 15 returns a double precision integral of -0.249999999999275, which is far more accurate than anything you might need.
Edited to add: If you want more technical details, for values of x more negative than a certain number, you cause an underflow - a number whose exponent in base 2 is smaller than -1022 cannot be represented in the usual exponent-mantissa format. Underflows can be rounded off to 0 automatically (this is probably what MATLAB does), so numbers like log(exp(-1400)) evaluate to log(0), throwing the NaN.
$\endgroup$ 1