Continuous distribution entropy
One way to generalize the concept of entropy to continuous distributions is to introduce uncertainty when measuring a sample point. Without such measurement noise, a non-degenerate continuous distribution has infinite entropy: a sample point is an arbitrary real number, and specifying it exactly would require an unbounded amount of information.
To do so, a family of noise distributions is introduced, where \(\eta_x\) is the noise distribution associated with measuring the real number \(x\): when the true value is \(x\), the observed value is distributed according to \(\eta_x\).
It is then straightforward to obtain a formula for the entropy of a continuous distribution \(f\) by making use of the KL-divergence \(D_{\text{KL}}\). The entropy is the divergence of the observation distribution \(\eta_x\) from \(f\), averaged over sample points \(x\) drawn from \(f\):
$$H_\eta(f) = \int_{-\infty}^{\infty} \!\!\!\!\!{\small\text{d}x}\; f(x)\, D_{\text{KL}}(\eta_x\,||\,f)$$
This can be expanded to:
$$H_\eta(f) = \int_{-\infty}^{\infty} \!\!\!\!\!{\small\text{d}x}\; f(x) \int_{-\infty}^{\infty} \!\!\!\!\!{\small\text{d}y}\;\, \eta_x(y)\log\left(\frac{\eta_x(y)}{f(y)} \right)$$
By abuse of notation, when \(\eta\) is a single distribution, \(H_\eta\) is understood to use the family defined by \(\eta_x(y)=\eta(y-x)\).
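As a concrete illustration (a worked special case, with \(\sigma^2\) and \(\epsilon\) introduced here only as example parameters): take \(f=\mathcal{N}(0,\sigma^2)\) and \(\eta=\mathcal{N}(0,\epsilon)\), so that \(\eta_x=\mathcal{N}(x,\epsilon)\). The inner integral is then the KL-divergence between two normal distributions, \(D_{\text{KL}}(\eta_x\,||\,f)=\frac{1}{2}\left(\frac{\epsilon}{\sigma^2}+\frac{x^2}{\sigma^2}-1+\log\frac{\sigma^2}{\epsilon}\right)\), and averaging over \(x\sim f\) (so that \(x^2\) averages to \(\sigma^2\)) gives
$$H_\eta(f)=\frac{1}{2}\left(\frac{\epsilon}{\sigma^2}+\log\frac{\sigma^2}{\epsilon}\right),$$
which diverges as \(\epsilon\rightarrow 0\), in line with the remark above that noiseless measurement yields infinite entropy.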
The formula for differential entropy \(h\) can be recovered by taking limits of \(H\): either shrink Gaussian measurement noise to zero width, or let \(f\) itself play the role of the noise on an arbitrarily wide Gaussian:
$$h(f) \;=\; \lim_{\epsilon \rightarrow 0}\left(H_{\mathcal{N}_\epsilon}(f)+h(\mathcal{N}_\epsilon) \right)\;=\;\lim_{\epsilon \rightarrow \infty}\left(\frac{1}{2}\log(2\pi e\epsilon)-H_{f}(\mathcal{N}_\epsilon)\right)$$
where \(\mathcal{N}_\epsilon\) is the normal distribution with mean \(0\) and variance \(\epsilon\), whose differential entropy is \(h(\mathcal{N}_\epsilon)=\frac{1}{2}\log(2\pi e\epsilon)\).
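To see why the first limit recovers \(h\) (a sketch, assuming \(f\) is smooth enough that \(\log f\) is nearly constant across the width of the noise): for small \(\epsilon\),
$$\int_{-\infty}^{\infty} \!\!\!\!\!{\small\text{d}y}\;\, \mathcal{N}_\epsilon(y-x)\log f(y)\;\approx\;\log f(x),$$
so \(D_{\text{KL}}(\eta_x\,||\,f)\approx -h(\mathcal{N}_\epsilon)-\log f(x)\), and averaging over \(x\sim f\) gives \(H_{\mathcal{N}_\epsilon}(f)\approx h(f)-h(\mathcal{N}_\epsilon)\). In the Gaussian example above this is exact up to a vanishing term: \(H_{\mathcal{N}_\epsilon}(f)+h(\mathcal{N}_\epsilon)=\frac{1}{2}\left(\frac{\epsilon}{\sigma^2}+\log(2\pi e\sigma^2)\right)\rightarrow\frac{1}{2}\log(2\pi e\sigma^2)=h(f)\).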