Continuous distribution entropy
One way to generalize the concept of entropy to continuous distributions is to introduce uncertainty when measuring a sample point. Without such measurement noise, a non-degenerate continuous distribution has infinite entropy: a sample point is an arbitrary real number, and specifying it exactly would require an unbounded amount of information.
To do so, a family of noise distributions is introduced, where \(\eta_x\) is the noise distribution associated with measuring the real number \(x\): when the true value is \(x\), the observed value is distributed according to \(\eta_x\).
It is then straightforward to obtain a formula for the entropy of a continuous distribution \(f\) by making use of the KL-divergence \(D_{\text{KL}}\). The entropy is the divergence of the observation distribution \(\eta_x\) from \(f\), averaged over sample points \(x\) drawn from \(f\):
$$H_\eta(f) = \int_{-\infty}^{\infty} \!\!\!\!\!{\small\text{d}x}\; f(x)\, D_{\text{KL}}(\eta_x\,||\,f)$$
This can be expanded to:
$$H_\eta(f) = \int_{-\infty}^{\infty} \!\!\!\!\!{\small\text{d}x}\; f(x) \int_{-\infty}^{\infty} \!\!\!\!\!{\small\text{d}y}\;\, \eta_x(y)\log\left(\frac{\eta_x(y)}{f(y)} \right)$$
By abuse of notation, when \(\eta\) is a single distribution, \(H_\eta\) is understood to use the family defined by \(\eta_x(y)=\eta(y-x)\).
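As a concrete illustration (a worked special case, with \(\sigma^2\) and \(\epsilon\) introduced here only as example parameters): take \(f=\mathcal{N}(0,\sigma^2)\) and \(\eta=\mathcal{N}(0,\epsilon)\), so that \(\eta_x=\mathcal{N}(x,\epsilon)\). The inner integral is then the KL-divergence between two normal distributions, \(D_{\text{KL}}(\eta_x\,||\,f)=\frac{1}{2}\left(\frac{\epsilon}{\sigma^2}+\frac{x^2}{\sigma^2}-1+\log\frac{\sigma^2}{\epsilon}\right)\), and averaging over \(x\sim f\) (so that \(x^2\) averages to \(\sigma^2\)) gives
$$H_\eta(f)=\frac{1}{2}\left(\frac{\epsilon}{\sigma^2}+\log\frac{\sigma^2}{\epsilon}\right),$$
which diverges as \(\epsilon\rightarrow 0\), in line with the remark above that noiseless measurement yields infinite entropy.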
The formula for differential entropy \(h\) can be recovered by taking limits of \(H\): either shrink Gaussian measurement noise to zero width, or let \(f\) itself play the role of the noise on an arbitrarily wide Gaussian:
$$h(f) \;=\; \lim_{\epsilon \rightarrow 0}\left(H_{\mathcal{N}_\epsilon}(f)+h(\mathcal{N}_\epsilon) \right)\;=\;\lim_{\epsilon \rightarrow \infty}\left(\frac{1}{2}\log(2\pi e\epsilon)-H_{f}(\mathcal{N}_\epsilon)\right)$$
where \(\mathcal{N}_\epsilon\) is the normal distribution with mean \(0\) and variance \(\epsilon\), whose differential entropy is \(h(\mathcal{N}_\epsilon)=\frac{1}{2}\log(2\pi e\epsilon)\).
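To see why the first limit recovers \(h\) (a sketch, assuming \(f\) is smooth enough that \(\log f\) is nearly constant across the width of the noise): for small \(\epsilon\),
$$\int_{-\infty}^{\infty} \!\!\!\!\!{\small\text{d}y}\;\, \mathcal{N}_\epsilon(y-x)\log f(y)\;\approx\;\log f(x),$$
so \(D_{\text{KL}}(\eta_x\,||\,f)\approx -h(\mathcal{N}_\epsilon)-\log f(x)\), and averaging over \(x\sim f\) gives \(H_{\mathcal{N}_\epsilon}(f)\approx h(f)-h(\mathcal{N}_\epsilon)\). In the Gaussian example above this is exact up to a vanishing term: \(H_{\mathcal{N}_\epsilon}(f)+h(\mathcal{N}_\epsilon)=\frac{1}{2}\left(\frac{\epsilon}{\sigma^2}+\log(2\pi e\sigma^2)\right)\rightarrow\frac{1}{2}\log(2\pi e\sigma^2)=h(f)\).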