Given observed samples $x$ from a distribution of interest, the goal of generative modeling is to learn a model of the true data distribution $p(x)$. Several families of generative models exist:
- GANs: model the sampling procedure of a complex distribution, learned in an adversarial manner.
- Likelihood-based models: seek to assign high likelihood to the observed data samples. Examples include autoregressive models, normalizing flows, and Variational Autoencoders (VAEs).
- Energy-based models learn an arbitrarily flexible energy function which is then normalized.
- Score-based models: learn the score function (the gradient of the log-probability) of an energy-based model directly, which sidesteps computing the intractable normalizing constant.
Diffusion models can be interpreted both as likelihood-based and as score-based models.
Let $x$ denote the observed data and $z$ a latent variable, modeled jointly by $p(x, z)$. The likelihood of the observed data can then be recovered in two ways:
a. Marginalization: $p(x) = \int p(x, z)\, dz$
b. Apply the chain rule of probability: $p(x) = \frac{p(x, z)}{p(z \mid x)}$
But this means we must either integrate out all latent variables $z$, which is intractable for complex models, or have access to the ground-truth posterior $p(z \mid x)$, which is equally intractable.
These two expressions nevertheless give us a proxy objective for optimizing the log-likelihood of the observed data, the Evidence Lower Bound (ELBO):
$$ \mathbb{E}_{q_\phi(z \mid x)} \left[ \log \frac{p(x, z)}{q_\phi(z \mid x)} \right] \leq \log p(x) $$
where $q_\phi(z \mid x)$ is a flexible approximate variational distribution over the latents, parameterized by $\phi$.
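The bound follows from the marginalization identity and Jensen's inequality:

$$ \log p(x) = \log \int p(x, z)\, dz = \log \mathbb{E}_{q_\phi(z \mid x)} \left[ \frac{p(x, z)}{q_\phi(z \mid x)} \right] \geq \mathbb{E}_{q_\phi(z \mid x)} \left[ \log \frac{p(x, z)}{q_\phi(z \mid x)} \right] $$

Equivalently, the ELBO can be written as $\log p(x) - D_{\mathrm{KL}}(q_\phi(z \mid x) \,\|\, p(z \mid x))$, so the gap in the bound is exactly the KL divergence between the approximate and true posteriors.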
We can then directly maximize the ELBO by optimizing for the best parameters $\phi$; the tighter the bound, the closer $q_\phi(z \mid x)$ is to the true posterior $p(z \mid x)$.
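As a concrete numerical check of the bound, here is a minimal sketch (not from these notes; the toy model and all names are illustrative assumptions) using a one-dimensional Gaussian latent-variable model where the evidence $p(x)$ is available in closed form. A Monte Carlo estimate of the ELBO stays strictly below $\log p(x)$ for a mismatched variational posterior, and becomes tight when $q(z \mid x)$ equals the true posterior:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_normal(x, mean, var):
    # Log-density of a univariate Gaussian N(mean, var).
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def elbo_estimate(x, q_mean, q_var, n_samples=100_000):
    """Monte Carlo estimate of E_{q(z|x)}[log p(x,z) - log q(z|x)].

    Toy model (an assumption for illustration):
      p(z)   = N(0, 1)   prior
      p(x|z) = N(z, 1)   likelihood
    so the exact marginal evidence is p(x) = N(0, 2).
    """
    z = q_mean + np.sqrt(q_var) * rng.standard_normal(n_samples)
    log_joint = log_normal(z, 0.0, 1.0) + log_normal(x, z, 1.0)
    log_q = log_normal(z, q_mean, q_var)
    return np.mean(log_joint - log_q)

x = 1.5
log_px = log_normal(x, 0.0, 2.0)  # exact log-evidence log p(x)

# A mismatched variational posterior gives a strict lower bound...
loose = elbo_estimate(x, q_mean=0.0, q_var=1.0)
# ...while the exact posterior p(z|x) = N(x/2, 1/2) makes the bound tight.
tight = elbo_estimate(x, q_mean=x / 2, q_var=0.5)

print(f"log p(x) = {log_px:.4f}, loose ELBO = {loose:.4f}, tight ELBO = {tight:.4f}")
```

With the exact posterior, every sample of $\log p(x, z) - \log q(z \mid x)$ equals $\log p(x)$, so the estimator has zero variance; the mismatched posterior pays exactly the KL gap described above.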