Denoising Diffusion Probabilistic Models (DDPM)
VFDIFF is built upon the DDPM framework (Ho et al., 2020), which learns to generate data by progressively denoising a random signal. We adapt this for 2D vector fields $\mathbf{x} \in \mathbb{R}^{H \times W \times 2}$.
The process involves two Markov chains: a forward process that adds noise, and a learned reverse process that removes it.
Forward Process (Diffusion)
The forward process $q(\mathbf{x}_t | \mathbf{x}_{t-1})$ gradually adds Gaussian noise to the data according to a variance schedule $\beta_t$:

$$q(\mathbf{x}_t \mid \mathbf{x}_{t-1}) = \mathcal{N}\!\left(\mathbf{x}_t;\ \sqrt{1-\beta_t}\,\mathbf{x}_{t-1},\ \beta_t \mathbf{I}\right)$$
We can sample $\mathbf{x}_t$ directly from the initial data $\mathbf{x}_0$ using the closed-form property:

$$\mathbf{x}_t = \sqrt{\bar{\alpha}_t}\,\mathbf{x}_0 + \sqrt{1-\bar{\alpha}_t}\,\boldsymbol{\epsilon},$$

where $\alpha_t = 1 - \beta_t$, $\bar{\alpha}_t = \prod_{s=1}^t \alpha_s$, and $\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$.
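The closed-form sampling step can be sketched in a few lines of NumPy. The linear $\beta_t$ schedule and $T = 1000$ below are illustrative defaults from the DDPM literature, not values specified here:

```python
import numpy as np

# Linear variance schedule (illustrative choice; other schedules are possible)
T = 1000
betas = np.linspace(1e-4, 0.02, T)      # beta_t for t = 1..T
alphas = 1.0 - betas                    # alpha_t = 1 - beta_t
alpha_bars = np.cumprod(alphas)         # alpha_bar_t = prod_{s<=t} alpha_s

def q_sample(x0, t, rng=np.random.default_rng()):
    """Sample x_t ~ q(x_t | x_0) in closed form for a vector field x0 of shape (H, W, 2)."""
    eps = rng.standard_normal(x0.shape)  # epsilon ~ N(0, I)
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return x_t, eps

# Example: noise a random 64x64 2D vector field at timestep t = 500
x0 = np.random.default_rng(0).standard_normal((64, 64, 2))
x_t, eps = q_sample(x0, t=500)
```

Note that `alpha_bars` is monotonically decreasing, so the signal-to-noise ratio shrinks as $t$ grows, as the notation table below describes.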
Reverse Process (Generation)
The generative process $p_\theta(\mathbf{x}_{t-1} | \mathbf{x}_t)$ learns to invert the diffusion. A neural network $\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)$ is trained to predict the noise added at each step, yielding the denoising update

$$\mathbf{x}_{t-1} = \frac{1}{\sqrt{\alpha_t}}\left(\mathbf{x}_t - \frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}}\,\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)\right) + \sigma_t \mathbf{z}, \qquad \mathbf{z} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}),$$

where $\sigma_t^2 = \beta_t$ is a common choice for the reverse-process variance.
By starting with pure noise $\mathbf{x}_T \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$ and iteratively applying this denoising step, we generate a high-fidelity vector field $\mathbf{x}_0$.
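The sampling loop above can be sketched as follows. Here `eps_theta` is a hypothetical stand-in for the trained network (it returns zeros, so the output is not a meaningful sample), and $\sigma_t^2 = \beta_t$ is assumed:

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)      # illustrative linear schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def eps_theta(x_t, t):
    """Placeholder for the trained noise-prediction network epsilon_theta(x_t, t)."""
    return np.zeros_like(x_t)

def ddpm_sample(shape=(64, 64, 2), rng=np.random.default_rng(0)):
    x = rng.standard_normal(shape)       # x_T ~ N(0, I)
    for t in range(T - 1, -1, -1):
        z = rng.standard_normal(shape) if t > 0 else np.zeros(shape)
        coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
        mean = (x - coef * eps_theta(x, t)) / np.sqrt(alphas[t])
        x = mean + np.sqrt(betas[t]) * z  # sigma_t^2 = beta_t variance choice
    return x                              # approximate sample of x_0

field = ddpm_sample()
```

In practice `eps_theta` would be a U-Net or similar network conditioned on the timestep embedding.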
Mathematical Notation
| Symbol | Description |
|---|---|
| $\mathbf{x}_0$ | Original clean data (ground truth vector field). |
| $\mathbf{x}_t$ | Noisy data at timestep $t$. |
| $\beta_t$ | Variance schedule parameter (how much new noise is added). |
| $\alpha_t$ | Defined as $1 - \beta_t$. |
| $\bar{\alpha}_t$ | Cumulative product $\prod_{s=1}^t \alpha_s$. Determines the signal-to-noise ratio at timestep $t$. |
| $\boldsymbol{\epsilon}$ | Standard Gaussian noise $\mathcal{N}(\mathbf{0}, \mathbf{I})$. |
| $\boldsymbol{\epsilon}_\theta$ | Neural network predicting the noise component. |
Comparison Baseline: SVD
We compare our deep learning approach against Streamline Vector Diffusion (SVD), a classical numerical method.
SVD minimizes an energy functional $E = \iint \mu |\nabla \mathbf{v}|^2 + \dots$ to propagate flow outward from sparse streamlines. While effective for smooth interpolation, it cannot synthesize the complex non-linear structures that our DDPM learns from data.
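For intuition, the smoothness term of this energy can be minimized with a simple Jacobi relaxation: at convergence, each unconstrained cell equals the average of its neighbors (the discrete Laplace equation). This is a minimal sketch, not the baseline's actual implementation; `svd_relax` is a hypothetical name, and periodic boundaries are assumed for brevity:

```python
import numpy as np

def svd_relax(field, known_mask, n_iters=500):
    """Jacobi relaxation toward the minimizer of the smoothness term
    E = integral of mu * |grad v|^2, holding streamline values fixed.
    field: (H, W, 2) vector field; known_mask: (H, W) bool, True where
    streamline constraints are given."""
    v = field.copy()
    for _ in range(n_iters):
        # Average of the 4 neighbors (periodic boundaries via np.roll)
        avg = (np.roll(v, 1, 0) + np.roll(v, -1, 0) +
               np.roll(v, 1, 1) + np.roll(v, -1, 1)) / 4.0
        # Update free cells only; constrained cells keep their values
        v = np.where(known_mask[..., None], v, avg)
    return v
```

Because the update is purely local averaging, the result is always a smooth blend of the constraints, which is why this class of method cannot produce sharp or novel flow features away from the given streamlines.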