“but as for me, I am tormented with an everlasting itch for things remote” — *Moby Dick*

Connections to Machine Learning
$$ \nonumber \newcommand{\xb}{\boldsymbol{x}} \newcommand{\yb}{\boldsymbol{y}} \newcommand{\zb}{\boldsymbol{z}} \newcommand{\thetab}{\boldsymbol{\theta}} \newcommand{\grad}{\nabla} \newcommand{\RR}{\mathbb{R}} $$
In my previous post, I introduced proximal gradient descent and explained how the “geometry”, meaning the metric used in the proximal scheme, lets us define gradient flows on arbitrary metric spaces. This idea matters in statistical mechanics because analyzing the Fokker-Planck equation naturally leads to a gradient flow in the Wasserstein metric.
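
As a quick reminder of what such a scheme looks like, here is a minimal sketch. The symbols are assumptions chosen for illustration and may not match the notation of the earlier post: $F$ is the function being minimized, $\tau > 0$ is a step size, $d$ is the metric, and $x_k$ is the current iterate.

$$ x_{k+1} = \operatorname*{arg\,min}_{x} \left\{ F(x) + \frac{1}{2\tau}\, d(x, x_k)^2 \right\} $$

When $d$ is the Euclidean distance and $F$ is smooth, setting the gradient of the bracketed objective to zero gives $x_{k+1} = x_k - \tau \nabla F(x_{k+1})$, which is an implicit Euler step for $\dot{x} = -\nabla F(x)$. Swapping in the 2-Wasserstein distance $W_2$ between probability densities gives the analogous step

$$ \rho_{k+1} = \operatorname*{arg\,min}_{\rho} \left\{ F(\rho) + \frac{1}{2\tau}\, W_2(\rho, \rho_k)^2 \right\} $$

which is the form of the scheme that Jordan, Kinderlehrer, and Otto used to connect the Fokker-Planck equation to a Wasserstein gradient flow.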

Wasserstein Gradient Flows and the Fokker-Planck Equation (Part I)
$$ \nonumber \newcommand{\xb}{\boldsymbol{x}} \newcommand{\yb}{\boldsymbol{y}} \newcommand{\grad}{\nabla} \newcommand{\RR}{\mathbb{R}} $$
The connection between partial differential equations arising in chemical physics, like the Fokker-Planck equation discussed below, and notions of distance on the space of probability measures is a relatively young area of mathematics. While the theory of gradient flows on arbitrary metric spaces can get exceedingly intricate, the fundamental ideas are approachable.