## About

A one-week, 12.5-hour course given at the Columbia Statistics Department in July 2023. It covers two-sample testing with the maximum mean discrepancy (MMD), goodness-of-fit testing with the kernel Stein discrepancy, causal effect estimation with context and confounders, GANs and Generalized Energy-Based Models, and MMD diffusion models.

Day 1 slides on RKHS fundamentals.

- Definition of a kernel, how it relates to a feature space
- The RKHS norm and function smoothness
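As a quick illustration of the defining property of a kernel (it is an inner product between feature maps, so its Gram matrix must be symmetric positive semidefinite), here is a minimal numpy sketch with a Gaussian kernel; the bandwidth and sample sizes are arbitrary choices:

```python
import numpy as np

def gaussian_kernel(X, Y, bandwidth=1.0):
    """Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dists = (np.sum(X**2, axis=1)[:, None]
                + np.sum(Y**2, axis=1)[None, :]
                - 2 * X @ Y.T)
    return np.exp(-sq_dists / (2 * bandwidth**2))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
K = gaussian_kernel(X, X)

# A valid kernel is an inner product of features, so the Gram matrix
# is symmetric with no (numerically) negative eigenvalues.
assert np.allclose(K, K.T)
assert np.min(np.linalg.eigvalsh(K)) > -1e-8
```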

Further reading:

- Support Vector Machines (Steinwart and Christmann, 2008)
- Reproducing Kernel Hilbert Spaces in Probability and Statistics (Berlinet and Thomas-Agnan, 2004)
- Scattered Data Approximation (Wendland, 2004)
- Learning with Kernels (Schölkopf and Smola, 2002)

Day 2 slides on two-sample testing and the MMD.

- Distance between means in RKHS, integral probability metrics, the maximum mean discrepancy (MMD)
- Two-sample tests with MMD
- Characteristic kernels to distinguish any difference in distributions
- Kernel bandwidth tuning and minimax guarantees, learned neural net kernels for testing on images
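The squared MMD can be estimated from samples using the unbiased U-statistic of A Kernel Two-Sample Test (JMLR 2012). A minimal numpy sketch, with a Gaussian kernel and arbitrary bandwidth and sample sizes:

```python
import numpy as np

def gaussian_kernel(X, Y, bandwidth=1.0):
    sq = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Y**2, axis=1)[None, :]
          - 2 * X @ Y.T)
    return np.exp(-sq / (2 * bandwidth**2))

def mmd2_unbiased(X, Y, bandwidth=1.0):
    """Unbiased U-statistic estimate of squared MMD between samples X and Y."""
    m, n = len(X), len(Y)
    Kxx = gaussian_kernel(X, X, bandwidth)
    Kyy = gaussian_kernel(Y, Y, bandwidth)
    Kxy = gaussian_kernel(X, Y, bandwidth)
    # Drop diagonal (i = j) terms in the within-sample sums for unbiasedness.
    term_x = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    term_y = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return term_x + term_y - 2 * Kxy.mean()

rng = np.random.default_rng(1)
same = mmd2_unbiased(rng.normal(size=(500, 1)), rng.normal(size=(500, 1)))
diff = mmd2_unbiased(rng.normal(size=(500, 1)), rng.normal(2.0, 1.0, size=(500, 1)))
# `same` is near zero (both samples from N(0,1)); `diff` is clearly positive.
```

A two-sample test then compares this statistic to a null threshold, e.g. obtained by permuting the pooled sample.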

Further reading:

- A Kernel Two-Sample Test (JMLR 2012)
- Hilbert Space Embeddings and Metrics on Probability Measures (JMLR 2010)
- MMD Aggregated Two-Sample Test (JMLR 2023)
- Learning Deep Kernels for Non-Parametric Two-Sample Tests (ICML 2020)

Day 3 slides on (relative) goodness-of-fit testing and the kernel Stein discrepancy.

- The Kernel Stein Discrepancy for fully observed models
- A relative goodness-of-fit test for latent variable models
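The kernel Stein discrepancy compares samples to a model using only the model's score function, so the normalizing constant is never needed. A minimal 1-D numpy sketch for the model N(0, 1), whose score is s(x) = -x, with a Gaussian kernel (bandwidth and sample size are arbitrary choices):

```python
import numpy as np

def ksd2_vstat(x, sigma=1.0):
    """V-statistic estimate of squared KSD between samples x and the model
    N(0, 1), whose score is s(x) = -x, using a Gaussian kernel."""
    d = x[:, None] - x[None, :]
    k = np.exp(-d**2 / (2 * sigma**2))
    s = -x
    # Stein kernel: h(x,y) = s(x)s(y)k + s(x) dk/dy + s(y) dk/dx + d2k/dxdy
    h = (s[:, None] * s[None, :]
         + s[:, None] * (d / sigma**2)
         + s[None, :] * (-d / sigma**2)
         + (1 / sigma**2 - d**2 / sigma**4)) * k
    return h.mean()

rng = np.random.default_rng(2)
ksd_model = ksd2_vstat(rng.normal(size=500))            # samples from the model
ksd_shift = ksd2_vstat(rng.normal(2.0, 1.0, size=500))  # misspecified samples
# ksd_model is near zero; ksd_shift is clearly positive.
```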

Further reading:

- A Kernel Test of Goodness-of-Fit (ICML 2016)
- A Kernelized Stein Discrepancy for Goodness-of-fit Tests and Model Evaluation (ICML 2016)
- Measuring Sample Quality with Kernels (ICML 2017)
- A Kernel Stein Test for Comparing Latent Variable Models (JRSS 2023)

Day 4 pt. 1 and Day 4 pt. 2 slides on causal effect estimation with context and confounders.

- Visible context: average treatment effect, conditional average treatment effect, average treatment on treated
- Hidden confounders: instrumental variable regression, proxy methods
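The kernel instrumental-variable methods above use two-stage regression with RKHS features; the core idea is already visible in a linear two-stage least squares toy (all numbers below are invented for illustration): regressing the outcome directly on the treatment is biased by a hidden confounder, while regressing on the instrument-predicted treatment recovers the causal effect.

```python
import numpy as np

# Toy setup: instrument Z shifts treatment X; hidden confounder U
# affects both X and Y. The true causal effect of X on Y is 2.0.
rng = np.random.default_rng(3)
n = 5000
Z = rng.normal(size=n)
U = rng.normal(size=n)
X = Z + U + 0.1 * rng.normal(size=n)
Y = 2.0 * X + 2.0 * U + 0.1 * rng.normal(size=n)

# Naive regression of Y on X is biased upward by the confounder U.
naive = (X @ Y) / (X @ X)

# Two-stage least squares: stage 1 predicts X from Z (which is
# independent of U), stage 2 regresses Y on the predicted treatment.
X_hat = Z * ((Z @ X) / (Z @ Z))
tsls = (X_hat @ Y) / (X_hat @ X_hat)
# `tsls` is close to 2.0; `naive` is not.
```

Kernel IV replaces both linear regressions with (conditional mean embedding) regressions in an RKHS, allowing nonlinear treatment effects.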

Further reading, for observed confounders:

- Kernel Methods for Causal Functions: Dose, Heterogeneous, and Incremental Response Curves (Biometrika 2023)
- A Neural Mean Embedding Approach for Back-door and Front-door Adjustment (ICLR 2023)

For hidden confounders:

- Kernel Instrumental Variable Regression (NeurIPS 2019)
- Learning Deep Features in Instrumental Variable Regression (ICLR 2021)
- Proximal Causal Learning with Kernels: Two-Stage Estimation and Moment Restriction (ICML 2021)
- Deep Proxy Causal Learning and its Application to Confounded Bandit Policy Evaluation (NeurIPS 2021)

Day 5 pt. 1 and Day 5 pt. 2 slides on kernel methods for generative modelling.

- GANs and Generalized Energy-Based Models (aka GANs that also use the critic function to generate samples)
- Diffusions using the MMD and the KL Approximate Lower-bound Estimator (KALE)
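An MMD gradient flow moves a set of particles along the (negative) gradient of the squared MMD to the target sample: the within-sample kernel term spreads the particles apart, while the cross term pulls them toward the target. A minimal 1-D numpy sketch (bandwidth, step size, particle count, and iteration count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(4)
sigma = 1.0
X = rng.normal(0.0, 1.0, size=200)  # particles, initialised at N(0, 1)
Y = rng.normal(2.0, 1.0, size=200)  # samples from the target N(2, 1)

def grad_k(a, b):
    """Derivative in the first argument of the Gaussian kernel, 1-D inputs."""
    d = a[:, None] - b[None, :]
    return -(d / sigma**2) * np.exp(-d**2 / (2 * sigma**2))

for _ in range(500):
    # Gradient of MMD^2(X, Y) w.r.t. each particle (up to a constant factor):
    # the within-sample term repels particles, the cross term attracts
    # them toward the target sample.
    g = grad_k(X, X).mean(axis=1) - grad_k(X, Y).mean(axis=1)
    X = X - 0.3 * g

# After the flow, the particle cloud has drifted toward the target.
```

The NeurIPS 2019 paper analyses when such flows converge; the KALE flow replaces the MMD objective with a relaxed KL estimator to handle distributions with disjoint support.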

Further reading:

- Demystifying MMD GANs (ICLR 2018)
- Generalized Energy-Based Models (ICLR 2021)
- Maximum Mean Discrepancy Gradient Flow (NeurIPS 2019)
- KALE Flow: A Relaxed KL Gradient Flow for Probabilities with Disjoint Support (NeurIPS 2021)

## Contact

arthur.gretton@gmail.com