## Reproducing kernel Hilbert spaces in Machine Learning
## Arthur Gretton (with Heishiro Kanagawa)

This course represents half of Advanced Topics in Machine Learning (aka COMP GI13 / COMP M050) from the UCL CS MSc on Machine Learning. The other half is an Introduction to Statistical Learning Theory, taught by Carlo Ciliberto.

Course announcements will be posted on the **mailing list**.

This page will contain slides and detailed notes for the kernel part of the course. The assignment may also be found here (at the bottom of the page). Note that the slides will be updated as the course progresses, since I modify them to answer questions that come up in class. I'll put the date of last update next to each document, so be sure to get the latest one. Let me know if you find errors.

There are sets of practice exercises and solutions further down the page (after the slides).

For questions on the course material, please email Heishiro Kanagawa.

- Definition of a kernel, how it relates to a feature space
- Combining kernels to make new kernels
- The reproducing kernel Hilbert space and smoothness

Lecture 3 **slides** (notes same as for lectures 1 and 2), last modified 17 Oct 2018

- Basic kernel algorithms: difference in means, kernel PCA, kernel ridge regression
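As a rough illustration of the last item, kernel ridge regression can be sketched in a few lines of NumPy. The Gaussian kernel, the bandwidth `sigma`, the regulariser `lam` (used here without any scaling by the sample size), and all function names are assumptions made for this sketch; they are not taken from the slides or notes.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    """Gaussian kernel matrix between the rows of X and the rows of Y."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))

def krr_fit(X, y, lam=1e-3, sigma=1.0):
    """Dual coefficients alpha solving (K + lam * I) alpha = y."""
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def krr_predict(X_train, alpha, X_test, sigma=1.0):
    """Prediction f(x) = sum_i alpha_i k(x_i, x) at each test point."""
    return gaussian_kernel(X_test, X_train, sigma) @ alpha

# Toy problem (assumed for illustration): noiseless samples of sin(x).
rng = np.random.default_rng(0)
X_train = rng.uniform(-3, 3, size=(50, 1))
y_train = np.sin(X_train[:, 0])

alpha = krr_fit(X_train, y_train, lam=1e-3, sigma=1.0)
X_test = np.linspace(-2, 2, 20)[:, None]
y_pred = krr_predict(X_train, alpha, X_test)
max_abs_error = np.max(np.abs(y_pred - np.sin(X_test[:, 0])))
```

In practice `sigma` and `lam` would be chosen by cross-validation rather than fixed by hand as they are here.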

Lecture 4 **slides** and **notes**, last modified 31 October 2018

- Distance between means in RKHS, integral probability metrics, the maximum mean discrepancy (MMD)
- Two-sample tests with MMD
- Choice of kernels for distinguishing distributions, characteristic kernels
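A minimal sketch of how the squared MMD between two samples might be estimated, using the standard unbiased U-statistic form. The Gaussian kernel, the bandwidth `sigma`, the toy distributions, and the function names are assumptions made for this illustration only.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    """Gaussian kernel matrix between the rows of X and the rows of Y."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))

def mmd2_unbiased(X, Y, sigma=1.0):
    """Unbiased estimate of MMD^2 between the samples X and Y."""
    m, n = len(X), len(Y)
    Kxx = gaussian_kernel(X, X, sigma)
    Kyy = gaussian_kernel(Y, Y, sigma)
    Kxy = gaussian_kernel(X, Y, sigma)
    # Within-sample terms exclude the diagonal (i == j) entries.
    term_xx = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    term_yy = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return term_xx + term_yy - 2.0 * Kxy.mean()

# Toy data (assumed): two samples from N(0, 1), one sample from N(2, 1).
rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(200, 1))
Y_same = rng.normal(0.0, 1.0, size=(200, 1))
Y_shift = rng.normal(2.0, 1.0, size=(200, 1))

mmd2_same = mmd2_unbiased(X, Y_same)    # expected to be close to zero
mmd2_shift = mmd2_unbiased(X, Y_shift)  # expected to be clearly positive
```

The estimate on its own is not a test: a two-sample test also needs a rejection threshold, for example one obtained by recomputing the statistic on permuted samples.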

Lectures 5 and 6 **slides** and **notes** (notes same as lecture 4), last modified 07 Nov 2018

- Covariance operator in RKHS: proof of existence, definition of norms (including HSIC, the Hilbert-Schmidt independence criterion)
- Application of HSIC to independence testing
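As a hedged illustration, a biased empirical HSIC can be computed from two centred kernel matrices. The Gaussian kernels, the bandwidths, the `1/n^2` normalisation, and the function names below are assumptions for this sketch; the notes may use a different normalisation.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    """Gaussian kernel matrix between the rows of X and the rows of Y."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))

def hsic_biased(X, Y, sigma_x=1.0, sigma_y=1.0):
    """Biased HSIC estimate trace(K H L H) / n^2 from paired samples."""
    n = len(X)
    K = gaussian_kernel(X, X, sigma_x)
    L = gaussian_kernel(Y, Y, sigma_y)
    H = np.eye(n) - np.ones((n, n)) / n  # centring matrix
    return np.trace(K @ H @ L @ H) / n**2

# Toy data (assumed): y_dep depends on x, y_ind is drawn independently.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1))
y_dep = x + 0.1 * rng.normal(size=(200, 1))
y_ind = rng.normal(size=(200, 1))

hsic_dep = hsic_biased(x, y_dep)
hsic_ind = hsic_biased(x, y_ind)
```

Here `hsic_dep` should come out noticeably larger than `hsic_ind`. An independence test would compare the statistic against a null distribution, for instance by permuting the order of one of the samples.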

Lecture 7 **slides** and **notes** (notes same as lecture 4), last modified 07 Nov 2018

- Application of HSIC to feature selection, taxonomy discovery
- Introduction to independent component analysis, HSIC for ICA

Lecture 8 **slides**, last modified 21 Nov 2018

- Kernel Stein Discrepancy for testing goodness of fit of a model

Lecture 9 **slides** and **notes**, last modified 30 Nov 2018

- Introduction to convex optimization
- The representer theorem
- Large margin classification, support vector machines for classification

Lecture 10 **slides**, last modified 13 Dec 2017

- Infinite dimensional kernel exponential families
- Nyström approximation for efficient solution
- Conditional infinite exponential family

Theory lectures **Slides 1**, **Slides 2**, and **notes**, last modified 20 Mar 2013

- Metric, normed, and unitary spaces, Cauchy sequences and completion, Banach and Hilbert spaces
- Bounded linear operators and the Riesz Theorem
- Equivalent notions of an RKHS: existence of reproducing kernel, boundedness of the evaluation operator
- Positive definiteness of reproducing kernels, the Moore-Aronszajn Theorem
- Mercer's Theorem for representing kernels

Supplementary lecture **slides**, last modified 22 Mar 2012

- Loss and risk, estimation and approximation error, a new interpretation of MMD
- Why use an RKHS: comparison with other function classes (Lipschitz and bounded Lipschitz)
- Characteristic kernels and universal kernels