Gatsby Computational Neuroscience Unit CSML University College London

Gatsby Computational Neuroscience Unit
Sainsbury Wellcome Centre
25 Howland Street
London W1T 4JG UK

+44 (0)7795 291 705



Arthur Gretton

I am a Professor with the Gatsby Computational Neuroscience Unit and director of the Centre for Computational Statistics and Machine Learning (CSML) at UCL. A short biography.

My recent research interests in machine learning include the design and training of generative models, both implicit (e.g. GANs) and explicit (high/infinite dimensional exponential family models and energy-based models), nonparametric hypothesis testing, survival analysis, causality, and kernel methods.

Recent news

• Talk on Gradient Flows on Kernel Divergence Measures. Slides from the talk given at the Geometry and Statistics in Data Sciences Workshop, IHP, Paris (November 2022).
• Talk on Causal Modelling with Distribution Embeddings: Treatment Effects, Counterfactuals, Mediation, and Proxies. At the Deeplearn Summer School 2022, with talk slides and video.
• Self-Supervised Learning with Kernel Dependence Maximization with code. Maximizes dependence between representations of transformations of an image and the image identity, while minimizing the kernelized variance of those representations. This framework yields a new understanding of InfoNCE, which implicitly approximates SSL-HSIC, and also gives insight into BYOL. SSL-HSIC matches the current state of the art for standard linear evaluation on ImageNet, semi-supervised learning, and transfer to other classification and vision tasks such as semantic segmentation, depth estimation, and object recognition. At NeurIPS 2021.
• Deep Proxy Causal Learning and its Application to Confounded Bandit Policy Evaluation with code, for estimating the causal effect of treatments on outcomes in the presence of unobserved confounding, using proxies (structured side information) for the confounder. We propose a novel method for proxy causal learning (PCL), the deep feature proxy variable method (DFPV), to address the case where the proxies, treatments, and outcomes are high-dimensional and have nonlinear, complex relationships, as represented by deep neural network features. At NeurIPS 2021.
• KALE Flow: A Relaxed KL Gradient Flow for Probabilities with Disjoint Support with code, a gradient flow for KALE (KL approximate lower-bound estimator), a relaxed approximation to the Kullback-Leibler (KL) divergence. When using a kernel estimate, the KALE continuously interpolates between the KL divergence and the Maximum Mean Discrepancy (MMD). KALE inherits from the limiting KL a greater sensitivity to mismatch in the support of the distributions than the MMD, making it a good choice when the target distribution is supported on a low-dimensional manifold. At NeurIPS 2021.
• NeurIPS 2021 Workshop on Machine Learning meets Econometrics (MLECON). Machine learning offers economics the potential for better predictions and the ability to handle larger, multimodal data, helping address substantive questions in economics and the social sciences; non- and semi-parametric econometrics naturally interface with modern machine learning. The MLECON workshop aims to serve as a venue for experts from both disciplines to meet and exchange ideas, and for participants to present work-in-progress that lies at this intersection.
• A kernel log-rank test of independence for right-censored data with code. A general non-parametric independence test between right-censored survival times and covariates, which may be multivariate. The test statistic can be interpreted as a supremum of a potentially infinite collection of weight-indexed log-rank tests, with weight functions belonging to a reproducing kernel Hilbert space (RKHS) of functions; or as a statistic related to the Hilbert-Schmidt Independence Criterion (HSIC). Extensive investigations on both simulated and real data suggest that our testing procedure generally performs better than competing approaches in detecting complex non-linear dependence. In JASA.
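The HSIC quantity underlying the SSL-HSIC and kernel log-rank items above has a simple plug-in estimator. The following is a minimal toy sketch (not code from any of these papers) of the standard biased HSIC estimate with Gaussian kernels; the function names, bandwidth, and data are illustrative choices of my own:

```python
import numpy as np

def rbf_kernel(X, sigma=1.0):
    # Gaussian (RBF) kernel matrix from pairwise squared distances
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-d2 / (2.0 * sigma**2))

def hsic_biased(K, L):
    # Biased (V-statistic) HSIC estimate: tr(K H L H) / (n-1)^2,
    # where H = I - (1/n) 11^T is the centering matrix
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
Y = X + 0.1 * rng.normal(size=(200, 2))   # strongly dependent on X
Z = rng.normal(size=(200, 2))             # independent of X

dep = hsic_biased(rbf_kernel(X), rbf_kernel(Y))
indep = hsic_biased(rbf_kernel(X), rbf_kernel(Z))
# dep is much larger than indep, reflecting the dependence of Y on X
```

In the testing papers above, such an estimate is combined with a null distribution (e.g. by permutation) to obtain a p-value; this sketch only shows the statistic itself.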

Older news

• Talk on Causal Modelling with Kernels: Treatment Effects, Counterfactuals, Mediation, and Proxies. At the Stats Winter 2022 Workshop, with talk slides and video.
• Generalized energy-based models at ICLR 2021, with talk slides and video from Georgia Tech (Oct. 2021). Combines a GAN generator as base measure with a generalized energy function derived from the GAN critic. Samples are drawn using a kinetic Langevin MCMC procedure.
• Probability Divergences and Generative Models slides and video from the MLSS 2021 summer school in Taipei. Earlier slides from the PAISS 2021 summer school in Grenoble. Introduces probability metrics in the context of training generative models: first integral probability metrics, then variational lower bounds on f-divergences that "look a lot like" integral probability metrics. Applications in training GANs and Generalized Energy-Based Models.
• Proximal Causal Learning with Kernels: Two-Stage Estimation and Moment Restriction. Causal effect estimation in the presence of unobserved confounding, but where proxies for the latent confounder(s) are observed. We propose two kernel-based methods for nonlinear causal effect estimation in this setting: (a) a two-stage regression approach, and (b) a maximum moment restriction approach. At ICML 2021.
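The integral probability metrics covered in the Probability Divergences lectures above include the Maximum Mean Discrepancy, toward which the KALE flow also interpolates. As a minimal sketch (illustrative names and parameters, not code from the slides), the standard unbiased MMD^2 estimator with a Gaussian kernel looks like this:

```python
import numpy as np

def rbf(A, B, sigma=1.0):
    # Cross Gaussian kernel matrix between sample sets A and B
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-d2 / (2.0 * sigma**2))

def mmd2_unbiased(X, Y, sigma=1.0):
    # Unbiased estimate of squared MMD: off-diagonal means of the
    # within-sample kernel matrices, minus twice the cross-sample mean
    Kxx, Kyy, Kxy = rbf(X, X, sigma), rbf(Y, Y, sigma), rbf(X, Y, sigma)
    n, m = len(X), len(Y)
    term_x = (Kxx.sum() - np.trace(Kxx)) / (n * (n - 1))
    term_y = (Kyy.sum() - np.trace(Kyy)) / (m * (m - 1))
    return term_x + term_y - 2.0 * Kxy.mean()

rng = np.random.default_rng(1)
X = rng.normal(0.0, 1.0, size=(300, 1))
Y = rng.normal(2.0, 1.0, size=(300, 1))   # mean-shifted distribution
same = mmd2_unbiased(X, rng.normal(0.0, 1.0, size=(300, 1)))
diff = mmd2_unbiased(X, Y)
# diff is large; same is close to zero (and may be slightly negative,
# since the unbiased estimator is not constrained to be nonnegative)
```

When MMD is used as a critic for training generative models, as in the lectures, this estimate (or a witness-function variant) is differentiated with respect to the generator's samples.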
