
Gatsby Computational Neuroscience Unit
Sainsbury Wellcome Centre
25 Howland Street
London W1T 4JG UK



Arthur Gretton I am a Professor with the Gatsby Computational Neuroscience Unit, and director of the Centre for Computational Statistics and Machine Learning at UCL. A short biography.

My recent research interests in machine learning include causal inference and representation learning, design and training of implicit and explicit generative models, and nonparametric hypothesis testing.

Recent papers

Efficient Conditionally Invariant Representation Learning. We introduce the Conditional Independence Regression CovariancE (CIRCE), a measure of conditional independence for multivariate continuous-valued variables. CIRCE applies as a regularizer in settings where we wish to learn neural features of data X to estimate a target Y, while being conditionally independent of a distractor Z given Y. Both Z and Y are assumed to be continuous-valued but relatively low dimensional, whereas X and its features may be complex and high dimensional. Relevant settings include domain-invariant learning, fairness, and causal learning. Oral presentation, ICLR 2023.
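A toy sketch of the general idea behind such a conditional-independence penalty, using simple linear ridge residuals rather than the CIRCE estimator itself (all data and variable names here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: target Y, distractor Z correlated with Y, and "learned" features of X.
n = 500
y = rng.normal(size=(n, 1))
z = y + 0.1 * rng.normal(size=(n, 1))            # distractor depends on Y
feats = np.hstack([y, rng.normal(size=(n, 1))])  # stand-in for neural features of X

def residualize(a, y, lam=1e-3):
    """Remove the ridge-regression prediction of `a` from `y`."""
    w = np.linalg.solve(y.T @ y + lam * np.eye(y.shape[1]), y.T @ a)
    return a - y @ w

# Penalize covariance between the features and the part of Z not explained by Y:
# if the features are (linearly) independent of Z given Y, this is near zero.
z_res = residualize(z, y)
f_res = residualize(feats, y)
penalty = np.linalg.norm(f_res.T @ z_res / n) ** 2
```

This linear version only captures partial correlation; CIRCE works with kernel features of Z and Y, so it detects general nonlinear conditional dependence.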
A Neural Mean Embedding Approach for Back-door and Front-door Adjustment. Estimates average and counterfactual treatment effects without access to a hidden confounder, by applying two-stage regression on learned neural net features. At ICLR, 2023.
Adapting to Latent Subgroup Shifts via Concepts and Proxies. Unsupervised domain adaptation when the source domain differs from the target domain because of a shift in the distribution of a latent subgroup. When this subgroup confounds all observed data, neither covariate shift nor label shift assumptions apply. The optimal target predictor can be non-parametrically identified with the help of concept and proxy variables available only in the source domain, and unlabeled data from the target. The identification results are constructive, immediately suggesting an algorithm for estimating the optimal predictor in the target. At AISTATS, 2023.

Recent talks and courses

• Talk on Causal Modelling with Neural and Kernel Feature Embeddings: Treatment Effects, Counterfactuals, and Proxies. Slides from a talk at CSAIL, MIT (February 2023).
• Talk on relative goodness-of-fit tests for models with latent variables. Slides and accompanying paper from the talk given at the Meeting in Mathematical Statistics, CIRM, Marseille (December 2022) and at the Statistics Department, Harvard (February 2023).
• Talk on Gradient Flows on Kernel Divergence Measures. Slides from the talk given at the Geometry and Statistics in Data Sciences Workshop IHP, Paris (November 2022).
• Course on Kernel methods for comparing distributions and training generative models. Covers the MMD, two-sample testing, GAN training with MMD, and Generalized Energy-Based Model training using the KL Approximate Lower-Bound Estimator (KALE). Given at the Online Asian Machine Learning School (December 2022). Slides 1, Slides 2
• Talk on Causal Modelling with Distribution Embeddings: Treatment Effects, Counterfactuals, Mediation, and Proxies. At the Deeplearn Summer School 2022, with talk slides and video.
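The MMD covered in the course above can be sketched in a few lines. A minimal quadratic-time unbiased estimator of the squared MMD with a Gaussian kernel (bandwidth fixed for illustration; a practical test would also calibrate a threshold, e.g. by permutation):

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of a and b."""
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))

def mmd2_unbiased(x, y, sigma=1.0):
    """Unbiased estimate of MMD^2 between samples x ~ P and y ~ Q."""
    m, n = len(x), len(y)
    kxx = gaussian_kernel(x, x, sigma)
    kyy = gaussian_kernel(y, y, sigma)
    kxy = gaussian_kernel(x, y, sigma)
    # Drop the diagonal terms for unbiasedness.
    term_x = (kxx.sum() - np.trace(kxx)) / (m * (m - 1))
    term_y = (kyy.sum() - np.trace(kyy)) / (n * (n - 1))
    return term_x + term_y - 2 * kxy.mean()

rng = np.random.default_rng(0)
same = mmd2_unbiased(rng.normal(size=(200, 2)), rng.normal(size=(200, 2)))
diff = mmd2_unbiased(rng.normal(size=(200, 2)), rng.normal(2.0, 1.0, size=(200, 2)))
```

When both samples come from the same distribution the estimate fluctuates around zero; under a mean shift it is clearly positive.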

Older news

Optimal Rates for Regularized Conditional Mean Embedding Learning, with short video. Establishes consistency of a kernel ridge regression estimate of the conditional mean embedding (CME), which is an embedding of the conditional distribution of Y given X into a target reproducing kernel Hilbert space H_Y. The CME allows us to take conditional expectations of target RKHS functions, and has been employed in nonparametric causal and Bayesian inference. We address the misspecified setting, where the target CME is in the space of Hilbert-Schmidt operators acting from an input interpolation space between H_X and L2, to H_Y. A lower bound on the learning rate is provided, which shows that the obtained upper bound is optimal. Oral presentation, NeurIPS 2022.
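In sample terms, the regularized CME estimate reduces to kernel ridge regression: conditional expectations of RKHS functions become weighted sums over training outputs. A toy sketch (bandwidth and regularization are illustrative choices, not the paper's):

```python
import numpy as np

def gauss_k(a, b, sigma=1.0):
    """Gaussian kernel matrix between the rows of a and b."""
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sigma ** 2))

# Toy data: Y = sin(X) + noise.
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(x) + 0.1 * rng.normal(size=(300, 1))

# Kernel ridge weights: for a test point x*, the CME estimate gives
# E[f(Y) | X = x*] ~= sum_i beta_i(x*) f(y_i),  beta(x*) = (K + n*lam*I)^{-1} k(x*).
n, lam = len(x), 1e-3
K = gauss_k(x, x)
x_test = np.array([[1.0]])
beta = np.linalg.solve(K + n * lam * np.eye(n), gauss_k(x, x_test))

# Conditional expectation of f(y) = y under the estimated CME;
# compare with the true regression function sin(1.0).
cond_mean = (beta.T @ y).item()
```

The same weights beta(x*) apply to any target function f of Y, which is what makes the CME useful as a building block in two-stage causal estimators.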
KSD Aggregated Goodness-of-fit Test with code, a statistical test of goodness-of-fit, called KSDAgg, which aggregates multiple Kernel Stein Discrepancy (KSD) tests with different kernels. KSDAgg avoids splitting the data to perform kernel selection (which leads to a loss in test power), and instead maximises the test power over a collection of kernels. We provide non-asymptotic guarantees on the power of KSDAgg: we show it achieves the smallest uniform separation rate of the collection, up to a logarithmic term. At NeurIPS 2022.
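For reference, a minimal sketch of the single-kernel KSD U-statistic that KSDAgg aggregates over kernels, for a model with known score function, using a Gaussian kernel and the Langevin Stein operator (a sketch, not the paper's code):

```python
import numpy as np

def ksd_u_stat(x, score, sigma=1.0):
    """U-statistic estimate of the squared KSD of samples x against a
    model known through its score function, Gaussian kernel, Langevin
    Stein operator."""
    n, d = x.shape
    diff = x[:, None, :] - x[None, :, :]
    r2 = (diff ** 2).sum(-1)
    k = np.exp(-r2 / (2 * sigma ** 2))
    s = score(x)                                   # model score at each sample, (n, d)
    sx_sy = s @ s.T                                # <s(x_i), s(x_j)>
    cross = ((s[:, None, :] - s[None, :, :]) * diff).sum(-1)
    # Stein kernel for the Gaussian base kernel (closed form).
    h = k * (sx_sy + cross / sigma ** 2 + d / sigma ** 2 - r2 / sigma ** 4)
    return (h.sum() - np.trace(h)) / (n * (n - 1))  # drop diagonal: unbiased

rng = np.random.default_rng(0)
score = lambda x: -x                               # score of a standard normal model
fit = ksd_u_stat(rng.normal(size=(300, 1)), score)
misfit = ksd_u_stat(rng.normal(1.5, 1.0, size=(300, 1)), score)
```

The statistic is near zero when the samples match the model and clearly positive under a mean shift; KSDAgg's contribution is aggregating such statistics over a collection of sigma values with a calibrated joint threshold.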
Efficient Aggregated Kernel Tests using Incomplete U-statistics with code, a series of computationally efficient nonparametric tests for the two-sample, independence, and goodness-of-fit problems, using the Maximum Mean Discrepancy (MMD), Hilbert-Schmidt Independence Criterion (HSIC), and Kernel Stein Discrepancy (KSD), respectively. Our test statistics are incomplete U-statistics, whose computational cost interpolates between linear and quadratic time in the number of samples (the latter being the cost of classical U-statistic tests). The three proposed tests aggregate over several kernel bandwidths to detect departures from the null on various scales, without sample splitting. At NeurIPS 2022.
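As a rough illustration of the incomplete-U-statistic idea in the MMD case: average the pairwise core statistic over a random subset of index pairs rather than all O(n^2) of them (a sketch, not the paper's estimator; pair count and bandwidth are arbitrary):

```python
import numpy as np

def gauss(a, b, sigma=1.0):
    """Gaussian kernel between paired rows of a and b."""
    return np.exp(-((a - b) ** 2).sum(-1) / (2 * sigma ** 2))

def mmd2_incomplete(x, y, n_pairs=2000, sigma=1.0, seed=0):
    """Incomplete U-statistic estimate of MMD^2: average the core
    statistic h over randomly drawn index pairs, trading a little
    variance for reduced computation."""
    rng = np.random.default_rng(seed)
    n = len(x)
    i = rng.integers(0, n, size=n_pairs)
    j = rng.integers(0, n, size=n_pairs)
    keep = i != j                       # U-statistics exclude the diagonal
    i, j = i[keep], j[keep]
    h = (gauss(x[i], x[j], sigma) + gauss(y[i], y[j], sigma)
         - gauss(x[i], y[j], sigma) - gauss(x[j], y[i], sigma))
    return h.mean()

rng = np.random.default_rng(1)
x = rng.normal(size=(500, 2))
null_val = mmd2_incomplete(x, rng.normal(size=(500, 2)))
alt_val = mmd2_incomplete(x, rng.normal(1.0, 1.0, size=(500, 2)))
```

The cost is controlled by n_pairs, from linear to quadratic in the sample size; the aggregated tests in the paper run this kind of statistic over several bandwidths at once.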
Causal Inference with Treatment Measurement Error: A Nonparametric Instrumental Variable Approach. A kernel-based nonparametric estimator for the causal effect when the cause is corrupted by error, obtained by generalizing estimation in the instrumental variable setting. The proposed method, MEKIV, improves over baselines and is robust to changes in the strength of the measurement error and to the type of error distribution. Oral presentation, UAI 2022.
