
Contact
arthur.gretton@gmail.com

Gatsby Computational Neuroscience Unit

Sainsbury Wellcome Centre

25 Howland Street

London W1T 4JG UK

Info

I am a Professor with the Gatsby Computational Neuroscience Unit; director of the Centre for Computational Statistics and Machine Learning at UCL; and a Research Scientist at Google DeepMind. A short biography.

My recent research interests in machine learning include causal inference and representation learning, design and training of implicit and explicit generative models, and nonparametric hypothesis testing.

Recent papers

• Proxy Methods for Domain Adaptation. Domain adaptation under distribution shift, where the shift is due to a change in the distribution of an unobserved, latent variable that confounds both the covariates and the labels. We employ proximal causal learning, demonstrating that proxy variables allow for adaptation to distribution shift without explicitly recovering or modeling latent variables. We consider two settings: (i) Concept Bottleneck: an additional "concept" variable is observed that mediates the relationship between the covariates and labels; (ii) Multi-domain: training data from multiple source domains is available, where each source domain exhibits a different distribution over the latent confounder.
** AISTATS 2024 **

• MMD-FUSE: Learning and Combining Kernels for Two-Sample Testing Without Data Splitting. A procedure to maximise the power of a two-sample test based on the Maximum Mean Discrepancy (MMD), by adapting over the set of kernels used in defining it. For finite sets, this reduces to combining (normalised) MMD values under each of these kernels via a weighted soft maximum. The kernels can be chosen in a data-dependent but permutation-independent way, yielding a well-calibrated test while avoiding data splitting. Deep kernels can also be used, with features learnt using unsupervised models such as auto-encoders.
** Spotlight presentation, NeurIPS 2023 **
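The core idea of combining normalised MMD values via a soft maximum, calibrated by permutation, can be sketched in a few lines of numpy. This is a simplified illustration, not the paper's method: it uses a crude scale normalisation (n times the biased MMD estimate) rather than MMD-FUSE's variance-based normalisation, uniform weights over a small Gaussian kernel set, and the function names are my own. Note that the kernel matrices are computed once on the pooled sample and only their indices are permuted, so the kernel choice is data-dependent but permutation-independent.

```python
import numpy as np

def mmd2_biased(K, n):
    """Biased quadratic-time MMD^2 from a kernel matrix on the pooled sample [X; Y]."""
    Kxx, Kyy, Kxy = K[:n, :n], K[n:, n:], K[:n, n:]
    return Kxx.mean() + Kyy.mean() - 2.0 * Kxy.mean()

def mmd_fuse_test(X, Y, bandwidths, n_perms=200, seed=0):
    """Combine (crudely normalised) MMD values over a Gaussian kernel set via a
    uniform-weight soft maximum, with a permutation null for calibration."""
    rng = np.random.default_rng(seed)
    n = len(X)
    Z = np.vstack([X, Y])
    D2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    Ks = [np.exp(-D2 / (2.0 * bw**2)) for bw in bandwidths]

    def fused(perm):
        vals = np.array([n * mmd2_biased(K[np.ix_(perm, perm)], n) for K in Ks])
        m = vals.max()  # stable log-mean-exp, i.e. a soft maximum
        return m + np.log(np.exp(vals - m).mean())

    idx = np.arange(2 * n)
    observed = fused(idx)
    null = np.array([fused(rng.permutation(idx)) for _ in range(n_perms)])
    p_value = (1.0 + (null >= observed).sum()) / (n_perms + 1.0)
    return observed, p_value
```

On a clear mean shift the fused statistic exceeds essentially every permuted value, giving a p-value near 1/(n_perms + 1); on matched samples the p-value is approximately uniform.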

Recent talks and courses

• Causal Effect Estimation with Context and Confounders. Course Slides 1 and Slides 2 from
MLSS 2024, Okinawa.

• Learning to act in noisy contexts using deep proxy learning: slides from the Tandon Lecture at NYU, March 2024.

• Proxy Methods for Causal Effect Estimation with Hidden Confounders: Talk slides from the UCL Centre for Data Science Symposium, November 2023.

• Adaptive two-sample testing: Talk slides from a seminar at the Cambridge Centre for Mathematical Sciences, October 2023.

• Causal Effect Estimation with Hidden Confounders using Instruments and Proxies: Talk slides from the ELLIS RobustML Workshop, September 2023.

• Course on hypothesis testing, causality, and generative models at the Columbia Statistics Department, July 2023 (10 lectures). Slides and related reading.

• Causal Effect Estimation with Context and Confounders. Slides from keynote,
AISTATS 2023.

• Kernel Methods for Two-Sample and Goodness-Of-Fit Testing. Slides from PhyStat 2023.

Older news

• Fast and scalable score-based kernel calibration tests. A nonparametric, kernel-based test for assessing the calibration of probabilistic models with well-defined scores (which may not be normalized, e.g. posterior estimates in Bayesian inference). We use a new family of kernels for score-based probabilities that can be estimated without probability density samples.
** Spotlight presentation, UAI 2023 **

• A kernel Stein test of goodness of fit for sequential models.
A goodness-of-fit measure for probability densities modeling observations with varying dimensionality, such as text documents of differing lengths or variable-length sequences. The proposed measure is an instance of the kernel Stein discrepancy (KSD), which has been used to construct goodness-of-fit tests for unnormalized densities. The test does not require the density to be normalized, allowing the evaluation of a large class of models. At ICML, 2023.
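The fixed-dimension KSD that the paper builds on depends on the model only through its score function, which is why the density need not be normalized. A minimal 1-d sketch with an RBF kernel (the paper's contribution is the extension to varying-dimension sequential models, which this does not cover; function names are my own):

```python
import numpy as np

def ksd2(x, score, sigma=1.0):
    """V-statistic estimate of the squared kernel Stein discrepancy between a
    1-d sample `x` and an unnormalised density with score d/dx log p = `score`,
    using an RBF kernel of bandwidth `sigma`."""
    d = x[:, None] - x[None, :]
    k = np.exp(-d**2 / (2.0 * sigma**2))
    s = score(x)
    # Langevin Stein kernel h_p(x, y): only the score enters, never the normaliser.
    h = (s[:, None] * s[None, :] * k
         + (s[:, None] - s[None, :]) * (d / sigma**2) * k
         + (1.0 / sigma**2 - d**2 / sigma**4) * k)
    return h.mean()
```

For a standard normal model the score is simply -x; samples drawn from the model give a KSD estimate near zero, while samples from a shifted distribution give a markedly larger value.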

• Efficient Conditionally Invariant Representation Learning.
We introduce the Conditional Independence Regression CovariancE (CIRCE), a measure of conditional independence for multivariate continuous-valued variables. CIRCE applies as a regularizer in settings where we wish to learn neural features of data X to estimate a target Y, while being conditionally independent of a distractor Z given Y. Both Z and Y are assumed to be continuous-valued but relatively low dimensional, whereas X and its features may be complex and high dimensional. Relevant settings include domain-invariant learning, fairness, and causal learning.
** Top 5% paper, ICLR 2023 **

• A Neural Mean Embedding Approach for Back-door and Front-door Adjustment.
Estimate average and counterfactual treatment effects without access to a hidden confounder, by applying two-stage regression on learned neural net features. At ICLR, 2023.
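For intuition, the basic back-door adjustment underlying this line of work can be sketched with a plain ridge regression in place of the paper's learned neural features: fit an outcome model f(a, x), then average the fitted values over the observed confounders at fixed treatment levels. This is a textbook regression-adjustment sketch, not the paper's neural mean embedding method, and the function names are hypothetical.

```python
import numpy as np

def backdoor_effect(A, X, Y, a1, a0, lam=1e-3):
    """Back-door adjustment with a fitted outcome model: ridge-regress Y on
    features of (treatment, confounder), then average the fitted values over
    the observed confounders X at treatment levels a1 and a0."""
    def feats(a, x):
        return np.column_stack([np.ones_like(a), a, x, a * x])
    Phi = feats(A, X)
    w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ Y)
    adjusted = lambda a: feats(np.full_like(X, a), X) @ w
    return adjusted(a1).mean() - adjusted(a0).mean()
```

When the confounder X drives both treatment and outcome, the naive difference in group means is biased, while averaging over X recovers the true effect.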

• Adapting to Latent Subgroup Shifts via Concepts and Proxies.
Unsupervised domain adaptation when the source domain differs from the target domain because of a shift in the distribution of a latent subgroup. When this subgroup confounds all observed data, neither covariate shift nor label shift assumptions apply. The optimal target predictor can be non-parametrically identified with the help of concept and proxy variables available only in the source domain, and unlabeled data from the target. The identification results are constructive, immediately suggesting an algorithm for estimating the optimal predictor in the target. At AISTATS, 2023.