Contact
arthur.gretton@gmail.com
Gatsby Computational Neuroscience Unit
Sainsbury Wellcome Centre
25 Howland Street
London W1T 4JG UK
info
I am a Professor with the Gatsby Computational Neuroscience Unit; director of the Centre for Computational Statistics and Machine Learning at UCL; and a Research Scientist at Google DeepMind. A short biography.
My recent research interests in machine learning include causal inference and representation learning, design and training of implicit and explicit generative models, and nonparametric hypothesis testing.
Recent papers
Regularized f-Divergence Kernel Tests. Practical kernel-based two-sample tests from the family of f-divergences. For two-sample testing, different f-divergences are sensitive to different kinds of localized difference between distributions. For machine unlearning, we propose a relative test that distinguishes true unlearning failures from safe distributional variations. AISTATS 2026
Kernel Treatment Effects with Adaptively Collected Data. Kernel treatment effects (KTE) represent interventional outcome distributions in an RKHS and compare them via kernel distances. We present the first kernel-based framework for distributional inference under adaptive data collection. AISTATS 2026
Learn to Guide your Diffusion Model. Learn the weights for classifier-free guidance of diffusion models, by minimizing the distributional mismatch between noised samples from the true conditional distribution and samples from the guided diffusion process. ICLR 2026
Controlling moments with kernel Stein discrepancies. Design of alternative diffusion Kernel Stein Discrepancies (KSDs) that control both moments and weak convergence, yielding the first KSDs to characterize q-Wasserstein convergence. Annals of Applied Probability, 2025
(De)-regularized Maximum Mean Discrepancy Gradient Flow. Transport samples to a target distribution using a Wasserstein gradient flow on the deregularized MMD, which approximates the flow on the χ² divergence, and inherits the property of near-global convergence for a broad class of targets. JMLR, 2025
Regularized least squares learning with heavy-tailed noise is minimax optimal. Asymptotic robustness of regularized least squares regression against heavy-tailed noise, via convergence rates for kernel ridge regression that had previously been derived only under subexponential noise. Spotlight presentation, NeurIPS 2025
On the Hardness of Conditional Independence Testing In Practice. On the challenges of applying the Kernel-based Conditional Independence test (KCI) in practice, highlighting the impact of the kernel choice on the conditioning variable and the effect of errors in the conditional mean embedding. Spotlight presentation, NeurIPS 2025
Doubly-Robust Estimation of Counterfactual Policy Mean Embeddings. A doubly robust estimator of counterfactual outcome distributions, based on kernel mean embeddings, allowing for sampling from the counterfactual distribution and for hypothesis testing. NeurIPS 2025
Density Ratio-Free Doubly Robust Proxy Causal Learning. A doubly robust approach to Proxy Causal Learning that combines treatment and outcome proxies without requiring density ratios. NeurIPS 2025
Demystifying Spectral Feature Learning for Instrumental Variable Regression. A generalization error bound for instrumental variable regression based on spectral features and a two-stage least squares estimator, yielding insights into the method's performance and failure modes. NeurIPS 2025
Composite Goodness-of-fit Tests with Kernels. Kernel-based hypothesis tests of whether the data come from any distribution in some parametric family: we estimate the parameter and conduct the test on the same data (without data splitting), while maintaining the correct test level. JMLR 2025
Nonlinear Meta-Learning Can Guarantee Faster Rates. Theoretical guarantees for meta-learning with nonlinear representations, showing that with careful regularization, convergence rates scale with the number of tasks observed in training, and not just with the number of samples per task. SIAM Journal on Mathematics of Data Science 2025
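Several entries above study kernel ridge regression, for example the minimax result for regularized least squares under heavy-tailed noise. As a hedged illustration of the estimator itself (not of any paper's analysis or code), here is a minimal pure-Python sketch assuming scalar inputs, a Gaussian kernel, and the convention that the ridge term is scaled by the sample size n. The bandwidth, `lam`, and the helper names are illustrative choices, and the Student-t noise in the usage example is only a stand-in for "heavy-tailed".

```python
import math


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel on scalars: exp(-(a - b)^2 / (2 * bandwidth^2))."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    # Augmented matrix [A | b], copied so the inputs are not modified.
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def fit_kernel_ridge(xs, ys, lam=1e-3, bandwidth=1.0):
    """Kernel ridge regression on scalar inputs (hypothetical helper).

    Minimizes (1/n) * sum_i (y_i - f(x_i))^2 + lam * ||f||_H^2, whose solution is
    f(x) = sum_i alpha_i k(x, x_i) with alpha = (K + n * lam * I)^{-1} y.
    """
    n = len(xs)
    K = [[gaussian_kernel(xi, xj, bandwidth) for xj in xs] for xi in xs]
    for i in range(n):
        K[i][i] += n * lam
    alpha = solve_linear(K, ys)

    def predict(x):
        return sum(a * gaussian_kernel(x, xi, bandwidth) for a, xi in zip(alpha, xs))

    return predict


if __name__ == "__main__":
    import random

    random.seed(0)
    df = 3  # Student-t noise with 3 degrees of freedom: heavy tails, finite variance
    xs = [random.uniform(-3.0, 3.0) for _ in range(100)]

    def t_noise():
        # Z / sqrt(V / df) with Z ~ N(0, 1) and V ~ chi^2(df) = Gamma(df / 2, scale 2)
        return random.gauss(0.0, 1.0) / math.sqrt(random.gammavariate(df / 2, 2.0) / df)

    ys = [math.sin(x) + 0.2 * t_noise() for x in xs]
    f = fit_kernel_ridge(xs, ys, lam=1e-3, bandwidth=0.5)
    print(f"f(1.0) = {f(1.0):.3f}, sin(1.0) = {math.sin(1.0):.3f}")
```

Solving the n-by-n system costs O(n^3); the hand-written solver only keeps the sketch dependency-free, and a real implementation would use a linear-algebra library.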
Recent talks and courses
Kernel Methods for Causal Effect Estimation, Slides and video from the talk at the workshop Causality and machine learning at the Isaac Newton Institute, Cambridge. Minimax convergence results for dose-response, heterogeneous response, and instrumental variable regression using kernel methods (June 2026).
Optimized MMD for Detecting Distribution Shift, Slides and video from the keynote talk at the workshop Catch, Adapt, and Operate: Monitoring ML Models Under Drift at ICLR 2026. Covers two ways of choosing the best kernel for testing with MMD: fusing and aggregation (April 2026).
Proxy Variables for Causal Effect Estimation with Hidden Confounding, Slides and video from the keynote talk at Causal Learning and Reasoning (CLeaR) 2026. Covers the outcome bridge, treatment bridge, doubly robust proxy causal learning, and proxy variables for domain shift (April 2026).
Causal Effect Estimation with Context and Confounders, Slides 1 and Slides 2 from lectures at the Winter school on Representation Learning & GenAI, MBZUAI Abu Dhabi (February 2026).
Causal Effect Estimation with Context and Confounders, Slides from the presentation at the Causality-XAI Winter School, Paris (October 2025).
Learning to act in noisy contexts using deep proxy learning, Slides from the presentation at the Jump Trading/ELLIS CSML Seminar Series, UCL (October 2025).
Causal Effect Estimation with Context and Confounders, Slides from the presentation at PAISS Summer school, Inria Grenoble (September 2025).
Gradient Flows on the Maximum Mean Discrepancy, Slides from ProbNum 2025: The First International Conference on Probabilistic Numerics (September 2025); an earlier version was given at the RSS/Turing Workshop on Gradient Flows for Sampling, Inference, and Learning, London (March 2025).
Causal Effect Estimation with Context and Confounders, Slides from the presentation at ESSEC Business School, Paris (March 2025).
Learning to act in noisy contexts using deep proxy learning, Slides and video from the keynote talk at the NeurIPS Workshop on Causal Representation Learning.
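The talk on optimized MMD concerns how to choose the kernel for an MMD two-sample test. For orientation, here is a minimal pure-Python sketch of the basic test that such methods build on: an unbiased MMD² estimate with a single, fixed Gaussian bandwidth (an assumption), calibrated by permutation. The fusing and aggregation schemes from the talk are not implemented, and all function names are illustrative.

```python
import math
import random


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel on scalars."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def mmd2_unbiased(xs, ys, bandwidth=1.0):
    """Unbiased estimate of MMD^2 between the distributions of xs and ys."""
    m, n = len(xs), len(ys)
    # Within-sample terms exclude the diagonal i == j, which makes the estimate unbiased.
    kxx = sum(gaussian_kernel(xs[i], xs[j], bandwidth)
              for i in range(m) for j in range(m) if i != j) / (m * (m - 1))
    kyy = sum(gaussian_kernel(ys[i], ys[j], bandwidth)
              for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    kxy = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys) / (m * n)
    return kxx + kyy - 2.0 * kxy


def mmd_permutation_test(xs, ys, bandwidth=1.0, num_permutations=200, seed=0):
    """Two-sample test of P == Q; returns (MMD^2 estimate, permutation p-value)."""
    rng = random.Random(seed)
    observed = mmd2_unbiased(xs, ys, bandwidth)
    pooled = list(xs) + list(ys)
    m = len(xs)
    exceed = 0
    for _ in range(num_permutations):
        # Under the null P == Q the sample labels are exchangeable, so reshuffling
        # the pooled data simulates the null distribution of the statistic.
        rng.shuffle(pooled)
        if mmd2_unbiased(pooled[:m], pooled[m:], bandwidth) >= observed:
            exceed += 1
    # Counting the observed statistic as one of the permutations keeps the test valid.
    p_value = (exceed + 1) / (num_permutations + 1)
    return observed, p_value


if __name__ == "__main__":
    rng = random.Random(1)
    xs = [rng.gauss(0.0, 1.0) for _ in range(50)]
    ys = [rng.gauss(1.5, 1.0) for _ in range(50)]  # mean-shifted sample
    stat, p = mmd_permutation_test(xs, ys)
    print(f"MMD^2 = {stat:.3f}, p-value = {p:.3f}")
```

With a fixed bandwidth the power of the test depends heavily on that single choice, which is the problem that kernel selection by fusing or aggregation is meant to address.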
Older news
Distributional Diffusion Models with Scoring Rules. Sample generation from diffusion models by learning the posterior distribution of clean data samples given their noisy versions (rather than just its mean), which significantly accelerates inference. ICML 2025
Accelerated Diffusion Models via Speculative Sampling. Speculative sampling for accelerating inference in diffusion models: candidate samples are generated by a fast draft model and then accepted or rejected. Includes a strategy in which no separate draft model needs to be trained. ICML 2025
Learning-Order Autoregressive Models with Application to Molecular Graph Generation. Autoregressive generation with a learned ordering, achieving state-of-the-art results on the QM9 and ZINC250k molecule generation benchmarks. ICML 2025
Sleepless Nights, Sugary Days: Creating Synthetic Users with Health Conditions for Realistic Coaching Agent Interactions. An end-to-end framework for generating synthetic users to evaluate interactive agents designed to encourage positive behavior change, such as in health and lifestyle coaching. ACL 2025
Sequential Kernel Embedding for Mediated and Time-Varying Dose Response Curves. Simple nonparametric estimators for mediated and time-varying dose response curves based on kernel ridge regression. Bernoulli 2025
A Unified Data Representation Learning for Non-parametric Two-sample Testing. Self-supervised learning of features that reflect the underlying data manifold, for improved two-sample testing. UAI 2025
Spectral Representation for Causal Estimation with Hidden Confounders. A spectral method for causal effect estimation with hidden confounders, applicable to instrumental variable regression and proxy causal learning. AISTATS 2025
Kernel Single Proxy Control for Deterministic Confounding. Proxy causal learning generally requires two proxy variables: a treatment proxy and an outcome proxy. When is it possible to use just one? AISTATS 2025
Density Ratio-based Proxy Causal Learning Without Density Ratios. Proxy Causal Learning (PCL) estimates causal effects from observed data in the presence of hidden confounding. We propose an alternative bridge function to achieve this. AISTATS 2025
Credal Two-Sample Tests of Epistemic Uncertainty. A new framework for comparing credal sets: convex sets of probability measures in which each element captures aleatoric uncertainty and the set itself represents epistemic uncertainty. AISTATS 2025
Deep MMD Gradient Flow without adversarial training. Adaptive MMD gradient flow trained on samples from a forward diffusion process, with competitive performance on image generation and the ability to efficiently generate one sample at a time. ICLR 2025
Optimality and Adaptivity of Deep Neural Features for Instrumental Variable Regression. Convergence analysis of deep feature instrumental variable (DFIV) regression, a nonparametric approach to IV regression using data-adaptive features learned by deep neural networks in two stages. Shows that NN approaches are better than fixed-feature (kernel or sieve) approaches when the target function has low spatial homogeneity, and that NN approaches are more sample-efficient in the Stage 1 samples. ICLR 2025
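Several entries (the (de)-regularized MMD gradient flow, the deep MMD gradient flow, and the ProbNum talk) concern gradient flows on the MMD. As a hedged sketch of only the plain, unregularized flow, here is a forward-Euler particle discretization in one dimension with a fixed Gaussian kernel: each particle moves along minus the gradient of the MMD witness function. The step size, bandwidth, toy data, and helper names are assumptions; deregularization, noise injection, and learned kernels are not included.

```python
import math


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel on scalars."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def grad_kernel(a, b, bandwidth=1.0):
    """Derivative of gaussian_kernel(a, b) with respect to its first argument a."""
    return -(a - b) / bandwidth ** 2 * gaussian_kernel(a, b, bandwidth)


def mmd2_v(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) MMD^2 between two empirical distributions."""
    kxx = sum(gaussian_kernel(a, b, bandwidth) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(gaussian_kernel(a, b, bandwidth) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(gaussian_kernel(a, b, bandwidth) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2.0 * kxy


def mmd_flow_step(particles, targets, step=0.1, bandwidth=1.0):
    """One forward-Euler step of the MMD gradient flow for 1-D particles.

    Each particle moves along minus the gradient of the witness function
    w(x) = mean_j k(x, x_j) - mean_l k(x, y_l): it is repelled by the other
    particles and attracted to the target samples.
    """
    n, m = len(particles), len(targets)
    updated = []
    for x in particles:
        grad_w = (sum(grad_kernel(x, z, bandwidth) for z in particles) / n
                  - sum(grad_kernel(x, y, bandwidth) for y in targets) / m)
        updated.append(x - step * grad_w)
    return updated


if __name__ == "__main__":
    targets = [-1.0 + 0.2 * i for i in range(11)]   # target samples centred at 0
    particles = [2.0 + 0.1 * i for i in range(10)]  # particles start to the right
    print(f"before: MMD^2 = {mmd2_v(particles, targets):.4f}")
    for _ in range(200):
        particles = mmd_flow_step(particles, targets)
    print(f"after:  MMD^2 = {mmd2_v(particles, targets):.4f}, "
          f"mean = {sum(particles) / len(particles):.3f}")
```

Up to a constant factor, this update is gradient descent on the particle positions for the V-statistic MMD², so for a small enough step the MMD² decreases at every iteration. The plain MMD flow can nonetheless get stuck short of the target in less favorable settings, which motivates regularized and learned variants such as those listed above.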