Contact
arthur.gretton@gmail.com
Gatsby Computational Neuroscience Unit
Sainsbury Wellcome Centre
25 Howland Street
London W1T 4JG UK

Info
I am a Professor with the Gatsby Computational Neuroscience Unit; director of the
Centre for Computational Statistics and Machine Learning
at UCL; and a Research Scientist at Google DeepMind. A short biography.
My recent research interests in machine learning include causal inference and representation learning, design and training of implicit and explicit generative models, and nonparametric hypothesis testing.
Recent papers
Regularized least squares learning with heavy-tailed noise is minimax optimal. Asymptotic robustness of regularized least squares regression against heavy-tailed noise, via convergence rates for kernel ridge regression that were previously derived only under subexponential noise. Spotlight presentation, NeurIPS 2025
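As background (standard notation, mine rather than the paper's): given samples (x_i, y_i), i = 1, ..., n, a kernel k with RKHS H, and a regularization parameter \lambda > 0, kernel ridge regression solves

    \hat{f} = \arg\min_{f \in H} \frac{1}{n} \sum_{i=1}^n (y_i - f(x_i))^2 + \lambda \|f\|_H^2,

with the closed-form solution

    \hat{f}(x) = \sum_{i=1}^n \alpha_i k(x_i, x), \qquad \alpha = (K + n \lambda I)^{-1} y, \quad K_{ij} = k(x_i, x_j).

The paper's rates quantify how fast \hat{f} approaches the true regression function as n grows when the noise is heavy-tailed; its precise assumptions are not reproduced here.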
On the Hardness of Conditional Independence Testing In Practice. On the challenges of applying the Kernel-based Conditional Independence test (KCI) in practice, highlighting the impact of kernel choice on the conditioning variable, and the effect of errors in the conditional mean embedding. Spotlight presentation, NeurIPS 2025
Doubly-Robust Estimation of Counterfactual Policy Mean Embeddings. A doubly robust estimator for counterfactual outcome distributions, based on kernel mean embeddings, allowing for sampling from the counterfactual distribution and hypothesis testing. NeurIPS 2025
Density Ratio-Free Doubly Robust Proxy Causal Learning. A doubly robust approach to Proxy Causal Learning, combining treatment and outcome proxies, without density ratios. NeurIPS 2025
Demystifying Spectral Feature Learning for Instrumental Variable Regression. A generalization error bound for instrumental variable regression based on spectral features and a two-stage least squares estimator, yielding insights into the method's performance and failure modes. NeurIPS 2025
Sleepless Nights, Sugary Days: Creating Synthetic Users with Health Conditions for Realistic Coaching Agent Interactions. An end-to-end framework for generating synthetic users for evaluating interactive agents designed to encourage positive behavior changes, such as in health and lifestyle coaching. ACL 2025
Sequential Kernel Embedding for Mediated and Time-Varying Dose Response Curves. Simple nonparametric estimators for mediated and time-varying dose response curves, based on kernel ridge regression. Bernoulli 2025
A Unified Data Representation Learning for Non-parametric Two-sample Testing. Self-supervised learning of features that reflect the underlying data manifold, for improved two-sample testing. UAI 2025
Spectral Representation for Causal Estimation with Hidden Confounders. A spectral method for causal effect estimation with hidden confounders, applying to instrumental variable and proxy causal learning. AISTATS 2025
Kernel Single Proxy Control for Deterministic Confounding. Proxy causal learning generally requires two proxy variables: a treatment proxy and an outcome proxy. When is it possible to use just one? AISTATS 2025
Density Ratio-based Proxy Causal Learning Without Density Ratios. Proxy Causal Learning (PCL) estimates causal effects from observed data in the presence of hidden confounding. We propose an alternative bridge function to achieve this. AISTATS 2025
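For readers new to PCL, a sketch of the standard outcome-bridge formulation (background only; the paper proposes an alternative bridge): with treatment A, outcome Y, hidden confounder U, treatment proxy Z, and outcome proxy W, one solves for a bridge function h satisfying

    E[Y \mid A = a, Z = z] = E[h(W, a) \mid A = a, Z = z],

after which, under completeness conditions, the causal dose-response is recovered as

    E[Y \mid \mathrm{do}(a)] = E_W[h(W, a)].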
Credal Two-Sample Tests of Epistemic Uncertainty. A new framework for comparing credal sets: convex sets of probability measures where each element captures aleatoric uncertainty and the set itself represents epistemic uncertainty. AISTATS 2025
Deep MMD Gradient Flow without adversarial training. Adaptive MMD gradient flow trained on samples from a forward diffusion process, with competitive performance on image generation, and the ability to efficiently generate one sample at a time. ICLR 2025
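As background on the objective (a standard definition, not specific to the paper): for a kernel k with feature map \varphi and RKHS H, the maximum mean discrepancy between distributions P and Q is

    \mathrm{MMD}(P, Q) = \| \mu_P - \mu_Q \|_H, \qquad \mu_P = E_{X \sim P}[\varphi(X)],

or equivalently

    \mathrm{MMD}^2(P, Q) = E[k(X, X')] + E[k(Y, Y')] - 2 E[k(X, Y)], \quad X, X' \sim P, \; Y, Y' \sim Q.

An MMD gradient flow transports particles along the gradient of this discrepancy towards a target distribution; the adaptive kernel and diffusion-based training are the paper's contributions.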
Optimality and Adaptivity of Deep Neural Features for Instrumental Variable Regression. Convergence analysis of deep feature instrumental variable (DFIV) regression, a nonparametric approach to IV regression using data-adaptive features learned by deep neural networks in two stages. Shows that neural network approaches outperform fixed-feature (kernel or sieve) approaches when the target function has low spatial homogeneity, and that they make more efficient use of Stage 1 samples. ICLR 2025
Recent talks and courses
Causal Effect Estimation with Context and Confounders. Slides from the presentation at the PAISS Summer School, Inria Grenoble (5 September).
Gradient Flows on the Maximum Mean Discrepancy. Slides from ProbNum 2025: The First International Conference on Probabilistic Numerics, September 2025 (an earlier version was given at the RSS/Turing Workshop on Gradient Flows for Sampling, Inference, and Learning, London, March 2025).
Causal Effect Estimation with Context and Confounders. Slides from the presentation at ESSEC Business School, Paris (March 2).
Learning to act in noisy contexts using deep proxy learning: keynote. Slides and video from the NeurIPS Workshop on Causal Representation Learning.
Learning to act in noisy contexts using deep proxy learning. Talk Slides from the University of Stuttgart ELLIS Unit (2024).
Causal Effect Estimation with Context and Confounders. Talk Slides from the University of Warwick (2024).
Causal Effect Estimation with Context and Confounders. Course Slides 1 and Slides 2 from MLSS 2024, Okinawa.
Learning to act in noisy contexts using deep proxy learning: slides from the Tandon Lecture at NYU, March 2024.
Proxy Methods for Causal Effect Estimation with Hidden Confounders: Talk slides from the UCL Centre for Data Science Symposium, November 2023.
Adaptive two-sample testing: Talk slides from a seminar at the Cambridge Centre for Mathematical Sciences, October 2023.
Causal Effect Estimation with Hidden Confounders using Instruments and Proxies: Talk slides from the ELLIS RobustML Workshop, September 2023.
Course on hypothesis testing, causality, and generative models at the Columbia Statistics Department, July 2023 (10 lectures). Slides and related reading.
Causal Effect Estimation with Context and Confounders. Slides from keynote, AISTATS 2023.
Kernel Methods for Two-Sample and Goodness-Of-Fit Testing. Slides from PhyStat 2023.
Older news
Near-Optimality of Contrastive Divergence Algorithms. Contrastive divergence learns energy-based models with the same rates (and almost the same constant) as maximum likelihood! NeurIPS 2024
Optimal Rates for Vector-Valued Spectral Regularization Learning Algorithms. Learn conditional mean embeddings with spectral regularizers beyond Tikhonov, and avoid the saturation effect for smooth target functions. NeurIPS 2024
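For context, the standard Tikhonov (kernel ridge) estimate of a conditional mean embedding, which spectral regularizers generalize (notation mine, not the paper's): the embedding \mu_{Y|x} = E[\varphi(Y) \mid X = x] is estimated from samples (x_i, y_i), i = 1, ..., n, as

    \hat{\mu}_{Y|x} = \sum_{i=1}^n \beta_i(x) \varphi(y_i), \qquad \beta(x) = (K_X + n \lambda I)^{-1} k_X(x),

with (K_X)_{ij} = k(x_i, x_j) and k_X(x) = (k(x_1, x), \dots, k(x_n, x))^\top. Replacing the Tikhonov filter (K_X + n \lambda I)^{-1} with a general spectral filter is what allows the saturation effect to be avoided for smooth targets.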
Mind the Graph When Balancing Data for Fairness or Robustness. Data balancing for fairness: when does it work? When does it not? NeurIPS 2024
Foundations of Multivariate Distributional Reinforcement Learning. Distributional successor features: zero-shot generalization of return distribution functions across finite-dimensional reward function classes. NeurIPS 2024
Towards Optimal Sobolev Norm Rates for the Vector-Valued Regularized Least-Squares Algorithm. The first optimal rates for infinite-dimensional vector-valued kernel ridge regression, including the misspecified case. JMLR 2024
Kernel methods for causal functions: dose, heterogeneous and incremental response curves. Kernel ridge regression estimators for nonparametric causal functions, with uniform consistency and finite sample rates. Includes causal functions identified by front-door and back-door criteria. Biometrika 2024
A Distributional Analogue to the Successor Representation formulates the distributional successor measure (SM) as a distribution over distributions on states, and provides theory connecting it with distributional and model-based RL. The distributional SM is learned from data by minimizing a two-level MMD. Spotlight presentation, ICML 2024
Distributional Bellman Operators over Mean Embeddings, a novel algorithmic framework for distributional RL, based on learning finite-dimensional mean embeddings of return distributions. Includes new methods for TD learning, asymptotic convergence theory, and a new deep RL agent that improves over baselines on the Arcade Learning Environment. ICML 2024
Conditional Bayesian Quadrature, for estimating conditional or parametric expectations in the setting where obtaining samples or evaluating integrands is costly. Applications in Bayesian sensitivity analysis, computational finance and decision making under uncertainty. UAI 2024
Proxy Methods for Domain Adaptation. Domain adaptation under distribution shift, where the shift is due to a change in the distribution of an unobserved latent variable that confounds both the covariates and the labels. We employ proximal causal learning, demonstrating that proxy variables allow for adaptation to distribution shift without explicitly recovering or modeling the latent variables. We consider two settings: (i) Concept Bottleneck, where an additional "concept" variable is observed that mediates the relationship between the covariates and the labels; and (ii) Multi-domain, where training data from multiple source domains is available, with each source domain exhibiting a different distribution over the latent confounder. AISTATS 2024
