About

This package contains Matlab implementations of three statistical hypothesis tests for independence: a kernel test, as described in [GreEtAl08a], and tests based on the L1 distance and the log-likelihood, as described in [GreEtAl08b, GreEtAl10].

We test whether random variables X and Y are independent, based on a sample of observed pairs (x_i, y_i). The software provides three test statistics. The kernel test uses the Hilbert-Schmidt norm of the covariance operator between RKHS mappings of X and Y: this is called the Hilbert-Schmidt Independence Criterion (HSIC). The population HSIC is zero at independence, so a large empirical HSIC is evidence that X and Y are dependent. An intuitive explanation of HSIC and the associated test may be found in these talk slides. The second test uses the L1 distance between the joint distribution and the product of the marginals as its test statistic (computed on a partitioning of the space), and the third test uses the mutual information (the log-likelihood statistic).
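To make the three statistics concrete, here is a minimal sketch in plain Python (standard library only). It is an independent illustration, not the package's Matlab code. The function names, the Gaussian kernel with fixed bandwidth, the biased HSIC estimate trace(KHLH)/n^2, and the equal-width grid partition are all assumptions made for the example, and the scaling of the statistics may differ from the conventions in the papers.

```python
import math
import random

def gaussian_gram(v, sigma):
    """Gram matrix K[i][j] = exp(-(v_i - v_j)^2 / (2 sigma^2)) for scalar samples."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2)) for b in v] for a in v]

def center(K):
    """Double-center a Gram matrix: H K H with H = I - (1/n) 1 1^T."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    tot = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + tot for j in range(n)] for i in range(n)]

def hsic_biased(x, y, sigma_x=1.0, sigma_y=1.0):
    """Biased empirical HSIC, trace(K H L H) / n^2 (assumed estimator for this sketch)."""
    n = len(x)
    Kc = center(gaussian_gram(x, sigma_x))
    L = gaussian_gram(y, sigma_y)
    # trace(H K H L) equals the elementwise sum because both matrices are symmetric.
    return sum(Kc[i][j] * L[i][j] for i in range(n) for j in range(n)) / n ** 2

def partition_stats(x, y, bins=4):
    """L1 distance and mutual information on an equal-width grid (hypothetical partition)."""
    def cell(v):
        lo, hi = min(v), max(v)
        width = (hi - lo) / bins or 1.0  # guard against constant samples
        return [min(int((t - lo) / width), bins - 1) for t in v]
    cx, cy = cell(x), cell(y)
    n = len(x)
    joint = [[0.0] * bins for _ in range(bins)]
    for a, b in zip(cx, cy):
        joint[a][b] += 1.0 / n
    px = [sum(r) for r in joint]
    py = [sum(joint[a][b] for a in range(bins)) for b in range(bins)]
    cells = [(a, b) for a in range(bins) for b in range(bins)]
    l1 = sum(abs(joint[a][b] - px[a] * py[b]) for a, b in cells)
    mi = sum(joint[a][b] * math.log(joint[a][b] / (px[a] * py[b]))
             for a, b in cells if joint[a][b] > 0)
    return l1, mi

# Toy data: y_dep depends strongly on x, y_ind is drawn independently.
random.seed(0)
x = [random.gauss(0, 1) for _ in range(100)]
y_dep = [a + 0.1 * random.gauss(0, 1) for a in x]
y_ind = [random.gauss(0, 1) for _ in range(100)]

print("HSIC   dependent / independent:", hsic_biased(x, y_dep), hsic_biased(x, y_ind))
print("L1, MI dependent:  ", partition_stats(x, y_dep))
print("L1, MI independent:", partition_stats(x, y_ind))
```

All three statistics should come out noticeably larger for the dependent pair than for the independent one. The statistic alone does not decide the test; it still has to be compared against a threshold.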

The test software returns both the test statistic and a threshold. The threshold is a user-specified quantile of the distribution of the empirical HSIC under independence. When the statistic exceeds this threshold, we reject the independence hypothesis. Three strategies are used to calculate the test threshold.
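As a generic illustration of how such a threshold can be obtained, the sketch below uses a permutation approach: shuffling the y values breaks any dependence on x, so the statistic recomputed on shuffled data approximates its distribution under independence. This is a standard technique and is not claimed to be one of the package's strategies. The function names, the compact HSIC helper, and the defaults (alpha = 0.05, 200 permutations) are assumptions for the example.

```python
import math
import random

def hsic_biased(x, y, sigma=1.0):
    """Compact biased HSIC with Gaussian kernels (assumed statistic for this sketch)."""
    n = len(x)
    def gram(v):
        return [[math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2)) for b in v] for a in v]
    K, L = gram(x), gram(y)
    row = [sum(r) / n for r in K]          # K is symmetric: row means = column means
    tot = sum(row) / n
    return sum((K[i][j] - row[i] - row[j] + tot) * L[i][j]
               for i in range(n) for j in range(n)) / n ** 2

def permutation_threshold(x, y, stat, alpha=0.05, n_perm=200, seed=0):
    """Empirical (1 - alpha) quantile of stat(x, shuffled y)."""
    rng = random.Random(seed)
    y_perm = list(y)
    null = []
    for _ in range(n_perm):
        rng.shuffle(y_perm)                # destroys any pairing between x and y
        null.append(stat(x, y_perm))
    null.sort()
    idx = min(n_perm - 1, math.ceil((1 - alpha) * n_perm) - 1)
    return null[idx]

def independence_test(x, y, stat=hsic_biased, alpha=0.05, n_perm=200, seed=0):
    """Return (statistic, threshold, reject), mirroring the statistic/threshold pair."""
    value = stat(x, y)
    thresh = permutation_threshold(x, y, stat, alpha, n_perm, seed)
    return value, thresh, value > thresh

# Toy data with strong dependence between x and y.
random.seed(1)
x = [random.gauss(0, 1) for _ in range(50)]
y = [a + 0.1 * random.gauss(0, 1) for a in x]

value, thresh, reject = independence_test(x, y)
print("statistic:", value, "threshold:", thresh, "reject independence:", reject)
```

Because `permutation_threshold` takes the statistic as an argument, the same routine could be reused with a partition-based statistic. The cost is one statistic evaluation per permutation, which is why closed-form approximations to the null distribution are attractive for large samples.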

Code

Code may be downloaded here.

References

[GreEtAl08a] A. Gretton, K. Fukumizu, C.-H. Teo, L. Song, B. Schoelkopf and A. Smola: A Kernel Statistical Test of Independence. NIPS 21, 2007. download
[GreEtAl08b] A. Gretton and L. Gyorfi: Nonparametric Independence Tests: Space Partitioning and Kernel Approaches. ALT 19, 2008. download
[GreEtAl10] A. Gretton and L. Gyorfi: Consistent Nonparametric Tests of Independence. JMLR 11, 2010. download

Contact

arthur.gretton@gmail.com