This package contains Matlab implementations of three statistical hypothesis tests for independence: a kernel test, as described in GreEtAl08a; and tests based on the L1 distance and the log-likelihood, as described in GreEtAl08b, GreEtAl10.

We test whether random variables X and Y are independent, based on a sample of observed pairs (x_i, y_i). The software provides three test statistics. The kernel test uses the Hilbert-Schmidt norm of the covariance operator between RKHS mappings of X and Y: this is called the Hilbert-Schmidt Independence Criterion (HSIC). The population HSIC is zero at independence, so a large empirical HSIC is evidence that X and Y are dependent. An intuitive explanation of HSIC and the associated test may be found in these talk slides. The second test uses the L1 distance between the joint distribution and the product of the marginals as its test statistic (computed on a partitioning of the space), and the third test uses the mutual information (the log-likelihood statistic mentioned above).
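To make the kernel statistic concrete, the following is a minimal sketch of a biased empirical HSIC for scalar samples. It is written in Python with the standard library only and is not the package's Matlab code. The Gaussian kernel, the bandwidths `sigma_x` and `sigma_y`, the 1/n^2 normalisation, and all function names are assumptions made for illustration; the package may choose kernels and scaling differently.

```python
import math
import random


def gaussian_gram(values, sigma):
    """Gram matrix K[i][j] = exp(-(v_i - v_j)^2 / (2 sigma^2)) for scalar data."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2)) for b in values]
            for a in values]


def center(gram):
    """Double-centre a symmetric Gram matrix: H K H with H = I - (1/n) 1 1^T."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    total_mean = sum(row_means) / n
    # The matrix is symmetric, so column means equal row means.
    return [[gram[i][j] - row_means[i] - row_means[j] + total_mean
             for j in range(n)] for i in range(n)]


def hsic_biased(x, y, sigma_x=1.0, sigma_y=1.0):
    """Biased empirical HSIC, (1/n^2) * trace(K H L H) (assumed normalisation)."""
    n = len(x)
    k_centered = center(gaussian_gram(x, sigma_x))
    l_gram = gaussian_gram(y, sigma_y)
    # trace(K H L H) = sum_ij (H K H)_ij * L_ij because L is symmetric.
    total = sum(k_centered[i][j] * l_gram[i][j]
                for i in range(n) for j in range(n))
    return total / n ** 2


# Illustrative data: y_dep depends on x nonlinearly, y_ind does not depend on x.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(100)]
y_dep = [xi ** 2 + 0.1 * rng.gauss(0, 1) for xi in x]
y_ind = [rng.gauss(0, 1) for _ in range(100)]
print("HSIC, dependent pair:  ", hsic_biased(x, y_dep))
print("HSIC, independent pair:", hsic_biased(x, y_ind))
```

The dependent pair here has no linear correlation by construction (y is roughly x squared), which is the kind of relationship a kernel statistic is meant to detect. The pure-Python loops cost O(n^2) time and memory, so this sketch is only suitable for small samples.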

The test software returns both the test statistic and a threshold, where the latter is a user-specified quantile of the distribution of the statistic at independence (the null distribution). When the statistic exceeds this threshold, we reject the hypothesis of independence. Three strategies are used to calculate the test threshold:

• Moment matching to a Gamma distribution (HSIC) fits a two-parameter Gamma distribution to the first two moments of the empirical HSIC at independence. Requires the Matlab Statistics Toolbox.
• Distribution-free consistent test (L1, log likelihood) uses a distribution-free test threshold, which is not computed from the sample.
• Shuffling (HSIC, L1, log likelihood) uses bootstrap resampling on the aggregated data to obtain a test threshold. Slower than the moment matching and distribution-free approaches, but can be more accurate in practice for small sample sizes.
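The shuffling strategy can be sketched as follows, using the L1 statistic on a partition as the example statistic. This is an illustration in Python (standard library only), not the package's Matlab code. The equal-width grid, the number of cells `n_bins`, the level `alpha`, the number of shuffles, and all function names are assumptions. The sketch also permutes the y sample without replacement, which is one common way to shuffle; the package's own resampling scheme may differ.

```python
import random


def bin_index(values, n_bins):
    """Assign each scalar to one of n_bins equal-width cells over its range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant sample
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def l1_statistic(x_cells, y_cells, n_bins):
    """Sum over cells of |joint frequency - product of marginal frequencies|."""
    n = len(x_cells)
    joint = [[0.0] * n_bins for _ in range(n_bins)]
    px = [0.0] * n_bins
    py = [0.0] * n_bins
    for a, b in zip(x_cells, y_cells):
        joint[a][b] += 1.0 / n
        px[a] += 1.0 / n
        py[b] += 1.0 / n
    return sum(abs(joint[a][b] - px[a] * py[b])
               for a in range(n_bins) for b in range(n_bins))


def shuffle_threshold(x_cells, y_cells, n_bins, alpha=0.05,
                      n_shuffles=500, seed=0):
    """(1 - alpha) quantile of the statistic over random re-pairings of y with x."""
    rng = random.Random(seed)
    y_perm = list(y_cells)
    null_stats = []
    for _ in range(n_shuffles):
        rng.shuffle(y_perm)  # breaks the pairing, keeps both marginals
        null_stats.append(l1_statistic(x_cells, y_perm, n_bins))
    null_stats.sort()
    index = min(int((1.0 - alpha) * n_shuffles), n_shuffles - 1)
    return null_stats[index]


# Illustrative data with strong dependence between x and y.
rng = random.Random(1)
x = [rng.gauss(0, 1) for _ in range(200)]
y = [xi + 0.5 * rng.gauss(0, 1) for xi in x]
x_cells, y_cells = bin_index(x, 4), bin_index(y, 4)
stat = l1_statistic(x_cells, y_cells, 4)
thresh = shuffle_threshold(x_cells, y_cells, 4)
print("statistic:", stat, "threshold:", thresh, "reject:", stat > thresh)
```

Each shuffle recomputes the statistic once, so the cost is `n_shuffles` times that of a single evaluation. This is why shuffling is slower than a threshold obtained in closed form.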