Tutorial on energy models and Deep Belief Networks
Topics: Energy models, causal generative models vs. energy models in overcomplete ICA, contrastive divergence learning, score matching, restricted Boltzmann machines, deep belief networks
Presentation notes: .pdf
This is a scan of my notes for the tutorial. I'm afraid they're not very polished... The first part, on ICA, is missing, but you can have a look at Rich's notes, which refer to the same paper.
Task for the coding session: Implementing a deep belief network for classification and generation of handwritten letters.
Data: Begin with only a few letters from the Binary Alphadigits. This dataset
is quite small, so learning is fast and the hidden layers need only about 100
units each to reach good performance. Then test the complete system on
the MNIST database (split the data into mini-batches).
You'll find a copy of the Binary Alphadigits and MNIST databases at Sam Roweis' site.
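The core of the coding task is training each layer as a restricted Boltzmann machine with contrastive divergence. Below is a minimal sketch of a CD-1 update for a binary RBM, in the spirit of Hinton (2002); the names (`cd1_update`, `n_visible`, `n_hidden`, the learning rate, and the 20x16 = 320-pixel visible layer matching the Alphadigits images) are illustrative choices, not part of any reference implementation.

```python
# Sketch of one contrastive-divergence (CD-1) update for a binary RBM.
# All variable names and hyperparameters here are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b, c, lr=0.1):
    """One CD-1 step on a mini-batch v0 of shape (batch, n_visible).
    W: (n_visible, n_hidden) weights; b, c: visible/hidden biases."""
    # Positive phase: hidden probabilities given the data.
    ph0 = sigmoid(v0 @ W + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: one step of Gibbs sampling from the hidden sample.
    pv1 = sigmoid(h0 @ W.T + b)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + c)
    # CD-1 approximation to the log-likelihood gradient.
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / len(v0)
    b += lr * (v0 - v1).mean(axis=0)
    c += lr * (ph0 - ph1).mean(axis=0)
    return ((v0 - pv1) ** 2).mean()  # reconstruction error, for monitoring

# Toy usage on random binary data (320 visible units ~ a 20x16 image).
n_visible, n_hidden = 320, 100
W = 0.01 * rng.standard_normal((n_visible, n_hidden))
b = np.zeros(n_visible)
c = np.zeros(n_hidden)
data = (rng.random((50, n_visible)) < 0.5).astype(float)
for epoch in range(5):
    err = cd1_update(data, W, b, c)
```

To build the deep belief network, train one RBM this way, then feed its hidden-unit probabilities as "data" to the next RBM, and so on (the greedy layer-wise scheme of Hinton, Osindero and Teh, 2006).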
My solution: .tgz (in Python)
- Hinton, G., Welling, M., Teh, Y. and Osindero, S. A new view of ICA. Proceedings of ICA-2001. 2001.
- Hinton, G. Training products of experts by minimizing contrastive divergence. Neural Computation, 14(8), 2002.
- Teh, Y., Welling, M., Osindero, S. and Hinton, G. Energy-based models for sparse overcomplete representations. Journal of Machine Learning Research, 4:1235-1260, 2003.
- Carreira-Perpiñán, M. Á. and Hinton, G. On contrastive divergence learning. In R. G. Cowell and Z. Ghahramani, editors, Artificial Intelligence and Statistics, pages 33-41. Fort Lauderdale, 2005. Society for Artificial Intelligence and Statistics.
- Hyvärinen, A. Estimation of non-normalized statistical models by score matching. Journal of Machine Learning Research, 6:695-709, 2005.
- Hinton, G., Osindero, S. and Teh, Y. A fast learning algorithm for deep belief nets. Neural Computation, 18(4):1527-1554, 2006.
- Hinton, G. and Salakhutdinov, R. Reducing the dimensionality of data with neural networks. Science, 313(5786):504-507, 2006.
- Hyvärinen, A. Some extensions of score matching. Computational Statistics & Data Analysis, 51(5):2499-2512, 2007.