Ilya Sutskever
(Toronto, Geoff Hinton’s group)
Thursday 8th December 2011
16:00
Gatsby Seminar Room, 4th Floor
Alexandra House, 17 Queen Square, London, WC1N 3AR
Training and Applications of Recurrent Neural Networks
Recurrent Neural Networks (RNNs) form a powerful class of sequence
models whose high-dimensional hidden state and nonlinear dynamics
enable them to express highly complex sequence relations. Despite
their appeal, RNNs have not enjoyed widespread use because they are
extremely difficult to train on tasks with long-range temporal
dependencies.
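
For concreteness, the recurrence behind these dynamics can be sketched
in a few lines of NumPy; this is a generic vanilla RNN with illustrative
names (rnn_step, W_hh, W_xh), not the specific architecture used in the
work:

    import numpy as np

    def rnn_step(h_prev, x, W_hh, W_xh, b):
        # New hidden state: a nonlinear function of the previous
        # hidden state and the current input.
        return np.tanh(W_hh @ h_prev + W_xh @ x + b)

    def rnn_forward(xs, h0, W_hh, W_xh, b):
        # Unroll the recurrence over a sequence, collecting the
        # high-dimensional hidden states.
        h, hs = h0, []
        for x in xs:
            h = rnn_step(h, x, W_hh, W_xh, b)
            hs.append(h)
        return hs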
In this work we resolve the longstanding problem of training RNNs on
such tasks. We show that RNNs can be successfully trained to solve
tasks exhibiting pathological long-range temporal dependencies using
the recently introduced Hessian-Free optimizer together with a novel
technique termed "structural damping".
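
The core idea of structural damping admits a short sketch: on top of the
usual quadratic damping on the parameter step, it penalizes candidate
steps that change the hidden-state trajectory too much. The sketch below
shows only that penalty term (mu is an assumed damping strength and the
names are illustrative), not the full Gauss-Newton quadratic model that
the Hessian-Free optimizer solves with conjugate gradient:

    import numpy as np

    def structural_damping_penalty(states_before, states_after, mu):
        # Charge a candidate parameter step for how far it moves the
        # RNN's hidden-state trajectory, discouraging steps that
        # disrupt the network's internal dynamics even when the local
        # loss model considers them acceptable.
        return 0.5 * mu * sum(np.sum((ha - hb) ** 2)
                              for hb, ha in zip(states_before, states_after))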
We then apply a new multiplicative RNN architecture to the problem of
predicting the next character in a stream of text. The RNN made more
accurate predictions than the Sequence Memoizer, and learned that
quotes and parentheses must balance. Such long-range regularities are
fundamentally unrepresentable by any language model based on exact
string matches. The RNN was trained on 8 GPUs for 5 days, and is the
largest RNN application to date.
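
The multiplicative architecture lets each input character select, in
effect, its own hidden-to-hidden transition through a factored weight
tensor. A minimal NumPy sketch, with hypothetical parameter names
(W_fx, W_fh, W_hf, W_hx, W_oh), toy dimensions, and one-hot character
inputs assumed:

    import numpy as np

    def mrnn_step(h_prev, x_onehot, p):
        # The current character gates a factored hidden-to-hidden
        # transition, so each character effectively picks its own
        # transition matrix without storing a full matrix per character.
        f = (p["W_fx"] @ x_onehot) * (p["W_fh"] @ h_prev)  # gated factors
        h = np.tanh(p["W_hf"] @ f + p["W_hx"] @ x_onehot)  # new hidden state
        logits = p["W_oh"] @ h                             # next-char scores
        return h, logits

    # Toy usage: vocabulary of 4 characters, 8 hidden units, 6 factors.
    rng = np.random.default_rng(0)
    V, H, F = 4, 8, 6
    p = {"W_fx": rng.normal(size=(F, V)), "W_fh": rng.normal(size=(F, H)),
         "W_hf": rng.normal(size=(H, F)), "W_hx": rng.normal(size=(H, V)),
         "W_oh": rng.normal(size=(V, H))}
    h = np.zeros(H)
    for c in [0, 2, 1]:            # a toy character sequence
        h, logits = mrnn_step(h, np.eye(V)[c], p)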
Joint work with James Martens and Geoff Hinton.