Ilya Sutskever
(Toronto, Geoff Hinton’s group)
Thursday 8th December 2011
16:00
Gatsby Seminar Room, 4th Floor
Alexandra House, 17 Queen Square, London, WC1N 3AR
Training and Applications of Recurrent Neural Networks
Recurrent Neural Networks (RNNs) form a powerful class of sequence
models whose high-dimensional hidden state and nonlinear dynamics
enable them to express highly complex sequence relations. Despite
their appeal, RNNs have not enjoyed widespread use because they are
extremely difficult to train on tasks with long-range temporal
dependencies.
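
For concreteness, the recurrence behind these dynamics can be sketched
in a few lines of NumPy; this is a generic vanilla RNN with illustrative
names (rnn_step, W_hh, W_xh), not the specific architecture used in the
work:

    import numpy as np

    def rnn_step(h_prev, x, W_hh, W_xh, b):
        # New hidden state: a nonlinear function of the previous
        # hidden state and the current input.
        return np.tanh(W_hh @ h_prev + W_xh @ x + b)

    def rnn_forward(xs, h0, W_hh, W_xh, b):
        # Unroll the recurrence over a sequence, collecting the
        # high-dimensional hidden states.
        h, hs = h0, []
        for x in xs:
            h = rnn_step(h, x, W_hh, W_xh, b)
            hs.append(h)
        return hs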
In this work we resolve the longstanding problem of training RNNs on
such tasks. We show that RNNs can be successfully trained to solve
tasks exhibiting pathological long-range temporal dependencies using
the recently introduced Hessian-Free optimizer together with a novel
technique termed "structural damping".
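
The core idea of structural damping admits a short sketch: on top of the
usual quadratic damping on the parameter step, it penalizes candidate
steps that change the hidden-state trajectory too much. The sketch below
shows only that penalty term (mu is an assumed damping strength and the
names are illustrative), not the full Gauss-Newton quadratic model that
the Hessian-Free optimizer solves with conjugate gradient:

    import numpy as np

    def structural_damping_penalty(states_before, states_after, mu):
        # Charge a candidate parameter step for how far it moves the
        # RNN's hidden-state trajectory, discouraging steps that
        # disrupt the network's internal dynamics even when the local
        # loss model considers them acceptable.
        return 0.5 * mu * sum(np.sum((ha - hb) ** 2)
                              for hb, ha in zip(states_before, states_after))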
We then apply a new multiplicative RNN architecture to the problem of
predicting the next character in a stream of text. The RNN made more
accurate predictions than the Sequence Memoizer, and learned that
quotes and parentheses must balance. Such long-range regularities are
fundamentally unrepresentable by any language model based on exact
string matches. The RNN was trained on 8 GPUs for 5 days, and is the
largest RNN application to date.
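
The multiplicative architecture lets each input character select, in
effect, its own hidden-to-hidden transition through a factored weight
tensor. A minimal NumPy sketch, with hypothetical parameter names
(W_fx, W_fh, W_hf, W_hx, W_oh), toy dimensions, and one-hot character
inputs assumed:

    import numpy as np

    def mrnn_step(h_prev, x_onehot, p):
        # The current character gates a factored hidden-to-hidden
        # transition, so each character effectively picks its own
        # transition matrix without storing a full matrix per character.
        f = (p["W_fx"] @ x_onehot) * (p["W_fh"] @ h_prev)  # gated factors
        h = np.tanh(p["W_hf"] @ f + p["W_hx"] @ x_onehot)  # new hidden state
        logits = p["W_oh"] @ h                             # next-char scores
        return h, logits

    # Toy usage: vocabulary of 4 characters, 8 hidden units, 6 factors.
    rng = np.random.default_rng(0)
    V, H, F = 4, 8, 6
    p = {"W_fx": rng.normal(size=(F, V)), "W_fh": rng.normal(size=(F, H)),
         "W_hf": rng.normal(size=(H, F)), "W_hx": rng.normal(size=(H, V)),
         "W_oh": rng.normal(size=(V, H))}
    h = np.zeros(H)
    for c in [0, 2, 1]:            # a toy character sequence
        h, logits = mrnn_step(h, np.eye(V)[c], p)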
Joint work with James Martens and Geoff Hinton.