
Gradient Temporal-Difference Learning Algorithms
Rich Sutton
In this talk I will summarize recent work introducing the first practical learning algorithms for large-scale learning from interaction. To be practical for large-scale problems, the complexity of a learning algorithm must be no more than linear in the number of weights of its function approximator. To be applicable to large-scale learning from interaction, the algorithm should be able to learn in parallel about many policies: all policies with a significant probability of selecting the actions actually taken. The only known way to do this is by temporal-difference (TD) learning, but unfortunately all standard linear-complexity algorithms, such as linear TD(lambda) and linear Q-learning, are known to be unsound (possibly diverging) under these conditions. Our new gradient-TD algorithms overcome this problem at the minor cost of doubling their computational complexity. A secondary cost, which we are seeking to eliminate, is that gradient-TD algorithms may be slower than conventional TD on the subset of training cases in which conventional TD is sound. The latest gradient-TD algorithms are "hybrid" in that they become equivalent to (and thus just as fast as) conventional TD in these cases.
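To make the doubled computational cost concrete, here is a minimal sketch of one member of the gradient-TD family, the TDC update with linear function approximation. It maintains a second weight vector w alongside the main weights theta, which is where the factor of two comes from. The function name, feature vectors, step sizes, and numbers below are illustrative assumptions, not details from the talk:

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def tdc_update(theta, w, phi, phi_next, reward, gamma, alpha, beta):
    """One TDC (gradient-TD) step for linear value-function approximation.

    theta    -- main weight vector (the value estimate is theta . phi)
    w        -- auxiliary weight vector; its upkeep roughly doubles the
                per-step cost relative to conventional linear TD
    phi      -- feature vector of the current state
    phi_next -- feature vector of the next state
    alpha, beta -- step sizes for theta and w respectively
    """
    # Standard TD error.
    delta = reward + gamma * dot(theta, phi_next) - dot(theta, phi)
    # Gradient-correction term built from the auxiliary weights.
    correction = gamma * dot(w, phi)
    theta = [t + alpha * (delta * p - correction * pn)
             for t, p, pn in zip(theta, phi, phi_next)]
    # The auxiliary weights track the TD error in feature space.
    w = [wi + beta * (delta - dot(w, phi)) * p
         for wi, p in zip(w, phi)]
    return theta, w
```

Each update is two passes over the weight vectors, so the cost stays linear in the number of weights, just with twice the constant of conventional TD.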
For afters, if there is interest, I can show some applications of gradient-TD algorithms to learning from interaction in robots.