
Gradient Temporal-Difference Learning Algorithms
Rich Sutton
In this talk I will summarize recent work introducing the first practical learning algorithms for large-scale learning from interaction. To be practical for large-scale problems, the complexity of a learning algorithm must be no more than linear in the number of weights of its function approximator. To be applicable to large-scale learning from interaction, the algorithm should be able to learn in parallel about many policies: all policies with a significant probability of selecting the actions actually taken. The only known way to do this is by temporal-difference (TD) learning, but unfortunately all standard linear-complexity algorithms, such as linear TD(lambda) and linear Q-learning, are known to be unsound (possibly diverging) under these conditions. Our new gradient-TD algorithms overcome this problem at the minor cost of doubling their computational complexity. A secondary cost, which we are seeking to eliminate, is that gradient-TD algorithms may be slower than conventional TD on the subset of training cases in which conventional TD is sound. The latest gradient-TD algorithms are "hybrid" in that they become equivalent to (and thus just as fast as) conventional TD in these cases.
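To make the doubled computational cost concrete, here is a minimal sketch of one member of the gradient-TD family, the TDC update with linear function approximation. It maintains a second weight vector w alongside the main weights theta, which is where the factor of two comes from. The function name, feature vectors, step sizes, and numbers below are illustrative assumptions, not details from the talk:

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def tdc_update(theta, w, phi, phi_next, reward, gamma, alpha, beta):
    """One TDC (gradient-TD) step for linear value-function approximation.

    theta    -- main weight vector (the value estimate is theta . phi)
    w        -- auxiliary weight vector; its upkeep roughly doubles the
                per-step cost relative to conventional linear TD
    phi      -- feature vector of the current state
    phi_next -- feature vector of the next state
    alpha, beta -- step sizes for theta and w respectively
    """
    # Standard TD error.
    delta = reward + gamma * dot(theta, phi_next) - dot(theta, phi)
    # Gradient-correction term built from the auxiliary weights.
    correction = gamma * dot(w, phi)
    theta = [t + alpha * (delta * p - correction * pn)
             for t, p, pn in zip(theta, phi, phi_next)]
    # The auxiliary weights track the TD error in feature space.
    w = [wi + beta * (delta - dot(w, phi)) * p
         for wi, p in zip(w, phi)]
    return theta, w
```

Each update is two passes over the weight vectors, so the cost stays linear in the number of weights, just with twice the constant of conventional TD.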
For afters, if there is interest, I can show some applications of gradient-TD algorithms to learning from interaction in robots.