TD(λ) Converges with Probability 1.

Peter Dayan   Terry Sejnowski
Machine Learning, 14, 295-301.


Abstract

The methods of temporal differences (Samuel, 1959; Sutton, 1984, 1988) allow agents to learn accurate predictions of stationary stochastic future outcomes. The learning is effectively stochastic approximation based on samples extracted from the process generating the agent's future. Sutton (1988) proved that for a special case of temporal differences the expected values of the predictions converge to their correct values as more samples are taken, and Dayan (1992) extended his proof to the general case. This paper proves the stronger result that the predictions of a slightly modified form of temporal difference learning converge with probability one, and shows how to quantify the rate of convergence.
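As a rough illustration of the prediction method the abstract describes, the following is a minimal sketch of tabular TD(λ) with eligibility traces on sampled transitions; the function name, state representation, and the step-size, λ, and discount parameters are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def td_lambda(episodes, n_states, alpha=0.1, lam=0.9, gamma=1.0):
    """Estimate state values V from sampled episodes.

    episodes: iterable of lists of (state, reward, next_state) transitions,
              with next_state = None at absorption.
    """
    V = np.zeros(n_states)
    for episode in episodes:
        e = np.zeros(n_states)                            # eligibility traces
        for state, reward, next_state in episode:
            v_next = 0.0 if next_state is None else V[next_state]
            delta = reward + gamma * v_next - V[state]    # temporal-difference error
            e[state] += 1.0                               # accumulating trace for visited state
            V += alpha * delta * e                        # update every traced state
            e *= gamma * lam                              # decay traces toward zero
    return V
```

With a suitably decreasing step size α, updates of this form are the stochastic-approximation iterations whose almost-sure convergence the paper establishes.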