Analytical Mean Squared Error Curves in Temporal Difference Learning
Satinder Singh   Peter Dayan
In NIPS 9, 1054-1060.
Abstract
We derive analytical expressions for how the bias and variance of
the estimators produced by various temporal difference (TD) value
estimation algorithms change over trials under offline updates, in
absorbing Markov chains with lookup table representations. We
illustrate classes of learning curve behavior in various chains, and
show how TD is sensitive to the choice of its step-size and
eligibility trace parameters.
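
As a rough illustration of the quantities named in the abstract, the sketch below estimates bias and variance curves empirically by Monte Carlo simulation. It is not the analytical calculation the paper describes. Everything specific here is an assumption made for the example: the five-state random-walk chain, the reward of 1 on the right exit, the undiscounted return, the accumulating traces, the initial value of 0.5, the uniform averaging over states, and the hypothetical function names true_values, offline_td_lambda_trial and learning_curve.

```python
import random


def true_values(n):
    """True values for the assumed random walk: P(exit right) from state i."""
    return [(i + 1) / (n + 1) for i in range(n)]


def offline_td_lambda_trial(V, alpha, lam, rng):
    """Run one trial of offline TD(lambda) and return the updated table.

    Offline means V is held fixed during the trial; the accumulated
    increments are applied only once the chain is absorbed.
    """
    n = len(V)
    delta = [0.0] * n   # increments accumulated over the trial
    trace = [0.0] * n   # accumulating eligibility traces (gamma = 1 assumed)
    s = n // 2          # assumed start state: the middle of the chain
    while True:
        s_next = s + (1 if rng.random() < 0.5 else -1)
        if s_next < 0:            # absorbed on the left, reward 0
            r, v_next, done = 0.0, 0.0, True
        elif s_next >= n:         # absorbed on the right, reward 1
            r, v_next, done = 1.0, 0.0, True
        else:
            r, v_next, done = 0.0, V[s_next], False
        trace = [lam * e for e in trace]
        trace[s] += 1.0
        td_error = r + v_next - V[s]
        for i in range(n):
            delta[i] += alpha * td_error * trace[i]
        if done:
            break
        s = s_next
    return [V[i] + delta[i] for i in range(n)]


def learning_curve(n=5, alpha=0.1, lam=0.5, trials=50, runs=200, seed=0):
    """Return [(squared bias, variance)] after 0..trials trials.

    Both terms are averaged uniformly over states (an assumption);
    their sum is the empirical mean squared error.
    """
    rng = random.Random(seed)
    target = true_values(n)
    history = []  # history[run][t] is the value table after t trials
    for _ in range(runs):
        V = [0.5] * n
        per_trial = [V]
        for _ in range(trials):
            V = offline_td_lambda_trial(V, alpha, lam, rng)
            per_trial.append(V)
        history.append(per_trial)
    curve = []
    for t in range(trials + 1):
        bias_sq, var = 0.0, 0.0
        for s in range(n):
            samples = [history[r][t][s] for r in range(runs)]
            mean = sum(samples) / runs
            bias_sq += (mean - target[s]) ** 2
            var += sum((x - mean) ** 2 for x in samples) / runs
        curve.append((bias_sq / n, var / n))
    return curve


if __name__ == "__main__":
    for lam in (0.0, 0.5, 1.0):
        curve = learning_curve(lam=lam)
        for t in (0, 10, 50):
            b, v = curve[t]
            print(f"lambda={lam} trial={t} bias^2={b:.4f} var={v:.4f}")
```

Re-running learning_curve with different alpha and lam values gives a simulated counterpart to the step-size and eligibility trace sensitivity the abstract mentions, subject to sampling noise from the finite number of runs.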