Analytical Mean Squared Error Curves in Temporal Difference Learning
Satinder Singh   Peter Dayan
Machine Learning, 32, 5-40.
Abstract
We provide analytical expressions governing changes to the bias and
variance of the lookup table estimators provided by various Monte Carlo
and temporal difference value estimation algorithms with off-line updates
over trials in absorbing Markov reward processes. We have used these
expressions to develop software that serves as an analysis tool: given a
complete description of a Markov reward process, it rapidly yields an
exact mean-square-error curve, the curve one would get from averaging
together sample mean-square-error curves from an infinite number of
learning trials on the given problem. We use our analysis tool to
illustrate classes of mean-square-error curve behavior in a variety of
example reward processes, and we show that although the various temporal
difference algorithms are quite sensitive to the choice of step-size and
eligibility-trace parameters, there are values of these parameters under
which they perform similarly, and generally well.
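
To make the notion of an exact mean-square-error curve concrete, the sketch below estimates such a curve empirically, by averaging sample curves over a finite number of independent runs. It is not the authors' analysis tool, which computes the curve analytically. Everything in it is an illustrative assumption: the five-state random walk with absorption at both ends, the undiscounted TD(0) rule with off-line updates, the initial values of 0.5, the unweighted average over states, and all function and parameter names.

```python
import random

# Hypothetical example process: 5 nonterminal states in a chain.
# Stepping off the left end absorbs with reward 0, off the right end with reward 1.
N_STATES = 5
TRUE_VALUES = [(i + 1) / (N_STATES + 1) for i in range(N_STATES)]


def run_trial(rng):
    """Sample one trial as a list of (state, reward, next_state) tuples.

    next_state is None when the transition leads to absorption.
    """
    state = N_STATES // 2
    transitions = []
    while state is not None:
        nxt = state + rng.choice((-1, 1))
        if nxt < 0:
            transitions.append((state, 0.0, None))
            state = None
        elif nxt >= N_STATES:
            transitions.append((state, 1.0, None))
            state = None
        else:
            transitions.append((state, 0.0, nxt))
            state = nxt
    return transitions


def offline_td0_update(values, transitions, alpha):
    """Accumulate TD(0) increments over a trial and apply them at its end."""
    deltas = [0.0] * len(values)
    for s, r, s_next in transitions:
        next_value = values[s_next] if s_next is not None else 0.0
        deltas[s] += alpha * (r + next_value - values[s])
    return [v + d for v, d in zip(values, deltas)]


def mse(values):
    """Unweighted mean squared error against the true state values."""
    return sum((v - t) ** 2 for v, t in zip(values, TRUE_VALUES)) / len(values)


def empirical_mse_curve(n_runs, n_trials, alpha, seed=0):
    """Average sample MSE curves over n_runs independent learning runs."""
    rng = random.Random(seed)
    curve = [0.0] * (n_trials + 1)
    for _ in range(n_runs):
        values = [0.5] * N_STATES  # assumed initial estimates
        curve[0] += mse(values)
        for t in range(1, n_trials + 1):
            values = offline_td0_update(values, run_trial(rng), alpha)
            curve[t] += mse(values)
    return [c / n_runs for c in curve]


if __name__ == "__main__":
    curve = empirical_mse_curve(n_runs=200, n_trials=50, alpha=0.05)
    for t in (0, 10, 25, 50):
        print(t, round(curve[t], 4))
```

As n_runs grows, the averaged curve approaches the exact curve that the abstract describes; the analytical expressions remove the sampling noise and the cost of simulating many runs.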