Improving Generalisation for Temporal Difference Learning:
The Successor Representation.
Peter Dayan
Neural Computation, 5, 613-624 (1993).
Abstract
Estimation of returns over time, the focus of temporal difference (TD)
algorithms, imposes particular constraints on good function
approximators or representations. Appropriate generalisation between
states is determined by how similar their successors are, and
representations should follow suit. This paper shows how TD machinery
can be used to learn such representations, and illustrates, using a
navigation task, the appropriately distributed nature of the result.
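
The idea that TD machinery can learn a successor-based representation can be illustrated with a minimal tabular sketch. This is not the paper's own formulation or experiment. The environment (an unbiased random walk on a six-state ring), the function names, and the parameter values are all assumptions chosen for illustration. Each row M[s] estimates the expected discounted number of future visits to every state when starting from s, so states with similar successors end up with similar rows.

```python
import random


def learn_successor_representation(n_states=6, gamma=0.9, alpha=0.02,
                                   n_steps=50_000, seed=0):
    """Tabular TD(0) estimate of a successor representation M.

    M[s][j] approximates the expected discounted number of visits to
    state j when starting in state s (the current step is counted).
    The ring-shaped random walk is a hypothetical environment, assumed
    only for this sketch.
    """
    rng = random.Random(seed)
    M = [[0.0] * n_states for _ in range(n_states)]
    s = 0
    for _ in range(n_steps):
        # Step left or right on the ring with equal probability.
        s_next = (s + rng.choice((-1, 1))) % n_states
        for j in range(n_states):
            # The "reward" for component j is 1 when the current state is j.
            indicator = 1.0 if j == s else 0.0
            td_error = indicator + gamma * M[s_next][j] - M[s][j]
            M[s][j] += alpha * td_error
        s = s_next
    return M


def values_from_sr(M, rewards):
    """State values as the product of the representation and a reward vector."""
    return [sum(m * r for m, r in zip(row, rewards)) for row in M]


M = learn_successor_representation()
# Hypothetical reward of 1 in state 3 only; values fall off with ring distance.
V = values_from_sr(M, [0.0, 0.0, 0.0, 1.0, 0.0, 0.0])
```

Once M is learned, value estimates for any reward vector follow from a single linear readout, which is one way to see why a representation built from successors supports generalisation across states.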