Nici Schraudolph, Peter Dayan, Terry Sejnowski
In LC Jain & N Baba, editors, Soft Computing Techniques in Game
Playing. Berlin, Germany: Springer-Verlag.
Abstract
The game of Go has a high branching factor that defeats the tree
search approach used in computer chess, and long-range spatiotemporal
interactions that make position evaluation extremely
difficult. Development of conventional Go programs is hampered by
their knowledge-intensive nature. We demonstrate a viable alternative
by training neural networks to evaluate Go positions via temporal
difference (TD) learning.
Our approach is based on neural network architectures that
reflect the spatial organization of both input and reinforcement
signals on the Go board, and training protocols that provide
exposure to competent (though unlabelled) play. These techniques
yield far better performance than undifferentiated networks
trained by self-play alone. A network with fewer than 500 weights
learned, within 3000 games of 9x9 Go, a position evaluation
function superior to that of a commercial Go program.
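
As a rough illustration of the temporal difference idea mentioned above, the following minimal sketch applies TD(0) updates to a linear position evaluator. It is not the network architecture or training protocol of the paper: the linear evaluator, the feature vectors, the function names (`evaluate`, `td0_update`, `train_on_game`), and the step size `alpha` are all assumptions chosen for brevity.

```python
def evaluate(weights, features):
    """Hypothetical linear evaluator: V(s) = w . x(s)."""
    return sum(w * x for w, x in zip(weights, features))


def td0_update(weights, features, reward, next_value, alpha=0.1, gamma=1.0):
    """Apply one TD(0) step and return the new weights and the TD error."""
    value = evaluate(weights, features)
    # TD error: how much the bootstrapped target differs from the prediction.
    td_error = reward + gamma * next_value - value
    # For a linear evaluator the gradient of V with respect to w is x(s).
    new_weights = [w + alpha * td_error * x for w, x in zip(weights, features)]
    return new_weights, td_error


def train_on_game(weights, positions, outcome, alpha=0.1):
    """Sweep once over one game's positions (assumed feature vectors).

    Intermediate moves give zero reward; the final position receives the
    game outcome as its reward and has no successor value.
    """
    for t, features in enumerate(positions):
        is_last = t + 1 == len(positions)
        reward = outcome if is_last else 0.0
        next_value = 0.0 if is_last else evaluate(weights, positions[t + 1])
        weights, _ = td0_update(weights, features, reward, next_value, alpha)
    return weights


if __name__ == "__main__":
    # Toy two-position "game" with one-hot features and a win (outcome 1.0).
    game = [[1.0, 0.0], [0.0, 1.0]]
    w = [0.0, 0.0]
    for _ in range(2):
        w = train_on_game(w, game, outcome=1.0, alpha=0.5)
    print(w)
```

In this toy run the outcome first raises the value of the final position, and on the next sweep that value propagates back to the earlier position, which is the mechanism that lets TD learning assign credit without labelled moves.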