**Marc Toussaint**

(Professor at the Department of Maths and Computer Science)

Wednesday 23rd May 2012

16:00


B10 Seminar Room, Basement,

Alexandra House, 17 Queen Square, London, WC1N 3AR

**Optimal control and model-free Reinforcement Learning as KL minimization**

I'll first give a general introduction to formalizing Stochastic Optimal Control problems in terms of probabilistic inference (more precisely: KL divergence minimization), which provides a unifying perspective on previous approaches. I'll then focus on novel algorithms that can be derived from this formulation, including an efficient model-free RL algorithm.

Beyond providing novel, efficient algorithms, this work aims to unify perspectives from control theory (including risk-sensitive control) and machine learning, and perhaps to contribute to the discussion of how neural systems can solve stochastic optimal control and RL problems. This is joint work with Konrad Rawlik and Sethu Vijayakumar from the University of Edinburgh. Related papers are:

http://arxiv.org/abs/1009.3958

http://userpage.fu-berlin.de/~mtoussai/publications/12-rawlik-toussaint-vijayakumar-RSS.pdf