Learning and Planning in Partially Observable MDPs

Denis Steckelmacher
Vrije Universiteit Brussel
Friday, 22 February 2019, 11:30
NO-5, Solvay Room

Many real-world settings require the control of a Partially Observable Markov Decision Process (POMDP), in which the controller does not have access to the complete state of the process. In this talk, I will review several approaches for learning or producing high-quality controllers for POMDPs. These approaches fall into three families: those that infer the hidden state of the process from observations and then fall back on MDP planners or learners; those that implement an explicit form of memory, leading to a new Markovian process that evolves in lockstep with the POMDP; and those that simply treat the history of observations as the state of a large MDP, of which the POMDP forms a subregion. This last approach is approximate, but easy to implement with modern neural network techniques, and is therefore the prevailing approach in the learning literature. Finally, the talk will present the Predictive State Representation framework, which is considered more expressive than the POMDP framework while being slightly easier to solve.
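
To make the first family concrete, below is a minimal sketch of the belief-update step it relies on: assuming a discrete POMDP with known transition and observation models, the controller maintains a probability distribution over hidden states and updates it with Bayes' rule after each action and observation. The function name, array shapes, and conventions are illustrative assumptions, not something taken from the talk.

    import numpy as np

    def belief_update(belief, action, observation, T, O):
        """One Bayesian filtering step for a discrete POMDP.

        belief: (S,) probability vector over hidden states
        T:      (A, S, S) transitions, T[a, s, s2] = P(s2 | s, a)
        O:      (A, S, Z) observations, O[a, s2, z] = P(z | s2, a)
        Returns the posterior belief over hidden states.
        """
        predicted = belief @ T[action]                     # predict: P(s2 | b, a)
        posterior = predicted * O[action, :, observation]  # correct with the observation
        return posterior / posterior.sum()                 # renormalize

Once a belief is maintained this way, it can be handed to a standard MDP planner or learner over the belief space, which is what lets this family of approaches reduce the POMDP to an MDP.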