Printable Version of this PageHome PageRecent ChangesSearchSign In
"Hierarachical Control and Learning for Markov Decision Processes" by Ron Parr

The topic of the thesis has direct application to robotic path planning, the area of research I'm interested in.

A Markov Decision Process is a 4-tuple consists of a set of states, a set of actions, the probability transition function between states with each action and the rewards for visiting the states. The objective is to maximize total reward while reaching the goal state. The transition function is defined to observe the Markov property, i.e., probability of transition to next state only depends on the current state and chosen action.

While the Markov property ensures a few relative simple method to solve a problem modeled in MDP (value and policy iteration), it may force the problem to be modeled with a huge number of states. For robot navigation in 2 dimensional space, the space must be discretized (usually divided into equal squares). It's possible that any narrow passage would force finer discretization of the state space. In addition, any variables (light condition, GPS accuracy, etc) that may affect the state transition need to be discretized multiplied to the current state space. I.e. if we want to model the light condition in 5 discrete intervals and there are already 100 states, then the state space would need to increase to 500 states.

Of course, most real problems have features that can be exploited. A typical problem for navigation is the natural division of state space for when there is only 1 pathway between 2 regions (i.e. a door way). The problem does not need to be considered as a whole, but can be considered as seperate problems of navigating to the pathway and navigating from the pathway to the goal. Also, any discretized variables modeled into the MDP (referred to as factored MDP) gives structure to the state space.

Instead of manually model a larger problem into separate smaller problems, Parr proposed automated hierarchical and problem decomposition of large MDP problem. The Hierarchies of Abstract Machine was proposed ((in)formally defined as machines that can make calls to other machines as a subroutine) to help transfer procedural knowledge between machines. A number of problem decomposition algorithms were also proposed to help reused learned policies and apply them to similar domains (i.e. learning a particular skill set for navigating around an obstacle and apply it to a similar encounter of a different obstacle).

This is one of the providing-"tools" type of thesis in that although the author certainly had in mind some of the applications of the thesis work, the proposed algorithms targeted the academic problem of solving a large sized MDP. While the result may not have the "sex" appeal of an thesis solving an interdisciplinary problem, the broad technical background required to develop the work cannot be understated. As an application to robotic navigation to driver-less vehicles (which has broad social and policy implications), this work may serve as an important piece of the puzzle.

Last modified 4 December 2007 at 10:55 am by shumin