Sridhar Thiagarajan: Diverse Density Estimation for Automatic Subgoal discovery

Hey guys,

*Code for bottleneck detection using Diverse Density*

Head over to the Machine Learning section for an introduction to Diverse Density Estimation.

Need for options
Options are a natural way to represent temporally extended actions. They facilitate efficient exploration of the state space, which in turn enables faster convergence.

Options also enable us to break large problems down into smaller subproblems, which facilitates better policies.
The options discovered may be reusable for a different task in the same state space, or for another task in a related state space.

Some of the plots obtained:

First-visit frequency plots, averaged over 30 trials.

Visitation frequency counts alone cannot be used to identify bottleneck states in the MDP: they are noisy, and they may give more weight to states near the start state, where the agent is likely to spend more time.
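To make the point about frequency counts concrete, here is a minimal sketch of how first-visit frequencies could be computed. The function name, the string state labels, and the three toy trajectories are all made up for illustration; this is not the code linked above.

```python
from collections import Counter

def first_visit_frequency(trajectories):
    """Fraction of trajectories in which each state appears at least once."""
    counts = Counter()
    for trajectory in trajectories:
        # set() makes sure a state is counted once per trajectory (first visit only)
        counts.update(set(trajectory))
    n = len(trajectories)
    return {state: c / n for state, c in counts.items()}

# Hypothetical two-room world: "door" is the bottleneck, "s0" is the start state.
trajectories = [
    ["s0", "a", "door", "g"],
    ["s0", "b", "a", "door", "g"],
    ["s0", "a", "s0", "b"],  # a run that never leaves the first room
]
freq = first_visit_frequency(trajectories)
```

In this toy data the start state "s0" and its neighbour "a" are visited in every run, so they score at least as high as "door". This is the bias towards the start state described above.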

Negative log of the diverse density of states after 25 runs. Note how the subgoal locations have the lowest values, which is why they are detected as subgoal states.

Image representation of the table above.
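For readers who want to see how such a table could be produced, below is a rough sketch of the negative log diverse density on a small grid. It assumes the usual noisy-or formulation: successful trajectories are positive bags, unsuccessful ones are negative bags, and the probability that a state belongs to the concept at a target falls off as a Gaussian of the squared distance. The function names, the `scale` parameter, and the toy trajectories are my own placeholders, and the linked code may differ.

```python
import math

def instance_prob(state, target, scale=1.0):
    """Assumed Gaussian-like probability that `state` belongs to the concept at `target`."""
    d2 = sum((a - b) ** 2 for a, b in zip(state, target))
    return math.exp(-d2 / scale ** 2)

def neg_log_diverse_density(target, positive_bags, negative_bags, scale=1.0, eps=1e-12):
    """Negative log of the noisy-or diverse density at `target` (lower = better subgoal)."""
    nll = 0.0
    for bag in positive_bags:
        # Probability that no state in this successful trajectory is near the target
        miss = 1.0
        for state in bag:
            miss *= 1.0 - instance_prob(state, target, scale)
        nll -= math.log(max(1.0 - miss, eps))
    for bag in negative_bags:
        # Every state in an unsuccessful trajectory should be far from the target
        for state in bag:
            nll -= math.log(max(1.0 - instance_prob(state, target, scale), eps))
    return nll

# Hypothetical (x, y) grid trajectories; (2, 1) plays the role of a doorway.
positive_bags = [
    [(0, 0), (1, 0), (2, 1), (3, 1), (4, 1)],
    [(0, 2), (1, 2), (2, 1), (3, 0), (4, 0)],
]
negative_bags = [[(0, 0), (0, 1), (0, 2)]]

# Score every state seen in a successful trajectory and pick the lowest value.
candidates = sorted({s for bag in positive_bags for s in bag})
scores = {s: neg_log_diverse_density(s, positive_bags, negative_bags) for s in candidates}
best = min(scores, key=scores.get)
```

In this toy setup the doorway (2, 1) is the only state shared by both successful trajectories and it lies away from the unsuccessful one, so it receives the lowest score.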

One of the disadvantages of this framework is that diverse density depends on physical distances.
This may affect the algorithm in large state spaces, giving weight to states that are not frequently visited but lie in a region of symmetry.

An advantage of this method is its natural extension to continuous state spaces, where, rather than calculating the exact diverse density, we can use optimization methods to find a true concept point.
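As a rough illustration of the continuous case, the sketch below minimises the negative log diverse density by gradient descent, restarting from every observed point and keeping the best result. Everything here is an assumption made for the example: the positive-bags-only objective, the finite-difference gradient, the step sizes, and the toy 2-D trajectories. A real implementation would more likely use a proper optimizer.

```python
import math

def neg_log_dd(t, positive_bags, eps=1e-12):
    """Noisy-or negative log diverse density at point t (positive bags only, for brevity)."""
    total = 0.0
    for bag in positive_bags:
        miss = 1.0
        for x in bag:
            d2 = sum((a - b) ** 2 for a, b in zip(x, t))
            miss *= 1.0 - math.exp(-d2)
        total -= math.log(max(1.0 - miss, eps))
    return total

def numeric_grad(f, t, h=1e-5):
    """Central finite-difference gradient of f at t."""
    grad = []
    for i in range(len(t)):
        up, down = list(t), list(t)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad

def descend(f, start, step=0.1, iters=200):
    """Gradient descent that only accepts improving steps and halves the step otherwise."""
    t, best = list(start), f(start)
    for _ in range(iters):
        g = numeric_grad(f, t)
        cand = [ti - step * gi for ti, gi in zip(t, g)]
        val = f(cand)
        if val < best:
            t, best = cand, val
        else:
            step *= 0.5
    return tuple(t), best

# Hypothetical continuous trajectories that all pass close to (2.0, 1.0).
positive_bags = [
    [(0.1, 0.2), (1.0, 0.4), (2.1, 0.9), (3.2, 1.1)],
    [(0.0, 1.9), (0.9, 1.7), (1.9, 1.1), (3.0, 0.1)],
    [(0.3, 1.0), (1.1, 1.0), (2.0, 1.0), (3.9, 1.8)],
]

objective = lambda t: neg_log_dd(t, positive_bags)
starts = [x for bag in positive_bags for x in bag]
concept, value = min((descend(objective, s) for s in starts), key=lambda r: r[1])
```

Starting from every observed point is a common way to avoid getting stuck in a poor local minimum. Here `concept` ends up close to the region where all three trajectories meet.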
