Every Visit Exploring Starts Monte Carlo For Blackjack

Hey guys!
This post is about applying Monte Carlo with exploring starts to the game of blackjack.

Matlab code for every-visit Monte Carlo on the game of blackjack.

Note that this version of blackjack differs slightly from the one in the RL book by Rich Sutton: we use exactly 52 cards, whereas the book version relies on an infinite deck. In this version, not all sums are equally likely, since cards are drawn without replacement.
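To make the difference between the two decks concrete, here is a small Python sketch (not the Matlab code from this post). The function names `make_deck`, `draw_finite` and `draw_infinite` are made up for illustration, and it assumes face cards count as 10 and the ace is stored as 1.

```python
import random

def make_deck():
    # 52 cards: ace (stored as 1) through 9, plus 10, J, Q, K all counted as 10,
    # repeated once per suit.
    values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 10, 10]
    return values * 4

def draw_finite(deck, rng):
    # Draw without replacement: the card is removed, so later draws
    # depend on what has already been dealt.
    return deck.pop(rng.randrange(len(deck)))

def draw_infinite(rng):
    # Draw with replacement (infinite deck): every draw has the same
    # distribution, with 10-valued cards having probability 4/13.
    return min(rng.randint(1, 13), 10)

rng = random.Random(0)
deck = make_deck()
first_card = draw_finite(deck, rng)
cards_left = len(deck)  # one fewer card than the full deck
```

With the finite deck, the chance of a 10-valued card is 16/52 on the first draw but becomes 15/51 once one such card is gone, whereas with the infinite deck it stays at 4/13.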

We use exploring starts to ensure that all state-action pairs are visited frequently enough, regardless of the policy being followed.
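A rough Python sketch of what an exploring start could look like is below. It assumes the state layout used in the book (player sum 12 to 21, dealer's showing card 1 to 10, usable-ace flag) and two actions named "hit" and "stick"; the function name `exploring_start` is hypothetical, and the Matlab code may set things up differently.

```python
import random

ACTIONS = ("hit", "stick")  # assumed action names

def exploring_start(rng):
    # Pick the first state and the first action uniformly at random,
    # so every state-action pair has a nonzero chance of starting an episode.
    player_sum = rng.randint(12, 21)
    dealer_card = rng.randint(1, 10)   # 1 stands for an ace
    usable_ace = rng.random() < 0.5
    action = rng.choice(ACTIONS)
    return (player_sum, dealer_card, usable_ace), action

rng = random.Random(0)
state, action = exploring_start(rng)
```

After this first random step, the rest of the episode would follow the current policy as usual.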

Monte Carlo methods for value estimation rely on repeated sampling to estimate the true value function (or the state-action values).

The advantage of this is its simple implementation and its unbiased estimates.

The problem with this is its high variance.
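The averaging step is short enough to sketch in Python. This is only an illustration under some assumptions: the reward arrives once at the end of the hand, there is no discounting, and an episode is a list of (state, action) pairs. The names `update_q` and `greedy_action` are made up here.

```python
from collections import defaultdict

def update_q(Q, N, episode, reward):
    # Every-visit update: each occurrence of a (state, action) pair in the
    # episode contributes the observed return to a running average.
    # With a single terminal reward and no discounting, the return from
    # every step is just the final reward.
    for state, action in episode:
        key = (state, action)
        N[key] += 1
        Q[key] += (reward - Q[key]) / N[key]

def greedy_action(Q, state, actions=("hit", "stick")):
    # Policy improvement: pick the action with the highest estimate.
    return max(actions, key=lambda a: Q[(state, a)])

Q = defaultdict(float)  # action-value estimates
N = defaultdict(int)    # visit counts
episode = [((13, 2, False), "hit"), ((18, 2, False), "stick")]
update_q(Q, N, episode, 1.0)   # a won hand
update_q(Q, N, episode, -1.0)  # a lost hand: the estimates average back to 0.0
```

The high variance shows up right here: single hands return +1, 0 or -1, so each estimate needs many samples before the running average settles.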

Both on-policy and off-policy Monte Carlo methods exist.
On-policy Monte Carlo methods evaluate the policy they are following.

Off-policy Monte Carlo methods estimate the value function of a policy different from the one they are following.
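One common way to do the off-policy correction is importance sampling, where each return is weighted by how much more (or less) likely the episode is under the target policy than under the behavior policy. The Python sketch below only shows that weight. The two example policies (`target_prob` sticks on 20 or 21, `behavior_prob` picks uniformly at random) and the function name `importance_ratio` are assumptions for illustration, and this post's experiment does not use this method.

```python
def target_prob(state, action):
    # Example target policy (assumed): stick on 20 or 21, otherwise hit.
    player_sum = state[0]
    best = "stick" if player_sum >= 20 else "hit"
    return 1.0 if action == best else 0.0

def behavior_prob(state, action):
    # Example behavior policy (assumed): hit or stick with equal probability.
    return 0.5

def importance_ratio(episode, target, behavior):
    # Product over the episode of target-policy probability divided by
    # behavior-policy probability for each action actually taken.
    ratio = 1.0
    for state, action in episode:
        ratio *= target(state, action) / behavior(state, action)
    return ratio

episode = [((13, 2, False), "hit"), ((20, 2, False), "stick")]
weight = importance_ratio(episode, target_prob, behavior_prob)  # 2.0 * 2.0 = 4.0
```

An episode containing any action the target policy would never take gets weight 0 and so contributes nothing to the estimate.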

This is the policy obtained after 3 million simulations.




Granted, the no-usable-ace case could do with more simulations. The plots are, however, similar to those in the book for its version of blackjack.

Overall, Monte Carlo control techniques are extremely helpful for game playing when the state-action space is limited and a simulator is easily available, so that we can afford to run many simulations at little computational cost.

Cheers.