
7:24 Your explanation of MCTS is not correct. For one simulation: it picks the top move recommended by the network (greedy) most of the time, with random moves some of the time (epsilon). Then it walks into that move and repeats the same thing, all the way to completion. Then it backs up and keeps track of the win-to-visit ratio for each state, as shown in the picture. It repeats this whole process 1600 times. As it performs these walkthroughs, it trains the networks and updates the values. So eventually, the more often you see a state, the closer its statistics converge to the optimal value. MCTS runs to completion; it's not a depth-pruning algorithm. Temporal difference learning stops somewhere in the middle; it was not used in AGZ. The MCTS algorithm is discussed by David Silver toward the end of his lecture #8.
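
For anyone who wants to see those steps concretely, here is a minimal Python sketch of the procedure as this comment describes it (epsilon-greedy walkthroughs played to completion, then win/visit counts backed up along the path). It is only an illustration under stated assumptions, not necessarily the exact search from the AlphaGo Zero paper: the toy game (Nim, take 1 to 3 stones, last stone wins), the `policy_prior` stand-in for the network, and all function names are hypothetical, and the network training step is left out.

```python
import random
from collections import defaultdict


def legal_moves(stones):
    """Toy Nim: take 1, 2, or 3 stones; whoever takes the last stone wins."""
    return [m for m in (1, 2, 3) if m <= stones]


def policy_prior(stones):
    """Hypothetical stand-in for a policy network: one score per legal move.

    Assumption: it prefers moves that leave a multiple of 4 stones.
    """
    return {m: (1.0 if (stones - m) % 4 == 0 else 0.1) for m in legal_moves(stones)}


def simulate(stones, stats, epsilon, rng):
    """One walkthrough: play epsilon-greedy to the end, then back up the result."""
    if stones <= 0:
        raise ValueError("need at least one stone to start")
    path = []      # (state, player to move) for every state visited
    player = 0
    winner = None
    while stones > 0:
        path.append((stones, player))
        moves = legal_moves(stones)
        if rng.random() < epsilon:
            move = rng.choice(moves)          # exploratory random move
        else:
            prior = policy_prior(stones)
            move = max(moves, key=prior.get)  # greedy: top recommended move
        stones -= move
        winner = player                       # last player to move takes the last stone
        player = 1 - player
    # Backup: update visit and win counts for each state on the path,
    # counting a win from the point of view of the player to move there.
    for state, mover in path:
        stats[state][0] += 1
        if mover == winner:
            stats[state][1] += 1
    return winner


def run_simulations(stones, n=1600, epsilon=0.1, seed=0):
    """Repeat the walkthrough n times and return {state: [visits, wins]}."""
    rng = random.Random(seed)
    stats = defaultdict(lambda: [0, 0])
    for _ in range(n):
        simulate(stones, stats, epsilon, rng)
    return stats
```

Calling `run_simulations(10)` returns a table where `stats[10]` holds the visit and win counts for the starting position; dividing wins by visits gives the win ratio that the comment refers to.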


We humans run simulations in our heads all the time because sometimes simple intuition is not enough... So, I guess, it isn't surprising that including Monte Carlo Tree Search would always drastically improve performance no matter how good the value function estimates are, even with the help of deep learning... The question is how to search more efficiently and also how to build an efficient model...


I've been programming board game engines for 25 years and I've followed the development of CNNs to play Go quite closely. This video is a really good description of the AlphaGo Zero paper, with very clear explanations. Well, the explanation of MCTS was completely wrong, but other than that this video was great. I'll make sure to check out more from this channel.


