Thanks for sharing! This is very interesting. Why did you use GBDTs instead of NNs?

> Thanks for sharing!

You're welcome.

> Why did you use GBDTs instead of NNs?

I mostly wanted to build an implementation to see how it worked; I was more familiar with GBDTs than with NNs, so I figured I'd start there. At its heart, AlphaZero is the marriage of two great ideas: using Monte Carlo Tree Search (MCTS) to efficiently look ahead and find good moves, and using a powerful ML model (like a ResNet) as the bot's intuition about which positions are good to be in (the value network) and which moves are worth considering in a given position (the policy network). So if a GBDT is powerful enough for your use case, you should be able to swap it in for the ML-model component of the MCTS+model setup.
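
In code terms, the boundary looks roughly like this (a minimal sketch with hypothetical names, not my actual implementation): the search only ever asks the model for a value and a policy for the current position, so anything that can answer those two questions can sit behind the same interface.

    # Rough sketch, all names hypothetical. MCTS asks the model two questions
    # per position -- "how good is this position?" (value) and "which moves
    # look promising?" (policy) -- so a ResNet or a GBDT can fill the same slot.

    from dataclasses import dataclass, field
    from typing import List, Protocol, Sequence, Tuple

    @dataclass
    class Evaluation:
        value: float             # expected outcome for the player to move, e.g. in [-1, 1]
        policy: Sequence[float]  # prior probability over the legal moves

    class Evaluator(Protocol):
        def evaluate(self, features: Sequence[float], n_moves: int) -> Evaluation: ...

    class GBDTEvaluator:
        """Wraps two gradient-boosted models (e.g. scikit-learn / LightGBM style)."""
        def __init__(self, value_model, policy_model):
            self.value_model = value_model    # regressor: features -> game outcome
            self.policy_model = policy_model  # classifier: features -> move probabilities

        def evaluate(self, features, n_moves):
            value = float(self.value_model.predict([features])[0])
            # mapping model outputs onto the legal-move set is elided here
            policy = self.policy_model.predict_proba([features])[0][:n_moves]
            return Evaluation(value, policy)

    @dataclass
    class Node:
        features: Sequence[float]
        legal_moves: List[int]
        children: List[Tuple[int, float]] = field(default_factory=list)  # (move, prior)
        value_estimate: float = 0.0

    def expand(node: Node, evaluator: Evaluator) -> float:
        """Leaf expansion in MCTS: exactly one evaluator call per new node."""
        ev = evaluator.evaluate(node.features, len(node.legal_moves))
        node.value_estimate = ev.value
        node.children = list(zip(node.legal_moves, ev.policy))
        return ev.value  # backed up the tree as in standard PUCT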

But I was also curious whether GBDTs would do almost as well as a NN, because GBDTs can be much more efficient w.r.t. cost/energy. When AlphaZero came out, I think it cost >$10M to train a superhuman Go algo. Nowadays KataGo [1] can do it for <$50K. The most expensive part of training is the self-play: you basically have bots play millions of games against each other and learn from the results of those games. Getting value/policy predictions from the ML model at every move makes up the majority of the computation during self-play, so if you make that part more efficient, you should be able to train a bot faster and cheaper.
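
To make that concrete, here's a rough sketch of one self-play iteration (every helper here is a hypothetical stand-in, not real API): the evaluator gets called for every simulation of every move of every game, which is why its per-call cost dominates the whole training bill.

    # Hypothetical helpers: run_mcts, new_game, refit are stand-ins to show
    # where the compute goes, not a real implementation.

    def self_play_game(run_mcts, evaluator, new_game, sims_per_move=800):
        """Play one game; run_mcts(state, evaluator, sims) -> (visit_distribution, move)."""
        game = new_game()
        history = []  # (features, search-improved policy target) per move
        while not game.is_over():
            pi, move = run_mcts(game, evaluator, sims_per_move)  # ~sims_per_move evaluator calls
            history.append((game.features(), pi))
            game.play(move)
        z = game.result()  # final outcome (per-move sign handling omitted for brevity)
        return [(feats, pi, z) for feats, pi in history]

    def training_iteration(run_mcts, evaluator, new_game, refit, n_games=10_000):
        """Generate games with the current model, then refit it on the fresh data."""
        examples = []
        for _ in range(n_games):
            examples.extend(self_play_game(run_mcts, evaluator, new_game))
        return refit(examples)  # new GBDT (or retrained NN) for the next iteration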

Check out this HN thread if you're interested in more AlphaX shenanigans: https://news.ycombinator.com/item?id=23599278

[1] https://github.com/lightvector/KataGo