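For more, please visit 恋花蝶's blog! I came across this article through a recommendation in the aigamedev.com RoundUp for week 17 of 2008. Because I am interested in Chinese chess (xiangqi) and in computer game playing for it, I read the article fairly carefully even though I know nothing about Go or Go AI, and I found that its content differs from what I had previously learned about computer game playing. So I translated it in order to understand it better myself. My English is quite poor, so please point out any mistranslations.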
Monte Carlo Method in Game AIs
Friday, April 25th, 2008 12:28 pm
Written by: qqqchn
As many of my classmates have posted, the Monte Carlo method is not a single method but an entire class of methods that take random samples to find a result. An interesting application my partner and I found for it was in one of the GO AIs we made for another project. (GO is an ancient Chinese board game that is still very popular in East Asia today; the rules and details can be found here)
One of the reasons we chose the Monte Carlo method was that the immense number of possible moves in GO makes the Minimax Algorithm far too computationally intensive when looking more than 2 or 3 moves ahead. (Minimax is one of the more common methods for finding the next “best” move in game AIs such as chess programs: it alternately maximizes and minimizes a player’s score up to a certain depth; more details here.) Looking only 4 moves ahead on a mere 9×9 board takes about 81^4 ≈ 43 million board evaluations. A quote that illustrates the computational intensity of GO on a full 19×19 board is that “the number of possible GO games far exceeds the number of atoms in the universe” (more details and derivation here). Interesting facts: the lower bound on the number of possible GO games on a 19×19 board is about 10^(10^48), and the upper bound is 10^(10^171).
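To make the look-ahead arithmetic concrete, here is a small sketch of the count. It uses the post's simplification that every one of the 81 points is available at every move; the second function is my own slightly tighter variant (an assumption, not from the original project) in which each move fills one point, still ignoring captures and passes. Both function names are made up for illustration.

```python
def naive_leaf_count(board_points: int, depth: int) -> int:
    """Positions at the bottom of the search tree if every point is always playable."""
    return board_points ** depth


def shrinking_leaf_count(board_points: int, depth: int) -> int:
    """Same count, but each move occupies one point (captures and passes ignored)."""
    count = 1
    for moves_already_played in range(depth):
        count *= board_points - moves_already_played
    return count


# Growth of the search tree on a 9x9 board (81 points) for depths 1 to 4.
for depth in range(1, 5):
    print(depth, naive_leaf_count(81, depth), shrinking_leaf_count(81, depth))
```

Either way the count reaches tens of millions at depth 4, which is why a fixed-depth search gets expensive so quickly.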
So another way we evaluated how “good” a move is was the Monte Carlo method. To estimate how good or bad a certain move is for a given board position, the method plays “virtual games” that show what would happen if two Random AIs (AIs playing completely randomly) continued from that move. It starts from the given board position and plays each of the viable moves in a fixed number of games, with all subsequent moves being completely random. After all of the “virtual games” are finished, we average the final scores of the games that started with each move and let that average represent the “goodness” of the move. Finally, the Monte Carlo AI plays the move with the highest average score in the actual game, on the assumption that moves which score better over a large number of random games are “better” moves in general.
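The procedure described above can be sketched roughly as follows. This is not the project's code: a full GO engine would not fit here, so the sketch is demonstrated on a tiny stand-in game (a pile of stones, each player takes 1 to 3, whoever takes the last stone wins). The function `monte_carlo_move` and all of the `nim_*` helpers are hypothetical names introduced only for this illustration.

```python
import random


def monte_carlo_move(state, legal_moves, apply_move, is_over, score,
                     games_per_move=500, rng=None):
    """Try every legal move, finish each game randomly, return (best move, its average score)."""
    rng = rng or random.Random()
    best_move, best_average = None, float("-inf")
    for move in legal_moves(state):
        total = 0.0
        for _ in range(games_per_move):
            current = apply_move(state, move)
            # Both sides now play completely at random until the game ends.
            while not is_over(current):
                current = apply_move(current, rng.choice(legal_moves(current)))
            total += score(current)
        average = total / games_per_move
        if average > best_average:
            best_move, best_average = move, average
    return best_move, best_average


# Stand-in game. A state is (stones_left, player_to_move).
def nim_moves(state):
    return [take for take in (1, 2, 3) if take <= state[0]]


def nim_play(state, take):
    return (state[0] - take, 1 - state[1])


def nim_over(state):
    return state[0] == 0


def nim_score(final_state, player):
    # The player who just moved (not the one now "to move") took the last stone.
    winner = 1 - final_state[1]
    return 1.0 if winner == player else 0.0


start = (5, 0)  # five stones, player 0 to move
move, average = monte_carlo_move(start, nim_moves, nim_play, nim_over,
                                 score=lambda s: nim_score(s, player=0),
                                 rng=random.Random(0))
print(move, round(average, 2))
```

For GO, the same loop would be driven by a board representation, a legal-move generator, and a board scoring function in place of the `nim_*` helpers.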
For our project, we let our AI play about 500 virtual games for each move, which can take a while on slower computers, but it is still far faster than using the Minimax Algorithm to look ahead just 4 moves (just over 1 million evaluations compared to roughly 43 million, i.e. 81^4). In addition, the results of the Monte Carlo AI are pretty good: it can generally defeat most of our other AIs (the Minimax AI looking 2 or 3 moves ahead and the Random AIs), and it even put up a decent fight against some beginner human players.
Worth noting is that one very important factor in how well the Monte Carlo method works here is the scoring function used to decide a player’s score for a given board position. The one we used is straightforward and relatively simple: it assigns each empty spot to whoever has the closest stones to that spot, with ties broken by the number of stones near it. This isn’t the most accurate or effective scoring method, but it worked well enough for our purposes.
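A scoring function along these lines might look like the sketch below. Several details are my assumptions, since the post does not give them: distance is measured as Manhattan distance, “near” means within 2 steps, a spot that is still tied counts for nobody, and each player's stones on the board are added to their territory. The name `score_board` and the `near_radius` parameter are hypothetical.

```python
def score_board(board, near_radius=2):
    """Score a position given as rows of 'B', 'W' and '.' (empty)."""
    stones = {"B": [], "W": []}
    empties = []
    for r, row in enumerate(board):
        for c, cell in enumerate(row):
            if cell in stones:
                stones[cell].append((r, c))
            else:
                empties.append((r, c))

    # Assumption: stones already on the board count one point each.
    score = {color: len(points) for color, points in stones.items()}

    for r, c in empties:
        # Manhattan distance from this empty spot to every stone of each color.
        dists = {color: [abs(r - sr) + abs(c - sc) for sr, sc in points]
                 for color, points in stones.items()}
        nearest = {color: min(d, default=float("inf")) for color, d in dists.items()}
        if nearest["B"] != nearest["W"]:
            owner = min(nearest, key=nearest.get)
        else:
            # Tie-break: whoever has more stones within near_radius of the spot.
            nearby = {color: sum(1 for x in d if x <= near_radius)
                      for color, d in dists.items()}
            if nearby["B"] == nearby["W"]:
                continue  # still tied: the spot counts for nobody
            owner = max(nearby, key=nearby.get)
        score[owner] += 1
    return score


print(score_board(["B..",
                   "...",
                   "..W"]))
```

In a Monte Carlo player, a function like this would be called once at the end of each virtual game to produce the score that gets averaged.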
The AI we developed using Monte Carlo methods was one of the better AIs we made, but it is still nowhere near the capabilities of a decently experienced amateur human player. In particular, the AI starts losing out near the end game, when tactics mean a lot more than overall strategy (which Monte Carlo and Minimax seem to do well at). And because we use random moves to play each “virtual game”, we can get very different results each time we play, especially near the end game, where the results of moves really depend on the quality of subsequent moves, which in this case are completely random.
GO is considered by many to be the most complicated game we know of to date, and it is very unlikely that we will come even marginally close to solving the game anytime soon (want to even try writing out 10^(10^48)?). But it seems equally unlikely that people will give up trying, as human tenacity in the face of other “insurmountable” odds has shown in the past (landing on the Moon…).
NOTE: when I said “random” in this post, I naturally meant the pseudorandom number generators computers use, which aren’t really random but were more than close enough for our project.
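As a small illustration of what “pseudorandom” means in practice, the sketch below uses Python's standard `random.Random` with a fixed seed: the same seed always reproduces the same sequence of “random” moves. The function name, the seed value, and the choice of 81 points (a 9×9 board) are only for illustration and are not taken from the project.

```python
import random


def random_move_sequence(seed, points=81, length=10):
    """Draw `length` pseudorandom point indices from a generator with a fixed seed."""
    rng = random.Random(seed)
    return [rng.randrange(points) for _ in range(length)]


first = random_move_sequence(2008)
second = random_move_sequence(2008)
print(first == second)  # the same seed gives the same sequence
```

Fixing the seed this way is also a convenient trick for making a run of virtual games repeatable while debugging.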

GO AI Project CS478 (Gordon Briggs, Qin Chen). Unfortunately it is not finished yet, so we don’t really have any statistics to cite.
