This document is the abstract of "Mastering the game of Go with deep neural networks and tree search".
The game of Go has long been viewed as the most challenging of classic games for artificial intelligence owing to its enormous search space and the difficulty of evaluating board positions and moves. Here we introduce a new approach to computer Go that uses ‘value networks’ to evaluate board positions and ‘policy networks’ to select moves. These deep neural networks are trained by a novel combination of supervised learning from human expert games, and reinforcement learning from games of self-play. Without any lookahead search, the neural networks play Go at the level of state-of-the-art Monte Carlo tree search programs that simulate thousands of random games of self-play. We also introduce a new search algorithm that combines Monte Carlo simulation with value and policy networks. Using this search algorithm, our program AlphaGo achieved a 99.8% winning rate against other Go programs, and defeated the human European Go champion by 5 games to 0. This is the first time that a computer program has defeated a human professional player in the full-sized game of Go, a feat previously thought to be at least a decade away.
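The search algorithm described above combines Monte Carlo simulation with the value and policy networks; at each step it selects the edge balancing the simulation-based value estimate against a policy-prior exploration bonus (a PUCT-style rule). The sketch below illustrates that selection step only; the function names, constants, and toy statistics are assumptions for illustration, not the paper's implementation.

```python
import math

C_PUCT = 1.0  # exploration constant (assumed value, not from the paper)

def puct_score(value_sum, visits, prior, parent_visits):
    """Mean action value Q plus an exploration bonus U scaled by the policy prior."""
    q = value_sum / visits if visits else 0.0
    u = C_PUCT * prior * math.sqrt(parent_visits) / (1 + visits)
    return q + u

def select_move(children, parent_visits):
    """Pick the child edge maximizing Q + U, as in PUCT-style tree selection."""
    return max(children, key=lambda m: puct_score(*children[m], parent_visits))

# Toy example: two candidate moves, each with (value_sum, visits, prior),
# where the prior stands in for the policy network's output.
children = {
    "a": (4.0, 10, 0.2),  # well explored, decent average value
    "b": (0.0, 0, 0.7),   # unvisited, but favored by the policy prior
}
best = select_move(children, parent_visits=10)  # "b": the prior drives exploration
```

The point of the rule is visible in the toy numbers: an unvisited move with a strong policy prior outranks a well-explored move of moderate value, so the network's move suggestions steer which simulations the search spends effort on.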