Last week, two superstars from similar yet very different worlds went head to head, and one could say that the result was simultaneously a win and a loss for humanity. But enough with the oxymorons; who were they? On one side was Lee Sedol, one of the greatest living players of Go, a board game that originated in China over 2,000 years ago and now has over 40 million players worldwide. On the other was an opponent you would not typically expect: AlphaGo, a computer program developed by Google’s DeepMind. Final score: 4-1 AlphaGo.
At the Four Seasons Hotel in Seoul that week, AlphaGo not only achieved something that was not expected for at least another half a decade, but also opened up a whole new world of possibilities for artificial intelligence (AI). Because of its vast complexity, Go was one of the very last board games that AI had not conquered, and until the match itself few people expected AlphaGo to even do well against Lee, currently ranked 5th internationally.

The rules of Go are relatively simple: on a 19×19 grid, two players take turns placing black and white stones, trying to control the most “territory” on the board. Yet even with only two types of pieces, the game has about 10^170 possible board positions and around 10^760 possible games in its game tree (a diagram that starts at the beginning of the game and branches out into every possible play-through). That is vastly more than the roughly 10^80 atoms in the observable universe! As a result, the methods computers used to conquer games like chess and checkers, which have far fewer possible positions and play-throughs, do not carry over to Go.

Even though AlphaGo can evaluate tens of thousands of moves per second, it could not rely on the conventional approach used for chess: using raw computing power to examine every possible move, every possible reply, and the positions that follow, then selecting the move with the highest chance of success. To see why this is out of reach even for today’s supercomputers, think from the machine’s perspective. In Go there are on average about 250 choices per move, so analyzing one move ahead takes 250 evaluations, two moves ahead about 62,500 (250^2), three moves ahead roughly 15,625,000 (250^3), four moves ahead 3,906,250,000 (250^4), and, well, you get the idea.
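To make that arithmetic concrete, here is a minimal Python sketch (our own illustration, not anything from DeepMind) that counts how many move sequences a brute-force search would face, assuming the article’s average of about 250 choices per move. The speed of 50,000 evaluations per second is a hypothetical stand-in for “tens of thousands of moves per second”, and the function names are made up for this example.

```python
# Brute-force search cost in Go, using the article's rough average of
# 250 choices per move. EVALS_PER_SECOND is a hypothetical figure
# standing in for "tens of thousands of moves per second".
BRANCHING_FACTOR = 250
EVALS_PER_SECOND = 50_000
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def sequences_to_examine(depth: int, branching: int = BRANCHING_FACTOR) -> int:
    """Number of move sequences `depth` moves deep: branching ** depth."""
    return branching ** depth


def years_to_search(depth: int, evals_per_second: int = EVALS_PER_SECOND) -> float:
    """Years needed to evaluate every sequence of that depth at the given speed."""
    return sequences_to_examine(depth) / evals_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    for depth in range(1, 5):
        print(f"{depth} move(s) ahead: {sequences_to_examine(depth):,} sequences")
    # Looking just 10 moves deep already takes hundreds of billions of years.
    print(f"10 moves ahead: about {years_to_search(10):.1e} years")
```

Each extra move of lookahead multiplies the work by 250, which is why even a very fast program cannot simply search its way through Go.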
Of course, the number of open points on the board shrinks by one with every stone placed, but this is offset by the rule that a group of stones completely surrounded by the opponent is captured and removed from the board, freeing those points again. With all this complexity, Go was not a game to be beaten by brute computational force.
How does AlphaGo “think”, then? How does it judge the strength of each move and decide which one to make? The answer lies in a particular type of search method called “Monte Carlo tree search”. As mentioned earlier, Go’s game tree is far too large to assess as a whole, so the Monte Carlo method instead samples simulated games in which random moves are made for each player until the game ends. AlphaGo builds on this idea: to decide which branches to explore in each simulation (a branch here is a move, since each move opens up a whole new section of the tree), it does not choose randomly but relies on two kinds of neural networks, policy networks and value networks, which give it a human-like “intuition” for promising moves. Given the current state of the board, a policy network outputs a probability for each possible move, while a value network outputs a single estimate of how likely the current position is to lead to a win, with higher values meaning better chances. After running its simulations, AlphaGo chooses the move its search judges best. What’s interesting about these networks is that, like a human, they improve with experience. By studying over 30 million positions from expert games and then repeatedly playing against itself, AlphaGo was able to improve without its programmers having to hand-code new strategies!
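For readers curious about the search itself, here is a minimal Python sketch of plain Monte Carlo tree search (the common “UCT” variant) on a toy game. It is an illustration under our own assumptions, not AlphaGo’s code: instead of Go it uses a tiny version of Nim (players take 1 to 3 stones from a pile, and whoever takes the last stone wins), and where AlphaGo uses its policy network to steer which moves get explored and its value network to help judge positions, this sketch explores with a simple formula and judges positions purely by random playouts. The names `mcts`, `rollout`, `uct_select`, and `Node` are hypothetical.

```python
import math
import random

# Toy stand-in for Go (an assumption for illustration): single-pile Nim.
# Players alternately take 1-3 stones; whoever takes the last stone wins.
MOVES = (1, 2, 3)


def legal_moves(stones):
    return [m for m in MOVES if m <= stones]


class Node:
    """One position in the search tree."""

    def __init__(self, stones, parent=None, move=None):
        self.stones = stones            # stones left for the player to move
        self.parent = parent
        self.move = move                # move that led here from the parent
        self.children = []
        self.untried = legal_moves(stones)
        self.visits = 0
        self.wins = 0.0                 # wins for the player who made `move`


def uct_select(node, c=math.sqrt(2)):
    # Trade off a child's win rate against how rarely it has been tried.
    # (AlphaGo's version also weights each move by its policy-network prior.)
    return max(
        node.children,
        key=lambda ch: ch.wins / ch.visits
        + c * math.sqrt(math.log(node.visits) / ch.visits),
    )


def rollout(stones, rng):
    """Play random moves to the end; True if the player to move wins."""
    mover = 0
    while stones > 0:
        stones -= rng.choice(legal_moves(stones))
        if stones == 0:
            return mover == 0           # this player took the last stone
        mover ^= 1
    return False                        # empty pile: the player to move lost


def mcts(stones, iterations=2000, seed=0):
    rng = random.Random(seed)
    root = Node(stones)
    for _ in range(iterations):
        node = root
        # 1. Selection: walk down through fully expanded nodes.
        while not node.untried and node.children:
            node = uct_select(node)
        # 2. Expansion: add one move not yet explored from this node.
        if node.untried:
            move = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(node.stones - move, parent=node, move=move)
            node.children.append(child)
            node = child
        # 3. Simulation: random playout (AlphaGo mixes in a value-network score).
        to_move_wins = rollout(node.stones, rng)
        # 4. Backpropagation: score each node for the player who moved into it,
        #    flipping perspective at every level.
        result = 0.0 if to_move_wins else 1.0
        while node is not None:
            node.visits += 1
            node.wins += result
            result = 1.0 - result
            node = node.parent
    # Final choice: the most-visited move.
    return max(root.children, key=lambda ch: ch.visits).move


if __name__ == "__main__":
    print(mcts(7))  # best move from a pile of 7 stones
```

From a pile of 7, the search should settle on taking 3, leaving the opponent 4 stones, which is a losing position in this game. It finds that without being told any strategy, purely by sampling games, which is the core idea AlphaGo scales up with its networks.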
So now that AlphaGo has won, what’s next? The endgame for DeepMind’s programmers is not just to conquer Go, because quite frankly, a machine that can play a board game is not especially useful on its own. What they are really aiming for is to advance artificial intelligence until it can do things like plan medical treatments, diagnose diseases, fight cyberwars, and perform other physically and mentally demanding tasks that humans are unable to perfect. The path to such achievements is still long and arduous, but at least now they’ve gotten Go-ing.
March 25, 2016
Author: Jerry Jiao
Editor: Sherry Yuan