The Power of Self-Learning Systems
Date Posted:
March 27, 2019
Date Recorded:
March 20, 2019
CBMM Speaker(s):
Demis Hassabis
Description:
Demis Hassabis, Co-Founder & CEO, Google DeepMind
Abstract: Demis Hassabis discusses the capabilities and power of self-learning systems. He illustrates this capability with some of DeepMind's recent breakthroughs and discusses the implications of cutting-edge AI research for scientific and philosophical discovery.
Speaker Biography: Demis is a former child chess prodigy who finished his A-levels two years early before coding the multi-million-selling simulation game Theme Park aged 17. Following graduation from Cambridge University with a Double First in Computer Science, he founded the pioneering video games company Elixir Studios, producing award-winning games for global publishers such as Vivendi Universal. After a decade of experience leading successful technology startups, Demis returned to academia to complete a PhD in cognitive neuroscience at UCL, followed by postdocs at MIT and Harvard, before founding DeepMind. His research into the neural mechanisms underlying imagination and planning was listed in the top ten scientific breakthroughs of 2007 by the journal Science. Demis is a five-time World Games Champion, a Fellow of the Royal Society of Arts, and the recipient of the Royal Society's Mullard Award and the Royal Academy of Engineering's Silver Medal.
TOMASO POGGIO: I'm Tomaso Poggio. I'm the Director of CBMM, which is hosting this talk. Back in 2011, Josh and I organized-- Josh is there-- one of the symposia for the MIT 150th anniversary. And the symposium was Brains, Minds, and Machines. And the last day was the most exciting day of the symposium, [INAUDIBLE] about the marketplace for intelligence. We invited some of the big companies, like IBM, Microsoft, Google, and so on, and a couple of startups.
One was Mobileye, and you heard from Amnon yesterday about the state of autonomous driving. The other one was DeepMind. And Demis spoke then; he had been a postdoc for a short period of time with me at Harvard. And he was here, in what was already then the heart of machine learning at MIT. Demis spoke then about the unofficial business plan of DeepMind, which was to create the first AI in the virtual world of games.
Since then, you have come back a few times to keep us updated on the progress of DeepMind. The last time, I think, was three years ago, shortly after AlphaGo won the match against Lee Sedol in Seoul, South Korea. And so this time, Demis may very well declare victory in the virtual world of games, because they've done everything, you know, beyond all hopes. Maybe he'll issue a new business plan, a new challenge for the real world. We'll see.
I have often said that intelligence, the problem of intelligence, is the greatest problem in science today and tomorrow. This also means it's not going to be particularly easy to solve. I think it's not only one problem: like biology, the science of life, the science of intelligence is really a lot of problems. We'll need a lot of breakthroughs, not one but many Nobel prizes. So both Demis and CBMM share the view that neuroscience and cognitive science will be at the core of the progress that will ultimately lead to better understanding human intelligence and to developing intelligent machines.
The journey may be longer than many people think. But it will be very rewarding in many different ways, intellectually and otherwise. And we should enjoy this journey and enjoy making history. So please welcome Demis.
[APPLAUSE]
DEMIS HASSABIS: Thanks, Tommy. I hope you can all hear me OK. Thanks to all of you for coming. It's amazing to see you all here. And it's always really fun for me to come back to MIT and catch up with old friends, and also see how much amazing work CBMM and all of you are doing, both in neuroscience and in machine learning, and in the crossover, which is very close to my heart. So today what I thought I'd talk about-- I mean, there's lots of things I could discuss that we've done since I was last here three years ago.
But I've titled the talk The Power of Self-Learning Systems, because I think what we and others have shown over the last few years is how surprisingly useful they can be, and how powerful quite simple ideas can end up being. So I'm going to begin with some framing. You know, I've always thought about AI as effectively bifurcating into two different types of approaches, when we think about AI and the history of AI. On the one hand, we can try to build expert systems that rely on hardcoded knowledge, that are basically handcrafted with the solution to a problem.
And they're usually inspired by logic systems and mathematics. And that, for a long time, was the way that most people attempted to build AI. And the problem with that is that those kinds of systems can't deal with the unexpected. They basically usually fail catastrophically, if something hasn't already been programmed into them, if they encounter something unusual or that the programmer had not foreseen. And the other interesting issue is that, of course, they're limited in scope to the sorts of solutions that we are able to articulate, we as the human programmers. So, of course, by definition, they're limited to these pre-programmed solutions.
On the other hand-- and I think why this is such an exciting moment in scientific history and why you're all here-- is that there's been this sort of big renaissance, if you like, of the learning system approach. Where instead of programming solutions, we build systems that are able to learn for themselves from first principles, learn their own solutions to problems. And what we hope is that these systems will be sufficiently general. They can generalize to all sorts of new tasks, perhaps tasks they've never seen before, and actually indeed even solve things that we as human scientists are not able to do. Right.
So maybe the promise of these systems is that they could go beyond what we're able to solve on our own. And I'm going to talk about that in the latter part of this talk. Now [INAUDIBLE] in learning systems, and why I think what CBMM is doing is so great and also what we do at DeepMind is that we can look to the best learning system we have, the brain, the human brain, and see if we can be inspired by understanding that better, inspired about new algorithms that we could use, new representations, new architectures that are inspired by neuroscience and our understanding of the brain, incomplete though that is.
And I would say, not only can we be inspired, but we can also validate algorithms that we've come up with ourselves, perhaps from mathematical or physics approaches, orthogonal to neuroscience. Reinforcement learning, I think, is a good example of that, which was pushed forward in engineering quite a lot in the '80s and '90s. When we then find that the brain implements a form of TD learning, as in some famous results in the '90s, we can be confident that reinforcement learning is plausibly part of a kind of overall AI solution.
And so we can push harder on those techniques, if we know that the brain also uses those techniques. And that point of validation is often overlooked. But it's very important when you are running or in charge of a big engineering program: where do you decide to put more effort in? If something doesn't work, and things often don't work the first time or even many times in research and engineering, how much more should you push that approach? And if you can take some guidance from the brain, and some comfort that the brain does implement these mechanisms, then that can be a very important source of information.
So as Tommy mentioned, last time I was here, and maybe some of you were in the audience three years ago now, we'd just come fresh off the back of our big AlphaGo match that we played in Seoul, which really overturned a lot of the traditional thinking in the game of Go but was also very surprising to many people in AI. And many of the experts had proclaimed that this was a decade before they expected it to happen. So I'm not going to talk about AlphaGo today.
But if you're interested in a sort of behind-the-scenes look at what happened with AlphaGo and the whole project, I'd recommend the award-winning documentary that was made by a great filmmaker who followed us and had access behind the scenes all during this journey. It's on Netflix and Amazon now, and other places. And I'd recommend you take a look at that if you're interested in that story.
So today I'm going to focus on what we've been doing in the last 12 months. And it's been quite a watershed year for us at DeepMind. And we've had quite a few interesting breakthroughs that I'm going to cover today. So the first thing I'm going to talk about is AlphaZero. And AlphaZero is our latest incarnation of the AlphaGo program, a sort of series of programs. So I'm just going to show you, for those of you who don't know, the lineage of AlphaZero and how we came about working on this project.
So first of all, there was the original AlphaGo. So this is three-plus years ago now. And AlphaGo was amazingly strong. But we had to do a bootstrapping step, which was to learn from human games first, by predicting what human players would do -- not experts but strong amateurs whose games we downloaded from online databases. We initially trained our neural network systems to predict what the human player would do, sort of by mimicking these human players. And then once it got reasonably strong, like weak amateur level, then we started this process of self-play and self-learning, playing against itself to improve.
But what we wanted to do -- and the way we work at DeepMind is, we've always got generality in mind. That's the final goal: the purest system you can build, with the fewest assumptions in it, maximally general, that works across as many domains as possible without any adjustment. So we start off with, can Go be cracked at all? That was obviously the original question. Then once we did AlphaGo, we looked at all the components of the system. And we did some systematic work to remove all the things that were still left remaining specific to Go.
So the next step was what we called AlphaGoZero. And what we did here is remove this initial step of needing some human games to bootstrap from at the beginning. So AlphaGoZero started from totally random play, and improved all the way to become stronger than the original AlphaGo just through self-play, just from playing millions of games against itself and modifying its neural network based on whether it won or lost. So the zero here refers to the use of zero human domain-specific knowledge. Obviously, we need the human knowledge to make the system. But no Go-specific knowledge was required.
And that's important. Because eventually, if we want to use these systems for real-world problems, you may not have a treasure trove of data, like millions of human amateur games that are freely downloadable on the internet. You may not have access, or there may not be that kind of data. So the system might have to generate it itself. And then the final step, which we're going to talk a little bit about in this section, was AlphaZero. So now we drop the word Go, because AlphaZero is able to play any two-player perfect information game to world champion standard or higher.
And I put the asterisk on any, because we only tried it with three -- the three biggest games that are played professionally. So chess, which I'm going to talk about a lot and use as the canonical example. Go, so AlphaZero can do what AlphaGoZero does. And Shogi, Japanese chess, which is a really amazing version of chess. It's very different from Western chess, but extremely complex as well, and played professionally in Japan.
So I'm going to talk about AlphaZero. And you can see-- so this is sort of framing of increasing generality. AlphaGo first, remove all the Go specific and human data components of that, and then finally generalize to any two-player game. Now as you all know, chess and AI has had a long and storied history, starting from the dawn of modern computing. Von Neumann, Turing, Shannon, all of-- many of my scientific heroes, even including Babbage actually, going all the way back, all tried their hand at chess programs.
In Turing's case, he wrote it out on a piece of paper and executed it himself, so it wasn't a computer program as such. But, you know, he tried his hand at that. And they all imagined what a strong chess program might be like. And in fact, Garry Kasparov recently wrote, in his editorial for our Science paper, that chess has always been, or can be thought of as, the drosophila of reasoning. And I sort of agree with him on that, actually. That's definitely the kind of place it's occupied for many years in AI research.
Now you can argue, of course, that chess has been done, right. I mean, it was done in the late '90s, when famously IBM's Deep Blue beat Garry Kasparov in a six-game match. So at that point, and since that point, chess programs have been stronger than the best human players, and indeed it has changed the way that we play chess. So you could argue, well, why did we bother trying to apply AlphaZero to chess, when we already know machines can beat the world champion in chess? This is unlike Go, where obviously there was no program that could beat the Go world champion.
Now the reason is -- and this is a debate I actually had with one of the project leaders on Deep Blue, back in 2016, when we'd just done AlphaGo. I think we hadn't actually yet done the Lee Sedol match, but we'd done the Fan Hui match. And I gave a talk at AAAI. And at the end, Murray Campbell, who was one of the project leaders on IBM's Deep Blue, came up to me afterwards and congratulated us on AlphaGo, but asked me the question of, what do you think will happen if we apply this to chess?
Is it possible for these learned systems to be stronger than the handcrafted systems that have had 30 years of incredible engineering done on them, plus the distilled knowledge of many hundreds of grandmasters? I'd argue chess computing is one of the most heavily engineered areas in all of AI; I think it's probably been the longest-standing continuously worked-on domain. And these programs are obviously incredibly strong.
And even as an ex-chess player myself, I was wondering, was chess rich enough as a game, with enough exploration left, for a learned system to learn some new ideas, and perhaps some new theories or themes about the game, that would allow it to compete with these really big brutes of machines that we now have, which are incredibly optimized to brute-force search through chess moves? So we both actually concluded at the end of the discussion that we didn't know the answer to this. Would it be able to be competitive? Is there enough room in chess for this kind of stuff?
When I asked my strong chess player friends, they didn't know either. So, to me, that's always a good sign of a great scientific question, where basically either answer is very illuminating and very interesting. So we decided to try and do that. Now, just to give you an example of how amazingly carefully these current chess engines are written: the current world champion, or at least when we were doing these tests, the 2016 world champion, is a program called Stockfish. It's an open-source program that's in the lineage of Deep Blue, these kinds of systems.
And they have hundreds, even thousands, of handcrafted rules about chess -- pawn structures, king safety, all different kinds of evaluation -- that teams of programmers over many years have tried to distill from human grandmasters and encapsulate in this complex database of rules. And then of course, they have to balance those rules against each other. So a phenomenal amount of engineering has gone into this. And then obviously, a lot of optimization to make these systems as fast as possible, so they can actually look at tens of millions of moves per decision they have to make.
And then they have opening databases and endgame databases that exactly solve both the opening and the end, especially the endgame. Seven pieces or fewer has been solved, so you just have a lookup table. Now that's what a chess engine looks like today. That's what we're confronting when we try and build a learning system that could compete against that. What we do is we throw away all those thousands of handcrafted rules and all that chess knowledge, and we replace it with two things: self-play reinforcement learning and Monte Carlo tree search. That's it.
So it's actually a very simple program once you've built it and optimized it. So I'm just quickly going to go through how AlphaZero works, for those of you who don't know. I'm actually going to blend together AlphaZero and AlphaGoZero, because they're slightly different systems but they're pretty much trained in the same way, so I don't want to go through this twice. But as you can imagine, what I'm going to describe is a bit of a hybrid between AlphaGoZero and AlphaZero.
So what you do is, first of all, you create your neural network architecture. And in AlphaGo, we used to have two neural networks: one that chose, or narrowed down, the likely moves that would be played in a certain position -- we called that the policy network -- and another neural network that learned to evaluate the current position, who was winning and the probability of each side winning. By AlphaGoZero and AlphaZero, we managed to merge these two neural networks into one neural network with two outputs: the most likely moves to search, and this evaluation.
And that allows the tree search to be very efficient, which I'll come back to at the end, much more efficient than what we have to do with traditional chess engines. Because we can reduce the width of the search, by using the policy network to narrow down to the most likely moves, and we can reduce the depth of the search, by truncating the search at any point, calling the value network and evaluating the position at that point. So we start with that network, which has no knowledge about anything. We do roughly 100,000 games of self-play in batches, with the current best version of our neural network playing against itself.
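To make that concrete, here is a minimal, hypothetical sketch of that policy-guided, value-truncated tree search (a PUCT-style selection rule in the spirit of what's described here, not DeepMind's actual code). The helpers `net` and `apply_move` are assumed stand-ins: `net(state)` returns a dict of move priors and a scalar evaluation in [-1, 1], and `apply_move(state, move)` returns the next state.

```python
import math

class Node:
    def __init__(self, prior):
        self.prior = prior          # policy head narrows the *width* of the search
        self.visits = 0
        self.value_sum = 0.0
        self.children = {}          # move -> Node

    def q(self):
        return self.value_sum / self.visits if self.visits else 0.0

def select(node, c_puct=1.5):
    # choose the child maximising Q + U, where U is proportional to the policy prior
    return max(node.children.items(),
               key=lambda kv: kv[1].q() + c_puct * kv[1].prior
                              * math.sqrt(node.visits) / (1 + kv[1].visits))

def choose_move(root_state, net, apply_move, n_simulations=800):
    root = Node(prior=1.0)
    for _ in range(n_simulations):
        node, state, path = root, root_state, [root]
        while node.children:                    # walk down the existing tree
            move, node = select(node)
            state = apply_move(state, move)
            path.append(node)
        policy, value = net(state)              # value head truncates the *depth* here
        for move, prior in policy.items():      # expand the leaf using the policy priors
            node.children[move] = Node(prior)
        for n in reversed(path):                # back the evaluation up to the root,
            n.visits += 1                       # flipping perspective each ply
            n.value_sum += value
            value = -value
    # play the most-visited move at the root
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]
```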
That creates a big training corpus of data -- synthetic training data, obviously, in this case. Every 100,000 or so games, we then try to train a new neural network, based off of the old training data and the new training data. And then we play this new neural network roughly 100 times, or maybe 1,000 times, against the old neural network. And at the point when it wins 55% of the time, we replace the old neural network with the new one.
So if it doesn't beat the old neural network more than 55% of the time, we do the next 100,000 games with the old network. So now we have 200,000 games to train the new neural network, and so on, until eventually the new neural network is better. If it is better, then it replaces the old one. And now the new network is what we use to generate the next batch of self-play data, and so on. And this pretty simple regime is incredibly powerful. And you can bootstrap from random play to world champion level and above in a matter of hours with this system, literally starting from random play.
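As a rough illustration, that gating loop might look something like the sketch below. This is not the actual training code; `generate_selfplay_games`, `train_on`, `win_rate`, and `clone` are hypothetical helpers, and the 100,000-game and 55% figures are just the rough numbers mentioned above.

```python
def alphazero_style_training(initial_net, iterations=100, gate=0.55):
    best = initial_net
    replay_buffer = []
    for _ in range(iterations):
        # 1) the current best network plays ~100,000 games against itself
        replay_buffer += generate_selfplay_games(best, n_games=100_000)
        # 2) train a candidate network on all accumulated self-play data
        candidate = train_on(clone(best), replay_buffer)
        # 3) pit the candidate against the current best over ~1,000 games and
        #    only promote it if it wins at least ~55% of the time
        if win_rate(candidate, best, n_games=1_000) >= gate:
            best = candidate          # promoted: generates the next batch of games
        # otherwise the old network keeps generating the self-play data
    return best
```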
So in chess, this requires about 40 million games. We play a game in about three seconds. That's an interesting tuning parameter. Because you could allow the neural network a longer amount of time to think in these training games; then obviously the games would be higher quality, but it would take you longer to generate them, so you'd have fewer games in a certain amount of time. So that's an interesting trade-off, and it's still not clear what the right trade-off is there. We're still doing some experiments on that.
And then we play 5,000 games at a time on 5,000 TPUs, which we have access to. But obviously if you had fewer computers, you would play fewer games at a time, and it would just take longer. So with this amount of compute, it only takes a few hours. So that's AlphaZero, that's the basic training. And then we tested it under match conditions, which is in our Science paper. We talked to the Stockfish creators, and we got the exact conditions they use for their world championship matches. This is Stockfish 8, I should say. There is now Stockfish 10, but two years ago Stockfish 8 was the world champion.
And we won the 1,000-game match, 155 wins to 6 losses, and the rest draws. Chess is a very drawish game, but that's a big margin, 155 to 6. And actually, when you look at the six games we lost, they're generally drawn positions where AlphaZero has pressed too hard for a win. So that raises another interesting question, how much we should reward a win versus a draw, which we've played around with and could discuss in the Q&A.
So it's convincingly stronger. And by the way, it surpasses Stockfish in four hours, starting from random. So this is the graph of improvement there, that you can see. So all these decades of human handcrafted programming -- well, equivalent knowledge -- can be learned in a matter of hours. And then we tried it on Shogi, and we beat the world's best Shogi program, which is roughly human world champion level, in a couple of hours. And in Go, we were able to beat AlphaGoZero in eight hours of training with this new, more efficient architecture.
So we believe this would work on any two-player perfect information game, which, I have to say, was always one of my childhood dreams. Because I used to play lots of different games as well as chess, and we always used to talk about what the kind of master games player would be. And for those of you who have read any Hermann Hesse, The Glass Bead Game is one of my favorite books; I'd recommend it to all of you. It's about the beauty of games, and in this case an intellectual getting incredible at a game and then using that to solve many other domains.
So for me this was always a waypoint that I've been dreaming about for a long time. And I think we've got there now. Now, there are many interesting things to say; I'll only cover a few of them here, and maybe we can go more into this in the Q&A. But one thing to look at is the efficiency of these systems and the amount of search per decision. Now there are lots of ways of measuring efficiency. And obviously, the human brain is incredibly energy efficient. And these systems are not -- both the normal chess engines and things like AlphaZero -- certainly not compared to a human grandmaster.
But what I'm more interested in, actually, is the compute efficiency, or the sample efficiency. And traditional chess engines, like the ones that you can run on your PC at home, are now so optimized they can look at on the order of tens of millions of moves per decision they have to make. Now if we compare that to a human grandmaster, on the left there, the top grandmasters look at maybe on the order of a couple of hundred moves for each decision they have to make. So many orders of magnitude fewer than the computer engines do.
And what's interesting is that AlphaZero is not as efficient as a human grandmaster, not by a long shot, around 100x more, but it's in the middle. AlphaZero looks at on the order of tens of thousands of moves before making a decision, not tens of millions. So I think that's interesting, directionally, to think about. Another interesting test you can do is to turn off all the search completely, and you can kind of measure how strong these engines are. And with the chess engines, if you get rid of their search, so they're just using their evaluation function, they're terrible. They're weaker than a weak club player; they just about know how to play chess.
But AlphaZero, if you turn off all its search, is roughly international master standard. So it's pretty good, even without any search whatsoever. And we think we can make that stronger, too. But what's also cool, for those of you who play chess, is the way AlphaZero plays. We saw this in AlphaGo: AlphaGo came up with these amazing new themes and motifs that the human Go players had never seen and are now using, by all accounts. So the first part of the question Murray asked me was answered: can we build a learning system that can compete with these brute-force, handcrafted search-engine-type systems? And the answer is yes.
And the second question is, what's the richness of chess itself as a domain? And what was amazing and really pleasing to see is that AlphaZero played in this very unique style. There are actually many unique things about it. But the main key difference between AlphaZero and the way the chess engines play is that AlphaZero really favors mobility of its pieces over material. The chess engines really like material, and they're known for grabbing material. If you give one a pawn, it will always take the pawn.
And the thing about those systems was, they were quite ugly to the human expert eye, aesthetically. Because what these engines would do is greedily grab material, get into a kind of bad-looking position positionally but with more material, and because they were so good tactically, they couldn't be beaten, even though the moves they were making looked a bit ugly to a human expert. Of course, in the end, the chess players concluded maybe the computer just knew better, because ultimately it was stronger, and perhaps our intuition of beauty just doesn't map to functionality.
But AlphaZero plays in this very dynamic and, to human grandmasters, aesthetically beautiful style. And it's able to beat the sort of calculation style that the engines have. And that really excited the chess world -- it got the chess players really excited -- and in fact it has rekindled my passion for chess, and I've been playing a lot more recently. And for those of you who play chess, this is my favorite position from the AlphaZero-Stockfish games. AlphaZero is White. Stockfish is Black.
And AlphaZero loves sacrificing pieces. And we can maybe talk a little bit about why that is in a second. But it loves sacrificing pieces to get more mobility for its remaining pieces, and this is the perfect example of that. In chess, there's this German term, zugzwang, which means that you've got your opponent into a position where any move they make will make their position worse. That's what zugzwang means. And what AlphaZero has done here, for those of you who know chess, is swap its rook for a bishop. Rooks are worth five points; bishops are only worth three points. And it's done that so it can seal off Black's queen in the corner.
You can see the queen, hopefully with this pointer, right in the corner, can't move. And these two rooks are stuck, because they have to defend this pawn next to the king. Because all of White's pieces are ganging up on it, none of Black's pawns can move without being taken, and the king can't move. So nothing can move in this position. Somehow AlphaZero has hermetically sealed Stockfish in. Literally, it can't move; it just has to give up its pieces. So this is kind of incredible. And obviously AlphaZero foresaw this 10, 20 moves beforehand, that it was going to get into this situation where literally Stockfish cannot do anything.
Now, we can debate why AlphaZero can do this more freely. And by the way, in human history, some amazing chess players, some world champions, were famous for doing this. This guy called Mikhail Tal, who was world champion in the '60s, was famous for making extraordinary sacrifices and winning these very beautiful games. And AlphaZero is sort of in that style.
And if you think about it, in a normal chess engine, one of the first rules you would build in, if you're writing one -- and I used to write these when I was a kid, different types of engines -- is the piece values. So you'd say a rook is five points, a knight is three points. One of the first things you would do for your search engine is to basically total up what each side has got. And it's better to have more points, right.
So if you think about it, if a normal chess engine wants to sacrifice a rook for a knight, its hardcoded rule is telling it that's minus two points. So somewhere down the line, through its tens of millions of calculations, it's got to figure out that it's going to get two points of value back -- objectively figure that out by some other rule, or capture that material back, or more.
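Just to illustrate the constraint being described, here is a toy version of that kind of hardcoded material rule (an illustration only, not Stockfish's or anyone's actual evaluation code):

```python
# Classical engines start their static evaluation by totting up fixed piece values,
# so a rook-for-knight trade immediately scores as -2 for the side making it,
# regardless of how much mobility or context it buys.
PIECE_VALUES = {'P': 1, 'N': 3, 'B': 3, 'R': 5, 'Q': 9, 'K': 0}

def material_balance(pieces):
    """pieces: iterable of (piece_letter, is_white) tuples; positive favours White."""
    total = 0
    for piece, is_white in pieces:
        total += PIECE_VALUES[piece] if is_white else -PIECE_VALUES[piece]
    return total
```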
And so that's very constraining, if you think about it. Whereas AlphaZero doesn't have that built-in rule, so it doesn't have to overcome this built-in bias that the sacrifice it's about to make is worth minus two points. As far as it's concerned, rooks are just assets, knights are just assets; they move in different ways. Right now, the opponent's knight is on an outpost and really powerful, my rook's passive and not doing much, so I'm going to swap it. It can take into account context, which the handcrafted rules obviously can't.
It doesn't matter how many rules you program in, you can't say: in this position, in this specific case with the queen in the corner, it's OK to sacrifice. I mean, you could try, and people have tried. But that rule wouldn't be very general, and you'd probably need millions and millions of rules to encapsulate that. And then on top of that, think about something else: how are you going to balance all those rules against each other? How are you going to balance material versus pawn structure versus king safety, all of these fairly esoteric concepts?
And obviously you can have a go at doing that, with grandmasters telling you what they think. But it's very difficult; even human grandmasters don't think in that way. They can't perfectly balance these things together to within a few decimal points. So I think there are many reasons why AlphaZero, and this style of program, is stronger.
The kind of coda to all of this is that we showed it to Matthew Sadler and Natasha Regan, two of my old chess friends from Cambridge. Matthew Sadler is a two-time British champion. And we let them in, before we published the paper, to look at all the self-play games. And they found so many interesting new motifs that AlphaZero had discovered -- seven new themes that they hadn't really seen before in professional chess. And they really asked, or petitioned, us to write a chess book about these new ideas. And that's just come out in January.
So if any of you are chess players, I'd recommend this book, Game Changer, which talks about these new ideas and why they're so exciting. And Garry Kasparov also wrote some very nice things about this; he's very interested in AI. One thing he wrote about AlphaZero was that programs usually reflect the priorities and prejudices of programmers, but because AlphaZero learns for itself, "I would say its style reflects the truth." Which is quite an amazing statement, really. Of course, he went on to say that it plays like him, which he was very--
[LAUGHTER]
Perhaps he's a bit biased. I think it does a little bit. But he was very pleased to say that as well, later on in the article. So that's AlphaZero. I want to move on now to AlphaStar, which is our newest program. So you can think, well, OK, we've done board games. And I was very careful to caveat that with perfect information two-player games. That's quite a big, wide branch of things that encompasses a lot of human activity in the games domain. But they're still easy in some sense, right.
And two important ways in which they're easy are that the state transitions are very simple -- you make a move, and it's pretty clear what the next state is going to look like -- and they're perfect information, so there's no hidden information or partial observability. So we wanted to tackle these two challenges that we feel are beyond what we did with AlphaGo and the AlphaZero series of programs. And we wanted to pick a domain that was more complex, a dynamic, real-time environment that also had hidden information.
And to do that, we chose the game of StarCraft II, which many of you will be fans of. And you can ask why we chose StarCraft II. Well, for several reasons. In my opinion, and this is widely acknowledged, it's the most complex and most exquisitely balanced real-time strategy game, I think, ever made. And real-time strategy games are kind of the hardest of the strategy games in the computer games world. It was also the first e-sport, so it's been played professionally for over a decade. There are many hundreds of professionals, a lot of them in Korea again, like with the Go players. And it's played at a very, very high level.
It's also been quite a classic challenge for AI for about a decade as well. So people have been researching StarCraft AI and organizing global competitions since about 2010; a lot of the first work came out of Berkeley. And also, we had a good relationship with Blizzard, who make this fantastic game. And they were able to supply us with things like anonymized replays of human games. And they were very excited about exploring this for their own games development. If they can build AI in this way for games, that would save them a huge amount of time and effort.
And so the new challenges are partial observability, as I mentioned, and a massive action space. This is another new challenge compared to board games. There's not a good way of estimating this, but we estimate there are roughly 10 to the power of 8 possible actions per move, if you like, although it's real time. And then there are very long-term dependencies: games can last half an hour and can involve more than 5,000 steps, 10,000 steps -- it's a range -- which is obviously a lot more than a board game, which is on the order of 100 to 200 moves. And it's real time and multiplayer.
The other thing about this game is that the way you play it, it's a very dynamic game. So unlike a board game where the set of pieces is fixed at the start, like in chess, here in StarCraft and these kinds of real time strategy games, you build up your army and your units. So every game is different. So there's basically four steps in StarCraft. Step one is you collect your resources. You build a base. You build your units. And then you battle with your opponent.
And so it's a very complex game with many, many rich strategies. And there are three different alien races you can play, and there's a kind of paper-scissors-stone element. So it's a very complicated game to play. In terms of our architecture, I haven't got time to go into this in detail, but we'll be publishing a paper on this soon, which will have all the details. But basically, there are three feed-forward networks that take in the various observations. There are spatial observations about the map, and various numbers that keep track of your economy, how many resources you've collected, and so on.
And then there's information about your units -- there are like 50 different types of units you can build -- how many you've got of each type, and how many you are producing. And those three feed-forward networks that you can see here are piped into a deep LSTM, which is really the core of the system. It has three layers of LSTMs and a bunch of other stuff. And then the output is a function head, which is what action to take. And that's parameterized across the units, using an attention mechanism.
So you can think of the output as a parameterized function: move these five units to this position. That's what is output, and then you see it being executed. Now, in terms of the training, we created this thing called the AlphaStar League, which we call population-based training. And really this is kind of like self-play on steroids. Because instead of just playing against yourself, one opponent, we have an entire league of diverse competitors that you are competing against.
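To give a feel for the shape of that architecture, here is a structural sketch in PyTorch with made-up layer sizes and names (the published AlphaStar model is considerably more elaborate): three feed-forward encoders feed a deep LSTM core, whose output drives an action-type head and an attention-style pointer over the player's own units.

```python
import torch
import torch.nn as nn

class AlphaStarSketch(nn.Module):
    def __init__(self, spatial_dim=1024, scalar_dim=64, unit_dim=128,
                 hidden=256, n_action_types=100):
        super().__init__()
        self.spatial_enc = nn.Sequential(nn.Linear(spatial_dim, hidden), nn.ReLU())
        self.scalar_enc = nn.Sequential(nn.Linear(scalar_dim, hidden), nn.ReLU())
        self.unit_enc = nn.Sequential(nn.Linear(unit_dim, hidden), nn.ReLU())
        self.core = nn.LSTM(3 * hidden, hidden, num_layers=3, batch_first=True)
        self.action_head = nn.Linear(hidden, n_action_types)   # which action to take
        self.unit_query = nn.Linear(hidden, hidden)            # query for the unit pointer

    def forward(self, spatial, scalars, units, core_state=None):
        # spatial: (B, spatial_dim), scalars: (B, scalar_dim), units: (B, N_units, unit_dim)
        u = self.unit_enc(units)
        x = torch.cat([self.spatial_enc(spatial),
                       self.scalar_enc(scalars),
                       u.mean(dim=1)], dim=-1).unsqueeze(1)    # add a time step
        core_out, core_state = self.core(x, core_state)
        h = core_out[:, -1]
        action_logits = self.action_head(h)
        # attention over the player's units: which units the chosen action applies to
        unit_logits = torch.einsum('bd,bnd->bn', self.unit_query(h), u)
        return action_logits, unit_logits, core_state
```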
So in the beginning, we start with the first agent that we build. And that started with imitation learning and supervised learning, by looking at the replay games that we got from Blizzard. So this is human data; those players actually weren't that strong, they were like median-level human players. And then we create our first agent here, 001, which is created by imitation. But then we fork this agent, and we start using our self-play and reinforcement learning to improve the level of this agent.
So this is very much how we started with AlphaGo, except in this case we also keep around older agents, the ones in blue, to make sure that as we improve on to new strategies, we don't forget how to beat our old strategies. And then this improvement carries on for 1,000 or more different epochs as we keep increasing the strength of these systems.
And at the end, before we challenge the top human players, what we do is take what we call the Nash of the league. So if we have five matches, we take the best five strategies -- the ones that together dominate the rest of the league but are not dominated by any other individual agent outside of that Nash, those five. And those are the ones we take into the competition.
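One simple way to read "the Nash of the league" is as the equilibrium mixture of a symmetric zero-sum meta-game over agents, computed from their pairwise win rates; the agents with non-zero weight are the ones carried forward. Here is an illustrative sketch of that reading, using linear programming (my own construction for illustration, not DeepMind's code):

```python
import numpy as np
from scipy.optimize import linprog

def nash_of_league(win_rate):
    """win_rate[i, j]: probability agent i beats agent j. Returns mixture weights over agents."""
    payoff = win_rate - 0.5                       # zero-sum payoff to the row player
    n = payoff.shape[0]
    # variables: mixture x (n entries) and game value v; maximise v == minimise -v
    c = np.zeros(n + 1)
    c[-1] = -1.0
    # for every opposing pure strategy j: sum_i payoff[i, j] * x_i >= v
    A_ub = np.hstack([-payoff.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    A_eq = np.hstack([np.ones((1, n)), np.zeros((1, 1))])      # weights sum to one
    b_eq = np.ones(1)
    bounds = [(0, None)] * n + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:n]                              # agents with non-zero weight form the Nash
```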
Now, one way we increase the amount of diversity in our systems is by introducing intrinsic motivation. So obviously the AI systems get rewarded for winning a game. But to increase diversity, we also give them some pseudo rewards or intrinsic motivations. And these can be: make sure you build x number of this unit, and then win. There are like 50 different units, as I was saying, so we can make some of the agents specialize in certain types of units.
We can also say, only focus on beating this one other agent in the league, so it can kind of pick on one agent. And we can randomize that a bit as well. So we can introduce asymmetries in all sorts of ways. And we're still experimenting with different approaches; it's a very rich area to look at, designing pseudo rewards and potentially evolving them.
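In the simplest reading, a pseudo reward just gets mixed into an agent's return alongside the win/loss signal. A hypothetical shaping of the "build x of this unit and then win" kind might look like this (illustrative only; the actual reward design is not described in detail here):

```python
def shaped_return(won, units_built, target_unit, target_count, weight=0.2):
    """won: bool outcome; units_built: dict of unit name -> count built this game."""
    win_reward = 1.0 if won else -1.0
    # pseudo reward: fraction of the target unit quota this agent actually produced,
    # capped at 1 so it never outweighs the game outcome
    pseudo = min(units_built.get(target_unit, 0) / target_count, 1.0)
    return win_reward + weight * pseudo
```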
And what you get, if you look at this graph, is the different units you can build, or a selection of them, about 25 here. This axis is the number of days of training the league. And the height of the graph is how many of those units were built at different stages of the league training, by all of the different agents that were active at that moment in time. And you can see that the different strategies ebb and flow, depending on which one is dominant at the time.
So once we did that, just before Christmas last year, we decided we were ready to take on some of the top professionals in the world. So we invited in two of the top professionals, TLO and MaNa from a German team. And they came in and tested our system, behind closed doors in two official five game matches. And here is-- you can see the progress of AlphaStar. Each of these dots is one of the agents in the AlphaStar league. And you can see how they're improving. These are all the rankings, human rankings, up from bronze to grandmaster. And obviously TLO and MaNa are above grandmaster level. And we played them just before Christmas.
And we won those two matches 10-nil. So we played two five-game matches, which we won 5-0 and 5-0. We then showed the replays of that, and it was commentated on online; some of you may have seen it in January. We also tried one exhibition match, which we actually lost, in which we tried a new interface, a different way of processing the screen. And we're still working on that now. But overall, our system is extremely strong. So that's AlphaStar.
Now, if we take a look at all of our work together: in the past we started off with Atari games, which we sort of cracked in 2014 with DQN. Then there was AlphaGo, AlphaZero, and now AlphaStar. And in terms of grand challenges for games, I feel like we've sort of done it now. I'm not going to declare victory like Tommy was saying, but I feel that we've done most of the interesting problems that were inherent in games. Of course, there are still other things, other kinds of games, games like Diplomacy that need lots of language understanding. There are some other interesting things to explore. But we feel like we've done a lot of the core work that was needed.
Now, there are many interesting issues with these systems. I haven't got time to talk about all of these right now, because I think we started a bit late, so we may have to overrun a bit. But one of the things that was interesting, and that we learned about these systems, is that AlphaGo, especially the first version of AlphaGo, lost game four in this famous match we played against Lee Sedol. We won 4-1. And in the game it lost, it basically got very confused. Lee Sedol played this amazing move, on move 79, that confused our evaluation system.
So when we went back and analyzed this afterwards, we were trying to figure out what it was about that position that had confused AlphaGo. And it wasn't so clear. We could sort of tell there were some motifs about the position, but it wasn't clear exactly what the problem was. And we had another match, against the Chinese number one, that we won in 2017, that we had to prepare for. And obviously Ke Jie had seen what happened in Lee Sedol's match. So we had to fix this weakness.
So you can think of it as a little bit like a bug, if it were a traditional program. And obviously, if it were a traditional program, you would just write a new rule or something to fix that hole in the knowledge database, if you like. But the problem is, this is a self-learning system, so we can't really just fix it with some patch. We have to somehow encourage it to explore this area of the search space, this area of the regime. And that's a pretty tricky thing to coax a system to do. And actually, if people are interested, we can talk about this in the Q&A; there were lots of interesting ideas we had there of how to do that.
And I think there's going to be an interesting notion of what debugging is, when it comes to these new kinds of systems. Like what does debugging mean when you have a problem with one of your self-learning systems. And I think this is really interesting. I think there's a whole kind of new paradigm there to look at in computer science.
I've talked a little bit about covering the knowledge search space, which is sort of related to this first point. How do you know you've covered the whole surface area of what you thought you were doing? Is there a good mathematical way of describing that? This also links in with understanding the systems. These systems are incredibly good at what they do. They have all this amazing implicit knowledge that they've built up for themselves. But how do we understand how they're making those decisions?
And then there are maybe philosophical questions about the nature of creativity and what that means. I've started thinking about three levels of creativity now -- maybe there are others -- interpolation, extrapolation, and then kind of out-of-the-box creativity, or innovation. And I would claim that AlphaGo exhibited some aspects of creativity. It was able to come up with new moves that even human players had never thought of or seen before. So I would say that's extrapolation, not just interpolation.
But it can't do full out-of-the-box innovative thinking. AlphaGo cannot invent Go, right -- that's what I think we're eventually after -- and AlphaZero can't invent chess. It can create creative moves, new moves, novel moves, and novel ideas in chess. But it can't invent chess, yet.
And of course, I want to say, despite all these successes, many of the most interesting challenges are left. I won't go through each of these, but these are just some of the ones that I think all of you should be working on, and we're working on, and I'm sure many of you are: unsupervised learning, memory, one-shot learning, imagination-based planning with generative models, learning abstractions and abstract concepts, transfer learning, language understanding -- all unsolved. And we need to crack these problems if we want to get to full AGI.
And so I think, actually in some ways, it's the most exciting time to be in the field right now. Because I feel like we've just done the preliminaries. We've got the-- we've got onto the first step. And we've done some interesting things. It looks promising. But now I feel like the next decade or so, it's going to be about tackling, really, the crux of intelligence, which I think is many of these things, and things even beyond this list, which I feel are like really the heart of the intelligence question.
So I'm just going to try and speed up a little bit here. So that's games. And for us it's not the closing of a chapter -- we're always going to be using games and simulations at DeepMind -- but it has been a big sort of watershed moment for us in this last year. And the idea for us has always been, if you go back to the 2011 talk that Tommy was mentioning -- I think it's on the internet somewhere -- I do talk about games and, in effect, our business plan in 2011. And I feel we've done a lot of the things I talked about in that lecture.
But it's always been part of the plan. Games and simulations, I've always felt, are the perfect training ground for AI. But the plan was always to develop general solutions that could be applied to real-world problems. And I think this is also very exciting, in the sense that we've now got powerful enough and mature enough algorithms -- by no means anywhere near full AI -- that they are already proving themselves to be useful in many real-world domains. And I think that's another really fruitful area for all of you to explore: how can you apply these systems to all sorts of interesting applications.
Now, we've applied it commercially within Google and elsewhere on lots of things. I won't go into all of these. There's healthcare; we've done a lot of work on energy and data centers; other people have worked on personalized education and virtual assistants. I think the possible applications are almost limitless. I won't spend much time on this, but one of our most recent pieces of work is on wind power -- Google uses a lot of wind power and renewables, 700 megawatts -- and we got a 20% gain in what they were getting out of their wind power, using some of our machine learning systems.
And I'll just skip the various different aspects of that, because I want to now focus on the last part of the talk. So that's games, and there's commercial stuff you can do; that's all great. But the thing I'm really passionate about is this section, which is using AI for scientific discovery. That has always been the reason why I work on AI, and the reason why I started my whole journey: I wanted to use AI to help us understand the universe around us better. So AI is this incredibly powerful tool that we as scientists can leverage.
So I think that even the current systems, which need lots more improvement as I just mentioned in the previous slides, can already be applied usefully to quite a lot of scientific problems. And I put up here three key characteristics, if you like, of problems that would already be amenable to this type of AlphaZero-like approach. First, number one is that the problem by its nature is a massive combinatorial search problem. If it's got that kind of character, I think it's well suited to this kind of system.
Secondly, can you express a clear objective function or metric that you can then optimize or hill-climb against? Again, in many domains it's possible to do that. And three, is there lots of ground truth data -- obviously that's great -- and/or an accurate and efficient simulator for that domain (ideally "and", but it can be "or")? And I posit that if those three things hold, then you can probably use this type of system to help solve that problem.
And we ourselves are doubling down on this. We're building a science team; it's around 30 to 40 people now at DeepMind, and we want it to grow to about 100 people. If that's something you're interested in, please come and talk to me and apply to us. Because I think there are a lot of different areas where we can apply these kinds of techniques, even the ones we have already today, and make some progress. And here's just a sprinkling of some of the things we've looked at, and in a few cases have serious projects on, all the way from genomics and theorem proving to quantum chemistry and so on.
And it's been used successfully already, by us and other groups, in lots of areas: exoplanet discovery -- some of our colleagues at Google have done that, discovering new planets; nuclear fusion, where we're looking at controlling the plasma in these fusion reactors; healthcare, like diagnosing macular degeneration; and even things like chemical synthesis and material design. But for the final part of my talk, I want to talk about what I think is the most exciting thing that's happened at DeepMind in the last year, and that's our program called AlphaFold.
And AlphaFold is our attempt to solve the protein folding problem, which many of you will know about. But for those of you who don't, this is basically the protein folding problem. Proteins are obviously the fundamental building blocks of life, and all life depends on them, including humans, of course. And what you start with is an amino acid sequence, on the left here. There are 20 naturally occurring amino acids, and each one is like a letter. So you get this big 1D string of letters coming in. And all you've got to do is predict the 3D structure of the protein from this 1D sequence.
And you'd like to predict the 3D protein structure. This is actually the protein structure of hemoglobin, and you can actually see the little hole here in the middle of the hemoglobin that carries the oxygen. So this is kind of amazing; proteins are incredible when you read into what they do. And the reason this is such an important problem in biology is that the 3D structure of the protein determines its function. So if we could understand how these proteins are structured, we would understand much better what these things are doing.
And you can think of them basically as molecular machines. There are lots of really cool videos of proteins working, visualized, on YouTube, and you can have a look at them. But these are two really cool ones. This is an enzyme in your mitochondria, and it produces ATP, which is basically the energy for all living cells. And on the right here is a calcium pump that replaces calcium during muscle contraction, when you exercise. So they're really exquisite little biomechanical machines.
And I think we can open a lot up if we can solve protein folding. There are ways of getting the structures of proteins, but it's very painstaking crystallography you have to do; it can take four or five years to do one protein, and some proteins are not crystallizable. But if we could open that up, then I think we'll have a big impact on disease understanding, drug discovery, and also synthetic protein design -- actually building synthetic versions of, or adjustments to, these kinds of proteins.
So the way we tried to do this is by going to our deep learning systems and training a specific new type of system. Very quickly, how AlphaFold works is that we have a neural network, and we have this database of 30,000 known protein structures, for which we have the sequence, obviously, but also the 3D structure, known through crystallography and other methods. And what we do is train this neural network, which takes the amino acid sequence in as an input. We also augment the data in another way, which I haven't got time to explain, by comparing the sequence against other naturally occurring sequences.
And so those are the inputs. And the outputs are predictions of the angles at each point in the protein, and also a distogram, which is the estimated pairwise distance, in angstroms, between every pair of amino acids in that sequence. These are probability distributions over angles and distances. And obviously, for the training data, we can recreate the angle outputs and the distogram from the actual structure.
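As a small concrete example of the distogram idea, here is how one might turn a known structure into a binned pairwise-distance target for supervised training (a sketch with assumed bin edges, not AlphaFold's actual featurisation):

```python
import numpy as np

def distogram_target(coords, bin_edges=np.arange(2.0, 22.0, 0.5)):
    """coords: (L, 3) residue positions in angstroms -> (L, L) array of distance-bin indices."""
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return np.digitize(dists, bin_edges)
```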
So this is trained in a supervised way. Then we have to do structure optimization. Once we've trained that neural network, we can put a new, never-seen-before protein sequence in, and it will produce two outputs: the distribution over angles and a distogram. So once you have those two things, how do you then end up with a structure? We tried all sorts of things here, including neural networks and RL and simulated annealing. But in the end, what's worked best so far is just a simple quasi-Newton method from numerical optimization.
So what you do is randomly sample from this distribution of angles; that gives you a distogram, which you compare against the output of the neural network. That gives you your scores. You also include a chemistry [INAUDIBLE] score, which stops atoms being put on top of each other; that's calculated just through basic chemistry. And that's all combined together into a total score v, and then you carry on hill-climbing against this total score until you can't optimize it any further. And then you output the candidate structure.
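To make the hill-climbing step concrete, here is a deliberately toy version of that optimisation: a 2D chain standing in for the real backbone geometry, a squared-distance term standing in for the distogram likelihood, and a generic clash penalty standing in for the chemistry score. The real system differs in all of these details; this only illustrates the "optimise angles against a predicted-distance score" idea.

```python
import numpy as np
from scipy.optimize import minimize

def chain_coords(angles):
    # toy 2D chain: each residue is a unit step whose heading is the running sum of turn angles
    headings = np.cumsum(angles)
    steps = np.stack([np.cos(headings), np.sin(headings)], axis=1)
    return np.vstack([np.zeros(2), np.cumsum(steps, axis=0)])

def total_score(angles, pred_dist, clash_radius=1.0, clash_weight=10.0):
    coords = chain_coords(angles)
    d = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
    dist_term = np.sum((d - pred_dist) ** 2)          # disagreement with predicted distances
    mask = ~np.eye(len(coords), dtype=bool)
    clashes = np.clip(clash_radius - d, 0.0, None)[mask]
    return dist_term + clash_weight * np.sum(clashes ** 2)   # penalise overlapping residues

# pred_dist would come from the network's distogram (here: random symmetric placeholder)
L = 10
rng = np.random.default_rng(0)
pred_dist = np.abs(rng.normal(3.0, 1.0, size=(L + 1, L + 1)))
pred_dist = (pred_dist + pred_dist.T) / 2
np.fill_diagonal(pred_dist, 0.0)

x0 = rng.uniform(-0.5, 0.5, size=L)                   # randomly sampled initial angles
result = minimize(total_score, x0, args=(pred_dist,), method='L-BFGS-B')
candidate_structure = chain_coords(result.x)           # the output candidate structure
```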
Now, we feel that some of these neural network methods will ultimately work better than just the quasi-Newton method, and we're working on that. But right now, that's our current best method. And this is what a protein looks like as it's getting folded. So, if we wait for the start of the GIF, it starts out unfolded, and then this is the optimization folding it more and more, with smaller and smaller changes as the optimization process continues, until there's nothing left to optimize, which will be shortly. And then that will be the output candidate structure. So that's how it looks, visualized.
And so how do we test this system? Oops. We tested it on CASP13, which you can think of as kind of like the Olympics of protein folding. It's held every two years and has been going since 1994. All the top research groups around the world in this field compete, so it's 100-plus international research groups. And it's a pretty fun competition. This happened over last summer. Obviously, in biology, groups are trying to find the structures of these proteins all the time using crystallography.
And so when a new group finds the structure of a new protein, before they publish it, if it's around the time of the CASP competition, they'll give the structure to the competition. And it's not been published yet. So the competition organizers have the ground truth, but none of the teams know it, so it's a truly blind test. And it's quite fun. Every day for those three months, you get emailed an amino acid sequence, and then you have three weeks to scramble to submit a candidate, and then the next one comes in the next day. It was actually pretty fun to work on.
So there are about 80 amino acid sequences that you get like this, and then you're supposed to hand back predicted structures. And so, to cut a long story short, we won the competition, pretty unexpectedly. We only have one person on the team who's ever worked on protein structures -- but he's very good, and he leads the team; he's called John Jumper. And here is the ground truth, in green, and you can see our prediction in blue. These are three of the proteins that we folded pretty well, and they're pretty much overlapping, as you can see.
But we didn't just win; we won by quite a big margin. We won 25 of the 43 proteins in the category we were competing in, which was the hardest category, and the next best team only won three out of 43. And also, if you look at the average -- this is us, the purple bar; this is not a very good team here; and then it's pretty linear after that -- we're like 25% better than even the second-best team. So this really shows the power of these methods. Obviously this field has been going for 20 years, so there's a huge amount of careful handcrafted systems in this area, like there was in chess, and obviously this is just a learning system.
And then here's another graph. An angstrom is 0.1 of a nanometer, and this shows the angstrom error of each residue in the sequence, each amino acid. These orange lines are all the other teams; this purple line is us. This axis is the angstrom error, and this axis is the percent of residues in the sequence that have that much error. So you can see that for up to about 98% of residues, we're within 10 angstroms. So that's pretty cool.
So, you know, this is great for me, because I've been talking about applying AI to science for a long time. But this is really our first proof of concept that this really could work, and in a really important area of science that will have a lot of impact. And I want to just caveat this. We are state of the art -- you saw the graphs, and we won the competition -- but we're still a long way away from this problem being solved. And by solved, I mean useful for biologists so they don't have to do crystallography anymore, or at least not as much. You'd still obviously need it to check the ground truth, but not as much.
And what they tell us is that a one-angstrom error is the kind of tolerance they can deal with. So we're still quite a long way away from being within one angstrom across the board. So we're continuing on this project -- it's one of the biggest projects we've got going at the moment -- and we're exploring many, many additional techniques. I hope, when I next come here, to have some new results on that.
So I'm just going to finish by talking about the bigger picture, a couple of slides on that, and then we'll go to Q&A. I feel like we're making good progress now on this thesis that I've always had, that AI is a kind of meta-solution. I feel like in science and in many other areas of our lives, information overload and system complexity are two of the biggest challenges we're all trying to overcome, especially in the sciences. We've all got a lot of data, big data, sort of everywhere. And for a long time people were talking up big data as being great. But I think it's actually sort of the problem.
So we've got all this data almost everywhere we look. But how do we find the insights and process that data? How do we look for the right things inside that data and make sense of it? And I think AI is potentially a very powerful answer to that. And the way I think about it is that intelligence can be thought of as a process, a kind of automated process, that converts unstructured information into useful knowledge. And my dream is to make AI-assisted science possible, to allow us to make faster breakthroughs.
It's a very, very exciting time right now, and I think AI holds incredible promise for us as a society. But just a couple of notes of caution: like with any powerful tool, we've got to make sure we build AI responsibly and safely, and for the benefit of everyone. As with any powerful technology, it's inherently neutral in and of itself; it depends on how we as humans and our society decide we're going to deploy it. And I think a lot more research and discussion is needed on the impact of this technology.
And if you're interested in researching these topics, like robustness and bias and safety, please let us know, because, again, we're expanding our work on that front. And we've done a lot of collaborative work with outside teams and companies, like the Partnership on AI, to try and increase this. And as a final slide, speaking as the neuroscientist in me -- and we were just discussing this yesterday at CBMM --
I think trying to build AI with a neuroscience inspiration behind it, and neuroplausibility, is a great way to actually understand the mind better, ultimately. Because if we can distill AI and intelligence into an algorithmic construct, maybe we can then compare it to the human mind, and that will allow us to better understand mysteries like creativity, dreaming, and consciousness that we want to understand about our own minds. So thanks for listening. I just want to thank everybody, all these amazing people at DeepMind who worked on all these different projects. And thank you all for listening.
[APPLAUSE]