Using AI to accelerate scientific discovery
Date Posted:
April 11, 2022
Date Recorded:
April 5, 2022
CBMM Speaker(s):
Demis Hassabis All Captioned Videos Brains, Minds and Machines Seminar Series
Description:
The past decade has seen incredible advances in the field of Artificial Intelligence (AI). DeepMind has been in the vanguard of many of these big breakthroughs, pioneering the development of self-learning systems like AlphaGo, the first program to beat the world champion at the complex game of Go. Games have proven to be a great training ground for developing and testing AI algorithms, but the aim at DeepMind has always been to build general learning systems ultimately capable of solving important problems in the real world. I believe we are on the cusp of an exciting new era in science with AI poised to be a powerful tool for accelerating scientific discovery itself. We recently demonstrated this potential with our AlphaFold system, a solution to the 50-year grand challenge of protein structure prediction, culminating in the release of the most accurate and complete picture of the human proteome.
Speaker Biography: Demis Hassabis is the Founder and CEO of DeepMind, the world’s leading AI research company that aims to solve intelligence to advance science and benefit humanity.
Founded in London in 2010, DeepMind has achieved breakthrough results in many challenging AI domains from Atari games to StarCraft II, and has published over 1000 research papers - including more than two dozen in Nature and Science.
In 2016, DeepMind developed AlphaGo, the first program to beat a world champion at the complex game of Go. In 2020, its AlphaFold program was heralded as a solution to the 50-year grand challenge of protein structure prediction and in 2021, DeepMind launched the AlphaFold Protein Structure Database, which offers the most complete and accurate picture of the human proteome to date.
A chess prodigy, Demis reached master standard aged 13, and went on to program the multi-million selling simulation game Theme Park aged 17. After graduating from Cambridge University in computer science, he founded pioneering videogames company Elixir Studios, and completed a PhD in cognitive neuroscience at University College London. Science listed his neuroscience research on imagination as one of 2007’s top ten breakthroughs, and in 2021, AlphaFold2 was selected as the Breakthrough of the Year.
He is a Fellow of the Royal Society and the Royal Academy of Engineering. In 2017 he featured in the Time 100 list of most influential people, and in 2018 he was awarded a CBE for services to science and technology.
PRESENTER: I'm happy to introduce Demis Hassabis. You probably know, or most of you know all about him. His bio is he was a chess prodigy doing a PhD in neuroscience. Then came to MIT. He founded DeepMind. They developed AlphaGo. They developed-- before that, Atari games, they developed AlphaZero, and then more recently AlphaFold, and yes, in the meantime received so many prizes, and so on. But anyway, that's-- you can read all of this.
What I want to explain is why he's coming back to MIT every three years or so since 2010 to give us an update about the state of AI and DeepMind. And the reason is that in 2009, he came to spend a few months as a visiting postdoc in my lab, and then this went pretty well, and as you can see from this recommendation that I wrote up there. And he applied to a Wellcome Fellowship that he got, with the idea of then spending the next five years working between London and Cambridge here.
And fortunately, or unfortunately, of these five years he spent only a few months with the Wellcome Fellowship, but then he started DeepMind. And this was 2010, I think.
And so in 2011, we had here-- actually the initial session was exactly in this classroom. We had a symposium that Josh Tenenbaum and I organized by the name "Brains, Minds, and Machines." It was one of the five symposia for the 150 years of MIT, in 2011. And it started with a first session in this classroom that was titled "The Golden Age of AI".
And so you can see here Sydney Brenner, Marvin Minsky, Noam Chomsky, and Emilio Bizzi and Patrick Winston at the end. And this was, by the way, the first and only time as far as I know that Marvin Minsky and Noam Chomsky were together on a panel, which is pretty amazing.
And this was the first day. The third day was "The Marketplace for Intelligence". And so there were the big guns, like Microsoft and Google and IBM speaking, but there were also startups, or other companies. I think Colby spoke, Colby's here, and Amnon spoke, and Demis. Demis looked a bit younger, but--
And so after that, I asked him to become a member of the Advisory Committee of the Center for Brains, Minds and Machines. Then, with Josh and several others at MIT, we got funding from NSF for 10 years. Actually, 2023 is the last year of NSF funding for the CBMM, and Demis has been a great advisor together with all the great advisors in our external advisory committee that will meet, actually, tomorrow to tell us what to do next.
And so that's why Demis is coming back. This is a little piece of his bio that is not usually known. And Demis, I'm very glad to give you the stage.
DEMIS HASSABIS: Great. Thanks, Tommy, for that great introduction. I stayed over with Tommy on Sunday evening, and he was saying that he found in the attic all of these ancient documents of DeepMind from way back in 2010-11 that I completely forgot about and lost track of, but Tommy still had somewhere. So it was great reminiscing about old times.
So it's always fantastic to be back here. It's been a few years now, of course, with the pandemic, but it's always wonderful to be back at MIT and to talk to all of you.
So today, what I'm going to discuss is mostly our latest work, which is AlphaFold, and the impact it's had there. And hopefully, those of you who are biologists in the audience have found it useful yourselves and may be using it. It'd be great to hear your feedback on that in the Q&A session.
But before I talk about AlphaFold specifically, and actually a lot of the other work where we are using AI in the various different sciences, I'll briefly mention some of the other work we've had. We've had a bit of a golden year, I would say, last year in terms of applying AI to all sorts of hard problems in the sciences.
But I was going to also just recapitulate a few of the things that have led up to us being able to do this. It's been a kind of 10 year-- well, I guess 12 years now-- process to get to this point, where we're finally at the point now, I think, where we can apply AI very meaningfully to scientific discovery itself, which was always my aim with DeepMind from the very beginning. So I just want to thread the needle for you in terms of how the different work we've done in the past, largely on games, then culminated in the kind of work that we're doing now.
So as Tommy said, we started DeepMind in 2010 with a very audacious vision. The plan was to effectively build a kind of Apollo program effort for building artificial general intelligence, or human level artificial intelligence. And back in 2010, none of you will remember this now, but, you know, AI was uninvestable, I would say, and certainly a kind of backwater. And it's kind of amazing to see what's happened in a relatively short space of time. Within a decade, really, it's gone from that to perhaps one of the hottest areas of technology, science, and also commercial investing.
So it's kind of amazing. And it's been a privilege to be at the vanguard of that and be part of pushing that transformation forward. And in a way-- I mean, we always, of course, plan for success, but I think even we've been pretty surprised by how quickly things have progressed and where we've got to today.
So the mission statement of DeepMind was to solve intelligence and to use it to advance science and benefit humanity. And we've had this in mind from the very beginning. This has all sorts of implications about what we would think about.
When we use the term solving intelligence, what do we mean? Well, you know, I've been trained as a neuroscientist and as a computer scientist. And really, DeepMind in the early days was an intersection, really, of the best ideas from computer science and AI and machine learning and the best ideas that we had at that time from systems neuroscience. And that's partly why I was working in Tommy's group, because obviously his group does both of those two things.
And by solving intelligence, what we mean is we'd like to fundamentally understand the phenomena of intelligence, and then, of course, recreate that artificially to create artificial general intelligence. And our thinking was that if this was done in a very general way, then in theory, one should be able to apply the technologies that come out of that, the algorithms that come out of that process to a whole wide range of challenging tasks. After all, that is what the human brain does, obviously, our only existence proof of general intelligence that we have. And there's no, I don't think, theoretical reason why that should not be possible.
So the type of approach we took at DeepMind, which was very early, if you think about that in 2010, and now, of course, is all the rage, is a learning systems approach. And it's turned out, transpired, obviously, over the last decade that they're incredibly powerful and they have huge potential, these types of learning systems.
So of course, I think of it as two components, really, although there are many types of learning systems. But of course, the one we specialize in at DeepMind is called deep reinforcement learning. So one way you can think about this, at least the way I think about this is if an agent or an AI system on the left here finds itself in some kind of environment or some data stream and it's observing, getting observations of that data stream. And I think the first job of the AI system is to basically build a model of that environment from that data or that experience. And that could be a continually updating thing if this is an active learning agent.
So that's the first job of the AI system, I would say. Then the second job, where we use reinforcement learning for this, but there are other approaches, too, is once you have built that model of the environment, you can use that model to learn how to achieve a specific goal or maximize a reward, usually specified by the designers of the system, within that environment.
Reinforcement learning does that by a sophisticated process of trial and error in order to figure out what action to take that would best get it towards its goal. And at least in an active learning system or an agent system, this is an active learning process. So actions are taken that may drive a new change in the environment, which in itself may drive a new observation.
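The observe, act, and receive-reward cycle described above can be sketched in a few lines. This is a minimal illustration using tabular Q-learning, a textbook stand-in rather than the deep RL systems built at DeepMind: it has no neural network and no learned model of the environment. The corridor environment, the function names, and every hyperparameter below are assumptions made for the example.

```python
import random

N_STATES = 5          # corridor cells 0..4; cell 4 is the goal (hypothetical toy environment)
ACTIONS = (-1, +1)    # step left or step right

def step(state, action):
    """Environment: apply the action, return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning; all hyperparameters are illustrative assumptions."""
    rng = random.Random(seed)
    # q[state][action_index] is the agent's current estimate of future reward
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Trial and error: mostly exploit what is known, sometimes explore at random
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = max(range(2), key=lambda i: q[state][i])
            next_state, reward, done = step(state, ACTIONS[a])
            # Move the estimate toward reward plus discounted best future estimate
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q

def greedy_policy(q):
    """Best known action for each non-goal cell."""
    return [ACTIONS[max(range(2), key=lambda i: q[s][i])] for s in range(N_STATES - 1)]
```

Calling `greedy_policy(train())` returns the action the agent prefers in each cell after training. The agent is never told that the goal is on the right; it only sees the reward that its own actions produce.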
So the cool thing about these systems, deep RL systems, is that they can discover potentially new knowledge from first principles for themselves, through this process of trial and error. So of course, if one thinks about that, then this might be incredibly powerful in certain domains, for example, like science.
So obviously, as all of you will probably know, AlphaGo was our big landmark success with this type of system, probably still the most famous demonstration to date of a deep RL system. And just to remind those of you who have not heard about that, of course this is our program to play the game of Go.
And we know that Go was super hard for computers to play until AlphaGo came along because of its enormous search space and the fact that it's a very esoteric game, so it's very difficult to write down a handcrafted evaluation function in the way that chess programs are written.
And I think the reason this is, is because Go is actually at the edge of difficulty level that even humans can cope with, I would say, whereas chess is probably just below that limit. It's slightly less complex than Go. And I think Go players deal with this complexity by relying heavily on their intuition and their pattern matching skills rather than their explicit calculation skills, because it's just too complicated for us to do that. And that's, of course, the reason why it's so difficult to program these evaluation functions is even the world's best players would not be able to describe to you precisely how they are evaluating a position in terms of a bunch of rules that could maybe be coded.
So we built these systems and then the descendants of AlphaGo, AlphaGo Zero, and then AlphaZero and then more recently MuZero, and they all basically use this underlying approach that I'm going to describe here, which is really more the way AlphaZero and AlphaGo Zero worked, but you can think of it as the way all of these types of systems work.
And it's pretty simple to describe. So what happens is you start with a neural network that's basically random, and it's trained through hundreds of thousands and sometimes millions of games of self-play to get better at evaluating a position, whether it's good or not for the opponent or for the agent, and also to predict what are the right-- the kind of most likely moves that should be made in a position so that it can narrow down this enormous search space.
And so what happens is you start with version 1, let's say, of the system and it's going to be pretty bad, almost random at playing. You do 100,000 self-play games. That generates a new data set, and you use that data set of games to train a second neural network, version 2. And version 2 tries to predict which side will win from a certain position in those games, and also tries to guess which kind of move is made by the v1 system.
So you train up v2, and then you have a little mini tournament between v2 and v1, 100 games or so. And there's some kind of win threshold where you feel like v2 needs to beat v1 by a certain threshold. We set that at a 55% win rate. And then we regard v2 as significantly stronger than v1.
Now, if that happens and v2 does beat v1 55% or more of the time, then you replace v1 with v2 and you do this whole cycle again. And you generate a new set of data, but this data will be slightly higher quality, because now it's v2 that's generating that data, and then you train a version 3, right? And then you do the little mini tournament, and you go again.
Now, if v2 does not beat v1 in that first mini tournament, you continue to generate more data with v1. So you do another 100,000 games, and now you've got 200,000 games to train the next v2*, one could call it. And then eventually, what we find is that v2 does eventually beat v1. And in fact, once one does this around 17 times, depending on which game it is, that kind of order, 17 to 20 times, you go from random play to better than human world champion in any two-player game.
And that's the system. So it's pretty elegant, and we generalized it fully to all two-player games, not just Go, but any two-player, perfect information game.
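The generate, train, and gate cycle just described can be sketched as a control loop. The three constants come from the talk; everything else is a hypothetical stand-in. In particular, a "network" here is reduced to a single strength number, `train_candidate` fakes training with a random improvement, and `win_rate` assumes a logistic win probability. None of this is how the real systems are implemented; the sketch only shows the promotion logic.

```python
import math
import random

WIN_THRESHOLD = 0.55        # from the talk: the new version must win at least 55% of games
GAMES_PER_BATCH = 100_000   # from the talk: self-play games generated per round
TOURNAMENT_GAMES = 100      # from the talk: "100 games or so"

def train_candidate(champion, num_games, rng):
    """Hypothetical stand-in for training a network on self-play data.
    Strength is one float; more accumulated data gives a slightly better expected result."""
    data_bonus = 0.1 * math.log10(num_games / GAMES_PER_BATCH + 1)
    return champion + rng.gauss(0.0, 0.3) + data_bonus

def win_rate(candidate, champion, rng, games=TOURNAMENT_GAMES):
    """Mini tournament; the logistic win probability is an assumption of this sketch."""
    p = 1.0 / (1.0 + math.exp(-(candidate - champion)))
    return sum(rng.random() < p for _ in range(games)) / games

def self_play_loop(generations, seed=0):
    """Run until `generations` promotions have happened; return champion strengths."""
    rng = random.Random(seed)
    champion, data = 0.0, 0
    history = [champion]
    while len(history) <= generations:
        data += GAMES_PER_BATCH              # current champion generates more self-play games
        candidate = train_candidate(champion, data, rng)
        if win_rate(candidate, champion, rng) >= WIN_THRESHOLD:
            champion, data = candidate, 0    # promote, then start a fresh data set
            history.append(champion)
        # otherwise keep the champion and keep accumulating its games
    return history
```

For example, `self_play_loop(17)` returns the initial strength followed by 17 promoted versions. Because a 100-game tournament is noisy, an individual promotion can occasionally be a slight regression, which is one reason to require a margin above 50%.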
So this will become relevant in a minute when we discuss how to apply these techniques to science. When you boil down what AlphaGo is, we think of Go really as this enormous search space, more positions than there are atoms in the universe, impossible to exhaustively search by any brute force method to find out what the right path through the Go game is.
So what you can think of is the neural net has learned this model of the environment, in this case, Go positions and likely Go moves and how good Go positions are. And that constrains this enormous search space. So instead of having to search this entire search space, it just searches-- the Monte Carlo tree search only needs to search a small, tiny fraction, this in blue, of the possibilities. And then of those, of course, it picks the best one that comes out as the best evaluation, this path in pink, and that's the path that is picked once the thinking time runs out.
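A stripped-down illustration of how a learned model narrows the search: only the moves that a policy considers likely are evaluated at all, and the best-valued of those is chosen. This is a one-step toy, not Monte Carlo tree search, and the `priors` and `values` dictionaries are hypothetical stand-ins for the outputs of trained policy and value networks.

```python
def select_move(priors, values, top_k=3):
    """priors: move -> probability from a (hypothetical) policy network.
    values: move -> evaluation from a (hypothetical) value network.
    Only the top_k most likely moves are ever evaluated."""
    candidates = sorted(priors, key=priors.get, reverse=True)[:top_k]
    best = max(candidates, key=lambda move: values[move])
    return best, candidates

# Made-up numbers for five legal moves
priors = {"a": 0.40, "b": 0.30, "c": 0.20, "d": 0.05, "e": 0.05}
values = {"a": 0.10, "b": 0.60, "c": 0.30, "d": 0.90, "e": 0.00}
best, searched = select_move(priors, values)
```

Here `searched` is `['a', 'b', 'c']` and `best` is `'b'`. Move `d` has the highest value but is never examined because its prior is low, which shows both the saving and the risk of pruning: the quality of the search depends on the quality of the learned model.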
And this program is obviously enormously powerful. We beat the 18-times world champion in a very famous match in 2016 in Seoul, which was watched by a couple hundred million people. And we won that match 4-1, and it was proclaimed to be a decade before its time. But those of you who are familiar with this will know that what was more important was actually how AlphaGo won that match, and the fact that it came up with original Go strategies and Go motifs, including most famously move 37 in Game 2, which is here on the right and encircled in red.
And suffice to say that this piece was put on the fifth line, you can see here the fifth line from the edge of the board. And this early in the game, you should never play on the fifth line. We don't really know why, but that's the rule Go masters will tell you: you should only play a move like this on the fourth line or the third line, but definitely not the fifth line.
Of course, that's all changed now with AlphaGo doing this. And obviously, the subsequent five-plus years now, all these AlphaGo games have been studied to death. And there are many, many books now written about this, and it's changed Go theory.
And of course, the interesting thing about this was that this move 37 stone, 100 moves later, roughly, was in exactly the right position to decide the game, because the fighting from the bottom left of this position spread across the whole board, and then the move 37 stone happened to be in the perfect position to decide that battle and then win AlphaGo the game.
So this is very interesting. So this is clearly an original strategy, and we can come back, as well, in the Q&A to what this means about creativity. So maybe just to mention a little bit about that, I think AI systems today are very good, and have been very good for a long time, at interpolation.
So you can think of that as distinguishing between cats, pictures of cats and dogs. And you show the system millions of cats, millions of dogs, and it kind of effectively builds a sort of prototype of what a cat and what a dog looks like. And then so that's kind of like interpolation, some kind of average of all those examples and kind of creates a template out of that. This, I would say, is more like extrapolation.
So these strategies have never been played before by human players, right? So this is not just interpolation, some kind of averaging of what we know is good. It's true, I would say, extrapolation. So I think that is some form of creativity, one would have to say.
But on the other hand, I think there's a further level to go, which I don't think AI systems can do, which one could call true invention or out of the box thinking. And for me that would be not coming up with just an original strategy in Go, but inventing Go, right? So we don't have any systems that can invent Go or invent chess.
And that's a whole other level, and it's interesting to speculate, well, what would be missing from that? I think partially we wouldn't know how to even describe that to an AI system in a way that isn't very high level conceptual, and our current systems don't really deal with that very well. So I think there's one more. So I think we're starting to encroach into the boundaries of what one may call or think of as creative, but there's several more, I think, levels still to go.
So if you're interested in this, of course, there's the AlphaGo film that's now on YouTube. And I think if you're interested in the human story, if you like, behind AlphaGo, I highly recommend this award-winning documentary.
So then with AlphaZero, as I was mentioning, we generalize this next to-- and this is, in general, the way we tend to-- I think this is a good approach to engineering sciences. Usually, you-- at least what we do at DeepMind is you try and focus on performance first. So you try and get can you actually master Go at all. Then once that performance is reached, you throw out-- you start throwing out components of the system to make it more and more general, less and less specific to that particular task, while still keeping that performance level up, right?
And then finally, which is the stage we're getting to in certain of our systems, like AlphaZero and a bit AlphaFold, you can also do analytics on that system and try and understand at a deep level, open up the black box and find out how, exactly, it's working and what is the knowledge that it's-- how is it representing the knowledge that it has.
So AlphaZero, amazingly, within a few hours, training from nothing, would beat the best programs available in any of these other games. And these programs are already better than world champion, human world champion level, including, you'll see at the bottom here, AlphaGo, all the versions of AlphaGo itself. So AlphaZero would beat all versions of AlphaGo.
And that's quite interesting in itself, right? So this idea of generalization can also give you a little bit more performance, too, is the lesson that we also took from this, which is quite an interest-- maybe slightly counterintuitive.
Of course, because I'm an ex chess player, chess was always the thing I was eventually going to want to apply this to. And again, in chess, not only was it exceptionally strong, beating Stockfish 8, which was at the time the world champion chess computer of the traditional type, but it sort of invented a new style of chess where it favored mobility over materiality. So almost all chess programs until AlphaZero favored material.
So the classic cliche is that computer grabs a pawn because if it sees some material that it can take, it will try and grab it even if it then has to hang on for dear life and defend. And of course, computers are very good at defending positions because they don't make any mistakes, tactical mistakes.
But this was always considered to be a very ugly way of playing chess. So no one would say, oh, we've seen this wonderful beautiful chess computer game. They were ugly. They were effective but ugly. And so this is what changed about AlphaZero is not only was it better, they were beautiful games, or at least considered to be beautiful, like this game here, where AlphaZero is white.
And you can see what it's done is it sacrificed a lot of pieces in exchange and some pawns in order to totally imprison the black pieces and block all the black pieces in. So in fact, none of the Black pieces can move without losing something, even though the black is winning on material here. And in chess, that's called Zugzwang, that kind of position where any move you make makes your position worse. And this kind of position was never seen before. The famous YouTube chess commentators called this the immortal Zugzwang game because no one had ever seen a position like this, where you've got such powerful pieces like the Queens and the Rooks and they can't do anything.
So this is pretty incredible. How is AlphaZero doing this? Well, my speculation, and we've also analyzed it with some of the ex-world chess champions, and we opened up the system to them, like Vladimir Kramnik, and they helped us understand that. And we've written a paper recently analyzing, seeing if we can reverse engineer AlphaZero and find out what the rules are that it's built implicitly in AlphaZero.
And it's obviously better at evaluating positions than chess engines. And the reason is because it doesn't have to overcome these inbuilt rules. So in order-- if you think about why chess computers don't sacrifice their pieces a lot, it's because they have inbuilt tables, valuation tables that tell them Rooks are worth 5 points, Knights are worth 3 points.
So in order for it to sacrifice a Rook for a Knight, it will have to work out explicitly through its brute force search that it's going to get the 2 points back somehow, right? And if it can't work that out within its event horizon, then it won't do it, whereas AlphaZero doesn't have those inbuilt rules, so it's much freer to take into account the contextual factors of the current position and just say in this position, that Knight is super powerful. It's on an outpost. And my Rook's passive, so I'm just going to swap them. It doesn't have to overcome this inbuilt bias, if you like.
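The "2 points" arithmetic can be made concrete with a fixed valuation table. The Rook and Knight values are the ones quoted in the talk; the Pawn, Bishop, and Queen values are the conventional ones and are an assumption here, as are the function names and the simplified piece lists.

```python
# R and N values are from the talk; P, B and Q are conventional values (assumed)
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}

def material(pieces):
    """Sum the fixed table values for a list of piece letters."""
    return sum(PIECE_VALUES[piece] for piece in pieces)

def material_balance(white, black):
    """Positive means White is ahead on material."""
    return material(white) - material(black)

# Simplified position before and after White gives up a Rook for a Knight
white_before, black_before = ["R", "R", "N", "P", "P"], ["R", "R", "N", "P", "P"]
white_after, black_after = ["R", "N", "P", "P"], ["R", "R", "P", "P"]

before = material_balance(white_before, black_before)
after = material_balance(white_after, black_after)
```

The balance goes from `0` to `-2`. An engine that scores positions mainly with a table like this sees the exchange as an immediate loss of 2 points, so its search has to find concrete compensation before it will play the sacrifice.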
It's also better at balancing different factors against each other. So one of the problems with rule-based systems is that you have maybe 1,000 rules at this point in the best chess programs, and not only do you have to ask whether those are the right rules and whether they are expressed correctly, but also one has to balance those rules against each other, which is a super difficult kind of hand optimization problem. And also, probably, you'd want a different balance in different positions. And of course, a chess computer, rules-based computer is not able to do that, right? You have to have one set of balancing rules. But AlphaZero, again, is free to perhaps overweight things dynamically, depending on the position.
The other cool thing about these systems is they do a lot less search for the strength of the play. So a human grandmaster, our models are incredibly sophisticated so we only have to do a very small amount of search to come up with a good move. So maybe a human grandmaster does maybe a few hundred moves of search for every decision. State of the art chess engines, these brute force rules-based engines need to do tens of millions of searches per evaluate, per decision. And AlphaZero is somewhere in the middle. It's not quite as efficient yet as human grandmasters, but it's orders of magnitude more efficient than standard chess programs.
So if you're interested in reading more about this, there's obviously our paper, and also a whole book written by the British chess champion about all the different styles and the implications of that for chess that AlphaZero came up with. And in fact, you don't have to just trust my description of that.
Here's a couple of really nice quotes from Magnus Carlsen, who's the current world champion, an amazing chess player. "I have been influenced by my heroes recently, one of which is AlphaZero." And I think he was the first player really to integrate these new ideas into his own play.
And Kasparov said, "Programs usually reflect the priorities and prejudices of the programmers, but because AlphaZero learns for itself, I would say its style reflects the truth," which is a pretty cool thing for him to say.
So just wrapping this up, then, we've been very fortunate to make a whole bunch of big breakthroughs in game AI, starting with our Atari work back in 2012, actually, and published in 2014, where the system learned to play directly from the pixels on the screen and no other information, and then AlphaGo and AlphaZero, which I just mentioned to you. And then finally, we tackled what's considered to be the most complex strategy video game, StarCraft II, and reached grandmaster level at that, as well, with AlphaStar.
So that was really the history of where we've been working on, on games, because they're so efficient to develop these algorithmic ideas, they're really easy to test. They're safe sandboxes, in a way. Like, it doesn't really matter if your AI does something wrong. It's just a game. But also, they are really good metrics to measure how are you progressing. Obviously, the win rate or the grading, or you can evaluate the quality of the move or benchmark it against the best human players in the world in each one of these games.
So the exciting thing was actually our shift in the last few years to start to use these AI systems to explore scientific discovery itself. And what we looked for is three criteria for a suitable problem that maybe we can tackle with these types of systems.
So first of all, we actually actively seek out problems that can be considered as massive combinatorial search spaces or state spaces. And the reason we like those types of problems is that means that exhaustive brute force approaches will not be tractable here, right? So we actively seek out these types of regimes. And we think we're particularly suitable-- our techniques are particularly suitable for these types of regimes.
Second of all, we look for a clear objective function or measure to optimize against, so that we can hill climb against it and figure out if we're making progress or not. And then finally, obviously, we need enough data from which to learn our models, and/or potentially the existence of an accurate and efficient simulator.
And-- ideally, it's an and-- and in most of the things that we tackle, we usually have both. It's very interesting, actually, though, that one of the things that we've got very good at over the years we've been doing this is that even if you don't have enough data to learn your model directly, you may have enough data in order to cobble together a simulator. And then the simulator can then be run, and to generate synthetic data, more synthetic data to augment your real data. And you can also analyze that data to see if it's from the right distribution and the quality of it.
And we've actually found that even if the simulator data is of less-- usually, it's not as good quality as the real data. As long as there's some signal in it, it's actually still pretty useful. And we actually used a little bit of a variation on this technique for AlphaFold, so it was actually an important part of AlphaFold, which I'll come back to in a minute.
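A toy version of the augmentation idea: draw extra samples from a rough simulator, check that they look like they come from roughly the right distribution, and then merge them with the real data. The simulator, the mean-based check, and the tolerance are all assumptions made for illustration. This is not the procedure used for AlphaFold, which the talk describes only as a variation on the technique.

```python
import random
import statistics

def crude_simulator(x, rng):
    """Hypothetical simulator: captures the trend (y is about 2x) but is noisier than reality."""
    return 2.0 * x + rng.gauss(0.0, 1.0)

def augment(real, n_synthetic, rng, tolerance=1.0):
    """Return real data plus synthetic samples drawn over the same input range."""
    xs = [x for x, _ in real]
    synthetic = []
    for _ in range(n_synthetic):
        x = rng.uniform(min(xs), max(xs))
        synthetic.append((x, crude_simulator(x, rng)))
    # Very rough distribution check (an assumption): compare the mean outputs
    real_mean = statistics.mean(y for _, y in real)
    synthetic_mean = statistics.mean(y for _, y in synthetic)
    if abs(real_mean - synthetic_mean) > tolerance:
        raise ValueError("synthetic data drifts too far from the real data")
    return real + synthetic

rng = random.Random(0)
real_data = [(float(x), 2.0 * x) for x in range(11)]   # 11 made-up "real" measurements
training_set = augment(real_data, n_synthetic=1000, rng=rng)
```

The resulting `training_set` holds the 11 real points plus 1,000 noisier synthetic ones. In practice the distribution check would be much stricter than a comparison of means.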
So those are the three criteria, if you like. And it turns out you can use these criteria to look at what would be useful commercially, and we now have hundreds of products within Google that use DeepMind technology. So pretty much any service that you use from Google will have some DeepMind technology kind of under the hood there, from YouTube recommendations to battery life saving on your Android phones. Or if you speak to a Google device and it speaks back to you, the text to speech systems are our system powering that underneath.
And so you can look into commercial areas to see what fits these criteria, and also, interestingly, scientifically. And it turns out if you do that, there are actually a lot of things that fit very well with this set of criteria.
And of course, the number one thing that we try to do that ticks all of these boxes is the protein folding problem. I'm sure many of you will know what this is, but for those that don't, it's the problem of going from an amino acid sequence, so the genetic sequence that describes a protein, directly to its final 3D structure that it attains in nature or in the body.
So this is super useful. Why is it useful? Because proteins themselves are essential to life. All of life's mechanisms depend on proteins. And their 3D structure is thought to at least partially determine their function, right? The 3D structure of the protein tells you quite a lot about its function. And also, if you want to do things like drug discovery, you generally want to target the surface of the protein, so you need to know its 3D shape in order to know what the right target areas are for you to target your chemical compound against.
So this is the protein folding problem. But the problem is that determining the structure experimentally usually takes years of painstaking experimental work. I think the normal rule of thumb is that it takes one PhD student their entire PhD to do one protein, and oftentimes it doesn't work out. And it's very difficult, painstaking, technical work.
Now famously, Christian Anfinsen, about 50 years ago, actually, almost to the day, during his Nobel lecture acceptance speech, he-- it's almost a bit like a Fermat's last theorem situation, where he had a throwaway comment in a way of like, well, yes the 3D structure of a protein should be fully determined by its amino acid sequence. So in theory, this should be possible, but who knows how to do it, right? So this sparked off a whole 50-year grand challenge, or quest in biology to try and solve this problem of protein folding, and going directly from this amino acid sequence to the 3D structure.
Now, one of his contemporaries, Levinthal, at this point formulated what's known as Levinthal's paradox. And the question is, the big question for protein folding is can the protein structure prediction problem be solved computationally, right, purely computationally?
And one of the issues is Levinthal's paradox, which is he worked out back of envelope, estimated that there were about 10 to the power 300 possible shapes a protein could take, an average size protein could take. So clearly, exhaustively sampling this is totally out of the question. And yet obviously, the paradox part of this is that in nature and in our bodies, these proteins spontaneously fold, sometimes in milliseconds. So obviously, nature somehow solves this very efficiently, right, in an energetic sense. So that points to the fact that this may be possible if we could attack it in the right way.
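To make the scale of that estimate concrete, here is a minimal back-of-envelope sketch. The 10^300 figure is the one quoted above; the sampling rate of one conformation per picosecond and the approximate age of the universe are assumptions added purely for illustration.

```python
# Back-of-envelope check of Levinthal's paradox.
CONFORMATIONS = 10 ** 300           # estimate quoted in the talk
SAMPLES_PER_SECOND = 10 ** 12       # assumption: one conformation per picosecond
SECONDS_PER_YEAR = 60 * 60 * 24 * 365
AGE_OF_UNIVERSE_YEARS = 1.38e10     # approximate, for comparison only


def years_to_enumerate(conformations: int, rate_per_second: int) -> float:
    """Years needed to visit every conformation once at the given rate."""
    seconds = conformations // rate_per_second
    return seconds / SECONDS_PER_YEAR


years = years_to_enumerate(CONFORMATIONS, SAMPLES_PER_SECOND)
print(f"{years:.2e} years, or {years / AGE_OF_UNIVERSE_YEARS:.1e} times the age of the universe")
```

Whatever sampling rate you plug in, the result stays astronomically large, which is why exhaustive search is ruled out and why millisecond folding in nature looks paradoxical.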
And for me, this has been a very long road, actually, to get into protein folding. I first came across the problem in the '90s as a student in Cambridge. One of my friends circle there was obsessed with this problem, the protein folding problem, even back then. And I remember at any opportunity in the bars and playing pool and in the college bars that he would talk about this problem being foundational, like if we could crack this problem then it would open up all sorts of new avenues for biology.
And that really stuck in my mind as he described the problem, and I thought it was a fascinating problem. And I also in the back of my mind felt it was the type of problem that AI might one day be suitable to tackle. So I kind of filed it away, because obviously the AI techniques back in the '90s, when I was still writing computer games and doing my AI in the guise of computer games, were not ready to tackle anything remotely like protein folding, but it really stuck with me as a fascinating problem.
And then the second time I came across it was actually when I was doing my postdoc with Tommy here at MIT, and this game came out called Foldit. Some of you might remember. It came out, I think in roughly 2008 from Baker Lab, and it was really, I think, the best and probably still the best example of what one would call a citizen science game.
So what happened with this game, Foldit, is that the Baker Lab turned the protein folding problem into a puzzle game, a pretty esoteric but quite fun puzzle game for people of that persuasion. And they actually got lots and lots of gamers to play this game and actually choose to make bends in the backbone of the protein. And you get an estimate of the energy score, right, basically. And you were trying to minimize this free energy.
And what was really cool about that is that-- well, firstly, I was fascinated by the idea of the general public having fun with the game, but really they were contributing to science, which seems like if we could actually tap that, that would be pretty cool for gamers and for science, I think. And so I was pretty fascinated by that idea. But also, I was quite intrigued that the best players of the Foldit game actually found some real structures, and actually apparently some pretty important ones that were published in Nature and Nature Structural & Molecular Biology in the subsequent years.
And so I was thinking as I was watching this, looking at this in 2010, roughly that kind of time, well, what's happened there? How is this possible? And when you look into the kind of moves that the players made, they were making local moves that were actually increasing the energy, the free energy, for the moment, because they realized that later on they might be able to reach a fold that, globally, would be lower energy. And this is the reason why greedy energy minimization methods don't work, because they just get stuck in local minima.
So they're obviously able to somehow use their intuition, I guess the pattern matching of these gamers. They weren't biologists, they were just gamers. But some of them are really good at pattern matching, and they presumably were using that to find their way, their intuition, if you like, to find their way to a good fold.
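The point about greedy minimization can be illustrated with a toy sketch. The one-dimensional "energy landscape" below is invented for illustration, and the budgeted search is only a stand-in for the idea of tolerating temporary uphill moves; it is not a description of how Foldit players or AlphaFold actually search.

```python
from collections import deque

# Toy 1-D energy landscape: index = conformation, value = energy (made-up numbers).
ENERGY = [5, 3, 4, 6, 2, 0, 1]


def greedy_descent(energy, start):
    """Step to the lowest-energy neighbour until no neighbour is strictly lower."""
    pos = start
    while True:
        neighbours = [i for i in (pos - 1, pos + 1) if 0 <= i < len(energy)]
        best = min(neighbours, key=lambda i: energy[i])
        if energy[best] >= energy[pos]:
            return pos  # stuck: every available move is uphill or flat
        pos = best


def search_allowing_uphill(energy, start, max_uphill_moves):
    """Return the lowest-energy position reachable with a limited number of uphill moves."""
    best = start
    seen = {(start, 0)}
    queue = deque([(start, 0)])
    while queue:
        pos, used = queue.popleft()
        if energy[pos] < energy[best]:
            best = pos
        for nxt in (pos - 1, pos + 1):
            if not 0 <= nxt < len(energy):
                continue
            # A move costs one unit of budget only if it strictly raises the energy.
            cost = used + (1 if energy[nxt] > energy[pos] else 0)
            if cost <= max_uphill_moves and (nxt, cost) not in seen:
                seen.add((nxt, cost))
                queue.append((nxt, cost))
    return best


print(greedy_descent(ENERGY, 0))             # stops in the local minimum at index 1
print(search_allowing_uphill(ENERGY, 0, 2))  # reaches the global minimum at index 5
```

Starting from index 0, the greedy walker stops at the first dip, while a searcher willing to climb over the barrier at index 3 finds the much lower energy beyond it.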
So then later on-- so that was filed away, too. And then later on, when we beat Lee Sedol in the Seoul match, I thought back to what we were really doing with AlphaGo, which is that we mimicked the intuition of these Go masters, right? And believe me, if you talk to these Go masters, that's all they've ever done in their lives. They play Go from the age of two or three years old, go to Go school-- they don't go to a normal school. You cannot get a more highly trained mind for a particular task than a Go master. I think it's kind of impossible.
So if we were able to mimic their intuition in Go, then why would it not be possible to do something analogous in this protein space? So that gave me the feeling that this was definitely a crackable problem with our techniques.
So we actually started the AlphaFold project almost the day after we got back off the plane from Seoul with AlphaGo. And we actually found a bit of recording somewhere where I was talking to Dave Silver, who was the project lead for AlphaGo, out in Seoul, and I think we must have just won the third match or something. And we were literally discussing this, like, I think now is the time to push forward on the protein folding project. So that's what we did.
Another really important part of the story was the existence of this competition called CASP, Critical Assessment of protein Structure Prediction. And it's an amazing competition. It's considered to be like the Olympics for protein folding, and it runs every two years. And it's run by this amazing group of organizers who've been running it since 1994. So they've diligently kept this going for close to 30 years now.
And what they do is that throughout a year, experimental departments produce new crystals and experimentally determined structures of proteins, right? But over a kind of three month period, what they do is they hold back publishing those results and they instead submit their freshly found structures into this competition. But it's a blind assessment, so the computational teams that compete in this competition don't know the structure. Nobody, in fact, knows the structure other than the experimental group that had just found it.
And then you get-- it's quite exciting. You get a sequence emailed to you and you have one week to get back with your best guess of the structure. And it's pretty fun, and you do it over a summer. And then at the end of all that, the organizers release the real structures and they measure how close the predicted structures of all the different teams were to the experimentally determined ground truth.
And so I want to give a big shout out to John Moult, who is the founder of this and the other CASP organizers, who work so hard over many, many years to maintain this competition. And I actually think it's a pretty good model for helping science to progress, and maybe should be thought about in other domains, too.
So historically, then, what had been happening at CASP? Well, it turns out when you look back on it that there'd been very little progress at CASP for over a decade. So here on this graph, I'm showing you the winning-- the median accuracy for the winning team at each subsequent CASP competition, starting in 2006 with CASP 7 and going to 2016 with CASP 12.
And on the y-axis, you can see it's a measure of the accuracy that the winning team got, and it's called GDT. I won't go into details of how it's calculated, but you can think about, roughly speaking, it's the percentage of residues that you have accurately put in the right position within a certain tolerance, certain distance tolerance, right? So obviously, you're trying to get close to 100% if you can.
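As a rough illustration of that description, here is a simplified sketch of a GDT-style score. It assumes the predicted and experimental coordinates (one point per residue) are already superimposed and averages the fraction of residues within 1, 2, 4 and 8 Angstrom cutoffs; the full CASP procedure also searches over superpositions, which is omitted here, and the coordinates in the example are made up.

```python
import math


def gdt_ts(predicted, reference, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS on pre-aligned structures.

    For each distance cutoff, compute the fraction of residues whose predicted
    position lies within that cutoff of the reference position, then average
    the fractions and express the result as a percentage.
    """
    if len(predicted) != len(reference) or not reference:
        raise ValueError("structures must be non-empty and the same length")
    distances = [math.dist(p, r) for p, r in zip(predicted, reference)]
    fractions = [sum(d <= c for d in distances) / len(distances) for c in cutoffs]
    return 100.0 * sum(fractions) / len(fractions)


# Four residues, with the prediction displaced by 0.5, 1.5, 3.0 and 10.0 Angstroms.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 10.0, 0.0)]
print(gdt_ts(predicted, reference))  # 56.25
```

A perfect prediction scores 100, and a single badly misplaced residue lowers the score without dominating it, which is part of why this kind of measure is more forgiving than a plain average error.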
And you can see here that basically there'd been almost no progress, and they were stuck around this 40% level that's basically a kind of useless structure. So this was sort of stagnated. And then we came along in 2018, which is the first one we entered after working on AlphaFold for a couple of years with AlphaFold 1, and we made this huge leap. And I think it kind of revolutionized the field. So we were the first-- it was the first time really that cutting edge machine learning had been applied at scale as the core component of a solution to the protein folding problem.
And AlphaFold 1 won the competition. But not only did it win, it won by almost 50% above the next best team. And you can see, so that would have meant that the teams were still tracking along the gray bars if it wasn't for this new advance.
And then of course, that's not where we wanted to stop. We wanted to actually solve the problem. And we had-- in 2020, we actually re-architected the whole system for the 2020 competition to reach atomic accuracy. And that was AlphaFold 2. So that was, again, another even bigger leap above AlphaFold 1.
So these were the results in terms of accuracy. So AlphaFold 2 sort of achieved this atomic accuracy at CASP 14. And we were told by the organizers back in 2016-17 that the magic threshold that they considered to be the threshold that needed to be crossed was less than 1 Angstrom error. That was the kind of critical threshold for it to be competitive with experimental methods, and therefore useful in practice. So that's what we were always aiming for, so less than 1 angstrom, so atomic accuracy, basically the width of a carbon atom.
And AlphaFold 2 crossed that line and actually achieved a 0.96 Angstrom error in the CASP 14 competition for the first time ever. And you can see this was still actually three times more accurate than the next best systems, some commercial ones like from Tencent, but also the best labs that work in this area, Baker Lab, Zhang Lab, and so on, which were still around the 3 Angstrom error in CASP 14, even though they'd already incorporated the AlphaFold 1 advances that obviously we'd published after CASP 13. So this led to John Moult and the organizers declaring the structure prediction problem, protein folding, essentially solved.
And here's some examples of the outputs of AlphaFold. Many of you will have seen this by now, but this was actually incredibly wonderful for us to see this when we got given back the results from CASP 14. And you can see how good our predictions are, the ones in blue, and the ground truth is in green. And they're overlaid over each other, and you can see they're highly overlapping, including on the left hand side, this is a protein from SARS that even the side chain, so these little thin stalks which are the side chains, not on the backbone of the protein, are accurately predicted, too.
And this is the architecture for AlphaFold 2. So there's a whole bunch of innovations that were required here. I haven't got time to go into all of them, but I'll give you the kind of key takeaways. If you're interested in the details there, you can check out our Nature paper that has about 60 pages of supplemental information. So it's maybe just a measure of how complex this system was.
And it was by far the most complicated system that we have ever built at DeepMind. And we built some pretty complicated systems, but this is, I would say, the most complex one ever. There were actually 32 component algorithms that went into AlphaFold, as I mentioned, the 60 pages of SI.
And there was no real silver bullet in the end that solved protein folding. It actually needed a whole host of insights and innovations, and then those innovations had to be carefully integrated together. But the key takeaways were, and the key advances of AlphaFold 2 compared to AlphaFold 1 was that, firstly, it was an end to end system with a kind of iterative recycling stage that actually builds the structure piece by piece and builds that out. And that allowed us to go directly from the amino acid sequence directly to the 3D structure.
With AlphaFold 1, there was an intermediate step, which is that we went from the amino acid sequence to what's called a distogram, so a pairwise distogram of all the residues to all the residues. And then there was a kind of Newton method to optimize that, a numerical method that actually took the longest part of it. It took like one week per protein. And both those two systems had to be optimized separately. And those of you who work on machine learning will know, be familiar with the idea that the more end to end you can make things, generally the better it is. The kind of optimization that you do seems to flow better. And that was definitely the case here when we were able to optimize directly for the 3D structure.
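For readers unfamiliar with the term, a distogram can be sketched as a matrix of binned pairwise distances between residues. The snippet below computes such a matrix from known coordinates; the bin edges and coordinates are assumptions chosen for illustration, and a predictive model like the one described would output a probability over bins for each residue pair rather than a single bin.

```python
import math


def distogram(coords, bin_edges):
    """Return an N x N matrix of distance-bin indices for every residue pair.

    Bin 0 holds distances below the first edge, bin 1 holds distances from the
    first edge up to the second, and so on; the last bin is open-ended.
    """
    n = len(coords)
    bins = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = math.dist(coords[i], coords[j])
            bins[i][j] = sum(d >= edge for edge in bin_edges)
    return bins


# Four residues on a bent backbone (made-up coordinates, in Angstroms).
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (7.6, 3.8, 0.0)]
for row in distogram(coords, bin_edges=(4.0, 8.0, 12.0)):
    print(row)
```

Turning a matrix like this back into 3D coordinates needs a separate optimization step, which is the two-stage structure that the end-to-end system removed.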
We also used an attention-based neural network rather than convolutional neural network. And that was key because we were using a convolutional neural network for AlphaFold 1. And of course, if you think about it, that's the wrong type of bias, right? Like obviously, pixels on an image, if they're nearby, they're pretty correlated in their properties. But actually, in fact, that's not true at all, necessarily, of amino acids that are near each other in the sequence, right? Obviously, it completely depends on how the backbone is going to bend round.
So this attention-based neural network would infer the implicit graph structure. And then finally, we were able to build in-- and this was critical-- evolutionary and physical constraints into the model architecture without impacting the training and the learning, right? So that was pretty difficult to build in these constraints without hurting the learning. But in the end, that turned out to be totally essential.
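The difference in inductive bias can be sketched with a generic dot-product self-attention layer. This is not the AlphaFold 2 architecture, only a minimal unparameterised example with made-up feature vectors: the weight between two residues depends on how similar their features are, not on how far apart they sit in the sequence, whereas a convolution only mixes a fixed local window.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Single-head self-attention with queries = keys = values = features.

    Returns (weights, outputs): weights[i][j] is how strongly residue i attends
    to residue j, and outputs[i] is the weighted mix of all feature vectors.
    """
    dim = len(features[0])
    weights = []
    for query in features:
        scores = [
            sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
            for key in features
        ]
        weights.append(softmax(scores))
    outputs = [
        [sum(w * value[d] for w, value in zip(row, features)) for d in range(dim)]
        for row in weights
    ]
    return weights, outputs


# Residues 0 and 4 are far apart in sequence but have similar (invented) features.
features = [[2.0, 0.0], [0.0, 1.0], [0.0, -1.0], [0.0, 1.0], [2.0, 0.0]]
weights, outputs = self_attention(features)
print([round(w, 3) for w in weights[0]])
```

In the printed row, residue 0 puts more weight on the distant residue 4 than on its immediate neighbour, which is the kind of long-range pairing a backbone bending round on itself would call for.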
So it was a kind of huge research effort, took over five years, at its peak, about 20 people, highly multidisciplinary team. So we have lots of biologists and chemists and physicists on the team, as well as obviously machine learning and engineering. And the other thing to note is if the output of a particular applied system is in itself really valuable like we thought this would be for biologists, then one, maybe you should relax the constraint of being general.
So normally, with all our machine learning research, we try and be as general as possible. But here, we wanted to just solve the problem by any means. So we used all of our general purpose knowledge and general purpose algorithms, but then we actually built in quite a lot of domain knowledge in this case, like the evolutionary physical constraints, because we just wanted to solve the problem here.
In terms of speed and scale, actually, the system turns out to be incredibly fast, especially inference. So it takes about two weeks on 8 TPUs, roughly equivalent to 150 GPUs, to train, so actually a fairly modest amount of compute by today's standards. And the inference is super fast, so order of minutes for an average protein on a single GPU.
So we were thinking about how to give the biology community broad access to the system. And normally what you do with these systems is you put them on a server, and then people submit their amino acid sequences and you give them back a structure, right? But actually, we thought we could do a lot better than that here and just fold everything for everyone immediately.
And we wanted-- so we thought, OK, what should we do? Because it's fast enough, let's start with the human proteome. So that's what we did, and we actually folded this over Christmas, I guess, 2020. So we had this idea straight off the competition, lots of computers are sitting idle over Christmas, so we just set them going on the human proteome and then came back in the new year and it was done, which is something I love about computers. While you have your Christmas lunch, they can be doing useful work for you.
And so we got all the predictions. Actually, we ran this twice to make sure, and we got all the predictions for all of the proteins in the human proteome, which is the equivalent of the human genome, right? So the human genome codes for the genes, which code for the proteins. And there's about 20,000 proteins in the human body expressed by our genome. So we folded them all.
And just so to put that in perspective, experimentally, after 40, 50 years of painstaking experimental work by the entire biology community, we have determined about 17% of the human proteome. So overnight, we doubled that to 36% with very high accuracy. And by very high accuracy, we mean sub-angstrom accuracy, so equivalent to experimental quality.
And then furthermore, 58% we determined at high accuracy. And high accuracy means that the backbone can be trusted, right? Maybe not the side chains, but the backbone, the general overall structure is probably good. And that can still be useful, maybe not for certain types of things, not for drug discovery, for example, but it can still be useful for fundamental research.
And then one interesting thing is which we've had a lot more data on since we've released all this last summer is what about the other 42%? What's going on there?
And actually, we suspected that most of those were not errors or deficiencies in the system, but were actually indicating something different, which is that those proteins were unstructured in isolation. So there's a lot of proteins in the body that actually don't have a structure until they interact with whatever they're supposed to interact with, and they're kind of floppy before that. And actually, when we looked into this, there was a very high correlation between the known areas of intrinsic disorder and where AlphaFold was not good and knew it was not good.
So this, taken together, is the most complete and accurate picture ever of the human proteome. And we knew we wanted biologists to use this, some biologists who maybe would have no interest in how AlphaFold worked or not have the expertise to worry about that, but we wanted them to trust the downstream data and to use it in their work. So we knew that it would be really important to create a very simple confidence measure that biologists could look at and understand really simply without understanding anything about the underlying architectural details.
And this is always something we would like to have in general with our machine learning models, a kind of measure of uncertainty and how uncertain the system is about its own outputs. But in this case, we thought it was critical. And actually, we developed a system that-- we actually made a slight tweak to AlphaFold that also extracted its own confidence metric about how confident it was about each residue position. And greater than 90 basically means that you can rely on it, it's kind of experimental level, and greater than 70 is this high accuracy regime, and lower than 50 may indicate disorder.
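The thresholds just described can be written down as a small lookup. This is a sketch only: the function name and band labels are hypothetical, the 0 to 100 scale is assumed, and the label for the 50 to 70 range is an assumption, since the talk does not name that band.

```python
from collections import Counter


def confidence_band(score: float) -> str:
    """Map a per-residue confidence score (assumed 0-100) to a coarse band."""
    if not 0.0 <= score <= 100.0:
        raise ValueError("confidence score must be between 0 and 100")
    if score > 90:
        return "very high (experimental-level)"
    if score > 70:
        return "high (backbone reliable)"
    if score >= 50:
        return "low"  # assumption: this band is not named in the talk
    return "very low (may indicate disorder)"


def summarise(scores):
    """Count how many residues of a prediction fall into each band."""
    return Counter(confidence_band(s) for s in scores)


# Made-up per-residue scores for a short protein.
print(summarise([96.1, 93.4, 88.0, 72.5, 61.0, 43.2, 38.9]))
```

A per-band summary like this is one simple way to see at a glance whether a predicted structure is mostly trustworthy or contains long stretches that may be disordered.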
So we folded another 20 model organisms, all the most important ones for research, the mouse, the fruit fly, zebrafish, and then also some major diseases, tuberculosis, and then some big plants like rice and wheat. And we're starting to cover more and more of the biology space.
So we also, in the first half of last year, created a database with EMBL-EBI, the European Bioinformatics Institute, based in Cambridge. They already build a lot of the big databases, like UniProt, and we went to them and collaborated with them to build the AlphaFold Protein Structure Database to host all of these structures in 3D and link in to all of the other genetics tools and other tools that biologists use every day, so it just becomes part of their everyday workflow. And we did a big first release with 330,000 total structures for these 20 model organisms plus the human proteome, and we put it out there for free and unrestricted access for any use, commercial or academic, to maximize the scientific impact and benefit to humanity.
We now have more than a million predictions in the database. We've done 440,000 annotated ones from what's called Swiss-Prot. We've also indexed heavily early on neglected tropical diseases, because they're often very understudied. Big pharma doesn't do much research on them because there isn't much money in it, but it affects millions of people around the world, and it's terrible for their quality of life and so on.
So we actually work with the WHO to prioritize-- this is what we're doing in-house-- prioritize these tropical diseases like Chagas disease, leishmaniasis, that affect millions of people around the world and are understudied. So that means that the nonprofits can now start with drug design because they have the protein structures.
And we've seen it already within barely like nine months now, huge amount being done by the community with AlphaFold for modeling very complex biology that was not modelable before. So this on the left here is an image of the nuclear pore complex. This is what allows things in and out of your nucleus of your cell, and it's one of the biggest proteins in the body. It's really huge.
People are also using AlphaFold as a state of the art protein disorder predictor, so many people are now using it to predict disorder. Another project with the WHO is to-- they identified the top 30 pathogens that might cause a future pandemic, and they wanted all the protein structures for those, so we've given that to WHO. And actually, it's helped experimentalists a lot to solve their own protein structures, because sometimes they have low resolution experimental data, and if they can also use AlphaFold's prediction as another source of input, they can combine the two to resolve a very high resolution image of their protein.
And there's actually now already hundreds of other papers and applications that have been using AlphaFold, both within industry and also academia. And the impact has been really big. It got listed very nicely in the Breakthrough of the Year by Science and also Method of the Year from Nature. And the database has been used now by over 350,000 researchers. And we think there are only about 400,000 biologists in the world, so we think that pretty much everyone has looked at the database and is using the database. And so it's super gratifying for us to see that.
So now, where are we going next? There's actually over 100 million proteins or actually nearly 200 million proteins known to science. There's all these expeditions that go to the Indian Ocean or somewhere like that and they dip a bucket into the water and you pull it out, and if you sequence what you get you actually get whole loads of new proteins and you stick them in the protein databases, but no one knows the structures of any of these things, right?
So now potentially, we could be in a regime for the first time ever where proteomics can keep up with genomics, right? So you can have protein structures as abundant as what we know about genetic material. So that's a really interesting thing, what one might then do at a meta level, looking at this enormous data set, and I'm very excited to see what protein biologists end up doing with that. Yeah, so then function would be the next thing to look at.
So thanks. I wanted to say thanks to everybody, the amazing teams who made AlphaFold possible. There's the methods team, everybody listed on the left there, with John Jumper leading the technical part of that team, and, on the human proteome side, Kathryn leading that project.
And I also want to thank, obviously, our wider colleagues at DeepMind. Like, almost everybody at DeepMind in some way contributed to this, and also Alphabet, the collaborators at EMBL-EBI, the CASP community, and also the PDB, of course, and the experimental biology community who created those first 150,000 structures that then we were able to learn from.
And I just wanted to mention about the augmentation part. So it turns out one of the things we had to do with AlphaFold was do self-distillation, we're calling it self-distillation, where we actually train on 150,000. We output some predictions of some other proteins. And then we did a filter on the high quality ones, or at least the ones AlphaFold thought were higher quality, and we put that back in the training set.
And actually, that helped a lot. And we needed about 250,000 to 300,000 synthetic, in this case, predictions in the training set. It was very helpful and quite important to get the last few percentages out of AlphaFold.
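The self-distillation loop described here can be sketched schematically. Everything in the snippet is a toy stand-in: the function names are hypothetical, the "model" is a one-dimensional nearest-neighbour predictor with an invented confidence score, and the threshold is arbitrary. Only the overall shape (train, predict on unlabelled inputs, keep the confident predictions, retrain on the enlarged set) follows the description above.

```python
def train_nearest_neighbour(examples):
    """Toy 'training': return a model that copies the label of the nearest example."""
    examples = list(examples)

    def model(x):
        nearest_x, label = min(examples, key=lambda ex: abs(ex[0] - x))
        confidence = 1.0 / (1.0 + abs(nearest_x - x))  # invented confidence score
        return label, confidence

    return model


def self_distill(train_fn, labelled, unlabelled, min_confidence):
    """One round of self-distillation.

    1. Train on the labelled data.
    2. Predict on the unlabelled inputs.
    3. Keep only predictions the model itself is confident about.
    4. Retrain on the labelled data plus those synthetic examples.
    """
    model = train_fn(labelled)
    synthetic = []
    for x in unlabelled:
        prediction, confidence = model(x)
        if confidence >= min_confidence:
            synthetic.append((x, prediction))
    return train_fn(labelled + synthetic), synthetic


labelled = [(0.0, "A"), (10.0, "B")]
unlabelled = [1.0, 5.0, 9.5]
new_model, synthetic = self_distill(train_nearest_neighbour, labelled, unlabelled, 0.5)
print(synthetic)  # the ambiguous input at 5.0 is filtered out
```

The filter matters because low-confidence predictions fed back into training would reinforce the model's own mistakes; only the outputs it rates highly are recycled.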
So I'm just going to end, then, by talking a little bit, a couple of slides on the future. So now we're working on many other related projects, protein complexes, disordered proteins, and point mutations, mostly dynamics of proteins you can think of, and also protein design. So there's a whole area, I think, that has been unlocked with AlphaFold, and we're going to push further on. And others are pushing further on all these topics.
And there's just an observation on biology, really, which is that the more I worked on this and spent the last five years looking at the exquisite biological system that goes on in our bodies, it's kind of amazing to think that it can even work given its complexity. And I think that one way to understand biology is at its fundamental level, it's an information processing system. I really believe that's the kind of fundamental way to look at it.
But of course, life is a very complex process and it's an emergent process. And I think because of that, it's pretty difficult to describe mathematically in a kind of clean way. Like, if you think about trying to describe a whole cell, is there ever going to be a Newton's laws of motion for a cell? I don't think so. I think it's too complicated and it's too emergent.
So in a way, it could be the perfect regime for AI to operate in. And I think as maths is to physics, maths is like the perfect description language for physics, I think AI could be the perfect description language for biology. And I hope that AlphaFold will be not just useful in itself, but also kind of proof of concept of maybe a heralding of a new dawn of what I call and others call digital biology.
And the dream would one day be maybe to create a virtual cell that one could make useful predictions in, in silico, right? Obviously, we're very far from that, but AlphaFold is maybe the first tentative step. And then, of course, I haven't got time to talk about all of these things, but we've had a really amazing year of applying these types of AI techniques to quantum chemistry, pure mathematics proving some pretty major conjectures, fusion, controlling the plasma in a fusion m and genomics.
And if it sounds a little bit science fiction, this list, these are all real papers. You can go and have a look at them. And this was always the flourishing that I had imagined when we started DeepMind, that if we crack this in the right way, this kind of slide should be possible, even though it seems like perhaps science fiction.
So I'll just end then by saying in my view, AI used properly could be the ultimate general purpose tool to help scientists see further, much like the Hubble telescope was for cosmologists to see further into the universe. Thank you very much.
[APPLAUSE]