Machine Learning Accelerating Scientific Discovery
August 20, 2019
August 16, 2019
All Captioned Videos Brains, Minds and Machines Summer Course 2019
Phil Nelson, Google Research
GABRIEL KREIMAN: Welcome again, everyone, to a very special evening lecture by Philip Nelson from Google, Google Accelerated Science. It's a great pleasure to introduce him to our evening lecture series. He gave several great talks at MIT and in our summer course already.
In our ultraspecialized world, we have scientists who focus on subunit alpha of a particular protein in layer 4 of pyramidal cells in mice, only on Tuesday mornings. In stark contrast to that, Philip Nelson is a true Renaissance man. He started-- I think he started sometime at MIT. He worked on hip prosthetic devices, working with people at the Harvard Medical School. Then he went on to shed light on problems in optimization, on genome sequencing. And I don't know how many other amazing things. There are very few people in this world that are so amazing that they can touch upon a wide diversity of topics.
This breadth does not come at the expense of depth. At the same time, he's extremely deep, as you will see from the discussions today. I don't want to foreshadow a lot of the amazing things that he will talk about. But one of the surprises to many people in the last 70 years is how one can apply these ideas from deep learning to an enormous range of problems, from discovering new planets, to trying to predict heart disease from pictures of the eye, and on all sorts of other things.
So it's really an explosion of different ideas that has been championed by Philip Nelson and his team. And I'd like to encourage people again to ask questions and to interact with him, both now, as well as during the reception that we'll have afterwards in the Swope Terrace right after the talk. So without further ado, thanks again for coming and I look forward to your talk.
PHILIP NELSON: Thank you for your too kind introduction. So as an MIT undergrad, it is such a thrill to be here I got to tell you. So this talk it's-- I now realize it's Friday at 8:00 o'clock. This might be sort of like the warm down lap, the cool down lap after a tough week. I'm going to try to stay like very sort of practical and very big picture about what we're doing.
I'm going to reference a lot of links, a lot of work. So if you go to that link up there, there's a document with everything referenced. So don't worry about taking notes. All the links, and the articles, and much, much deeper info is there.
So what I'm trying to do tonight is I'm going to talk a little bit about my team, and what we do, and the ridiculously fun mission we have. How many people, just by hands, are very familiar with deep learning? OK, so about 2/3.
So I'm going to do sort of a fairly quick overview. I think it's sort of interesting to, even when you know something, to hear what people choose to talk about and what they don't talk about. So maybe my editing of this talk will give you some information.
I'm going to just go through some quick consumer examples and tell you why I think they're interesting. I'm going to do a slightly deeper dive on some work that we've done in medical imaging microscopy. And then, based on time, I'm going to give you a survey of a variety of other science projects. And kind of what I'm hoping to do-- I'm hoping what you get out of this is a lot of the lessons learned along the way. And in theory this stuff is amazing. In practice, it's often harder. And hopefully we'll go through that.
So that's kind of what I'm going to cover. And I'll try to stick within time. I know tours that go late. But we'll see.
So a bit about my team. My job is just ridiculously fun. So we have access to all of Google's technologies. And our job is to go out and solve hard science problems. And so we started as a very small group, almost like just an experiment. And we've had a couple of really interesting breakthroughs. So now they give me more resources, which is fun. And people-- it's like with such a wide open purview, you know, the selection criteria become difficult.
And essentially the key to understanding what we're doing is impact. We want to change the world. We don't just want to write papers, though writing papers is critical for getting there. We want to actually change the way things are done. And hopefully, you'll see that in some of our projects.
So we work with a lot of people. We actually-- I don't know if any of you have dealt with Google. But it's sometimes hard to do business with big companies. And I've sort of carved out this island. So it's fairly easy for us to do deals and all that. And again, that's one of the practical things you have to think about as you sort of start building out this technology.
Google is very focused on education, especially for the students in the room, around internships and AI residency. And I strongly, strongly encourage you to apply, to come look at-- this info is all in the document I linked. But for 35 or so scientists, I think we had a dozen interns this summer. And I think we'll get even more next summer. So if you're interested in applying machine learning in a sort of scientific or health care context, please reach out to us. That's my last sales pitch for the night.
So this is called Amara's law. Bill Gates repeated it. I think it's actually very telling. So we're in this hype cycle. And there's pictures you see of the hype cycle. Everyone's overexcited. And then there's this trough of disappointment. And then sort of things pick up again.
And we've been through this before. But if you look back, the '80s were about the PC. And the '90s were about the internet. And the 2000s were about mobile. AI, machine learning, is the technology of this decade.
And this is just Pope Benedict in 2005 and Pope Francis in 2013. It's a different world. You can argue about good and bad. But we definitely live in a different world. And AI is going to usher in a very different world, too.
This is another quote. It turns out -- I put this up. And someone from Caltech said, there's a terrible story about this person. You should not mention him by name. He was bad-- bad story.
But anyway, the quote is accurately-- people give Yogi Berra credit for this quote. But it's actually him. But essentially there's no difference-- in theory, there's no difference between theory and practice. But in practice, there is. And a lot of this talk I'm going to sort of weave in the practical lessons that we learned.
So why now? What happened? And again, apologies if this is sort of review.
So the traditional way of writing code was you wrote rules. You wrote lots and lots of rules. And Doug Lenat and the Cyc project was writing lots and lots of rules. And eventually these systems just fall over. It's very hard to write the rules. The rules start counteracting each other.
And so machine learning has been around for quite a while. And the basic idea is more like, can you somehow turn your data into a vector? And then, at least for supervised learning-- take credit card fraud-- when credit card fraud happens, you don't find out for three months.
The person has to get the bill. They have to complain. They have to adjudicate it. You'd like to predict the fraud right when the transaction is happening. So you get labels on what was fraudulent in the past. You try to learn formulas on the features of the data.
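To make the idea of learning formulas on the features concrete, here is a minimal sketch. The two features (a scaled transaction amount and a foreign-merchant flag), the toy data, and the choice of logistic regression are all assumptions for illustration, not the fraud system the talk refers to.

```python
import math

# Hypothetical toy data: each transaction is a feature vector
# [scaled_amount, is_foreign_merchant]; label 1 means it was later found fraudulent.
transactions = [
    ([0.1, 0.0], 0), ([0.2, 0.0], 0), ([0.3, 1.0], 0), ([0.2, 1.0], 0),
    ([0.9, 1.0], 1), ([0.8, 1.0], 1), ([0.95, 0.0], 1), ([0.85, 1.0], 1),
]

def predict(weights, bias, features):
    """Logistic model: estimated probability that a transaction is fraudulent."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def train(data, epochs=2000, learning_rate=0.5):
    """Fit the weights with stochastic gradient descent on the log loss."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for features, label in data:
            error = predict(weights, bias, features) - label
            weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias

weights, bias = train(transactions)
new_small = predict(weights, bias, [0.15, 0.0])  # small domestic purchase
new_large = predict(weights, bias, [0.9, 1.0])   # large foreign purchase
```

The labels come from the past, but the learned formula can score a new transaction at the moment it happens, which is the point of the fraud example.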
And it worked reasonably well. And year after year, there were thousands of papers. There were PhDs given year after year about slightly incremental improvements. You know, it's one thing to maybe do features on a credit card. It's actually not that easy. But how do you do features on an image?
So there was histogram of gradients. And there was all of these innovations, support vector machines, really, really fascinating work. And we were making pretty good incremental progress.
But it turned out, the classifiers at the end, learning the formulas-- you just try them all. There's a lot of different techniques. Just try them all-- and whichever one works.
The challenge in general was turning your data into features. How do you describe the data in a way that the computers can work on it? And like this is just a great example. I mean, how would you do features for this? Yeah, you can start. But you're not going to get very far.
So deep learning-- neural nets have been around for a while. And Minsky, from MIT, famously wrote about the mathematical limits of what the perceptron architecture could learn. But a few people sort of kept at it, Yann LeCun, Geoff Hinton. A bunch of people sort of kept at it.
And about 10 years ago, there was kind of a set of breakthroughs in deep learning. And part of it was in the network architecture. A lot of it was in the mathematics of training. And quite frankly, a lot of it was just in the scale.
The networks don't learn for a while-- for quite a long time. And then they start learning. And they learn in many cases amazingly well. And so we just needed the new generation of hardware.
And two things that are going to come up again and again-- and I'll keep pointing to about some of the powerful things that deep learning can do-- is one is it can learn the features directly from the raw data. So even if you think you knew what the features are, if you give deep learning enough examples, it has often surprised us-- and I'll show you a bunch of cases of this-- with the features that it found.
The other thing is that it's able to make end-to-end predictions. So there's a blog post out from Google about a system called Parrotron, which basically goes from speech directly to speech in a different language or in a different voice. And you can imagine doing automatic speech recognition, you go speech to text. Maybe you can translate the text and then reconstruct the speech. But you've lost tone of voice.
And there isn't really a good ontology or a good representation for tone of voice. It gets lost in transit. But if you could go directly from one to the other-- these networks seem to be able to model all of these intermediate states. And again, both of these things will come up again and again.
So here's a quick visualization. That grid on the lower right is the actual-- what you're looking at is for a layer in the network, what input stimulus will maximally light up a node. And these things were learned. The network started from randomness. In this case, this network was trained to think on faces, to recognize faces.
And those are not lines that an engineer would draw. But what's interesting is people have taken retinas out, and bathed them in oxygenated fluid, and sort of electrically measured them, and put pictures in front of the retinas. And this is what they believe the neurons behind the photoreceptors are actually doing, that they're detecting lines. And if you have a baby, and you put the sharp black and white contrast things in their crib, that you're told to do, it comes from the research that did this.
So again, the network just learned that by itself. In higher layers, it learns corners, and edges, and eyeballs. And it learns all these features on its own.
So just a bit about why? You know, flight had to wait. I think da Vinci knew how to fly. But until we had the internal combustion engine, we couldn't really fly.
And the hardware has mattered too. It started with CPUs. Then GPUs came out. And now there's this whole-- it's not just Google-- but there's a whole bunch of-- it's a new generation of hardware around deep learning.
And what's interesting about it is if you're just sort of training the networks, the key to this hardware is that you don't need high precision. If you're just sort of walking a gradient, and you're saying make it bigger or smaller, you can use 16-bit floats. And even for the inference, you don't need to be that accurate.
So that this hardware is ridiculously, blazingly fast on very low-precision arithmetic calculations. And it does lots and lots of them, which is what you need for training. And I love hardware. So I put these pictures up, water cooled.
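As a rough illustration of why walking a gradient tolerates low precision, the sketch below runs the same gradient descent twice: once in ordinary double precision and once with every intermediate value rounded to a 16-bit float. The quadratic objective, learning rate, and step count are arbitrary assumptions. The rounding uses the half-precision `'e'` format of Python's `struct` module, not any accelerator's actual arithmetic.

```python
import struct

def to_float16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack('e', struct.pack('e', x))[0]

def descend(round_fn, steps=200, learning_rate=0.1):
    """Minimize f(w) = (w - 3)^2, rounding the gradient and the weight each step."""
    w = round_fn(0.0)
    for _ in range(steps):
        grad = round_fn(2.0 * (w - 3.0))
        w = round_fn(w - learning_rate * grad)
    return w

w_full = descend(lambda x: x)   # no rounding: ordinary double precision
w_half = descend(to_float16)    # every value squeezed into 16 bits
```

Both runs end close to the minimum at 3. The 16-bit run stalls a few rounding steps away from it, which is usually acceptable when all you need is the direction to move.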
And Google is now racking out what would have been the biggest supercomputer a few years ago, like every week, just for training. And my team uses-- you know, a few years ago, before the TPUs came out, we were using on the order of billions of CPU hours. And now it's like-- it's not even worth counting. And we actually get yelled at because we're not using enough cycles. And we waste a lot of cycles. So that's one of the fun things about being at Google.
And this hardware is coming out on the phones too. So there's a whole new generation of chips. Not so much for the training, but, yeah, at least now for inference. But there's going to be a lot of training that you can see on the devices too. And this is sort of one of the secrets to how it's going to be more private, is that you'll be able to do incremental training without ever shipping the data back.
So now you can do inference. All the new phones have inference engines. But there'll be training engines on the phone too, sort of a federated learning scheme. There's a lot of work going into that.
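One way to picture training without shipping the data back is federated averaging, sketched below on a one-parameter model. The device datasets, the learning rate, and the plain unweighted average are assumptions made for illustration. Production systems add much more, such as weighting by dataset size and secure aggregation.

```python
# Hypothetical private datasets, one per device; every point follows y = 2 * x.
device_data = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(0.5, 1.0), (1.5, 3.0)],
    [(3.0, 6.0)],
]

def local_update(weight, data, learning_rate=0.05, steps=5):
    """Run a few gradient steps on one device's own data and return the new weight."""
    for _ in range(steps):
        grad = sum(2.0 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= learning_rate * grad
    return weight

def federated_round(global_weight, all_device_data):
    """Each device trains locally; the server only receives and averages weights."""
    local_weights = [local_update(global_weight, data) for data in all_device_data]
    return sum(local_weights) / len(local_weights)

weight = 0.0
for _ in range(50):
    weight = federated_round(weight, device_data)
```

The raw `(x, y)` pairs never leave `local_update`. Only the updated weight travels to the server, yet the shared model still recovers the slope of 2.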
So let's go through some quick consumer examples. I travel a lot. And this app is amazing. If you haven't used Google Translate, please download it and try it. This is a pretty amazing version, especially if you don't know how to type the language that you're in.
There's actually four different predictions going on here. One is, which pixels in the image have letters? The second is to turn the pictures of letters into actual letters. The third is to translate the letters. And then the fourth is to pick a nice font for when you replace it.
And that's pretty incredible. I mean, I go to Japan. And I can go-- I leave the Western hotels. And I go where the locals are eating. And I can pretty much get by. And every once in a while you get custard instead of tea, but it works pretty well actually.
But the important point on this one also is that it might take a fleet of computers to train the models. But you can get them running locally. So this is critical for medical devices, for smart devices in the home, all sorts of things.
I'm waiting for the device to come out. My mother is older. And I would love it if there was a device that could detect when she was in trouble. But there's no way that she's going to stream the video and audio from her living room up to the web. But imagine if there was a local model that could really very intelligently detect that. So I don't think that's very far away.
So this is funny to us, the smart reply. So in 2009-- you know, Google has this tradition of April Fools jokes. So in 2009, the April Fool's joke was let's do something that writes the email.
So we shipped this in 2016. And it very quickly became a significant double digit percentage of all email sent. And again, this is not picking off a list. This is like if you use it for a while, it starts speaking in your voice. It's kind of creepy actually. It works really well.
And then, two years later, we now do it on every keystroke. So you can imagine-- for the engineers in the audience, you can imagine how complicated these systems are. We're literally doing it on every keystroke now in Gmail. And it works really well. So it's not just that it's amazing. It's moving at an amazing pace.
Yeah, it's great.
So Go is another one that I think is really interesting. You probably have heard about the big contest. And essentially the way this network worked-- you know let me contrast this with IBM and Deep Blue. So that was an absolutely amazing technical achievement, I don't know, 25 years ago, or how long it was. But it was almost the epitome of the old style of programming.
They memorize the openings. They memorize the closes. They wrote a lot of rules about chess, about points. And then they sort of played through the games.
And when they beat Kasparov, we didn't learn more about chess. We just learned that computers were finally able to tough out a human. And Kasparov was pretty pissed off, too.
The Go thing was completely different. Go is a very simple game. You have white and black stones. You just put them on. If you surround territory, you own that territory. And whoever owns the most territory at the end wins.
Incredibly simple game, but very, very complicated strategy. And people had expected that it would be another 10 or 15 years. The number of games you can play is hugely-- many, many, many more orders of magnitude than chess. And the strategy-- like you could have a configuration, which is really powerful, but can be blocked with one stone. Or you can be weak on one part of the board and really strong in another part of the board. And how do you write the rules that combine that?
And the amazing work that the team-- our London team did-- is they basically built two networks. One could look at the board and predict where the next move would likely be. And then the second network, which still almost feels magic to me, is it could look at a board with all its complexity and just predict a number between 0 and 1, who's advantaged, white or black? And it integrated all these complex strategies, all this-- like where you are, how you are on the board-- absolutely amazing.
And when we played the South Korean champion, there were two games that were really shocking. One is where he beat us. And another where-- they called it the God move. It was the most amazing, unexpected move. So behind the scenes, both of those were like 1 in 10,000 moves on the next likely move probability graph. It was just very surprising.
But a few years-- less than a year later, when we played the absolute world champion, a young Chinese gentleman, it was never even close, like we beat him in every game. And what was really interesting is that-- in interviews afterwards, of course he was upset for having lost. But this person he's really a remarkable person. His whole thing was about understanding the game of Go better.
And what was kind of magical about AlphaGo was that it discovered completely new ways of playing, that no human in 3,000 years of playing this game had ever seen before. So there was a couple of things, like-- the computer was doing what people were calling slacker moves. And it was doing these safe moves that a human never would have done.
And it turns out that the computer is just as happy to win by one as it is to win by many. And it caused people to rethink, hey, maybe we're playing this game overly aggressively. And these slacker moves are-- I think the computer showed they are the right thing to do.
There was this other, like fifth line attack that nobody had seen or understood before. And one of our hopes in our team of applying deep learning in a scientific context is, can we do the same things? Can we see things that have never been seen before, and at least bring them to the attention of the scientists to understand? Because at the end of the day, these things are just correlation engines.
They're statistical correlation engines. And correlation is not causation. But correlation is highly correlated with causation.
So let the machine dig up the correlations. And then some of them might actually be causal. Unfortunately, I'll show you a bunch of cases where the correlations are totally wrong. And you have to be extremely careful when you're using that.
So one less consumer example is-- again, this sort of partially reflects on the rate of change. But less than two years later, the team in London came out with AlphaGo Zero. So AlphaGo was bootstrapped on human games. And then it just started playing itself. And they leapfrogged versions to get better and better.
With AlphaGo Zero, they just explained the rules of the game. And the machine started playing itself. And in a very short period of time, relatively speaking, it became way better than the one that was sort of-- I don't know, maybe poisoned by human games. It was absolutely incredible. And in the few hours of training that they had left, they explained it-- they taught it the rules of chess. And it very quickly became the best chess player.
Now, realize for 25 years, the best chess player has been the product of large numbers of programmers writing chess programs, and checking in some small optimization. And somebody will figure out, oh, this is how to do it better, like the collective work of thousands of brilliant people. And in a few hours, this thing just blew past it. So if you think about how a lot of things are going to be changed by machine learning, the job of being a programmer is going to change quite significantly also. And I'll show you an example of that in genomics.
So we're in interesting times. Oh, wait. Before I go to scientific examples, this is another good one. So how many people have heard of DeepDream? Yeah, OK.
So what's happening here is we have an image. And you run it through a recognizer. And one of the things you can do is you can start changing these images. So you start modifying the image. And what was done here is to say, let's pick a level.
This was a fairly low level in the network. I don't remember exactly. In the sort of 23-level-deep Inception network, this may have been level 5, or 7, or something like that. And modify the pixels in the image to better light up those nodes.
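Modifying the pixels to better light up a node is gradient ascent on the input rather than on the weights. The sketch below shows that on a single made-up node with four "pixels". The weights, the tanh activation, and the step size are assumptions; the real DeepDream works on a chosen layer of a full Inception network.

```python
import math

# Hypothetical fixed weights of one "node"; a trained network has millions.
node_weights = [0.5, -1.0, 0.25, 2.0]

def activation(pixels):
    """How strongly the node lights up for a given input."""
    return math.tanh(sum(w * p for w, p in zip(node_weights, pixels)))

def dream_step(pixels, step_size=0.1):
    """Nudge every pixel in the direction that increases the node's activation."""
    pre = sum(w * p for w, p in zip(node_weights, pixels))
    slope = 1.0 - math.tanh(pre) ** 2  # derivative of tanh at the current input
    return [p + step_size * slope * w for p, w in zip(pixels, node_weights)]

image = [0.1, 0.2, 0.0, -0.1]  # starting "image"
before = activation(image)
for _ in range(20):
    image = dream_step(image)
after = activation(image)
```

The network's weights never change here. Only the image drifts toward whatever the node responds to, which is why the results visualize what the network has learned.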
So you start to see-- there is a funny story-- they called it DeepDream-- about what is the computer dreaming about. And here's another great example. So you see these beautiful landscapes. And you see the waterfall back there.
So when you start dreaming, it was just like, OK, it thinks it sees a mouse. So let's make it more mouse-like. And this waterfall becomes a bird. You can kind of see that, and the dogs, and the faces, and the pagodas.
So it's like-- it's kind of cool. There's actually an art exhibit about that. And you begin to visualize what the network is doing.
But then this happened. So when the system thought it saw a barbell, it started materializing barbells according to what-- you know, changing the pixels in the image, so the computer thought it was more like a barbell. And the barbells had arms attached to them. Which makes perfect sense, because in the images that it was trained on, there were a lot of arms attached to the barbells. So the computer-- well, arms and barbells were brought together.
But it's a really important point. Because it's very easy to look at these networks and say, wow, they're so smart. They must be doing what I'm doing. And again, they're not. It's just statistical correlation. There's no higher reasoning. I mean, people are working on that.
But a person would never make this mistake, right. There's metal. And there's flesh. And there's-- they just don't go together. But the barbells had arms.
So you have to be really careful. You get amazing results. But don't anthropomorphize what these machines are doing. Like check it, investigate it constantly.
So then GANs came out. So GANs are really cool. If you think about machine learning, when you're training the model, you're like, well, there's a cat in this image. So I want you to say 1.0 for a cat. And it says 0.5. And you tweak the weights to get it closer to 1.0. But your loss function is pretty straightforward. Is it a cat or not?
But Ian Goodfellow sort of thought this through. And said, our loss function doesn't have to be just the simple 1.0, or does it match the label. Why not train a network as the loss function. For example, train a network on saying, does that photograph look realistic or not? Does this face look like a celebrity's face or not?
And then essentially you now have these generative models, somewhat like with DeepDream, where you can start generating images and training images. And you essentially train a forgery detector. And then you have a forger. And you lock the two in the room together until the forger can fool the forgery detector.
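The forger and forgery detector setup can be written down as two opposing loss functions. The sketch below uses the commonly taught form, with the "non-saturating" generator loss. The probability scores are made-up values chosen to show the two ends of training, not outputs of a real model.

```python
import math

def discriminator_loss(score_real, score_fake):
    """Low when the detector says 'real' for real images and 'fake' for forgeries."""
    return -(math.log(score_real) + math.log(1.0 - score_fake))

def generator_loss(score_fake):
    """Low when the detector is fooled into scoring a forgery as real."""
    return -math.log(score_fake)

# Scores are the detector's probability that an image is real, strictly in (0, 1).
early = (discriminator_loss(0.9, 0.1), generator_loss(0.1))  # forger is easily caught
late = (discriminator_loss(0.5, 0.5), generator_loss(0.5))   # detector can only guess
```

Early on the detector's loss is small and the forger's is large. Training pushes toward the second case, where the detector can do no better than a coin flip.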
And this is really wild stuff. I don't know if you've seen it. And I've got a bunch of links in that document about-- there's a video floating around, where you can basically make Obama say anything because you can synthesize people's voice now with these generative models. You can synthesize these images.
So in the world of fake news, this is quite interesting. And I was talking to my son about this. He's like, dad we survived Photoshop. So we'll survive this. But this is a big deal. This is really interesting.
But there's a lot of use for this-- like this was The New York Times case about synthetic celebrities. There was an article just today that-- fully synthetic actors. If you do design in video games, a lot of video game landscapes and designs are going to be built by GANs now.
But there's really cool stuff, too. So there's CycleGAN. So one of the things you can do is say this looks like a horse. This looks like a zebra. And then modify the images to basically turn the horses into zebras and back again.
And we actually use this in a medical context because-- like, for example, we discovered-- I'll get into this in a bit. But we discovered that doctors were terribly misdiagnosing this disease called diabetic macular edema. And it's sort of the swelling of the macula. And we were able to get sort of like really good ground truth data for this.
And then we wanted to teach the doctors what they were doing wrong. So we were able to basically train a GAN to turn on different levels of DME, macular edema. We could take an image and give it macular edema, or take one that had it and turn it off, so that we could sort of teach the humans how to do it better.
So there's interesting uses for these technologies. But it's crazy. I don't know if you've seen this. But you can turn photographs into things that look like a painter. They've taken original paintings and turned them into photographs. It's kind of like Monet's beautiful landscapes turned into photographs. It's upsetting for some reason.
So let me switch gears now. Is this about the right pace? Is this good? Yeah, OK.
So let me go into medical imaging a bit. So I'm going to quickly touch on a few of these. I'll try to go quickly.
So roughly 10% of the world is going to get diabetes. It's a bigger-- it's a growing problem in the developing world because of the improving diets. I guess you can have worse problems than that. But diabetes is bad.
And when you get diabetes, it can damage your blood vessels. It's why people need amputation. For about a third of diabetics, they get a disease called diabetic retinopathy, which is a breakdown of the blood vessels in your eye. And for about a third of them it's vision threatening. So it's sort of 10%, a third, a third. It's about a 1% prevalence disease. But it is the fastest growing cause of blindness in the world.
And diabetic retinopathy is just evaluated on this five-point scale, from none to proliferative. It's not symptomatic until it's proliferative. And if you could see these images, like the orange spots that you're seeing, what's happening there is the plasma is leaking out of the blood vessel and then drying. And they call that an exudate. And the dark spots you're seeing, those are actually hemorrhages where the blood vessel broke and it clotted.
And there's just not nearly enough doctors. I mean, nobody should go blind from this. And way too many people go blind from this.
And they've been setting up clinics in India, in Thailand, and Indonesia, all over. And then trying to train people to read those images. And it's actually quite challenging to read those images. So DR is a wonderful starter problem for machine learning because you don't actually need the full patient record. If you get the image of the eye, you could just diagnose it from there.
So we did this in a sort of typically Google fashion. I traveled the world and got lots and lots of images. We hired 50 plus doctors to grade them. And what was interesting is we tried lots of different machine learning models. And Inception, the same thing that had won the ImageNet contest before, was more than good enough to do this. So there was no magic in the ML. It was completely a data gathering and data labeling exercise.
So we did this. We hired all of these doctors. We labeled all these images. And we didn't want to publish this in Nature. We didn't just want this to be a technical project to show, look, we could do this. We wanted to start changing the practice of medicine.
So it was a very, very long negotiation to get this paper published in JAMA. We actually had to do a clinical trial in India to prove it out. And this became sort of a landmark paper, that sort of opened the floodgates of the FDA now changing sort of the approval process for some of these things. And we essentially were working as well as a board-certified US ophthalmologist.
And if you're going to take one thing away from this slide-- like this talk-- this is actually a critical slide. So what you're looking at here is the columns are the best of the ophthalmologists we hired. The rows were patient images that were selected because they were challenging. And the color was the diagnosis.
So the bottom two are clearly sick. Every doctor agreed that they were sick. And the top one was probably healthy. But this rainbow of diagnoses, the two in black got every single diagnosis. This is how medicine is practiced today.
And I showed this slide to engineers. And they were all like to a person, like we have to do better. This is crazy. And you show it to a bunch of doctors. And they're like, well, what do you expect? We're human. It's not easy to do this.
And the intragrader consistency-- we would give the same image to the same doctor a week later. This is only a five-point scale. It was essentially only two times in three that we would get the same answer from the same doctor on the same image, within a week. Across doctors, it was 60%, which actually was not as bad as we had thought it would be.
I showed these to some of the top pathologists at one of the top medical centers in the US. And they're like, we never get 65%-- like 60%, we wouldn't. We get 30%.
And my jaw hit the floor. And it's like, well, it's not really that bad. Pathology grading is much more technical. And there are way fewer therapies than there are grades. So in terms of therapy concordance, it's more like 85%. So what that means is that you have a 1 in 6 chance of being treated differently if only a single pathologist looks at your image than if, say, a group of pathologists look at it. So the one takeaway is get a second opinion in pathology.
And I've given this talk before. And I have had very prestigious people and well-known scientists come up to me and say, my wife's best friend had a biopsy. They did a double mastectomy. They always do.
We do the pathology after they remove the tissue. And she never had breast cancer. So she'll win a big lawsuit. But I think she would rather be whole again. So this is what's going on today. So we have to fix this.
So the thing about diabetic retinopathy is it's kind of a slow moving disease. If you don't catch it now, you can catch it next year. The current estimates on breast cancer diagnosis-- so the reason why we're making good progress in cancer is because we're able to identify different subtypes of patients. And say, look, you have HER2 positive breast cancer. It's very different than ER positive or PR positive. And there's therapy specifically for that kind of cancer.
So for HER2 positive-- and I'm very sensitive to this because my mom had HER2 positive breast cancer-- there is this amazing drug called Herceptin from Genentech. And it's a miracle drug. It knocked her completely into remission.
The way they determine whether a woman is HER2 positive is they do an antibody stain. And it fluoresces. And the pathologist reads it. The current estimates are about 10% of those patients, it's misread. So either the woman is getting Herceptin, which has side effects. And it's very expensive. And it's not going to help her.
Or even more tragically, here is this miracle drug, on the shelf, that took decades of work and billions of dollars in treasure. And she's not getting it. She's dying because-- a misread. But you know, it's not the pathologist's fault. This is really, really hard stuff to do. So anyway, we need to fix this. We absolutely need to fix this.
So what we realized going into this-- again, I told you how it's a data problem. So far from 2016-- we'll come back to this-- from 2016 to 2018, when we have the newest version, we basically went from being as good as a general ophthalmologist, to being as good as a panel of retinal specialists. Our diagnostic now is essentially the best diagnostic in the world. The only difference was that we curated the test data.
So it turns out-- you know, I show you that rainbow chart. We gathered about seven diagnoses for each patient. So afterwards, we're able to do a sensitivity analysis, to say for the training data how many of those labels were needed. And it turns out that two, maybe three labels, and a consensus between them, was more than enough.
So these machines are remarkably robust against dirty training data. But for the test data, 7 wasn't enough. So we actually now-- this is what this picture is-- we used to convene adjudication panels, where we would basically bring retinal specialists in for a week, to sit there arguing over each image. Then we built a whole tool that they could sort of do this remotely.
And by just curating the test set, we got from, again, as good as a regular doctor, to as good as a panel of specialists. So as you're going out there and doing these machine learning projects, do not be afraid of dirty training data. But exquisitely curate your test set as much as you can.
And the reason for this-- even though you never train on the test set, you can build millions of models. This is like which is the best model? And the performance on the test set tells you what the best model is. So really, really concentrate on your test data.
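To make the point about test data concrete, here is a minimal sketch in Python with made-up labels (none of these numbers come from the retinopathy work). Two hypothetical models are scored against one grader's noisy test labels and against adjudicated labels; the names `adjudicated`, `single_grader`, `model_a`, and `model_b` are illustrative assumptions.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the reference labels."""
    matches = sum(p == y for p, y in zip(predictions, labels))
    return matches / len(labels)


def best_model(models, labels):
    """Name of the model with the highest accuracy on the given labels."""
    return max(models, key=lambda name: accuracy(models[name], labels))


# Hypothetical binary labels for eight test images.
adjudicated = [1, 0, 1, 1, 0, 0, 1, 0]    # panel consensus
single_grader = [1, 0, 0, 1, 0, 1, 1, 0]  # same images, two misreads

# Hypothetical predictions from two candidate models.
models = {
    "model_a": [1, 0, 1, 1, 0, 0, 1, 1],
    "model_b": [1, 0, 0, 1, 0, 1, 0, 0],
}

print(best_model(models, single_grader))  # the noisy test set prefers model_b
print(best_model(models, adjudicated))    # the curated test set prefers model_a
```

In this toy case the model ranking flips depending on which test labels are used, which is the sense in which the test set picks the winner among many candidate models.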
So then we said, OK, so we had-- we're able to predict diabetic retinopathy. It was great. You probably know Google has this history of sort of 20% projects. And there was a young woman, she had recently graduated. And she really wanted to work on this project. But she didn't really know ML very well. And she was in some other part of Google.
And we said, well, why don't you take these retinal images. And we know male or female, or at least sort of self-expressed male, female, in these images. Why don't you see if you could predict that? And we're like-- it's not going to work. But it's a good exercise. You know, try it and all that.
So she came back to us in about three weeks. And says, yeah, I can pretty much tell male or female in the retina. Like there's nothing in the literature that says male retinas and female retinas are different. We're now at about a 0.97 AUC.
So this was a case where-- the features of diabetic retinopathy we kind of knew. You look for exudates. You look for hemorrhages. We had no idea-- nobody had any idea that male and female retinas were different. And you know what? We still don't know why.
So where the predictions localize, there's all sorts of techniques that you can do about which pixels are contributing the most to the prediction. And there's papers coming out about this, about how do you explain what the model is doing.
Well, you can do a GAN that turns it from male to female and back again. And we've sat down. And we've got the GANs that do this. And turn it from a quintessentially female to quintessentially male. And the ophthalmologists stare and say, I have no idea what's going on here. So we still don't know.
We can cut the images into 64 by 64 blocks and scramble them. And the prediction, it's not quite 0.97. But still pretty good. So it's probably a localized-- it might be in some high-level correlation structure. We still don't know.
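The block-scrambling test can be sketched as follows. This is a minimal Python illustration on a toy grid, not the team's code; the function name `scramble_blocks` and the 4 by 4 example are assumptions. The idea is to keep every tile intact while destroying the large-scale layout, and then check whether a model's prediction survives.

```python
import random


def scramble_blocks(image, block, seed=0):
    """Cut a 2-D image (list of rows) into block x block tiles and shuffle the tiles."""
    height, width = len(image), len(image[0])
    if height % block or width % block:
        raise ValueError("image dimensions must be multiples of the block size")
    rows, cols = height // block, width // block

    # Collect tiles in row-major order; each tile is a list of `block` short rows.
    tiles = [
        [row[c * block:(c + 1) * block] for row in image[r * block:(r + 1) * block]]
        for r in range(rows)
        for c in range(cols)
    ]
    random.Random(seed).shuffle(tiles)

    # Reassemble the shuffled tiles into an image of the original size.
    scrambled = []
    for r in range(rows):
        for y in range(block):
            line = []
            for c in range(cols):
                line.extend(tiles[r * cols + c][y])
            scrambled.append(line)
    return scrambled


# Toy 4x4 "image" with 2x2 tiles; the talk describes 64 by 64 blocks.
image = [[4 * y + x for x in range(4)] for y in range(4)]
scrambled = scramble_blocks(image, block=2)
```

If accuracy holds up on scrambled inputs, the signal is probably carried by local texture inside the tiles. If it collapses, the model was relying on global structure.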
And it's funny because I keep having people work on this. And they're like, oh, this should be easy. And then they stop in frustration after a few months. But we will solve this problem.
But there's also some other things we can see. We can see age. We can see blood pressure. We can tell what size glasses you need. So for a young kid, who maybe can't take an eye test, we can take a picture of their fundus and tell them what level of eyeglasses that they need.
And very interestingly, we can actually do a whole cardiovascular risk assessment. So there's this thing called the Framingham score, which comes from the famous Framingham study, where they followed people for 50, 60, 70 years now, in Framingham, Mass. And we can essentially tell that just from a picture of the eye. This is the risk of having a significant cardiovascular event.
So who knows? We're looking for more signal. We're looking for glaucoma and AMD. We're looking for neurodegenerative disease without markers. So this is just one of those cases where we don't even know what features to look for. But deep learning finds them. You've got to have lots of examples though.
So let me switch quickly to radiology. So we-- oh, that woman is on the team now, by the way. She's doing great. She's actually coming back for a PhD.
So we had a bunch of-- these were brain MRIs for Alzheimer's. So, of course, now everybody says, can we predict male or female? And sure enough, you'll see the AUC, the 0.99999. Of course, we could tell male and female.
So a natural conclusion is, well, male and female brains are completely different, right. Well, someone had the wherewithal to say, well, what if we just cut a picture of the brain out and try to predict just on the extracted square and then also try to predict on, without the brain?
So it turns out that if you cut the brain out, it's still 0.997. And if you're just looking at the brain, it's only about 0.80. So sort of, this is a case-- and I don't know exactly what it's seeing. It's something in the facial features or in the jaw line. I don't know where it is.
But it would be very easy to just say, oh, look, male and female brains are totally different. And you're locking in on some confounding variables that you don't understand. The computer is very, very glad to do that. So be careful.
As you're doing these problems, you know try to predict the things you shouldn't predict. Try to think about what might be tricking you. There are confounding variables everywhere. And it's very embarrassing if you announce Eureka, and it's because there was some text in the middle. Or the sick people go to one lab and the healthy people go to another lab, and you're just seeing the difference between the illumination at the labs or something like that. There are confounding variables everywhere. So be careful. And if the results are too good, they probably are.
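The brain-in versus brain-out check from the MRI example can be sketched like this. It is a hedged illustration only: the helper names, the square region, and the fill value are assumptions, and blanking pixels stands in for the cropping described in the talk.

```python
def keep_only_region(image, top, left, size, fill=0):
    """Blank every pixel outside a size x size square (keeps, say, only the brain)."""
    return [
        [pixel if top <= y < top + size and left <= x < left + size else fill
         for x, pixel in enumerate(row)]
        for y, row in enumerate(image)
    ]


def remove_region(image, top, left, size, fill=0):
    """Blank every pixel inside the square (keeps everything except the brain)."""
    return [
        [fill if top <= y < top + size and left <= x < left + size else pixel
         for x, pixel in enumerate(row)]
        for y, row in enumerate(image)
    ]


# Toy 4x4 image of ones with a hypothetical 2x2 region of interest.
image = [[1] * 4 for _ in range(4)]
inside_only = keep_only_region(image, top=1, left=1, size=2)
outside_only = remove_region(image, top=1, left=1, size=2)
```

Training or evaluating the same classifier on both versions shows where the signal lives. In the talk, the version without the brain still scored about 0.997 and the brain alone only about 0.80, which points to a confound outside the region of interest.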
So let me talk about pathology. So I talked earlier about breast cancer. You can imagine all sorts of things you could do with pathology. Like find similar cases. Doctors reason by example. And we have a system that's called SMILY. We just launched it. It's really cool. Can you just highlight the cancerous regions, so focus the pathologist's attention as to where to look? Could you recapitulate what the pathologists would say? Just get the pathologist to do the reports, can you just go from the image to report? And we're working on all of those.
The one that I'm most excited about is essentially, to rewrite the books on pathology. So when a man gets a prostate biopsy, it's evaluated with this thing called the Gleason score. And Gleason was a brilliant pathologist. And he looked at a huge number of samples. And he built a model in his head. And then he codified it in a guide which they teach to pathologists, which you hope they're having a good day when they read your slide. And you can look at this. It's a two-part score. And you can look up the crisis of Gleason 7's. So basically, if you have 8 or higher, they treat you aggressively, and 6 or lower, they kind of watch you. And 7 is the pathologists giving it to the oncologist to make a decision.
But we could try to reproduce the Gleason score. And we have a paper where we did that quite accurately. But why not just throw the whole thing away and just redo what Gleason did? We can look at a million samples, we can see how nature labeled the data. Did it progress, did it not? Did it aggressively metastasize? And relearn, basically, redo pathology. So I think this is going to happen over the next few years.
But let me talk about something that's just super practical. So it turns out that focus on the images has a big effect on model quality. Now, realize that the scale of that is shifted. But you guys know about AUC curves. It's like every little bit is another patient that's sort of slipping through.
So here's an example where this is a pathology slide. And these are the predictions about the cancerous regions. Is there really a stripe of cancer in the corner there? What's going on there? And it turned out that the image was out of focus. It was just from the pathology scanner. And so it's not enough to do the predictions. You have to go and look for all the things that might go wrong. And you understand how well do you know what you think you know?
And so we actually built a model just to predict image quality. And we open sourced it. And we actually gave the Broad a grant and the guys who do ImageJ, a grant. These are the typical image processing tools-- to incorporate these models into their tools for people to use. And so then, you know, we're Google. So we went off and bought a bunch of scanners and tried them all. And this one's a pretty good one. This one has this weird striping pattern here. This one might have been a sample prep problem. So the color is the focus level. It might've been a sample prep, where they didn't quite cut it flat. So practically, you need to look at all of these things before you just make a prediction.
So let me get into more practical. We had these amazing models. We could detect mets. So the AUC was great. So we said, well, let's do a reader study. We gave the pathologists our models. And it's sitting there with pathologists. They're going to be happier, they're going to be faster, they're going to be more accurate. For the first version, none of that was true. They were less happy, they were less accurate, and they were slower in doing it.
And there's a bunch of reasons. Just as an example, you worry about false positives and false negatives. So if you're doing screening, if you're screening people for blindness, I'd love to set up eye scanners in train stations in India. A lot of people in India don't even know they have diabetes. Can we detect them? But if it's a 1% prevalence disease, and we're 99% accurate, every other positive will be a false positive. And we'll flood their health system.
But in this case, it turns out that humans, or at least human pathologists, are very good at dispensing with false positives. So if the computer says, look here, the pathologist can look at it and say, don't worry about that. But they were actually much worse with false negatives, because they tend to trust the computer. So if the computer misses something, the doctor is going to miss it.
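The screening arithmetic can be checked with a short calculation. This sketch assumes that "99% accurate" means both sensitivity and specificity are 99%; that reading is an assumption, not something stated in the talk.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Probability that a positive screening result is a true case (Bayes' rule)."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * (1 - specificity)
    return true_positives / (true_positives + false_positives)


# 1% prevalence, 99% sensitivity, 99% specificity.
ppv = positive_predictive_value(prevalence=0.01, sensitivity=0.99, specificity=0.99)
print(f"share of positives that are real: {ppv:.2f}")
```

Under those assumptions about half of all positive results are false alarms, because the healthy 99% of the population generates roughly as many positives as the sick 1%.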
So if the doctor knows that the computer is going to have false negatives, she has to reread the image. She has to deal with everything the computer says. And then she has to do a de novo read anyway. Because she's ultimately responsible for that patient's health. So where you set the operating point matters a lot, based on the context.
And then how you actually show the data-- our first version, we would mark the tumor. But the person wants to see the tumor. So you have to mark around it. But after doing some iterations, now they are happier, they're faster, and they're more accurate.
But the point here is, don't just assume because you have this amazing ML model, that's just going to drop in to some existing workflow. You really have to understand what the humans are doing, how to engage the humans. I mean, if it's a totally automated system, maybe not. But if humans are involved, as they often are, you need to really think these things through.
So let me jump down one level. I might run a little bit late. Is that OK? I'll try to keep it tight. So let me go into microscopy. So one of the amazing things that you can do with machine learning is called image-to-image regression. So if you have pairs of highly-correlated images, you can learn to predict one from the other. And this kind of makes sense. If you have a normal photograph, if I showed you that photograph, you could tell which chair is in front of the other chair, and the chair is in front of the door, and the desk is in front of the chair. Humans can tell that.
And it's funny, because a few years ago, if a human could do it, it said nothing about whether the computer could do it. And now it's like if a human could do it and there was training data, of course we could teach the machine to do it. My team wants to do stuff that humans can't even do, like see age from your retina.
So image-to-image regression works. This is in your camera now to do selfies. We can tell when it's a selfie by looking at the depth. And then you could do all these cute photo effects.
So we said, the gold standard in reading microscopy images is histochemistry or staining, where they are staining the proteins or the biological molecules in the cell. There's a stain called DAPI, which binds to DNA, where you could see the nucleus, you can stain membranes, you can stain all of these different structures.
But sometimes you don't want to do the staining. It might be the case that you want to keep the cells alive. So there was some really interesting work that came out of Gladstone a few years ago, where people with neurodegenerative diseases, they end up with these plaques in their brain. So people thought, oh, well, the plaques are like the tombstones of the brain.
Gladstone was able to show that the cells that make plaques live longer than cells that don't. So you probably have these misfolded proteins, which are damaging the cells. And then cells that aggregate them into plaques and then eventually recycle them, do better. Eventually, they're just overwhelmed by the plaques. So the only way they could do that, though, is with time lapse. They had to track the cells over time.
So if you look at that image on the left, it's very hard to read what's going on, versus here. So we said, hey, can we use image-to-image regression to take bright field microscopy or maybe hyperspectral microscopy, or pathology slides that were stained years ago with standard stains and learn to impute the deeper labels? And the answer is, we kind of can. Not for everything, we're obviously not seeing protein expression.
But if there's enough morphological clues in an image like that, and then we can get training data from the fluorescence like that, we can learn to predict it. So in this case, blue is-- the nucleus is a DAPI stain. And green is propidium iodide, which selectively binds to dead cells.
So you can look at this. This is the original. And we kind of killed this cell here, right here, but it looks kind of weird anyway. So it's pretty good. It's not great, it's not perfect, but it's pretty good. These are human neurons imaged with phase contrast. And this is a dendrite and axon stain. And it's pretty good. It's certainly good enough for typing of the cells.
So there's all sorts of uses for this. It turns out tissue will autofluoresce. Animal tissue auto fluoresces. But the thing is, so you can take-- if you have say, a small biopsy, you'd like to preserve it to do the genomics on it. But you also want to see the architecture of the tumor. So you can hit it with fluorescence. But nobody knows how to read the 600-nanometer image or the 580-nanometer image.
But what you can do is take a hyperspectral stack, and then sacrifice some of the samples and stain them, and then learn to predict the stain from the hyperspectral stack. So it's a very, very cool technology. We open sourced the data, the models, everything. It was published in Cell last year. It's a very cool technique for lifting information out of biological images.
But then what happens if you've got lots and lots of images? How do you compare them? So there's another wonderful machine learning technology called embeddings. And the idea of embeddings is, you train a network to produce a vector such that similar inputs, the vectors are closer together. And closer together can be whatever you define it to be. And as long as there's some consistency, there's some deep structure that the network can lock in on, it will learn it.
So for example, this is how facial recognition works. So you have two pictures of the same person and one a bit different. And you want the two vectors of that person to be closer together. And you train this a billion times. And it is frighteningly accurate.
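One common way to express "two pictures of the same person should be closer than a picture of someone else" is a triplet loss. The talk does not name the loss that was used, so treat this Python sketch, including the `margin` value and the toy vectors, as an assumption for illustration.

```python
import math


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Zero once the anchor is closer to the positive than to the negative by `margin`."""
    same_person = math.dist(anchor, positive)
    other_person = math.dist(anchor, negative)
    return max(0.0, same_person - other_person + margin)


# Hypothetical 2-D embeddings; real face embeddings have many more dimensions.
well_separated = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])
badly_separated = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])
```

Training nudges the network's weights to reduce this loss over many triplets, so the geometry of the embedding space is learned from examples with no hand-built features.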
So what's interesting is, prior generations of facial recognition, you'd have to do feature engineering. You have to find the eyes, and the nose, and the mouth, and see how far apart the eyes are. And what if they're wearing glasses? And what if it's a profile? And here, you don't have to do any of that. You just give the machine the raw data. And it figures out what the essential difference is.
And there are some interesting stories. Because we roll this stuff out inside Google. And parents were complaining that we had mis-tagged the kids. And we're like, are you sure? And they would write us back and oh, no, you were right. I had the wrong kid in the image, or even telling identical twins apart. So this stuff works really well. And it's just, again, these embedding vectors.
So your face is essentially a cluster center in these image recognition algorithms. And if you take a picture of an arbitrary person, and get the vector from that image, and you go looking for the nearest cluster center, there is really good odds it's going to be-- it's not perfect, but it's good odds.
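The cluster-center lookup can be sketched as a nearest-centroid search. The names and the 2-D vectors below are hypothetical; a production system would use high-dimensional embeddings and an approximate nearest-neighbor index.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    count = len(vectors)
    return [sum(v[i] for v in vectors) / count for i in range(len(vectors[0]))]


def nearest_identity(query, centers):
    """Name of the cluster center closest to the query embedding."""
    return min(centers, key=lambda name: math.dist(query, centers[name]))


# Hypothetical embeddings of known photos for two people.
known_photos = {
    "alice": [[0.9, 0.1], [1.0, 0.0], [0.8, 0.2]],
    "bob": [[0.1, 0.9], [0.0, 1.0]],
}
centers = {name: mean_vector(vecs) for name, vecs in known_photos.items()}

match = nearest_identity([0.7, 0.3], centers)
```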
The same thing with words-- so we trained a network with words. This is actually older work from 2013 where words were considered similar if they co-occurred on the web. And this amazing deep structure emerged where the vector for Berlin minus the vector for Germany plus the vector for Russia pointed to Moscow.
So we never told it about countries. We never told it about cities. We never told it about capitals, or that countries have capital cities. And yet, this latent structure emerged. Conjugates of a verb, they would be in a different place in the space. But they would have the same relationship relative to each other. So these networks are amazing at finding this deep structure.
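The vector arithmetic can be illustrated with hand-made toy vectors. These six 2-D vectors are invented so that the "capital of" offset is the same for every country; learned embeddings have hundreds of dimensions and only approximate this structure, and lookups usually use cosine similarity, not the Euclidean distance used here.

```python
import math

# Invented toy vectors: first axis separates countries, second marks "is a capital".
vectors = {
    "Germany": (1.0, 0.0), "Berlin": (1.0, 1.0),
    "Russia": (3.0, 0.0), "Moscow": (3.0, 1.0),
    "France": (5.0, 0.0), "Paris": (5.0, 1.0),
}


def analogy(a, b, c, vectors):
    """Word nearest to vec(a) - vec(b) + vec(c), excluding the three input words."""
    target = tuple(x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c]))
    candidates = (word for word in vectors if word not in (a, b, c))
    return min(candidates, key=lambda word: math.dist(target, vectors[word]))


answer = analogy("Berlin", "Germany", "Russia", vectors)
print(answer)
```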
So we said, can we apply this to drug screening? So the way drug screening works now is, you'll have some hypothesis. They grow the cells or whatever in these wells. And they treat them with lots of compounds and lots of doses. And then they have some theory about what's going to happen. This cancer drug is going to kill these cells. Or this is going to cause this protein to express.
So they have a theory going in, and they build these assays where they pull 2, 3, 5 numbers out of these images. But these images are incredibly rich stories about what's going on in the cells. And this is a real example from a Broad study where at some sort of mid-level dose-- blue is DAPI again-- at some mid-level dose, this cancer drug was turning these cells into these multi-nucleated monstrosities. And then you turn the dose up higher, and it kills them. And so you never would have seen that if your assay was just, are they alive or dead, or count the cells. You never would have seen this.
And we love this Asimov quote, the most exciting thing in science isn't Eureka, I found it. It's oh, that's kind of weird. What's going on here? So what we wanted to do is basically go from this model where you pluck out a couple of features of the image, to just mapping the images in an almost hypothesis-free way. It's not quite hypothesis-free, because the way you paint the cells and the way you stain them is-- there are these standard cell painting assays where you're just painting basic structures of the cell and then just put them in a morphological state.
So essentially, what we need to do-- it's two simple problems. You can imagine training an undergrad to do this. Come in and look at the controls. And there's actually quite a lot of visual variation in the controls. All of that stuff is boring. Now look at the cells that we've treated, we squirted juice on. Did something interesting happen? And which are more or less alike? So if we can do those two things-- learn what's boring from the controls and then learn what's interesting, this whole world of research opens up.
Ideally, you would like the dimensions of the vector to be meaningful. And we believe that we'll be able to project into spaces to do that. But that's not even critical. So we took, again, this Broad study. And they had 38 drugs in this study, 12 different mechanisms of action-- so some of the drugs had the same mechanism of action-- a bunch of different concentrations. And these cells are all actually painted-- these are cells that were treated with different amounts of different drugs-- painted with the identical assays. So you could see some things materialized, some things don't. And then the game became, can you look at a picture of the cell and say what happened to it, based on what happened with the neighbors?
And we got these just incredible results. So what you're seeing here is a t-SNE plot where the multi-dimensional vectors are projected down to two dimensions. And the colors were added later. If you see those three blue ones on the bottom there, so those are three different compounds that all had the same mechanism of action. So the major thing that they did to the cells was all the same. They all clustered right near each other. But each one had some secondary effect that wasn't really accounted for. So they all were right next to each other, but in distinct subgroups.
And somewhat unexpectedly, at extremely low doses, the cells started moving almost linearly in embedding space from where they started to where they ended up. So we could start to see the phenotypes emerging from incredibly low doses of these drugs. So this has become a totally different way of doing it. And the results were so ridiculously good, we didn't believe them. As you shouldn't if you get great results.
So Verily is another Alphabet company. We had them do the same study, but four times as big, 60% controls, five replicates. And the results were just as good. So this is really amazingly good. And we started to see-- this is not your typical dose response curve. But this is distance in embedding space. Again, so we started to see really low doses then.
So we kicked off a whole bunch of projects around this stuff-- around, can we look at cells across people and stratify them into different disease categories? Can we look for new mechanisms of action for antibiotics? Can we look for new ways of killing cells, new compounds that do the killing? And we don't know yet whether the vectors compose, where we can be sort of mixing drugs when we're doing those experiments.
So this opens up completely new worlds of research. And if you're interested in this, there's two companies that are doing this. One called Insitro by Daphne Koller, who is this great AI person from Stanford. And another one called Recursion Pharmaceuticals, nearby. And then Anne Carpenter's lab at the Broad is doing great work here too.
So I think we have few minutes left. Let me do just sort of a quick survey. And again, our team does everything from nuclear fusion to tools for people with disabilities. So it's pretty wild. And again, please go look at that document for more background.
So let me just show you something real quick about genomics. So you guys have seen this curve. The cost for genomics is dropping significantly faster than Moore's Law. And the secret to why this technology is able to work is that it's been quantity over quality. So essentially, these new sequencers do huge numbers of very dirty reads.
But if you do enough of them and you understand the error models of the machine, you can figure out what the right answer is from lots and lots of dirty reads. And the basic thing is taking all these dirty reads into variant calling. So I've got 30x coverage or 50x coverage. Every base is read 50 times. What's the actual genome?
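The simplest version of "many dirty reads, one right answer" is a per-position majority vote over aligned reads. This sketch is a naive baseline for illustration only. It is not how DeepVariant works, it ignores diploid genotypes and insertions or deletions, and the reads are made up.

```python
from collections import Counter


def consensus(aligned_reads):
    """Majority base at each position; '-' marks no coverage, 'N' means no call."""
    called = []
    for column in zip(*aligned_reads):
        bases = [base for base in column if base != "-"]
        called.append(Counter(bases).most_common(1)[0][0] if bases else "N")
    return "".join(called)


# Four hypothetical reads of the same 4-base region, each with one error.
reads = ["ACGT", "ACGA", "ACCT", "TCGT"]
sequence = consensus(reads)
print(sequence)
```

Every read here contains an error, yet the vote recovers a clean sequence because the errors fall at different positions. Real callers have to go further and model systematic, machine-specific errors, which is where hand-written or learned error models come in.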
So what we were able to do is to take-- let's just do some basic alignment, pile them up, basically, draw them out as an image. And then we knew what we put on the machine. So we're able to predict the top line of the image was what the right answer is. And we're able to learn to predict the top line, basically, from the reads.
And we won this big FDA award. It's called DeepVariant. It's all open sourced. If you're doing work in genomics, take a look at it. And here's the takeaway slide, is that each of the sequencing companies hired teams of programmers that deeply knew the error models of their machine to build a variant caller. And DeepVariant was better than all of them except for one. And I think now this is a dated slide. We're better than that one also.
So teams of programmers, like implementing error models-- forget it. Just learn through it with the data. And let the machine figure out the pattern. And there's all sorts of new sequencing work going on.
So let me talk real quickly about biomarkers. So I talked about how we make progress in cancer by identifying types and then developing therapies for each type. For things like neurodegenerative disease, they are so far away. They can't even do patient stratification. Like Alzheimer's, ALS, Parkinson's-- it's almost certainly many different diseases, the way cancer is with different underlying mechanisms.
And the diseases now are just characterized by the symptoms, not by the cause. And they're not finding any drugs. Herceptin never would have been approved if it wasn't for HER2 tests to go along with it. Because again, it's an expensive drug, it's dangerous, and it only works on a few people. But for those people like my mom, it's a miracle.
And the way that a lot of neurodegenerative diseases are assessed is, it's the best we could do, but it's terrible. It's like a subjective measure. The patient comes in to the doctor, they have a 25-minute exam. And the doctor says walk across. And he gives you a score of about how steady you are. And they ask you questions about, can you dress yourself? And most patients want to hide it. And they're like, yes, they can dress themselves, but they don't wear bow ties anymore. Or can you feed yourself? It's like, yes, but I don't buy gallons of milk anymore. Now I buy pints of milk so I can feed myself.
So these assessments are terrible. And there's all sorts of problems with it. Again, not just that we don't understand the disease, but we can't even stratify the patients.
So with ALS in particular, they have this 10-point score. It's called the FRS score, the functional rating scale. And again, it's very, very, very subjective. And one of them is with the voice. So we did this project with a local Boston group, a fantastic institute called ALS TDI, where we just took patients' voices and said, can we predict their FRS score? And we can do it quite accurately. The patients just said a single phrase like, I owe you yo-yo today, or something like that. And we're able to predict it quite accurately.
The green is the FRS score, curated values that multiple doctors consulted on. And the blue is what the machine is predicting. So with these better biomarkers, there's all sorts of things you can do. It has a remarkably dramatic impact.
So for example, one is, you can stratify patients. You can break them into different types of disease. The second is, you could predict progression better. So if you've got much more objective measures, if your error bars on assessing disease are smaller, you can have fewer people in a drug trial. And drug trials are very expensive. And so if you can have fewer people and get an accurate measure statistically, that's a big deal. You can measure that person's progress against themselves. So instead of saying, OK, this is how the group behaved, you can measure-- like especially, for rare diseases-- what happened with this individual? So more accurate biomarkers are really important.
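The link between smaller error bars and smaller trials can be made concrete with the textbook sample-size formula for comparing two group means, n = 2((z_a + z_b) * sigma / delta)^2 per arm. This is a standard approximation and not a calculation from the talk; the values of `sigma` and `delta` below are hypothetical.

```python
import math
from statistics import NormalDist


def patients_per_arm(sigma, delta, alpha=0.05, power=0.8):
    """Approximate patients per arm needed to detect a mean difference `delta`
    when the outcome measurement has standard deviation `sigma`."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)           # desired statistical power
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)


# Same hypothetical treatment effect, measured with a noisy and a precise biomarker.
noisy = patients_per_arm(sigma=4.0, delta=2.0)
precise = patients_per_arm(sigma=2.0, delta=2.0)
print(noisy, precise)
```

Because the required size grows with the square of `sigma`, halving the measurement noise cuts the trial to roughly a quarter of the patients under these assumptions.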
So the voice stuff worked great. There's hopefully a Nature paper coming out on this. We got the accelerometer data from these patients too, doing simple exercises. And we're really excited about this. I think we actually, between the voice and the accelerometer data, we think that we can predict all 10 FRS scores. So again, this is just, we're chipping away at these diseases. About hey, can we stratify the patients, reduce the number of people in the trial? So it's slow progress. But we are making progress.
So another quick thing is in silico evolution. So it turns out that nature does lots of experiments all the time. And so you've got these sequences. And you get a different sequence, it could be just an error in copying. And a lot of people are working on, can I go from sequence to structure, like in proteins? Because the structure tells you a lot. The group in London, they've got this thing called AlphaFold and they're doing state-of-the-art work on structure. We're skipping structure completely and going directly from sequence to function and saying, let the deep learning network figure it out.
So this is a case when we did this collaboration with Harvard. So what you're seeing is a picture of an AAV virus. And it's amazing. It's a complex of, I think, just three proteins. And it just self-assembles into this soccer ball. It's incredible. And AAV viruses are currently one of the best vectors for doing gene therapy. Because they can deliver the gene, but it does not integrate into your genome. So you can get a temporary therapy. And if it doesn't work, it kind of fades out.
But the surface, sort of fuzz, on AAV virus impacts whether it's going to be attacked by the immune system. And whether it'll be selectively uptaked into different tissue. So we started this project that said, can we, basically, in silico, try out different sequences and predict what better AAV viruses will be? So the basic way the system works is, now, with the ability to print DNA, you can make a bunch of variants. You can usually get them to express in bacteria. And then if you can develop an assay for them, you start getting positive and negatives about which ones worked on this assay or not.
So it turns out that if you do the machine learning right and you've got fairly dense-- you know, you've got a bunch of examples around a sequence, the precision of these models is really high. So if you haven't drifted too far away, these models will predict how well it will do against that assay. For example, will this AAV virus package? Now, it turns out that the precision is good, but the recall isn't very good. So if you go too far away in sequence space, you get junk answers.
So now the challenge has become-- sequence space is nearly infinite. It's almost a discrete optimization problem. This is, where should we even look in sequence space to start the experiments in order to generate more data to make better models to do iterative evolution? And if you can do this against multiple assays, you can start optimizing different things. So again, we're on the verge of some really, really interesting work about designing proteins, designing antibodies, designing peptide drugs. So this stuff works really well. And there's some papers in that document that I linked.
So in this case, what happens, we were making AAV viruses. And we were making single mutants and a couple of double mutants. And again, basically, it's an optimization problem. And what we're able to do is to say the vertical axis is the precision. Did the virus package? Did it meet the assay? And the horizontal axis is how many mutations we made. So the dotted line is just making random mutations. And you can see, as we make multiple mutations, it just stops working.
So we got absolutely tremendous lift, in terms of the precision of predicting which ones are going to work. So this is just beginning. We're just at the opening of this. So if you want to read more, I encourage you to go to the site. There's lots of research to do.
Here's my last one: simulation plus learning. So you guys know, PDEs are-- we kind of know the physics. And in general, you can do these PDE simulations. And the physics are known, but it's quite complicated. Because it depends on the parameters. It depends on the data, the velocity, and the heat at that point. But PDEs are how things are done. And it's like how climate is modeled.
But it turns out that in these PDE calculations, if you halve the grid size, the computational complexity goes up by the fourth power, which is why we have climate models that are still done at kilometer scale. Because there just isn't that much computation.
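A quick back-of-the-envelope sketch of that scaling. It assumes the fourth power comes from three spatial dimensions plus a time step that has to shrink along with the grid spacing; the function name and the exponent default are illustrative assumptions, not something from the talk.

```python
def relative_cost(refinement, exponent=4):
    """Cost multiplier when the grid spacing shrinks by `refinement` (2 = halved).

    exponent=4 assumes three spatial dimensions plus a time step that
    shrinks in proportion to the grid spacing.
    """
    return refinement ** exponent


for refinement in (1, 2, 4, 8):
    print(f"grid spacing / {refinement}: about {relative_cost(refinement)}x the compute")
```

Under this assumption, each halving of the grid spacing costs about 16 times more compute, so going ten times finer costs about 10,000 times more.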
Well, it turns out, not surprisingly, that there are repeated patterns in the world, over, and over, and over again. And these networks are really good at finding these patterns-- not any rules that we could ever write. So again, with hurricanes, these patterns seem to repeat.
So we had a project inside Google that was trying to do super resolution on images. So you watch those crime shows. And they're like, turn up the resolution of that image. And they take a fuzzy image and suddenly you see the perpetrator, which is, of course, BS-- but not completely.
So it turns out that if you have kind of a de-focused low-resolution image of an eyebrow, there are only so many eyebrows. And you can train models on real images. Basically saying, here's the low resolution. And let me train it to predict the high resolution. And it essentially is learning the patterns that you see in normal photographs. In theory, every pixel could be any color. It's sort of computationally explosive-- combinatorially explosive.
But the world isn't really that complicated. It's got repeating patterns again. So what you're seeing here is-- the bicubic is basically how PDEs are done. This third one is a neural net taking that image on the left and increasing the resolution of it. Again, you have to be careful. Because it looks good, but it might not be real. It could be just a hallucination. But the ability to make low-resolution photos look high-resolution and appealing is here already.
So we said, hey, can we apply this in a PDE context? So what you're seeing here is, on the left, is the actual numeric. So the blue line is, we did very expensive simulations, like every small point. This is basically as accurate as we know how to get it, hugely computationally explosive. And the orange is what the lower resolution computation on Burgers' equation is. And what you're seeing on the right is, at one quarter the resolution on the left, you're seeing what the network is imputing.
So again, we train the network on the higher-- we said, here's the low resolution, here's the high resolution. Learn to predict the high resolution from the low resolution. And it works amazingly well. So this is buying very large, almost exponential speed-ups in how quickly you can do some of these simulations. So here's an example with advection too, around the baseline method.
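The idea of learning to predict the high resolution from the low resolution can be shown with a toy that is far simpler than the network in the talk. This sketch assumes the "world" consists of random sine waves sampled on a coarse grid, and it fits two interpolation weights by least squares to predict the value midway between grid points. Everything here (the signal family, the four-point stencil, the function names) is a hypothetical stand-in, not the team's method.

```python
import math
import random


def make_sample(rng, h=1.0):
    """One example: a 4-point coarse stencil and the true value at its midpoint."""
    k = rng.uniform(0.5, 2.0)                # wavenumber of a random sine wave
    phase = rng.uniform(0.0, 2.0 * math.pi)

    def u(x):
        return math.sin(k * x + phase)

    stencil = [u(-1.5 * h), u(-0.5 * h), u(0.5 * h), u(1.5 * h)]
    return stencil, u(0.0)


def features(stencil):
    """Symmetric features: sum of the inner pair, sum of the outer pair."""
    return stencil[1] + stencil[2], stencil[0] + stencil[3]


def fit(samples):
    """Least-squares fit of y ~ a*inner + b*outer via the 2x2 normal equations."""
    spp = spq = sqq = spy = sqy = 0.0
    for stencil, y in samples:
        p, q = features(stencil)
        spp += p * p
        spq += p * q
        sqq += q * q
        spy += p * y
        sqy += q * y
    det = spp * sqq - spq * spq
    a = (spy * sqq - spq * sqy) / det
    b = (spp * sqy - spq * spy) / det
    return a, b


def mse(samples, a, b):
    """Mean squared error of the interpolation weights (a, b) on `samples`."""
    total = 0.0
    for stencil, y in samples:
        p, q = features(stencil)
        total += (a * p + b * q - y) ** 2
    return total / len(samples)


rng = random.Random(0)
train = [make_sample(rng) for _ in range(500)]
test = [make_sample(rng) for _ in range(500)]

a, b = fit(train)
linear_err = mse(test, 0.5, 0.0)   # plain linear interpolation as the baseline
learned_err = mse(test, a, b)      # weights learned from high-resolution data
print(f"linear interpolation MSE: {linear_err:.5f}")
print(f"learned interpolation MSE: {learned_err:.5f}")
```

The learned weights beat generic interpolation only because the test signals come from the same family as the training signals, which is the same caveat as the hallucination warning for images.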
So what's really kind of exciting about these techniques is that if you suddenly get orders of magnitude speed up in your simulation, first of all, we can do more accurate climate simulations, which we're pretty excited about. But you can also almost start treating it like an inverse design problem. Where OK, I've got this input, and I can run the physics and tell you what the output would get. What if you want a particular output? What inputs do you give?
That's an overwhelming calculation now. But what you do with this network, the gradients of the network almost guide you as to what possible inputs. So we have interesting projects in nanophotonics and things like that. So we're entering into this world of inverse design. And there's a great class at Harvard taught by someone who works part time with our team, Michael Brenner, on inverse design problems. So it's a really cool area.
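A minimal sketch of gradient-guided inverse design, assuming you already have a fast, differentiable forward model. The one-dimensional `forward` function below is a made-up stand-in for a trained network or simulator, and the learning rate and step count are illustrative choices.

```python
def forward(x):
    """Hypothetical stand-in for a fast, differentiable simulator or surrogate."""
    return x ** 3 + x


def forward_grad(x):
    """Derivative of `forward` with respect to its input."""
    return 3 * x ** 2 + 1


def design_input(target, x=0.0, learning_rate=0.002, steps=500):
    """Gradient descent on (forward(x) - target)**2 to find an input for `target`."""
    for _ in range(steps):
        error = forward(x) - target
        x -= learning_rate * 2 * error * forward_grad(x)
    return x


x_star = design_input(target=10.0)
print(f"input {x_star:.4f} gives output {forward(x_star):.4f}")
```

With a real network the gradient would come from automatic differentiation rather than a hand-written derivative, but the loop has the same shape: run the model forward, compare with the output you want, and follow the gradient back to the input.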
I'm sorry I'm running so late. But let me wrap up a little bit. So a lot of this talk was about the promise. And I hope you leave here excited, the fact that you're here and you're still here on Friday night. But there's a lot of peril too. So just simple rules-- try simple models first. Deep learning is really cool. You don't always need to use it as a hammer. And it's kind of neat now that there is specialized hardware. Sometimes over killing it if you've got specialized hardware. But try simple models first, because the simple models might tell you something interesting where this big ball of numbers isn't. It's going to be very opaque to you.
My office mate Patrick Riley has got a great paper in Nature-- Three pitfalls to avoid in machine learning. It's a great paper. It's born of lots and lots of experience. Try to predict the things you shouldn't be able to predict is a good example. We worked with a lot of people doing bio stuff and just things like experiment design. They're like, well, all our controls are in the right column of the plate. And it's like, well, that might have been nice when you're pipetting. And that made it easier to pipette. But the right column of the plate-- it's fed later than the left column. It's got a different spot in the incubator. We can see all those things.
We get bio data from people. The first thing we do is we run our image quality thing. And about a third of the images are crap. The next thing we do is, we try to predict. Can we predict-- the batch always-- can we predict the row and the column in the plate? Often. And again, it's not that the data is unusable. But these tools are so exquisitely sensitive now, that you're seeing all that stuff. So be careful that you're not just predicting a batch or predicting the row and column.
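One way to run that kind of check is sketched below, on synthetic data: train a trivial classifier to predict the plate column from the measurement and compare its held-out accuracy with chance. The well layout, the size of the column offset, and the nearest-centroid classifier are all assumptions made for illustration.

```python
import random


def make_wells(rng, n, right_offset):
    """Synthetic wells: (signal, column). `right_offset` models a position artifact."""
    wells = []
    for _ in range(n):
        column = rng.choice(["left", "right"])
        signal = rng.gauss(0.0, 0.5) + (right_offset if column == "right" else 0.0)
        wells.append((signal, column))
    return wells


def nearest_centroid_accuracy(train, test):
    """Fit one centroid per label on `train`; return accuracy on `test`."""
    sums, counts = {}, {}
    for x, label in train:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    centroids = {label: sums[label] / counts[label] for label in sums}
    correct = sum(
        min(centroids, key=lambda c: abs(x - centroids[c])) == label
        for x, label in test
    )
    return correct / len(test)


rng = random.Random(0)

# Plate with a column artifact: the column is predictable far above chance (0.5).
biased_acc = nearest_centroid_accuracy(
    make_wells(rng, 400, right_offset=1.0), make_wells(rng, 400, right_offset=1.0)
)

# Plate without an artifact: accuracy should hover around chance.
clean_acc = nearest_centroid_accuracy(
    make_wells(rng, 400, right_offset=0.0), make_wells(rng, 400, right_offset=0.0)
)

print(f"column predictable with artifact: {biased_acc:.2f}")
print(f"column predictable without artifact: {clean_acc:.2f}")
```

If something you should not be able to predict comes out well above chance, a model trained on the real labels may be picking up the same artifact.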
It's funny because we can even tell, if two operators are doing the experiment, people have different lab hands. And we can usually tell if there were two different operators, which operator ran the experiment. And again, it doesn't mean the data is not usable, but be highly sensitive to that. And then again, just look at your data. Look at your data. Like that stuff about the out-of-focus images, these networks will gladly predict. You train them, they look beautiful, you put crap in them. They'll predict. I can get you a DR diagnosis on a picture of an airplane. It's like, it's not very useful. So look at your data again, and again, and again. Invest in scaffolding, invest in tools. Look at your data, please.
So just a couple of final observations. There's really great opportunities here. But you really have to think about the people and the workflow. So for example, are you building something to help humans? Really think about how the humans are doing their work, and where the operating point is, and how you're going to help them if you're building stuff that's automated. I'm really excited about the kinds of stuff we can do with screening. And again, seeing things that haven't been seen before, can we rewrite the books on pathology?
Data is a challenge. And people are very protective about their data, maybe overly so. There's a lot of effort. If you're doing bio stuff, look at the UK Biobank. There's a lot of resources. Google is publishing lots and lots of data sets. But don't just assume that data is going to be available. But spend some time in the data. And really curate your test set.
There's a great paper out-- again, I linked it-- where we learned Pfam. This is the huge database of protein sequences. And there's about 18,000 identified protein families. And about a million and a half sequences have been characterized by what family they're in. There's another 55 billion or so, where they used HMMs, like the best thing. And we are able to learn it in a model.
We're about to release this. It runs in your browser. It used to be a really slow database look up. We're going to release this thing where you can just run it directly in your browser. It's incredible, absolute state-of-the-art. It gives you a probability distribution across all 18,000 families as to which family it's likely in. So just absolutely sort of incredible tools emerging.
So I think I've mentioned this about correlations and causation. But there might be something there. But look at the data. And again, be careful about confounding signals.
So I often give this talk to lay audiences. And one of the things that I'm really encouraging the business people or even the scientists to do is, learn to ask the right questions about this stuff. You don't necessarily need to be an expert in deep learning. My team has that expertise, even though what we're largely trying to do is apply it. But learn to ask good questions. It's incredibly powerful. And again, just basic questions like, are there confounding factors? But really, even think about the question you're trying to answer.
A very legitimate complaint about a lot of ML in health care is like, someone will find a bunch of data with a bunch of labels. Like I'm going to build a model to predict those labels. It's like, well, that's really cool. But that's not what doctors do. That's not what doctors care about. Again, as a student, maybe you need to get a publication.
But think about the predictions that matter. Think about what's really the right question here. I work with a lot of amazing people. And the more senior people have developed this ability to ask the right questions. So in your careers, please focus on that. So I think I'm done. Thank you very much.