AI for physics & physics for AI
June 25, 2020
May 5, 2020
Max Tegmark, MIT
Abstract: After briefly reviewing how machine learning is becoming ever-more widely used in physics, I explore how ideas and methods from physics can help improve machine learning, focusing on automated discovery of mathematical formulas from data. I present a method for unsupervised learning of equations of motion for objects in raw and optionally distorted unlabeled video. I also describe progress on symbolic regression, i.e., finding a symbolic expression that matches data from an unknown function. Although this problem is likely to be NP-hard in general, functions of practical interest often exhibit symmetries, separability, compositionality and other simplifying properties. In this spirit, we have developed a recursive multidimensional symbolic regression algorithm that combines neural network fitting with a suite of physics-inspired techniques that discover and exploit these simplifying properties, enabling significant improvement of state-of-the-art performance.
PRESENTER: So let me introduce Max Tegmark, our Max, to the weekly CBMM seminar. It's great to have you, Max. And it's a special day because you are here, and because it's your birthday. Happy birthday to you. Everybody should [INAUDIBLE]. There is, let's see, reactions like this.
MAX TEGMARK: Ooh.
PRESENTER: [LAUGHING] Happy birthday.
MAX TEGMARK: Thank you.
PRESENTER: And for people who don't know you, I imagine not many, but Max is a cosmologist, a physicist. He's the director of the Future of Life Institute. He's a very good writer. This book that he wrote, Our Mathematical Universe, is great, highly recommended. And he also likes AI. So being a physicist, you will speak today about physics for AI and the AI for physics, or vice versa. Max.
MAX TEGMARK: Thank you so much. It's a great birthday present for me to get to talk with you about stuff that I'm super excited about and to see you all here. [INAUDIBLE] physics and physics for AI. AI for physics, I mean how physicists can use AI to do physics better. And by physics for AI, I mean kind of the opposite, how physics can hopefully give something back to help machine learning and AI.
We all know, of course, how amazing the progress has been in AI in recent years. Just think about it. Not long ago, robots couldn't walk. Now they can do backflips. Not long ago, we didn't have self-driving cars. Now we have self-flying rockets that can land themselves with AI. Not long ago, AI couldn't do face recognition well. And now it can not only do that, but it can simulate [INAUDIBLE] face saying things he never said.
Not long ago, AI couldn't save lives. Now we have actually quite useful AI, machine learning diagnostics for prostate cancer, lung cancer, eye diseases. We have AI that can win the annual protein folding competition and do such a good job on matching the three-dimensional measured X-ray crystallography shapes that it's probably going to help accelerate drug discovery soon.
Not long ago, AI couldn't beat us at Go. And now it's crushed not only human gamers at Go and at chess, but more interestingly, it's crushed the AI developers who spent decades hand crafting software to play these games, which is all obsolete now by having Google's AlphaZero just play against itself for under a day, right? Thanks to [INAUDIBLE] postdoc [INAUDIBLE] and his gang at DeepMind.
So if all this progress is happening, how can it be used to help physics? In lots of ways, obviously, for example, just right here at MIT in our physics department, I put together this long list of how a significant fraction of my colleagues there are using machine learning to do their physics better. I'll just give you a few examples here to start off.
We have machine learning used right now to detect gravitational waves better, from massive black holes a billion light years away or thereabouts crashing into each other, distorting spacetime by such a small amount that you have to measure to 22 decimal places to see the thing. Machine learning is great for picking up these signals. Other colleagues are using machine learning to detect extrasolar planets in other solar systems. That's part of the reason we have over 4,000 discovered now.
Machine learning is being used to analyze data at the Large Hadron Collider at CERN coming in at these crazy data rates, where you can't have grad students do it. And actually, most of the hardware cost right now for CERN isn't building the magnets and stuff. It's the hardware for running the machine learning.
How can physics pay back its debt and help machine learning a little bit? Well, both through hardware and through software. On the hardware side, for example, my colleague [INAUDIBLE] in the physics department, whom a number of you know, has developed this optical chip for faster machine learning. So it looks like your regular chip, a little black thing.
But instead of the computation being done by electrons moving around in two dimensions, the computation is done by photons moving around at the speed of light. And it turns out that this is incredibly well suited for matrix multiplication, which is, of course, one of the key things we do in our neural networks, and which it can perhaps do at scale about a million times more energy efficiently than today's chips. So I think we'll be seeing a lot of improved hardware that will help a lot.
What about on the software side, algorithm side? What can physics do there for AI? Well, of course, AI still has plenty of challenges, right? Not only that there are tasks we don't know how to do, but also problems. For example, you've all heard about how this machine learning system was deployed across courtrooms in America. But people just hadn't understood well enough how it worked and hadn't realized that it was actually racially biased.
And there are so many other examples of where we just didn't understand our machine systems well enough. And that caused problems. Boeing certainly wishes that they had better understood the very simple automated system that controlled the 737 MAX before they deployed it. And the traders at Knight Capital certainly wish they had better understood their automated trading system before they deployed it. It lost $10 million per minute and kept going for 44 minutes until someone finally caught on and turned it off.
Now, can you raise your hand if you ever had a Yahoo account? See if we see any hands. Yeah. So if you did, you were hacked because all three billion Yahoo accounts were hacked, right? Raise your hand again if you have a credit card? It was probably also hacked because Equifax, all of their credit card information was breached.
With these sorts of problems, the first reaction people tend to have is to say, oh, that has nothing to do with computer science. It's because of the evil hackers that did it. But then we know better. The truth, of course, is that here, too, the fundamental problem was that these systems, these security systems, were not well enough understood by those who deployed them.
They hadn't understood that there were actually loopholes in them that the hackers could exploit, again, showing how there's a lot of value if you can just get things you understand better. I'm particularly excited that you, Tommy, are here today because I love how you, Tommy, draw the distinction between the engineering of intelligence on one hand and the science of intelligence on the other hand. And you like to point out that the engineering of intelligence is very much about just trying to make things work, at least work well enough that you can make money off of them.
Whereas the science of intelligence aims not just to make it work, but to ask why does it work, how does it work, at a deeper level? And I think this is really the key to addressing the kind of challenges I mentioned to get a deeper understanding of your systems. And my research group-- I'm so grateful that I get to be a part of the CBMM.
Here's what some of my students looked like before the lockdown. And below you can see what they look like after the lockdown, where they all have rectangles around their heads. But we focus on what I like to call intelligible intelligence, which is exactly this idea that the more you can understand how your machine learning system actually works, the more reason you might have to trust it.
I say intelligible intelligence, not explainability, because this is a more ambitious goal. I'm not talking about a system that can say some blah, blah, blah, to you in human terms and explain why it diagnosed you with cancer. I'm talking about being able to understand things at a deeper level so you actually have reason to trust it because you understand it. It's an ambitious goal.
I'm going to tell you today about four projects at the interface between physics and AI that all have bearing on this. So let's start off with this one, which is a paper together with a former grad student, Tailin Wu, who is at Stanford now. And if you're here Tailin, hello.
So to motivate this, let's start by taking a machine learning task which is very easy these days, where you just train the DQN network from Google DeepMind to play this Atari game, which many of you have played as kids. In the beginning, it sucks. It keeps missing the ball all the time.
But pretty quickly, it gets quite good, catches the ball every time, plays, and discovers this trick that you should always put-- you should make a hole and then always keep putting the ball up into that hole in the corner and just rack up the points. This feels intelligent, right? But how does that actually work?
Is this intelligible? Can we, for example, trust that it's always going to work this well? Well, in this case, I can show you exactly how it works because we just trained this network on one of the computers at MIT. And this is how it works.
It takes the pixel values that give the colors of all the pixels on the screen, multiplies them by a big matrix, applies some nonlinear transformations, more matrices, et cetera, et cetera. We know all of these 867,488 parameters. Is it crystal clear now how it works?
No, this is completely useless as an explanation, right? If this were instead some very mission critical software that was controlling the vision system of my self-driving car or whatever, I would have absolutely no guarantees because I just have no clue how this is actually working. So can you do better? I think yes.
And I want to start by just dispelling a myth that some people seem to have internalized, that the power of machine learning somehow comes from its mysterious inscrutability. I think this is complete nonsense. But some people seem sort of resigned to the idea that we can only get this great power because the secret sauce is somehow related to that inscrutability.
I think rather that the power of deep learning comes from its differentiability, by which I mean every single choice of parameters in a neural network still does something. You can still take the gradient. And you can, therefore, get information about how you should change the parameters. So you can quickly get to the right place in an exponentially large search space instead of just guessing at random.
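As a rough illustration of the differentiability point, here is a minimal sketch that compares following the gradient with guessing parameters at random, using the same number of loss evaluations. The quadratic loss, the 50 dimensions, the step size and the budget of 100 evaluations are all assumptions chosen for the example; nothing here comes from the talk's own code.

```python
import numpy as np

rng = np.random.default_rng(0)
target = rng.normal(size=50)  # hypothetical "right place" in a 50-dimensional parameter space

def loss(w):
    """Squared distance from the target parameters."""
    return float(np.sum((w - target) ** 2))

# Gradient descent: every step uses the gradient to decide how to change the parameters.
w = np.zeros(50)
for _ in range(100):
    grad = 2.0 * (w - target)
    w -= 0.1 * grad
gd_loss = loss(w)

# Random guessing with the same budget of 100 tries, keeping the best guess.
random_loss = min(loss(rng.normal(size=50)) for _ in range(100))

print(f"gradient descent loss: {gd_loss:.3e}")
print(f"best random guess loss: {random_loss:.3e}")
```

The gap between the two numbers grows quickly with the number of parameters, which is the sense in which gradients help in a very large search space.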
And that opens up the possibility that you could maybe have something that does just as well but is much simpler. So I'm going to do a little test of your own neural networks by just showing you something more complicated than that ball in the Breakout game. And let's make this a little bit interactive.
Try to predict where the ball's going to go next. And whenever you see any sort of pattern with your brain, shout it out so we can all hear it. Don't forget to unmute yourself. Start by the most obvious things, just don't overthink this. What-- is there any regularity at all that you see here?
AUDIENCE: Well, sometimes it bounces off a wall. But then there are other times when it just turns around right before it hits the wall. So this is the confusing part.
MAX TEGMARK: Very good. It seems to bounce sometimes against walls. So there seems to be walls. But where are the walls? Are they just in a triangle shape, or is it a circular wall? Or where are the walls, would you say?
AUDIENCE: I think you're tracing the inside of a [INAUDIBLE] digit.
MAX TEGMARK: [LAUGHING] That's an interesting idea. Can anyone say anything about any kind of wall, where it might be, or what its shape is?
AUDIENCE: It seems to be a square, a rectangle.
MAX TEGMARK: It seems to be a rectangular wall, yeah. So when it hits it, it seems to bounce. And then when it's not bouncing, how is it moving? Is it always moving in a straight line? Or is it like there's some sort of force acting on it or what?
MAX TEGMARK: Does it seem like the law, that whatever the force is, that it's the same force everywhere? Or could there be different kinds of forces on different parts of the screen?
AUDIENCE: It could be held by an elastic band.
MAX TEGMARK: Yeah, maybe it's held by an elastic band so that it's doing like a harmonic oscillator, sinusoidally oscillating, at least on some part of the screen but maybe not all parts.
AUDIENCE: It looks like a gravitation [INAUDIBLE].
MAX TEGMARK: Oh, this is really cool. So you were saying that another part-- one of you said that in one part of the screen it looks like it's doing harmonic motion. And then another part of the screen it looks like it's gravity doing [INAUDIBLE].
AUDIENCE: Yeah, like a slingshot maneuver, it spins out.
MAX TEGMARK: Yeah. So your neural networks are really great, right? You're not just telling me a vast number of parameters. You're giving me some real insights as to what's going on here.
So what happens if we just throw a simple feed-forward neural network at this and train it to just predict the next position from the past two positions, for example, minimizing the loss? It can do a pretty good job of predicting it. But if you try to predict far into the future, it starts sucking more and more.
What intuition does that give us? What understanding does it give us? Nothing, basically. Here are the parameters we got when we trained a neural network to predict this. So how can we do better?
So what Tailin and I did was we borrowed four very old ideas that have been successful in physics and deployed a machine learning version of them. So let me talk about these four ideas one at a time. Maya, there's a squirrel attacking the bird feeder here if you want to try and chase it away. Its neural network is very smart, and it's been looking at this bird feeder for days and actually [INAUDIBLE].
So the first one is Occam's razor. In physics, if we have a simpler explanation that's just as accurate as a more complicated one, we tend to prefer that one. And Ray Solomonoff put this principle on a firm mathematical footing with complexity theory, together with [INAUDIBLE] and [INAUDIBLE] and other greats.
The only problem is that their definition of simple-- or of complexity-- is NP-hard to evaluate, generally. So what Tailin and I did was we figured physics got a pretty long way, made a lot of progress using Occam's razor, even though it was a little bit vague and fuzzy in how we defined it. So maybe we could, too.
So we defined a much simpler complexity criterion that's very fast to evaluate. We said that if you have an integer, the amount of bits of information you need to store it is just how many digits long it is in binary. So basically you take the log of the integer. If it's a rational number, it's just the two integers, the complexity of the numerator plus the complexity of the denominator. If it's a real number, well, you can convert it to an integer by dividing by the precision floor of your CPU and then take the logarithm of it.
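A minimal sketch of the complexity measure just described, written from the spoken description only. The exact form `log2(1 + |n|)` (so that zero costs zero bits), the function names, and the precision floor `EPS = 2**-32` are assumptions for illustration, not necessarily the definitions used in the paper.

```python
import math
from fractions import Fraction

EPS = 2.0 ** -32  # assumed precision floor; the talk does not give a value

def integer_bits(n):
    """Roughly the number of binary digits needed to store the integer n."""
    return math.log2(1 + abs(n))

def rational_bits(q):
    """Complexity of a fraction: numerator bits plus denominator bits."""
    return integer_bits(q.numerator) + integer_bits(q.denominator)

def real_bits(r, eps=EPS):
    """Turn a real number into an integer count of precision steps, then take the log."""
    return math.log2(1 + abs(r) / eps)

print(integer_bits(7))                # 3.0
print(rational_bits(Fraction(5, 3)))  # about 4.6 bits
print(real_bits(5 / 3))               # about 32.7 bits with this EPS
```

With these assumed definitions, the exact fraction 5/3 costs far fewer bits than a generic real number of the same size, which is the effect the plot is meant to show.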
So if you now want to know-- make a plot of how complicated, how complex is a number, if you looked at the diagram here, you get this very, very interesting thing on the y-axis. If you take generic real numbers, the complexity just grows as a logarithm of it. That's the thick red line there, which would scale up and down [INAUDIBLE] you change the precision floor.
But of course, if it happens to be exactly 5/3, then it's certainly much simpler. And generally, fractions with small numerators and denominators are simpler. And you'll see later on that the code we have will automatically try to minimize the complexity and discover simple fractions when they're there.
If we have a lot of data, if we have a model with many parameters, then we define the whole complexity as just the sum of the complexity for all the parameters in it. But we also look at the complexity of the data. So if you just have a really lousy model that always predicts zero or something, then you just have to store the whole data set.
If you have a model that predicts the data set pretty well, then you only have to store the errors that you make in your predictions after you apply the model. So we sum up the total complexity of everything. And if you look at how this plays out-- I just want to give a little shout-out to this very simple measure of complexity of numbers because it actually automatically gives you a much more robust method of fitting data than chi squared or minimizing the mean squared error, as you can see here.
Because if you have one bad data point, as in the left side, the mean squared error will give a lot of weight to those points that are quite far from the model. And it will always compromise and pull away from the good data points towards the bad data points a little bit. Whereas this other information theory-based method, it has the opposite incentive. It has an incentive to just keep doing even more accurately on the things it can already do accurately and ignore the rest. So it'll just fit perfectly on the stuff, which is good.
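The robustness claim can be checked on a toy case. The sketch below fits a single constant to five data points, one of which is bad, once by minimizing the mean squared error and once by minimizing the number of bits needed to store the errors. The data, the precision floor `EPS`, the `log2(1 + |error| / EPS)` cost and the grid search are all assumptions made for this example.

```python
import numpy as np

EPS = 1e-3  # assumed precision floor

def description_length(errors, eps=EPS):
    """Total bits needed to store the prediction errors."""
    return float(np.sum(np.log2(1.0 + np.abs(errors) / eps)))

data = np.array([1.0, 1.0, 1.0, 1.0, 10.0])  # four good points and one bad one
candidates = np.linspace(0.0, 11.0, 1101)    # candidate constant models, step 0.01

mse_fit = candidates[np.argmin([np.mean((data - c) ** 2) for c in candidates])]
dl_fit = candidates[np.argmin([description_length(data - c) for c in candidates])]

print(f"mean squared error fit:  {mse_fit:.2f}")  # pulled toward the bad point
print(f"description length fit:  {dl_fit:.2f}")   # stays on the good points
```

The mean squared error lands on the average of all five points, while the description-length fit sits exactly on the four good points and pays for the bad one separately.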
A second idea from physics that we throw into the mix here is one that goes back to Julius Caesar, to divide and conquer. There is this story that when Galileo was sitting in church 400 years ago, maybe being a little bit bored by the sermon, he noticed that the chandelier was swinging and tried to model this.
He didn't try to make a model to predict everything about our universe at the same time. He ignored what the priest was saying. He ignored the color of the chandelier, everything else, and just focused on the angle of the chandelier as a function of time and tried to predict that using his pulse. And when he did this he revolutionized our understanding of mechanics in physics.
So in the same spirit, what we do here is instead of trying to create one big model that's going to predict everything, we put in an ensemble of models, each of which is incentivized to try to specialize and do well on some aspect of the data, for example, in the case of the ball you saw, maybe on some part of the screen. And we basically take the harmonic mean of how well the different models do, and we can prove that this encourages specialization. You're going to see how it works.
We also use lifelong learning. A human physicist doesn't have to invent everything from scratch every time they see a new problem. Similarly, we put this AI physicist in a series of environments, and it could use what it had learned previously and unify and apply it to the new ones.
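To see why a harmonic mean rewards specialization, here is a small sketch with two hypothetical models and two data points. The loss values are made up, and taking the harmonic mean per data point across models is an assumption about how the combination is done; the talk does not spell out the exact loss.

```python
import numpy as np

def harmonic_mean_loss(per_model_losses, tiny=1e-12):
    """Harmonic mean across models (axis 0) for each data point (axis 1)."""
    losses = np.asarray(per_model_losses, dtype=float) + tiny
    return losses.shape[0] / np.sum(1.0 / losses, axis=0)

# Rows are models, columns are data points (made-up numbers).
losses = np.array([
    [0.01, 100.0],  # model A: excellent on point 0, terrible on point 1
    [100.0, 0.01],  # model B: the opposite
])

combined = harmonic_mean_loss(losses)
arithmetic = losses.mean(axis=0)

print("harmonic mean per point:  ", combined)    # about 0.02 for both points
print("arithmetic mean per point:", arithmetic)  # about 50 for both points
```

The harmonic mean is dominated by whichever model does best on each point, so a model is not punished for being terrible outside its own domain, whereas an arithmetic mean would push every model to be mediocre everywhere.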
So let me just show what happened. So we take this kind of data, and we feed it into the computer like this. We just give x- and y-coordinates of the points, and off it goes.
And after a bunch of training, it discovered entirely by itself that there seemed to be four different domains where the rules were different. We didn't tell it that they were supposed to be four domains. We didn't tell it where the boundaries were, either. It just learned that.
And within each domain, it was able to do quite well. And we can already see now, as human physicists, that you guys were all right. Whoever it was who said it looks like a harmonic oscillator, you were right. That's what it was doing in the upper left.
I didn't catch who said gravity, but you were right in the lower-left corner here. There was an electromagnetic field on the lower right. And there was no force in the upper right. So if you look more specifically now, for example, in the lower left, what it has learned, when we use Occam's razor to simplify down the neural network, is that to predict the x- and y-coordinate of the next step of where it's going to be, you take the previous x- and y-coordinates, and you multiply by this matrix in the upper-left corner. And then you add this vector constant also.
Now, when you're a human, though, and you look at this, if you see it say 1.999990, what do you think? What is this? What's your gut reaction? What is it trying to tell you?
AUDIENCE: Floating point error.
MAX TEGMARK: It's probably trying to tell you that's really supposed to be 2, right? But fortunately, we have this quantitative information theory framework to test whether it actually fits better if it is 2. And it discovered, sure enough, that that should be a 2, as you can see on the next line. But it also similarly discovers that the small numbers on the right, they're not supposed to be replaced by 0.
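The snapping step can be sketched with Python's standard `fractions` module. In the talk, the decision to replace 1.999990 by 2 is made by checking whether the total description length goes down; the sketch below swaps in a much cruder stand-in, a fixed tolerance, so the function name `snap_to_simple_fraction`, the `max_denominator` of 10 and the tolerance of 1e-4 are all assumptions.

```python
from fractions import Fraction

def snap_to_simple_fraction(value, max_denominator=10, tol=1e-4):
    """Return a nearby simple fraction, or None if nothing simple is close enough.

    A fixed tolerance stands in for the information-theoretic test described in the talk.
    """
    candidate = Fraction(value).limit_denominator(max_denominator)
    if abs(value - float(candidate)) < tol:
        return candidate
    return None

print(snap_to_simple_fraction(1.99999))    # 2
print(snap_to_simple_fraction(1.6666667))  # 5/3
print(snap_to_simple_fraction(0.0123))     # None: small, but not meant to be 0
```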
And if you just get rid of the matrices and just write out what this is doing, it's discovered a difference equation, which we know how to transform into a differential equation, which we recognize as Newton's laws, which it then auto-discovered here. We ran this on 100 different worlds like this, with different domains and different laws. And in the summary here, [INAUDIBLE] performed about a billion times better than just the simple neural network in terms of accuracy. And it was also able to learn with a lot less data and a lot faster.
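As an illustration of how a difference equation encodes Newton's laws, the sketch below simulates one coordinate of a ball under constant acceleration and fits the next position as a linear function of the previous two positions plus a constant. Ordinary least squares stands in for the simplified neural network, and the time step, acceleration and initial conditions are assumed values.

```python
import numpy as np

dt = 0.1   # assumed time step
g = -9.8   # assumed constant acceleration
t = np.arange(50) * dt
y = 3.0 + 2.0 * t + 0.5 * g * t**2  # exact trajectory under constant acceleration

# Fit y[k+1] = a * y[k] + b * y[k-1] + c by least squares.
design = np.column_stack([y[1:-1], y[:-2], np.ones(len(y) - 2)])
coef, *_ = np.linalg.lstsq(design, y[2:], rcond=None)
a, b, c = coef

# y[k+1] - 2*y[k] + y[k-1] = c is a discrete second derivative, so c / dt**2 is the acceleration.
accel = c / dt**2

print(f"a = {a:.6f}, b = {b:.6f}, c = {c:.6f}")
print(f"recovered acceleration = {accel:.4f}")
```

The fitted coefficients come out as 2 and -1, which is the kind of 1.999990 the snapping step is meant to clean up, and the constant term carries the force.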
So intelligibility isn't nice just for helping gain trust in what you've learned, but it can also really aid performance. So let me give you another example now of some of these physics tools applied. Here, the physics formulas that are discovered were kind of simple, right? So together with Silviu-Marian Udrescu-- salut, Silviu. He's also on the call here.
SILVIU-MARIAN UDRESCU: Hi.
MAX TEGMARK: We decided to see if we could do tougher, harder ones. This is a paper that we just published in Science Advances a couple of weeks ago. And symbolic regression, what is that? Well, as Josh Tenenbaum and others here who work on it can tell you, it's simply the challenge of taking a bunch of data and discovering a formula that fits it well.
For example, Johannes Kepler spent four years looking at data like this from measurements of Mars until he discovered that it was an ellipse. Wouldn't it be nice if one could do that automatically? If the function is linear, then, of course, linear regression is so easy. We do it all the time, even if it's a function of many variables.
But if it's an arbitrary function, this is known to be NP-hard for the simple reason that there are exponentially many possible formulas that you could do. If you'd just make a list of all formulas from the simplest to gradually more complicated, by the time you get to even relatively simple ones, you might have waited a million years or longer than the age of the universe to get to the Planck blackbody formula. So that's obviously not good enough.
So in response to that, there's been a lot of nice work. Hod Lipson had the best symbolic regression software to date when we started our project. They used a genetic algorithm. But we decided to see if physics could help.
So we had this vision that a lot of the problems we actually look at, even if the random formula is NP-hard to solve, [INAUDIBLE] special properties. So [INAUDIBLE] for example, has emphasized that most formulas we care about actually are compositional. Even if it's a formula of nine variables, you can usually rewrite that as a bunch of combinations of functions of fewer variables, often two variables or less.
We often have symmetries as well, or separability, where maybe the function of eight variables is just one function of three times one function of the other five. They also tend to be smooth functions that neural networks can do well on. So how can we combine these ingredients to solve the problem better?
To build a test set, we took the 100 most famous or complicated equations out of the Feynman Lectures on Physics, stuff like you see here. And for each one of them, we made a big table of numbers, which is the starting point for the software to deal with. So you put one column for each input variable, give them random values. And the last column is what the formula evaluates to.
Your task is to look at the table and find the formula. Linear regression isn't good enough because these are not linear functions. So you see Julius Caesar here. That's because we tried the divide-and-conquer strategy again, using some of these physics ideas to see if we could break them into simpler ones.
For example, we would train a neural network to fit the function really well while still having no clue as to what the function actually was. And then we do experiments on the neural network to test if it had any of these simplifying properties. For example, if the neural network discovered that actually the only way it depends on column two and three is by the ratio of the two, then we would replace column two and column three by one column, which was the ratio of them, and then restart the software on a data file with one column less.
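A minimal sketch of this kind of experiment follows. To keep it self-contained, a known function stands in for the trained neural network; the hidden formula, the function names, the scale factors and the tolerance are all assumptions, and a real network would only pass such a test approximately.

```python
import numpy as np

def fitted_model(x):
    """Stand-in for a trained network; the hidden formula is x0 * sin(x1 / x2)."""
    return x[:, 0] * np.sin(x[:, 1] / x[:, 2])

def depends_only_on_ratio(f, x, i, j, scales=(0.5, 2.0, 3.0), tol=1e-8):
    """Scale columns i and j by the same factor; a pure ratio dependence leaves f unchanged."""
    base = f(x)
    for s in scales:
        scaled = x.copy()
        scaled[:, i] *= s
        scaled[:, j] *= s
        if np.max(np.abs(f(scaled) - base)) > tol:
            return False
    return True

rng = np.random.default_rng(0)
x = rng.uniform(1.0, 5.0, size=(1000, 3))

ratio_12 = depends_only_on_ratio(fitted_model, x, 1, 2)  # True
ratio_01 = depends_only_on_ratio(fitted_model, x, 0, 1)  # False

# If the test passes, replace the two columns by their ratio: one column less.
reduced_table = np.column_stack([x[:, 0], x[:, 1] / x[:, 2], fitted_model(x)])
print(ratio_12, ratio_01, reduced_table.shape)
```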
Similarly, if it discovered that they were separable, then you could replace this by two problems, both of which have fewer variables. And this turned out to be very helpful because the basic reason symbolic regression is so hard is because of the curse of dimensionality, where problems get exponentially worse with the number of variables. I'll skip over the details. You can ask me later.
But if you look, for example, at the function here at the bottom, it's the optics formula, right? You can see this somewhat messy expression is separable. It's a function of theta-- sorry, a function of phi times a function of delta and n.
It can train a neural network, and then it can discover that. And now it breaks it apart into two problems. And we have this recursive loop, which keeps going until the individual parts get so easy that a brute force search or a polynomial fitting or something like that can zap it.
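One way to test for this kind of separability is sketched below. If f(x, y) = g(x) h(y), then f(x1, y1) f(x2, y2) = f(x1, y2) f(x2, y1) for any two sample points, so swapping one group of columns between two sets of samples leaves the product unchanged. A known function again stands in for the trained network, and the formula, names and tolerance are assumptions.

```python
import numpy as np

def fitted_model(x):
    """Stand-in for a trained network; the hidden formula is (1 + x0^2) * exp(-x1 * x2)."""
    return (1.0 + x[:, 0] ** 2) * np.exp(-x[:, 1] * x[:, 2])

def is_multiplicatively_separable(f, xa, xb, split, tol=1e-8):
    """Check whether f factors into (columns before split) times (columns from split on)."""
    a_then_b = np.hstack([xa[:, :split], xb[:, split:]])
    b_then_a = np.hstack([xb[:, :split], xa[:, split:]])
    lhs = f(xa) * f(xb)
    rhs = f(a_then_b) * f(b_then_a)
    return bool(np.max(np.abs(lhs - rhs)) <= tol * np.max(np.abs(lhs)))

rng = np.random.default_rng(0)
xa = rng.uniform(0.0, 2.0, size=(1000, 3))
xb = rng.uniform(0.0, 2.0, size=(1000, 3))

sep_first = is_multiplicatively_separable(fitted_model, xa, xb, split=1)   # True
sep_second = is_multiplicatively_separable(fitted_model, xa, xb, split=2)  # False
print(sep_first, sep_second)
```

When the test passes, the two groups of columns can be handed to the recursive loop as two smaller problems.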
So as an example, look at this problem here of nine variables. It's Newton's law of gravity. If it's faced with a table with all these columns, it does the dimensional analysis first, so it can reduce the number of variables a little bit. And then it discovers, oh, translational symmetry. It only depends on C and B in the upper-right corner here by their difference.
So it can eliminate one column. Now it has one variable less. Then it discovers that it only depends on E and F by the distance, one more simplification done. And then it discovers this is separable. So it can factor this into two mysteries, et cetera, until it can solve the whole thing.
And the results of this, we were actually quite happy about, because this really nice Eureqa software that I mentioned from [INAUDIBLE] Lipson, in addition to costing money, could only do 71 out of the 100 mysteries that we threw at it. Our code that Silviu did a heroic job on, which you can find on GitHub for free, solves all 100 of them. Then we decided to see if we could break it by going back to our physics books, like graduate textbooks, and pulling out even more complicated equations, like these ones.
And it still solved-- so this time, the Eureqa software failed on 17 out of 20. It could only do three. Whereas our code still solved 18 out of 20. And we have a new version of it now which can do even better.
And this is an example, the way I think about it, actually, of data compression, lossy data compression more broadly. I actually think of all of physics as, in a sense, being lossy data compression because we walk around in the world, beautiful, sunny day. And we almost immediately throw away almost all the information that comes into our senses and keep only the part that's really useful for us for predicting the future.
If you have a table of numbers, like we gave AI Feynman, and you run [INAUDIBLE] minus 9 on it, for example, it'll just take up less space on your hard drive, right? If you were to have discovered that the ninth column is some function of the other eight, then you can compress it even better, right? And the more you can compress things, then, in a sense, the more useful your formula is.
And if you take this information theory point of view of what we've done, here is what came out of AI Feynman's inner workings in the process of tackling the particular mystery of figuring out the kinetic energy formula in special relativity. You can see it in the lower right in its full glory. And there's a trade off when you do data compression between how much information you retain about what's useful, the opposite of being lossy, this trade off between inaccuracy on the y-axis, on one hand, and complexity on the x-axis.
So you can get very low inaccuracy by having the full formula. The opposite extreme, it could predict something super simple, like always predict that the kinetic energy is 0. Now you get a huge loss, a huge inaccuracy. But the complexity's very low. That's the upper-left corner here, right?
Now, what do we humans tend to value? We tend to value things which do pretty well on both of these criteria, right? Occam's razor says simple is good. But we also want accurate. And you notice there's one other point on this frontier, [INAUDIBLE] frontier, which is in a corner, where it does pretty well on both complexity and accuracy. Which one is that? A big shout out here if you have a suggestion.
AUDIENCE: MV squared over 2?
MAX TEGMARK: Yes! And we were very excited about this because we had never taught this AI anything about high school physics and that particular approximation to kinetic energy that we humans found useful. It, all on its own, decided that MV squared over 2 is a really useful approximation for kinetic energy just from looking at the data, because it can get a lot of accuracy with much less complexity than the full formula.
So I'm quite interested in using this as a tool not just for discovering the exact laws, but also for discovering really useful approximate formulas for things across science. And thinking about science as data compression again more broadly, this segues into a third project I want to tell you about just very, very briefly. Suppose-- this is a paper that I also wrote with Tailin Wu, very related to what [INAUDIBLE] spoke about in his recent CBMM talk.
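The three points on that frontier can be reproduced numerically. The sketch below compares always predicting 0, the approximation m v^2 / 2, and the full relativistic formula m c^2 (1 / sqrt(1 - v^2 / c^2) - 1) on random data. The sampling ranges are assumptions, and the length of the formula string is used as a very crude stand-in for complexity; it is not the measure used in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)
m = rng.uniform(1.0, 5.0, 10000)
c = rng.uniform(1.0, 2.0, 10000)
v = c * rng.uniform(0.0, 0.5, 10000)  # speeds up to half the speed of light (assumed range)

exact = m * c**2 * (1.0 / np.sqrt(1.0 - (v / c) ** 2) - 1.0)

candidates = {
    "0": np.zeros_like(exact),
    "m*v**2/2": 0.5 * m * v**2,
    "m*c**2*(1/sqrt(1-v**2/c**2)-1)": exact,
}

rms_errors = {}
for name, prediction in candidates.items():
    rms_errors[name] = float(np.sqrt(np.mean((prediction - exact) ** 2)))
    print(f"complexity proxy {len(name):2d}   rms error {rms_errors[name]:.4f}   {name}")
```

The middle candidate removes most of the error of the trivial model at a fraction of the length of the full formula, which is what puts it in the corner of the frontier.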
Suppose I have a bunch of cats and dogs here. And your task is to classify these pictures, whether they're cats or dogs. And suppose I tell you that you have to do some sort of lossy data compression. Instead of sending the whole picture into the classifier, you have to just do a clustering and divide all the pictures into groups, say, three groups, for example. And you can only tell me whether the image is in group 1, 2, or 3. And based on that integer, you now have to predict whether it's a cat or dog.
And now what's the best grouping to do, right? Should you just-- it's not so obvious. And there's a lot of interesting literature about what's the optimal number of groups to have and what they should be and so on. So we were actually very excited that we were able to solve exactly this problem for a special case, where it's binary classification, like cats versus dogs. We ran examples also for MNIST for two different digits, sevens and ones, which are easy to confuse, and for Fashion-MNIST.
And without getting into detail, what we found was that the trade off between how simple things are-- it's always simpler if you have few groups or lower entropy in your compressed data set. So farther to the right, in this case, is simple-- the trade off between that and how much information you retain about whether it's a cat or dog. In this case, you can see [INAUDIBLE] curve as the one we're talking about.
If you look at the original images, there's 0.7 bits of mutual information, not 1, because it's kind of hard. But if you only do two clusters, one with the ones where you guess it's a cat and the other cluster where you guess it's a dog, the mutual information between your best guess there and whether it is a cat or dog is only 0.6 bits. You can do a lot better with 0.7.
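The quantity being traded off here, the mutual information between the group number and the cat/dog label, can be computed directly from a table of counts. The sketch below is a toy illustration with made-up counts, not the data or method from the paper. It shows only that merging groups can never increase the information retained about the label.

```python
import math


def mutual_information_bits(joint_counts):
    """I(Z;Y) in bits, where joint_counts[z][y] counts items in group z with label y."""
    total = sum(sum(row) for row in joint_counts)
    p_z = [sum(row) / total for row in joint_counts]
    n_labels = len(joint_counts[0])
    p_y = [sum(row[y] for row in joint_counts) / total for y in range(n_labels)]
    info = 0.0
    for z, row in enumerate(joint_counts):
        for y, count in enumerate(row):
            if count > 0:  # zero cells contribute nothing
                p_zy = count / total
                info += p_zy * math.log2(p_zy / (p_z[z] * p_y[y]))
    return info


def merge_clusters(joint_counts, a, b):
    """Return a new table in which groups a and b have been merged into one."""
    merged = [x + y for x, y in zip(joint_counts[a], joint_counts[b])]
    rest = [row for i, row in enumerate(joint_counts) if i not in (a, b)]
    return rest + [merged]


# Hypothetical counts: rows are groups, columns are (cat, dog).
three_groups = [[45, 5], [25, 25], [5, 45]]
two_groups = merge_clusters(three_groups, 1, 2)

if __name__ == "__main__":
    print("3 groups:", mutual_information_bits(three_groups))
    print("2 groups:", mutual_information_bits(two_groups))
```

A perfectly separating grouping of a balanced two-class data set would retain exactly 1 bit, and a grouping that is independent of the label would retain 0 bits.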
And a fun physics link here is that if you look at these corners, we saw in the plot I just showed you before with the kinetic energy, that the things that we humans find most interesting are in these corners, where you do unusually well on both simplicity and accuracy. The corners here, which also pop out, they actually correspond to phase transitions in the machine learning, where you have these bifurcations going from two classes to three classes to four classes and so on.
And a significant fraction of all papers in physics are about phase transitions. And I would love to chat with many of you afterwards, if you have ideas, because I just have a sense that there's a lot more fruitful work to be done by linking up phase transitions and machine learning with stuff that's been studied in physics.
The very last thing I want to leave you with is coming back to this discovery of physics equations from looking at moving things, and I'll start by confessing that even though Tailin and I were kind of excited about the AI physics part, we felt we kind of cheated, because the truth is we had these pictures of the moving dots, right? But we didn't send in the image. We sent in the x- and y-coordinates.
And there was a lot of human intelligence that went into figuring out that you should measure the x- and y-coordinates of the moving dot. Wouldn't it be much nicer if you could skip that step and just start with the raw video? Suppose you just had a video of something moving like this on some weird background.
Any compliments about the artistry of this code. It was Silviu-Marian Udrescu who made it. Wouldn't it be cool if you could just send in the raw video, and it would discover that it should map this and data compress this, map it into some latent space, which actually involves measuring the x- and y-coordinate, all by itself? This might sound-- And then you could study that latent space and try to figure out what the actual equations are.
This is actually a much harder problem than you might think. Because if you can solve it, then you will automatically be able to solve it for any kind of image. The way you phrase the problem, it doesn't even matter that it's an image. We just send in a vector of 10,000 numbers or whatever, and it has to figure out a way of mapping that 10,000-dimensional space into 2-dimensional space.
It should also be able to solve it if you're looking, for example, at the image through a strange distorting lens like this. In which case, what you want is not at all to measure the x- and y-coordinate of where it is, right? You would like the mapping that the machine learning discovers to undistort the image so we actually get back the actual useful coordinates of where things are.
This is what we humans do every day when you walk through the world because the stereoscopic projection means that what we see around us is actually kind of distorted. We don't just get x, y, z, right? So Silviu-Marian Udrescu worked very hard on this.
We had a very simple architecture. We send in these video frames. We have an autoencoder that maps it into a latent space and makes sure it can map it back. And then we have a time evolution neural network that tries to go from the latent space to the next step, [INAUDIBLE] the last two to the next.
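The structure just described can be sketched as a single loss with two parts: a reconstruction term for the autoencoder and a prediction term for the time-evolution step. This is a minimal Python sketch under stated assumptions, not the authors' implementation. The encoder, decoder and evolution step are passed in as plain functions (in the real system they would be trained neural networks). The equal weighting of the two terms and the choice to measure the prediction error in latent space are my assumptions.

```python
def mse(a, b):
    """Mean squared error between two equal-length lists of numbers."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def training_loss(frames, encode, decode, evolve):
    """Reconstruction loss plus next-step prediction loss for a frame sequence.

    encode: frame -> latent vector
    decode: latent vector -> frame
    evolve: (latent at t-2, latent at t-1) -> predicted latent at t
    """
    latents = [encode(f) for f in frames]
    recon = sum(mse(decode(z), f) for z, f in zip(latents, frames)) / len(frames)
    preds = [evolve(latents[t - 2], latents[t - 1]) for t in range(2, len(frames))]
    pred = sum(mse(p, z) for p, z in zip(preds, latents[2:])) / len(preds)
    return recon + pred


# Toy stand-ins: each "frame" is already a 2-vector moving at constant velocity.
frames = [[float(t), 2.0 * t] for t in range(5)]
identity = lambda v: list(v)
constant_velocity = lambda z0, z1: [2 * b - a for a, b in zip(z0, z1)]
stand_still = lambda z0, z1: list(z1)

if __name__ == "__main__":
    print(training_loss(frames, identity, identity, constant_velocity))
    print(training_loss(frames, identity, identity, stand_still))
```

Using the last two latent vectors lets the evolution step infer a velocity, which a single latent position cannot provide.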
And the big challenge is here we want to bring in Occam again with his razor and ask, what kind of latent space will give us the simplest laws of physics for how the latent space evolves? So we thought a lot about how can we define simple in a differentiable way that we can train? And we started thinking about this guy, Einstein.
So if you have a mapping that a neural network does from its input to its output, you might use Einstein's work on curved spaces to say, maybe a space-- a map-- your neural network is simpler if it doesn't curve [INAUDIBLE] so much if this mapping has very small Riemann tensor components, for instance.
And you can code this up. But it's numerically very painful because you have to take many derivatives-- many gradients, third derivatives or higher, and so on. And then we realized that there's a much simpler thing you can actually do, which is if you just put a penalty on the actual derivative of the gradient, so you're encouraging the gradient of the neural network to be constant, if the gradient of the neural network is constant, then, as you can see from the second Einstein equation here, the curvature is always 0, and it's nice and simple.
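A penalty on "the derivative of the gradient" can be illustrated as a penalty on second derivatives of the mapping. The sketch below estimates those second derivatives with central finite differences, which is a stand-in chosen for simplicity (a trainable version would more plausibly use automatic differentiation). The function name and the step size `h` are hypothetical choices, not taken from the talk.

```python
def second_derivative_penalty(f, points, h=1e-2):
    """Mean squared second partial derivative of f over the given points.

    f maps a list of n numbers to a list of numbers. An affine map has a
    constant gradient, so its penalty is zero up to rounding error.
    """
    total = 0.0
    count = 0
    for x in points:
        n = len(x)
        for i in range(n):
            for j in range(n):
                def shifted(si, sj):
                    y = list(x)
                    y[i] += si * h
                    y[j] += sj * h
                    return f(y)
                pp, pm = shifted(1, 1), shifted(1, -1)
                mp, mm = shifted(-1, 1), shifted(-1, -1)
                # Central difference estimate of d^2 f / (dx_i dx_j).
                for a, b, c, d in zip(pp, pm, mp, mm):
                    total += ((a - b - c + d) / (4 * h * h)) ** 2
                    count += 1
    return total / count


def affine_map(x):
    return [2 * x[0] + 3 * x[1] + 1, -x[0] + 0.5 * x[1]]


def curved_map(x):
    return [x[0] ** 2, x[0] * x[1]]


sample_points = [[0.0, 0.0], [1.0, -1.0], [0.5, 2.0]]

if __name__ == "__main__":
    print(second_derivative_penalty(affine_map, sample_points))
    print(second_derivative_penalty(curved_map, sample_points))
```

Adding a term like this to the training loss pushes the learned mapping toward one with a constant gradient, which is the simplicity measure described above.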
So we decided to try that first. And remarkably, it actually worked great. In the beginning, we had a lot of problems, where it would discover weird latent spaces that looked like a cat, even though the line was supposed to be parallel and so on. I could tell you more about amusing reasons this happened.
But eventually, Silviu persevered and was able to discover nice simple latent spaces for everything, even though we were distorting it in some cases with this weird lens so that original-- if it had just taken x- and y-coordinates of the image, it would have gotten some really weird latent space, where the laws of motion were super complicated. It discovered instead the undistorted motion. And I can either stop right now because I've been going for 41 minutes, or I can take 3 more minutes and tell you a little bit about how I feel this is exciting also for physics, again. Which do you prefer?
PRESENTER: Oh, for another three minutes.
MAX TEGMARK: OK, I will. I will do that. So if I put on my physicist hat again, this stuff of latent spaces, we used to think in physics that there really was Euclidean space out there that we just discovered, right? But now those of you who do awesome neuroscience have discovered that the organism will often invent internal representations and latent spaces and so on.
So you might wonder, could it be that even [INAUDIBLE] the physical space is actually also a latent space? And I actually think that's true because we learned from Einstein that this whole thing with inertial frames, for example, that you're supposed to have the x-coordinate and y-coordinate and the z-axis all perpendicular to each other, and it's not supposed to be accelerating and so on, that you don't have to do that. In general relativity, Einstein said, aw, forget about that. You don't have to be in an inertial frame. You can have any coordinates if you use general relativity.
Why do we still use this more simple latent space, where our axes are perpendicular and things aren't-- and an object at rest remains at rest? I think it's because our brains are choosing to make a latent space where the laws of motion are as simple as possible, again. We interpret it that way. And I want to just show you. Silviu and I were able to automate that process, in this case.
So we gave it five different examples with magnetism and harmonic oscillator, and a nonlinear quartic oscillator, et cetera, et cetera. In all of these five cases, even though we just sent in video sequences, it mapped things into a two-dimensional latent space, five different ones, because these were five different neural networks.
And if you look more closely, you can see that some of them are very squished relative to the others because the laws of physics, in some cases, would still work out just fine if you scaled the axes by different amounts or, in fact, did any affine transformation where you just take this two-dimensional space. You take each vector and multiply it by a 2-by-2 matrix and add another vector. So there are these six degrees of freedom left.
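The six leftover degrees of freedom are the four entries of the 2-by-2 matrix plus the two components of the added vector. As a small check of the claim that some laws survive any such transformation, the Python sketch below uses force-free motion, z[t+1] = 2 z[t] - z[t-1], as an example law. That choice of law and all the numbers are mine, for illustration only.

```python
def affine(z, A, b):
    """Apply z -> A z + b for a 2-vector z, a 2x2 matrix A and a 2-vector b."""
    return [A[0][0] * z[0] + A[0][1] * z[1] + b[0],
            A[1][0] * z[0] + A[1][1] * z[1] + b[1]]


def obeys_free_motion(traj, tol=1e-9):
    """True if every step satisfies z[t+1] = 2 z[t] - z[t-1] in both components."""
    return all(
        abs(traj[t + 1][k] - (2 * traj[t][k] - traj[t - 1][k])) < tol
        for t in range(1, len(traj) - 1)
        for k in range(2)
    )


# A straight-line, constant-velocity trajectory in the original latent space.
trajectory = [[1.0 + 0.5 * t, -2.0 + 0.25 * t] for t in range(6)]

A = [[2.0, 0.5], [-1.0, 3.0]]  # four free numbers: scaling, rotation, shear
b = [4.0, -7.0]                # two free numbers: choice of origin
warped = [affine(z, A, b) for z in trajectory]

if __name__ == "__main__":
    print(obeys_free_motion(trajectory), obeys_free_motion(warped))
```

Because the law holds equally well before and after the transformation, the data alone cannot fix these six numbers, so a separate simplicity criterion is needed to choose among them.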
And we were wondering, could it be that if you just insist on using these degrees of freedom you have, make the equations look as simple as possible, it would actually discover what we humans consider simple? So Silviu did a mapping first. He took all the rocket images from all the videos and mapped them into all of the five latent spaces, and figured out just what affine mapping connected the different spaces, so he could put them all together into a single unified latent space.
And now we still have these six parameters we can play with. We can shift things sideways, for example, put the origin wherever you want. On the right side-- or on the lower left-- on the lower-right corner, rather, you see the equations of motion that Silviu had put in to start with. And you can see that some of them don't care if you change the origin, like the first one.
But the harmonic oscillator, it cares. Those equations will be the simplest if you put the origin in such a place where the center of the harmonic oscillator was. Then there's going to be less terms in the equation. And sure enough, if Silviu tries all the origins and all the different shifts and plots how complicated all the equations are together, he finds that there's a certain shift that's the best.
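The origin search can be illustrated in one dimension. For a harmonic oscillator centered at `center`, the acceleration is -k (x - center). In a shifted coordinate u = x - shift this becomes k (center - shift) - k u, which has two terms unless the shift equals the center. The sketch below uses the number of nonzero terms as a crude complexity measure. That measure, the grid of candidate shifts and the function names are my assumptions, not the procedure used in the work described.

```python
def oscillator_coefficients(k, center, shift):
    """(constant, linear) coefficients of the acceleration in u = x - shift."""
    return (k * (center - shift), -k)


def num_terms(coeffs, tol=1e-9):
    """Crude complexity measure: how many coefficients are nonzero."""
    return sum(1 for c in coeffs if abs(c) > tol)


def best_shift(k, center, candidates):
    """Candidate origin that makes the oscillator equation have the fewest terms."""
    return min(
        candidates,
        key=lambda s: num_terms(oscillator_coefficients(k, center, s)),
    )


candidates = [i * 0.5 for i in range(-10, 11)]  # trial origins from -5.0 to 5.0

if __name__ == "__main__":
    print(best_shift(2.0, 1.5, candidates))
```

A law with no position dependence, such as a constant force, would give the same term count for every candidate, which matches the remark that some of the equations do not care where the origin is.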
And then you can start rotating the space with the best rotation. In this case, there are some of these formulas which get much simpler when you rotate by a certain angle. And others, we pick that. And then similarly, you can do a little bit of shearing. And at the end, it discovers exactly what we would consider the simplest space, where, in fact, the x- and y-coordinates, there's no relative stretching, and equations are beautifully simple.
And I suspect that this is very much what's going on in physics, actually. We keep coming up with a representation of the world such that the description becomes as simple as possible for us. Because if we have that internal representation in our brains, then that minimizes the amount of computation we have to do when we try to predict the future and figure out what are the best actions to take.
So in summary, I've told you that it's not just the case that machine learning and AI is helping physics enormously in so many areas. But I also feel that physics and the science of intelligence more broadly [INAUDIBLE] really help machine learning in various ways. I've given you a series of examples.
For example, by combining Occam's razor, divide and conquer, and other ideas for how you can go from raw distorted video to find the latent space where the equations of motion are so simple that you can use AI Feynman to actually discover the equations for them. And then also showing you how you can also find which are the most useful approximate equations for fields where maybe an approximate equation is the best that you can hope for.
And I would like to end by saying that we would love to collaborate with those of you on the call here. If you have any data sets lying around on your own hard drives, where you think there might be some patterns yet to be discovered, it would be really, really fun to see if any of these tools could help discover them. Thank you.
PRESENTER: Thank you, Max. Great. Let's give Max a great applause. That was great.