Advice for a (young) investigator in the first and last days of the Anthropocene
Date Posted:
August 21, 2025
Date Recorded:
August 15, 2025
Speaker(s):
Jascha Sohl-Dickstein, Anthropic
MODERATOR: It's my pleasure to introduce Jascha Sohl-Dickstein, who's currently a research scientist at Anthropic. He did his PhD at UC Berkeley with Bruno Olshausen at the Redwood Center, and later on did a postdoc with Surya Ganguli at Stanford. Now I think Jascha is probably most well-known as the originator of diffusion models. But I know him for basically being one of the most creative people and most careful thinkers I've ever met. And I think, today, you should really take his words to heart, because when he says something, I really believe in it. And, Jascha, you can take it away now. Thank you.
JASCHA SOHL-DICKSTEIN: OK, thank you for the kind and slightly embarrassing intro. Cool, so the title here is a play on a very opinionated book by Ramón y Cajal that, given the audience, many of you might be familiar with. Before I even start, I want to strongly encourage you to interrupt me with questions. I think this talk will both be more effective and also much more interesting if you call out every time I say something you don't believe. So please do that.
Maybe to get you in the right mindset, here we have a plot of GDP versus time. And that is you standing precariously on top of the line. And you're thinking to yourself that, I live in a pretty normal world. Some things are probably going to change, but the future is probably going to look mostly like a linear extrapolation of the present. And as this plot should maybe suggest, this might not be the right perspective on the future.
And by the way, this plot looks surprisingly similar even if you plot it on a log scale. We did not reach our current rate of exponential growth until around 1950. So the world is changing exponentially, and it's changing exponentially on a faster time scale than it has historically.
So the purpose of this talk is to convince you that the future may be very different from the present. In particular, AI is transforming the world, and it is going to be a big transformation. If you believe this, you should make decisions about your career and research projects and possibly personal life that take this into account.
I also want to emphasize that you have a really immense amount of leverage on how this transformation happens. We are still quite early in an exponential, and you are in positions of large leverage early in an exponential, which can lead to very, very different outcomes. And finally, I'm going to have a small number of practical suggestions for how you might want to think about this and what you might want to consider.
So let's talk about geologic epochs. Geologic epochs are divisions of deep time, marked by distinct changes in life forms or climate or geological processes that can be observed in rock strata. Humans are starting to have a geologic impact on our planet. We're in the middle of the sixth great extinction in the history of the earth. Radioisotopes from nuclear weapons tests can now be observed being laid down in rock strata. And so you'll be able to tell, a billion years in the future, that we were here, or at least you'll be able to see some of the impacts that we had on the planet.
There's a proposal to name this current epoch of human activity, of human-driven geologic change to the earth, the Anthropocene. This focus on human-driven change is maybe interesting because we are perhaps near the end of the period where humans are the primary intellects driving global change. Maybe the central conceit of this talk, which, I have to admit, I also chose because I kind of really like the title, is that, very much depending on the way AI plays out, this could be a very short geologic epoch.
So this is a talk about AI. So let's talk about the progress in AI over the course of the Anthropocene. So here is the amount of compute used to train notable AI models, with time on the x-axis and training compute on the y-axis. In the upper right, I've circled a very rough plausible range for the compute performed by the human brain in a lifetime.
My own back of the envelope estimate for that is if you imagine that every synapse does one flop, one floating point operation, per millisecond in your brain, then you do about 1 trillion petaflops of compute over your lifetime, which is a small fraction of a grid line above the largest model trained to date.
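To make that arithmetic concrete, here is a minimal back-of-envelope sketch in Python. The synapse count and lifespan used below are assumed round numbers, not figures stated in the talk, so treat the result as an order-of-magnitude estimate only.

```python
# Back-of-envelope lifetime compute of a human brain.
# Assumptions (not from the talk): ~1e15 synapses, 1 flop per synapse per
# millisecond, and an ~80-year lifespan.
synapses = 1e15
flops_per_synapse_per_second = 1e3       # 1 flop per millisecond
lifetime_seconds = 80 * 365.25 * 24 * 3600

total_flops = synapses * flops_per_synapse_per_second * lifetime_seconds
print(f"total:       {total_flops:.1e} flops")       # ~2.5e27 flops with these assumptions
print(f"in petaflop: {total_flops / 1e15:.1e}")      # ~2.5e12, i.e. trillions of petaflops
```

With a lower assumed synapse count of 1e14, the same arithmetic gives roughly 2.5e26 flops, so the "about 1 trillion petaflops" figure sits comfortably inside this range.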
So we are reaching scales of model training compute that are similar to those that the human brain experiences in an entire lifetime. This is not sufficient and probably not even necessary for artificial intelligence. But it is perhaps suggestive, or makes it more plausible, that these models may be able to behave at human-like levels if they are using an amount of compute that is human-like. And I'm going to try to make a point of pausing occasionally. So yeah?
AUDIENCE: So you could also multiply that number by 4 billion years of evolution and by 8 billion people, and then it would be very [INAUDIBLE].
JASCHA SOHL-DICKSTEIN: You perhaps get the benefit of evolution. You maybe don't get the benefit of the other 8 billion people more than a machine does. But even evolution, our genome compresses to 700 megabytes. And at most, a few hundred of those megabytes are to do with neural development. So it is true that we have evolved to be good intelligent systems. But the legacy of evolution is not like the synapses in our brain. The legacy of evolution is a couple hundred megabytes of hyperparameters. Yeah?
AUDIENCE: On the previous slide, you said you could imagine the Anthropocene being very short. Does that mean you think AI will reduce the impact of humans on the environment? That would be great. Because you characterize the Anthropocene as the human impact on the environment [INAUDIBLE]-- do you think that's going to change? Because that sounds very good.
JASCHA SOHL-DICKSTEIN: Oh, man, I think what capabilities we're going to have and how we use those capabilities are two very different questions. And I have more confidence about what capabilities we're going to have and way, way less about how we're going to use them. It's hard to believe us putting the genie back in the bottle so completely that we stop leaving a mark on the planet. But we've chosen to do strange things in the past.
So here is a recent study from METR. What they did is they created an ensemble of tasks, and they measured how long it took humans to perform each of those tasks. And then what they're doing is, for each model, they're plotting the time length on a log scale at which the model is able to perform a task with, like, 50% success. You can make a similar plot for 80% success, which they also have in the paper. And you find that this curve is shifted but has the same slope.
The key thing to observe here is that the length of the task that models can do in an unaided fashion, independently, is increasing exponentially. It's doubling every seven months or so. If you extrapolate, you would predict that a model can do a full day's work of intellectual labor with 50% probability of success sometime in early 2027. Or if you want the higher pass rate, then sometime maybe 2028.
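As a rough illustration of how that extrapolation works, here is a short Python sketch. The doubling time comes from the METR result described above, but the anchor date and the current one-hour task horizon are illustrative assumptions, not numbers quoted in the talk.

```python
import math
from datetime import date, timedelta

# Assumptions (illustrative, not from the talk): models today handle ~1-hour
# tasks at 50% success, anchored in early 2025; the horizon doubles every ~7 months.
doubling_months = 7
anchor_date = date(2025, 3, 1)
anchor_horizon_hours = 1.0
target_horizon_hours = 8.0               # roughly a full day of intellectual labor

doublings = math.log2(target_horizon_hours / anchor_horizon_hours)
months = doublings * doubling_months
eta = anchor_date + timedelta(days=30.44 * months)

print(f"{doublings:.0f} doublings, ~{months:.0f} months -> {eta.isoformat()}")
# With these assumptions: 3 doublings, ~21 months -> around the end of 2026,
# in line with the talk's "early 2027" for the 50% success threshold.
```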
I'm going to pause again. To me, this is one of the strongest possible-- if you believe in straight lines on log plots, then this is a fairly compelling one. I'm going to keep on going.
Here's a plot showing the speed at which superhuman performance has been achieved on new benchmarks up through 2023, so up through two years ago. Superhuman performance is very consistently achieved, and it's very consistently achieved faster and faster. Here, y-axis is performance relative to a human baseline. And you can see that for benchmarks released in 1998, it took until 2015 before we hit human levels. And you can see that for benchmarks released in 2019, we were getting superhuman level by 2023.
I was one of the organizers of a large-scale collaborative benchmark called BIG-bench, and we had publications claiming superhuman performance on subsets of our benchmark before we even published the paper. So this is something which is happening faster and faster. And this trend of rapidly solving ever-harder benchmarks has continued since 2023.
Here is progress on GPQA-Diamond, which is a benchmark created by PhDs in different STEM fields. They have this five-part process of iterative creation and review and filtering and editing and rereview of problems that are included in the benchmark. You can see that in 2023, models were performing roughly at random on this benchmark. And you can see that it took maybe 2 and 1/2-ish years before models were decisively superhuman on this benchmark, where superhuman in this case means they do better than PhD students in the specific domain that each question is targeted at testing. So progress is rapid. I'm going to pause again for a couple of seconds. Yeah?
AUDIENCE: What are the latest benchmarks that the models are trying to beat?
JASCHA SOHL-DICKSTEIN: So a lot of the ones that people are doing now are code-related benchmarks. So I think, maybe partly for utility reasons, there are benchmarks of, can you do these large-scale autonomous coding tasks? There are benchmarks like Humanity's Last Exam, for instance, which are just trying to come up with really, really hard questions. I actually don't do benchmarking much. So I can find you a list, but-- yeah?
AUDIENCE: Can you give an example of the questions?
JASCHA SOHL-DICKSTEIN: I should add one as a slide. I don't have one as a slide. But the questions are ones that require both deep domain knowledge and reasoning. So it might be like a physics problem in, I don't know, cosmology, where they ask you to assume something about the vacuum state, and they ask you to assume something about the underlying model that's being used. And they ask you what the probability of some kind of transition is on some time scale. So they're often questions that require both computation and deep understanding of a domain.
AUDIENCE: So my issue is, in my own experience, some of these models are surprisingly good [INAUDIBLE] some of these very niche and complicated topics. But they would sometimes just have lapses or gaps in their logic. And my fear is that-- the thing is, as a PhD student, we usually accumulate knowledge by building on top of previous knowledge in a way that's not very lossy. So we make sure there are not a lot of gaps, there's a solid foundation, and then we learn the next thing. But with this, if you have gaps all along the way as you're trying to accumulate knowledge, I'm afraid it doesn't add up to anything.
JASCHA SOHL-DICKSTEIN: Yeah, I mean, I think maybe the question there is a practical one. Can they do the tasks that we care about them doing? And can they do new tasks that we care about them doing? And I think that people use them to code. People use them in all kinds of situations where they rely on them heavily. They're also able to solve de novo problems.
So it's not just, for instance, academic benchmarks that are being saturated. A few weeks ago, models from two different organizations were able to solve IMO problems. And if you're not familiar with it, the IMO is made up of extremely hard problems that are developed every year specifically for the competition. It's for people under 20. Last year, there were 72 people that got a gold medal, and two models. Yeah?
AUDIENCE: So do you think the models could potentially now be showing really good benchmarks but not necessarily demonstrating good utility in the sense that-- not just overfitting the benchmarks but the fact that these benchmarks don't necessarily correspond to--
JASCHA SOHL-DICKSTEIN: What we actually care about?
AUDIENCE: Yeah.
JASCHA SOHL-DICKSTEIN: Yeah, so this is a good question. It's hard to rule it out entirely. My largest argument against that is based on vibes and on the fact that this rapid progress is not benchmark-specific.
You would imagine if there was some set of capabilities-- it's a very strange set of capabilities that are absolutely crucial to the world but that no one is able to create a benchmark that measures. If you can't quantify the thing that's missing, then you have a high burden of proof insisting that something is actually missing. Yeah? Boris.
AUDIENCE: I'm wondering, how was this achieved in the sense that-- was it just a prompt, or were there humans refining the [INAUDIBLE] system?
JASCHA SOHL-DICKSTEIN: I mean, they trained a model to do math like this. This is a different model than the one that you use through the chat interface. But actually, I think you can get direct access to Gemini's version, to Google's version. But this model operated in natural language. There was no transformation to Lean or similar. Boris.
AUDIENCE: My recollection is that the Gemini system and the OpenAI system failed on the same problem. Do you have any explanation for this?
JASCHA SOHL-DICKSTEIN: I do not. Yeah, I don't have a-- do you know what fraction of the humans failed in that same problem? Was it just a particularly--
AUDIENCE: [INAUDIBLE] those who solved all the problems?
JASCHA SOHL-DICKSTEIN: Yeah.
AUDIENCE: Well, [? Russia and Canadians. ?] But most of [INAUDIBLE]. But I don't think that was the same problem that they failed.
JASCHA SOHL-DICKSTEIN: Interesting. That's fascinating. But I don't have a good comment on it. Yeah?
AUDIENCE: So in the previous benchmarks, you showed how the progress in AI can get us to a stage where it's superhuman [INAUDIBLE]. I was wondering about the kinds of mistakes or errors-- are there benchmarks that have explored the mistakes or errors, and how they compare to humans? Because in the previous benchmark, they didn't reach 100%. So there are still some mistakes.
Are these mistakes also going to be hard for humans, or is that a different pattern? Because, sometimes, we see them make mistakes that humans wouldn't make. So are there any benchmarks showing a correlation between what is hard for humans and what is hard for the model? Or is that a different pattern?
JASCHA SOHL-DICKSTEIN: That data is out there. I'm sure someone has published it. I'm not familiar with the answer, though. Yeah?
AUDIENCE: I'm just wondering [INAUDIBLE] time. And so what's the interpretation of that? Length of time is just a complexity or [INAUDIBLE]?
JASCHA SOHL-DICKSTEIN: What determines where-- so they developed a diversity of software engineering tasks, and then they measured how long it took people to do those tasks. So they assigned a time length to every task by having a human subject perform the task. Yeah?
AUDIENCE: I'm wondering why [INAUDIBLE] really transforming the world around us in a physical sense. For instance, self-driving or something-- it's more like useful digital systems or coding, something like this, like cognitive tasks on the internet. But it's not really [INAUDIBLE]. So do you think there's something missing to make a physical [INAUDIBLE] AI or--
JASCHA SOHL-DICKSTEIN: I didn't quite understand. You say you think there's something missing or what--
AUDIENCE: Yeah, like [INAUDIBLE] AI around us, for instance, in the sense of automatic driving. So I think the AI progress we see right now is in the digital world only, in the sense that we have useful assistants helping us with coding or question answering. You don't really see it in the physical world around us-- for instance, self-driving cars.
JASCHA SOHL-DICKSTEIN: Ah, you're saying robotics is--
AUDIENCE: And do you think there's maybe something missing there, or that it can also be achieved with this paradigm of scaling?
JASCHA SOHL-DICKSTEIN: Yeah, so self-driving, by the way, is-- you should travel to San Francisco and ride in a Waymo. It's--
AUDIENCE: --benchmark environment, in [INAUDIBLE] details for those cars to be able to drive. That's why they don't know if [INAUDIBLE] city.
JASCHA SOHL-DICKSTEIN: They do each city very carefully for a long period of time because they're trying to eliminate the very, very long tail errors. The cars can drive quite well even in novel environments. But they-- yeah?
AUDIENCE: Yeah. [INAUDIBLE] see them on highways because it's too dangerous.
JASCHA SOHL-DICKSTEIN: I think that's because Google is being extraordinarily conservative. Highway driving is easier than street driving. And the cars themselves are actually significantly safer than humans driving. So I think it's a function of Google really, really wanting to avoid a fatal accident with their car as opposed to the actual relative performance of the car compared to human drivers.
AUDIENCE: A related question is, is there much work on [INAUDIBLE]? Given your answer and saying, well, I believe this is true by this model [INAUDIBLE].
JASCHA SOHL-DICKSTEIN: Yeah, calibration is something that people care about and measure. It's something that people sometimes even directly optimize for. It's getting better. The models are often overconfident, but less so than historically. All right, I'm going to continue.
So models are able to perform novel tasks that only a very small number of humans are able to perform. From talking with people at other labs, I think it's quite likely that we will see models solving open theoretical math problems in the next six months or so, but we'll have to wait and see if it happens.
When I gave this talk earlier this week at Harvard, and the talk announcement went out, I got this kind of fascinating unsolicited email, where the entire message was, like, "what a batshit crazy abstract." And I think this is interesting and relevant because there's this concept called the Overton window. The Overton window is the set of ideas that it's considered acceptable to express or hold. And at least on the surface, this email is a statement that the idea of AGI is outside the Overton window, or at least it's a statement that it is very much inside the Overton window to call AGI crazy.
And since the Overton window determines what we feel comfortable discussing and, to a significant extent, maybe even what we feel comfortable thinking in our heads, it's very important to clearly communicate that the Overton window is moving. I think maybe a primary purpose of this talk is to give you permission to take this stuff seriously. The things I'm saying in this talk are normal. Respected and well-known people and institutions say the same things. And they're still respected.
I'm going to give just a few examples of this. So here's Rishi Sunak, the previous UK prime minister, saying the competition for AGI-- AI that surpasses humans at all cognitive tasks-- is of fundamental geopolitical importance. Here's Barack Obama, former US president, you might be familiar with him, saying-- paraphrased-- as profound as cell phone technology has been, AI will be more impactful. And it is going to come faster. We're now starting to see these models, these platforms be able to perform what we consider to be really high-level intellectual work.
Here's Eric Schmidt, former Google CEO, and very active philanthropist. "We believe that, in the next year, the vast majority of programmers will be replaced by AI." I actually think this first part is more aggressive even than me. "Within three to five years, we'll see AGI systems as smart as the best humans and, in six years, artificial superintelligence smarter than all of us combined. This is happening fast, and society isn't ready."
There are also relatively storied, respected newspapers publishing takes that at least seriously consider AGI. There's The Economist looking at the economics of superintelligence. There's The New York Times with an opinion page article, "The Government Knows AGI is Coming." So serious people are taking this seriously and saying it publicly, and it's allowed. I'm going to pause for another moment and see if-- yeah?
AUDIENCE: So at the beginning of the talk, you said that you are very confident about the capabilities that these models will [INAUDIBLE]. But you're very underconfident about how we will use them. I think, as somebody in the industry, and there's a lot of people in science over here, what measures do you think we can implement in both industry and academia to safeguard ourselves against bad uses of highly capable models?
JASCHA SOHL-DICKSTEIN: Man, OK, so this is a very complicated question. I think that there is a very wide diversity of risk. Some of the more easy stuff that is still a struggle is you should have transparency frameworks that are in law that companies need to implement, where they test their models against biorisk or against cyber attack or against other types of use cases and publicly report the capabilities that they believe are unlocked and the mitigations that they're pursuing.
I think that figuring out how we should transform our economic system is a largely open question. I think that there are two extremes which are pretty terrible. One extreme is everyone loses their job and we don't give them any money, and that's just bad for them. Another extreme is they take all the AI scientists and put them up against a wall somewhere. And I really hope that we come up with some method of distributing the gains to the population with some kind of universal income or similar mechanism.
I think there's also strong risk of targeted manipulation. I think we're going to be able to design stimuli that individuals find extraordinarily addictive and compelling. And I think that having a legal framework that protects people from being directly manipulated by whoever has access to their eyeballs is going to be very important.
I also think building tools that interface somewhere between you and the web and-- I think, at some point, it probably is not going to be safe anymore to just browse the web with your own eyeballs. You're probably going to be talking to an agent that does it on your behalf. And I think building those kinds of intermediaries and those kinds of supporting technologies is going to be very important. Yeah?
AUDIENCE: Since you bring up the economic aspects of this, I'm just wondering-- [INAUDIBLE]. To me, a very good metric of whether you reach superintelligence is simply that you can make money. And at that point, also, typically the people that make money, they don't talk about it. They don't involve [INAUDIBLE]. They just make money. And they are very quiet about how they make money. So I'm just wondering how you think about that?
JASCHA SOHL-DICKSTEIN: I think this is not going to be quiet. I think it's going to be very obvious when large fractions of human labor are automated. All right, let me get to the end of the AGI is coming, and then we can start talking about what to do about it.
So models are getting better very fast, and I've set either an ominous or promising tone by talking about the start and end of a geologic epoch. I've also talked about how the idea of AGI is normal enough now that you're allowed to believe in it in polite company. But when can we expect it to happen? If you live in the same SF AI bubble that I live in, the consensus answer is just a few years. If you're at a party and you ask someone their AI timelines and they say they have a really long timeline and you ask them what that means, they say, oh, I think it's going to take a decade.
My wife works in biotech. And there's an ongoing miracle where cancer rates are exponentially decreasing, which is utterly amazing. If you talk with someone in biotech and you're like, oh, the cancer rates are going down, are we going to eliminate cancer, the first thing they do is they start listing all the reasons that getting rid of cancer is actually way, way harder than you would expect from looking at a line on the graph, and it's going to take much longer.
It means something when the people at the frontier labs that are building this technology don't see any obvious blockers. The people in the frontier labs are like, yeah, we're just going to keep on scaling it up. We haven't seen it stop working yet. It's just going to keep on going. Yeah?
AUDIENCE: [INAUDIBLE] energy consumption?
JASCHA SOHL-DICKSTEIN: So energy consumption right now is-- energy consumption becomes a practical or physical limitation if it's roughly 10,000x what it is today. Extrapolations for energy consumption get quite large. But we're very, very far from the physical limits.
AUDIENCE: What is [INAUDIBLE]? Is it an agent that can maintain long-term [INAUDIBLE] and also [INAUDIBLE] to skills and knowledge?
JASCHA SOHL-DICKSTEIN: I actually have a slide just for you. So I think it's common in conversations about this for people to be like, OK, but what is AGI? They want to do a detour into defining it. And I think this is a very interesting and nuanced question. But I also think it's almost an irrelevant question.
Let's say we have a prediction that we're going to build machines that can fly, and then people start-- they build a machine that can glide for 100 meters before it crashes. And then the next year, they build a machine that can glide for 180 meters and it crashes. And the next year, it can fly for almost a minute, but then it crashes.
And then they can fly for five minutes, but they can't steer it. And then you can fly for almost an hour, and you can actually control the thing. And I think you could have an incredibly nuanced discussion about at which point in this progression, you actually have a machine that can fly. But I also think that nowadays, it's just a silly question. And I think the same thing is going to happen with AGI.
I think there's nuance at the cusp. And I actually have a paper where we break this out. But I think that shortly after you can have long, long arguments about whether it quite counts as AGI or not, you're going to-- it's just going to be a silly question. If someone asks you can a machine do this intellectual task that humans do, the answer is just going to be obviously yes.
AUDIENCE: How was your previous graph posed? When you ask people, do you think [INAUDIBLE] will happen-- what exactly do they think will happen?
JASCHA SOHL-DICKSTEIN: Ah, so there was a definition in the survey. They use the term human-level artificial intelligence. And they framed it in terms of-- I forget the exact phrasing. We'd have to look it up. But I actually didn't talk about that. I actually skipped ahead without talking about this graph. So this graph is the-- in 2022 and in 2023, there were surveys run on contributors to all the major AI conferences about when they expected AGI to be achieved. And maybe the most interesting thing about these surveys is that in the 2022 survey, people expected AGI to be achieved, with 50% probability, by, I don't know, 2065 or something.
And in the 2023 survey, people expected AGI to be achieved by 2045 or so. And so people's timelines in the field have been moving earlier quite quickly. I don't know what a survey in 2025 would show. And in fact, the organization that was running these surveys has stopped running them. So if you're looking for a very highly cited and fascinating paper, you could run another survey like this. But I strongly suspect that the consensus time scale would be somewhere in the early 2030s presently, if you ran a similar survey. Yeah?
JASCHA SOHL-DICKSTEIN: Because there are things that the model still can't do, why do I believe it will be able to do them in the future?
AUDIENCE: Yeah.
JASCHA SOHL-DICKSTEIN: Because every time we've quantified something the model can't do, it's very rapidly become able to do it. And because if you believe there is a capability that the model does not have and will not be able to have, then you shouldn't see this pattern in every single benchmark. And also just because of my own personal experience, where the model is constantly able to do things that it was unable to do just six months before.
Maybe finishing the slide, my own vibes-based analysis is that currently, working with Claude feels to me like working with a grad student who is in many ways incompetent and, in other ways, brilliant and incredibly engaged. And that wasn't true a year ago. A year ago, occasionally, it would help me write a small piece of code. And it won't be true a year from now either. It's going to be a much more capable model a year from now than it is today.
I think you need to look at the fact that there is a sequence of capabilities that people did not believe AI had, and then it developed those capabilities. And you need to ask if there's a pattern that we expect to hold for new capabilities as well.
AUDIENCE: Speaking of [INAUDIBLE], I feel like we are mostly studying a single system, a single [INAUDIBLE] network. Most of the research, I think, is done on a single system. I don't see much research about collective intelligence, for example, about multi-AI systems working together to solve problems. Maybe you can argue that a single system would still [INAUDIBLE], so we don't need the collective.
JASCHA SOHL-DICKSTEIN: I think we're going to-- so, I mean, Claude Code, for instance, is already like a multiagent system. It has an orchestrator, and then subagents go out and do different things. I have an AI safety paper, which I can talk with you about later, which we're working on, where we measure the capability and alignment of organizations built out of AI agents rather than individual agents.
And we find that organizations built out of AI agents are both more capable and, in general, less aligned. But the more capable part is there and important. I think they will work together better in time, but they're already more capable when you have many interacting agents than if you have a single agent. Yeah?
AUDIENCE: I wanted to ask a question about how you think about [INAUDIBLE]. For example, self-driving cars-- so self-driving cars today are basically classification models. They're not generative models. They're not that kind of apparatus. And humans-- we can learn to drive much faster than, say, self-driving cars, which need millions of hours of training data.
Meanwhile, a teenager can learn to drive in about 30 hours or so because they have some general principles, some general intelligence around things like gravity and momentum, that models do not have. So I'm curious about how you see those areas developing and also how you see that relating to some of the benchmarks. So for instance, gravity and momentum are things I remember from being a neuroscience grad student. So how do you think about those kinds of dimensions, especially in embodied areas?
JASCHA SOHL-DICKSTEIN: You're asking do these systems need massively more data than a human needs to do the same thing?
AUDIENCE: No, I'm saying, how are you thinking about general intelligence as being characterized by some basic concepts that humans have that they can generalize and build on top of? So with the example of a teenager learning to drive, they need very little data or experience because they have other things they can fall back on, which are generalized principles that are not present in today's self-driving cars.
JASCHA SOHL-DICKSTEIN: I mean, I think this is kind of the premise of foundation models. I think that we spend a ridiculous amount of compute training our models, and then they can adapt to new tasks rapidly in context. And maybe in-context learning is not the best way for them to do it, but it is a way that works.
Also, I think that Waymo almost certainly uses some pretrained vision models as part of its stack, but it is true that they put a lot more effort into making Waymo drive than a teenager puts into learning how to drive. I think pretraining is much less data efficient. But I think it kind of doesn't matter if it works. Yeah. Brian.
AUDIENCE: If you look at exponentials-- really, sigmoids are also exponentials early on. And then you can't really tell if you're on a sigmoid versus an exponential when you're very early in the curve, like we saw with things like COVID, where [INAUDIBLE] becomes a sigmoid eventually, given certain bottlenecks. So how do you feel about potential bottlenecks-- that these are exponentials in narrow domains, but they're actually sigmoids?
JASCHA SOHL-DICKSTEIN: Yeah, well, I mean, I think the amount of space that we have access to is polynomial in time. The light cone has some volume, which is t-cubed. And so exponentials cannot go on forever. So this is guaranteed to turn into a sigmoid. The question is, does it turn into a sigmoid before or after it reaches human level? And I think we're very, very far from resource constraints. All right, I'm going to keep on going. I think I went a little too unstructured actually, which is OK.
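Here is a minimal numeric sketch, in Python, of the point behind this exchange: early in the curve, a logistic (sigmoid) is essentially indistinguishable from a pure exponential, and the two only separate once the bottleneck is approached. The growth rate and carrying capacity below are arbitrary assumed values, not numbers from the talk.

```python
import numpy as np

# Arbitrary assumed parameters: growth rate r, starting value x0, ceiling K.
r, x0, K = 1.0, 1.0, 1e6
t = np.arange(0, 13)

exponential = x0 * np.exp(r * t)
logistic = K / (1 + (K / x0 - 1) * np.exp(-r * t))   # equals x0 at t=0, saturates at K

for ti, e, s in zip(t, exponential, logistic):
    print(f"t={ti:2d}  exp={e:12.1f}  logistic={s:12.1f}")
# For the first several steps the two columns agree to well under a percent;
# they only diverge meaningfully once the logistic curve nears its ceiling K.
```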
So if we take these timelines seriously, if we think that there are going to be systems which-- who knows how they roll out into society-- are at least capable of doing the intellectual tasks that we do in 2 to 5-ish years, what should we as individuals do? Maybe first, we should make sure that our projects are still going to be relevant when they're completed. There is some curve of improving science that can be done with minimal human effort, just by asking a foundation model to solve a problem for you.
And one outcome that you very much want to avoid is that you spend two years working extremely hard, and you make a lot of progress, and you make a big jump, like that. And you've just barely managed to roughly keep up with the exponential. And you could have done the same project in three days if you had just waited two years to start it. This suggests working on targeted projects in relatively large collaborations, so that you can go fast and so that you can stay ahead of the exponential. This in many ways discourages open-ended exploration, which I think is horrifying, but I think is also probably unfortunately true.
Maybe the second thing that it suggests is you should keep in mind "The Bitter Lesson." So "The Bitter Lesson" is an essay written by Richard Sutton, where he observed that general methods that leverage computation are ultimately the most effective. And maybe the canonical example of this is that people used to spend entire careers at top universities hand-designing features for image classifiers. And then we just scaled the data and scaled the compute and scaled the models, and now none of their work is relevant at all for the present.
So there's two aspects. Maybe there's a positive and a negative framing of the bitter lesson. One is you want to avoid projects that are going to be solved by scale alone because your work won't be relevant in the long run. The second aspect is you want to work on projects that will benefit from scale. You want the approach that you develop to become even more effective if there is greater scale of compute and greater scale of intelligence. And even better, you want the thing that you developed to enable scaling to happen more rapidly, at least if you're optimizing for impact.
You should force yourself to use the AI tools. They provide fundamental new affordances. They're often awkward and nonergonomic. They're definitely hard to learn how to use well. They are an unsolved problem in user interface.
But despite horseless carriages being awkward, they're still more effective than carriages with horses. You should use them to brainstorm and iterate on research ideas. You should use them to get feedback on your writing. You should use them to write your code for you. You should use them to iterate on analysis. I don't generate my own plots anymore. I tell Claude to generate plots for me, and then Claude goes and does it.
Also, if you're either a PI or you're interested in becoming a PI, there's actually a surprising overlap between what is required in order to get good performance out of existing language models and what is required to get good performance out of grad students, which is you need to provide structured and clear scoping at the right level of detail and abstraction for them to succeed. And this is an art, but it's a useful skill across a wide variety of situations.
This talk was motivated by a conversation I had with a grad student where, earlier in the conversation, they were talking about how their AGI timeline was like three years. And then later in the conversation, they were like, I'm going to finish my PhD in another couple of years, and then I'm going to get a postdoc, and then I'm going to see what faculty positions are available.
And I don't know what the job of an academic is going to look like in a decade or two decades. I don't know what any jobs are going to look like in one or two decades. But if half of your brain believes that something utterly transforming is going to happen to the world, you need to have your whole brain believe it. You can't have this split system, where you believe one thing, but you only believe it in the context of talking about it over beers, and then in the context of planning your actual life, you pretend like you don't believe it.
Also, I think people often make risk-reward trade-offs when making career decisions or life decisions. I think people often choose something they're slightly less excited about but they're slightly more certain they're going to succeed at. Or they choose a path which they think will be more stable and more of a guaranteed success. I think there is some baseline level of uncertainty. If you believe that an incredibly disruptive cognitive technology is going to be able to do everything you do, I think this changes the risk-reward trade-off.
I think if there's a baseline level of risk that you cannot get below, no matter what life decisions you make, then mostly when you make your risk-reward trade-off, you're just trading off reward. And I think that this-- which is probably good life advice totally irrespective of AGI, but even more so in the context of AGI-- strongly suggests going for the thing that you most want and doing the thing with the greatest upside, rather than trying to balance safety and upside. Yeah?
AUDIENCE: Could you elaborate on your first point [INAUDIBLE] academia can be very different. [INAUDIBLE] can you envision?
JASCHA SOHL-DICKSTEIN: So again, I think what the future actually looks like is a slightly different question than what the capabilities are. I think we may still have tenured professors for centuries. It's hard to know. I think, though, that if your job is to think novel thoughts that no one has thought before, and there is a machine that you can ask to please think a novel thought for me, and it will come up with a better novel thought, then your reason for existence is-- it has to be a little different. Yeah?
AUDIENCE: I think this is a good point, and there's a reason why many academics work in industry. So I'm wondering, if, as you say, your thought process is outsourced to a machine, and the programming aspect is also outsourced to AI, what do you, for example, envision yourself doing once AGI--
JASCHA SOHL-DICKSTEIN: Man, I worry about this. Also, I have two kids. I worry a lot about what I should be teaching them. And I don't have a good answer. I am going to have to find things other than doing novel scientific research to give myself a feeling of self-worth. Luckily, most people on the planet seem to have a feeling of self-worth and don't do that. So it's probably going to be OK. It's a hard question. I don't have a good answer. Yeah?
AUDIENCE: What do we do about this point about the job and academic? I wonder whether it's [INAUDIBLE] [INAUDIBLE] take a step to consider if there's folks making [INAUDIBLE]. But what about other fields adjacent to AI, like UAI or even neuroscience, like [INAUDIBLE]? What'd their future look like in 10 to 20 years?
JASCHA SOHL-DICKSTEIN: I can only partially understand here the question. You're asking if this impact is going to be field-dependent?
AUDIENCE: What about the first [INAUDIBLE] for neuroscience?
JASCHA SOHL-DICKSTEIN: Oh, like what should you do? Well, I mean, selfishly, and also because I think it's a fascinating problem, I think you should study the models you're building. I think you have intelligence that you can-- you have white box intelligence. How could you understand it?
And this also is something that is like, this is going to be an amazingly disruptive technology. And the better we understand the technology, the more of the outcomes are likely to be good, and the more we're going to be able to avoid bad outcomes. And so I think that you can directly-- in fact, I'm going to jump ahead.
So I think that you actually have a skill set which is very relevant to this, which is you have a skill set of understanding how thinking systems think. And I think you should turn that skill set towards understanding these systems. In back?
AUDIENCE: So my [INAUDIBLE]-- the brain is a very efficient foundation, whereas the models right now need a lot of scaling up in order to even simulate a fraction of that. So if all neuroscientists, for example, were to focus on studying these models, then what would happen to trying to figure out how [INAUDIBLE] like that can happen in such an efficient way? And isn't that a question that needs answering, in terms of using our energy and our systems efficiently?
JASCHA SOHL-DICKSTEIN: So maybe a two-part answer. Part number one: actually, the energy balance is pretty surprising if you work it through. The energy used per word generated at inference time by these frontier models is more than the energy that would be used by a brain in a vat, but less than the energy that would be used by an entire embodied person per word. There are additional affordances you get if you have an entire system just to run your model. In particular, you can do parallel large-batch generation, which buys you significant energy savings.
So these models are not actually that expensive compared to human thought already. But I also believe they could be much, much more efficient than they are. And I think if you want to do research on neuromorphic computing or you want to do research on how to make them more efficient, I think those are fascinating directions. Yeah?
AUDIENCE: So I guess what you're seeing in industry is that people are not really hiring engineers or software developers anymore. And then I think, to some extent, these models are similar in ability to master's students. So what is your prediction for the future? I guess I feel like it's already kind of replaced master's students. You mentioned replacement [INAUDIBLE]. So what will people be doing?
JASCHA SOHL-DICKSTEIN: Yeah, I think what's probably going to happen, first of all, I think if you're a top few percentile contributor in whatever you're doing, I think there's going to be a period where the returns to top-level people go through the roof because you're able to be much, much more effective at a much greater scale. And so I think there's going to be a period where you're extremely in demand before demand drops to almost nothing.
But I think what you're describing is accurate. I think they're going to keep on doing more and more things. I think it's not a strict ordering. It's not like they're going to take that person's job and then that person's job and then that person's job. It's that they are going to automate a broader swath of tasks across a broader swath of jobs. So I think it's going to happen across many different things in parallel.
I'm actually going to keep on going, and then I'll-- so if you have limited time to contribute, and the stakes are high, you should do something that you'll be proud of. After you retire to your villa in the Dyson swarm, you want to be able to tell your grandkids that you helped us get to the good outcome. This is the geologic epoch after the Anthropocene right there, photographed, so spoiler.
So this is a once in many lifetimes chance to shift the trajectory of historic transformation. And these are, plausibly, your last opportunities to have a particular set of contributions that you could be very, very proud of and that could be very, very good.
I also want to emphasize that, sometimes, AI can seem like this process that's happening that's beyond your individual control. And this is utterly false. You have an amazing amount of leverage. We are still very early in the exponential. In the lower right, you have an excerpt from a study on the physical constraints to scaling models in the next five years. And the relevant takeaway from this is that there are clear paths to 10,000x-ing the compute that we are currently using for AI.
So five years from now, you're going to look back. And even though it feels like we're in the midst of a transformation, if you look on a linear scale, we're still in the flat part, before the exponential takes off. So we're very, very early in the process. And the reason that it matters that we're very, very early in the process is that small choices that you make early in an exponential can unfold and have absolutely huge consequences.
There are countless examples of this in government and science and technology and economics and elsewhere-- from, I don't know, the founding of the American Medical Association and the bar association, which completely changed the medical and legal landscape in the United States, to-- in like 1980, two scientists proposed a standardized protocol for exchanging emails. And as a result, we can all send emails to everyone else in the room. But it was a decade later before someone did the same thing for chat messages. And as a result, chat is balkanized, and there's no standardized protocol that's actually used. And so you can only send chat messages to people in the same walled garden.
Or consider the decision of Dr. Gey to freely share samples of cultures of the first immortal human cell line, taken from a cancer patient, which revolutionized human biology. Those cells were also taken without permission, so it also put modern human biology on a shaky foundation.
There are many, many cases where if you do something at the start of a process, either organizational or technical, you set the norms and you set the trajectory that the entire process follows. And so you have this power. This is an amazing blessing. It's also like a responsibility.
So individual small choices you make are going to determine the future commercial and social and political landscape of AI: what research problem you work on, where you work, whether you call something a capability or a risk and promote it or warn against it, whether you choose architectures that have interpretable latent states or are totally inscrutable and can't be analyzed. What tech tree does your project entail? So also take this seriously, not just because you're making decisions about your own career, but because you actually have a lot of leverage in how the world unfolds. And you should try to make it unfold in a way that you'll be happy with later. Yeah?
AUDIENCE: So I want to connect directly to this. So I [INAUDIBLE] likely replace a lot of jobs in the future, like a huge amount, and it will not be clear what all these people will be doing [INAUDIBLE] including us. And you mentioned general basic income as one of the many solutions.
And society is not yet ready. So among the people you know and have collaborated with, and the companies that actually develop these technologies, how much will, or how much possibility, do you think there is in the companies to fund general basic income NGOs worldwide? So for example, there's [INAUDIBLE] in Germany, so it could be linked up directly, and the company could just fund basic income [INAUDIBLE] on its own, because that's the solution.
JASCHA SOHL-DICKSTEIN: So you can't-- so I think the outcome that I personally think might be most politically palatable, and that I think my company would also support, although I am not in a position to say that in any kind of-- I can't speak for the whole organization-- is some kind of tax on generated tokens. Probably not on internal thought tokens, because you don't want to encourage them to compress the model's output so you can't interpret it. But a tax on generated tokens might be a good source of income to pay for this.
I think a company unilaterally saying it's going to give all its money to people is-- you won't be a leading AI company for long if you give away all your money and the other companies don't give away all their money. But I think a situation where we tax the technologies that are replacing human labor and distribute that money to humans is probably a very good one.
AUDIENCE: So you think this will come or--
JASCHA SOHL-DICKSTEIN: I think that the future is going to be utterly weird in ways that I'm not even thinking of, but I hope that this-- I think this is a plausible and positive way in which it could develop. Yeah?
AUDIENCE: Is your assumption, when you talk about the role of scientists, that when AGI is here, it will be able to solve and answer all the larger questions in science, like resolving the discrepancy between quantum and classical physics? Or is the assumption that AGI can do science, and so we should be focusing on other things?
JASCHA SOHL-DICKSTEIN: I mean, there are things that AI probably won't be able to do. But I think that you should imagine anything you can do sitting in front of your computer or writing in a paper notebook, it will be able to do. I think what you want to do as a result of that is a little bit of a personal value choice, but I think that you should imagine that the unique intellectual skills that are very scarce right now and that make it like a good idea for you to work on scientific problems are going to become cheaper and much more widely available.
I'm going to keep on going, and we're close to the end, but I want to actually get there, and then we can talk more. So I promised that I would give concrete advice. Let me share an actual rubric that I use when thinking about what research projects to work on. This is partially informed by AGI, but this isn't an AGI-specific rubric. But there's maybe a small number of questions you can ask.
First, you should gate by impact. If this project works flawlessly, what is the potential benefit? And the larger the potential impact, the more important it is that you ask about positive impact. So you ask about [INAUDIBLE], as well as magnitude.
This doesn't mean that every project needs to be trying to utterly transform the field on its own. It's fine to do small projects, but they should be in pursuit of some larger, coherent goal. You should be able to tell a story about why you believe it's important.
And, in fact, doing an ambitious project without being able to break it apart into small sequential projects is a failure mode. But yeah, you should gate by the size of the impact you can have. You also want to ask, nowadays, what is the time scale of success? Not, will this project have high impact if it appeared today, but, will this project have high impact if it appears when I actually finish it?
There is the bitter lesson, which we already discussed: will your project be robust to increased scale of compute and increased scale of intelligence? Things that are robust are things like developing foundational data sets, running experiments that are not easily repeated, or developing algorithms that interact linearly or superlinearly with scale. You want your approach to work better, rather than worse, as your complements become more powerful. If you do work that sets the standard questions or common practices or framing that future research uses, this is something that behaves very, very well with scale, because it changes the trajectory that you're pursuing.
You need to ask, what is the opportunity cost of working on this project? How much time is it going to take? How much effort is it going to take? How much of that is reusable for other projects if it fails? And how quickly can I get some initial signal of whether this idea was good or bad?
You need to ask why are you the right person to work on this project? What is your comparative advantage? Do you have a specific set of collaborators that are really good for this project? Do you have specific expertise? Do you have access to some resources like large-scale compute or experimental data or a pool of undergraduates you can TMS on? What do you have that makes you good at this? You should be super suspicious, though, of thinking that you have a new conceptual insight-- not that they never happen, but it's very, very rare that the actual underlying idea is a new idea. It's almost always in the execution.
And finally, and I think the one that people most neglect when they are choosing projects and especially for academic projects, is how redundant is this project? How many other people on the planet are trying to do the same thing at the same time in roughly the same way?
If you explain your project to other people, and they're just like, oh yeah, that sounds like a good idea, that's a sign it's a terrible project. It means that you're doing something that everyone already agrees is a good idea, and there's probably 10 other people doing it. And even if you win the publication race, it's a waste of your time and skills to do something that would have happened at roughly the same time even without your efforts. This one can sometimes be different in industry than in research, in that sometimes, in industry, you still have to do the things that other people are also doing.
And maybe the final comment on choosing projects, people never go weird enough. Whatever project you're working on, you should do a weirder one. You're going to be judged not by the typical outcome of a project. You're going to be judged by the best few projects that you have in your career.
And this means that you should be rolling the dice on things that are lower probability of success but much more important if they do succeed. It also means that you should be trying very, very hard to work on things that are not obvious conveyor belt next steps but that actually push out, in some novel direction, that will change what future research does.
The sweet spot is if you can very clearly articulate why something is interesting and important, but also, when you explain it to people, they look at you a little bit weird, and it takes a few minutes for them to really get it.
So that's a fairly concrete rubric. I'm also going to be concrete about specific projects that I, at least, am excited about in the context of AGI coming. This list is massively incomplete. The most impactful ones are probably things I've left off this list. But these are nonetheless things that I think are very interesting.
Maybe number one is AI for science, like using AI to model fusion reactor internals or weather dynamics or do materials discovery or model large ensembles of neurons. I think that scientific understanding is almost purely a positive in expectation endeavor. And I think that there are qualitatively new capabilities that AI affords you.
I think that AI models are getting to the point where you can do science on them. And maybe the canonical example of this is like interpretability work, which is basically doing neuroscience on AI models. And it's still very, very primitive. We can do much better than that. But it's not just neuroscience. You can also do statistical physics on AI models. You can do psychology on AI models. You can do cognitive science on them. You can do economics on them. You can do many, many different fields using the model itself as an object of study.
There's safety research, which is quite small as a field still and also hugely important and only going to become more important as these systems begin to act increasingly autonomously and increasingly are able to do things like guide someone without a high school education how to make a bioweapon. This is going to become very, very important.
People haven't even really built a consistent ontology of risk yet. There are many basic questions that are still just there to figure out. There's extrapolation and characterization of future capabilities, and I think this goes to a question earlier. But maybe one particular area here that I think is particularly underresearched is how systems of interacting agents are going to behave compared to individual agents.
These things are going to roll out into the world. The default consequence of that is going to be massive increases in disparity. How we can roll these out in the world in a way where people get equitable access, and get equitable access to the resulting wealth, is a very, very hard problem. And it's an even harder problem because it's basically deep in culture war issues. So if you have the mental stance where you can work in an extremely toxic environment and remain pragmatic, then this is a thing that you should work on.
Fixing the user interface, actually, I think counts here. I think they're not very usable now. And I think if you can find the right ways for people to interact with them, I think that makes it much better.
And finally, governments are absolutely desperate for people with deep technical understanding to interface with the people building the policy around these technologies. Most of the people building the technology don't want to go work for government because it's stressful and political, and you don't get to do research anymore, and you don't get as much money. And as a result, as a human being, you can have insane leverage on how regulation is built and how the technology develops if you are willing to work on policy and governance, either inside or outside actual governments.
So that is the list I have. Yeah?
AUDIENCE: Should we add to this list estimating the reliability of the answer? Because, particularly if you're dealing with physical systems-- so when you [INAUDIBLE] documents, it's maybe reliable at 10 to the minus 9, which means that there's only one chance in a billion that something wrong happens in the next hour.
If the pilot says, welcome aboard, you'll be glad to know that the control system of this aircraft was designed by the latest network from Anthropic, and as a result, we have a 98% chance to land in San Francisco-- that would be problematic. And so I believe that as all of these things start dealing with physical things and [INAUDIBLE], where things can break and go wrong and so on, I think it becomes super important to also know how good your answers are and how reliable.
JASCHA SOHL-DICKSTEIN: Yeah, I think being able to predict failures and being able to make calibrated predictions of failures is, I completely agree, very important.
AUDIENCE: I wanted to ask you [INAUDIBLE]. How do you imagine this kind of superhuman intelligence? Because I think there are two ways. If you look at chess engines, for example, they play chess at a superhuman level, but it's very, very hard for humans to even predict their moves-- even the top 10 human players in the world won't be able to predict the move, maybe only after they see it.
They can rationalize it. Compare that to maybe a system where you say, OK, if I had put the 10 best people in the world in a room and given them one day or one year, they would come out with the same answer that the model comes up with. How do you measure this superhuman intelligence, and do you think it's affected by the fact that it was trained with different data?
JASCHA SOHL-DICKSTEIN: I think it's interesting, actually, because a lot of the human data we train on is not just a human chain of thought in their own brain. It's like, someone spent five years of their life condensing all the thoughts they have into a single 100-page book, and then we train the model to try to one-shot generate this book. Or seven people wrote a research paper together, and you train the model to reproduce it.
So I think we are training the model on superhuman sequences, but I'm not sure exactly how to think about this. I also think that this is maybe-- the fact that it's going to do things that are hard for us to interpret is maybe a reason, also, that interpretability and being able to mind read the thing is so very important.
I think I'm over time, so I'm going to take maybe one question, which I'll try to answer fast, and then-- yeah?
AUDIENCE: So you showed some examples of how AI can solve some PhD-level [INAUDIBLE] in competition. So I was wondering, how do you think-- how can we use AI to change education, or basically to augment intelligence to [INAUDIBLE]? Do you think this is a research area that can be explored?
JASCHA SOHL-DICKSTEIN: Yeah, definitely. I mean, I think, at every level, it is. I think that human Go players are better than they were before we had superhuman AI Go players, and it's because they can study with it. We can now have a personalized tutor for every single student on the planet. Typically, if you give someone personalized tutoring, their performance increases by one standard deviation. It's one of the very small number of interventions you can do that have a very large unambiguous effect on education. So that's amazing.
I also think that there's probably going to be increasing disparities, where there's a set of students that use the AI to cheat on all their homework and don't learn anything, and then there's a set of students that use the AI as a personalized tutor and are, in junior high, doing graduate-level novel math research. And I think the gap between them is going to grow huge. But I think the technology is amazing for education and empowering humans. And I think that's a great thing to work on, too. You're going to make the transition a little bit less painful if there's a longer period where humans are very useful because they can work better with the AI. All right, thank you.
[APPLAUSE]