Scientific Utopia: Improving Openness and Reproducibility in Scientific Research
Date Posted:
October 28, 2014
Speaker(s):
Brian Nosek, University of Virginia
Brains, Minds and Machines Seminar Series
Description:
Brian Nosek, University of Virginia
Professor in the Department of Psychology and co-founder of Project Implicit and the Center for Open Science
Abstract:
An academic scientist’s professional success depends on publishing.
Publishing norms emphasize novel, positive results. As such, disciplinary incentives encourage design, analysis, and reporting decisions that elicit positive results and ignore negative results.
These incentives inflate the rate of false effects in published science. When incentives favor novelty over replication, false results persist in the literature unchallenged, reducing efficiency in knowledge accumulation. I will briefly review the evidence and challenges for reproducibility and then discuss some of the initiatives that aim to nudge incentives and create infrastructure that can improve reproducibility and accelerate scientific progress.
BRIAN NOSEK: My general research interest is in the gap between values and practices-- what we think we should do, what we want to do, what we're trying to do versus what we actually do. And what I'd like to do today is talk about an application of that interest on the gap between scientific values and practices, particularly on openness and reproducibility, and what happens in daily practice in our laboratories and in scientific communication more generally, and how we might think about ways to improve the alignment between the daily practice of science and what we aspire for science to be.
So I will have a little bit of data, but it's mostly a presentation of a strategy that we're taking through the Center for Open Science, with the hope that you'll have some feedback on where you think the approach can improve and where you think it can provide value for you, because the center is functionally a service organization. It exists as an independent nonprofit. And it doesn't sell anything, and it doesn't have any intellectual property. It builds stuff and gives it away, with the hope that it has some value for researchers in doing their research, but then facilitates improvement in the practices that we're trying to follow every day. So critical feedback on that will be most welcome.
But for this, let me begin with the sort of core assumptions or values that scientists may or may not hold about how science should be done. And Robert Merton, the sociologist of science, did a fair bit of work on this in the 1940s, '50s, '60s, on describing the norms of science. And I just want to review those as recognition factors of what it is that we are here in the business to do, or at least how it is that we aim to do it.
And one of the core norms is communality. When we make a scientific claim, we make available the basis of that claim. What's the evidence? What's the methodology we used to generate the data? What are the data? What is the analysis strategy? We expose those things that are the basis of that claim, so that they can be critiqued, extended, and revised among the community of other experts, versus the counternorm of secrecy, of keeping all the information as one's own intellectual property for commercial or competitive advantage of some kind.
A second norm he identified is universalism-- that the research is evaluated on its own merit. The research is its own evidence for its value, rather than particularism-- that we evaluate research by the reputation of those who generated it or who made the claims.
The third is disinterestedness. Scientists are motivated by knowledge and discovery, just the curiosity of wanting to know things, as compared to the counternorm of self-interestedness, treating science as a competition, beating the person down the hall to that finding or that effect or that prize, whatever it may be. A fourth is organized skepticism: a scientist considers all the new evidence, even that against one's own prior work, and maybe even embraces evidence that is counter to one's work as an opportunity to learn something new. My assumptions were not correct here. Isn't that an exciting discovery? That's versus organized dogmatism-- investing my career in promoting my initial ideas and beliefs, and defending those against all the attackers and detractors who have different theoretical positions than mine.
And while Merton didn't discuss it, many others who have thought about the norms and values of science have also raised quality as a norm, as opposed to the counternorm of quantity. We try to do very good work, not just a lot of it. And there are others that have been raised, but these five are among the ones that are discussed most frequently as the norms of science.
Now, Anderson and her colleagues, in 2007, wanted to evaluate the endorsement of these norms. Do people actually agree with them, or is it just Merton talking to himself about what the norms of science are? And what they did was a survey of two groups-- early career here are people who are in NRSA or equivalent kinds of postdoc mechanisms through NIH; mid-career are those who had achieved their first R01, so average age around 40 years old for that group. And they had about 3,500 respondents.
And what I'm showing you here on the x-axis is the cumulative percentage of the respondents to the survey. And the gray bar indicates those that endorse the norms over the counternorms from the prior slide. Whereas the black proportion is those who endorse the counternorms over the norms. And the hatches are people who basically endorsed them equally.
So as you can see, 90% or more of all the scientists surveyed said that the norms are what they endorse, as well. So then they said, OK, don't tell me what you endorse. Tell me what you do in your daily work. And it shifted some, like this.
So still, most people-- 60% or so-- in both groups are saying that they practice by the norms of science, although many more people are acknowledging that the counternorms also influence their behavior. And there was a small increase in the number of people who said the counternorms are what drive their behavior over the norms.
So then they said, OK, don't tell me what you do. Tell me what everyone else around you does in your field. And that's what it looks like.
So people say that I value these things. I try to practice by them, and all the other duds in my field are doing it all the other way. They're all in it for themselves. They're dogmatic about their findings. They're keeping all of their stuff for themselves, and they're not practicing by the values of science.
Now, anyone that has been studying human behavior since Psych 101 knows that this is a very difficult circumstance to be in. If you believe that the cultural practice is very different from your values, then behaving by the values that you have is very difficult. All of these people are doing things to advantage themselves in their careers. They're doing things that will get them jobs and help them keep jobs, but they're doing it in a cynical way against the norms of science, whereas I want to practice by the ideals. But then I'm confronted with this choice of behave by the ideals and disadvantage my career, or advantage my career and give up my ideals.
And that's a very difficult bind for anyone to be in for any kind of practice that they want to have. And so there is a perception, at least, whether or not it's reality, that people have choices to make between their values and the practices that they're going to abide by in their everyday research. And that behavioral challenge and that perception gap or reality gap is one that we aim to try to address.
And so how I'd like to organize what I present is giving you sort of an introduction to the Center for Open Science and how it is we're trying to do this organizationally, and then hope that some of the products and services and things that we're doing will have some interest or feedback from you. The general mission of the Center is to improve openness, integrity, reproducibility of scientific research.
And we have three core activities. Infrastructure-- building tools, software development, making things that are easy and helpful for researchers to do the things that they're trying to do today. Community building-- building new incentives and working with the various stakeholders in science across the entire ecosystem to try to make it possible for people to behave by their values and still succeed in science. And then metascience-- doing active research on scientific practice itself, like doing large-scale replication projects and seeing what the reproducibility rate is and what predicts reproducibility across disciplines.
And we have a couple big projects-- one in psychology, another in cancer biology. And we have a few that are emerging in some other fields like those. I won't get too much into those. But I'm happy to talk about them later.
The center itself is an independent 501(c)(3). We have a staff of 40 or so right now. And most of them are doing software development. So we're basically a tech startup, and there are some scientists there too that are doing things with it.
So that just gives you some general context of what it is that the Center is about. And I'll just review briefly sort of the general strategy, and then show you some of the things that we're trying to do to engage long-term solutions to these issues of openness and reproducibility. The first is to build technology to enable people to change. If we don't have the tools available to actually do practices in a way that's more open, to actually make our research workflow more transparent, then asking people to change or giving them incentives to change isn't going to be enough.
The second is that we have training services to help people enact the changes. If they want to have a more reproducible workflow, how do I do that? What kinds of tools can I use? How do I use those tools? So we have free training services now for implementing some of these things.
And then the key, of course, is that you can build anything you want. But if there are no incentives to use it, then people will not use it, and it won't make any difference. And so we have to address the incentives for what's good for me and what's good for science, and try to make those the same thing.
So first, I'll talk about the technology. I won't spend a lot of time on training services, other than to say that we have them, and if you go to the website, you can email and get free services. And then I'll come back to and focus on the incentives part, because I think that's where a lot of the interesting discussion emerges about how to address these issues, should we want to address them at all.
So for the technology, our primary service is called the Open Science Framework. It is a web application. It's already freely available. So you can go to OSF.io and sign up for an account.
And the basic goal of the OSF is to provide support for researchers to manage their own research workflow. So it is a project management system at the most abstract level of doing research, where we accumulate stuff. And we need to store that stuff. We need to be able to communicate with our collaborators. And we need to be able to maintain that.
And it aims to solve a problem that researchers have right now with their workflow as it exists. And I can illustrate that with sort of an amalgam example of something that happens in my lab all of the time. So Anup is a third-year grad student. We have weekly meetings. We're having a conversation about some research idea during our meeting.
And I said, oh my gosh, this is just like this project that Nicole and I did a couple of years ago. And we never went anywhere with it, because she graduated, went off into industry, and so it's dead. So maybe you can pick that up as pilot data, and then take it into this new direction as you're describing it.
He says, that's great. So I say, OK, let me go get it on my machine. And so I go to my machine, and I look, and I say, oh, well, I don't have any of those materials. Nicole was the lead on the project. So I have to email her for it.
And so I email her. And because she's in industry, three weeks later she responds. And she says what project are you talking about? And so we have some back and forth over time about what the project is. She says, oh, yeah, I remember that. But I don't have it on my machine here, because that was an old machine. So I'm going to have to go look in my storage files to try to find that stuff.
So a few weeks pass, and I send her some reminders. Remember that thing I asked you? Oh, yeah, yeah, OK, I'll look for it.
And then she, one day, sends me this huge email and says, oh, OK, I found it. At least I think I found it. There's four different versions of the materials. I can't remember which one we actually ran, because remember we were talking about these different variations that we could do, and we did one of them. And here's the data, but I don't have a code book, because I never bothered to clean that up, and so I can't really understand it. But maybe Anup can look through all of this and make some use of it. And so I sent that to Anup, and he looks at it. And he says, I think I'm going to work on a different problem.
So the problem that we confront all of the time, and we're a pretty organized lab, is that we lose our own materials for our own use. We are constantly not able to reproduce our own work in order to make better use of it, because we have ad hoc systems. And we have collaborators coming in and out at very different points. And everybody has their own system for managing their own materials and data, and there's little to no integration of that, so that we can have confidence about what each project's materials and data actually are.
And so the OSF aims to solve that basic problem by having a common cloud-based interface for everybody to use that's on a project, so that they can access their stuff for their own use. And if we can solve that problem, then the OSF provides value for how researchers are doing their work now. We want to not make researchers have to step anywhere away from their workflows as they exist, but rather improve their workflows as they're doing them. So I'll just give a brief description of some of the features of the OSF, so you can get an idea of how the platform is trying to implement that, while simultaneously starting to give some handles on to making the research itself more reproducible and transparent.
So that's the address. You can go in and sign up, either right now, if you have a phone, or later. So the first is this collaboration, documentation, and archiving.
So this is a project page. So when I log into my account, I can create new projects. When I create projects, I can add contributors, who are my collaborators, with that project.
There's wikis. I can push files into the system. I can link up other services that are where files are stored. I can create components, which are discrete parts of the research workflow for different aspects of it, like different studies, analysis plans, data as separate components, just helping organize it in order to help manage those files.
And so all of the other collaborators, if I add them as a contributor to this project, when they log into their system, they have access to this too. And I can give them different levels of access control. If they're an RA and should only be able to do some things, I can control how many things they can do.
It also has version control built in. It's backed by Git now, although that may change. But it's a version control system where, every time a file is updated, it is automatically versioned, so that the prior versions are still available.
So a lot of people have a rudimentary version control system in, for example, manuscript writing, where they're writing their manuscript. And when they're about to make a big change, they append the date to the end of the document file, and then Save As, and then in their folder, keep a number of sequential dates of big versions of that file. This does that automatically, so that you can always retrieve older versions and do change logs in order to understand what has changed in the different files, whether it be data or manuscripts or materials or otherwise over time.
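To make the versioning idea concrete, here is a minimal sketch in Python of content-addressed versioning. It is a toy illustration of the behavior described above, not the OSF's actual implementation, and the file and directory names are hypothetical:

```python
import hashlib
import json
import shutil
import time
from pathlib import Path

# Illustrative only: a toy, content-addressed version store.
STORE = Path("versions")          # where frozen copies live
LOG = STORE / "changelog.json"    # append-only record of versions


def save_version(path):
    """Store an immutable copy of `path`, keyed by its content hash."""
    STORE.mkdir(exist_ok=True)
    data = Path(path).read_bytes()
    digest = hashlib.sha256(data).hexdigest()
    copy = STORE / f"{Path(path).name}.{digest[:12]}"
    if not copy.exists():                      # identical content -> no new version
        shutil.copy2(path, copy)
        log = json.loads(LOG.read_text()) if LOG.exists() else []
        log.append({"file": str(path), "hash": digest,
                    "stored_as": str(copy), "time": time.time()})
        LOG.write_text(json.dumps(log, indent=2))
    return digest


def list_versions(path):
    """Return the recorded versions of `path`, oldest first."""
    if not LOG.exists():
        return []
    return [entry for entry in json.loads(LOG.read_text())
            if entry["file"] == str(path)]
```

Each call leaves the earlier copies untouched, which is the property the OSF gives you automatically.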
It also intends to merge public and private workflows. So right now, if you wanted to make your data available publicly, most often that thought and that action would occur after you're already done with the project. So I've achieved publication, and then the journal might ask me, do you want to make your data available, because we have that service? And then I say, no, I'm done with that project. Why would I add all of that extra work when I've already achieved the end goal, which was just to get published? So appending things on to people's workflow is not a good way to get them to do more things, especially when they have a lot to do already.
So the service, as a project management tool, can be used entirely privately all of the time. So researchers can add their collaborators, allow them to work on all of this stuff. And it is a private secure workspace for just them to look at it and manage those tools. But there's a button on the top of every project and every component that says, Make Public. And if you ever decide to click that button, a little pop-up shows up and says, are you really sure you want to make that public? And then if you click Yes, I'm Really Sure, then that URL is now public.
And so it can be discovered through the search and discovery tools in OSF, or just via Google or wherever else, so that others can find and access it and make use of it. And you have full access control. So if I have a project, and I don't want to make the data available, but I'm willing to make the analysis plan or the code book available, so that others might discover things and then say, oh, you have those measures, can we collaborate in order to do something else-- well, I can make just the pieces of the workflow available that I'm prepared to share with others.
Then once it's easy-- so the main idea with that, of integrating these workflows, is to remove the technical and practical barriers for being open and just make it a matter of intention. It just requires clicking twice. And if I'm willing to do that, then I have now made it a lot simpler on myself to make a lot of the things that I've been doing in the lab part of the public discussion, because it's a tool that I've already been using. I don't have to do extra work.
But we also need to think about why would people make their tools available? So there's a lot of work in alt metrics and trying to find ways to give people reward or at least know what kind of impact they're having. And we do all of those sort of ordinary things, like counting downloads. You can know how many times your materials have been accessed. How many times your pages have been visited is documented.
And then we're building tools to facilitate citation automatically in the system. So we adopt some tools from open source software development. Forking is a very frequently used tool, or style, in software development, where someone else builds something interesting-- say, a new measure or a material or a procedure-- and I think I can do something a little bit different with it that would be interesting for my purposes. Well, I fork it into my workflow, so now I have a version of that project that's linked back to the original. And I make changes.
Then the link is always present. So there's a functional link. They built something that's of use to me, and I've changed it into an alternate version. And then if I make it available, and someone else then extends it, then there's another link. And so the network of development of the tools, of data, of everything else that might change over time is retained as a functional network of how people are using and developing research.
And then there's other simple things, like just creating links to things that you use. And then templates-- if you create a version of a project that someone else sees as particularly valuable, they can template it and then make changes. And you get counts for how you've done that. So it's basically just trying to surface a lot of things that happen that aren't in that singular way of evaluating impact of citation counts of published articles.
It also now links to the many components of that workflow. So that, for example, if you are someone who creates great analysis scripts, and you hate writing, so you don't actually produce as many papers as you produce analysis scripts, you can start getting citations of those analysis scripts and get credit for that, even if tenure committees aren't yet prepared to consider that contribution. That all relies on persistent identifiers.
So every project and component has a permanent link with an identifier associated with it. So there's something to cite and a place to go in order to get that. And each project automatically generates citations. So it makes it very easy for people to cite a data set, where they may not have thought of citing a data set as a research object before. But if we can make it very simple to do so, then people might start thinking differently about how citation can be used to credit things other than just a final report. So that's all incorporated.
There's also a feature called registration. And what registration does is this: I have my project, and it's at a particularly important time in the study. For example, with Nicole's study, she couldn't remember which of four versions she used. If we had had the system, what would have been ideal is that, at the onset of data collection, she creates a registration.
She freezes the data-- the whole project at that point in time. And that frozen version is always linked to the active project, so she can continue working with the project, but there's this frozen version that we record saying this is what the project was like at the onset of data collection. So we can always go to that and know exactly what the materials were that were part of the project when it actually started.
This can also be used for pre-registration. So if we have a strong confirmatory hypothesis, and we want to give ourselves some handcuffs in how it is we analyze that data in order to show a strong confirmatory test, we can register our analysis plan. And then, as long as we follow it, we have some certification that the approach was pre-registered. So it has multiple potential uses.
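As a rough way to picture what a registration freezes-- this is only a sketch of the idea in Python, with hypothetical file names, not how the OSF actually stores registrations-- you can think of it as a checksummed manifest of every file in the project at a moment in time:

```python
import hashlib
import json
import time
from pathlib import Path


def register_project(project_dir, label):
    """Freeze a project: record a hash of every file so the exact state
    at this moment (e.g., onset of data collection) can be verified later."""
    manifest = {
        "label": label,                      # e.g., "onset of data collection"
        "registered_at": time.time(),
        "files": {},
    }
    for f in sorted(Path(project_dir).rglob("*")):
        if f.is_file() and not f.name.startswith("registration-"):
            manifest["files"][str(f)] = hashlib.sha256(f.read_bytes()).hexdigest()
    out = Path(project_dir) / f"registration-{label}.json"
    out.write_text(json.dumps(manifest, indent=2))
    return out


# Hypothetical usage: freeze the materials and analysis plan before any data arrive.
# register_project("soccer-redcard-study", "preregistration")
```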
I should also mention that you can register privately. So people are worried, for example, about being open about their registration, because someone else might scoop them-- see that grand idea you had and steal it and run it themselves before you can get it done. Well the system does not require anything to be public. So you register privately, and then you make that available when you want to do so. And we can talk about some of the interesting challenges that emerge with that.
The last point I'll make about the infrastructure is where it's heading. And it's really to connect the services that researchers use. So the main goal of the OSF is to link all of these different services and the workflow together to help ease transition costs.
So you have different systems that you engage with-- IRB, grant applications, data collection tools, data analytics tools, data visualization, where data is stored, the publication manuscript systems-- all of these require you to make a transition-- getting out of something that you're doing and into something else, which has room for error. And it's also a disruption to reproducibility of what that workflow was-- how is it that you got from place to place? And none of those systems can talk to each other.
So the data analytics tools that we use don't easily connect to where it is we might store the data. We have to actively get those to be connected, or the visualization tools, or the manuscript authoring. So what the OSF does, through API relationships, is connect services all to a single source, so that they can talk to each other.
So we have a number of different services already connected. So if you use GitHub, you can connect a GitHub repo to your OSF project. And you can connect a Dropbox, if you use Dropbox. And if you use Amazon S3-- which is a very good resource if you want very cheap, very large storage, though it's mostly not used by scientists because there are some developer challenges to solve-- we've made a simplified interface to engage it much more easily. So you don't have to have any of the technical knowledge. But all of those can be linked together in a single project in a single file tree, so that you can work with those tools together interactively.
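For instance, here is a small sketch of what talking to that single source can look like. It is based on the public OSF v2 API as I understand it-- treat the exact endpoint and response fields as assumptions to check against the API documentation-- and the token and project id are placeholders:

```python
import os
import requests

# Sketch: list the storage providers (osfstorage, GitHub, Dropbox, S3, ...)
# attached to one OSF project.  Endpoint and field names are assumptions based
# on the public OSF v2 API (api.osf.io); verify against its documentation.
TOKEN = os.environ["OSF_TOKEN"]          # a personal access token from osf.io
NODE_ID = "abc12"                        # hypothetical 5-character project id

resp = requests.get(
    f"https://api.osf.io/v2/nodes/{NODE_ID}/files/",
    headers={"Authorization": f"Bearer {TOKEN}"},
)
resp.raise_for_status()

for provider in resp.json()["data"]:
    print(provider["attributes"]["name"])    # e.g., "osfstorage", "github", "dropbox"
```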
And then over time, we aim to connect many more services across the entire workflow. So there are many things, for example, that researchers in social behavioral neurosciences use to help them with their research process. And if we can define API relationships between all of these things, then we can get them to communicate with each other, in addition to just helping to manage and monitor the workflow.
So you can imagine, for example, that when you're ready to push an article into the publication system, if we can connect the manuscript authoring and typesetting process with data storage and with analytics tools, then we can make every inferential test in your paper a link. And then, when you click the link, it pops up the code that generated that test and the data that the code was applied to, so that you can reproduce people's analyses as you read their paper.
And you say, well, I would have done that a little bit differently. I would have added these covariates. I wonder what happens-- and you add those on the fly and see what the result is. So connecting these tools together provides an opportunity to do things in a much more interactive way with the research that you're consuming and the research that you're creating. And so that's sort of the long-term aim, to connect the many hundreds of services together, so that you can do much more, more efficiently and more reproducibly.
OK, so that's part one on technology. And I want to move, for the last part of the talk, to address some of the incentives challenges. And this is really where it's easy to talk about how we could do things, but actually getting to do things is much more difficult. And before I get into some of the strategies, at least that we're starting with-- and I'll be very eager to get your feedback on those-- I want to talk about the context in which this occurs.
And these are two titles of articles that appeared in Nature in 2011: "Believe it or not-- how much can we rely on published data on potential drug targets?" and "Raise standards for preclinical cancer research." These two papers were produced by two industrial laboratories-- Bayer and Amgen, respectively-- who tried to reproduce studies from high-impact articles, mostly in oncology but in a few other related fields, in the sort of pre-development of pharmaceuticals or other therapies for clinical practice, particularly for cancer.
And their routine is let's get the result that the basic research lab got. And then we'll extend it into the preclinical and clinical stages of trials if it shows promise. Well, these two groups reported that they tried a few dozen different studies that were all very promising, lots of high-impact results. And they had respective success rates of reproducing the original results of 11% and 25% of these published articles.
And of course, those are stunningly low numbers for anyone who asks, well, how much should be reproducible? Both of them drew the conclusion, implicitly or explicitly, that the reason it was so low was because many of the published findings are false, rather than that there may be other challenges to reproducing them, like technical expertise-- which they said, no, no, no, there were no expertise challenges here, we're experts-- and many other factors: the methods are not fully known, there may be unknown moderators. All of the various things that can be barriers to reproducibility may be real barriers, but they may be much more extensive than we appreciate when we're consuming the literature that we look at, because 11% and 25% is quite low.
There's also a very healthy community emerging of meta-science, of studying the process of scientific research itself. This is a figure that comes from one of Daniele Fanelli's projects. And he's done a lot of work in this domain, looking at, in this particular case, the rates of positive results supporting the tested hypothesis.
So down at the bottom, here, is a percentage, where he reviewed a number of different fields and just counted how often did the reported studies support the hypothesis-- showed a positive result for that effect? And from physics through psychology-- we're the winners-- we had 85% to 92% positive result rate, which is a remarkable rate of positive results in the published literature. It's all the more remarkable when compared with research about the power of studies to find positive results.
So we had a paper recently where we did a meta-analysis of meta-analyses across neuroscience fields-- a variety of disciplines, from low-level animal models to imaging and other things. And we just estimated the power of the research designs in those various meta-analyses. And what I'm showing you here on the x-axis is the median power for those various meta-analyses.
So there's a subset that have very, very high power-- 90% or above-- but many that were below 20%. The median was either 20 or 30% here. I can't remember. You might be able to count.
The median is somewhere around 20%. And what that means is that, assuming that every effect being investigated is true, we would expect a positive result rate to be 20%. And as I showed you here, neuroscience-- 85% positive result rate. So the expected value based on the power of designs is about 20%, to the extent that these are talking about the same population. We're seeing 85% success rate.
Those numbers don't line up. It's not possible that those two things could be true simultaneously, unless some other things were occurring. And what those other things are-- I don't have that slide yet-- we'll talk about in a moment. But there are some obvious ones, which is that not everything that gets done gets reported. And there could be adventurous analytic strategies that make things look more significant than they actually are. And these have been discussed by many others.
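To make the arithmetic concrete, here is a quick simulation. The effect size and sample size are illustrative choices of mine, picked only to land near 20% power; the point is that even when every simulated effect is real, only about a fifth of the studies come out statistically significant, which is hard to square with an 85% positive-result rate in print:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Illustrative parameters: a modest true effect (d = 0.35) and small samples
# (n = 25 per group), roughly the low-power situation described above.
d, n, n_studies = 0.35, 25, 20_000

significant = 0
for _ in range(n_studies):
    control = rng.normal(0.0, 1.0, n)
    treatment = rng.normal(d, 1.0, n)          # the effect is genuinely there
    _, p = stats.ttest_ind(treatment, control)
    significant += p < 0.05

print(f"positive-result rate when every effect is real: {significant / n_studies:.0%}")
# Around 20% under these assumed parameters -- far below the ~85% rate seen in print.
```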
So that's one element of the context of why there are concerns about reproducibility. Another I just want to mention briefly is the extent to which we have certainty about the findings as we do them ourselves, as we observe them. And this is a project that we're just about ready to submit. And what I'm showing you here on the x-axis is an effect size.
This is in odds ratio terms. So one is no effect-- no association, in this particular case, between the two key variables-- and values above one indicate a result in the expected direction. And what I'm showing you are 28 different teams who all investigated the same research question, with their mean estimated effect size and the 95% confidence interval around that effect size.
So you're seeing quite a bit of variability across the teams in their estimated effect sizes for this research question. In this case, the question was are players with darker skin tone more likely to get red cards in soccer than players with lighter skin tone? Now, the interesting twist to this particular investigation, which had these 28 different groups-- there's actually 29. One is not here because the confidence interval is so wide it didn't fit on the page-- is that they all used the same data set.
So we took a data set, said, here's the question that we're interested in. Who wants to help analyze it? And 29 different teams did independent analysis of that data set, testing the exact same question. And this is the range of findings that they found. About 2/3 of them found a positive effect. About one third of them did not find an effect. And there was quite a bit of variability in their responses.
Now, what's even more stunning to me about that is that we had a two-stage process in this analysis. So the teams did their analyses, then they submitted them to us with the results. And then we removed the results and just put together the analytic strategy-- the exclusion rules, transformations, the model they used, everything else-- and then sent those back to all of the teams. And every team peer reviewed at least three other teams' analysis strategies, gave them feedback on the appropriateness of it, things that they might want to consider, et cetera.
So they each saw many of the other analysis strategies and gave feedback. So you'd at least expect some convergence on how it is they approached the data. But this is the variability after the second round, after they considered all of that feedback and then revised their analyses.
Also, this is not a case where we have a few very inexpert analysts and some others that are highly expert, or vice versa, because most of these teams are made up of people who are instructors of statistics or methodologies at their respective universities, or have published on methodology and statistics. So we recruited a very highly expert set of people. But there's a lot of choices that one makes. And many of the choices are very reasonable choices to make in data analysis. And yet those choices have implications for outcomes. Yes, please.
AUDIENCE: Give us a hint. What's differing between these? Like, what are they doing differently?
BRIAN NOSEK: Yeah, so some of them used different models that I'm not showing you here-- mixed effects versus a linear regression approach was the dominant difference, to characterize them simply. There were some differences in transformations or presumptions of how to code the outcome. And then, there were differences in covariates.
So note, there are 29 teams, and no two teams used the exact same set of covariates in their analysis. They all made different choices of which kinds of covariates to include. And which ones are the right ones? Well, there isn't an easy answer to what the right covariates to use are. People can debate reasonably about different choices, but all of those decisions are ones that are just part of the process of doing research.
And what we see in the average report is one of these analyses. Whichever one happened to have occurred would have had an impact on what conclusion was drawn in the paper, where in reality, if many different people looked at the same data, we'd find very different answers. So this is a scenario where there's not necessarily motivation to get a particular outcome, but there's still a lot of variation in the outcomes, just because there are many choices that we make as we're doing our research, as we're doing our analysis, and in what we're deciding to report, that we don't easily appreciate as adding great uncertainty to how to interpret the outcome as it is reported.
And so the value in this case of transparency of that data is it's possible to observe that the finding is highly contingent on different choices made in the analysis. And then the question is which one or ones of those analyses are most defensible? So the more general issue-- just the flexibility of the choices that one makes in analysis-- is how those different choices can facilitate making research look better in print than it is in practice.
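As a toy illustration of how defensible covariate choices alone can move the estimate-- this uses synthetic data I generated for the example, not the actual soccer dataset or any of the 29 teams' models-- the sketch below fits the same logistic regression with and without one plausible covariate and prints the two odds ratios:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 5_000

# Synthetic data only: skin tone is correlated with playing position, and position
# itself predicts red cards, so the covariate choice matters for the estimate.
position = rng.binomial(1, 0.5, n)                       # 1 = defender (hypothetical)
skin_tone = np.clip(0.4 * position + rng.normal(0.5, 0.2, n), 0, 1)
logit_p = -3.0 + 0.8 * position + 0.3 * skin_tone
red_card = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

df = pd.DataFrame({"red_card": red_card, "skin_tone": skin_tone, "position": position})

m1 = smf.logit("red_card ~ skin_tone", data=df).fit(disp=0)
m2 = smf.logit("red_card ~ skin_tone + position", data=df).fit(disp=0)

print("OR without covariate:", np.exp(m1.params["skin_tone"]).round(2))
print("OR with position covariate:", np.exp(m2.params["skin_tone"]).round(2))
```

With the covariate omitted, skin tone absorbs some of the position effect, so the odds ratio comes out larger; both models are defensible choices, which is exactly the point.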
So we're at the boundaries of knowledge. We're studying problems that are hard, and that's why we're studying them. And most of the data, at least that comes through my lab, is a mess.
And it's a mess because we don't understand the phenomenon that we're investigating. But we're incentivized, when we try to get it published, to make it the most novel, clean, beautiful positive-results story that we can, because that's what makes for a higher likelihood of getting a publication at a higher-prestige outlet. And of course that's a good thing. We should want novel, positive, clean results, because those are the best kind of results.
But because we're working on hard problems, that's not what we're getting most of the time. And with a lot of flexibility in how we analyze our data, in what data and studies we choose to report or not, and in reconstructing our stories-- presenting hypothesis-generating analyses as if they were hypothesis-testing-- we do a lot of exploratory research with our data, because we learn a lot of things, especially unexpected things, from our data. But the incentives are to present it as if we knew that all along and that was our overriding hypothesis going into it.
And we can't use the same data for hypothesis generating and hypothesis testing. They have to be tested separately. But we're not incentivized to do that.
And so we have lots of tools at our disposal that we could use and that are in our interest to use, if our outcome of interest is publication. And the problem with the ones I'm describing here-- yeah, please.
AUDIENCE: [INAUDIBLE] So what is the role of the peer review process [INAUDIBLE] So in other words, shouldn't the peer review process be able to take care of [INAUDIBLE].
BRIAN NOSEK: Yeah, so in an idealized form, then yes, peer review aims to identify potential error and root that out. But if that information is not available to me as a reviewer, then there isn't a way for me to say, well, jeez, that's not the choice I would make, or I don't know if that's a defensible choice, or I'd at least like to see it analyzed this other way. And so for those mechanisms to work, it requires transparency to know the workflow, to know the alternatives, to know how choices were made.
AUDIENCE: [INAUDIBLE] to incorporate this in the peer review process?
BRIAN NOSEK: So there are many different types of challenges that can confront it, because peer reviewers, for example, are already highly burdened. So if we say, OK, now what peer reviewers need to do is get the original data and reanalyze it, in order to write their reviews, then the 5% acceptance rate for being a peer reviewer would probably drop to near zero. So there are real challenges in saying, yes, the ideal is that peer reviewers can look at everything and reproduce it and do it all.
The reality is that that's not going to quite work, and so we have to have other ways of thinking about how to generalize peer review-- make it more open, rather than the very constrained process that we have now. And that goes beyond some of the things that I'll say now, but I'm happy to talk about that more.
Consider all the things I've described here as challenges, and others: low power; more positive results than we would expect; the ignoring of null results-- I didn't show you some recent data, but a publication just came out a few weeks ago in Science documenting the file drawer effect in a very effective way, showing that when labs get negative results, they're much less likely even to write them up, let alone publish them; the lack of an ethic or incentive for doing replications; and the limitations of null hypothesis significance testing, which we didn't get into.
All of these issues that dominate the methodology debates across different disciplines right now are not new. They were discussed in much detail in the 1950s through the '70s, and many of them earlier, as challenges facing the disciplines in getting a reliable research literature. Those same publications also offered solutions, which are the same as the solutions we're talking about now.
We need to increase transparency. We need a stronger distinction between what is confirmatory versus what is exploratory. We need to value replication more and have places for people to publish replications. We need to publish null results, et cetera.
So if the problems have been understood since the 1960s, and the solutions have been understood since the 1960s, then what's the problem? And the issue, I think, is that there isn't an error with the identification of problems and solutions, but rather what hasn't occurred is appreciation of how implementing the solutions has to take stock of the incentive structures in the ecosystem as it exists. As it faces my choices every day as a practicing scientist, how is it that I can pursue those solutions and still survive and thrive in the discipline?
And for us, that comes down to a core challenge, which is the incentives for my individual success are overwhelmingly about me getting it published, not about me getting it right. I'm incentivized to make it the most clean, beautiful, positive, novel story I can. And if that sacrifices some accuracy, there are very few implications.
And that doesn't mean that I'm doing it deliberately. It doesn't mean I want it to be inaccurate. I didn't get into science to do inaccurate research. I got into it because I'm curious. I like to work on problems. I like to discover true things. But I also need to get a job, and I need to keep that job. And I need to continue to survive and thrive and feel good about my progress in the field.
And those factors are real factors that face individual researchers every day, if they want to survive in the field, especially when they perceive the rest of the field as not doing it according to those ideals. If everyone else is willing to cut those corners and do those things in order to get publications and remain active, then I can choose to stick with the ideals and drop out of science. Or I can adopt some of those practices and potentially succeed.
So there are a number of challenges that face us in actually making headway in trying to address that incentive challenge. One of them is the perceived norms gap that we just discussed at the beginning, between what we think others are doing and what we perceive our own values to be. A second is that we have these fantastic frontal lobes that can very easily make whatever is best for me the right way to do it. When I'm confronted with five different ways we could analyze the data when we're digging into it, the one that actually looks best for the outcomes that I need suddenly becomes very compelling as the right way to do that analysis, because I can bring to bear lots of great reasons and justifications in order to make that the right one. And I'm not even necessarily aware that I'm doing that.
Also, you don't know what happens in my lab. You only see the final reports. And so when we're in a situation where we have lots of flexibility in how it is we approach our analysis and what it is we report, and all you'll know is the outcome, then I have no cause to self-reflect on whether I would make this same decision if I knew others were watching me. If I were observing myself doing this, would I say, that's the right way to do the analysis, or that's a defensible choice, or whatever else? And so that perspective-taking doesn't occur as easily-- there's a lot of literature demonstrating this-- when we are without accountability or any notion that we're being observed.
And the other primary challenge is that I'm busy, and so are you. It's great to talk about ideals and how we ought to do things and how wonderful science could be, but we have stuff to get done today. And there's a lot of stuff to get done today.
And any new practice that you ask me to do, most of the time, I'm just going to say no, because I've got the way that I do things. And I'm effective-- at least I think I'm effective-- in the way that I'm doing them. And so adding to that workflow or disrupting it is something that is very difficult. Inertia is very powerful because of the efficiency gains that we perceive we get from it, at least in the short term.
And so all of those challenges have to be addressed in any tools that get brought to bear. And they have to be faced directly in any way that we're going to think about the incentives that need to overcome them. And so I'll talk just briefly about a couple of different thoughts about different incentives to try to address this, and then we'll close so we have a couple minutes for discussion.
One of them is just based on a very simple, fundamental process of signaling. When you want to promote a behavior, especially when that behavior is already valued, you provide signals when it occurs, so that other people can observe those signals as indicators that it's possible to do the behavior and that there is value from the community in some way for doing that behavior. So badges become a very easy means of doing that.
So there is an open community-- you can join it if you would like-- that COS supports, that is developing the specifications for what makes something open data. What makes something open materials or code? What qualifies something as pre-registered? And journals, or other entities, can adopt these badges as options for authors, for example, to apply for.
I say, I meet the Open Data standard. I'd like to get the badge on my article for that. And that, then, becomes a signal: I did that. This project has that.
So if a journal adopts that-- Psych Science is one of the adopting journals of the badges so far-- then what it does is signal that it cares about these things. Psych Science cares about open data and open materials and pre-registration. And then it provides information about who is doing those things.
And just seeing that, even if it's in a small minority of cases, if it's a valued practice, undermines a lot of the initial barriers to doing those behaviors. I don't do it because it's a pain in the neck to do it, and there aren't ways to do it. Well, those excuses are getting undermined by the fact that Pam Mueller and Danny Oppenheimer are doing it. So there must be some means of doing it, and there's evidence that other people are doing it.
So those are ways to promote a practice in a very simple way-- very low effort. And we have technologies behind these where the badges themselves are baked with information about the issuer of the badge, about the DOI and other indicators of the article, so that they can be digitally tracked, and they can even become linkable to the data, or the materials, or the pre-registration itself. Psych Science hasn't integrated that yet, but that's coming. So that's one.
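As a way to picture that baked-in information-- the fields below are hypothetical, loosely inspired by the kind of details just described (issuer, article DOI, link to the disclosed materials) rather than copied from any official specification-- a badge can carry a small machine-readable record like this:

```python
import json

# Hypothetical badge metadata, not an official specification: the point is only
# that the badge carries machine-readable facts about who issued it and for what.
open_data_badge = {
    "badge": "Open Data",
    "issuer": "Psychological Science",          # the adopting journal
    "article_doi": "10.1177/0000000000000000",  # placeholder DOI
    "evidence_url": "https://osf.io/abc12/",    # hypothetical link to the data
    "criteria": "Data are publicly available and sufficient to reproduce the results.",
    "issued_on": "2014-10-28",
}

print(json.dumps(open_data_badge, indent=2))
```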
A second is a model called registered reports. And this is the cartoon version of the research process, right? We design a study. We collect and analyze data. We write a report, then we publish it.
Of course, there is an important barrier in that process. And that is peer review. The basic idea of registered reports is to move peer review from after the research is already all done-- when there's nothing that can be done about it, other than make it as nice as possible for a reviewer so hopefully they won't hate it-- and move it to the design phase. So peer review in registered reports is done on the research question, the methodology, the analysis plan, all the justifications for why it's an important question, why we need to know the answer regardless of what the outcome is, how it will be tested, and what quality assurance will be included.
And then, if it's accepted, it's conditionally accepted regardless of outcome. So if you follow through with what you said you were going to do and what the peer reviewers suggested was needed, and if the non-outcome-relevant quality criteria-- showing that you actually tested the question, that the manipulation worked, et cetera-- if all those check out, then the journal guarantees that it will publish the report.
So this shifts incentives in a number of ways. One, it makes it possible to publish negative results on questions that are of importance, because you don't know what the outcome is, and it's going to be published regardless. It lowers the barrier to doing things like replications, because all you need to do is submit a design, rather than run the entire study before finding out that it's not possible to publish it because no one cares about it. And it's also an opportunity to distinguish confirmatory approaches from exploratory ones.
So the registered reports model does not hamstring us from exploring: once we've done the analyses that we registered in advance, we can also then do all of the kinds of exploratory analysis that we ordinarily do with most of our research. You just have to report it as a distinct part of the results section. So here are the confirmatory tests, and then here are our additional analyses. And then our discussion is whatever consideration we have for what we learned from that process.
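One way to picture that reporting discipline-- a sketch with hypothetical file names and formulas, reusing the registration idea from earlier, not a prescribed workflow-- is to drive the confirmatory section entirely from the frozen plan and label everything else as exploratory:

```python
import json
from pathlib import Path

import statsmodels.formula.api as smf

# Hypothetical frozen analysis plan (see the registration sketch earlier);
# the file name and fields are illustrative, not a prescribed format.
plan = {"confirmatory": ["red_card ~ skin_tone + position"]}
plan_file = Path("registration-preregistration.json")
if plan_file.exists():
    plan = json.loads(plan_file.read_text())


def analyze(df, exploratory_formulas=()):
    """Fit the pre-registered models first; anything else is labeled exploratory."""
    results = {"confirmatory": {}, "exploratory": {}}
    for formula in plan["confirmatory"]:
        results["confirmatory"][formula] = smf.logit(formula, data=df).fit(disp=0)
    for formula in exploratory_formulas:      # ideas that came after seeing the data
        results["exploratory"][formula] = smf.logit(formula, data=df).fit(disp=0)
    return results
```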
So there are many occasions where this isn't an appropriate design. But as a complement to the traditional peer review process, it offers an opportunity to do strongly confirmatory designs and to get a lot of errors and design quality issues addressed in advance of actually doing the research, rather than after the fact. So there are 13 journals so far that have adopted this as a submission option.
In fact, one of them, Comprehensive Results in Social Psychology, is just launching, and it's only publishing registered reports. Everybody else just has this as a submission option.
We have published a special issue of 15 of these in Social Psychology, I think in May of this year. And it was very instructive as a process for how this works. And so if you want to talk about that, happy to do so.
Perspectives on Psych Science is doing a prominent version which is only for replications. And their first one to come out was on Jonathan Schooler's verbal overshadowing work. And then eLife is doing this for our reproducibility project on cancer biology. So all of those projects are being peer reviewed in advance by eLife, a life science journal, before we do them.
OK, so let me just close with a couple of comments, noting that any of these ways of trying to address incentives can't be done by any single part of the system, because the ecosystem sort of reinforces and sustains many of the things that drive individual researchers' behavior. And so there are many groups that have to be involved and coordinate in order to think about how we might improve my ability to succeed in the field while also trying to approach the ideals of how we think science ought to be done. So, for example, universities might think about what they make explicit in their search and promotion processes to try to reinforce what they would like their researchers to be doing in order to achieve tenure or promotion in their institution. And it's also very clear that it is important that the scientific disciplines do something about this, because the concern about the reproducibility of scientific research has extended very broadly and very vertically.
So this is the Federal Register-- the least-read journal in the world, the daily journal of the United States government. The Office of Science and Technology Policy issued a memo at the end of July of this year. And there are like 29 items where it asks for information-- requests for information-- so anybody can submit comments. And this is what the offices in the White House do when they're getting ready to act. They send out public requests for comments on these different issues, and then they use those to develop directives or commands or whatever they do for the various federal agencies that they oversee.
And one of those was: given the recent evidence of irreproducibility of a surprising number of findings, how can the federal government leverage its role as a significant funder of scientific research to most effectively address the problem? So the government is prepared, and getting more prepared, to act. And a variety of our funding agencies are already doing things. NIH has many new programs in order to try to address reproducibility.
So if we're going to have an impact on how our discipline operates, we need to be part of that conversation. And the federal government, without guidance, can make decisions and offer solutions that we would find to be not productive for the way in which we would like the science to work. The IRB system, for example, is one where there were lots of decisions made without consideration of the realities on the ground of how that stuff actually works. And that one, actually, is a success story now, because there's some very rational changes happening to the IRB process.
But this one has the potential to go very badly, because stuff will happen. So if we don't get in front of it as communities, showing that we already know how to address these challenges or how to do this well, then it might be done to us. And we might not like that outcome.
So for us, the main goal is to address this perception or reality gap-- so that the values we came in with are ones we can hold in our daily practice, ones we see visible signals that others are practicing too, and so that we are all reinforced for pursuing a kind of research community and practice that is aligned with what we want the science to be. So I will close with that, just acknowledging the team so far at the center, and reminding you that if you are looking for a job, we have lots of job openings. So thanks very much for your time.