
Garrett Grolemund | R Markdown The bigger picture | RStudio (2019)
Statistics has made science resemble math, so much so that we've begun to conflate p-values with mathematical proofs. We need to return to evaluating a scientific discovery by its reproducibility, which will require a change in how we report scientific results. This change will be a windfall to commercial data scientists, because reproducible means repeatable, automatable, parameterizable, and schedulable.

VIEW MATERIALS: https://github.com/garrettgman/rmarkdown-the-bigger-picture

About the Author

Garrett Grolemund

Garrett is a data scientist and master instructor for RStudio. He excels at teaching, statistics, and teaching statistics. He wrote the popular lubridate package and is the author of Hands-On Programming with R and the upcoming book, Data Science with R, from O'Reilly Media. He holds a PhD in Statistics and specializes in data visualization.
Transcript
This transcript was generated automatically and may contain errors.
Last summer, I had the chance to participate in the writing of this book, which is called R Markdown The Definitive Guide, and it's available for free online at this web address. I'm particularly pleased to be involved with this project because I believe that history will see R Markdown as a turning point in the replication crisis.
If you haven't realized it yet, the replication crisis is the one dark cloud in the otherwise bright future of data science, and it's a darker cloud than you might think. I'll assume that you may have heard something about the replication crisis because it's been widely reported on in academic journals like Nature, but also it's crossed over into the mainstream, being published in papers like the Wall Street Journal and the New York Times.
Basically, earlier this decade, pharmaceutical companies noticed something alarming. As part of their work, pharmaceutical companies replicate the results of promising studies to evaluate whether or not they could turn the findings into a drug or a treatment that they could then sell. Normally, they keep what they find in-house because it's a competitive advantage, but in 2012, the Amgen company noticed a picture that was so disturbing, they decided to make it public.
Amgen had replicated the results of 53 landmark studies. Now, landmark is Amgen's word, but by that I mean studies that are influential and that other studies relied on. What Amgen discovered is they could only get the same results as six of the 53 studies. Now, that's really bad. In scientific terms, that means the other 47 studies might as well be wrong. In fact, they probably were wrong, but they were published in peer-reviewed journals and they're accepted as true by academia.
After this announcement, the Bayer pharmaceutical company confirmed that they could only replicate the results of about 25% of the studies that they recreated. Since then, we've studied this in depth, and our best estimate is that about 75% to 90% of research in preclinical studies is irreplicable. That means, again, that this research might be wrong. The results are likely coincidence at best.
This should be concerning to you. Not only are academic reputations at stake, but there's a lot of money that is wasted on this research that can't be replicated. In fact, we have a very good estimate of how much money is wasted on irreplicable research in biomedicine, and that is $28 billion per year in the United States. Now, to put that in perspective, with $28 billion, you could buy a latte for everyone on the planet from Starbucks.
Now, if you're like me, you might not have a good sense of how many people are on the planet, but estimating based on the number of people in this room, with $28 billion, you could buy everybody in this room their own private island in the Bahamas, assuming that supplies last.
This is just for research done in one year, and it's just money wasted in the United States, and it's just money wasted in the field of biomedicine. Unfortunately, all signs suggest that the replication crisis is occurring across every branch of science.
For example, a study of 18 influential economics articles from prestigious journals revealed that only six of them had results that could be replicated. A study of 21 articles from Nature and Science showed that only 13 of those articles had results that could be replicated. Now, Nature and Science are considered the most prestigious academic journals, and they publish work across all the domains of science. So while 13 out of 21 isn't technically half bad, it's still very alarming.
And there's other reasons to be alarmed, too. We've seen that money is being wasted, but there's a real opportunity cost here. These studies are meant to do things like generate wealth, heal the environment, and cure cancer, but they can't do that, and worse, they're misleading the people who would otherwise be solving those problems. I like a healthy environment, and I really do want them to cure cancer, especially before I get too old.
The other thing is, these studies have become part of the scientific consensus, even though they're useless, and we don't know which part of the consensus they are. I mean, yes, we do know the ones that were studied, but everything we haven't looked at, we don't know what's true and what's false.
But more personally, if your expertise is closely associated with data science, and I suspect that it is since you're here, the replication crisis for you is a credibility crisis. The common denominator of all these studies is that they rely heavily on data and methods for analyzing data. And like it or not, commentators have observed this pattern. Academics, and presumably the people who read the New York Times and Wall Street Journal, are starting to realize that data is not a panacea, and neither are sophisticated methods for analyzing the data.
What's really causing the replication crisis
You can solve this problem, but you need to spot the cause first. I can tell you what academia thinks is the cause, because the American Statistical Association published an article on it last fall. It's right here next to the cover story on R, which, by the way, is also a very good read, and the ASA used the metaphor of a cargo cult to explain what's going on.
So do you know what a cargo cult is? During and after World War II, cargo cults developed on isolated South Pacific islands. During the war, natives who lived on these islands saw soldiers arrive and build airfields and radio towers and whatnot, and then miraculously, from the natives' perspective, planes descended from the sky, laden with cargo for the war effort, and a lot of that cargo made its way into the hands of the natives, who found it very, very useful. But the war ended, and the planes stopped coming.
So in some places, the natives reconstructed landing fields, radio towers, radar dishes, the things that they had seen the soldiers use to try to summon the planes back. They didn't understand the original technology, but they assumed if they did something that looked similar, they could get similar results. Well that's the metaphor the ASA uses. The ASA says that many applications of statistics are cargo cult statistics. Practitioners go through the motions with scant understanding. In other words, the people analyzing data just don't know what they're doing.
It's a convincing story in some ways. I'll show you. This is a worked example from Fisher's Statistical Methods for Research Workers. We can assume that Sir Ronald Fisher understood the original technology of statistics because he invented most of it. The example begins here in blue text and continues for a few pages, and what Ronald Fisher is doing is he's stating his problem very precisely. He's describing what he thinks are important characteristics of the problem. He's stating some reasonable assumptions, and then he's inventing a method that can help him answer his question. And down here we get his answer.
This is a really simple thing like, you know, what's the correlation? But this is how he did it. Now here's how modern researchers, at least the ones who don't use R, might do the same thing or handle the same problem. They'd say, well, okay, my software gives me these 20 tests to choose from, so I'll pick this one.
As I said, it is a seductive story, and if the authors of the ASA article are in here, I apologize, but I don't agree with the explanation. And that's largely because I have a PhD in statistics, and I know that if you do statistics completely correctly, you can still have a replication crisis.
Now when people talk about the replication crisis, they often mention p-values. Let's use p-values as an example. People say we're p-hacking, we're misusing p-values, but let's look at what happens when you use a p-value correctly. A p-value just means that you've done a statistical hypothesis test. Almost all statistical tests account for one source of variation: the uncertainty that comes from taking a random sample from a population. A p-value attempts to quantify that uncertainty.
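To make that concrete, here is a minimal R sketch of a correctly used p-value. The data are simulated, so the numbers are purely illustrative; the point is what the test does and does not account for.

```r
# Simulate a random sample from a population whose true mean is 750 ml
set.seed(1)
fills <- rnorm(30, mean = 750, sd = 5)

# A one-sample t-test quantifies ONE source of uncertainty:
# the variation that comes from drawing a random sample
result <- t.test(fills, mu = 750)
result$p.value

# The p-value says nothing about measurement error, sampling bias,
# model choice, or any other source of uncertainty in the study.
```

Even used perfectly, the test addresses only sampling variation, which is exactly the point being made here.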
But this source of uncertainty is just one source of uncertainty that you will encounter every time you try to use data to answer questions about nature. This is a graphic developed by my friend and colleague Drew Levy that depicts the other sources of unavoidable uncertainty that are involved in the process. Each small white bullet point here is a different source of uncertainty, but p-values only address that one segment over here. And those of you who suggest that we should replace p-values with something like a Bayesian odds ratio or whatnot need to account for those other sources of uncertainty too. A replacement for a p-value isn't going to solve the problem if it also only looks at that one source of uncertainty.
We act as if accounting for that one source of uncertainty justifies the entire process. But it doesn't.
Confusing science with math
So let me tell you what I think is going on. And I could do it with this example. This is just one of the pages from that Fisher example we were looking at. If you look closely at this page, you'll see things like formulas, algebraic variables, factorial signs. It looks like math. Why might that be a problem?
Well, let's do a thought experiment. Which set of words do you associate with math? I'm going to guess that's not hypotheses, messy, best guess, or discover, unless you're very new to math or very advanced at math.
The beauty of math is that it's so precise. It's so logically certain. You can prove things with math. And that's why we love to use it when we can. But now think of science, the sort of roll your sleeves up, get your hands dirty and make an experiment science. You're always working with hypotheses that only represent your best guess. At any point in the future, your hypotheses could be revised due to new data or completely overturned. Your only road to glory in science really relies on discovering something that no one else has documented yet. With science, you cannot prove things.
These two systems are complementary. Math as a form of logic can prove things for you, but only things that are already present in your premises or implied by your definitions. Math can't tell you if those definitions could correspond well with reality. But science can. That's science's job. Science helps you pick the most pragmatic definition, and it helps you keep track of whether or not that definition corresponds to your current state of knowledge. But science can never prove that that definition is correct. It will always be an estimate or a guess.
Now those of you who are fans of hypothesis testing might say, well, no, Garrett, you're wrong. You can prove that a hypothesis is false. Well, no. You are wrong. If there is a probability model involved, there will always be some probability, even if it's very small, that your hypothesis is correct, no matter what the test says. We just round that probability down to zero if it happens to be small, say less than one in 20. And that's the point. A statistical hypothesis test looks like math, but it's not math. It doesn't deliver logical certainty.
So we've created a cargo cult by confusing science with math. As we started to use more and more data in our science, our methods started to resemble more and more math. Somewhere along the way, as a group of people, we forgot or stopped acting like math was only a tool for scientific reasoning. We began to believe that our work was like math. It could deliver logical proofs.
Now we must undo that cargo cult, and this is very important because we're on the verge of starting a second cargo cult as we use machine learning. Our science is starting to look more and more like computer algorithms. Computer algorithms are very powerful, automatable, reliable in their own way, but our science does not gain those qualities just because we use a computer algorithm. And what's worse, when machine learning fails, it seems to fail in a much more public way.
Reproducibility as the solution
But scientists know what to do. It's in your DNA. We were taught it all in grade school, and I have a very useful metaphor you could take away, and that is the age of exploration when navigators were going out there and discovering new continents. They were searching for parts of the world that had not yet been observed, and that is exactly what scientists do.
So think of someone like Christopher Columbus. He sailed across the sea, he discovered the new world, and he came back to Spain. Did he offer a logical proof that the new world existed? No, that wouldn't make any sense. He offered a map, and the map spoke for itself. If other explorers followed the map and got to the new world, well, that spoke for itself. If they followed the map and didn't get to the new world, then that would speak for itself. And that's what scientists should do. We should create maps of the things we find for other scientists to follow, and then let the destination speak for itself when they get there.
Now scientists are actually pretty good at this, and if you look at the typical scientific report, the thing where you start with your hypothesis, you talk about your methods, materials, and then the results, that is a very good map that goes from the questions you're asking to the data that you collect. And back in the days when data was almost synonymous with conclusions, that sufficed. But now in the days of data science, there's a second arc in the journey. You have to go from your data to your conclusions, and it's not straightforward. You need to provide your fellow scientists a map that can take them along the way that you went.
You can see why this is sort of difficult, because a map that other scientists could use to reproduce this journey would require some tough things. They would need your data. They would need the code you use to analyze your data, and they would need the software required to run that code, and also they need the reasoning that you use to make every decision along the way and to interpret the results at the end. It's an impossible-sounding combination of things.
But this is exactly what R Markdown lets you put in a single report. R Markdown is a free R package that provides an authoring format for data science. It's a plain text document that lets you put all of those things in one place: your data, your code, your results, and your reasoning.
R Markdown demo
So I don't really have much time to give you a demo, but let me give you a taste of R Markdown. This is a story about a scientist named Bill who hasn't updated his headshot in a very long time. His colleague, Virginia, asks him to solve a problem for their company. They work for a brewing factory, and they want to know if the bottling machine is malfunctioning. They've heard reports that less than 750 milliliters of beer is being put into each of their bottles, so Virginia asks Bill to go check it out.
He collects some data right off the assembly line, puts it into a spreadsheet, and then he opens an R Markdown file, a plain text file. He can add code chunks to the file. He can run the code chunks as if it were a Jupyter Notebook or something, if he wants to see the results inline. So this is the graph that he's making here. And then he can put text in between the code chunks that explains what he's doing. So he's saying, look, this is what I'm doing, this is the method I'm using. And he can even embed R code inline in his text, so that the text updates depending on the results of his code chunks.
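A minimal version of Bill's file might look like the sketch below. The file name, data file, and column name are hypothetical; the structure (YAML header, code chunks, inline R) is standard R Markdown.

````markdown
---
title: "Bottling Machine Check"
author: "Bill"
output: pdf_document
---

We suspect the machine is filling bottles below the 750 ml target.

```{r}
# Read the measurements Bill collected off the line
fills <- read.csv("fills.csv")
hist(fills$ml)
test <- t.test(fills$ml, mu = 750, alternative = "less")
```

The mean fill volume was `r round(mean(fills$ml), 1)` ml
(p = `r signif(test$p.value, 2)`).
````

The inline `` `r ` `` expressions are the part that makes the prose recompute along with the analysis: if the data change, the reported numbers change with them.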
So he does this. The plain text document is very easy to put on GitHub, very easy to diff, see what's changed. Not too pretty. But the point of R Markdown is that you can then publish results to an impressive format, almost any format you like. For example, you could use this document, R Markdown will run all the code, use the code and the text to make a PDF that you can then pass on to the person you want to impress. Or a PowerPoint. Or a book. Or a blog post. Or a poster.
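In code, publishing that same plain text source to different formats is a single function call to the rmarkdown package. The file name here is hypothetical; the output formats are ones rmarkdown ships with.

```r
library(rmarkdown)

# One source document, several destinations
render("report.Rmd", output_format = "pdf_document")
render("report.Rmd", output_format = "powerpoint_presentation")
render("report.Rmd", output_format = "html_document")
```

Each call reruns all the code chunks from a clean state and weaves the results into the chosen output, which is what makes the rendered document a faithful map of the analysis.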
But everything you need to create the map to make your research reproducible, and therefore effectively replicable, is there in that R Markdown document. So that's a taste of R Markdown. What I'd like you to do, if you haven't used R Markdown before, is try it out, experience it for yourself, and make your research reproducible. Because it's good science, it's good data science, and it's good business. Thank you very much.
Q&A
Is there functionality in R Markdown to track changes across different versions when something changes? Because what if you can't reproduce the result? What if something changed in your data and you get a different set of results? How is that tracked in the R Markdown framework?
R Markdown itself is just the authoring format, but its existence allows peripheral software to take advantage of it. So the obvious answer would be any version control system you could use to track the R Markdown file. But RStudio has also developed a publishing platform called RStudio Connect for R Markdown documents and other documents that contain code that can be rerun. With RStudio Connect, you can publish your R Markdown document and then schedule it to run on a recurring basis, and change the parameters right there without having to rewrite code. You've seen it in Tarif's keynote and maybe other talks around here, too. We're putting more and more features into that, and I think tracking is one of those features. I could be wrong. If not, I would expect it to come in the future. Until then, in the meantime, you can definitely use Git and other version control systems.
What you can keep in your Git repository is the PDF output that contains the results, or the HTML output, or whatever your output happens to be.
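The parameterized, schedulable reruns mentioned in this answer work by declaring params in the document's YAML header. A sketch, with hypothetical parameter names and file names:

````markdown
---
title: "Bottling Machine Check"
output: pdf_document
params:
  target_ml: 750
  data_file: "fills.csv"
---

```{r}
# The analysis reads its inputs from params, not hard-coded values
fills <- read.csv(params$data_file)
t.test(fills$ml, mu = params$target_ml, alternative = "less")
```
````

You can then rerun the same document against new inputs without editing any code, for example `rmarkdown::render("report.Rmd", params = list(data_file = "fills_march.csv"))`, which is exactly what makes the report schedulable.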
