Resources

See if talking to the doctor is right for you (Max Hockstein, Georgetown) | posit::conf(2025)

See if talking to the doctor is right for you.

Speaker(s): Max Hockstein

Abstract: Clinicians and data professionals have different approaches to data, which can lead to communication difficulties within a research team. This session will highlight common areas of confusion between data scientists and clinicians. Several techniques will be reviewed to facilitate communication, using several examples in R. There will also be a real, live clinician who wants to hear from you and will apologize for the actions of any past clinician researchers.

posit::conf(2025). Subscribe to posit::conf updates: https://posit.co/about/subscription-management/

Transcript

This transcript was generated automatically and may contain errors.

So, my name is Max. I am a physician from Washington, D.C. And what's happening right now is something that you guys are very familiar with. It's like, no, this code was working just a second ago. Hold, please.

So, while we are figuring this out, we're going to do a thing where I don't use slides and try to convince you, and simultaneously apologize for any transgressions that clinicians have committed during their interactions with you as data scientists.

So, I live in three different worlds. The first world I live in is emergency medicine. And the specialty is largely what you think it is. It lives at the intersection of patients that really need care and a system that doesn't.

The other part of my life is critical care. I practice cardiovascular surgical intensive care, which means that nobody wants to come visit me at work. Cardiovascular intensive care is a very quantitative field of medicine. And you'd really hope that all medicine is quantitative, but you're about to figure out that it's really not.

And critical care led me to an interest in clinical research. And clinical research was kind of where my fun started.

How clinicians think about data

So we talked about emergency medicine, intensive care, and clinical research. And so my idea of clinical research when all of this started was that I hand you this, right? And then I hand you an Excel file you put into the black box machine, which gives me a p-value. And then we gallop into the sunset of PubMed.

But what became clear even more quickly is that what I'm actually handing you is something like this: some heinous spreadsheet that I thought was very well done. It turns out it's not. And what the data scientist tells me is, basically, just kill it with fire, I never want to see this again. And the project goes into the abyss.

So you can pretty easily imagine that in our first couple of doctor-scientist meetings, I'm not sure who's who here, right? I was asking for things, and I wasn't really sure what I was asking for. And so I sat with this for a long time. And when I went to grad school for clinical and translational research, I was having a lot of difficulty bridging the gap between health care professionals and data scientists.

And after sitting with this for a while, it hit me that health care communicates certainty. We love telling people what diseases they have. Yes, you have coronary artery disease. No, you are negative for this. And that is kind of how we're taught.

And so in creating this kind of paradigm, I went back to kind of this book we have that every medical student uses, this book called First Aid. And it's basically a high-yield board-review book. And the year that I used this is, like, aggressively unimportant. But the point I want to make here is that these two pages are all doctors learn about biostats.

So there's something about, like, incidence and prevalence. Please don't try to read my handwriting. It's not useful to anyone's time. And so really this is all we learn about numeracy, biostats, and inference. And so this is then continued and reinforced by the way that we take tests.

So, like, every test in medical school is the same. The question here is about a disease called Goodpasture syndrome. Please don't look this up. But again, it all starts the same. Patient comes in with a thing, and then we're given a set of data. And then we have to do something really, really gnarly, which is dichotomize a continuous variable to say something is high or low.

And then the question gives you some other, like, aggressively unhelpful information. And then you pick the answer choice, which here is A. And this is something that we use all the time to address concern for something called acute coronary syndrome or heart attack.

And I was working with a resident the other night, and I said, hey, is the troponin back? Can we, like, move on? And he said, yeah, the troponin's back, and it's negative. Right, and I see people over here, like, clutching their pearls. Yes, right, correct. Our idea of negative is a binary plus or minus, when the real interpretation is, what's the limit of detection of an assay of a continuous variable? Guaranteed that no doctor thinks about that.

And to reinforce our mistreatment of numbers, this is a white blood cell count that we get as part of a panel called a complete blood count, or CBC. And you'll notice here that the upper limit of normal is 10.8, and the value here is 10.9. Many people will say, ah, 10.9 is higher than 10.8, so this patient has a very high white blood cell count.

We have no concept of what error is. And when we hear about error, it's something that I did wrong, not necessarily a thing that happens.
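
The talk's examples were in R; here is a minimal Python sketch of the same point, with made-up numbers. It assumes a hypothetical analytic coefficient of variation of about 3% for the assay (real assay precision varies by lab and instrument) and asks: if we re-measured that 10.9, how often would it land back inside "normal"?

```python
import random

random.seed(0)

# Hypothetical numbers: upper limit of normal 10.8, measured value 10.9,
# and an assumed analytic coefficient of variation of ~3% for the assay.
upper_limit = 10.8
measured = 10.9
cv = 0.03  # assumption for illustration, not a real assay spec

# Simulate re-measuring the same sample many times with Gaussian noise.
n = 10_000
flips = 0
for _ in range(n):
    remeasured = random.gauss(measured, measured * cv)
    if remeasured <= upper_limit:
        flips += 1

print(f"{flips / n:.0%} of re-measurements fall back inside 'normal'")
```

Under these assumptions, well over a third of re-measurements come back at or below 10.8: the "high" result is indistinguishable from a normal one once measurement error is in the picture.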

And we can create a lot of cool diagrams like this, right? So the incidence of a certain disease process over time appears to be going down. And this is probably just, you know, the disease getting better, right? So rather than using causal inference, what doctors do is more casual inference, where we kind of see, like, ah, that's going down, therefore the two must be related.

And you can imagine that this is in direct opposition to our data science colleagues, who like to communicate in uncertainty, which is to say, when they publish papers, they say how uncertain they are.

So this is a manuscript that was published a couple of years ago; just by way of its execution, the TLDR is that patients who come to the hospital by ambulance die more than those who don't come by ambulance. Uh-huh. So that's good. But when you look here, the confidence interval for this odds ratio is between 7 and 17. Is that a helpful number for anybody? I would argue no.

And so I remember in grad school learning about bootstrap confidence intervals. We were about two hours into a lecture, and the professor was up there doing some very mathy stuff at the front of the room, and I remember, in my post-call, unfiltered state, saying, what are we doing here? And that was meant as more of a meta question. And the professor said, we're just seeing how big the error is. The estimate stays the same; all this foolishness we're doing up here on the board is for the confidence interval. And unfortunately, clinicians don't understand how important those error bars are.
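
The professor's point, that the estimate stays the same and the bootstrap only measures the error around it, can be sketched in a few lines. The talk's examples were in R; this is a Python stand-in using invented data: resample the sample with replacement, recompute the mean each time, and read the 2.5th and 97.5th percentiles off the resampled means.

```python
import random
import statistics

random.seed(42)

# Hypothetical sample of 50 lab values (made up for illustration)
data = [random.gauss(10.0, 2.0) for _ in range(50)]

# The point estimate: the sample mean. The bootstrap never changes it.
estimate = statistics.mean(data)

# Bootstrap: resample with replacement, recompute the mean each time.
boot_means = []
for _ in range(2000):
    resample = [random.choice(data) for _ in data]
    boot_means.append(statistics.mean(resample))

# 2.5th and 97.5th percentiles of the bootstrap distribution
boot_means.sort()
lo, hi = boot_means[49], boot_means[1949]
print(f"estimate = {estimate:.2f}, 95% bootstrap CI = ({lo:.2f}, {hi:.2f})")
```

The interval is the error bar; the estimate sits inside it unchanged, which is exactly what "we're just seeing how big the error is" means.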

So it's not necessarily surprising that a lot of clinicians will look to the p-value, or some other measure that we publish. Greater than .05? Time for subgroup analysis.

The communication gap

But this disparity in communication is not unique to the dynamic between physicians and data scientists. Anybody know what this picture's from? The Bay of Pigs invasion, which I know is not on the forefront of everybody's mind. The reason the Bay of Pigs invasion was okayed by the U.S. government was because there was a "fair chance" of success. Now, in the report issued afterwards by the CIA, it turns out what was said in the room was that the odds were 3 to 1 against. But with this imprecise probabilistic language, a fair chance, oh, that sounds good, let's go, which resulted in what is arguably one of the larger military kerfuffles in world history.

So it's not surprising that experts are really good at what they do, right? They have knowledge, they have skill, and they've been really, really dedicated. What we are not taught anything about is communication, and so the solution to this is actually partnership and nosiness.

Partnership and nosiness

So I want you to be nosy. The point of why I'm here today is I want you to get into the nitty-gritty of the investigation that you're doing because they're really, really different. So when I say be nosy, I mean I want you to ask, like, why are you doing this thing? How does that work? Why is that there? And in return, what really helps us is to hear, like, the reason I'm using Model A is because I don't like Model B.

So when I say learn anatomy and physiology, A&P is like one of those bread-and-butter classes that we do in medical school. In and out, any variation is bad, right? That's, like, the first two years of medical school. And so a lot of what I do is hemodynamic research. So when I report heart pressures, feel free to ask, why do we care so much about pressure in the right atrium? Well, really, it's because of the health of the pulmonary vasculature.

Similarly, ask: why are you using linear regression as opposed to logistic regression? That's something any data scientist could be asked. You'd be surprised how many doctors struggle with when to use linear versus logistic regression.
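
The rule of thumb behind that question is that the outcome's type drives the choice: a continuous outcome suggests linear regression, a binary outcome suggests logistic. The talk's examples were in R; here is a hedged Python sketch with entirely invented data (exercise hours, systolic blood pressure, readmission) and illustrative logistic coefficients, not fitted ones.

```python
import math

# Hypothetical data: hours of exercise per week (predictor),
# systolic blood pressure (continuous outcome), and
# whether the patient was readmitted (binary outcome).
hours = [0, 1, 2, 3, 4, 5, 6, 7]
sbp = [148, 145, 141, 138, 136, 133, 130, 128]   # continuous -> linear
readmitted = [1, 1, 1, 0, 1, 0, 0, 0]            # binary -> logistic

# Linear regression (ordinary least squares, one predictor) by hand:
n = len(hours)
mx = sum(hours) / n
my = sum(sbp) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(hours, sbp))
         / sum((x - mx) ** 2 for x in hours))
intercept = my - slope * mx
print(f"linear: SBP = {intercept:.1f} + ({slope:.1f}) * hours")

# For the binary outcome, a straight line can predict "probabilities"
# outside [0, 1]; logistic regression keeps them in range by modeling
# the log-odds instead. Coefficients below are illustrative, not fitted.
def predicted_prob(x, b0=2.0, b1=-0.8):
    return 1 / (1 + math.exp(-(b0 + b1 * x)))

print(f"P(readmit | 0 h) = {predicted_prob(0):.2f}, "
      f"P(readmit | 7 h) = {predicted_prob(7):.2f}")
```

The linear fit answers "how much does blood pressure change per hour of exercise?"; the logistic curve answers "how does the probability of readmission change?", which is why the two questions get two different models.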

So similarly, in both fields, we have pathology. Sometimes the heart does weird stuff, and we have to deal with it. And sometimes in your IDE, it does weird stuff. And similarly, we have diagnostics. When we have a question, we will do a CT scan. And similarly, when you have a problem with a model, you do nearly identical diagnostics just using different words.

So when you see a CAT scan, or the results of a CAT scan, please feel free to ask us, why is this gray and this a little bit less gray? In the same way, hearing, listen, the reason I'm not using your really good idea is because the model doesn't work, is instrumental to our growth as clinicians and as physician researchers. We need to learn this.

And just like we have diagnostics, we have therapeutics. Sometimes when the heart is acting crazy, sometimes we apply electricity to get it back in the correct rhythm. Similarly, when your data is acting a little bit crazy, you do something to it to make it act a little bit better.

And so when I say to consider our IDEs: this is where I used to work every day. And I want you to take a moment and look at every single part of this room that could potentially give you a piece of data.

You have a ventilator in the back, you have an ultrasound, you have a cardiac monitor, you have blood, but some of it stayed inside. But think about the amount of data that healthcare professionals have to assimilate very, very quickly. This is our IDE.

This is our other IDE. So this is what's called a flow sheet. And this is not a real patient; this is, like, something I found online somewhere. Amy Fisher is not a real person. This is just a way for us to interact with the patient by way of the computer, and these are interventions that we, as clinicians, will or won't perform to make a lot of these numbers different.

Now, compare that to a different IDE, where a lot of the data that you use and eventually operationalize lives. I'm now told this is Positron, and that I will be using it from here on out.

But on the clinician side, our vision is that this is what you're doing in the background. And this, to us, is very, very confusing. I did take, like, a couple minutes, and I saw a line integral in there somewhere from a class that I drank my way through in college. But our vision of statistics and data science is nearly uniformly this, which makes it very intangible to a lot of clinicians.

And so an old joke in medicine is, you know, one view is no view. So whenever we look at a data set or a paper on our own, what we may see is this. When we add data science professionals to our interpretation, what we actually see is this. Hence the old joke: one view is no view, so always get the lateral.

And so with that, I want us to seek partnership and nosiness together. Anytime that you work with a physician or any other clinician, please indulge the, I want to tell you about this model. I want to tell you how I transformed this variable. Let's program this together because I guarantee you your clinician is more scared of you than you are of them.

And with that, I want to... I made this slide very, very quickly while I was sitting there, because I want to display that I have the cutest pets on the Internet. I have Avon Barksdale here and Koa Brown, who are very good boys and girls, and are also my partners in nosiness. And with that, thank you so much.

Q&A

Quick question. Were you always interested in statistical concepts in your medical education or was there a pivot to this when you started research?

In undergrad, I was a math major. And statistics was sort of treated almost like physics: no, those are the applied math people, we don't talk to them. And so I guess I recognized the importance of statistics, but I wanted no part of it, until I realized that statisticians are really, really expensive, and that sometimes doing your own very basic stats is a lot cheaper, and they hate you less.

Totally makes sense. One more question. Could you comment on any statistical concept that you believe is very important for other clinicians to know, that they likely do not?

Yeah, I think that the biggest sin clinicians commit is dichotomization of a continuous variable. And I've harped on this a little bit. I mean, the idea that a white count of 10.8 is fine, but 10.9 is an emergency, is absurd. We're all nodding our heads here because we've all had this conversation, just using different words.

But there are many clinicians who don't necessarily have an appreciation of where the error bars are: that with every value you sample, there is a distribution around it. I would tell clinicians, yes, I'm saying this, but there is a certain guardrail of safety around what I'm saying. What I'm saying is correct only within certain limits. I would say that's the concept clinicians struggle with: the concept of error, not necessarily that they've committed one.

Great. Thank you so much.