Resources

From Data to Dollars: Improving Medical Billing Accuracy Using NLP (Julianne Gent, Emory Healthcare)

Protecting our Healthcare Heroes: Using Natural Language Processing to Prevent Billing Mistakes in Healthcare

Speaker(s): Julianne Gent

Abstract: Maintaining accurate billing documentation in healthcare is essential to prevent revenue loss and preserve patient satisfaction. I’m Julianne Gent, Analytics Developer for Emory Digital, and I’m here to discuss the natural language processing algorithm we built utilizing an automated SQL-to-R pipeline. This algorithm uses the packages ‘odbc’ and ‘stringr’ to import SQL queries into R, recognize billing patterns, and extract billing time. Our algorithm accurately captured billing data for 93% of over 250,000 notes. The billing provided by our hospital’s medical software? Only 40%. Our algorithm showed that an SQL-to-R pipeline can improve billing documentation and accuracy, and we are confident that it can be applied to many other industries.

posit::conf(2025)

Subscribe to posit::conf updates: https://posit.co/about/subscription-management/


Transcript

This transcript was generated automatically and may contain errors.

All right, good afternoon, everybody. Hope everybody enjoyed Atlanta's King of Pops. It's a very big staple here, so I'm really glad those of you from out of town got to enjoy it.

My name is Julianne Gent, and today I'm gonna be talking about some ways that we are protecting some of our frontline healthcare heroes, specifically in regards to billing.

So imagine, if you will, with me. You are an intensive care physician. You see and take care of the sickest patients in the hospital. You are at the end of the third consecutive 12-hour shift, and half of the people in your unit needed some sort of life-saving treatment just today. You then suddenly get a call from your hospital's billing department, and the manager says that they received a complaint from one of your previous patients saying that their insurance is denying your care and that they must pay for their entire hospital visit out of pocket.

Then, on cue, you get a call from your director's office, where an auditor from the Centers for Medicare and Medicaid Services is sitting. They inform you that they've reviewed several of your clinical notes and are accusing you of medical fraud.

They actually show you the note in question, which is related to the same patient who had called billing earlier. Here's the note, and all of this happened simply because you added an extra zero by accident in your documentation, claiming that you were billing for 300 minutes instead of 30.

So, this might seem like a very exaggerated situation, but this can easily happen to our overworked and burned-out clinicians, where these simple clerical errors could cost them their patients, their jobs, and their reputation. So, seeing this happen at other healthcare entities and foreseeing that it may happen to us in the future, our executive leadership came to our team with the following question.

Can we help clinicians maintain accurate billing documentation while meeting them where they're at? Basically, how can we help physicians do their job better without increasing their already overburdened workload?

Evaluating solutions

So, of course, our office said, of course we can. So, we actually considered a couple of solutions.

The first was to require clinicians to review every single note they've written that day and double-check for billing accuracy. Well, we just talked about how we need to meet physicians where they're at. We're not gonna add yet another tedious task to their already long stack of things to do and take away from patient care.

So, we're gonna go ahead and eliminate that. The second one we looked at was called a SMART phrase, a functionality provided by our electronic medical record, or EMR, software vendor. Basically, it allows clinicians to type in the name of a template that consists of a lot of commonly used phrases. For example, here it says critical care billing statement, and it automatically adds it to their note. The data typed into this phrase, where the three asterisks mark where you put the billing time, is actually stored very cleanly in a data warehouse where we can extract it.

However, we found two large issues with this approach. First, SMART phrase templates are just one of many documentation tools that physicians can use; they're optional, not required, and many physicians don't use them. Many physicians like to write their own text freehand. And secondly, we recently found that a lot of clinician workflow involves the old copy-and-paste of text from a previous note into a new note. This happens a lot when a patient's care hasn't changed from day to day, and it saves them a lot of time and extra typing. But when this happens, the data I showed at that asterisk in the phrase is no longer extractable. It is no longer put into that data warehouse.

Why not an LLM?

So, I know what a lot of data scientists in this room are probably thinking: this is unstructured text, just use an LLM. We got asked this a lot by physicians, too. Can't you just use AI? What about an LLM? Well, we have a couple of limitations specific to healthcare.

So, there are three main challenges we faced when we thought about using an LLM, the first being cost. We all know that LLMs are not cheap, but they also have a large infrastructure cost. Running an LLM on a daily basis requires powerful GPUs and a dedicated cloud space, which is something a large university, a pharmaceutical company, or a tech company could have and support, but not a small-to-medium healthcare system.

Secondly, even if we somehow covered the cost, both financially and infrastructurally, another issue is timeliness. Anything related to healthcare quality, meaning anything around patient safety, or anything that could get audited by a government official or a regulatory branch, like the auditor coming for medical fraud we saw earlier, can come at any point of the year, and can come multiple times. So, we were given a turnaround time of about a week and a half, and there is not really a feasible way we'd be able to train and validate an LLM in that timeframe to get ready for any potential auditing.

And finally, the biggest one is HIPAA compliance. So, for those of you who are not from the U.S., HIPAA is a federal law that protects patients' healthcare privacy and data, and it is taken very, very seriously. If we wanted to use an LLM that's already on the market, like ChatGPT or Copilot, and we put healthcare data into that LLM, we would be violating HIPAA, which could have devastating consequences for a healthcare system. So, all in all, we scrapped the LLM.


Building the NLP tool

So, because we scrapped all three of those options, we decided, why don't we just make our own? If you want it done right, do it yourself.

So, how we built this tool. So, here's an example. Don't worry, healthcare people, this is fake data. We're not breaking HIPAA today. So, here's an example of what a typical hospital intensive care unit clinical note would look like. You can see it's all free text. It's incredibly messy. Tons of random numeric data.

Luckily, our EMR vendor that I talked about earlier stores all of this free text in a data warehouse, which we can extract via SQL code. And we can also get a lot of other important information, like who wrote the note and when. And so, now we have this messy free text data in a cleaner format. So, we can now see Meredith Gray, our favorite dramatic physician, wrote this note for this patient on New Year's Day.

So, we could theoretically take that SQL output, copy and paste it into an Excel file, and import it into R. But we're data people. We want things to be automated. We want to reduce the manual process as much as possible. So, instead, we utilized the odbc package to connect the SQL data warehouse directly to R.

And so, in the dbGetQuery() function down below, you can actually paste your SQL query, and then directly import the data from the SQL data warehouse into R in one motion, a one-stop shop. No copying and pasting, no Excel needed.
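A minimal sketch of that import step, using the odbc and DBI packages as described; the data source name, table, and column names below are hypothetical placeholders, not the actual warehouse schema. The connection lines are shown commented out because they require a configured ODBC data source.

```r
library(DBI)    # generic database interface
library(odbc)   # ODBC backend for DBI

# The SQL query that would otherwise be run by hand and pasted into Excel.
# Table and column names here are illustrative, not Emory's schema.
query <- "
  SELECT note_id, author_name, note_datetime, note_text
  FROM clinical_notes
  WHERE department = 'CRITICAL CARE'
"

# Requires a configured ODBC data source, so shown but not run here:
# con   <- dbConnect(odbc::odbc(), dsn = \"emr_warehouse\")
# notes <- dbGetQuery(con, query)  # SQL runs in the warehouse; result lands in R
# dbDisconnect(con)
```

From here, `notes` is an ordinary R data frame, ready for the keyword-matching steps that follow.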

So, the next step was to create a data set of keywords found in these notes that are specifically related to billing. We actually worked with our critical care clinicians; they looked at several of their colleagues' notes and tagged the most commonly used phrases, plus the phrases that are required in a note for regulatory purposes. We put these into a keyword data set. I'm showing probably about the first 15 or so, but there are tons, because there's so much variation in the ways clinicians can phrase their billing.

The reason we also include the minutes or min information is that, as you saw in the free-text note previously, there are lots of numeric values: telephone numbers, labs, vitals, PIC numbers (a physician number). Because we aren't using an LLM, we want to make sure the tool can pull a number and know that it is the number of minutes, and not a blood pressure or a heart rate.
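The keyword data set can be sketched roughly like this. The phrases below are invented examples, not the team's actual regulatory list, but they show the idea: each pattern ties the number to "min"/"minutes" plus billing language, which is what keeps a heart rate or blood pressure from ever being a candidate.

```r
# Illustrative billing keyword patterns (invented examples).
# [0-9]+ marks where the billed minutes appear inside each phrase.
keywords <- data.frame(
  phrase = c(
    "critical care time: [0-9]+ min",
    "[0-9]+ minutes of critical care",
    "total critical care billing time: [0-9]+ minutes",
    "spent [0-9]+ min providing critical care"
  ),
  stringsAsFactors = FALSE
)

nrow(keywords)  # returns 4 (example patterns only; the real list is much longer)
```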

And so, finally, we coded the function that serves as our NLP, and we were actually able to build it with some very basic, readily available tools from Posit: the tidyverse, dplyr, and stringr. That is all we needed to make a basic NLP tool for our purposes.

So, basically, what this does is first determine whether there's a number present before, but still within, one of the keyword phrases, and extract it into its own column. If it recognizes that there's no number before the keyword within the phrase, it checks whether there's a number afterwards, but still within that same phrase from the keyword data set.
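A stripped-down sketch of that logic with stringr, using two invented patterns (one number-before-keyword, one number-after-keyword); the production function covers many more phrasings from the keyword data set.

```r
library(stringr)

# Invented patterns standing in for the real keyword data set:
#   number before the keyword: "35 minutes of critical care"
#   number after the keyword:  "critical care time: 35 min"
extract_billing_time <- function(note_text) {
  before <- str_match(
    note_text,
    regex("(\\d+)\\s*min(?:ute)?s?\\s+of\\s+critical care", ignore_case = TRUE)
  )[, 2]
  after <- str_match(
    note_text,
    regex("critical care time:?\\s*(\\d+)", ignore_case = TRUE)
  )[, 2]
  # prefer the number-before form; fall back to number-after; NA if neither
  as.numeric(ifelse(!is.na(before), before, after))
}

extract_billing_time("BP 120/80, HR 72. 35 minutes of critical care provided.")
# returns 35; the vitals are never candidates because they sit outside a phrase
```

Because both functions are vectorized, the same call works on a whole column of notes inside a mutate().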

We then apply this function to the entire clinical note data set. So, if we apply our natural language processing tool to our example, we can now see that there is a column called billing time that has correctly extracted the value of 35 minutes, and it knows that none of the other numbers in that free text were the billing time.

So, finally, we created an outlier variable with which we can identify potential documentation errors. At our practice, we found it would be pretty rare for a clinician to bill less than 10 minutes of time, or more than 100 minutes, in a single note, so we used these values as outlier thresholds. If the billing time is outside these thresholds, or if it doesn't exist (missing billing information is another issue we want to identify), the column is marked as a one, and zero otherwise: a classic binary variable. This allows our end users, our clinicians, to filter the data and identify those outliers quickly.
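The outlier flag can be sketched as follows. The 10- and 100-minute thresholds come from the talk; the data frame and column names are illustrative.

```r
library(dplyr)

# Toy data standing in for the real note-level output of the NLP step
notes <- data.frame(
  note_id      = 1:4,
  billing_time = c(35, 300, 5, NA)  # NA = no billing info found in the note
)

flagged <- notes |>
  mutate(outlier = if_else(
    is.na(billing_time) | billing_time < 10 | billing_time > 100,
    1L,  # flag: missing, implausibly low, or implausibly high
    0L
  ))

flagged$outlier
# returns 0 1 1 1: only the 35-minute note passes
```

Clinicians can then filter(flagged, outlier == 1) to jump straight to the notes worth reviewing.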

Validation results

So, now, a tool is only as good as its validation results. And since my background is in epidemiology, we, of course, had to conduct a retrospective study on all our notes. We took all the notes written by critical care clinicians in the year 2024. We wanted to see, number one, whether our tool even worked, and number two, whether it performed better than our currently available tool, the SMART phrase method I mentioned earlier.

So, we analyzed a lot of different performance measures for our executives, but I think the most important one to highlight here is the bolded row at the bottom: the percentage of notes where the billing time was successfully extracted or identified as not being there. Our NLP tool, you can see, worked on about 93% of notes, whereas the SMART phrase method only worked on about 60%. And I want to note that these are not small denominators. This is almost 285,000 notes, written by over 500 clinicians. So, you can see how big the impact of a functioning natural language processing tool is.


Monthly process and future plans

So, how do we currently use this tool? Following our successful validation (for the most part; we are still in a process of continuous validation, as I will point out here), the tool actually went live in February 2025. So, here's our monthly process. We first apply the tool to the previous month's clinical notes.

Then the new data is sent to critical care leadership for manual review. Sometimes they like to look at some certain outliers, maybe if a patient is really complex, sometimes that justifies a very high billing time, and so they just want to make sure that everything is up to snuff.

So, then critical care leadership will notify the clinicians if they have notes that fall within those outliers. If they find that a physician hasn't made that many errors, or that the errors are pretty minor, maybe they put 40 instead of 400, or something kind of obvious, the clinicians are notified and the documentation is fixed. However, if we find that there's maybe a physician or two who is constantly missing documentation, or constantly making very inaccurate statements, they then get flagged for further investigation.

And then finally, critical care leadership notifies our team if they have found any valuable uncaptured phrases related to billing text that we can then add back to our keyword data set, and the process continues.

So, the last part I want to talk about is our expansions and our future analytics plans. We want to expand this to other specialties. Right now, it is only in critical care, but we got asked to expand the tool to surgical services to use for their operation and procedure notes. So, we've been trialing an NLP tool for that use case that has had pretty similar success to our previous NLP tool.

We'd also like to run some additional analytics to see if people are even correcting their notes. You know, we created this tool for physicians, but we actually want to know if it's being used, if it's valuable, if it's just extra manual work for them, or if it's going to require some additional education or other processes.

So, in short, our team utilized currently available Posit tools to create a natural language processing tool that successfully identified billing mistakes in clinical notes. We also showed that this tool could be developed with readily available, cost-effective tools and reproduced in organizations where access to LLM technology is limited. So, now I feel like our clinicians don't have to worry so much about whether an accidental keystroke is going to affect their medical license or cause their patients financial harm, and they can go back to doing what they do best: keeping us safe and healthy. Thank you, everyone. I'd like to open the floor for questions.

Q&A

Thank you, Julianne. We have questions from Slido; I'll ask them. This one seems to be from somebody who is medically literate: you can request a BAA (business associate agreement) from OpenAI for HIPAA-compliant LLM calls, or use a cloud provider like GCP or Azure and get a BAA for HIPAA compliance from the cloud provider. In fact, cloud providers offer quite a few healthcare offerings specifically for this purpose. I guess the question is, have you considered this?

Yes, we definitely have, and I'm sure Emory Healthcare is in the process of trying to get those things. We had just switched medical software within the last two years or so, so we're still learning what our medical software's capabilities are, as well as developing. We now have a team called Emory Digital that is going to be talking more about AI and LLM projects. It's all just very, very new. So, I'm sure that's in the pipeline, but for this project, with the timeliness where we had to get this done as soon as possible in case a Centers for Medicare and Medicaid Services audit came around, we had to be a little bit quicker and not wait for that.

This one I find interesting myself as well. So, what kind of notes or phrases did you see causing the errors in the 7% that the tool did not identify correctly?

Yeah, I mean, some of them ranged from just some misspellings, like adding an extra I to critical, which is where an LLM would have been really useful; it could have caught some of those easy misspellings. Some of them would just be "time", parentheses, 15, spaces, and then brackets, smiley face, "IDK how long I billed, maybe 10," question mark. So, yeah, we hadn't added that line specifically. I think we actually went to that physician and said, you've got to refine it a little bit more. But we saw a lot of variation.

So, the results from that data table were from February, and it's now September. We have been adding phrases into that keyword data set each time we've run this, and we've gotten to the point now where we've gone, well, there are no more we can really add, unless it's those crazy ones that just require telling the physician maybe don't document that way. But it's now even more refined than that 93%.

Thank you. This is a classic one; it's more of a comment than a question, but the person wanted to say that they're very pleased to see classic NLP being used rather than LLMs when LLMs are not necessary, so I think that is a hats off to you.

All right, next question is, are there other kinds of data validation you think this could be applied to in healthcare billing?

Oh, tons. One of the other side projects our team is working on involves the different subsets of billing. You can bill for critical care, or you can bill for what's called evaluation and management. Basically, if you are seeing a patient who is super, super sick, you get reimbursed more by the Centers for Medicare and Medicaid Services than if you're just seeing a patient for routine visits or physicals; you get paid a different way. So, we've been refining the NLP tool to say, is this critical care billing or is this non-critical-care billing, so we can make sure that if you're an ICU physician, you're billing for critical care, and that if you are a primary care doctor, you are not billing for critical care, because that is also a legality issue. So yeah, tons of expansions we have planned.
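As a rough illustration of that critical-care versus evaluation-and-management split, a keyword check along these lines could route each note; the detection phrase is an invented stand-in, not the team's actual rule set.

```r
library(stringr)

# Hypothetical classifier: notes with critical-care billing language are
# routed one way, everything else to evaluation and management (E/M).
classify_billing <- function(note_text) {
  if (str_detect(note_text, regex("critical care", ignore_case = TRUE))) {
    "critical care"
  } else {
    "evaluation and management"
  }
}

classify_billing("Total critical care time: 45 minutes")
# returns "critical care"
```

In practice this check would sit alongside the provider's specialty, so an ICU physician billing E/M (or vice versa) could be flagged the same way the time outliers are.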

All right, Julianne, thank you so much. Thank you.