Resources

From Data to Dollars: Improving Medical Billing Accuracy Using NLP (Julianne Gent, Emory Healthcare)

Protecting our Healthcare Heroes: Using Natural Language Processing to Prevent Billing Mistakes in Healthcare

Speaker(s): Julianne Gent

Abstract: Maintaining accurate billing documentation in healthcare is essential to prevent revenue loss and preserve patient satisfaction. I’m Julianne Gent, Analytics Developer for Emory Digital, and I’m here to discuss the natural language processing algorithm we built utilizing an automated SQL-to-R pipeline. This algorithm uses the packages ‘odbc’ and ‘stringr’ to import SQL queries into R, recognize billing patterns, and extract billing time. Our algorithm accurately captured billing data for 93% of over 250,000 notes. The billing provided by our hospital’s medical software? Only 40%. Our algorithm showed that an SQL-to-R pipeline can improve billing documentation and accuracy, and we are confident that it can be applied to many other industries.

posit::conf(2025)

Subscribe to posit::conf updates: https://posit.co/about/subscription-management/


Transcript

This transcript was generated automatically and may contain errors.

All right, good afternoon, everybody. Hope everybody enjoyed Atlanta's King of Pops. It's a very big staple here, so I'm really glad those of you from out of town got to enjoy it.

My name is Julianne Gent, and today I'm gonna be talking about some ways that we are protecting some of our frontline healthcare heroes, specifically in regards to billing.

So imagine, if you will, with me. You are an intensive care physician. You see and take care of the sickest patients in the hospital. You are at the end of the third consecutive 12-hour shift, and half of the people in your unit needed some sort of life-saving treatment just today. You then suddenly get a call from your hospital's billing department, and the manager says that they received a complaint from one of your previous patients saying that their insurance is denying your care and that they must pay for their entire hospital visit out of pocket.

Then, on cue, you get a call from your director's office, where an auditor from the Centers for Medicare and Medicaid Services is sitting. They inform you that they've reviewed several of your clinical notes and are accusing you of medical fraud.

They actually show you the note in question, which is related to the same patient who had called billing earlier. Here's the note, and all of this happened simply because you added an extra zero by accident in your documentation, claiming that you were billing for 300 minutes instead of 30.

So, this might seem like a very exaggerated situation, but this can easily happen to our overworked and burned-out clinicians, where these simple clerical errors could cost them their patients, their jobs, and their reputation. So, seeing this happen at other healthcare entities and foreseeing that it may happen to us in the future, our executive leadership came to our team with the following question.

Can we help clinicians maintain accurate billing documentation while meeting them where they're at? Basically, how can we help physicians do their job better without increasing their already overburdened workload?

Evaluating solutions

So, of course, our office said, of course we can. So, we actually considered a couple of solutions.

The first was to require clinicians to review every single note they've written that day and double-check for billing accuracy. Well, we just talked about how we need to meet physicians where they're at. We're not gonna add yet another tedious task to their already long stack of things to do and take away from patient care.

So, we're gonna go ahead and eliminate that. The second one we looked at was called a SMART phrase, a functionality provided by our electronic medical record, or EMR, software vendor. Basically, it allows clinicians to type in the name of a template that consists of a lot of commonly used phrases. For example, here it says critical care billing statement, and it automatically adds it to their note. The data typed into this phrase, where the three asterisks mark where you put the billing time, is actually stored very cleanly in a data warehouse where we can extract it.

However, we found two large issues with this approach. First, SMART phrase templates are just one of many documentation tools that physicians can use; they're optional, not required, and many physicians don't use them. Many physicians like to write their own text freehand. And secondly, we recently found that a lot of clinician workflow involves the old copy-and-paste of text from a previous note into a new note. This happens a lot when a patient's care hasn't changed from day to day, and it saves them a lot of time and extra typing. But when this happens, the data I showed at that asterisk in the phrase is no longer extractable. It is no longer put into that data warehouse.

Why not an LLM?

So, I know what a lot of data scientists in this room are probably thinking: this is unstructured text, just use an LLM. We got asked this a lot by physicians, too. Can't you just use AI? What about an LLM? Well, we have a couple of limitations specific to healthcare.

So, there are three main challenges we faced when we thought about using an LLM, the first being cost. We all know that LLMs are not cheap, but they also have a large infrastructure cost. Running an LLM on a daily basis requires powerful GPUs and a dedicated cloud space, which is something a large university, a pharmaceutical company, or a tech company could have and support, but not a small-to-medium healthcare system.

Secondly, even if we somehow covered the cost, both financially and infrastructurally, another issue is timeliness. Anything related to healthcare quality, meaning anything around patient safety, or anything that could get audited by a government official or a regulatory branch, like the auditor coming for medical fraud we saw earlier, can come at any point of the year, and can come multiple times. So, we were given a turnaround time of about a week and a half, and there is not really a feasible way we'd be able to train and validate an LLM in that timeframe to get ready for any potential auditing.

And finally, the biggest one is HIPAA compliance. So, for those of you who are not from the U.S., HIPAA is a federal law that protects patients' healthcare privacy and data, and it is taken very, very seriously. If we wanted to use an LLM that's already on the market, like ChatGPT or Copilot, and we put healthcare data into that LLM, we would be violating HIPAA, which could have devastating consequences for a healthcare system. So, all in all, we scrapped the LLM.


Building the NLP tool

So, because we scrapped all three of those options, we decided, why don't we just make our own? If you want it done right, do it yourself.

So, how we built this tool. So, here's an example. Don't worry, healthcare people, this is fake data. We're not breaking HIPAA today. So, here's an example of what a typical hospital intensive care unit clinical note would look like. You can see it's all free text. It's incredibly messy. Tons of random numeric data.

Luckily, our EMR vendor that I talked about earlier stores all of this free text in a data warehouse, which we can extract via SQL code. And we can also get a lot of other important information, like who wrote the note and when. And so, now we have this messy free text data in a cleaner format. So, we can now see Meredith Gray, our favorite dramatic physician, wrote this note for this patient on New Year's Day.

So, we could theoretically take that SQL output, copy and paste it into an Excel file, and import it into R. But we're data people. We want things to be automated. We want to reduce the manual process as much as possible. So, instead, we utilized the odbc package to connect the SQL data warehouse directly to R.

And so, in the dbGetQuery() function down below, you can actually paste your SQL query, and then directly import the data from the SQL data warehouse into R in one motion, a one-stop shop. No copying and pasting, no Excel needed.
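A minimal sketch of that import step, using the odbc and DBI packages as described; the data source name, table, and column names below are hypothetical placeholders, not the actual warehouse schema. The connection lines are shown commented out because they require a configured ODBC data source.

```r
library(DBI)    # generic database interface
library(odbc)   # ODBC backend for DBI

# The SQL query that would otherwise be run by hand and pasted into Excel.
# Table and column names here are illustrative, not Emory's schema.
query <- "
  SELECT note_id, author_name, note_datetime, note_text
  FROM clinical_notes
  WHERE department = 'CRITICAL CARE'
"

# Requires a configured ODBC data source, so shown but not run here:
# con   <- dbConnect(odbc::odbc(), dsn = \"emr_warehouse\")
# notes <- dbGetQuery(con, query)  # SQL runs in the warehouse; result lands in R
# dbDisconnect(con)
```

From here, `notes` is an ordinary R data frame, ready for the keyword-matching steps that follow.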

So, the next step was to create a data set of keywords found in these notes that are specifically related to billing. We actually worked with our critical care clinicians; they looked at several of their colleagues' notes and tagged the most commonly used phrases, plus the phrases that are required in a note for regulatory purposes. We put these into a keyword data set. I'm showing probably about the first 15 or so, but there are tons, because there's so much variation in the ways clinicians can phrase their billing.

The reason we also include the minutes or min information is that, as you saw in the free-text note previously, there are lots of numeric values: telephone numbers, labs, vitals, PIC numbers (a physician number). Because we aren't using an LLM, we want to make sure the tool can pull a number and know that it is the number of minutes, and not a blood pressure or a heart rate.
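The keyword data set can be sketched roughly like this. The phrases below are invented examples, not the team's actual regulatory list, but they show the idea: each pattern ties the number to "min"/"minutes" plus billing language, which is what keeps a heart rate or blood pressure from ever being a candidate.

```r
# Illustrative billing keyword patterns (invented examples).
# [0-9]+ marks where the billed minutes appear inside each phrase.
keywords <- data.frame(
  phrase = c(
    "critical care time: [0-9]+ min",
    "[0-9]+ minutes of critical care",
    "total critical care billing time: [0-9]+ minutes",
    "spent [0-9]+ min providing critical care"
  ),
  stringsAsFactors = FALSE
)

nrow(keywords)  # returns 4 (example patterns only; the real list is much longer)
```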

And so, finally, we coded the function that serves as our NLP, and we were actually able to build it with some very basic, readily available tools from Posit: the tidyverse, dplyr, and stringr. That is all we needed to make a basic NLP tool for our purposes.

So, basically, what this does is first determine whether there's a number present before, but still within, one of the keyword phrases, and extract it into its own column. If it recognizes that there's no number before the keyword within the phrase, it checks whether there's a number afterwards, but still within that same phrase from the keyword data set.
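A stripped-down sketch of that logic with stringr, using two invented patterns (one number-before-keyword, one number-after-keyword); the production function covers many more phrasings from the keyword data set.

```r
library(stringr)

# Invented patterns standing in for the real keyword data set:
#   number before the keyword: "35 minutes of critical care"
#   number after the keyword:  "critical care time: 35 min"
extract_billing_time <- function(note_text) {
  before <- str_match(
    note_text,
    regex("(\\d+)\\s*min(?:ute)?s?\\s+of\\s+critical care", ignore_case = TRUE)
  )[, 2]
  after <- str_match(
    note_text,
    regex("critical care time:?\\s*(\\d+)", ignore_case = TRUE)
  )[, 2]
  # prefer the number-before form; fall back to number-after; NA if neither
  as.numeric(ifelse(!is.na(before), before, after))
}

extract_billing_time("BP 120/80, HR 72. 35 minutes of critical care provided.")
# returns 35; the vitals are never candidates because they sit outside a phrase
```

Because both functions are vectorized, the same call works on a whole column of notes inside a mutate().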

We then apply this function to the entire clinical note data set. So, if we apply our natural language processing tool to our example, we can now see that there is a column called billing time that has correctly extracted the value of 35 minutes, and it knows that none of the other numbers in that free text were the billing time.

So, finally, we created an outlier variable with which we can identify potential documentation errors. At our practice, we found it would be pretty rare for a clinician to bill less than 10 minutes of time, or more than 100 minutes, in a single note, so we used these values as outlier thresholds. If the billing time is outside these thresholds, or if it doesn't exist (missing billing information is another issue we want to identify), the column is marked as a one, and zero otherwise: a classic binary variable. This allows our end users, our clinicians, to filter the data and identify those outliers quickly.
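The outlier flag can be sketched as follows. The 10- and 100-minute thresholds come from the talk; the data frame and column names are illustrative.

```r
library(dplyr)

# Toy data standing in for the real note-level output of the NLP step
notes <- data.frame(
  note_id      = 1:4,
  billing_time = c(35, 300, 5, NA)  # NA = no billing info found in the note
)

flagged <- notes |>
  mutate(outlier = if_else(
    is.na(billing_time) | billing_time < 10 | billing_time > 100,
    1L,  # flag: missing, implausibly low, or implausibly high
    0L
  ))

flagged$outlier
# returns 0 1 1 1: only the 35-minute note passes
```

Clinicians can then filter(flagged, outlier == 1) to jump straight to the notes worth reviewing.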

Validation results

So, now, a tool is only as good as its validation results. And since my background is in epidemiology, we, of course, had to conduct a retrospective study on all our notes. We took all the notes written by critical care clinicians in the year 2024. We wanted to see, number one, whether our tool even worked, and number two, whether it performed better than our currently available tool, the SMART phrase method I mentioned earlier.

So, we analyzed a lot of different performance measures for our executives, but I think the most important one to highlight here is the bolded row at the bottom: the percentage of notes where the billing time was successfully extracted or identified as not being there. Our NLP tool, you can see, worked on about 93% of notes, whereas the SMART phrase method only worked on about 60%. And I want to note that these are not small denominators. This is almost 285,000 notes, written by over 500 clinicians. So, you can see how big the impact of a functioning natural language processing tool is.


Monthly process and future plans

So, how do we currently use this tool? Following our successful validation (for the most part; we are still in a process of continuous validation, as I will point out here), the tool actually went live in February 2025. So, here's our monthly process. We first apply the tool to the previous month's clinical notes.

Then the new data is sent to critical care leadership for manual review. Sometimes they like to look at some certain outliers, maybe if a patient is really complex, sometimes that justifies a very high billing time, and so they just want to make sure that everything is up to snuff.

So, then critical care leadership will notify the clinicians if they have notes that fall within those outliers. If they find that a physician hasn't made that many errors, or that the errors are pretty minor, maybe they put 40 instead of 400, or something kind of obvious, the clinicians are notified and the documentation is fixed. However, if we find that there's maybe a physician or two who is constantly missing documentation, or constantly making very inaccurate statements, they then get flagged for further investigation.

And then finally, critical care leadership notifies our team if they have found any valuable uncaptured phrases related to billing text that we can then add back to our keyword data set, and the process continues.

So, the last part I want to talk about is our expansions and our future analytics plans. We want to expand this to other specialties. Right now, it is only in critical care, but we got asked to expand the tool to surgical services to use for their operation and procedure notes. So, we've been trialing an NLP tool for that use case that has had pretty similar success to our previous NLP tool.

We'd also like to run some additional analytics to see if people are even correcting their notes. You know, we created this tool for physicians, but we actually want to know if it's being used, if it's valuable, if it's just extra manual work for them, or if it's going to require some additional education or other processes.

So, in short, our team utilized currently available Posit tools to create a natural language processing tool that successfully identified billing mistakes in clinical notes. We also showed that this tool could be developed with readily available, cost-effective tools and reproduced in organizations where access to LLM technology is limited. So, now I feel like our clinicians don't have to worry so much about whether an accidental keystroke is going to affect their medical license or cause their patients financial harm, and they can go back to doing what they do best: keeping us safe and healthy. Thank you, everyone. I'd like to open the floor for questions.

Q&A

Thank you, Julianne. We have questions from Slido; I'll ask them. This one seems to be from somebody who is medically literate: you can request a BAA (business associate agreement) from OpenAI for HIPAA-compliant LLM calls, or use a cloud provider like GCP or Azure and get a BAA for HIPAA compliance from the cloud provider. In fact, cloud providers offer quite a few healthcare offerings specifically for this purpose. I guess the question is, have you considered this?

Yes, we definitely have, and I'm sure Emory Healthcare is in the process of trying to get those things. We had just switched medical software within the last two years or so, so we're still learning what our medical software's capabilities are, as well as developing. We now have a team called Emory Digital that is going to be talking more about AI and LLM projects. It's all just very, very new. So, I'm sure that's in the pipeline, but for this project, with the timeliness where we had to get this done as soon as possible in case a Centers for Medicare and Medicaid Services audit came around, we had to be a little bit quicker and not wait for that.

This one I find interesting myself as well. So, what kind of notes or phrases did you see causing the errors in the 7% that the tool did not identify correctly?

Yeah, I mean, some of them ranged from just some misspellings, like adding an extra I to critical, which is where an LLM would have been really useful; it could have caught some of those easy misspellings. Some of them would just be "time", parentheses, 15, spaces, and then brackets, smiley face, "IDK how long I billed, maybe 10," question mark. So, yeah, we hadn't added that line specifically. I think we actually went to that physician and said, you've got to refine it a little bit more. But we saw a lot of variation.

So, the results from that data table were from February, and it's now September. We have been adding phrases into that keyword data set each time we've run this, and we've gotten to the point now where we've gone, well, there are no more we can really add, unless it's those crazy ones that just require telling the physician maybe don't document that way. But it's now even more refined than that 93%.

Thank you. This is a classic one; it's more of a comment than a question, but the person wanted to say that they're very pleased to see classic NLP being used rather than LLMs when LLMs are not necessary, so I think that is a hats off to you.

All right, next question is, are there other kinds of data validation you think this could be applied to in healthcare billing?

Oh, tons. One of the other side projects our team is working on involves the different subsets of billing. You can bill for critical care, or you can bill for what's called evaluation and management. Basically, if you are seeing a patient who is super, super sick, you get reimbursed more by the Centers for Medicare and Medicaid Services than if you're just seeing a patient for routine visits or physicals; you get paid a different way. So, we've been refining the NLP tool to say, is this critical care billing or is this non-critical-care billing, so we can make sure that if you're an ICU physician, you're billing for critical care, and that if you are a primary care doctor, you are not billing for critical care, because that is also a legality issue. So yeah, tons of expansions we have planned.
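As a rough illustration of that critical-care versus evaluation-and-management split, a keyword check along these lines could route each note; the detection phrase is an invented stand-in, not the team's actual rule set.

```r
library(stringr)

# Hypothetical classifier: notes with critical-care billing language are
# routed one way, everything else to evaluation and management (E/M).
classify_billing <- function(note_text) {
  if (str_detect(note_text, regex("critical care", ignore_case = TRUE))) {
    "critical care"
  } else {
    "evaluation and management"
  }
}

classify_billing("Total critical care time: 45 minutes")
# returns "critical care"
```

In practice this check would sit alongside the provider's specialty, so an ICU physician billing E/M (or vice versa) could be flagged the same way the time outliers are.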

All right, Julianne, thank you so much. Thank you.