Elevating Public Health Decision-Making with R Packages (Kylie Ainslie, RIVM) | posit::conf(2025)

Speaker(s): Kylie Ainslie

Abstract: Supporting public health decisions in high-stakes environments requires transparency, reproducibility, and efficiency. Analyzing real-world health data with complex models helps policymakers mitigate infectious disease spread. Structuring projects as R packages provides a consistent framework that enhances organization, integrates documentation, and facilitates collaboration. This approach improves coding practices, ensures reproducibility, and enables seamless sharing of tools, empowering colleagues without the resources to develop their own. This talk will demonstrate how adopting R package structures can enhance workflows and impact without requiring advanced software development skills.

Transcript

This transcript was generated automatically and may contain errors.

Okay, so this may look familiar to some of you. It is what I lovingly refer to as the Project Folder Hellscape. Looking at it, it's chaotic. There's a lot of different folders with not very informative names. There's a lot of different files, a lot of different versions of the same file, and a lot of files that have really uninformative names, like MD7, I have no idea what that means.

And at one point or another, when doing this project, this was from my PhD, I asked myself, where is that script? And which file is current? And what was I thinking months ago?

And I'm sure some of you can relate to this. And so after I finished my PhD, and I moved on to a postdoc, I started to think, okay, how can I do this better? How can I move away from this chaotic Project Folder Hellscape into something more organized? And so what if I told you that I found that better way, that I could turn this chaos into a nice, organized file structure that I could use for every code-based project?

So you may ask, how? Well, by structuring your project as an R package.

And I don't mean from the software development perspective, where the primary goal of the project is to end up with a production-ready R package. I mean by using the actual file structure of an R package, and leveraging the benefits of an R package to better organize, document, and ultimately share your code.

And so for the rest of this talk, I wanna walk you through how I've used this workflow, and I wanna provide you with examples of tangible impacts that it's had on my work. And so in order to give you a little bit of context for one of the projects I've used this approach on, we need to travel back in time.

The COVID-19 context

Back to June of 2021 in the Netherlands. Now, I'm sorry, we all know what was happening in 2021. The COVID pandemic was in full swing. And the Netherlands had actually already experienced multiple waves of infections denoted here by the blue line, which shows daily new confirmed COVID-19 cases over time. But things were looking up. COVID-19 cases were decreasing, vaccination was ongoing in adults, and strict control measures were gradually being relaxed.

But the Dutch government was obviously concerned with how best to continue relaxing control measures while preventing future waves of infections. And one of the ways that they were thinking of doing this was by extending the vaccination campaign to adolescents and children.

And so this is where I come in. In my role, I was asked to do a modeling study to try to determine the impacts of extending the vaccination campaign to adolescents and children, and to try to tell them what impact that might have on disease outcomes like infections and hospitalizations. And so that is your context. So for the remainder of the talk, I'm gonna go through each of those steps of the R package structure, and I'm gonna keep coming back to this project and show you how using this workflow helped me perform this work.

Organize

So the first step is organize. So in a typical project, you have different types of files. You may have analysis scripts, data, maybe some R functions, and maybe some kind of documentation. And every person has different ways of organizing their projects. And even the same person may have different ways of organizing different projects. And that can make it very difficult to orient yourself within any given project.

But by using the R package structure, now all of a sudden you have a consistent way in which to organize every single project, and every type of file has its own place. So your function files will go in the R directory, your analysis scripts and data can go in the inst directory, and documentation can go in vignettes. And actually, when you build the package, R will yell at you if these are not in the right place.
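As a rough illustration of that layout, here is a minimal sketch in base R that builds an empty skeleton in a temporary directory. The package name `vaccdemo` and the subfolder names `inst/scripts` and `inst/extdata` are assumptions made for this example, not names from the talk.

```r
# Build an illustrative package-style project skeleton in a temporary location.
# "vaccdemo" is a hypothetical package name.
pkg_root <- file.path(tempfile("pkg-layout-"), "vaccdemo")

# One home per type of file.
dirs <- c(
  functions = "R",             # reusable function definitions
  scripts   = "inst/scripts",  # analysis scripts (assumed subfolder name)
  data      = "inst/extdata",  # data files (assumed subfolder name)
  docs      = "vignettes"      # long-form documentation
)
for (d in dirs) {
  dir.create(file.path(pkg_root, d), recursive = TRUE)
}

# A DESCRIPTION file is what marks the folder as an R package.
writeLines(
  c(
    "Package: vaccdemo",
    "Title: Illustrative Project Skeleton",
    "Version: 0.0.1"
  ),
  file.path(pkg_root, "DESCRIPTION")
)

# Show the resulting structure.
print(list.files(pkg_root, recursive = TRUE, include.dirs = TRUE))
```

A real project would also need a fuller DESCRIPTION (description, license, authors) before the package checks pass cleanly.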

And so with this organization comes more efficiency, which was paramount in my work during the pandemic because we did not have a lot of time to do this work. And so I was able to orient myself and work efficiently, which meant that I could update models rapidly as new data became available, which it did consistently. It meant that the analysis was delivered on time. And so ultimately, my work was actually used to directly inform public health policy decisions, and the Dutch government did ultimately decide to extend the vaccination campaign, which affected over a million citizens.

Document

But organization alone is not enough. We need to be able to understand how, what, and why we're doing aspects of a project. And so let's go back to that old file structure. So here we have a number of different documentation files, maybe a code guide, maybe some description of the methodology that the code is implementing, and maybe a final report.

And it can be very hard to have a separate text file in a repo that describes what the code is doing. I once inherited a project where I was given a repository of code and an academic manuscript. The code was uncommented, undocumented, and all I had to go off of was a method section in an academic manuscript. And if you've ever read an academic manuscript, they are notoriously concise. So I had a very small method section with which to try to decipher the code I was looking at. And I ended up having to rerun all the code and look at intermediate results to try to figure out what was going on. And that took a lot of time.

But with an R package, you have the advantage that documentation lives with the code. And I want to highlight two important features of R package documentation. The first will be the automatically generated help files, which get stored in the man directory. And the second is vignettes.

So if you use the roxygen preamble above your function (I'm not really sure how to pronounce it), you can include really important information about what that function is doing, like what that function does, expected inputs, expected outputs, and example usage. And a really nice piece of this is that this preamble lives with your function code. So you can write this as you're coding. So it alleviates the problem of having to write a bunch of code and then go back a week, two weeks, a month later and try to remember what you did and describe it. So this way, you can document your code in real time.

The other very handy aspect of this preamble is that when you build the package, it creates automatically generated help files. And so now, you can access this information by running one line of code, the question mark and function name. So you don't have to go searching through some sort of project directory to find the information that you need in order to run a function or use a code chunk. It is easy to access.
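To make the preamble concrete, here is a minimal sketch of a documented function. The function `attack_rate()` and its arguments are hypothetical and are not taken from the talk's model code; the `#'` comment lines are the roxygen preamble.

```r
#' Calculate an attack rate
#'
#' Hypothetical example: the proportion of a population that was infected.
#'
#' @param cases Numeric vector of case counts.
#' @param population Numeric vector of population sizes (all greater than 0).
#'
#' @return Numeric vector of attack rates (cases divided by population).
#'
#' @examples
#' attack_rate(cases = 50, population = 1000)
#'
#' @export
attack_rate <- function(cases, population) {
  # Fail early on inputs the documentation says are not supported.
  stopifnot(is.numeric(cases), is.numeric(population), all(population > 0))
  cases / population
}

print(attack_rate(cases = 50, population = 1000))
```

Assuming roxygen2 is installed, running `roxygen2::roxygenise()` (or `devtools::document()`) from the package directory should turn this preamble into `man/attack_rate.Rd`, and `?attack_rate` then opens the help page once the package is loaded.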

And so you may actually find that by using these help files that you don't need that code guide anymore. That maybe you don't need a piece of documentation that you would in a more traditional project. And that's one less thing to have to update and manage.

And the second part of documentation that I wanna touch on is the vignettes. So I have only recently realized how powerful these vignettes can be. So you remember that project where I had a code base and a methods paragraph? With a vignette, you can actually incorporate text, code and results. So now all of a sudden, I can actually describe what a code chunk is doing, show the code chunk and then show the expected output. And that's hugely beneficial and it makes it much easier to try to document a whole analysis.
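As an illustration of how text, code, and results sit together, here is a minimal sketch that writes a small R Markdown vignette file from base R. The file name, title, and the toy calculation are assumptions for this example; in a real package the file would live in the `vignettes` directory.

```r
# Three backticks, built programmatically so this listing stays readable.
fence <- strrep("`", 3)

vignette_lines <- c(
  # YAML header that tells R this file is a vignette.
  "---",
  "title: \"Vaccination scenario analysis\"",
  "output: rmarkdown::html_vignette",
  "vignette: >",
  "  %\\VignetteIndexEntry{Vaccination scenario analysis}",
  "  %\\VignetteEngine{knitr::rmarkdown}",
  "  %\\VignetteEncoding{UTF-8}",
  "---",
  "",
  # Narrative text explaining what the next chunk does.
  "## Method",
  "",
  "We divide cases by population size to get an attack rate per group.",
  "",
  # A code chunk; its output appears below it in the rendered vignette.
  paste0(fence, "{r attack-rate}"),
  "cases <- c(50, 120)          # made-up numbers for illustration",
  "population <- c(1000, 2000)",
  "cases / population",
  fence
)

# Hypothetical file name; written to a temporary directory for this sketch.
vignette_path <- file.path(tempdir(), "scenario-analysis.Rmd")
writeLines(vignette_lines, vignette_path)
cat(readLines(vignette_path), sep = "\n")
```

Assuming the rmarkdown package is installed, `rmarkdown::render(vignette_path)` would produce an HTML page in which the explanation, the code, and its output appear in sequence.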

The other really nice feature of a vignette is that it can be version controlled in the same way as your code. So whereas if you have a Word file, that Word file is being tracked via Word's track changes and it's separate from your code. So it increases the likelihood that those two documents are gonna be out of sync. With a vignette, it can also be version controlled using GitHub like your code and so it just makes the likelihood of the documentation file and the code file becoming out of sync much lower. Can still happen, you still have to update your documentation, but it increases the chance that your documentation is current.

And so with the vignette, you may actually find that you don't need a separate methodology document and that maybe all you need is a final report.

Transparency and trust

And so with all this increased documentation comes transparency. And transparency is important in all projects and particularly those that inform decision making. And in the context of the COVID-19 vaccination, this meant that the policy decisions were being backed by transparent analysis and this became extremely important because we were audited. So the modeling work that I and my team did during the pandemic was audited by an international panel of experts and they asked, why did you do this at various points in time? What assumptions did you make? Why did you make those? And because I used this approach, my work was traceable. I could go back and say, I made that decision because that was the information that we had at that time and then we got new information on and on and on.

A second feature was that the public could understand the scientific basis for the decisions that were being made which was really, really critical in a global emergency because governments were asking their citizens to make extraordinary changes to their lives and it was very frustrating for people to not understand why they were being asked to do that and just having to rely on the model. And so this ultimately helped build more public trust and you can think about the public in this sense as my ultimate stakeholder. And so imagine how this could build trust with whatever stakeholders you're working with.

Sharing and collaboration

And so the first two features of this R package workflow focus mainly on your individual workflow, how you move through a project. But what I think makes the R package really powerful is that when you combine it with some sort of file sharing platform, whether that's GitHub or a shared network drive or something, then all of a sudden you have collaborative magic.

Because now all of a sudden you have this beautiful, organized, standardized, documented, for lack of a better word, package, that enables you to install it easily. So rather than having to source a bunch of separate scripts, you can install a package in one line of code. You also have a way to have seamless handovers. So it becomes very easy for you to onboard a new person or even for yourself to come back to a project after six months.
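To show what that one-line install looks like in practice, here is a minimal, self-contained sketch that builds a tiny throwaway package in a temporary directory and installs it into a temporary library. The package name `vaccdemo` and the function `attack_rate()` are hypothetical.

```r
pkg_name <- "vaccdemo"                                # hypothetical package name
src_dir  <- file.path(tempfile("src-"), pkg_name)     # package source folder
lib_dir  <- tempfile("lib-")                          # throwaway library, so nothing global changes
dir.create(file.path(src_dir, "R"), recursive = TRUE)
dir.create(lib_dir)

# Minimal package metadata.
writeLines(
  c(
    paste("Package:", pkg_name),
    "Type: Package",
    "Title: Illustrative Shared Tools",
    "Version: 0.0.1",
    "Description: Minimal example package used for illustration only.",
    "License: GPL-3"
  ),
  file.path(src_dir, "DESCRIPTION")
)
writeLines("export(attack_rate)", file.path(src_dir, "NAMESPACE"))

# One function file, standing in for a folder of scripts you would otherwise source().
writeLines(
  "attack_rate <- function(cases, population) cases / population",
  file.path(src_dir, "R", "attack_rate.R")
)

# The one line a colleague would run, given access to the package source.
install.packages(src_dir, repos = NULL, type = "source", lib = lib_dir, quiet = TRUE)

# After installing, every exported function is available at once.
library(pkg_name, lib.loc = lib_dir, character.only = TRUE)
print(attack_rate(cases = 50, population = 1000))
```

When the package lives in a hosted repository instead, the remotes package offers equivalents such as `remotes::install_github("owner/repo")` or `remotes::install_gitlab("owner/repo")`, where `owner/repo` is a placeholder.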

Another important feature is that you can create reusable tools that you and your colleagues and other people at your organization can use. So I found out in my team that actually me and two other colleagues were all trying to use the same method and we were independently coding that. But with this approach, someone can code it one time and we can all use it. And this is also really important in infectious disease work because I realized that I'm lucky enough to have the resources to build these tools. That is not the case everywhere. And it's very important that everyone has the capability and the resources and the tools with which to analyze their own data.

And finally, it facilitates collaboration. So it's one thing to have a conversation with someone outside your organization or someone on another team about things you're working on, but it changes the game when you're able to show them what you're working on and generate ideas for how you can either work on the current project or extend it later.

And so with all of this sharing capability comes reproducibility, which is a central principle in scientific research. And how this manifested during the COVID modeling was that because I used this approach, my model code was actually selected to be included in a separate R package called epidemics, which serves as a transmission model library. So they were able to take my code, make it better, and include it in this other package for others to use. Another thing was that I was able to be involved in a Europe-wide effort to provide modeling results to help inform COVID public health policy. And that would have been much more difficult if I hadn't used this approach. And so ultimately, one national analysis has now become a piece of international infrastructure that will live beyond the work that I did.

Getting started

And so now it's your turn. I really encourage you to try using an R package structure for your next project. And if you're a little intimidated, you've never made an R package, there are some wonderful tools and learning materials available. And I want to point to two. So one is the usethis package, which actually makes creating the R package structure really, really easy. And then the other is the R Packages book, which gives you details about every step of the process, including some that I haven't even mentioned like testing.
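For a sense of how little typing the setup takes, here is a minimal sketch using the usethis package (the package-setup helper mentioned above). It assumes usethis is installed and is skipped otherwise; the package name `vaccdemo` and the file name `simulate_outcomes` are hypothetical.

```r
# Create the demo in a temporary location so no real project is touched.
parent_dir <- tempfile("usethis-demo-")
dir.create(parent_dir)
pkg_path <- file.path(parent_dir, "vaccdemo")  # hypothetical package name

if (requireNamespace("usethis", quietly = TRUE)) {
  # Creates DESCRIPTION, NAMESPACE, and the R/ directory.
  usethis::create_package(pkg_path, open = FALSE)

  # Adds an empty R/simulate_outcomes.R, ready for a function definition.
  usethis::with_project(
    pkg_path,
    usethis::use_r("simulate_outcomes", open = FALSE)
  )

  print(list.files(pkg_path, recursive = TRUE))
} else {
  message("usethis is not installed; skipping the demo.")
}
```

In an interactive session you would typically call `usethis::create_package()` with a path of your choice and let it open the new project; usethis also has helpers such as `use_vignette()` and `use_testthat()` for the documentation and testing pieces.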

And I promise you'll never go back to messy folders. And I'm also happy to announce that after years of sort of using this approach and iteratively improving, that I've just had my first package accepted to CRAN. And I don't know that that would have been possible without this. So with that, I would be happy to take your questions and please feel free to get in touch.

Q&A

Thank you so much, Kylie. We have a few questions here. So the first one is, are there any parts of the package workflow you've needed to work around for this use case to work well for you?

So I'm gonna answer a slightly different question in a way to answer that question. So one of the things that I think a lot of people think of when they start thinking about R packages and documentation is that it costs time. It costs time in order to do these different bits that you may not invest in a different workflow. And so there were times where we maybe had to focus more on just getting the analysis results and kind of tackle the documentation later because we were under such tight timelines. But ideally you'd be able to do both kind of together.

Yeah, we have quite a few here. So I'm gonna get through them. So how do you deal with multiple versions of data or programs?

So the way that I dealt with it, because like I said, we got new data all the time, was that I actually kept copies of the data, different versions of the data, because I needed to be able to reproduce the model results at a given time. So I actually kept them in a data directory in the package.
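One way to keep dated copies like this is sketched below in base R. This is only an assumption about how such snapshots could be stored, not the speaker's actual code; the helper names `save_snapshot()` and `load_snapshot()`, the `inst/extdata` location, and the numbers are all made up for illustration.

```r
# Save a dataset under a file name that includes the date it was received.
save_snapshot <- function(data, name, data_dir, date = Sys.Date()) {
  dir.create(data_dir, recursive = TRUE, showWarnings = FALSE)
  file_name <- sprintf("%s_%s.rds", name, format(as.Date(date), "%Y-%m-%d"))
  path <- file.path(data_dir, file_name)
  saveRDS(data, path)
  invisible(path)
}

# Load the copy of a dataset as it was on a given date.
load_snapshot <- function(name, data_dir, date) {
  file_name <- sprintf("%s_%s.rds", name, format(as.Date(date), "%Y-%m-%d"))
  readRDS(file.path(data_dir, file_name))
}

# Hypothetical data directory inside a package, placed in a temporary folder here.
data_dir <- file.path(tempfile("snapshots-"), "vaccdemo", "inst", "extdata")

# Made-up values, purely for illustration.
cases_june <- data.frame(date = as.Date("2021-06-01"), cases = 1500)
cases_july <- data.frame(date = as.Date("2021-07-01"), cases = 900)

save_snapshot(cases_june, "cases", data_dir, date = "2021-06-01")
save_snapshot(cases_july, "cases", data_dir, date = "2021-07-01")

# Rerun an older analysis against exactly the data that was available then.
reloaded <- load_snapshot("cases", data_dir, date = "2021-06-01")
print(list.files(data_dir))
```

Keeping the date in the file name means an older model run can be pointed at the exact data it originally used.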

Totally makes sense. So do you have any recommendations for implementing a shared package development workflow when PHI is a factor and online tools like GitHub aren't allowed?

Can you define PHI?

I wish I could. It is private health information.

Oh, private health. Yeah, so that is one thing that I didn't mention that obviously some of this can't be shared publicly. Many of us work on projects where you can't share this. In fact, I couldn't share this work publicly at first. It had to be shown to the government and approved and then it could be shared. But I actually, my team works on GitLab, which is in a secure environment. And so this works fine within a secure environment. As long as the team members are able to access some sort of shared environment, it works really well.

What would you have done differently now after seeing the European consortium changes to your code?

Oh, I would have done a ton of things differently. I would have changed how I designed the whole model, but I had about two weeks to create it. And so it was just kind of like, all right, just put something that works. And that's funny because that was also a question asked by the auditor. But yeah, I would have made it more efficient, but that's just, it was a fact of the situation.

How does your workflow work with computationally heavy simulations?

Yeah, that's a good one. So I don't actually do a lot of computationally intensive simulations. I was able to get it quite quick. But I think what you can do is if you have the resources to use some sort of high performance computing, you can have the functionality within the package and then you just run those simulations elsewhere. But I don't think this would prevent you from having the code in which to do something that's computationally intensive.

Do you have any recommendations for getting teammates on board with this very different way of setting up projects, especially with folks who are already resistant to putting time into documentation?

Yes. So I feel like this was asked by one of my colleagues. I have struggled a lot because I think they're tired of me talking about this at work, but I'm finally making headway. And basically I keep sort of showing them the benefits of this and highlighting pain points when it's not done and in gentle ways, not like yelling at people. But I do think showing how easy it can be and how ultimately it can save time, because that's always a consideration that, oh, no, it's more work to make documentation. But in the end, it will actually save you time and make it easier for work to be done.

Amazing. Thank you so much. Thank you. Thank you.