Instant Impact: Developing {docorator} to Simplify R Adoption for Teams (Becca Krouse, GSK)

Transcript

This transcript was generated automatically and may contain errors.

Okay. Thank you all for being here today. I'm excited to share an R package with you all that we've been working on called Docorator.

So I'll start by saying that this presentation is my personal opinion and not that of my organization.

So in my house lately, my son has taken a liking to home renovation shows, and especially this one called Fixer Upper. I don't know if you guys have seen it before, but if you haven't, the premise is that the hosts, Chip and Joanna, they work with a different set of clients in every episode, and they help them pick out a house that's pretty worse for wear, and they spend the episode redesigning, giving it a facelift, gutting out some pieces of it until it becomes a total dream home in the end. And it's kind of addicting, even though pretty much every episode has the same storyline with that transformation.

We wrapped it all up. We took those bits and pieces, the R Markdown and Quarto, the fancyhdr headers, the gt LaTeX conversion, and all of those pieces could come together and live in a package.

So in practice, it looks something like this: we have our original R script, and with a couple of extra lines of code we can pass or pipe our display right into the docorator function calls and get a PDF on the other side.

If we look a little closer at these function calls, we have two here: an as_docorator function and a render_pdf function. as_docorator holds everything about our display: the display itself, that gt table, as well as important metadata like the headers and footers, the display name, and sizing options I don't show right here. Lots of key metadata gets pulled together and then passed to the rendering engine.
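As a rough sketch of the two calls described here (not the code from the slide), the following assumes the gt and docorator packages are installed and uses gt's built-in exibble data. The display name "mytbl" and the temporary output folder are placeholders, and the argument names reflect the package documentation as best understood, so check them against the installed version.

```r
library(gt)
library(docorator)

# Step 1: wrap the display and its metadata in a docorator object.
doc <- gt::exibble |>
  gt::gt() |>
  as_docorator(
    display_name = "mytbl",    # placeholder name for the output
    display_loc  = tempdir(),  # placeholder output folder
    save_object  = FALSE       # do not save the intermediate object to disk
  )

# Step 2: hand the object to the rendering engine.
# Guarded because PDF rendering needs a LaTeX installation.
if (nzchar(Sys.which("pdflatex"))) {
  render_pdf(doc)
}
```

Keeping the object in a variable, rather than piping straight into render_pdf, makes it easier to inspect the metadata before rendering.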

Headers, footers, and flexibility

Because headers and footers are so important, I wanted to zoom in even closer. Here is our original display, just blown up a little bit so we can see the headers more easily. We have a couple of lines of headers here that we can specify through the header argument in as_docorator. In that header argument we use a fancyhead function, which is really an interface to the fancyhdr LaTeX package. fancyhead takes a series of fancyrow calls as arguments, so row by row in the header you can say: what's my left, center, and right? Here, in the first row, we want something on the left, the study name, and we want some automatic page numbering on the right through our little helper. In the second row, we just want something on the left, the population, so we leave everything else empty. And then in the final row, we want to put our centered table title.

Users can do this for the footer as well. It's a footer argument in as_docorator and a fancyfoot function instead, but all the fancyrow stuff is the same. They can add in as many rows as they want, within reason, because it will start to encroach on the display area.
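The three-row header and a footer like the ones described above might be written roughly as follows. This is a sketch under assumptions: the text strings are placeholders, doc_pagenum() is assumed to be the page-numbering helper mentioned in the talk, and NA is assumed to mark an empty position.

```r
library(gt)
library(docorator)

# Header: one fancyrow() per line, each with left / center / right slots.
header <- fancyhead(
  fancyrow(left = "My Study", center = NA, right = doc_pagenum()),
  fancyrow(left = "My Population", center = NA, right = NA),
  fancyrow(left = NA, center = "My Table", right = NA)
)

# Footer: same row-by-row idea, built with fancyfoot() instead.
footer <- fancyfoot(
  fancyrow(left = "My first footnote", center = NA, right = NA)
)

doc <- gt::exibble |>
  gt::gt() |>
  as_docorator(
    display_name = "mytbl",    # placeholder name
    display_loc  = tempdir(),  # placeholder output folder
    header       = header,
    footer       = footer,
    save_object  = FALSE
  )
```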

And this kind of flexibility is super important. When we went out to build the solution, we knew there was gonna be a lot of different variations in formatting coming at the tool. If you've ever built a tool and rolled it out before, you probably know that people will start to throw things at it that you didn't expect, and it could break easily, so we wanted to avoid that situation as best we could. When we first did a release, we tried to cover the essentials but make it easy to adapt as people gave us feedback, and we really tried to listen to them early and often.

And something that has come up, and we tried to get ahead of it quickly, is accommodating lots and lots of data: lots of columns, many pages. So we added some automated scaling for gt tables in particular. If a lot of columns are coming through, we'll try to make sure everything fits to the width of the page. We also help with some proportional sizing between the left group and row label columns, which tend to be very text-heavy, and the numeric values in the rest of the columns, just to help even things out. If a user needs finer control, they can certainly do that through gt directly, and Docorator will assess the situation and spit out some messages if it looks like the result might be a little weird. We hope to build this out even further, just to keep it nice and user-friendly.
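A hedged sketch of the two routes described here, automatic scaling versus manual control through gt. The argument names tbl_scale and tbl_stub_pct are assumptions based on the package documentation as best understood, and the 0.3 share and 120 px width are arbitrary illustrative values.

```r
library(gt)
library(docorator)

# A gt table with a stub (row label) column, which tends to be text-heavy.
tbl <- gt::gt(gt::exibble, rowname_col = "row")

# Route 1: let docorator scale columns to the page width and give the
# stub roughly 30% of the available width.
doc_auto <- as_docorator(
  tbl,
  display_name = "wide_tbl",  # placeholder name
  display_loc  = tempdir(),   # placeholder output folder
  save_object  = FALSE,
  tbl_scale    = TRUE,
  tbl_stub_pct = 0.3
)

# Route 2: switch automatic scaling off and set widths through gt directly.
doc_manual <- tbl |>
  gt::cols_width(num ~ gt::px(120)) |>
  as_docorator(
    display_name = "wide_tbl_manual",
    display_loc  = tempdir(),
    save_object  = FALSE,
    tbl_scale    = FALSE
  )
```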

Evolving the framework

We also realized pretty early on that our framework was generalizable and useful in other scenarios, for other document types. We care a lot about PDFs, but there are other things that need to be made, not just for us but for others as well. We started with just a render_pdf function, which was kind of all Docorator did in the beginning, and we decided to break it into pieces, and that's where the as_docorator function came to be. So we split up Docorator creation and rendering, and that produces an intermediate Docorator object, and optionally a permanent file as well. That object holds the display itself and all of that important metadata, which can be picked up and reused in different templates. We have a couple of templates now: the original PDF, and the RTF, which is very basic compared to the PDF. But the framework is in a position now where it could expand to future output types; it just needs the template for it.
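To illustrate the split between creation and rendering, here is a minimal sketch that builds the object once and sends it to both templates. It assumes render_rtf() is the RTF counterpart of render_pdf(), and that save_object = TRUE is what writes the optional permanent file; names and locations are placeholders.

```r
library(gt)
library(docorator)

out_dir <- tempdir()  # placeholder output folder

# Create the intermediate object once; save_object = TRUE is assumed to
# also write it to a file so it can be picked up again later.
doc <- gt::exibble |>
  gt::gt() |>
  as_docorator(
    display_name = "mytbl",
    display_loc  = out_dir,
    save_object  = TRUE
  )

# Reuse the same object with different templates.
render_rtf(doc)

# PDF rendering is guarded because it needs a LaTeX installation.
if (nzchar(Sys.which("pdflatex"))) {
  render_pdf(doc)
}
```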

The ability to evolve is really, really important as well, and so we wanna be able to take our framework and adapt it as technologies change, as new things come in, much like if you're doing the kitchen part of your home renovation, you might make certain aesthetic choices today that you, you know, the white countertops and cabinets and everything. Five, 10 years from now, that may not be what you want anymore, there may be totally different trends. And so with Quarto in particular, we're well-poised to adapt to these trends.

Something like Docorator for Python may be useful one day, and so it's an easier transition to building something like that. And, you know, today we're using LaTeX, which is a little bit finicky with the syntax, and it comes with the burden of system dependencies as well. So, like Keaton, we're looking forward to the capabilities of Typst. It's a little bit of early days for Typst with gt, and because that's so important to us, we're not quite there, but we're looking forward to that potentially being a better option down the line, and to having a smoother maintenance process as well.

And something I'm pretty excited about with Typst is branding. I've talked a lot about how we care about that traditional appearance, but hopefully that will evolve as well, right? With branding, we can take things like our fancy dashboards and do something similar across our PDFs and our dashboards and our websites and things like that. And maybe one day our stakeholders will be very excited for pumpkin spice season, and they'll wanna do something fun like this.

But all that to say, Docorator has been kind of the missing piece of the puzzle, the final proof of confidence that R can really be used end to end and create that final output. It's been very, very useful for us, and we noticed the gap in open source, so it was important for us to contribute it back to open source. You can find Docorator on our GitHub page, here's the website, and we're working on a CRAN submission now, and likely a Pharmaverse submission as well. So thank you all, and happy to answer any questions.

It will just print page by page. And you can also fuss with the figure dimensions through Docorator.

Then, is the goal of Docorator to get Pharma to use R and eventually switch to a different reporting tool? Well, it helps with R adoption, certainly. Like I said, it's that missing piece in the end to end. We talk about end to end, and we did an end-to-end workshop yesterday about, like, here are the pieces and putting them all together. Five or so years ago, that was pretty, pretty early on, and I don't think we had all those pieces. So it's exciting to have all the evolution.