Resources

Quarto for Business Collaboration & Technical Documentation in Word docx format (Bill Pikounis, J&J)

Speaker(s): Bill Pikounis

Abstract: Microsoft Word documents have remained a critical channel of statistical evidence and influence for the manufacturing of a safe and effective supply of therapies to treat diseases. The incorporation of statistical content – narratives, graphs, and tables – into health authority dossiers worldwide requires speed in terms of days and sometimes hours to generate statistical source content for decision-making and official documentation. Quarto provides an efficient solution to address these needs. This presentation illustrates and covers concepts of the solution that builds upon R and the Posit platform to reliably produce an automated and flexible workflow for figure and table captions, autonumbering, and cross-referencing in docx format.

posit::conf(2025)

Transcript

This transcript was generated automatically and may contain errors.

Very good. Thank you, everyone. It's wonderful to be here. Hello, those of you on camera. Hello, everybody in the room. I won't recite my title because I felt it was kind of long, but I couldn't find any way to shorten it. So the silhouette at the top of the image here is a patient, a patient who has cancer. So we start with that. And I'm very fortunate to be a small part of a team at Johnson & Johnson that works in an area called cell therapy platforms. And we have a product now that's been on the market for about three years.

And when you see a patient that has cancer, a common type, of course, is solid tumors. They may have bladder cancer, they may have liver cancer, they may have lung cancer, and so on. And of course, a common worry about that is it's going to metastasize. And of course, that causes a lot of pain for a patient as well. All cancers are pretty terrible. But there's another kind of cancer called blood cancer. And that's what we're going to focus on today in this slide. And that's already all over your body. Anywhere there's blood in your body, there are going to be cells. Some particular types of blood cancers are lymphomas. Another kind is multiple myeloma, which is what the product that I work on treats. And the way this works, as far as treatment goes now, is the blood is taken from the patient first.

And then the T cells are harvested out of that. So there are a lot of complicated steps to this process. Even though it may look complicated itself, there's a lot more going on. And what happens after those T cells are taken out is just some amazing engineering goes on. We take those T cells, or the engineers and scientists that work on this and manufacturing take these T cells, and they supercharge them. And of course, T cells come from your thymus. They are your protection against all sorts of things, infections, cancers, and so on. And they are working constantly. And they are produced by the millions every day in your body. And we're very fortunate to have them for a healthy immune system.

The acronym there, GMP, is good manufacturing process or practice. We won't talk very much about that except to say, when we have those T cells taken out of a diseased patient's body, we want to give some genetic instructions to them. And those genetic instructions will help in supercharging those already very capable and armed force types of T cells to help protect and attack the cancer. So they're very specific. With the instructions, you can make these T cells grow what are called chimeric antigen receptors, or CAR-T. So it's a type of T cell, a type of white blood cell. And they will find those multiple myeloma cells that are coursing through your blood, and they will bind to them and they will attack them and destroy them.

And the hope with this kind of treatment now, a platform that's been emerging now for about the past 10 years or so, though the research goes back way before that, is that it will only have to be given one time. And for multiple myeloma patients, when this product that I work on first got approved, it was what's called a fourth line therapy. So that means they've tried other things. And other things may be very successful for months, for years. There's been incredible progress since, say, the 1970s, you know, a doubling in years in terms of survival. But ultimately, you can't get rid of multiple myeloma. We don't want to use the word cure because that would imply that it will never come back again. In some patients it doesn't, but in a lot of cases it does.

So the T cells, once they are supercharged, they are also given instructions to grow. So this is very complicated. They've got to find something. They have to attack it, kill it. They also have to multiply inside the body after being engineered this way. So where do I come in? I'm a statistician, classically trained. So this quality control part is very important. There's a lot of data that's collected. There's also this notion in manufacturing of specifications. So a lot of software development people are in the audience, either like me, an accidental programmer, or you do it really for a living, and you're very good at it. But what we're trying to do here is make sure that we have specifications, and that when we manufacture a batch from the patient, we do not go outside of those specifications. Those specifications talk about potency. They talk about impurities. There's all kinds of things. It's biology. There's all kinds of things that go wrong when we're trying to control cells and make them grow and do certain things. Then it goes back into the patient.

The statistician's role and the Word document constraint

So as a statistician, I have to fit models to data. I have to evaluate them. I have to do a lot of intervals that predict what's going to happen to help set these specifications, and also to make sure that when we run out of specification or are making changes, we don't increase risks and so on. A lot of times, of course, the patients are at the center of everything, but I do work for a very, very large company, and there are economic pressures all the time. So when I talk about speedy delivery, I wasn't talking about the last slide. I was talking about my own case, and that of a lot of colleagues that I have, where there are asks that are not only important, but they have to be turned around very quickly, and they have to be done well. This is health care at stake here.

So it's a very regulated environment. That is a great thing. This is very important. We're talking about diseased patients here. So FDA is, of course, the Food and Drug Administration. EMA is the European Medicines Agency. Those are the two major ones, but there are regulatory bodies all over the world. And often my requests come like this. This is why I call it a repeated situation. Bill, this is great. You've got the report. I'll talk about the reports that I generate in a moment, but we've got a data change. QC looked at it, and we have to change this value and maybe another value here, or we want to add this extra data here and so on. So putting aside whether it's valid to add that data or not, I have to rerun things.

And I was listening to James quite a bit. I have a PowerPoint deck here, but I never want to actually create PowerPoint decks when I'm communicating results to them. I haven't graduated to producing Shiny apps. I think that's a wonderful idea. But I will write Word documents instead, and nobody complains when I show them on a screen, because usually I try to look at, say, a table I've created inside the report or a graph, more importantly. So I get this constant ask about rerunning things and trying to do this quickly. Now I'm a team player, or I try to be, so it's no use for me to try to complain that I can't do this in a day or two days or so. I've got a lot of other things I need to worry about as well.

So I call this a constraint. I thought about using the word beautiful constraint, and even though I'm an optimist, I thought that was a little too strong. I'll call it a tolerable constraint. So I work in a corporate environment again, so Microsoft Word, Microsoft Office is ubiquitous and probably always will be as far as my career is concerned. But it has benefits. Everybody's comfortable in using it. Everybody's using the same version. When it comes to the point to share my results, they have to be reviewed. The team gets together and looks at things, or they may need to use the document to put into their documentation as well. And it has great collaboration features.

And lots of you, of course, know about version control and Git, and even before that, if I'm showing my age, SVN. But now everybody has an understanding that we only want to use one copy in one place for a document to try to edit it and get comments and refine it and so on. So for collaboration and reviews, it's very good. And then ultimately, the document has to end up in a repository, an official repository, because other than that, it might end up on somebody's computer and they leave the company. It might even end up on a shared drive and nobody can find it. IT makes a migration and it gets all lost and things like that. So luckily, we do have, and I think most companies do in the regulated space, have an official document repository. So we have to be compliant with that.

What's needed from Word documents

So what do I really need when I'm producing Word documents? I like to say I don't want to deal with a Word document until I've created it. And I'll just show a demo in a couple of minutes about that. But here are three things. Captions, okay? I want to italicize my figure caption. I don't want to italicize my table caption. I want to auto number them. And those of you who may have used Word, it has those functions built in, but never matured enough to the point where I was comfortable only doing that. Plus, I have to click menus, which I try to avoid. And then I want to cross-reference. If I want to say in the text, figure one, table one, we'll see examples of that.

Now, just a little bit technical here. I've used Quarto for this. I've used R Markdown before. And I also really love to use this package called flextable. There's a gentleman in France that has built this for years. He's built the officer package. It's really specifically designed to work with Microsoft Office. So it'll work with Excel, PowerPoint, and so on. So I really like this package and built up a lot of comfort with it and skill with it and so on. And it's not very compatible with Quarto. Now, I don't want to make this sound like a complaint for the Quarto team for my needs. All I'm doing is customizing this.

All right. A couple of other solutions before we go to a demo. I want to write once and fast. At least that's the goal. You know, I've talked about how once I share it, there's going to have to be going back in to write, to do edits and things like that. But when I have to create a new graph or create a new table, I can do that all in R and populate the document or correct it. And then I still think the word magic was used before. I think even now, when I push that render button and something opens, it's still pretty magical to me to do that.

Demo walkthrough

So let's see a quick demo. This will only go about 45 seconds. I just wanted to organize things, so I put in a package that I pretty much use myself. There are some other colleagues who use it. There's a simple menu there for templates. Only one template is there now. Truvault, don't worry about that. That's just a branding name that we use inside J&J. It populates the window with a markdown document and a template, and then it will go ahead and, of course, render it in Word. And then I can go over to my nice file explorer window inside RStudio, and probably sometime soon Positron, and open up the document. And, of course, it's got field codes in Word that Pandoc is using, and this will be the document that it creates.

So maybe nothing so special about that, but I'm glad at the point when perhaps I've worked all day and I've got to still do some writing, and the oxygen and the glucose in my brain are down to zero, and I can actually just press this button and have the magic document appear. So let's step through this a little bit one by one. This is just a quick snapshot of how it starts, just launching something, and then you can choose a menu. That menu could have other templates there. And then we go to what the document kind of looks like. This is just a function that's built in to actually illustrate it, which is what you just saw in the demo.

And because of Pandoc and the way Quarto and Markdown are all set up, it's very nice. You can do this in the YAML header for the table of contents. It works very well with Word. I can refer to figure one in the text, and then, of course, it'll automatically number the next one as figure two. Figure two is at the bottom, the caption in italics. Table one is not italicized and on the top. So these are just the kinds of things that I needed there.

Figure and table captions in code

All right. And so just a couple of steps. If I really want to create a figure caption, I just have a function in this package that says create figure caption. Now, again, I know Quarto has nice hashes and colons and everything to do this kind of creation, and you can put something in the R chunk options, but those weren't working for me. And, of course, I came up with the solution a couple of years ago. I came up with it before Quarto even existed, or at least before all of us out in the external world knew that it existed. So all this is: source data profile is the name of a graph object, ggplot2, of course, and then I want to create a figure caption. I want it to be below the graph, so I just have to put it in that order, and source data profile will just be the caption that gets generated in the Word document, and then that's the reference label that I have. So then inline in the text of the document, I can just refer to that label. It will automatically say figure 1 or figure 2, whatever place it is in the document.

And then for tables, again, I like to have the tables with the caption on top, so that comes first. I have a caption that I'm going to use for the table in the first argument of the function, and then I have the reference for it that I can refer to at the bottom there in the actual text of the document.
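Editor's note: the speaker's R functions are not shown in the transcript, so the following is only a minimal, language-neutral sketch (written in Python) of the numbering idea described above: each label gets the next sequential number the first time it is used, figure captions are italicized, and table captions are not. The class name `Captioner` and the labels `source-data-profile` and `summary-stats` are hypothetical and not taken from the talk.

```python
from dataclasses import dataclass, field


@dataclass
class Captioner:
    """Numbers captions of one kind (for example "Figure") in order of first use."""

    prefix: str
    italic: bool = False
    _numbers: dict = field(default_factory=dict)

    def number(self, label: str) -> int:
        # Assign the next sequential number the first time a label is seen.
        if label not in self._numbers:
            self._numbers[label] = len(self._numbers) + 1
        return self._numbers[label]

    def ref(self, label: str) -> str:
        """Inline cross-reference text, such as "Figure 1"."""
        return f"{self.prefix} {self.number(label)}"

    def caption(self, label: str, text: str) -> str:
        """Full caption line in Markdown; wrapped in asterisks if italic."""
        line = f"{self.ref(label)}: {text}"
        return f"*{line}*" if self.italic else line


# Hypothetical labels, for illustration only.
figures = Captioner("Figure", italic=True)
tables = Captioner("Table")

print(figures.ref("source-data-profile"))
print(figures.caption("source-data-profile", "Source data profile"))
print(tables.caption("summary-stats", "Summary statistics"))
```

In this sketch a number is reserved at first use, so an inline reference that appears before its caption still resolves to the same number. Keeping separate counters for figures and tables is what lets "Figure 1" and "Table 1" coexist in one document.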

So I mostly wrote this for myself. I'll be a little bit selfish about this. I tell this to my colleagues all the time. I'm a little later in my career. I do love sharing things. Up until January, I was actually a manager of a pretty large staff. Now I'm back to being an individual contributor by choice and by endorsement by my company, which I'm happy about. And so, you know, anybody who's developed an R package knows this structure, right, and there's a very nice thing with the inst folder when you're building the package, that you can place any kind of folder in there. So within that, this could be expanded, of course, by just having subfolders that refer to the templates. And very easily, if you've ever had to create a document in Word from Markdown or Quarto or Pandoc or so on, you need a docx that has a template for styles. It doesn't matter what kind of text is in that document, but it's going to have all those styles, and I don't know of any way to programmatically do that within R, so this is still what we have to work with. We have to still use Word documents as templates to create these.
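Editor's note: a styles template of this kind is normally connected to a Quarto document through its YAML header, using the `reference-doc` option of the `docx` format, alongside `toc` for the table of contents. The sketch below (Python, used only for illustration) assembles such a header as text. The function name, the title, and the file name `template.docx` are hypothetical, and the talk does not show the header the speaker's package actually generates.

```python
def docx_front_matter(title: str, reference_doc: str, toc: bool = True) -> str:
    """Build a Quarto YAML header for Word output that uses a styles template.

    Assumes the title contains no double quotes; a real implementation
    would need to escape them.
    """
    lines = [
        "---",
        f'title: "{title}"',
        "format:",
        "  docx:",
        f"    toc: {'true' if toc else 'false'}",
        # Path to the .docx file whose styles the rendered document inherits.
        f"    reference-doc: {reference_doc}",
        "---",
        "",
    ]
    return "\n".join(lines)


# Hypothetical title and template file name.
print(docx_front_matter("Batch report", "template.docx"))
```

Only the styles in the referenced file matter to the rendered output, which matches the point made above that the text inside the template document is irrelevant.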

Closing thoughts

So some closing thoughts. I think this works very reliably and very efficiently. I don't know how I would have handled all the work that I like to try to complete in a week or a month or a year without having this. And I will certainly always say my goal of writing once and being done with it probably doesn't occur very much. It does occur sometimes, which I'm happy about. So trying to go back, as you see with the little snapshot there, it's a fake, you know, I just put a little comment of myself there. You know, when I get these Word documents back or I see the Word document that everybody's collaborated on to review what I've created, I haven't found a way to put that back into the Markdown document and then run it. So those manual adjustments are still going to be needed. But that's part of working with a team, collaborating and so on.

All right. Last slide, I promise. So I just want to make some acknowledgments. I'm a longtime user of R. A very longtime user of R. But this is my first Posit Conference, so I've been thrilled to be here. It's been wonderful. So much true innovation. For me, innovation has a simple equation. There's a lot of creativity, but there's also a lot of value for your customers or your colleagues and so on. And I see that so often in all the talks I've been to. So thank you to the Quarto team as well, all the communities. I am very R-centric, but I've seen so many cool things being done with Python. It's really been a pleasure to view all those. And I also want to thank Articulation. So for those of you who don't know who Articulation is, they are the company that Posit, with a foresight I've never seen from any other organizers, hired to help us prepare these slides. So any flaws you see in my slides or my presentation or delivery are not due to Articulation. They've been helpful every step of the way. And I mentioned David earlier about the flextable package. And finally, Professor Letaw, who I've never communicated with, but I've been meaning to, she wrote the captioner package many, many years ago. It's not actively maintained anymore, but it gave me all the seeds that I needed to actually create functions that will help me with the cross-referencing in Quarto. Thank you for your time and attention.

Q&A

Thank you, Bill. We have time for a few questions. First, would you recommend this workflow for teams and industries where there aren't strict formatting requirements in their outputs? Yes, yes. I would say actually this is not so strict. I have a lot of control. When it goes in the official repository, all I need to do is have the margins set up in the right place. And that can be handled by the template that I write. So yes, I know that menu was a little bit small, but I have a later version where I just have some much simpler documents that will have page numbers in them and not those headers and footers. So absolutely, yes. Yeah.

We love converting Quarto to Word or PowerPoint, but no-code stakeholders often like to edit the documents directly or resolve comments. How have you handled that problem? Could you repeat that again? So how is it being handled when colleagues who are not data scientists may need to edit the data and make fixes themselves? Right, right. So I don't have any good solution. I think I talked about that in the last part of the slide. I have to get that Word or PowerPoint document back and, you know, transfer the edits. So I do have to do copy and paste, which I really try not to do. But in those cases, so far I haven't gotten a solution for that. But I'd love to hear about it if anybody knows. I'll keep looking.

How do you integrate this workflow to programmatically render documents with colleagues who depend on track changes in Word docs for collaboration? Another good point. I don't know how to do that. Word is still a very binary-focused format. So trying to get the comments out, I've tried to see a few things. There's a vendor called Writage that may do that. But again, I don't have a way to go back once I've created the Word document to sort of fold those in and version control them. I can certainly version control the existing QMD document, the Quarto document.

You mentioned flextable doesn't play nice with Quarto. How do you end up implementing flextable? Well, again, that last slide has the captioner package. You can still find it online, but it was general enough not to have to worry. I think that from what I've read in issues that have been opened up over the years, there's a format that Word likes to use called OpenXML, which is maybe not really that open. And that just doesn't seem to play very well with Quarto at this time. Or maybe Pandoc. It just doesn't play very well with it.

Great. Time for one last question. How do people typically consume these reports? Is it with your guidance, or is it they read through on their own? Most of the time, they just want them. So I'm not under the impression that... Well, there is their importance. They need to reference them in their larger documents. So we have what we call dossiers in the pharmaceutical industry that we need when we're asking for a new treatment to be approved or we're making a manufacturing change, which is where I come in. We have these communications. So a lot of times, they need these reports because they want to reference them, and somebody reviewing them at the health authority agencies will want to see that.

So I doubt... Some of them do read them end to end. So of course, that's a whole other... If I understand the question correctly, that's a whole other topic to try to make them as readable as possible. But this helps me. I go back to when I was... You could tell by my age. When I was first writing my dissertation, LaTeX just became available. And I thought that was the coolest thing. And I'm so glad that after 20 years of having to muck around in Word documents, maybe the last 10 years or so or five years, I've been able to actually use markup language to create the documents. So I don't have a good answer to that. I mean, it just depends on the context whether they want to review it or not. Thank you, Bill. Yeah. Thank you, everyone.