Reproducible Manuscripts with Quarto - posit::conf(2023)

Transcript

This transcript was generated automatically and may contain errors.

Okay, thank you very much. Welcome to the Quarto session, and I have the distinct pleasure of talking about things that look like a stack of crepes, but actually they're manuscripts. So today I'm going to be talking about reproducible manuscripts with Quarto.

So I want to talk a little bit about the full sort of spectrum of complexity of reproducible scientific projects. What do I mean by that? The majority of my time as an educator, and particularly as an educator who tends to teach students who are learning computing for the first time and does so with Quarto, I live in this wonderful land. It is a fantastic place to be. You have a single Quarto document. You are sort of like coding in one language because that's the language we're teaching the students, and we have full control over sort of like everything that they're doing, including the tools that they're using to write their code.

And the type of things that they're producing, while quite in-depth for them, are things where the code can all live in a single file, and they don't mind running it over and over again with each edit. You want to change a header? Fine. Let's just run the whole compute again, and things go pretty fast and pretty neatly. And we tend to get HTML outputs as a result, and we can publish them on the web.

While that's like the majority of the time I'm living and breathing, the rest of the time I am thinking about, well, we collect data from these students about how they learn. At some point, we're going to want to publish these results. So how do we go from the simplest case to things that get sort of progressively more complex? So from simplest, let's go to simple. We can again have a document, a single Quarto document with just some R code in it, and with the help of the Quarto journal articles extension, I can output things to PDF or Word, and not just any PDF or Word, but a document that looks exactly the way the journals I'm submitting my manuscripts to expect it to look.

But again, the code in here is sort of, I'm going to say simple enough. That's not to say the analyses are very simplistic, but to say that I'm okay re-rendering this document every time I'm making edits to it, and this is a happy place to be too. But science is rarely that simple. Typically, you're working with multiple collaborators, each with their favorite computing language and code editor. And you're in multiple stages of a project, each with its own level of feasibility of what can be rerun with each edit and what needs to be cached. You don't want to be cleaning your data every time you want to change your headers, for example.

So a more complex project might look like this. Maybe you have a single Quarto document, but a bunch of notebooks and stuff lying around, and maybe everyone's coding in R in your team, and you want to output to something like a PDF or Word for a journal submission. Or things can get even more exciting, where the diversity of the tooling your collaboration team is using increases, where we have even more notebooks where little bits of the project are being done. Maybe we now have multiple QMD files, some of them running R code and some of them running Python code. But ultimately, we need to bring all of this together into a culminating manuscript that we want to submit to a journal.

The current state of reproducible scientific manuscripts

We can, turns out, leverage the notion of Quarto projects for writing these fully reproducible scientific manuscripts. So I'm going to take an aside for a second and talk about a little bit of a mind shift for some of you, and at least for myself as well. Whenever I use the word notebook, in my mind, I'm always thinking about like Jupyter notebooks because the word notebook is in there. But ultimately, a notebook is a document that contains both code and narrative. So going forward, when I say notebook, I could be talking about a Jupyter notebook, or I could be talking about a Quarto document, where you again see your code and narrative in the same place.

So let's keep that in mind. And think about what the current state of affairs is. Most computational science is born in notebooks, and it dies. That's sort of not so nice. Maybe ends in PDF or Word documents. And what happens in between is that we have peer review and publication workflows that don't tend to support these notebooks as research outputs. And these more complex scenarios that I talked about involve a lot of like manual finagling at the end. And not in a devious way, but just at the end of the project, you're just trying to get it over the threshold to bring the project to this journal submission stage.

And it's during this process that oftentimes reproducibility is lost. Or at least takes a back seat to formatting requirements. I will at least be honest and say that I have certainly done the thing where I had this beautiful notebook that I got to 90% of what the journal wanted as a Word file. And then I manually edited that Word file. And then I put a footnote saying the code can be found in the GitHub repository. Choose your own adventure.

And ultimately, the final submission that the journals want or archive rarely captures all these computations. I mean, sure, they'll happily take your supplementary documents. But they rarely archive those documents with your manuscript. And sometimes they're not necessarily even reachable. So these can live in supplementary materials sometimes. And there is no guarantee that that GitHub repo I link to in my supplementary documentation is going to be there a year or two years from now. So if you've tried to reproduce others' work so that you could learn from them and iterate on it, you probably have had difficulty trying to sort of gather up all the pieces needed as well.

So, you know, this is the current state of affairs. But also within the current state of affairs, there are lots of great minds thinking about, well, how do we make this better? This is not a new problem. And there have been various initiatives over at least my academic career trying to move reproducible computational science forward. And one of the current ongoing projects is this Notebooks Now project that's funded through this Sloan grant and a grant from the American Geophysical Union. And the idea is to try to sort of make these pieces work well.

So I want to interact with that code in a computational environment that's just a click away that has all the software and packages needed to reproduce the manuscript.

In 2019, there was this article published in Nature where, actually, an article on eLife allowed you to do this. As of today, that article is paywalled, and you can't access the links anymore. And this is not to say the authors did anything wrong. It's to say it's been years, and this is a problem very near and dear to a lot of scientists' hearts that's really difficult to solve.

So I think with the manuscript feature in Quarto, we're making advances toward getting there. So as of today, you can start kicking the tires on the quarto use binder feature, where you can see, in addition to these static outputs, a link to launch a Binder instance. You do need to be a little bit patient as it does its thing there. But we would love for you to sort of go in and take a look to see how this is working for you. Is it capturing everything that you need? And is it working out for you? And give us some feedback on that as well.

So where do you go from here? I would recommend you rewind back to that get-started step and start again. And I would point you to the documentation for the Quarto pre-release, where Charlotte has actually created a wonderful repository that you can get started with. Thank you so much for listening. And I'd be happy to take questions.

Q&A

I'm back. Thank you for the great presentation. So I'm going to turn things real, real quick with one question. When we submit a manuscript, we often don't do it just once. Do you have any tips for a workflow, like if you have to switch templates because you're submitting to a different journal? And/or resources or ways to contribute to the pool of journal templates that are already available?

Yeah. Wonderful. So actually, that first bit is easier than you think. And I know that we don't tend to write an article for multiple journals at the same time. So it generally tends to be iterative, where you maybe submit to one and then you try another. But you can add other templates here as well. So in this case, I only had one of the journal templates being used, on line 10 here. But I could have added something like plos-pdf or jasa-pdf and kept going. And Quarto will happily render all of them for you, grab the necessary bits of the YAML, and create those PDFs for you.
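
As a rough sketch of what listing several journal formats side by side might look like: the fragment below assumes a manuscript project configured in _quarto.yml, and assumes the PLOS and JASA journal extensions have already been installed into the project. The file on screen during the talk is not reproduced in this transcript, so the layout here is illustrative rather than the speaker's actual configuration.

```yaml
# _quarto.yml (illustrative sketch, not the file shown in the talk)
# Assumes the PLOS and JASA journal extensions are already installed
# in this project, which is what makes plos-pdf and jasa-pdf available.
project:
  type: manuscript

format:
  html: default      # the manuscript website
  plos-pdf: default  # PDF styled for PLOS (format name from the extension)
  jasa-pdf: default  # PDF styled for JASA (format name from the extension)
```

With a list like this, a single render would be expected to produce one output per listed format, so trying a different journal means adding a line rather than reworking the manuscript.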

There is a rich and growing sort of ecosystem of these extensions. And I think the best way to contribute, if you have a journal you want to submit to, is grab one of the existing ones. There's actually a starter repo in that Quarto journals templates organization as well. And start sort of like playing around with it. If you need a PDF output, you probably want to be sort of a little bit confident with writing a bit of like LaTeX styling code. But I would also say that if you've started with a template for a journal of interest to you, but you're running into roadblocks, opening some issues and asking for help would be great as well, as the developers are often like watching that space and trying to help people get over that hurdle. Awesome. Thank you very much.

Featured software

Quarto