
Keynote, JJ Allaire: Reproducible Manuscripts with Quarto
JJ Allaire, CEO at Posit, PBC. JJ is a software engineer and entrepreneur who builds tools that empower people with technology. JJ has conceived and designed several industry-leading products by balancing market, customer, and technical considerations, and by maintaining intimate involvement in all aspects of software design and construction. He is currently the founder and CEO of the statistical computing company RStudio (now a part of Posit, https://posit.co/). https://github.com/jjallaire https://mobile.twitter.com/fly_upside_down
Transcript
This transcript was generated automatically and may contain errors.
Thank goodness for Wikipedia. I'm able to introduce our next speaker. I've learned a lot from him in face-to-face contact. I didn't know, though, that JJ Allaire is a creator of a blog publishing product called Windows Live Writer, initially released in 2007, distributed by Microsoft as part of Windows Essentials. And in 2008, he co-founded FitNow, a company dedicated to mobile health. 17 million users. And now, he will teach us about Quarto. Thank you, JJ, and welcome.
Thank you very much. It's great to be here. I'm going to talk, my subject today is Reproducible Manuscripts with Quarto. This is not going to be a Quarto 101 talk, because I bet a lot of people here know a decent amount about Quarto or R Markdown. I will cover some foundational basics, but I really want to tell you about some brand new things that we're working on with Quarto that are aimed at reproducible scientific publishing. So, that's going to be the bulk of it.
But I will go a little bit into what Quarto is, kind of the philosophy of the project, the goals of the project, not so much the mechanics of how it works. Talk a little bit about the relationship between notebooks and Markdown and scientific publishing, and then talk about this new set of capabilities that we call Quarto Manuscripts.
What is Quarto?
So, most of you probably already know this, but the simplest way to think of what Quarto is, it's the next generation of R Markdown. We saw R Markdown in the last video. Most, I think all of you know what that is. And really, Quarto has really two goals, one of which is to make R Markdown work across different computing ecosystems. So, the Jupyter ecosystem, the R ecosystem, the Julia ecosystem. So, broaden it in terms of computational environments, and also take the things that we learned over the first 10 years of working on R Markdown and really kind of create a next generation of those capabilities.
So, again, open source scientific and technical publishing system. We do support quite a few different computational environments. We support knitr and R, Jupyter, Observable JavaScript. We actually have some folks from the Julia community working on adding a Pluto engine right now. So, the idea is that we're going to build all these publishing capabilities, and then there will be lots of different compute engines over time, and we'll be able to work with all of them rather than being tied to a single one of those. Obviously, the underlying Markdown engine is Pandoc, but we've added lots of enhancements for scientific publishing. Pandoc obviously has some very robust citation capabilities, but we've added lots of other things. And then tried to let you create lots of different types of output, both simple documents, but also presentations and blogs and websites, books, etc.
So, lots of other projects have had similar goals and similar features. I list a few of those at the bottom. So, we're very much in debt to all that work and continue to learn from those projects as we build Quarto. Again, I've covered most of this: open source project, sponsored by Posit, building on R Markdown. And I would just say it was frustrating. I think we really had a lot of conviction that the ideas behind R Markdown were really sound and useful, and had the chance to positively impact how science was done, but it was frustrating that it was only in the R ecosystem that the tools could be used. And so, we really wanted to, you know, our observation was that certainly Jupyter was quite broadly used. There were other new environments like Julia, and we wanted our work to be applicable across all of those. So, that's really kind of how we got started working on Quarto, which was about three or four years ago.
Another thing we like to think about is that if you think about tools for writing scientific manuscripts, you know, Word, you can see the blue line, it starts off very easy and very quickly it gets totally out of control. LaTeX is harder to start with, but it's actually once you learn LaTeX and learn the basic mechanics of it, the curve is actually relatively flat. It scales really nicely. And then traditionally Markdown has had, it's sort of a mixed story. It's simple at first, and then you try to do advanced things, and you get into all kinds of weird hacks and things like that. So, I think with Quarto, really what we really want to do, and I'll show some work that I think aims at this, we'd like to have an environment that is as close to Word as possible in terms of getting started in the initial ramp, but that also has the scaling characteristics of LaTeX, that as you do more and more complicated things, it works really, it grows seamlessly with you.
And then of course, this idea of single source publishing, which the manuscripts feature will really underscore the idea that we want to author our content in a universal format that can be repurposed across lots and lots of different mediums. Because obviously scientific papers, it's still very convenient to have them in the PDF, but also great to have them on the web and have them on mobile and other places.
Okay. I think I've mostly covered this. Why did we create a new system? It's really just this idea that the languages and runtimes used for scientific discourse are very broad, and we really wanted something that could match that. And kind of the idea is like do all these deep publishing capabilities and Markdown capabilities and tooling, do it kind of once and make it very broadly applicable.
So, as I mentioned before, there are several different sort of compute engines. The knitr engine essentially gives us very, very close compatibility with R Markdown. Most RMD documents can just be run in Quarto without modification. Jupyter is another engine. And then we've done something with Observable JS, and as I said, others are possible, and there is active work on a Pluto engine for Julia.
And the way the knitr engine works is exactly, this is the exact same diagram we used to use to explain R Markdown, but instead of RMD, there's a QMD. It uses knitr, which makes it very compatible with all the existing RMD files. One difference to highlight is that chunk options are generally encouraged to be written as YAML rather than put inline. And there's just an example of a knitr engine document, and you can see, very familiar for people who've used R Markdown, you can see cross-references in here, you can see the label and caption in the code chunk.
So, that's the knitr engine. The Jupyter engine actually has two different modes. One is you can use a QMD, which is like an RMD, so it's plain text. You can use any text editor to edit it, works very similar to RMD, and in that case, what happens, we take the QMD, we actually turn it into a Jupyter notebook, and then run it through our engine. So, that's extremely analogous to how RMD works. And there's another modality, which is I have a notebook, I have an existing Jupyter notebook, and I just want to render it, and that works too. So, in that case, you use all the tools that you normally use for editing Jupyter notebooks, whether it be JupyterLab, or VS Code, or what have you, Google Colab, and then you can put it through the same pipeline.
Now, in this case, there's no, by default, we do not re-execute the Jupyter notebook. This is one characteristic of Jupyter notebooks that people like a lot, is that for very, very expensive computations, you can control exactly when the computation is done, and then essentially, by not re-executing a cell, you don't have to pay for the computation again. It creates reproducibility problems, so it's not a total panacea, but it's certainly something that lots of folks in the Jupyter ecosystem do take advantage of.
And then, in terms of tooling, we have the ability to do side-by-side preview with VS Code, and Jupyter notebooks, and things like that. We have a JupyterLab extension for Quarto, and we have a VS Code extension for Quarto. There's a Neovim mode that's pretty good, and there's some integration with ESS as well.
Quarto project types
Okay, so that's kind of a baseline about Quarto, and one of the other features of Quarto is this concept of projects, which is really just some additional behavior and organization around a directory that produces a more sophisticated output. So, the easiest example of this to understand is a website, where I've got 20 documents, and I want to organize them, and I want to provide navigation, and organize them into a site. A blog is really just sort of a derivative of a website that has a set of posts in it. Books tend to have chapters, and numbered chapters, and cross-references across chapters, and also support multiple formats. So, a book is a website, but also you would want to have potentially a PDF version of the book, or a Word version of the book, or an EPUB version of the book, an AsciiDoc version of the book, etc. And then, finally, journal articles, which is going to be... We're going to explore that a little bit more when we talk about manuscripts.
Right, so we have a system in Quarto for creating custom formats, similar to the R Markdown custom format system, and we've tried to provide a lot of tools for creating documents that create the LaTeX that's expected by journals when you submit to them, but also documents that can also work well in HTML. So, the idea is many of the things you would use like a LaTeX macro for, you could use a span or a div, and then we would automatically generate the LaTeX macro, but also do the right thing when in HTML. And the other thing we did here is a lot of work on a standardized schema for authors and affiliations, so that you can express that data once and have that get mapped into the correct front matter, title block, etc. for various journals, and also make the data on authors and affiliations computable. So, that's Quarto journals.
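As a rough illustration of the authors and affiliations schema described here, document front matter along the following lines states each affiliation once and lets authors refer to it by id. This is only a sketch: the title, names, ORCID, email, and institution are placeholders and do not come from the talk.

```yaml
---
title: "Example Article"               # placeholder title
author:
  - name: Jane Doe                     # hypothetical author
    orcid: 0000-0000-0000-0000         # placeholder ORCID
    email: jane.doe@example.org        # placeholder email
    corresponding: true
    affiliations:
      - ref: example-univ              # points at the id defined below
  - name: John Roe                     # hypothetical second author
    affiliations:
      - ref: example-univ
affiliations:
  - id: example-univ                   # hypothetical id, reused by both authors
    name: Example University
    department: Department of Earth Sciences
    city: Springfield
    country: USA
---
```

Because the affiliation is written once and referenced by id, a journal format can lay out the title block however it needs to without the author retyping the data.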
Notebooks and scientific publishing
And I'm going to do a little bit of a sleight of hand here, and I'm going to assert kind of a rough equivalence between a QMD and a notebook, because QMDs really are very similar in that they mix prose and code and output. They serve a similar purpose. They work differently. Notebooks bundle together output and code in one file. QMD and RMD files are a text file, and then output is generated separately. But I think we can think of these as notebooks, and in fact, there are some industry-level initiatives about notebooks and scientific publishing, and it's more useful when you're talking to folks who aren't familiar with all the technical details to simply refer to these things all as notebooks versus trying to break it down into this taxonomy of things that have to do with kind of somewhat incidental technical details.
So the idea is that notebooks are a big part of how scientific communication happens, how scientific manuscripts are shared. However, none of the workflows for publications make notebooks a first-class entity. They're just sort of dropped on the floor, and then the content goes through. But how do I reproduce this article, or could I even just see the computations underlying the article? That's really not possible, and when you get into things like archiving, so archiving a scientific manuscript so that we can reference it again or compute on it in 20 or 30 years, again, the notebooks and the computations are completely gone. So that's a problem.
I think what would be good to have would be an end-to-end workflow that treats notebooks, either classic Jupyter notebooks or sort of Quarto documents, as a more primary element of the scientific record, and a workflow that encourages authors and rewards authors for making their work transparent and reproducible, and also acknowledges lots of the work that goes into scientific articles that are a lot more than just the composition of the article.
Notebooks Now initiative
So this is aspirational, and it might sound kind of fanciful, like, well, that's never going to happen. Great to talk about it, but where are these incentives going to come from? How are we going to get publishers on board? But I would tell you that there is quite a bit more hope for this than you might imagine. There is an initiative that was funded by the Sloan Foundation. It was a grant given to the American Geophysical Union, the AGU, called Notebooks Now, and the idea is elevating computational notebooks as primary elements of the scientific record. So this was a grant that resulted in a workshop initially that happened last fall. There were about 50 in-person attendees, and I'll get into who was there, but there were people from the R community, rOpenSci, the Jupyter community, Public Library of Science, Journal of Open Source Software, major publishers, all together to talk about how we can make notebooks a fundamental part of the whole workflow and process.
So we had an in-person meeting. There's a steering committee. You probably know more than one of the people who are on the steering committee. These are folks that are from the R community. Some of these folks are from the Jupyter community. Some of these folks are from the Open Science community, and then the scientific publishing community. So it's kind of the right group of people to get together to talk about how we can make notebooks and computations a more fundamental part of the publishing process.
So what's happened is we had the in-person workshop, created a bunch of working groups, and there's actually a couple of implementations of this system, or these ideas, planned. Quarto is doing one implementation, and then MyST, the Markdown dialect used in Jupyter Book, is doing another implementation. So what I want to show you today is this feature, Quarto Manuscripts, which is really built to implement the kind of spec and ideas in this Notebooks Now effort. Part of the idea is that the AGU is going to start by having one of their journals do an entire issue where all of the manuscripts have notebooks behind them and use all these tools, and the hope is that lots of other journals will follow suit, and even people who are creating new journals will find this makes it quite a bit easier to create journals that are more interactive, that use the web better, that collapse a bunch of the workflows that exist in the current journal publishing process.
Quarto Manuscripts feature
So the basics are, as I mentioned before, we have different project types in Quarto, so we have books, blogs, websites. And we have a new project type called a manuscript, which composes a bunch of other Quarto features together to provide a framework for writing and publishing scholarly articles. One of the features, as I'll show you in a couple of demonstrations, is creating multiple formats. So Word formats, LaTeX formats, and kind of give readers the ability to read a web version of the article, but also access all the other formats.
And then the other key thing is this idea of exposing the computations underneath the article. So the idea is that one or more notebooks or QMD documents are sources of content and computations. These are published alongside the manuscript so that readers can look at, even if your code is not printed in the manuscript, as it often is not, there's a way to get to it and view it and reproduce it.
So I'll just show you, let's see if I can, an example of one of these. Let's see if this works here. So this is an article that's created with Quarto manuscripts. You can see just the basic abstract, et cetera. You can see, actually, there's several other formats made available. One is an AGU PDF, Word, and I'm not going to talk about what this is, but this is, I'll briefly say what this is. One of the ideas behind this is that there's a standard form that's like manuscript exchange format that we can produce so that when you create one of these articles, it can be submitted to multiple publishers in a standards-based way. So anyway, different formats. And then, as you can see, as you look inside the article, there's different things. On the left, you can see the normal table of contents, but these are all the notebooks that served as the source of computations for the article. So as you read the article, you might say, well, you're reading about seismic monitoring stations and you want to dive into, well, how exactly did you do that analysis? And you can do that.
Furthermore, let's see here. Here's a figure and you can actually say, okay, great, that's a figure, but how is that figure derived? And you can see we have this source, which will actually take you to the exact notebook cell where this figure was created. So if I click this, you'll see it takes me to the notebook cell that created this figure. And this happens to be a Python notebook, but you could mix R and Python notebooks in the same document.
Demo: creating a manuscript project
All right, so what I want to do is show you a little bit of how this manuscript feature works, what it's able to do, just to give you a richer notion of all the problems we're trying to solve here. So I'm actually going to, I have some demonstrations and they're in the form of videos, and I'm told that if I go, if I full screen this video, it's going to be bad for the audio stuff, the remote participants and stuff. So I'm not going to full screen the video. I think it hopefully is big enough that it's visible to people. Does it seem okay? Okay, all right, so I'll kind of talk you through what we're doing here. And this is basically, we're going to start by just creating a new manuscript project in RStudio.
One thing before I, I'll pause for a second here, all the stuff I'm going to show you is in the daily build of RStudio and in the pre-release version of Quarto. So this is all like bleeding edge, like go get the latest thing. This is not stuff if you just go on your laptop now, you're going to find. I'll provide links at the end for where you can find stuff, but all this stuff is like latest and greatest. So we're going to create a new project in RStudio and it's a manuscript project. And this is, we're going to do, since our examples are all like AGU compatible, so this is about earthquakes. So we'll create the project sort of the same way we create any project in RStudio.
And we get the initial document and we'll just mess around with the metadata a little bit. So this is an empty document. We're going to kind of compose this document from scratch and get a manuscript. So first is making the title more reasonable and then putting lots of metadata in. And you can see this is quite a bit more metadata than you typically see in an R Markdown or Quarto document, but this is going to be used such that it's kind of uses, it uses it to tag the article and it uses it to create the correct front matter and title information for lots of different journals. So put in the rich front matter, again, that's a standard so that one set of front matter can work for multiple journals. So render the project and we see it, a local preview of it. And this is sort of similar to what I showed you before, but this is just like the basic document with no content in it.
So let's see, are we done with this? No, not yet. All right, so now we're going to, I'm actually going to show you how we would publish this to the web. And there's a linkage between GitHub Pages and manuscript projects. You can also publish them other ways. They're just regular websites. But here we're going to add a Git repository, and then we're going to connect that Git repository to GitHub. That's using the usethis package. So now we have a Git repository.
And once there's a Git repository, you've got this Quarto publish, and we want to publish to GitHub Pages. And we say, sure. And now it's going to push the website to GitHub Pages. One of the ideas behind using GitHub Pages is the website is rebuilt on every commit. So as you make changes, it's constantly rebuilding the website and constantly keeping the published version of your manuscript up to date. So that's one of the benefits of using GitHub Pages. You can certainly use other mechanisms, but here's the article now published. You can see the github.io in the URL.
We've added support for Hypothesis comments. That's one of the features of Quarto. And so here I'm able to just add a comment to the manuscript. We can see this is a really basic kind of hello world. It's got a Word document. It's got a MECA bundle. It's got simple web content and comments. So that's our basic getting started.
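For readers following along, a project file for this kind of hello-world manuscript might look roughly like the sketch below. It assumes the option names from the released Quarto manuscript project type, which may differ from the pre-release build shown in the demo, and `index.qmd` is an assumed name for the main article file.

```yaml
# _quarto.yml (a minimal sketch, not the exact file from the demo)
project:
  type: manuscript        # the manuscript project type described in the talk

manuscript:
  article: index.qmd      # assumed name of the main article file

format:
  html:
    comments:
      hypothesis: true    # Hypothesis annotation on the web version
  docx: default           # Word version, convenient for reviewers
  jats: default           # JATS XML, the basis for the exchange (MECA) bundle
```

Each entry under `format` becomes one of the alternative versions offered next to the web article.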
Multiple formats and journal templates
So let's see. I'm going to show you the next piece, which is this idea of multiple formats. And so this, I think, will explain better how that works. So here you can see there's a Word. Now, when we publish this thing, it creates a Word doc version. And this is, again, going to be the hello world, really simple. But as, of course, the manuscript gets more elaborate, this will fill in with all the figures and code and content. One of the ideas behind the Word doc is just it's very, very convenient for reviewers to review a Word doc. And so that's a nice side effect you can have. Some people may want to read on the web, but some people may want to actually provide comments in Word.
You can see also we're going to add a PDF version. So you can see all the different formats supported here. So this is editing the project file. We're saying we'd like to have a PDF version as well. So when that's rendered, you can see the PDF shows up. And when we look at it, it's going to look like you expect it to look, which is just a basic kind of bog standard PDF created by Pandoc.
Now, this is where we get into a little bit of this idea of journal formats. So here we're actually going to add a custom journal format to this project. And we're going to use the AGU format. So we're going to install the AGU format or add it to the project. And now once we've done that, we can actually go over and say we'd like to create an AGU PDF. So now when we render, it's going to also create an AGU PDF. And we'll see what that looks like. There it is. And this is actually going to use the LaTeX template of AGU. It's got margin lines for review copy. It kind of conforms to all the expectations of reviewers of AGU articles.
And now we're going to say let's actually also add, we wouldn't do this in real life probably, or you might if you're submitting to multiple journals, Public Library of Science. So we add that. And now we change our AGU PDF to Public Library of Science and render. And you'll see we're going to actually in this case keep the TeX so that we can submit the TeX to the Public Library of Science. Render the project. And then you'll see it's changed. It's no longer using the AGU template. It's using PLOS. And you'll see that looks quite different.
So the idea is we can author our stuff once in Markdown, create one set of YAML, and then be able to project that into lots of different journal publishing formats. Here you can see this is the LaTeX that was generated and uses the right document class and is ready to be submitted as LaTeX to the journal.
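A sketch of what the format section of the project file could look like at the end of this part of the demo follows. It assumes the journal formats come from the `quarto-journals/agu` and `quarto-journals/plos` extensions and are named `agu-pdf` and `plos-pdf`; the exact file used in the demo is not shown in the talk.

```yaml
# _quarto.yml (format section only; a sketch under the assumptions above)
format:
  html: default
  docx: default
  # Journal formats are provided by extensions added to the project,
  # e.g. `quarto add quarto-journals/plos`.
  plos-pdf:
    keep-tex: true        # keep the generated .tex so it can be submitted
  # To target the AGU template instead, swap the entry above for:
  # agu-pdf: default
```

The article source and its YAML metadata stay the same; only the format entry changes when switching journals.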
Okay. I want to just talk a little bit more about metadata. Journals often have lots of requirements around how authors and affiliations are presented, and special forms of metadata. And it's quite painful to reconstruct this for every different journal. And so here I want to show you a little bit of that. We've added a plain language summary, which is required by the AGU, and key points, which may also be required by the AGU. So we've added those to our YAML, and then when we render, you'll see the PDFs. In this case the Public Library of Science PDF will provide all the right things in the right places. And then similarly we go back and change this to AGU. It's going to feed the appropriate metadata to the appropriate place in the AGU template. Let's take a second to show that. There's AGU, and it will do the same. Same bit, and you'll see at the bottom there's the plain language summary. So the idea is we want to invest a lot in clean metadata and then we can project that into lots of different publishing formats.
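The extra metadata mentioned here could be written in the article's front matter roughly as follows. The key names assume the released Quarto manuscript schema, and all of the text values are placeholders rather than content from the demo.

```yaml
---
abstract: |
  Placeholder abstract text.
plain-language-summary: |
  Placeholder summary written for a non-specialist reader.
key-points:
  - Placeholder first key point.
  - Placeholder second key point.
---
```

A journal format that knows about these fields can place them where its template expects them, and a format that does not can simply ignore them.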
Visual editor for manuscript writing
Okay. The next bit I want to show, I actually want to ask a question before I show this. How many people here have used the Quarto visual editor to write a manuscript? Okay. One person. Great. I was thinking if 80% of you said I have, I'd say we're going to skip this. But this will be new data. I think probably, how many people here have used R Markdown to write a document? Okay. Great. Everybody knows R Markdown. Everybody's comfortable using a text editor to write their R Markdown documents.
But our goal here is that we actually, we go back to that slide about let's have something with the activation energy of Word. We actually want this to be very broadly usable. We want to do all these wonderful things with being able to tie computations to manuscripts, but we don't want to create such a high technical threshold that only 15% of practicing scientists ever adopt it. So we want to continue to provide the low level, use the text editor of your choice, but we also want to bring the tools closer to sort of the expectations of sort of mainstream word processor users, but also do a really, really good job with tools for scientific manuscripts.
So this is going to be a demonstration of sort of taking that initial manuscript that we created with front matter and actually building a paper. So here we're going to have in our code chunk, I think we're not going to laboriously code the whole thing here. We might actually laboriously code this one, but then there's going to be a plot that we do that we'll just paste in. So here we're just going to load up libraries. This is the Quarto visual editor you can see here.
So now we're, okay, we're doing a little bit of data filtering and we're going to have, yeah, so we've got, we computed a couple of figures that we want to use in our manuscript and now we want to basically, you probably have all, maybe you've all seen like inline R code. So here, the example is we've actually computed something in R. We want to use that in the manuscript. So we're able to add this inline expression. It means every time we recompute the data, the text in our manuscript also updates. So you can see when we render this, it's going to show the correct value of the latest computation of average years between eruptions. And there it is. So that's extremely useful in these manuscripts.
So this is going to be a citation. So here we know we want to cite something. So we'll insert a citation. In this case, we actually just have the DOI of the citation. So we paste in the DOI, it finds the DOI, makes, you verify it's the right one, insert the citation. So now it puts the citation in and at the same time, it's going to, you'll see when we render it, that it gets processed correctly, but that citation has been automatically added to the bibliography as well. So you can see citations there inline, you can see the references there in the document, and then the bibliography has been updated. So that's the idea of making working with citations dramatically easier and more productive, certainly than you could ever find in Word itself.
All right, so let's take a look at, this is now a figure that we're creating. You can see we've got a caption, we've got a label for the figure. So now we want to reference the figure. First, we'll show what it looks like in the document. You'll see it's got, it's numbered, it's got a caption. There it is. So it's automatic figure numbering, so if there's 20 figures, it'll do the right thing. And then we want to actually insert a cross-reference to that. We can, it knows the editor kind of knows about all the cross-references, we're able to look for them, search for them, and then of course it'll resolve that cross-reference correctly as well.
And there's an example again of like drilling into the code. Many articles will not show that code block, but it's very useful to be able to drill into it. All right, so cross-references also, they apply to figures, they also apply to sections. So here we're going to make a section cross-referenceable. And then once we've done that, we've kind of put the ID on it, now we can make a reference to it. I think we'll do that again. You can just type the section, type the cross-reference out and it'll auto-complete it, but this is an interface that lets you sort of browse all the cross-refs you have and make sure you have the right one. You can see reference point, this is the source, it's exactly what you'd expect. There's the ID, there's the cross-reference, and you can switch between the visual and the source editor anytime, and it has pretty good fidelity.
All right, this is going to show, I think, the fact that it resolves the cross-reference to the section and provides a link to it. Okay, and this is making sure our bibliography gets put in in the right spot, make sure that the bibliography section is unnumbered, and once you do that, it'll put the bibliography at the bottom along with other stuff like citation and things.
All right, okay, support for math. You kind of do the same, you use LaTeX to edit all the math directly, but as soon as you finish editing an inline or block or display equation, it automatically just renders it and resolves it so that as you're kind of reading over your transcript, or your manuscript, as you write, you can see the equations all resolved. Some of you can probably take LaTeX, plain LaTeX, and just in your head, mind's eye, see the equation, but for those that can't so readily do that, this is a nice way to have a reference for what you're writing as you write. And then the equations are cross-referenceable in the same way as everything else, so you see we put an ID on it, and then we've got the reference to it, and then when we render it, it will number the equation and resolve the reference to it from the eq-poisson ID.
All right, and then last tables, which are super fun to edit in Markdown, as you all know. Here we'll add a table, give it a caption, and then once it goes in, we'll just paste that in. We can add an ID to the table, similar to how we've added it to other things, equations and figures and sections, and then we are able to cross-reference that table the same as we are able to cross-reference the other stuff.
Here, I think we'll auto-complete, yeah, we just auto-completed the table reference.
So, hopefully, providing all these writing tools in the visual editor will get more people willing to engage with this toolchain, and get more people working in a way that's like end-to-end computation, has computational integrity from end-to-end. This is all in RStudio. It's also in our VS Code extension.
We need to make this progressively more and more and more usable. You know, having it, I think you still need a pretty strong mental model in your head of what's going on with the Markdown to productively use the visual editor, but over time, we want to collapse that more and more, so that people who are familiar, they're familiar with R, or they're familiar with Python, they're familiar with notebooks, and we give them a writing tool that makes them really productive with scientific and technical content, and married to computations, and hopefully get lots more people working in this way.
Composing manuscripts from multiple notebooks
All right, this is actually really important. One of the things in R Markdown documents, and I think probably various of you have different workarounds for this, is that everything happens in the main document. So, all the computations go from top down in the main document, but in a lot of actual scientific papers, there are many sort of subcomputations and subprojects that you want to compose together in a paper. So, what I'm going to show here is basically making a separate QMD file, and also a separate Jupyter notebook, and then incorporating them into the main document. And then, again, they're completely navigable and referenceable from the main document.
So, I'll show that quickly here. So, here, we'll make a directory called notebooks. We will put a QMD and a CSV file in there that we've already written. So, this is just a separate kind of sub-document that I'm using to explore the earthquakes data. And we'll render that. And once that's happened, this is now something I can use in the main body of my paper. And so, you can even imagine teams of people where I'm just going to work on this notebook. It's going to create this table. I'm going to give it the right label and caption, and then you can just pull that in.
So, here, you can see I'm going to introduce an embed. So, we have this shortcode, embed. It says embed a cell from a QMD that I have that's separate. So, here, we're going to reference the QMD, and then we're just going to use the ID. And then, when we render this, it's going to pull that in from the other QMD and make it part of the main paper.
So, you can see how that works here. There's the table. That was, again, created in a different notebook. If I want, I can actually go and explore that notebook and see how that table was computed. And there's referencing the table. Right, and that'll number it and resolve the reference and kind of do what you expect here.
So, similarly, a colleague might have created a Jupyter notebook that creates a data visualization. I would like to use that notebook and visualization in my paper, and they've just handed me a notebook and said, here's my notebook. It has the visualization. Use it. So, that's the data screening .ipynb. So, similarly, I can do an embed of a cell within an .ipynb. So, here, I say embed and reference the .ipynb. And similarly, use the ID or the label from the cell inside the .ipynb. And now, when I render, we're going to see that .ipynb show up, and it's going to be a figure that's numbered and has a caption. There's referencing it.
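The two embeds described here can be sketched roughly as follows. The `embed` shortcode is Quarto's; the file paths are assumed spellings of the notebooks named in the talk ("Explore Earthquakes" and "data screening"), and the `tbl-example` and `fig-example` labels are hypothetical stand-ins for whatever labels the source cells actually carry.

```markdown
{{< embed notebooks/explore-earthquakes.qmd#tbl-example >}}

{{< embed notebooks/data-screening.ipynb#fig-example >}}

See @tbl-example for the table and @fig-example for the visualization.
```

The part after `#` has to match the label of a cell in the source notebook. In the main document, the embedded output is then numbered and can be cross-referenced like any other table or figure.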
And so, really, for working on kind of more sophisticated and non-trivial manuscripts, being able to compose tables and figures and computations from lots of different sources is really quite useful. And then, also, preserving them through the publishing process. So, there you can see that's the visualization from the Jupyter notebook. And if I'm reading the article, and I say, well, how is that visualization created? Or there, how is this table created? You can see the table was created using this Explore Earthquakes. And then, how is this visualization created? And I click the data screening, and I see the Python code that was used to create the visualization.
So, that's a pretty important idea in sort of having more flexible and composable and cross-language workflows for creating these manuscripts, while also still preserving reproducibility.
Getting started and resources
So, if you want to try this out, we've got pretty extensive documentation for it. I'll give you the link to that on the next slide. But just important to know that this requires the RStudio IDE daily build, and requires the Quarto pre-release. And those are the links for those. I'm going to provide the link to the slides as well.
We also have, I showed all RStudio today, but we have tooling for VS Code. And even if you are an R user who uses VS Code, I think it integrates with the R extension for VS Code. That's actually pretty good. So, that's worth trying. But you want to get the Quarto extension for VS Code. If you're using JupyterLab, again, you want to get the Quarto extension for JupyterLab.
And, you know, this is a new feature. It's obviously going to require lots of feedback and iteration. So, if you're using it, playing with it, trying to make it work for your use cases, we would absolutely love to hear from you. The slides are here. And then, if you just want to learn more, the docs really are sort of tutorial style. So, they'll take you step by step from nothing to kind of a working manuscript. And if you want to learn more about the Notebooks Now project that I mentioned earlier, that's the website for Notebooks Now. So, thank you very much.
Q&A
Our next session starts at 10. We have time for a number of questions. Go ahead, Rafa.
In my experience, writing papers, when it's us, it's fine. So, when we're collaborating...
And perhaps more problematic. Illustrator, in particular, is a problem because the journals use it too. So, you send a plot that you completely can reproduce and they send it back with changes that...
So, yeah. So, there are two separate questions. One, I guess, would be, really, we do want these plots to be recomputable from code. And as soon as you start using Illustrator, you really undermine that. One question is, can we create sufficiently production-quality plots using the plotting libraries that we all know and love? We're definitely going to be investing in really high-quality SVG plots, which will actually print better. But I think if journals start to value reproducibility like this, they're going to have to accept that that needs to have integrity through the pipeline. And whatever rendered plot I give you, you're going to have to publish. I don't think there's a good solution for Illustrator and then back, and then they re-edit it with Illustrator. So, there might have to be some compromise, both on the side of the creator, making more production-quality, higher production-quality images, and then on the side of publisher.
Now, the first point you made is more interesting and problematic, which is your collaborators don't use this toolchain. You're not going to give them... Even RStudio with the visual editor is going to be way too much for them. So, what we want to do there is we want to provide a way of sharing a manuscript that gives your collaborator a Google Docs-type experience where they can make comments, they can actually edit, there's real-time shared cursors, there's track changes, there's all those kinds of things, and that would all sync back to your QMD or ipynb source. That's coming. That is something we are going to be working on. It's very important. It's just, you know, lots of other things to get working, but we view that as indispensable, and without that, this just fundamentally caps out at X percent, whatever it is, 5, 10, 15, 20. Some percentage of people can handle this kind of a toolchain, and some just are never going to be bothered with it. So, we understand that. We have to do that.
Excellent. Thank you. We do have an online question. Can you ask JJ about where to host and create these notebooks? Many would want to host internally, since they can develop. What are the options?
So, what you're creating is basically just a directory full of HTML files. So, you'll see when you create a manuscript, there's a _manuscript directory, and there's just a bunch of HTML files in there. So, the first answer is anywhere where you can just put a bunch of HTML files up will be able to host these. There's no runtime. Quarto publish has tooling to make it easier to publish things. So, it supports GitHub Pages. It supports Posit Connect. It supports Netlify. It supports a little service we have called Quarto Pub. It actually supports Confluence. And so, there are a bunch of options. But really, if you have a way to take web content and put it on a server, then that works too.
So, you already answered half of the question. But on scientific publishing, especially in life sciences, the least common denominator is always the non-technical collaborators. Until we have this tool that you're working on, that the non-technical collaborators can use at the same time, we'll never... actually, there will always be a...
Absolutely. I totally agree. I viewed like when we started this, we wanted to do that, like, I mean, the first idea was like, let's just do that for R Markdown. And then we said, well, actually, that's just gonna be capped again. So, we just did it for R Markdown. And now, it's only the R community that can use it. And so, we kind of had to back up and say, okay, well, let's make this cross-language and let's make it handle all these requirements. But that idea of making it friendly to the non-technical collaborator is like, it's fundamental to the project. So, we will do that.
So, in theory, I like the idea of running all the computations as you're rendering the paper. But in practice, all of my analyses are multiple CPU years on a cluster. So, how does that work?
Yep, yep. There's three answers to that. So, one I didn't talk about: there's a feature of Quarto called freeze, which basically says, serialize all the outputs of the computations, and then you can bring them back. So, you could re-render the document, and as long as the document hasn't changed, it's gonna bring those back. So, that allows you to sort of go across time and space and not have to re-render everything. So, freeze is one answer. And with freeze, you actually check those outputs into version control. They become part of the project.
The other answers are caching-oriented, and one cache is, a Jupyter notebook is a cache, and so that's one way to do it. I put it in a Jupyter notebook, I'm not gonna re-render it, and I can just reference it, as I showed in the last piece. And then there are, as you've probably experimented with, things like knitr cache and Jupyter Cache. So, there's a few different ways to do it, but it really has to do with just making sure that, at the time of that expensive render, the appropriate thing is serialized, so that it can always be brought back without requiring a re-render.
So, I'm going to have to do a big transition, since we have so many people. So, I want to take one more question. Please.
So, you just talked about Illustrator. I think one of the reasons why people use Illustrator is that we not only have figures produced by code, but we also have, like, microscopy images, and we use BioRender to make schematics and so on, and in R and ggplot, it's kind of a pain in the ass. So, will Quarto make that kind of thing easier? I know we have patchwork, but it's...
Yeah, I mean, you can bring in, if you have things that you've created, they can be brought into Quarto. And so, the real question is, like, what is the... If the process of changing that is just Illustrator, then, you know, it's still... Like, the problem that was mentioned before was, like, the process for creating the graph is code, and then a bunch of hand edits happen to it. If the process for creating the image is Illustrator, then you just keep re-editing it, and then bringing that edited image into Quarto. So, that shouldn't be a problem. I don't know if I'm missing some of what you're asking about.
I think that we're probably going to have to make a transition so that the next session can start on time. How long are you going to be around here, JJ?
I'll be around for... I get to be back at the office at three, I think, so...
Okay. So, please do bring your questions to him, and we'll try to get them out as quickly as possible. Thank you very much.

