
JJ Allaire: Jupyter Notebooks + Quarto for customizable and reproducible documents, websites and
To share our results and communicate effectively in data science, we need to weave together narrative text and code to produce elegantly formatted, interactive output. Not only does it need to look great, but it needs to be reproducible, accessible, easily editable, diffable, version controlled, and output in a variety of formats, such as PDF, HTML, and MS Word, for convenience and often compliance. Jupyter has already made so much of this possible. By combining Jupyter with Quarto, an open-source publishing platform built on Pandoc, you can easily create the output format and the styling that you need for any situation. With Quarto, you can author documents as plain text markdown or Jupyter notebooks with scientific markdown, including equations (LaTeX support!), citations, cross references, figure panels, callouts, advanced layouts, and more. You can also engage readers by adding interactive data exploration to your documents using Jupyter Widgets, htmlwidgets for R, Observable JS, and Shiny. In this talk, we’ll discuss authoring these dynamic, computational documents with Quarto and Python that bring code, output, and prose together, leveraging integrations with both Jupyter and the Quarto VS Code extension. Whether you’re new to Jupyter or have thousands of notebooks already, we’ll walk you through using a single source document to target multiple formats, transforming a simple document into a presentation, a scientific manuscript, a website, a blog, and a book in a variety of formats including HTML, PDF, and MS Word. We’ll also show how you can change themes and styling, and publish these artifacts directly from the command line to the web, so they’re immediately available online.
Transcript
This transcript was generated automatically and may contain errors.
Okay, welcome. It's my pleasure to introduce JJ Allaire, who's going to speak to us about publishing Jupyter Notebooks with Quarto.
Thank you. Thanks very much. It's really exciting to be here this week. I'm going to talk about a new scientific and technical publishing system called Quarto that's based on Jupyter. Before I get into the details of Quarto and how it works with Jupyter, I want to talk a little about the context of the project: our overall motivation and goals.
Many of you here have probably read this paper by Brian Granger and Fernando Pérez about Jupyter and what has made it such an important part of the scientific computing ecosystem. Interactive computing is certainly a big part of it. But as you can tell from the title of the paper, a really fundamental part of it is Jupyter as a tool for helping you think and tell stories with code and data.
A big part of using Jupyter is writing, and writing in a notebook helps you think about what you're going to do. Often, by writing about the code you're about to execute, you think about it differently and write it a different way. Similarly, when you present data, visualizations, or metrics in a notebook, writing about them helps the reader better understand the subtleties and context of the data.
Many of you may also have seen Edward Tufte's pamphlet, The Cognitive Style of PowerPoint, which is a takedown of the reductive style of presenting data with PowerPoint. Interestingly, I think he'd be quite pleased by the previous presentation, because one of his big examples is how NASA assessed the risk to the ill-fated Space Shuttle Columbia based on a PowerPoint presentation that was quite reductive and needed to articulate much more in terms of narrative, assumptions, and subtleties. I think he'd be encouraged to see that NASA and JPL are making extensive use of notebooks for discussing these kinds of technical issues.
But metrics and visualizations don't tell the whole story. Assumptions, constraints, and where the data came from are all critical. That leads to the idea that we need tools that help us tell stories with data.
It turns out that the scientific tool-building community has been working on this for a long time: starting with TeX and literate programming, through all the work on notebooks and related systems, to Markdown and Emacs Org Mode. In 2023, a lot of that work is coming together, creating the opportunity for a very compelling platform for communicating and telling stories with data. That brings me to Quarto.
What is Quarto?
What is Quarto? It's a new open-source scientific and technical publishing system. It builds on standard Markdown, and its hallmark is features that are essential for scientific and technical communication. It is new, but its roots go back over 10 years. About 10 years ago we developed a system called R Markdown that was R-specific, and we evolved it quite a bit over the years, but we felt bad that it couldn't serve the entire scientific computing community; it was just R. So we rewrote it and improved it quite a bit with the lessons learned. That's what Quarto represents: a multi-language, multi-engine re-articulation of the things we did in R Markdown.
The project is primarily developed and sponsored by Posit. You might not have heard of Posit: we used to be called RStudio, and we renamed the company Posit to reflect the fact that many of our open-source projects are now multi-language. The company has sponsored a lot of open-source projects over the years, including RStudio itself, the Tidyverse, and Shiny, and Quarto is in that spirit.
This is a goal that I don't think needs a lot of emphasis in this audience: we all want computational documents. One thing we talk about a lot on the Quarto team is helping users fall into a pit of success. We want to make it easier to work reproducibly than not: give people lots of benefits in terms of the documents they can produce, and make the entire pipeline for producing those documents fundamentally reproducible.
Another goal we talk about is looking at the history of tools for writing and what some of the benefits and trade-offs are. If you look at Word, it's a very accessible tool. Lots of people open up Word and they know just what to do with it, but it actually scales very poorly with document complexity. You take a tool like LaTeX, which is considerably harder to use at first, but once you learn how to use it, it absorbs complexity very well over time. And I think our goal with Quarto is to take Markdown, which is a base that's simpler and easier to start with, and evolve it to sort of give it the accessibility of Word, but also the scalability of systems like LaTeX.
Finally, single-source publishing. In 2023, when we write content, it's not just "hey, I wrote a Word doc, here it is." I need to publish it on the web, and I want it to look good on the mobile web. I may need print for scientific publications. I might be creating an Office document. A lot of the content we create today goes into content management systems. So I want tools that help me publish my analyses, data, and notebooks to a wide range of output formats.
Quarto and Jupyter: the basics
So I've described the fundamental requirements for tools for computational narratives: interoperating with Jupyter, handling technical content (code, citations, cross-references), supporting lots of output formats, and being extensible. There have been tools like this for Jupyter for a long time; nbconvert is the obvious one, which has been around forever and does all these things. But in recent years some new systems have been developed: Jupyter Book, MyST-JS, and Quarto. Their main point of departure is that they focus more on giving you production-quality output out of the box, on supporting citations and cross-references, and on more complex project types like websites, blogs, and books. A lot of the fundamentals have been in nbconvert, nbviewer, and similar tools for a long time, but these new tools go quite a bit beyond them. I'm going to talk about Quarto, but the other tools are also very much worth a look.
So what are the basics of Quarto? How does it work? Fundamentally, we start with Jupyter and Markdown, either written by hand or produced by computations in Jupyter. That goes into Pandoc, a universal document converter that I'll say more about a few slides on. Pandoc can produce output in lots and lots of formats, and that's the basic mechanic of how Quarto works.
Here's the most straightforward example, one you've seen before from nbconvert or nbviewer. There's nothing special here: it takes a notebook that has a table and a plot and produces a web page from it.
But Quarto has dozens of options for tailoring the output you produce. As an example, here I've added some options to the notebook's front matter: I want a different theme, a different syntax-highlighting style, and I want to let readers comment on the notebook using Hypothesis. When I render this document, you can see the theme is different, the highlighting is different, and there's a bar on the right that lets readers comment.
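For a notebook, front matter like this lives in a raw cell at the top of the .ipynb. As a sketch of where those options go, the Python below builds a minimal notebook using only the standard library. The option names (`theme`, `highlight-style`, and `comments` with `hypothesis`) are Quarto HTML options, but the values (`cosmo`, `github`), the title, and the helper name `notebook_with_front_matter` are our own illustrative choices, not what the slide showed.

```python
import json

# Document-level options in Quarto's YAML front matter. The option names are
# Quarto's; the title, theme, and highlight-style values are illustrative.
FRONT_MATTER = """---
title: "Penguins"
format:
  html:
    theme: cosmo
    highlight-style: github
comments:
  hypothesis: true
---"""


def notebook_with_front_matter(front_matter, code_sources):
    """Build an nbformat 4 notebook whose first cell is a raw front-matter cell."""
    cells = [{"cell_type": "raw", "metadata": {}, "source": front_matter}]
    for source in code_sources:
        cells.append({
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": source,
        })
    return {
        "cells": cells,
        "metadata": {
            "kernelspec": {"name": "python3", "display_name": "Python 3", "language": "python"}
        },
        "nbformat": 4,
        "nbformat_minor": 4,  # minor version 4 does not require per-cell ids
    }


nb = notebook_with_front_matter(FRONT_MATTER, ["print('hello')"])
text = json.dumps(nb, indent=1)  # write this to a file such as penguins.ipynb
```

Saving `text` as `penguins.ipynb` and running `quarto render penguins.ipynb` should then apply the theme and turn on Hypothesis comments.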
Those are document-level options. There are also cell-level options. Here I'm again showing some document-level options, these having to do with how code is displayed to the reader; I'll demo that in a minute. At the cell level, I'm providing a caption for my figure and asking for it to be a numbered, cross-referenceable figure, by adding two options with the #| option comment syntax. When I render this, you can see the code is indeed hidden, the caption is there and numbered, and the figure is cross-referenceable. There's a Code menu in the top right, and when I open it, I can show and hide all the code as a reader. Those are just examples; there are many options for customizing the HTML output you get.
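Cell options are written as `#|` comment lines at the top of a code cell. A label beginning with `fig-` is what makes a figure numbered and cross-referenceable, so the prose can cite it as `@fig-polar`; on the document side, options such as `code-fold: true` and `code-tools: true` (our assumption about what the slide used) fold the code and add the Code menu. The hypothetical helper below only illustrates the syntax by peeling those option lines off a cell's source. Quarto itself parses them as YAML, so treat this as a simplified sketch.

```python
# A code cell as it might appear in a notebook, with option comments on top.
# The label and caption are illustrative.
CELL_SOURCE = '''#| label: fig-polar
#| fig-cap: "A line plot on a polar axis"
import numpy as np
r = np.arange(0, 2, 0.01)'''


def split_cell_options(source):
    """Separate leading '#|' option lines from the code that follows.

    Simplified: each option is read as a flat 'key: value' pair with
    surrounding double quotes stripped; Quarto reads these lines as YAML.
    """
    lines = source.splitlines()
    options = {}
    i = 0
    while i < len(lines) and lines[i].startswith("#|"):
        key, _, value = lines[i][2:].partition(":")
        options[key.strip()] = value.strip().strip('"')
        i += 1
    return options, "\n".join(lines[i:])


options, code = split_cell_options(CELL_SOURCE)
```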
Output formats
We mentioned lots of formats; there are actually dozens available. Here's another example, this time producing a Word document (docx), which is just a matter of changing the format. I didn't highlight it, but there are two figures here, two plots, and we've specified that we want them laid out in two columns. When we render this, we see the code and then the plots laid out in two columns.
Another example here is a PDF, and in this case I'm bringing in a bibliography. It turns out that the optimal reading width is considerably narrower than an 8.5 by 11 inch page, which is why you see so much two-column output in journals. So PDFs often don't use the full width of the page for body text, and web pages often don't either. Here I'm going to put the citations in the margin, and I'm also going to specify that my two figures should extend beyond the width of the main text. When I render the PDF, you can see the margin content: the citation and the caption show up in the margin, and my plots and code use the full width, beyond the width of the text.
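A sketch of the PDF configuration described above, assembled as text in Python. `bibliography`, `citation-location: margin`, and `cap-location: margin` are Quarto options; the `column: page-right` cell option is our guess at how the figures were widened, and the title and `references.bib` file name are hypothetical. Quarto's article-layout documentation lists which column values a PDF supports.

```python
def pdf_front_matter(title, bibliography):
    """Return YAML front matter for a PDF with citations and captions in the margin."""
    return "\n".join([
        "---",
        f'title: "{title}"',
        "format: pdf",
        f"bibliography: {bibliography}",  # a BibTeX file next to the notebook
        "citation-location: margin",      # citations render in the margin
        "cap-location: margin",           # figure captions render in the margin
        "---",
    ])


# Cell options for the figure cell: two plots side by side, and (our assumption)
# a column setting that lets the figures extend past the body text into the margin.
FIGURE_CELL_OPTIONS = "\n".join([
    "#| layout-ncol: 2",
    "#| column: page-right",
])

header = pdf_front_matter("Penguin measurements", "references.bib")  # hypothetical values
```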
These have been documents. You can also create presentations with Quarto using Reveal.js. This slide deck here was actually created with Quarto. And here I'm just saying I want my format to be Reveal.js, and I'd like a logo in the bottom right and slide numbers in the top right. And then you can see a slide from that where it shows the figures along with the logo and the numbering.
You'll notice I didn't show the code here: in presentations, code is hidden by default, but you can show it. Here I'm adding a cell option that says to show the code, because in this example I might be teaching someone how to build a plot with Seaborn, and I want to highlight specific lines as I narrate how the plot was constructed. You can see it shows the code and the plot, with line four highlighted, and when I advance the slide, line five is highlighted. This is a good way of explaining code, or how things are done, in a slide.
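On the slide itself, the front matter would use `format: revealjs` with `logo` and `slide-number`, and the code cell would carry `#| echo: true` plus a highlight spec along the lines of `#| code-line-numbers: "4|5"` (the exact values are assumptions). Each `|`-separated chunk of that spec is one highlight step. The hypothetical helper below expands such a spec into per-step line lists, just to make the stepping explicit; it is not part of Quarto or reveal.js.

```python
def highlight_steps(spec):
    """Expand a highlight spec such as "4|5" or "1-3,7|9" into per-step line lists.

    Each '|'-separated chunk is one step; an empty chunk is a step with
    no particular lines highlighted.
    """
    steps = []
    for chunk in spec.split("|"):
        lines = []
        for part in filter(None, (p.strip() for p in chunk.split(","))):
            if "-" in part:
                start, end = (int(n) for n in part.split("-"))
                lines.extend(range(start, end + 1))  # inclusive line range
            else:
                lines.append(int(part))
        steps.append(lines)
    return steps
```

`highlight_steps("4|5")` gives one step for line 4 and one for line 5, matching the narration above; a leading `|` adds an initial step with nothing in particular highlighted.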
Projects: websites, blogs, and books
I've focused mostly on individual documents, but Quarto has a project system that lets you aggregate lots of documents together. This is a simple Quarto project file that says I want to build a website, with some options for the website and how its navigation works, plus some options shared by all of the HTML documents in the website. That's the basic mechanic of a simple project. In the real world, there's more configuration. Here's an example: the website for the fast.ai Deep Learning for Coders course. There are options related to social sharing, options for links back to the repo used to produce the website, and customized navigation. You can see this is now a website where all of these documents on the left are separate documents brought together, with navigation, into a website.
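The project file is `_quarto.yml`. As a sketch, the Python below builds a simple website configuration as a dict and emits it as YAML text with only the standard library; the small `to_yaml` emitter handles just nested dicts, lists of strings, booleans, and plain scalars. `project: type: website`, `sidebar: contents`, `repo-url`, `repo-actions`, and the `format: html` options are Quarto keys; the title, page names, and repository URL are hypothetical.

```python
def to_yaml(mapping, indent=0):
    """Emit nested dicts, lists of strings, bools, and scalars as YAML lines."""
    pad = "  " * indent
    lines = []
    for key, value in mapping.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.extend(to_yaml(value, indent + 1))
        elif isinstance(value, list):
            lines.append(f"{pad}{key}:")
            lines.extend(f"{pad}  - {item}" for item in value)
        elif isinstance(value, bool):  # checked before plain scalars
            lines.append(f"{pad}{key}: {str(value).lower()}")
        else:
            lines.append(f"{pad}{key}: {value}")
    return lines


def website_config(title, pages, repo_url):
    """Build a _quarto.yml-style dict for a simple website with a sidebar."""
    return {
        "project": {"type": "website"},
        "website": {
            "title": title,
            "repo-url": repo_url,
            "repo-actions": ["issue"],
            "sidebar": {"contents": pages},
        },
        # Options shared by every HTML page in the site.
        "format": {"html": {"theme": "cosmo", "toc": True}},
    }


config = website_config(
    "My Notebooks", ["index.qmd", "analysis.ipynb"], "https://github.com/example/notebooks"
)
text = "\n".join(to_yaml(config)) + "\n"  # contents for _quarto.yml
```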
That's a website. Blogs are a little different: their navigation is different, and we have some features that support that. Here you can see similar options: some Git-oriented options, some social-sharing options, and some navigation options. The home page provides a tiled display of posts and categories. So that's another type of aggregate project you might see. Books are another one: here I've got a cover image and an explicit list of chapters. When the book is rendered, you can see the chapters are numbered and the subsections are numbered, and cross-references work across chapters and use the section numbering.
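For a book, `_quarto.yml` uses `project: type: book`, with `cover-image` and an explicit `chapters` list under `book:`. Cross-references in a rendered book are numbered per chapter, so the second figure in the third chapter reads as Figure 3.2. The hypothetical function below sketches that numbering scheme. It assumes every chapter in the list is numbered, whereas a real book usually starts with an unnumbered `index.qmd` preface, so it is an illustration rather than Quarto's implementation.

```python
def number_figures(chapters):
    """Map each figure label to a chapter-qualified number such as 'Figure 2.1'.

    `chapters` is a list of (chapter_file, [figure labels in order]) pairs,
    in the same order as the book's chapter list.
    """
    numbers = {}
    for chapter_index, (_file, labels) in enumerate(chapters, start=1):
        for figure_index, label in enumerate(labels, start=1):
            numbers[label] = f"Figure {chapter_index}.{figure_index}"
    return numbers


# Hypothetical chapters and figure labels.
book = [
    ("intro.qmd", ["fig-overview"]),
    ("penguins.ipynb", ["fig-bill-length", "fig-flipper-length"]),
]
refs = number_figures(book)
```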
So books are, again, slightly different from websites. They are a website, but they have more structure built in. This is a web version of a book, but when you write a book with Quarto, it supports five different formats. It's very useful to have a web version, but you often also want a print version, perhaps a Word version, and an EPUB version for mobile readers. And AsciiDoc is useful for publishing: Manning and O'Reilly both use AsciiDoc as their underlying format for publishing books.
Pandoc as a foundation
OK, so I've covered some of the obvious output formats you'd expect to see. But Quarto is built on Pandoc, as I said, which has been around for quite a long time, almost 20 years. It was created by John MacFarlane, a philosophy professor at Berkeley. He's also been very active in the standards community around Markdown: he was the main author of the CommonMark spec and its reference implementations. So you can think of Pandoc as CommonMark plus many extensions for technical writing, citations being foremost among them.
But there are lots of other extensions. The reason we love Pandoc so much as a foundation is that it supports dozens of output formats, and the system is extensible: you can create custom writers for new formats, and you can do lots of filtering of documents. So it's a great foundation for building a publishing system. Beyond the obvious things like HTML, PDF, and Word, it also supports OpenOffice, different presentation types, and lots of different Markdown flavors, which becomes important for content management systems and wikis, plus some other formats you may or may not have been exposed to but which are quite important in various communities. It's a really powerful and flexible foundation for creating lots of different types of output.
And we've used that. We've created some custom formats for popular content management systems. Hugo, which you may have heard of, is for building websites; it uses its own flavor of Markdown, Goldmark, and we're able to produce Goldmark Markdown from Quarto. Docusaurus, a documentation publishing system, uses MDX, a whole different Markdown flavor. Confluence doesn't use Markdown at all, but a custom XML format. And O'Reilly uses AsciiDoc. So we can take the same notebook we started with at the beginning and produce Goldmark Markdown from it, and now that notebook is available inside a Hugo website as a blog post. Similarly, with Docusaurus, we take the same notebook, tell Quarto to render MDX Markdown, and we can see that notebook inside our Docusaurus site.
Confluence, similarly: same notebook. Confluence XML is sort of HTML plus plus plus; it has lots of particularities. But we were able to create a custom writer so that your notebook also shows up inside a Confluence site. And finally, I referenced O'Reilly. We have a bunch of people writing O'Reilly books with Quarto. You produce AsciiDoc to O'Reilly's spec, and they use that to create their own print, EPUB, and web versions of the book. So using Pandoc as a foundation lets us target this huge range of output formats, and even lets you create your own writers to target new formats.
Another exciting initiative: we're working with people in the open science community, some scientific publishers, and the folks behind Jupyter Book and MyST-JS to create a standard for including notebooks in scientific publications. We're really excited about that, and there will probably be more details in the next couple of months, so keep an eye out for it.
Workflow and plain text notebooks
So what's the basic workflow of Quarto? There are two main verbs. One is render: take a notebook and create a document, either in the default format or, as in the second example here, as docx. The other big verb is preview: render the notebook, but also give me a live preview of it, so that as I save and update the notebook, the preview updates and I can see what it looks like. There's also a plain-text variation of notebooks that you can use, which I'll describe in a minute.
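On the command line, the two verbs look like `quarto render notebook.ipynb`, `quarto render notebook.ipynb --to docx`, and `quarto preview notebook.ipynb`. If you drive Quarto from a Python script, a thin wrapper like the one below is enough. The helper names and `notebook.ipynb` are hypothetical, and the commands run only when the `quarto` executable is on your PATH.

```python
import shutil
import subprocess


def quarto_command(verb, path, to=None):
    """Build the argument list for `quarto render` or `quarto preview`."""
    if verb not in ("render", "preview"):
        raise ValueError(f"unsupported verb: {verb!r}")
    argv = ["quarto", verb, path]
    if to is not None:
        argv += ["--to", to]  # e.g. "docx"; omit to use the document's default format
    return argv


def run_if_available(argv):
    """Run the command if the quarto CLI is on PATH; return its exit code, else None."""
    if shutil.which(argv[0]) is None:
        return None
    return subprocess.run(argv, check=False).returncode


render_default = quarto_command("render", "notebook.ipynb")
render_docx = quarto_command("render", "notebook.ipynb", to="docx")
live_preview = quarto_command("preview", "notebook.ipynb")
```

Calling `run_if_available(render_docx)` renders the Word version; keep in mind that `preview` keeps a local server running until you stop it.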
Here's a basic workflow: a Jupyter notebook running locally, and side by side with it the web page produced from the notebook. As you run cells and save, the preview automatically updates.
I mentioned plain-text notebooks, and there are actually quite a few formats for them. A project called Jupytext translates between ipynb and plain-text formats, and there are actually 10 different plain-text formats for notebooks: five .py script formats and five Markdown-based formats. Representing notebooks in plain text can be useful in some situations because you can use any standard text editor, and it works well with version control, although, as we saw, there's quite a bit of innovation going on in using notebooks with version control. Here you can see a Quarto plain-text notebook (.qmd): options in the front matter, a code cell, some Markdown, and another code cell with options inside it.
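The .qmd layout described above (front matter, Markdown, and fenced `{python}` code cells carrying `#|` options) can be made concrete with a small splitter. This is a simplified, hypothetical parser for illustration only: it assumes the front matter is the very first block and that each code cell opens with three backticks followed by `{language}`. Quarto's own parser handles far more, such as longer fences and cell attributes.

```python
import re

FENCE = "`" * 3  # built at runtime so the example can itself sit inside a code block
# Matches an opening fence line: three or more backticks, then {language}.
CODE_START = re.compile(r"^`{3,}\{(\w+)\}\s*$")


def split_qmd(text):
    """Split .qmd text into (kind, content) pairs: front-matter, markdown, or code.

    Simplified sketch: front matter must be the first block, and each code cell
    must end with a bare three-backtick fence line.
    """
    lines = text.splitlines()
    pieces, buffer, i = [], [], 0
    if lines and lines[0] == "---":
        end = lines.index("---", 1)  # raises ValueError if the front matter is unclosed
        pieces.append(("front-matter", "\n".join(lines[1:end])))
        i = end + 1
    while i < len(lines):
        if CODE_START.match(lines[i]):
            if any(line.strip() for line in buffer):
                pieces.append(("markdown", "\n".join(buffer).strip()))
            buffer = []
            i += 1
            code = []
            while i < len(lines) and lines[i] != FENCE:
                code.append(lines[i])
                i += 1
            pieces.append(("code", "\n".join(code)))
            i += 1  # step past the closing fence
        else:
            buffer.append(lines[i])
            i += 1
    if any(line.strip() for line in buffer):
        pieces.append(("markdown", "\n".join(buffer).strip()))
    return pieces


# A tiny hypothetical .qmd document.
SAMPLE = "\n".join([
    "---",
    "title: Penguins",
    "jupyter: python3",
    "---",
    "",
    "Some narrative text.",
    "",
    FENCE + "{python}",
    "#| label: fig-bills",
    "import math",
    FENCE,
    "",
])
```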
If you're using plain-text notebooks, there's a VS Code extension that provides lots of tools for editing them and also lets you do the things you'd do in a normal notebook, like running cells and individual lines of code. Different projects call for different tasks and tools, so you have the option of using either ipynb or QMD. We also have a JupyterLab extension that helps with authoring ipynb files.
Here's the rendering pipeline I showed before. For notebooks, we don't execute the notebook by default; we assume execution has been done inside JupyterLab. But you can re-execute everything if you want, for reproducibility or to reflect updated data. The plain-text workflow takes the QMD, turns it into a Jupyter notebook, executes it, and so on: the same process. Because re-executing the notebook on every render could be undesirably time-consuming, there are tools available for caching: we integrate with Jupyter Cache from the Executable Books project, and we also have an internal caching system in Quarto called freeze.
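In Quarto these are switched on through the `execute` options: `cache: true` uses Jupyter Cache, and `freeze: auto` re-executes a document only when its source changes. The sketch below illustrates only the idea behind `freeze: auto`, detecting a changed source by hashing it. The function names are hypothetical, and Quarto's actual freezer keeps computational results in a `_freeze` directory rather than an in-memory dictionary of digests.

```python
import hashlib


def source_digest(text):
    """Hash a document's source so a change can be detected cheaply."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def needs_execution(path, text, frozen):
    """True if `path` was never executed or its source changed since the last run."""
    return frozen.get(path) != source_digest(text)


def record_execution(path, text, frozen):
    """Remember the digest of the source that was just executed."""
    frozen[path] = source_digest(text)


frozen = {}  # hypothetical in-memory stand-in for persisted results
first = needs_execution("post.qmd", "x = 1", frozen)   # never executed
record_execution("post.qmd", "x = 1", frozen)
second = needs_execution("post.qmd", "x = 1", frozen)  # unchanged source
third = needs_execution("post.qmd", "x = 2", frozen)   # edited source
```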
Extending Quarto
There are quite a few ways to extend Quarto. We have an extension system that encompasses filters and shortcodes, which are like creating new Markdown syntax and macros for generating content. You can write your own formats: people have written probably 15 or 20 formats for academic journals that produce the LaTeX those journals require. And there are ways to create custom project types.
I'll talk briefly about filters. The idea behind a filter is that it transforms the document before final rendering, and you can implement lots of things with filters: content generation, content filtering, format-specific processing. Here's an example of a filter using the panflute library. I'm rendering a notebook and saying, please run this filter when the notebook is rendered. This is the whole source code for the filter. Most of it is scaffolding; it's a trivially simple filter that increases every header's level by one, so an H2 becomes an H3, and so on. Simple as that is, filters can do a lot, and many of Quarto's features are implemented as filters: cross-references and citations, embedded languages, macro substitution, image conversion. We have dozens of filters built into Quarto, but you can also write your own. So users who want a feature often don't need to wait for us to implement it: you can implement shortcodes and filters to get the functionality you want.
There are dozens of them on our site; here are some examples: lightbox treatments for images in HTML, a shortcode for embedding chemistry visualizations, QR codes, and more tailored options for how code is displayed. So filters are a really powerful way to extend the system. There are a couple of ways to write filters in Python: pandocfilters, from the creator of Pandoc, and panflute, which is what I just showed and is a little more modern. There are also Lua filters: Pandoc has an embedded Lua interpreter, so they have zero dependencies. And you can write filters in any language using Pandoc's JSON representation of the document.
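The header-bump filter described earlier can also be written without panflute, as a plain JSON filter: Pandoc serializes the document as JSON, pipes it to the filter's standard input, reads the transformed JSON back from standard output, and passes the target format as the first argument. The sketch below uses only the standard library; a Header node in Pandoc's JSON AST has the form `{"t": "Header", "c": [level, attr, inlines]}`. This version caps the level at 6, since there is no H7. How Quarto locates and runs a non-Lua filter listed under `filters:` is worth confirming in the Quarto documentation.

```python
import json
import sys


def bump_headers(node):
    """Recursively walk a Pandoc JSON AST, demoting each Header one level (max 6)."""
    if isinstance(node, dict):
        if node.get("t") == "Header":
            level, attr, inlines = node["c"]
            node["c"] = [min(level + 1, 6), attr, bump_headers(inlines)]
            return node
        return {key: bump_headers(value) for key, value in node.items()}
    if isinstance(node, list):
        return [bump_headers(item) for item in node]
    return node


def main():
    doc = json.load(sys.stdin)  # the whole document, as emitted by `pandoc -t json`
    doc["blocks"] = bump_headers(doc["blocks"])
    json.dump(doc, sys.stdout)


# Pandoc passes the output format as argv[1]; the guard keeps the module
# importable without waiting on standard input.
if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```

Standalone, it can be exercised with a pipeline such as `pandoc input.md -t json | python bump_headers.py html | pandoc -f json -t html`.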
I've been talking about extending Quarto from within, but you can also embed Quarto in other workflows. I believe later today Hamel is giving a talk about nbdev, a system for developing Python packages entirely in Jupyter notebooks; nbdev version 2 uses Quarto to produce documentation websites from nbdev projects. Since Quarto is a command-line tool, there are lots of ways to embed it in other workflows and customize it. That's all I have.
I'd love to take questions. Posit has a booth just out here, so if we don't get to your question now, come by: I'll be there, and the rest of the team will be too. We'll be happy to answer questions this afternoon.
Q&A
Thank you very much for the presentation. I have two questions. First, is it possible to have interactivity in Quarto? For example, can I interact with the data and re-render the website, something like that? Second, is it possible to use Quarto as a rendering engine on GitHub? For example, if I have a repository of notebooks and open a notebook on GitHub, I'd like to see the output from Quarto. Is that possible?
Okay, so on interactivity, there are a bunch of different ways to add it. You can use Jupyter widgets. You can use Observable JavaScript. You can use Pyodide and Wasm to embed interactive Python applications inside Quarto documents. Quarto output is deployed as a client-side asset, so the interactivity is all in JavaScript, using Wasm and Pyodide, but there are lots of different ways to do it. We have a whole section on interactivity in our documentation that talks about all the different options and how they work.
And on GitHub, we have a quarto publish command that supports GitHub Pages, so it's very easy to say, I have notebooks in a repository and I want them published. The GitHub notebook viewer is controlled by GitHub, so we're not able to change it. Maybe there's an extension API we could use to override the default display, but I don't know if that's hookable. Again, GitHub can host the published notebooks using GitHub Pages.
Thanks for the nice talk. Is Quarto going to replace some of the R packages like blogdown and bookdown? Yes. Quarto is the next generation. We built a lot of things within R Markdown for books, blogs, and websites, and this is our re-imagining of all that, working with R, Python, and Julia. So it really is a replacement for those things, and the future evolution of all those capabilities will be in Quarto, not in those packages.
