JJ Allaire: Jupyter Notebooks + Quarto for customizable and reproducible documents, websites and

Transcript

This transcript was generated automatically and may contain errors.

Okay, welcome. It's my pleasure to introduce JJ Allaire, who's going to speak to us about publishing Jupyter Notebooks with Quarto.

Thank you. Thanks very much. It's really, really exciting to be here this week. I am going to talk about a new scientific and technical publishing system called Quarto that's based on Jupyter. And before I get into all the details of Quarto and how it works with Jupyter, I want to talk a little bit about the context of the project, our overall motivation and goals.

So many of you here have probably read this paper by Brian and Fernando about Jupyter and what has made it such an important part of the scientific computing ecosystem. Certainly interactive computing is a big part of it. But as you can tell from the title of the paper, a really fundamental part of it is Jupyter as a tool for helping you think and tell stories with code and data.

And a big part of using Jupyter is writing. And writing, when you're using a Jupyter Notebook, kind of helps you think about what you're going to do. Oftentimes, by writing about the code you're about to execute, you maybe think differently about the code and write it a different way. And similarly, when you present data or visualizations or metrics in a notebook, writing about it helps the reader understand the subtleties and context of the data better.

And many of you may have also seen Edward Tufte's pamphlet, which is sort of a takedown of the reductive style of presenting data with PowerPoint. And interestingly, I think he'd be quite pleased to hear the previous presentation, because one of his big examples in this is about how NASA approved the ill-fated Space Shuttle launch based on a PowerPoint presentation that was quite reductive and needed to articulate quite a bit more in terms of narrative and assumptions and subtleties. And I think he'd be pretty encouraged to see that NASA and JPL are making extensive use of notebooks for discussing these kinds of technical things.

But really, metrics and visualizations don't tell the whole story. There are assumptions, there are constraints, there's the question of where we got the data from, and those are critical. And I think this leads to the idea that we need tools to help with telling stories from data.

And it turns out that the scientific community, the scientific tool building community, has been working on this for a long time, starting with TeX and literate programming, through all of the work that has been done on notebooks and various systems, to work on Markdown and Emacs Org Mode. There's been a lot of work. And I think in 2023, a lot of that work is really coming together to create the opportunity to make a very compelling platform for communicating and telling stories with data. So that sort of brings me to Quarto.

What is Quarto?

What is Quarto? It's a new open-source scientific and technical publishing system. It builds on standard Markdown, and its hallmark is that it has features that are essential for scientific and technical communication. It is new, but its roots actually go back over 10 years. We developed a system called R Markdown that was R-specific. That was about 10 years ago, and we evolved it quite a bit over the years, but really felt pretty bad about the fact that the system wasn't able to serve the entire scientific computing community. It was just R. And so we actually rewrote it and improved it quite a bit with the lessons learned, and that is what Quarto represents: a sort of multi-language, multi-engine re-articulation of the things we did in R Markdown.

The project is primarily developed and sponsored by Posit. You might not have heard of Posit. We used to be called RStudio, and we renamed the company Posit to reflect the fact that many of our open-source projects are now multi-language. The company has sponsored a lot of open-source projects over the years: RStudio itself, Tidyverse, Shiny. So Quarto is sort of in that spirit.

And this is a goal that I don't think needs really a lot of emphasis in this audience. We all want computational documents. I think the one thing we talk about a lot on the Quarto team is to help users fall into a pit of success. So we want to make it easier than not to work reproducibly. So give people lots of benefits in terms of the type of documents they can produce, and then have the entire pipeline of producing documents fundamentally reproducible.

Another goal we talk about is looking at the history of tools for writing and what some of the benefits and trade-offs are. If you look at Word, it's a very accessible tool. Lots of people open up Word and they know just what to do with it, but it actually scales very poorly with document complexity. You take a tool like LaTeX, which is considerably harder to use at first, but once you learn how to use it, it absorbs complexity very well over time. And I think our goal with Quarto is to take Markdown, which is a base that's simpler and easier to start with, and evolve it to sort of give it the accessibility of Word, but also the scalability of systems like LaTeX.

And then finally, single-source publishing. And again, in 2023, when we write content, it's not just, hey, I wrote a Word doc, here it is. It's, I need to publish it on the web. I want it to look good on the mobile web. I may need print for scientific publications. I might be creating an office document. A lot of the content we create today goes into content management systems. So basically, I want tools that help me publish my analysis and data and notebooks that support a wide range of output formats.
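As a rough illustration of single-source publishing, the sketch below builds one `quarto render` command per output format for the same notebook. The notebook name `analysis.ipynb`, the chosen formats, and the helper names `render_commands` and `render_all` are assumptions made for this example and are not from the talk.

```python
import shutil
import subprocess


def render_commands(source, formats):
    """Build one `quarto render` command line per requested output format."""
    return [["quarto", "render", source, "--to", fmt] for fmt in formats]


def render_all(source, formats):
    """Run each render command in turn; needs the quarto CLI on PATH."""
    if shutil.which("quarto") is None:
        raise RuntimeError("quarto CLI not found on PATH")
    for command in render_commands(source, formats):
        subprocess.run(command, check=True)


# Hypothetical notebook name; the same source feeds every format.
commands = render_commands("analysis.ipynb", ["html", "pdf", "docx"])
for command in commands:
    print(" ".join(command))
```

Calling `render_all("analysis.ipynb", ["html", "pdf", "docx"])` would run the commands for real, assuming Quarto is installed and the notebook exists.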

So many times, users who want a feature do not need to wait for us to implement the feature. You can just implement shortcodes and filters to get the functionality you want.

There are dozens of them you can find on our site, but here are some examples: lightbox treatments for images in HTML, a shortcode for embedding chemistry visualizations, QR codes, more tailored options for how code is displayed. So filters are a really, really powerful way to extend the system. There are a couple of ways to write filters in Python: pandocfilters, which is from the creator of Pandoc, and Panflute, which is what I just showed and is a little more modern. For Lua filters, there's an embedded Lua interpreter in Pandoc which you can use for zero dependencies, and you can also write them in any language with JSON.
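To make the "any language with JSON" route concrete, here is a minimal sketch of a JSON filter using only the Python standard library. It is not the Panflute example shown in the talk. The `shout` action (upper-casing text) and the sample document are hypothetical and exist only for illustration; the sketch assumes Pandoc's JSON representation, in which each element is an object with a type tag `t` and optional contents `c`.

```python
import json
import sys


def walk(node, action):
    """Recursively apply `action` to every element of a Pandoc JSON AST."""
    if isinstance(node, list):
        return [walk(item, action) for item in node]
    if isinstance(node, dict):
        if "t" in node:  # an AST element such as {"t": "Str", "c": "text"}
            replaced = action(node)
            if replaced is not None:
                return replaced
        return {key: walk(value, action) for key, value in node.items()}
    return node  # plain strings and numbers pass through untouched


def shout(element):
    """Hypothetical action: upper-case the text of every Str element."""
    if element["t"] == "Str":
        return {"t": "Str", "c": element["c"].upper()}
    return None  # None means "leave this element as it is"


def apply_filter(document):
    """Return a copy of the document with `shout` applied to its blocks."""
    result = dict(document)
    result["blocks"] = walk(document["blocks"], shout)
    return result


def run_as_filter():
    """Read a document from stdin and write the filtered one to stdout."""
    json.dump(apply_filter(json.load(sys.stdin)), sys.stdout)


# Hand-written sample document: one paragraph reading "hello world".
sample = {
    "pandoc-api-version": [1, 23, 1],
    "meta": {},
    "blocks": [
        {"t": "Para", "c": [
            {"t": "Str", "c": "hello"},
            {"t": "Space"},
            {"t": "Str", "c": "world"},
        ]}
    ],
}
filtered = apply_filter(sample)
print(json.dumps(filtered["blocks"]))
```

To use this as an actual filter, the script would call `run_as_filter()` instead of processing the sample, since Pandoc passes the document to a JSON filter on stdin and reads the result from stdout. The Lua route avoids the external interpreter entirely, which is the zero-dependency advantage mentioned above.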