[84] Reproducible Publications with Python and Quarto (Thomas Mock)

Transcript

This transcript was generated automatically and may contain errors.

Hey everybody, welcome to Data Umbrella's webinar. Unfortunately, I'm unable to share my video today because it's been over 85 degrees Fahrenheit in New York, and that sometimes affects my internet access, so no video today, sorry about that. I'm going to do a brief introduction and then Thomas is going to do his presentation. You can ask questions at any time during the talk, and Thomas will answer them at the end. This is being recorded and will be posted on the Data Umbrella YouTube channel.

If you're unfamiliar with Data Umbrella, we are a non-profit community for underrepresented persons in data science. This is our team who makes it all happen behind the scenes. We're dedicated to providing a harassment-free experience for everyone, and we thank you for helping to make this a welcoming, friendly community for all.

There are various ways to support Data Umbrella. The first and foremost is to follow our code of conduct. We'd love it if you could share Data Umbrella events with people in your network, and we have some open source initiatives as well. We have a video timestamps project where we add timestamps to all of our videos, which helps users find our content. We have a website which is not built with Quarto, I'm sad to say; it uses JavaScript, but maybe we'll change that.

We've also built the sprint websites for PyMC and scikit-learn, and our blog and events board are open source as well. Another way you can support Data Umbrella is to donate to our non-profit. We're on Open Collective, and if your company uses Benevity for matching donations, that's an option as well. This is just a sampling of the videos on our YouTube channel. We have a playlist for career advice, and other playlists for data visualization, data science for beginners, and a lot of open source. We focus a lot on open source and data science.

We also have a monthly newsletter at dataumbrella.substack.com. It goes out at most once a month, when we have the resources to do it, so we won't spam you or sell your email address.

Today's talk is "Reproducible Publications with Python and Quarto." Tom is a product manager at Posit. He was formerly a product manager on the Quarto team, and you can find Thomas Mock on GitHub or Twitter. With that, I'm going to hand over the mic and screen share to Thomas.

Thank you so much for the introduction and for having me here today. Can you confirm my screen looks okay? Yes. Great. I'm going to share my slides in the chat, and the URL is also here on the screen at thomasmock.quarto.pub. As I mentioned, I'm a product manager here at Posit. I manage Posit Workbench and the RStudio IDE and platform, so I have a lot of love for the data science community and for using IDEs to do better, more reproducible data science. As for Quarto, this presentation is about 35 slides, focused on a broad overview of what Quarto can do and why it might be interesting for you, with a few deep dives into specific topics. Feel free to ask questions in the chat, and I'll get to them at the end of today's presentation.

What is Quarto?

The first thing I always want to mention is that people may not have heard the name Posit, but you might have heard of RStudio. RStudio was a public benefit corporation; we recently rebranded as Posit, and we are still a public benefit corporation. We mainly went through this rebrand so that people know that, while we are still investing in tools for R and tools like RStudio, we have also supported languages like Python for a long time. That's part of why I'm giving this presentation today: Quarto is a multilingual tool for R, Python, Julia, JavaScript, and future data science languages as they come up.

As for what Quarto is: first, if you've ever heard of R Markdown, you can think of Quarto as the next generation of R Markdown, but language agnostic. If you were to ask the Quarto team, they would say something like: Quarto is an open source scientific and technical publishing system that builds on standard markdown with features essential for scientific communication. Importantly, this includes computation, so you can use data science languages like Python, R, Julia, or even Observable JavaScript directly inside the document itself. You write the document's structure in Pandoc-flavored markdown, with many enhancements provided by Quarto, and then you can create all sorts of outputs. From a single source you can create a document, a presentation like the one I'm giving today (which was created in Quarto), entire websites, blogs, books, and even manuscripts that you can submit to journals. Overall, it's a literate programming system in the tradition of Org Mode, Weave.jl, R Markdown, Jupyter Book, and so on. There are lots of frameworks in this space, and we're trying to add to that community by combining several of these ideas and making multiple languages available.

As for where Quarto came from, it is an open source project sponsored by Posit, formerly known as RStudio, a public benefit corporation; that's our history, going back to when we were founded over a decade ago. We have over 10 years of experience with the R Markdown open source framework, which is similar to Quarto but very R-specific. That decade of experience convinced us that many of the core ideas were sound and could be applied to other languages. We also recognized that the languages and computational runtimes used for science are very broad: it's not just R, or just Python, or just Julia or JavaScript, it's some combination across all these ecosystems.

So Quarto is really a ground-up reimagining of R Markdown, modernized and made multilingual and multi-engine, so that you can use whatever language you want with a consistent framework across all of them.

Quarto takes inspiration from both R Markdown and the Jupyter ecosystem. It works directly with plain text documents similar to R Markdown, which we call Quarto documents (.qmd files), as well as with Jupyter notebooks (.ipynb files). The overall goal for Quarto is a computational document: a document that includes the source code that creates it, available both in a notebook format and as plain text. We're really big fans of plain text, but lots of people want to use a Jupyter notebook as their entry point into Quarto, and that's absolutely supported as well. Importantly, you can go further with Quarto to make your work more reproducible and better automated, with features like parameterization and programmatic generation of the reports or documents you're creating.
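To make the parameterization idea a little more concrete, here is a minimal Python sketch (not from the talk) that builds one `quarto render` command per parameter value, so a single source document can produce several reports. It assumes a hypothetical `report.qmd` whose Python code declares a `region` variable in a cell tagged `parameters` (the Papermill-style convention Quarto uses with Jupyter). The `-P name:value` and `--output` options are Quarto CLI flags; the function name, file name, and parameter name are illustrative only.

```python
import shlex


def render_commands(source, param_name, values, ext="html"):
    """Build one `quarto render` command (as an argument list) per value.

    Hypothetical helper: assumes `source` is a Quarto document whose
    Python code declares `param_name` in a cell tagged `parameters`.
    """
    stem = source.rsplit(".", 1)[0]  # "report.qmd" -> "report"
    commands = []
    for value in values:
        commands.append([
            "quarto", "render", source,
            "-P", f"{param_name}:{value}",           # override the parameter
            "--output", f"{stem}-{value}.{ext}",     # one output file per value
        ])
    return commands


if __name__ == "__main__":
    # Print the commands; with Quarto installed, each could be executed
    # with subprocess.run(cmd, check=True).
    for cmd in render_commands("report.qmd", "region", ["east", "west"]):
        print(shlex.join(cmd))
```

Returning argument lists rather than shell strings keeps the commands safe to pass straight to `subprocess.run` without shell quoting issues; `shlex.join` is used only for display.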

When we think about moving beyond just a computational document, though, we want support for things like scientific markdown. You might get started in something like Word, but when you try to create technical documents you hit a steep learning curve: it's hard to do everything you want in a single document, or it requires a lot of fiddling. So you think, okay, maybe I'll switch to LaTeX, and you find that it's really powerful but hard to get started with, and maybe hard to collaborate on with others. Then you realize that markdown is relatively easy to get started with, but when you try to do more with it, some of the syntax you need isn't supported. So Quarto tries to get the best of all these worlds: as easy to use as Word and as powerful as LaTeX, built on markdown, in a cohesive format that also supports computation.

Lastly, there's the goal of single source publishing. At the most basic level, Quarto can be used to create manuscripts and scientific communication, but you can also take a Quarto document and create a presentation, a website, a blog, and all sorts of other outputs. That gives you reproducibility across your whole publishing pipeline, with useful outputs beyond a single PDF or HTML file, in all these formats at once if you want.

Ultimately, I just want to say that Quarto, as an open source project, is crafted with a lot of love and care, and we really thank the community and all the contributors and open source projects that help build it out.