Resources

Thanks, I Made It with Quartodoc - posit::conf(2023)

Presented by Isabel Zimmerman

When Python package developers create documentation, they typically must choose between mostly auto-generated docs or writing all the docs by hand. This is problematic since effective documentation has a mix of function references, high-level context, examples, and other content. Quartodoc is a new documentation system that automatically generates Python function references within Quarto websites. This talk will discuss pkgdown's success in the R ecosystem and how those wins can be replicated in Python with quartodoc examples. Listeners will walk away knowing more about what makes documentation delightful (or painful), when to use quartodoc, and how to use this tool to make docs for a Python package.

Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.

Talk Track: Data science with Python. Session Code: TALK-1139

Transcript

This transcript was generated automatically and may contain errors.

Hello friends, I am Isabel Zimmerman, and I'm here to talk about some docs. And you can talk about documentation in a lot of different ways. You could talk about how to write really accessible explanations of, like, technical terms, or you could talk about how to write examples that people can just, like, copy and paste from your documentation, and it looks great, and it works every time.

But I am an open source software engineer at Posit, which means I kind of am always in the mindset of thinking of things as tooling problems. And documentation is no different.

The R ecosystem has a really well-loved documentation tool called pkgdown. And if you're an R user and you've seen pkgdown documentation, you probably don't even realize all of the fantastic benefits you're getting from this. As a user, you log on, or not log on, you just, like, pull up a pkgdown site, and you have this beautiful index page, which is actually also the GitHub README. And then you have some tabs up at the top, and there's one that says reference, and you click on that to see the functions, and then there's usually something that says, like, read more, or articles, and then you go there if you need to have, like, certain workflows.

And if you're a developer, you also really benefit from this. I was told that you can actually build a pkgdown site in less than one minute, and you don't have to trust me. This came from Jenny Bryan, and I think it was established yesterday that we can, in fact, trust Jenny Bryan.

The Python documentation landscape

But I am in the Python space, and a little bit different than the very well-engineered R world. It kind of feels like the wild west out here. When I started building this package vetiver, I really struggled to find a tool that felt just right to create package documentation. In fact, you could say I felt a little bit like Goldilocks, and if you're not familiar with the tale, Goldilocks is a children's story of a young girl, and she's lost in the woods, and she finds a house, and nobody's home, and I guess she just, like, decides to walk in and start messing with the homeowner's stuff.

But it's super cool. They apparently left mid-breakfast, and she's going to eat all of their food, and she tries this first bowl of porridge, and it's freezing cold. She tries the next bowl of porridge, and it's, like, scorching, scorching hot, and the last one is just right. She ends up messing up, like, all of the homeowner's stuff, and then she, like, sleeps in their beds, and then some homeowning bears come back, and she realizes that it was the bears that inhabited the house the whole time, but they allegedly live happily ever after.

And while Goldilocks might have been, like, legally breaking and entering, we'll view this in the maybe more childlike lens of this is someone who approaches a space that they believe, maybe misled, but they believe it's been built just for them, and she knows what she wants. Like, Goldilocks knows she wants this perfect porridge, and nothing feels just right.

And when I started making my first doc site, I felt this frustration of knowing what I wanted in my documentation. I wanted something that was super maintainable, kind of offload that mental load to the documentation framework. I wanted it maintainable. I wanted it to be Markdown-based. That would be super useful. I love Markdown, and I wanted, more than anything, this documentation system to give users everything that they need to be successful when they're using my packages. And because we're living in a fairytale land of docs utopia, it would also be really cool, like, if this tool didn't import a lot of weird things, I didn't have to wrangle a bunch of other dependencies, and overall, I just wanted it to allow me to have more time to focus on the content of the site, rather than, like, fiddling with all of these weird settings.

Trying existing tools

But the tools at hand were feeling not just right. The one thing I knew I needed in this documentation system was long format docs, and that's kind of to tell people the when and the why to use different packages. And the R world really has this down to a science. In fact, they have a whole name for it, called vignettes, and vignettes are great for telling a story. It gives context to some sort of, like, small workflows that a package is useful for doing.

And I tried to use a tool in Python that was really excellent for long format documentation. This tool is called MkDocs, and it was great to get started. It felt pretty ergonomic, and that was really exactly what I wanted. I could write some Markdown. But I realized, actually, my users cannot live on vignettes alone. While people were fantastically prepared for, like, when to use vetiver and why they should be running this vetiver code, there wasn't much intuition on, like, how to use certain functions if I only had vignettes and, like, examples. So this documentation tool was not just right. In order to get these API docs, I'd have to wrangle a bunch of extensions, and it just didn't really feel like something that I wanted to do with this docs tool.

So to fill this gap, I knew I needed function references, and this is showing people how to run code. And when I write my Python scripts, I write them in a pretty structured way. You know, you define your function, you have your function name. We've seen a lot of code over this conference, and you probably are almost, like, just ingesting this naturally. You know, there's these parameters underneath. You see these triple quotes. There's a line describing the function. There's parameters. There's a line underneath it. And then there's some, like, special indentation. It's structured in a very, very particular way.
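As a rough sketch of the structure described here, a function with a numpydoc-style docstring might look like the following. The function `taste_porridge`, its parameters, and the 60-degree "ideal" are all hypothetical, made up for illustration; they are not taken from vetiver or from the slides.

```python
def taste_porridge(temperature, tolerance=5.0):
    """Decide whether a bowl of porridge is just right.

    Parameters
    ----------
    temperature : float
        Temperature of the porridge, in degrees Celsius.
    tolerance : float, optional
        How far from the ideal temperature still counts as just right.

    Returns
    -------
    str
        One of "too cold", "too hot", or "just right".
    """
    ideal = 60.0  # hypothetical ideal serving temperature
    if temperature < ideal - tolerance:
        return "too cold"
    if temperature > ideal + tolerance:
        return "too hot"
    return "just right"
```

The one-line summary, the underlined section names, and the indented descriptions are the parts a documentation tool can pick out and turn into a reference page.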

And this is on purpose. This is because there are tools out there to programmatically export this docstring into beautiful, beautiful websites. Here you can see I used a package called Sphinx, and I could get my function name, the package name, you know, vetiver pin_metrics in the yellow. You don't really have to read it if you don't want to, but there are, like, the parameters there as the function signature. In green, we have the description of the function, and we have the parameters with the description of each parameter in blue. It's kind of a lot, but it's all there. I didn't have to write this twice. It just automatically got created into this website, and it was super cool.

But what was lurking underneath the surface of Sphinx was the fact that this was all in reStructuredText, and I had to write a lot of custom Python code to maintain this in the long term, and that did not feel just right for me. So although I had this doc site up, it felt maybe not ideal for me as a developer, as someone who didn't want to spend a lot of time developing my docs. So my quest for the just right docs framework continued.

Discovering quartodoc

And right when I felt all hope was lost, that I would never have the perfect porridge, one of my colleagues, Michael Chow, who works on the Quarto team and kind of the MLOps team at Posit, turned me on to a project that he was working on. And quartodoc is a Python package that generates a function reference page for your Quarto website. And I am maybe, like, this is a self-appointed title, but I am Quarto's biggest fan. I think it can do no wrong. It is beautiful.

And I was able to have Quarto and Python docs at the same time. I was able to get beautiful API docs. We can see here with quartodoc, I didn't make any weird translations. This is just kind of out of the box. I have the name of the function that we saw before, this function signature in yellow, the same description, and I had a really nice markdown table of parameters. So I have the name, the type, the description, and the defaults for each of the parameters. And this was really cool. It felt like a little bit better way to ingest things rather than a bulleted point list. And I was really happy.
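To make the idea of a parameter table concrete, here is a minimal sketch that builds a Markdown table from a function's signature using only the standard library. It is not how quartodoc itself is implemented, and `pin_report` is a hypothetical function invented for the example. The sketch only covers names and defaults; the types and descriptions in the table described above would come from the docstring.

```python
import inspect


def pin_report(board, name, overwrite=False):
    """Hypothetical function, used only to have something to inspect."""


def parameter_table(func):
    """Render a function's parameter names and defaults as a Markdown table."""
    rows = ["| Name | Default |", "| --- | --- |"]
    for param in inspect.signature(func).parameters.values():
        if param.default is inspect.Parameter.empty:
            default = "required"  # no default value in the signature
        else:
            default = f"`{param.default!r}`"
        rows.append(f"| `{param.name}` | {default} |")
    return "\n".join(rows)


print(parameter_table(pin_report))
```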

So people had these API docs, and they also were able to write their long format vignettes in Python now. And it's just markdown, but Quarto is built to have a really pleasant experience for writing technical documentation. And quartodoc was able to let me lean into that and really put these tools together.

Getting started with quartodoc

So as a developer, I kind of felt like I was winning the lottery. It was quick to get started. I pip installed quartodoc, I did a little bit of configuration, and then I ran two commands, quartodoc build and quarto preview. And I feel like people get a little scared when they have, like, a little bit of configuration in the middle of the slides. But I promise you, these 17 lines of code will give you a fully functioning doc site. It kind of feels like magic.

And because this is YAML, it looks and feels like a bulleted point list, which means it's super literate and maintainable for me later on. I'm not trying to wrangle scripts. I just get to put this in the _quarto.yml file. If you're a Quarto user, you realize this is just the same file you're using anyway to create your site. It's just got a few extra lines at the bottom.

And if we want to go line by line of how to make a quartodoc website, you're going to start with a YAML chunk that's quartodoc specific. And if you want, you can have this sidebar that quartodoc will create for you. And you actually don't really have to touch this. But if you think about how you're ingesting documentation, you kind of have the function here and a sidebar of all the other functions on the side. This is when it is being created. You also have to say the name of the package you would like to document. It needs to be imported into the environment you're running this in. And then you get to create your index page. And this page is kind of like the landing page for all your functions. You're going to give it a title, like my golden functions. And a description that they are just right. It's going to be a scrollable list of all your functions separated by however many headers you would want.

And underneath these headers, you list what you would like to document. Here we can see I have my house function and then my Goldilocks class. And actually, if you just wanted to list your classes out, that would be just fine. But if you want to have a less nested structure and see every single attribute and method in the sidebar, you can list them out this way for a little bit of a flatter architecture. If you just wanted to put Goldilocks alone, if you clicked on Goldilocks in your sidebar, it would put all of these attributes and methods automatically in your documentation. So you're not missing out on information.
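The configuration walked through above can be sketched roughly as follows. This is not the exact 17 lines from the slides. The package name `fairytale` is hypothetical, and the layout follows the quartodoc getting-started docs as best I recall them, so treat the keys as an assumption to check against the current documentation. The YAML is wrapped in a small Python helper so it can be written to disk.

```python
from pathlib import Path

# Sketch of a _quarto.yml for a hypothetical package named "fairytale".
CONFIG = """\
project:
  type: website

# let Quarto read the sidebar file that quartodoc generates
metadata-files:
  - _sidebar.yml

quartodoc:
  # package to document; it must be importable in this environment
  package: fairytale
  # optional: write sidebar entries for the reference pages to this file
  sidebar: _sidebar.yml
  sections:
    - title: My golden functions
      desc: They are just right.
      contents:
        - house
        - Goldilocks
"""


def write_config(directory="."):
    """Write the sketch configuration to <directory>/_quarto.yml and return its path."""
    path = Path(directory) / "_quarto.yml"
    path.write_text(CONFIG, encoding="utf-8")
    return path
```

Listing `Goldilocks` alone documents the class on one page. For the flatter layout, individual members would be listed by dotted name under `contents` (for example a hypothetical `Goldilocks.eat_porridge`). With the file in place, running `quartodoc build` and then `quarto preview` from that directory should generate the reference pages and serve the site.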

What you get out of the box

And with that little bit of configuration, I had a great site that looked something like this. You can see on the left-hand side, we have our sidebar. We have our index page here with my title and my different pieces of version and deploy. And there's monitor below, if you're familiar with the vetiver framework. And because this website is built with Quarto, you get all of the Quarto goodies just in there, which is amazing. You get to write in Markdown because you're getting all of these reference pages generated as Quarto Markdown files. There's accessibility built into Quarto. You have high contrast highlighting. You have accessible sizing and easily customizable alt text for any of the images inside of your website.

And Quarto is able to grow from one page, like how I started with vetiver, to larger sites. And while vetiver is a pretty simple and straightforward website, because I don't get a lot of joy in building websites besides this one, you can take it from other people in the Python community. Ibis is another Python package that just moved their docs to the quartodoc framework. And they leverage a lot of this extensibility that's built into Quarto and quartodoc. You can see that they've added some really fun highlighting. They have this bolded kind of explanation of the different columns. Their types are in gray. We've got beautiful green and teal coloring to match the rest of the vibes of their site. And mostly they were just able to develop inside of quartodoc and make something that felt just right for them.

Benefits for users

But we're not really building docs for just us. We want users to enjoy these things as well. And quartodoc gives you something that feels predictable. So when I was actually building this talk, I talked to some data scientists, and they knew R, and they were learning just enough Python to be dangerous. And they had said one of the hardest parts of learning Python was they would go into these websites and they would expect to see, like, you know, that really structured, like, format. You know where to go to find everything. And they're like, where am I supposed to go in these Python sites? I feel lost.

And quartodoc leans on some of these lessons that are learned from pkgdown. And it starts people off with kind of this index page, and then you have your reference page, and you can build on from there. So developers like me don't have to worry so much about, like, how to drive SEO or how to organize this website so people can find things. Like, I can just focus on the content so people can enjoy the package.

WASM and interactive examples

One of the last and most magical elements of quartodoc and the Quarto kind of ecosystem is you can use WASM elements inside of these documentation systems kind of easily. Easily in a WASM way, which is already kind of magic. So the Shiny for Python docs are also in quartodoc. And if you scroll on to almost any one of their functions and go to the example at the bottom of the page, there is a fully functioning Shiny app running in the browser. And we've got great people in the session right after me who are going to explain a little bit more about how this wizardry works. But for now, I just want people to appreciate how powerful it is that you can be on a documentation system and be running code, learning this on the fly, you know, editing this Shiny app without even having to download it. It is an incredible learning experience for users.

So I have been a delighted user of quartodoc, and I have to say a really heartfelt thank you to Michael Chow from the Quarto team at Posit, who's done all this amazing work to make my life easier. Goldilocks in our lens is a hospitality story, where Goldilocks, she felt a little lost. She found a home in quartodoc that felt just right. Thank you.

Q&A

We have caught up, so there is time for questions.

Can quartodoc do parameter documentation inheritance, like the roxygen2 @inheritParams tag?

I would love to have whoever asked this question talk to me and Michael Chow later, because I don't know enough about the R side to understand that entirely.

Does quartodoc support cross-references, the way you can, very awkwardly, in Sphinx/reStructuredText?

There are cross-references. I actually had a Sphinx website, the one you saw, before I had a quartodoc website, and it really did feel like a good, like, drop-in replacement. I thought I was going to be feeling like, oh, no, I'm losing stuff, because Sphinx feels like it has everything. Like, the Sphinx ecosystem is huge, like, I'll just install 17 more extensions, and finally my doc site will be perfect. But making this jump, I felt like I didn't lose out on any of these extensions. I got what I needed kind of out of the box with much less custom Python in my package.

Can you generate a default configuration YAML file automatically using some equivalent of the usethis package?

There's not, but you can copy and paste from the quartodoc website, which is not a super satisfactory answer, but it is an answer nonetheless. And I also have to plug that Michael Chow, the incredible creator of quartodoc, is always looking for people to help make new package documentation, so if you open an issue on the quartodoc GitHub repo, allegedly he will help you build your documentation site.

That seems like a great place to stop, so let's thank Isabel and all of our speakers in this track once again.