J.J. Allaire - Keynote: Dashboards with Jupyter and Quarto | PyData NYC 2023

Transcript

This transcript was generated automatically and may contain errors.

I think most of y'all are very familiar with our last keynote speaker. He's one of the trailblazers of the world of data science and open source software. As the founder of RStudio and the creator of the RStudio IDE, he has had a profound impact on data analysis and software development. His work extends to the development of R Markdown, Shiny packages, and, my favorite, bridging the gap between R, Python, and TensorFlow.

So, today he's here to share his insights on the exciting Quarto project that is a Jupyter-based scientific and technical publishing system that's set to transform how we share knowledge. So, without any further ado, with a warm round of applause, please welcome our last keynote speaker of PyData NYC 2023, J.J. Allaire.

Thank you very much. It's really, really exciting to be here. And I'm here to tell you today about something new, actually. Something to tell you about today, we're announcing today. So, that's kind of always fun to do, too, at a conference.

And by way of a little bit of introduction, actually, most of this has been taken care of. I'm the founder and CEO of Posit, which you probably have never heard of. But you have likely heard of RStudio, which is kind of, we renamed RStudio Posit last summer. And the reason we did that is that we wanted to take the things, we had a lot of success working in the R ecosystem, but we wanted to do a lot of the things we had done in R, do them much more broadly, to do work that affected all data scientists and all scientists.

And so we wanted to become a multi-language data science tools company. And we are early in that journey. I'm going to share with you today a bunch of the work that we've done in the Jupyter and Python ecosystem, but we're going to be doing a lot more of it in the future. And I'll talk a little bit about that today.

I'm going to give you a brief orientation to what Quarto is, because you likely haven't seen or heard of Quarto either. Then I'm going to spend the larger part of the talk telling you about Quarto dashboards, and then just a little bit of a footnote about kind of our nature of Posit as a company, and how we're approaching Python.

And typically you can think your mental model is like a cell in a Jupyter notebook maps to a card, and it presents some data. And that data can be of all different types. It can kind of be anything that you can render from a Jupyter notebook.

Dashboard components in detail

And so I want to underscore that all of the components that I'm showing you here can be authored and customized either within the notebook UI, which in kind of my big demo, initial demo, I showed you using a notebook. Or you can also use the plain text QMD workflow. I'm going to go through some of these in detail. And in this, in sort of enumerating them, I'm going to use the plain text version. It's a little easier for me to show you all of the structure. It kind of fits more compactly. But just know that everything I'm showing you in plain text can also be done in a notebook equally well.

So navigation bar, it's straightforward. You can give it a title and an author. You have pages in your dashboard, and those become pages on the navigation bar. And then if you want to put some nav buttons to provide social links or even just arbitrary custom links, those are easy to add as well.

Sidebars, we'll get into this. They're kind of essential in more interactive applications. And you can have sidebars. So here, these are the headings. And this is a Jupyter notebook cell, basically, in plain text. So just whenever you see this in these demos, that's a Jupyter notebook cell. So sidebars you can have inside a page. So in this layout, you have a sidebar with a column next to it. And that's just like a sidebar that sits inside a page.

But you can also have a global sidebar where the sidebar is available across all pages in the dashboard. So if you want to do a global sidebar, you put the sidebar attribute on an H1. And now that will be global to the whole dashboard. And you can see we've got a plot page, a data page, and a global sidebar. So as I switch pages, the sidebar will remain in place. More useful for interactive dashboards, but also can be useful for static dashboards, for stuff you want to let the user clear away optionally.
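As a rough sketch of the global sidebar described above, the plain text source might look like the string below. The title and the placeholder text are made up for illustration; only the idea of a level-1 heading carrying the sidebar attribute, alongside one level-1 heading per page, comes from the talk.

```python
# Hypothetical plain-text dashboard source with a global sidebar.
GLOBAL_SIDEBAR_QMD = "\n".join([
    "---",
    'title: "Sidebar sketch"',
    "format: dashboard",
    "---",
    "",
    "# {.sidebar}",
    "",
    "Notes or inputs shown on every page.",
    "",
    "# Plot",
    "",
    "Plot cells go here.",
    "",
    "# Data",
    "",
    "Table cells go here.",
])

# Level-1 headings become pages; the one carrying .sidebar is shared by all of them.
level1_headings = [
    line for line in GLOBAL_SIDEBAR_QMD.splitlines() if line.startswith("# ")
]
print(level1_headings)
```

For a sidebar that belongs to a single page, the same attribute would go on a level-2 heading inside that page instead.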

So layout, you couldn't see this that well in the notebook version, but those row headers actually had a height attribute on them. So when you define a row or a column, you can give it a height or a width, which allows you to, in this case, focus the user's attention on the main chart. That's the main thing I want you to be looking at. And then say these charts are more tertiary or secondary. So here you just use the height attribute to drive that.

Now going back to the initial set of examples we used, you can see how that's used here. This has three rows, and each of which has three items, one, two, and one, just a single item in a row. But this is a row-oriented layout for the dashboard. So that's the default. You can also switch the orientation to columns, and in some cases that's more desirable or useful. And then with columns, you can set a width. So here's an example of, like, you've got one chart, or whatever this is. It could be just a data table or anything. Here, it's taking 60%, and then my other charts are here.

This is a column-oriented dashboard layout, where I've actually put the value boxes here, and I've got columns, one of which has a tab set inside it. So it depends on how you're trying to present it, how dense the information is, how much you want to focus on one versus many charts. But rows and columns are very flexible, and they can nest arbitrarily inside themselves, as can tab sets. So you can pretty much build any layout that you can imagine.

And specifically tab sets, you just say, this row, rather than have two side-by-side charts, I would like this row to be organized as a tab set. And now you can see here, instead of having two separate boxes for chart two and chart three, it's just going to make tabs for them. And in that case, you're not required to put titles on cells. Normally, you can have them be untitled and just show the visualization. But in this case, you want to put a title, so the tab set has a title. So you do that by adding this title option comment.
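To make the row, height, and tab set description concrete, here is a small sketch that assembles a plain text dashboard source in Python. The chart titles and the placeholder cell bodies are invented for illustration, and the code fence is built at runtime only so that this example can itself sit inside a fenced block.

```python
FENCE = "`" * 3  # three backticks, built at runtime


def cell(title: str, code: str) -> str:
    """Return a Python cell with a title option comment, as plain text."""
    return f'{FENCE}{{python}}\n#| title: "{title}"\n{code}\n{FENCE}\n'


def build_dashboard() -> str:
    """Assemble a two-row dashboard: one tall main row, one shorter tab set row."""
    parts = [
        '---\ntitle: "Layout sketch"\nformat: dashboard\n---\n',
        # The main chart gets most of the vertical space.
        "## Row {height=60%}\n",
        cell("Chart 1", "print('main chart goes here')"),
        # The second row shows its cells as tabs instead of side by side.
        "## Row {.tabset height=40%}\n",
        cell("Chart 2", "print('secondary chart')"),
        cell("Chart 3", "print('another secondary chart')"),
    ]
    return "\n".join(parts)


qmd_source = build_dashboard()
print(qmd_source)
```

Saving the printed text to a .qmd file and rendering it with Quarto should give one large card above a tabbed pair of cards, with the tab labels taken from the title option on each cell.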

And this is an example. Again, I think we saw this one earlier. Here's the use of tab sets inside a dashboard, so that's pretty straightforward.

So talk a little bit about some of the specific data that you'll put in a dashboard. Plotly has a lot of reasons why it's an excellent library for dashboards. One reason is that dashboards, you have no idea what the size of the browser is going to be, including it could be a mobile browser. And you really want the plots to size themselves intelligently to the space that they're painted in. Plotly does a very, very good job at that. So whatever space it is in, it will automatically be laid out.

I think Bokeh does a pretty good job at this, too. So that's one advantage of Plotly, is that you get this dynamic sizing behavior. And the other nice thing, I guess this also applies to Bokeh, is that Plotly can give you interactive features, even though you don't have a backend.

You can also use other Jupyter widgets, any Jupyter widget. So in this case, I'll show you ipyleaflet. Here you can see that this is going to look better if it can go full bleed to the edges of the card rather than have padding around it, so I set the padding option to zero. And you can use regular static graphics, like matplotlib or seaborn. When you do that, you have to pay a little more attention to the size. Like, okay, this plot's going to be a little wider than tall, so maybe I want to actually emit it that way. They'll get sized automatically, but it's not as fluid as with the JavaScript-based plotting libraries. So there are pros and cons. One of the negatives of JavaScript-based libraries is they embed all the data in the webpage, so if you have a lot of data, you really kind of have no choice but to use a regular static plotting library.

For tables, you can use the tabulate package to create a markdown table from many types of Python objects. Data frames, matrices, and many other data structures are supported by tabulate, and that just creates a markdown table. You can also use the itables package, which gives you a sortable, filterable view of data. So both are supported.

Value boxes we showed a little bit. Eventually we'll have a package where you can just call functions to make the value boxes. For now, it's a little bit of a workaround. We just print a dict with the value box attributes from the cell, and you'll see what it looks like here. But I wanted to focus you in on the fact that these values can be dynamic, so in this case, depending on the value, the icon would change or the color would change. So in this case, we're saying the price is up, so we're going to put an up arrow and green, but you can imagine if it was otherwise, you'd use a different color. So that's another dimension of value boxes: they can also be dynamic.
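A minimal sketch of the dynamic value box idea follows. The helper name, the prices, and the specific icon and color names are assumptions for illustration; the talk only says that the cell produces a dict of value box attributes and that the icon and color can depend on the value.

```python
def price_value_box(current: float, previous: float) -> dict:
    """Build value box attributes whose icon and color depend on the price move."""
    went_up = current >= previous
    return dict(
        icon="arrow-up" if went_up else "arrow-down",
        color="success" if went_up else "danger",
        value=f"${current:,.2f}",
    )


# In a dashboard cell this would sit under option comments such as
#   #| content: valuebox
#   #| title: "Current Price"
# with the dict as the cell's output.
box = price_value_box(current=105.0, previous=100.0)
print(box)
```

Calling the helper with a price below the previous one switches both the icon and the color, which is the dynamic behavior described above.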

And then, I think very importantly, you can include arbitrary text content in dashboards. Here, you can see I basically just put a... This is the plain text version of just a markdown cell. So in a notebook, I would just make a markdown cell, and it would work just this way. In plain text, you delineate the content with this card div, and then it just goes in alongside your plot. So it's very easy to add narrative and analysis and things like that alongside visualizations and tables.

And the last piece of dashboard components is all cards have this expand icon by default. So some of these visualizations can get kind of small, so when the user hovers their mouse over the card, they can expand it. In this case, it goes full screen, and they can see a more zoomed-in view of the plot. You can turn that off if you want, but it's on by default.

Deploying dashboards

Okay, so that's kind of like whirlwind grand tour of all the different ways you can... Components of dashboards, how you present data, how it kind of maps to markdown headers and notebook cells. But I want to talk a little bit about different ways to deploy dashboards. Most of what I've shown you right now so far are just static HTML pages. So they actually can be deployed to any web server or web host. They don't require a Python processor or server, so they're very lightweight, very scalable.

So that's the basic kind of... Static dashboard, if you can imagine, like, the underlying data won't ever change. If I have a dashboard about, like, the 2016 election, the data's not going to change. I don't need to keep re-rendering it. It's just the election, or whatever. There's some things that's historical data that's not going to change. A single static rendering of it can be fine.

But then, obviously, you want to, in some cases, have scheduled dashboards. So, you know, use a cron job or any other kind of job manager to basically say, I want to re-render my dashboard every day or every hour or whatever's appropriate. You can also have parameterized dashboards. And I'll show an example of that in just a minute. So you can say, well, you know, the dashboard will... Depending on the parameters, it might target a certain region, or it might target a certain assumption. And in that case, I want to have a different version of the dashboard. So you can declare parameters in dashboards, and then have those create variants of the dashboard.

And that works well with both static and scheduled dashboards. And then finally, you can have an interactive dashboard that has a server backend that uses Shiny, which is another open source project from Posit. And I'll describe how to do that. But when you do that, it requires a server for deployment.

So I want to speak briefly about the idea of a parameterized dashboard. We use the same kind of technique that Papermill uses to declare parameters. So in Papermill, you basically add a tag to a cell called parameters. And then you declare a bunch of variables in there. And then once you've done that, when you render, you can give it this parameter. You can have as many as you want. And it'll vary the dashboard based on the parameters. So in this case, this is a dashboard for a single stock, but you might want to render it for different stocks, and that parameter allows you to do it. So that's parameterized dashboards.
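As a hedged sketch of rendering one dashboard for several stocks, the snippet below builds the render commands without running them. The file name stocks.qmd, the parameter name ticker, and the ticker symbols are hypothetical. The -P key:value form is how Quarto accepts parameters on the command line; the --output option is included on the assumption that each variant should get its own file name, so check `quarto render --help` for your version.

```python
import shlex

# Inside the dashboard, a cell tagged "parameters" would hold the default, e.g.:
#   #| tags: [parameters]
#   ticker = "BA"


def render_commands(source: str, tickers: list[str]) -> list[list[str]]:
    """Build one `quarto render` command per ticker (commands are not executed)."""
    commands = []
    for ticker in tickers:
        commands.append([
            "quarto", "render", source,
            "-P", f"ticker:{ticker}",               # overrides the default in the parameters cell
            "--output", f"{ticker.lower()}.html",   # assumed: one output file per variant
        ])
    return commands


commands = render_commands("stocks.qmd", ["BA", "GOOG"])
for command in commands:
    print(shlex.join(command))
```

Each command list could be passed to subprocess.run, or the printed lines dropped into a scheduled job, to produce the variants.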

Interactive dashboards with Shiny

Interactive dashboards are a whole different thing, but they're a really pretty natural extension of all the stuff I've shown you with static dashboards. Again, these build on the Shiny package, which is another open source project from Posit, available for both Python and R. Shiny is kind of its own thing. It's got its own whole application framework and way of doing things. In this case, we're just going to embed Shiny components inside a Quarto dashboard.

This is a super simple Shiny dashboard. Hello, Shiny. You can see on the right, it's just a single plot, and it's got three inputs. I'll walk you through what the code looks like for this. You can see we're still using markdown headings, the kind of dashboard syntax that we talked about, but we've added this server: shiny option to indicate that this dashboard should have a backend. First, I'm going to call a bunch of functions from the Shiny package to define three inputs, and those map here to variable, distribution, and whether to show rug marks. And those are just functions that say, here's the variable, I'm declaring it, here's the label I want to have for it, and here are the choices to offer.

I'll put those in the sidebar. And then over here, we define a function that creates a plot. This uses a Python decorator. You'll see in Shiny there's a bunch of decorators, render decorators, and what's noteworthy about here is you'll see that nowhere in this file do we actually call this function. Using the decorator basically instructs Shiny, please call this function whenever any of the inputs that are referenced by it change. Shiny will automatically call this function when it needs to. And you can see in this line here, we're actually referencing those inputs defined above. So whenever you see these render decorators, you're just saying, set this up to be called whenever things that it depends on change. So that's a super simple little Shiny.
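The "never called directly, but re-run when its inputs change" behavior can be illustrated with a toy sketch. This is not Shiny's API or implementation; the class, its methods, and the input names are all invented here, using only the standard library, to show the general idea of a decorator that registers a function and tracks which inputs it reads.

```python
class ReactiveInputs:
    """Toy input store that remembers which render functions read which inputs."""

    def __init__(self):
        self._values = {}
        self._readers = {}    # input name -> set of render functions that read it
        self._current = None  # render function currently executing, if any
        self.outputs = {}     # render function name -> latest result

    def get(self, name):
        # Record the dependency when a render function reads an input.
        if self._current is not None:
            self._readers.setdefault(name, set()).add(self._current)
        return self._values[name]

    def set(self, name, value):
        # Changing an input re-runs every render function that has read it.
        self._values[name] = value
        for fn in list(self._readers.get(name, ())):
            self._run(fn)

    def _run(self, fn):
        previous, self._current = self._current, fn
        try:
            self.outputs[fn.__name__] = fn()
        finally:
            self._current = previous

    def render(self, fn):
        """Decorator: register fn and run it once to discover its inputs."""
        self._run(fn)
        return fn


inputs = ReactiveInputs()
inputs.set("x", "bill_length_mm")
inputs.set("rug", False)


@inputs.render
def plot_title():
    suffix = " (with rug marks)" if inputs.get("rug") else ""
    return f"Distribution of {inputs.get('x')}{suffix}"


inputs.set("rug", True)  # plot_title is re-run here; the user code never calls it
print(inputs.outputs["plot_title"])
```

The user code only changes inputs; the store decides when plot_title needs to run again, which mirrors the role the render decorators play in the talk's example.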

You can do quite a bit more sophisticated Shiny dashboards. This one steps up a little bit. It's still pretty simple. It's got a couple pages. It's got more inputs. But all the stuff I showed you with value boxes and tab sets and rows and columns and tables, you can all do that interactively with Shiny components as well.

In terms of deploying these things, there are quite a few different ways to do that. And they're basically standard ASGI Python apps. So you don't need anything special, per se, to deploy them. You can just call shiny run. As long as your server environment supports WebSockets and sticky sessions, then you can serve it. That said, there are some easier ways to deploy that take care of some of the networking and scaling and things like that. So there's a couple of cloud services that support Shiny deployment. shinyapps.io, that's one that Posit has created, and then Hugging Face also supports Shiny apps. And then if you're running your own server, we have an open source server that lets you deploy Shiny apps. We also have a commercial product that lets you deploy Shiny apps. And then, of course, as I said, you can completely roll your own. You don't need to use any of these products. They just kind of make some things easier.

So I did earlier say that whenever you use Shiny components, you're going to need a server. That's not entirely true. We actually have a way to deploy these dashboards that use Shiny completely serverless using Pyodide. So we actually have an example up on our site to show you. You can use Python, use Matplotlib and Pandas and everything, and actually have it all run inside the web browser. And then there's no server in that case. And so this retirement simulation example we have shows you how to do that, and this is the app. It's just pretty simple. It's just got two scenarios, and you can change them, and it reruns the simulation. But this is all running inside the browser. So there is a serverless option that you have.

Posit as a company and Python investment

All right, so that is now kind of completed, telling you all about dashboards. I wanted to conclude by giving you a little bit of insight into Posit, the company, what our philosophy is, and kind of the work we've done so far in the Python space. So we were founded 14 years ago as RStudio, and our mission was then, and now, and will be, to create open source software for data science. We're actually organized as a public benefit corporation, which means a bunch of things, but mostly it means that our mission is encoded into our corporate charter.

Our officers and directors actually have a fiduciary duty to pursue this mission when we make decisions. A traditional corporation, like a normal C corporation, really needs to narrowly consider the interests of the shareholders as kind of the only thing that ultimately matters, and public benefit corporations can consider much more diverse interests. They have their mission, they have the community, they have their employees, they have the shareholders, they sort of look at all the stakeholders. So that's our approach, kind of at a corporate level, and we've actually gone to pretty significant lengths to try to make sure that never changes.

So we are an independent company, and we are committed to not ever selling the company and not going public, so we are always going to be an independent and private company. To that end, we have control of the company entirely inside, so outside investors do not have a way to force us to do anything. All of the main control is within people inside the company. I think that allows us to have an imperative that isn't growth at all costs, which is, I think, the imperative of most startup companies, even most big public companies, but try to build something that's sustainable and building open source software and providing a trustworthy foundation for users of open source software for a long, long time. A hundred years might be a little extreme, but as long as possible, a hundred years aspirationally.

And as I said, we want to be durable, we want to be trustworthy, we want you to understand that our core motivation is to create open source software and make it incredible and easy to use and let people use code to do data science and make it all open, and that's kind of our bottom line.

So we've begun recently, actually probably starting about three years ago, to work on a lot more Python open source projects. So Quarto we heard all about today. There's a separate project called quartodoc that helps with creating package documentation for Python packages. A bunch of packages are using it now. Ibis just switched over to using quartodoc.

We've also got Shiny, which you heard about today. So Shiny works inside Quarto, but also works standalone. And we have some other packages you may not have heard of. Vetiver for kind of model ops, siuba is a data manipulation library, and plotnine is a data viz library. So we've got lots of projects, we've got a lot of investment in these projects, and we'll have more projects over time. We're also very much involved with NumFOCUS, and we're working closely with the Jupyter community on some standards for scientific notebook publishing in an effort called Notebooks Now!. So we're doing a lot with Python, we're going to do a lot more in the future, and we invite all of you to help us figure out what the right tools are, what the right path is, what things should work well together, and I'm really excited to get going on this.