Barret Schloerke - Editable data frames in Py-Shiny: Updating original data in real-time

Integrating editable data frames into Py-Shiny and Shinylive applications streamlines data scientists' workflows by allowing real-time data manipulation directly within interactive web applications. This new feature enables users to edit cells within the data frame output. Using the empowered data frame renderer, we can facilitate immediate analysis and visualization feedback. It simplifies the process of data exploration and hypothesis testing, as changes to the data set can be instantly reflected in the application's outputs without the requirement to update the original data, keeping data scientists “scientists”, not data janitors.

Talk by Barret Schloerke

Slides: http://schloerke.com/presentation-2024-08-13-posit-shiny-data-frame/
Slides GitHub Repo: https://github.com/schloerke/presentation-2024-08-13-posit-shiny-data-frame
Shiny: https://shiny.posit.co/
Shiny for Python: https://shiny.posit.co/py/
Component gallery: https://shiny.posit.co/py/components/
Edit `Data Grid` table cells: https://shiny.posit.co/py/components/outputs/data-grid/

Barret Schloerke

Oct 31, 2024

19 min

Transcript

This transcript was generated automatically and may contain errors.

So, in 2022, Joe Cheng had a really powerful statement where he said, R is the best language for Shiny, and I will die on this hill.

And he was quick to mention that Dan Callahan at PyCon 2018 said Python is the second best language for anything. And my name is Barret Schloerke, and I'm on the Shiny team, and I kind of believe right now my job is to change their minds. So, yeah.

History of data frame support in Shiny

Before we get more into data frames and how they're used within Shiny, I want to talk about where we've been so we can kind of see, you know, where we're going or kind of how we're going to repeat history just a little bit.

In 2012, Shiny was created at the end of the year, and about eight months later, they added support for renderDataTable. And it was neat. Like, you need to add in those components, you know, as they're coming in.

About three years later, that data frame support was offloaded into the DT package. We could then leverage DataTables.js and all of its glory so that we're using other people's experience to our advantage.

Let's take a look at what DT can do. With the table on the left, you can select some rows. You can have pagination. You know, this was an amazing advancement, given the early days of Shiny. There's a search bar at the top. And in the spirit of the Olympics that are going on, I'm going to turn my ratings into Olympic medals. They don't represent anyone's opinion but mine.

So, filtering is possible. It's just a simple regex, or, you know, a small matching. You can style. It's a little bit of an uphill battle, but it's possible. The API has a lot of options. It's kind of a little overwhelming for me, you know, in today's world. So, you know, I'll back that down to a bronze. Speed is not the best when we get to 100,000 rows, in my experience. But it is there and available.

But the editing is pretty good. You're very aware that you're able to edit, or what can be edited. And how it sends back values to Shiny is a little wonky for me, but it sends you all the information that it can.

About one month later, rhandsontable came out. Handsontable. I never know how to say the name, but I always read it as "hands-on table." It had a really beautiful editing experience. It came to Shiny using Handsontable.js. And that is a spreadsheet-first approach. So, we want editing to be the first experience that you have. And it does a really, really good job. It's very impressive.

So, like, booleans are done as checkboxes. There is a select box. You can have your typed text section. Even dates are done with a date picker. It's very, very pretty in how it was done.

Unfortunately, there is no filtering with the setup. So, you have to do that on the outside yourself. But you could always just redraw the table. Styling is possible. The API is a little bit better. We're starting to separate it just a little bit.

But on speed, when the data got quite large with that HTML widget, it would flicker just a little bit. You may not quite notice it, or you may catch it in that, like, special moment. But the editing experience, top-notch. It's very, very elegant.

After a little five-year gap, where things improved on their own, Reactable came out in 2020. When this package by Greg Lin came out, I remember the speed at which the data table would be rendered. Even with a simple example here, where we call Reactable on the iris dataset on the website, the GIF that's playing doesn't show you the print speed and the scroll speed, the effectiveness of what Reactable does. And it was really, really impressive.

It used a virtual DOM. So, as you scale out this data frame, we may only have 100 rows in the DOM. And that makes it really, really fast, even though you may have 100,000 rows in your data. So, the browser is not overloaded and thinking too hard.
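The windowing idea described here can be sketched in a few lines of Python. This is only an illustration of the general technique, not Reactable's actual implementation; the function name `visible_window` and the `overscan` parameter are hypothetical.

```python
def visible_window(scroll_top, viewport_height, row_height, total_rows, overscan=5):
    """Return the half-open range [start, end) of row indices worth rendering.

    Only rows inside (or just outside) the viewport are kept in the DOM,
    no matter how many rows the full data set has.
    """
    first = scroll_top // row_height
    # Rows that can be at least partly visible: ceil(height / row) plus one
    # extra row for a partially scrolled row at the edge.
    visible = -(-viewport_height // row_height) + 1
    start = max(0, first - overscan)
    end = min(total_rows, first + visible + overscan)
    return start, end


# Hypothetical numbers: 100,000 rows of 25px in a 600px viewport, scrolled far down.
start, end = visible_window(
    scroll_top=250_000, viewport_height=600, row_height=25, total_rows=100_000
)
rendered = end - start  # a few dozen rows instead of 100,000
```

Under these assumed sizes, the number of rendered rows depends on the viewport, not on the size of the data, which is why scrolling stays fast.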

And if you put in the effort, you can get really gorgeous tables in how it's done. A lot of custom work will go in. So, what you put in is definitely what you'll get out. But, like, having a table that can do this and still be interactive is really, really impressive.

So, there is filtering. It's just for the column. Styling is still a little off because we're trying to push the computation to the browser. But that's faster because you don't need to have the server control that. The API is really good at providing all the different options. And it is separated a little bit to not overload the user at one point. But it is not to the point of another package, which we'll get to next. And the speed is amazing.

But this was built as an output, for the end of the pipeline. So, it doesn't have editing. So, we're starting to kind of pick and choose which strategies we want to go with.

But I do want to give a bonus medal of examples. And Reactable on its getting started article has 97 examples in one page. So, if you want to know what to do or what it's capable of doing on one page, you can do a command F on the site and you can most likely find an example that'll be for your needs. So, I really want to point that out and say thank you. It's something we should all strive for in our packages.

Again, one month later, GT was released.

GT is a package that I helped work on a little bit on the infrastructure side. But Rich was the mastermind of the whole thing. And I think it was one of the first packages that took every component of a table and treated it as a first-class citizen from the beginning. Normally, it's just our blue summary cells or table's body cells or column label. And that's typically what we just land on. But Rich said, no, no, no. We're going to support anything that a table can support. And it's going to be possible from the beginning.

From that, he can kind of give it the same treatment that ggplot does, where you have your structured data at the beginning and you may have your pretty formatted plot at the end. So with the same idea, he took structured data at the beginning and he ran formatters on it to finally get your styled table at the end.

But GT was built with only static tables in mind. And in 2023, he added the ability to do interactive tables by wrapping around the Reactable package, which I thought was great. You get a limited subset of features, but you still have that GT interface with one extra command. And I thought that was just awesome. This allows us to upgrade the styling and the API to a gold medal for me. And I thought that was just absolutely brilliant in how that worked out.

Rich, very similar to Reactable, has beautiful examples. I think they're very rich in their design and much deeper than just a toy example. So please do check out his website and his example datasets and how he uses them.

Shiny for Python's data frame support

In 2023, Shiny for Python was released. And, like at the beginning, though a little faster, about six months later we added our render data frame support. And if we kind of look at this timeline, it's a little spooky because there's 10 years of experience that Shiny for Python is now having to live up to. Yes, there are the other existing widgets that Carson had talked about, but they're not native. There's a little bit of awkwardness to me.

And so having something native for PyShiny would be an amazing addition to the package. So it feels like you're just a little toddler and you're living up to your big brother, who's got very big shoes to fill, and it's a little daunting. So let's check out and see what PyShiny has done.

We can quickly see that we have filter support built in per column rather than up top for the whole table. Given that it's per column, we can actually have context-aware filtering. Currently, it's only done for numbers, and everything else is just treated as a text subset. So in the country, I will search for stat, and we will filter down to the United States, and that's updated live.

And for population, because it's numeric, we have the ability to set a minimum and a maximum. And I just kept entering a number in the minimum until the data was shortened up. Oh, sorry, I didn't mention it. This is a toy example, looking at the gapminder maximum values per country. And so there's population, life expectancy, and the GDP per capita.
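The two kinds of per-column filters described above, a text subset match and a numeric minimum/maximum, might be sketched like this. It is a plain-Python illustration, not Shiny's filtering code; the function names and the sample numbers are made up for the example.

```python
def text_filter(rows, column, needle):
    """Keep rows whose value in `column` contains `needle` (case-insensitive)."""
    needle = needle.lower()
    return [row for row in rows if needle in str(row[column]).lower()]


def range_filter(rows, column, minimum=None, maximum=None):
    """Keep rows whose numeric value lies within the optional bounds (inclusive)."""
    return [
        row for row in rows
        if (minimum is None or row[column] >= minimum)
        and (maximum is None or row[column] <= maximum)
    ]


# Made-up placeholder numbers, not the real gapminder values.
rows = [
    {"country": "United States", "pop": 300},
    {"country": "Kuwait", "pop": 3},
    {"country": "China", "pop": 1300},
]

by_text = text_filter(rows, "country", "stat")    # substring match on a text column
by_range = range_filter(rows, "pop", minimum=100)  # only a minimum is set
```

Leaving a bound as `None` mirrors a filter box the user has not filled in yet.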

And what we can also do is we can style these tables. So it's kind of a little hint as to what's going on. Granted, we can sort and filter on the columns, we can select our rows, and then we can, in addition, we can get that selected data out of the render. I'll go into this a little bit more in detail, but that transition of getting that selected data is very, very slow now.

You can see here as well that the columns have highlighted their top five values in each column. So in the case of GDP per capita, it's green, life expectancy I put blue, and population was red. Kuwait has a very high max value for the GDP per capita, so to show off the editing values, I'm just going to change the value by removing the leading one from the number. And hopefully the top five values within the GDP per capita will be readjusted to the new top five values. So in this case, China will get their value to be selected as green.

So we delete the one, and then the styling, because there's been a new edited value, we can then have the Python code recalculate what style should be applied, and it's sent to the browser. So we don't have to have everything in the browser, because I think we're more effective as, in this case, writing Python ourselves.
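A minimal sketch of that recalculation, assuming styles are described as plain dictionaries of row indices, column indices, and CSS properties. The dictionary shape is loosely modeled on the style descriptions a data grid accepts and should be treated as an assumption; `top_n_styles` and the sample values are hypothetical.

```python
def top_n_styles(rows, column, col_index, color, n=5):
    """Describe a background color for the rows holding the n largest values."""
    ranked = sorted(range(len(rows)), key=lambda i: rows[i][column], reverse=True)
    return {
        "rows": sorted(ranked[:n]),
        "cols": [col_index],
        "style": {"background-color": color},
    }


# Made-up values; row 0 plays the role of the very large GDP value.
rows = [{"gdp": v} for v in (113, 50, 49, 48, 47, 46, 45)]

before = top_n_styles(rows, "gdp", col_index=0, color="green")

# The user edits a cell by removing the leading "1": 113 becomes 13.
rows[0]["gdp"] = 13

# Python recomputes the styles from the edited data; the result is what
# would be sent back to the browser.
after = top_n_styles(rows, "gdp", col_index=0, color="green")
```

After the edit, row 0 drops out of the highlighted set and the next-largest row takes its place, without any styling logic living in the browser.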

Comparing packages and the empowered renderer

So putting up our table of Olympic medals, DT, rhandsontable, Reactable, they all have their own strengths, and then GT improves upon the styling and API on top of Reactable. And Shiny for Python. It's not perfect. We're getting there. Filtering, I think, is improved. We have some context awareness in the columns. Styling is on the Python side. We can give some JSON that goes to where it needs to go.

The API is quite small. We don't have it cover all the options yet. I know this, but we're adding options as we see fit. The speed is similar to Reactable, very fast with the virtualized DOM. So I do want to toot our own horn there, because it is really impressive how that virtual DOM works. And then for editing, rhandsontable is something to strive for. We have one more cycle coming up, and I'm aware of the contexts in the browser, so hopefully we can get something in there and use our Shiny inputs that we already have.

So, the empowered renderer, this is something that I want to get into, because I think it's really exciting on the Python side that we can leverage class structures. And this is where we take a renderer object and we can add in extra reactive values and methods to this renderer object. And this is a very exciting advancement in, like, the style of coding done within Shiny for Python.

So let's look at the reactive graph of some made-up example, and this is using an artistic form of R's reactlog graph. So if you are an R user and you want to see how Shiny thinks about your reactivity, please check out the reactlog package. I know I'm talking about Shiny for Python right now, but please do check it out. I'm a little biased.

So let's say that there is a data frame in your final output there, and for an input, I want to look at the sort information of that data frame. Normally, this is done from the browser. We set an input value and it comes back to Shiny, and Shiny has it available as an input. However, how does this data frame connect to that input? It's kind of magical, it's kind of voodoo for me, but it just shows up and it's untyped and, you know, best of luck understanding it.

Instead, what we can do with the empowered renderer is add these methods, and it kind of allows us to rearrange that reactive graph so that we can add a method of sort, so that you can access the sort information of your renderer. It'll have autocomplete, it'll have typing, you know, that is just awesome.

So, for example, for the data frame, we have typed data objects, such as the data that the user provided to your render function, like the iris data frame. We have data view, which would be how the user is currently filtering or sorting the columns. What does that look like to the user? That way you can take it and replicate it in a follow-up table if you'd like.

You can have multiple input objects, such as what has the user selected for their cells? What filters have they applied? What sorting have they applied? And anything else that we can try to think of, we can add there. And similarly, we can have update methods. So rather than going to try to find the corresponding update method that isn't attached and just kind of works with what you're doing, it's now fully integrated into your renderer object.
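The pattern described here, where typed accessors and update methods live on the renderer object itself, can be illustrated with a toy class. This is not Shiny's implementation and has no reactivity; `ToyDataFrameRenderer`, `SortSpec`, and the sample values are hypothetical, with method names chosen only to echo the ones mentioned in the talk.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SortSpec:
    """Typed description of the current sort, instead of an untyped input value."""
    column: str
    descending: bool = False


class ToyDataFrameRenderer:
    """Toy stand-in for a renderer that carries its own accessors and updaters."""

    def __init__(self, rows: list[dict]):
        self._rows = rows
        self._sort: Optional[SortSpec] = None

    def data(self) -> list[dict]:
        """The data originally handed to the renderer."""
        return self._rows

    def sort(self) -> Optional[SortSpec]:
        """The sort currently applied, if any."""
        return self._sort

    def update_sort(self, column: str, descending: bool = False) -> None:
        """Update method attached to the renderer rather than found elsewhere."""
        self._sort = SortSpec(column, descending)

    def data_view(self) -> list[dict]:
        """The data as the user currently sees it (here: sorted only)."""
        spec = self._sort
        if spec is None:
            return list(self._rows)
        return sorted(self._rows, key=lambda r: r[spec.column], reverse=spec.descending)


# Made-up placeholder values.
table = ToyDataFrameRenderer([
    {"country": "Kuwait", "pop": 3},
    {"country": "China", "pop": 1300},
])
table.update_sort("pop", descending=True)
view = table.data_view()
```

Because `sort()` returns a typed object, an editor can offer autocomplete on `table.sort().column`, which is the ergonomic gain the talk points to.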

Data frame library support and the future

Historically, we've always supported pandas, because it just kind of got us off the ground, ready to go for our examples. But, you know, Shiny needs to be generic. We shouldn't predetermine the choice of which data frame library to use. So in the Shiny 1.0 release, we added native Polars support to try to help expand our data frame support for the package.

But what I'm really excited about with this is we're adding narwhals support in the next release. And narwhals is a really good compatibility layer between data frame libraries. And it handles all of the complicated mess for us. So we can just take in the data frame that we don't know necessarily how to handle. We tell narwhals, please select these columns and these rows, and then give me the native data frame back. And it's handled for us. And I think that's such a beautiful solution. And as they get better, Shiny will get better as well.

So looking into the future, Great Tables is obviously out for Python, and we need support for this within Shiny. I think it's just such a wonderful package. And we need to have the support because they know the original data and they know the formatted data. That makes a really nice editable experience that we can just update a cell, have it reformatted, and then we send it back to the browser. And styling is really amazing in how easy it is to do within Great Tables.

In addition to this, Reactable has just been ported over to Python. And a great thanks for both of these packages goes to Michael Chow. And so it's kind of up in the air as to what's going to happen. But we could possibly port our cell editing into Reactable. We could possibly port our data into Reactable. We could possibly integrate Great Tables into Reactable, similar to how we did on the R side. Who knows what will happen, but it'll be coming up in our discussions.

So, Joe, did I possibly change your mind just a little bit? I feel really, like, arped or something. For those in the back who couldn't see, it was kind of like this. Awesome. Thank you so much. Thank you so much.

Thank you, Barret. One quick question. Someone said, one feature I like about DT is the ability to have the JS callbacks. I embed Highcharts JS objects that render in the browser inside each cell of the table. Is this possible in PyShiny?

Yes. So in the cells, we do have full Shiny input and output support. So it was really fun to have a cell of the table end up being a full Plotly plot. And, you know, it was kind of like trying to figure out what's possible, not whether we should. And so it's all supported for Shiny inputs and outputs in the cells.

Featured software