
Wrangling data for a Shiny app in Python || Michael Chow || Posit
Shiny makes it easy to build interactive web applications with the power of Python’s data and scientific stack. Learn more about Shiny for Python: https://shiny.rstudio.com/py/ Check out our interactive Shiny for Python examples: https://shinylive.io/py/examples/ Content: Michael Chow (@chowthedog) Producer: Jesse Mostipak (@kierisi) Editing and Motion Design: Tony Pelleriti (@TonyPelleriti)
image: thumbnail.jpg
Transcript
This transcript was generated automatically and may contain errors.
So Dave Robinson did a live analysis of this dog breed data, and he made a Shiny app out of it. So maybe I can walk through what it looks like in PyShiny and try to translate some of that. I watched him work with this data, so I know a little bit about it. For some context, I think there's a little bit of cleaning that has to happen, and then some reshaping.
The interesting thing is, as I understand it, there are three pieces of data. You have these breed traits, so for each breed of dog, it's rated on a number of things, like how good are you with young children. Apparently retrievers are really good; Norwegian Lundehunds are not the best. Then there's the trait description. I think this is a pretty wild one. This is just a written-out description, and I actually love this: for affectionate with family, if you get a five, it means you're lovey-dovey, which I find to be just great. So yeah, retrievers: lovey-dovey. And then breed rank all is a little bit different. This is rankings of the dogs from 2013 to 2020, basically over time. What we can look at is maybe comparing dogs across traits, and working on a Shiny app for that, because there are a lot of dogs here if you look in the breed traits. So it'd be nice if a user could choose the dogs they want to look at and be able to compare them in a plot.
Introducing siuba for data wrangling
There's a little bit of wrangling. Here's a weird thing: I can either wrangle it in siuba or pandas. For me siuba is the most fun, and I'm pretty used to it. Basically this is a port of dplyr to Python, and I've come to rely on it as a thing that wraps pandas and does a little bit on top of pandas to make life a little easier. The way it works is pretty similar to dplyr, with a couple of twists. For example, if I want to find a dog that's not very affectionate with family (these column names are crazy, I might have to fix that), I could filter to where it's one. So okay, the Anatolian Shepherd Dogs, apparently they're very independent.
So the key with siuba is that with dplyr it would look pretty similar: you'd have a pipe, and then you'd have the column name just by itself. The big differences are that now the pipe is a greater-than greater-than sign (>>), and we're using an underscore to refer to the columns rather than referring to them directly. This is just because it's way easier for me to keep thinking in dplyr than to switch. And it's not that different from pandas. In pandas you'd take breed traits, index it by the affectionate with family column, and then do a comparison like this. There are a few ways to do it. So the translations to pandas aren't crazy, but it just cuts out some writing, basically.
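For reference, the pandas side of that comparison can be sketched as below. The tiny data frame and its values are made up for illustration, and the column name is assumed to be already in snake case. The siuba form described in the talk appears only as a comment so that the block runs with pandas alone.

```python
import pandas as pd

# Made-up stand-in for the breed traits table (values are illustrative only).
breed_traits = pd.DataFrame(
    {
        "breed": ["Breed A", "Breed B", "Breed C"],
        "affectionate_with_family": [5, 1, 3],
    }
)

# siuba form described in the talk (not run here):
#   breed_traits >> filter(_.affectionate_with_family == 1)

# pandas: build a boolean mask, then index the frame with it.
independent = breed_traits[breed_traits["affectionate_with_family"] == 1]

# pandas alternative: the same filter written as a query string.
independent_q = breed_traits.query("affectionate_with_family == 1")

print(independent)
```

Both pandas versions return the same rows; the siuba version mainly saves repeating the data frame name.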
Reshaping data from wide to long
The other useful thing, and I'm cheating because I saw Dave work with this, is reshaping. It'd be nice to tidy this so that a lot of these columns go from wider to longer. So it'd be nice if a lot of these column names went into a column called measure, and then these values went into a column called, say, value. One of the big things we have to do, though: these are all numeric, so it makes sense that we could reshape them, but we've got to drop things like this that use a different kind of measurement.
So I'll do select, and I'll use the minus sign to remove things. Actually, first I think I'm going to clean up these column names. There's a pyjanitor library, which is nice, that can clean names. Or I guess we can... I always forget how to do this. I think you could do something like this. This is crazy, but I think this will basically convert to snake case. Let's see. Did something bad. Columns, yeah. So this is pretty gnarly pandas code, but the trick is that this gives you back an Index of columns, and Indexes and Series in pandas have these string methods, so you do str.lower and then you can keep going.
So in other situations I would probably just use pyjanitor. It's nice that in the tidyverse there's just a clean names function to get rid of a lot of this. But I'm just going to define it here. Oh, that's bad. I'm just going to put it here. I think I'll just mutate it, because life's too short. That should do it, and I'll just return it. And I have Black formatting set up, which is kind of weird. I might turn it off.
Let's see. Okay, we're back. All right, sorry. So that's weird; Black is what's doing a lot of this formatting. The weird thing between R and Python is, in R it's kind of nice, you can quickly... One funky thing about Python is that to do the pipes with nice formatting, you kind of have to use these parentheses around them. So I'm just going to pipe this using this pandas method.
So it's great. Now we've got nicely formatted names, and all of that is just so that in this select I can now drop coat type and drop coat length.
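The column cleanup described above can be sketched in plain pandas. The rows are made up, the column names only imitate the messy headers shown in the talk, and clean_names is a hypothetical helper name. Unlike the speaker's version, which mutates the frame in place, this sketch works on a copy.

```python
import pandas as pd

# Made-up rows; the column names imitate the messy headers in the talk.
raw = pd.DataFrame(
    {
        "Breed": ["Breed A", "Breed B"],
        "Affectionate With Family": [5, 3],
        "Coat Type": ["Double", "Smooth"],
        "Coat Length": ["Short", "Short"],
    }
)


def clean_names(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with lower-case, underscore-separated column names."""
    out = df.copy()
    # df.columns is an Index, which has the same .str methods as a Series.
    out.columns = out.columns.str.lower().str.replace(" ", "_", regex=False)
    return out


# Wrapping the chain in parentheses lets each step sit on its own line.
traits = (
    raw
    .pipe(clean_names)
    .drop(columns=["coat_type", "coat_length"])
)

print(traits.columns.tolist())
```

The parentheses are what make the multi-line chain legal Python, which is the formatting point made in the talk.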
Okay, so I think now we have everything and we can reshape. What I'll do is gather it. The first argument is the name of the new column that all of these column names will be put in, so the measure column, and then the value column, and then we just say everything we want reshaped. The negative means we want everything except for the breed column to be reshaped.
So now we've gathered. We've taken all these columns and basically... I don't know how to use human words for that. So now each value is right here, each of these cells, and the column names are in this measure column. This will be useful for the app (I'm going to call this traits long, I guess) because it makes it a lot easier to select things. Now we can just filter to keep specific columns or filter to keep specific breeds. I think that's the big dplyr thing: before, we could filter by breed or select a column. The beauty of gathering is that now we can filter on either of these things, breed or measure. I'm going to rename measure to trait. So all right, great.
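A rough pandas equivalent of the gather step is melt. This is a sketch on made-up values, not the code from the talk; the trait and value column names follow the rename described above.

```python
import pandas as pd

# Made-up wide table: one row per breed, one column per trait.
traits = pd.DataFrame(
    {
        "breed": ["Breed A", "Breed B"],
        "affectionate_with_family": [5, 3],
        "good_with_young_children": [5, 4],
    }
)

# Every column except breed is stacked into trait/value pairs.
traits_long = traits.melt(id_vars="breed", var_name="trait", value_name="value")

# After reshaping, a trait is chosen with an ordinary row filter,
# exactly the same way a breed would be.
one_trait = traits_long[traits_long["trait"] == "good_with_young_children"]

print(traits_long)
```

The long table has one row per breed and trait, which is why both choices can later be driven by the same kind of filter in the app.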
Visualizing with plotnine
So now we have this data and we can visualize it. I'm going to use plotnine. plotnine is a port of ggplot2. It's super nice; I think it's been around for a long time, and it's a super faithful port. So basically all of this is me trying to squint my eyes and produce what seems like tidyverse code. All right, let's see: traits long, ggplot. Actually, we probably need to filter it, so let's get just a couple of breeds.
So I'm going to use the pandas method isin, so that's like "get it from this list". Let's say Labrador Retrievers and French Bulldogs. I'll double check that worked. Oh, something bad happened. Let me see. I wonder if this needs to somehow see it as a list. Okay, so this is a weird thing. This is a great data wrangling problem. This is a character that appears as a space but isn't one. Sometimes you can Google these things and people will explain what is happening, but usually I just string replace them. So let's see, I'm just going to call it remove or replace space or something.
Like a column... wait, I don't care, I'm just going to put it in the mutate. Okay, so let's put it here for now just to see. We selected, we gathered, so let's just replace it here and see how this goes. Mutate, breed equals string replace. I think it's just this character. Let's see if this works. Okay, so yeah, visibly no change, but I'm guessing this will let us... let's see. Yeah, great. That was a real scary issue. Cool, okay, so this got it.
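The failure and the fix can be reproduced on made-up data. The talk does not say which character it was, so this sketch assumes a non-breaking space (U+00A0), a common culprit for a "space that is not a space".

```python
import pandas as pd

NBSP = "\u00a0"  # assumption: the invisible character is a non-breaking space

# Made-up rows; two breed names contain the look-alike character.
traits_long = pd.DataFrame(
    {
        "breed": [f"French{NBSP}Bulldogs", f"Breed{NBSP}B", "Beagles"],
        "trait": ["affectionate_with_family"] * 3,
        "value": [5, 3, 4],
    }
)

wanted = ["French Bulldogs", "Beagles"]

# Before cleaning: the name typed with a normal space does not match.
before = traits_long[traits_long["breed"].isin(wanted)]

# Replace the look-alike character with a plain space, then filter again.
cleaned = traits_long.assign(
    breed=traits_long["breed"].str.replace(NBSP, " ", regex=False)
)
after = cleaned[cleaned["breed"].isin(wanted)]

print(len(before), len(after))  # prints: 1 2
```

Printing a suspicious value with repr(), for example repr(traits_long["breed"][0]), is one way to see which character is actually there.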
So I'm going to filter. I'm just going to keep, say, affectionate with family and good with young children. Okay, I'm going to Black format this. It's using this, what's it called, Jupyter code formatter library. Cool, so now we've got two traits and two dog breeds, and we can plot it. So you plot on the x-axis probably trait, on the y-axis probably value, and then I'm going to make a bar chart, and then I'm going to make each breed its own plot. Let's make that. Oh, something bad happened. Oh, I'm sorry: geom_bar counts up the values, geom_col makes a column. Okay, so this isn't super exciting, they're fives on both of these, but this is maybe a nice example because there's a lot of room for improvement.
So one thing I like to do is fill by the trait. This just recodes the same information but makes it easier to see. The other weird thing is this text is horizontal, so we can change it in the theme. I think it's axis_text_x, and then this kind of tricky ggplot thing, element_text with angle equals 45. And then the last thing, this is probably the most common thing I do in ggplot: notice that it'd be nice if the end of this label matched up with this tick, and I think that's just hjust equals 1. Honestly, this very snippet I use so much that I just reflexively slam it in every time. So okay, we've got plots. We did a little bit of cleaning, a little reshaping.
Building the Shiny app
So this is something Dave did that I think is perfect. This would be a nice starter for a Shiny app: we did a little analysis, we made our plot, but now it would be so nice if we could select the breeds and select the traits. There's this weird thing if you watch people build Shiny apps. I think the most intriguing part is they often have analysis code somewhere and then Shiny app code somewhere else. I think this is actually one of the hardest parts of starting with Shiny: the workflow, having your analysis on one side and your app on the other.
Type jupytext... no, I don't have it. Let me install this really fast. What I often end up doing is, so this is the code, and what I can do is use Jupytext. One thing I like to do sometimes is I just put the Shiny app at the bottom of the notebook. The way I do this is you can pair the notebook with something called a light script, and notice that when I save it, it produces a Python file. So now basically I just want to keep exploring, but I'm going to tuck a Shiny app down below. I don't know if this is a good idea or if it'll even work. And I'm going to add a section called exploration. Okay, so now the trick is that I have my exploration code up here, and JupyterLab has this nice table of contents. So I just marked "this is my exploration code, this is my app", so I can jump between them. Then what I'm going to do is scaffold out a really tiny Shiny app, and then I'll just run the whole script, because life's too short. I kind of want to just do the exploration and the app in the same place.
Okay, so the key is that this doesn't do anything really, because Shiny needs the CLI, the command line interface, to run. But it's kind of nice that it's all in a notebook, and then I'll run it separately. You could also do this in the JupyterLab terminal; that might be good, I should do it down there. So, dog app. I'm just going to run... oh, I don't remember how the CLI works. shiny run, I think. I'll keep looking at the help docs. So the thing is, I want to run shiny run, but then I also want to pass this reload option so that it auto-reloads. So: shiny run analysis.py --reload. Okay, I might need to also set the port, depending on... okay. So basically what's happening now is it executed this notebook. Just to back up: this notebook, through Jupytext, is now also saved as a Python file. Every time I click save, it syncs with the Python file. The Shiny CLI is just running that file, and the reason that's useful to me is I'll put a lot of exploration code in here, and I'll define the app down here, and then I'll just check over here to see what the app looks like.
So all right, and it's kind of convenient, because with a lot of Shiny apps I just dump this stuff straight in, and then I can clean it up and move it into its own file if it's useful. For example, we already have traits long, so I could even just put this in. Let's do a render, and let's see, return traits long. Where we are right now is that rendering tables is still in the works, so right now it's rendered as a UI. Let's just see what happens. Why don't I just drop in the plot instead. So I'm just going to dump in this plot code, just paste it in.
Put this here, and I think this should just show up. I'll do render.plot and output_plot. I'll change this: this isn't a table anymore, this is a trait graph. See, I love it, it still has this rogue input. So I'm not seeing anything, and that tells me there's probably an error or something. I'll look here. I don't see anything. It's probably something I did. I think I didn't return it. All right, so this is the graph. We have this kind of useless input, and we have this hard-coded thing. What I'd probably do next is swap out these hard-coded things for a dynamic input.
Wiring up interactive inputs
So you do input_selectize, and I'm going to call this traits, and this is the label that'll show up on the input. Then there are so many options here. What we can do is multiple equals True, and choices equals... what we could do here is actually set these as the choices to start, just to make sure things wire up. Let's just see that it worked. Cool, so the choices are here. They're not wired up yet, but this is a nice sanity check for me that it's ready to go. So now I'll just swap this out for input.traits, this thing here, just to link it up. All right, so the move was: I set the choices, I copied the choices from here to here, and then I swapped in the interactive piece. Cool, so now we can see I can toggle it, which is great.
This is still kind of boring because I just hard-coded two choices. A nice move here is you can actually just set the choices based off the data. So if I go to traits long, dot traits, unique... let's show this. It's trait. Okay, so this just gets me each unique trait. The last thing I need is that Shiny, I think, wants it as a list, so I'll just use this pandas method, tolist. So what I'll do is call it options traits or something. Trait options, trait. So now it's all of the traits in the data. Now we can see, for a French Bulldog or a retriever, different things. I kind of want to make this a little bigger. Nice, so we have the traits wired up. Let me go ahead and try to set the breeds now. I'm just going to do the exact same thing but drop in breed. So I'm going to put this after trait, maybe, and paste in the exact same thing. And because it worked out okay for the other one, I'm just going to jump straight to it: input.breeds. So this is not right. I did something bad. Let's see. Oh, I put the traits here. So options breed. Yeah, that's cool. I'm partial to beagles. Do we have corgis? We've got multiple kinds.
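Deriving the choices from the data, as described above, takes one line per input in pandas. The sketch below uses made-up rows; trait_options, breed_options and select_rows are hypothetical names. The helper shows the filter the plot function would apply to the two selections.

```python
import pandas as pd

# Made-up long-format data standing in for traits_long.
traits_long = pd.DataFrame(
    {
        "breed": ["Beagles", "Beagles", "French Bulldogs", "French Bulldogs"],
        "trait": ["affectionate_with_family", "good_with_young_children"] * 2,
        "value": [3, 5, 5, 5],
    }
)

# unique() returns an array; tolist() turns it into a plain Python list,
# which is a convenient form to pass as the choices of an input.
trait_options = traits_long["trait"].unique().tolist()
breed_options = traits_long["breed"].unique().tolist()


def select_rows(df: pd.DataFrame, breeds, traits) -> pd.DataFrame:
    """Keep only the rows matching the selected breeds and traits."""
    return df[df["breed"].isin(breeds) & df["trait"].isin(traits)]


picked = select_rows(traits_long, ["Beagles"], ["good_with_young_children"])
print(picked)
```

Mixing up the two option lists, as happens in the talk, just shows the wrong choices in an input; nothing errors, so it is worth checking each list once.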
Yeah, I love that. Pembroke Welsh Corgis seem a little more affectionate, you know. Cardigan Welsh Corgi is a little bit too independent, I don't know. Yeah, that's nice. I think the autocomplete is great, being able to search super fast.

