
Ryan Timpe | Learning R with humorous side projects | RStudio (2020)
What should you name a new dinosaur discovery, according to neural networks? Which season of The Golden Girls should you watch when playing a drinking game? How can you build a LEGO set for the lowest price? R is constantly evolving, so as users, we’re constantly learning. Over the past few years, I’ve found that working on side projects is great for hands-on learning - and for me, the more absurd the project, the better. Side projects provide a safe, low-stakes environment to learn new packages and methodologies before using them in work or in production. Sharing those projects can help publicize the package and increase its accessibility, benefiting both the original author and future users. In this talk, I’ll share my experiences with side projects for learning state-of-the-art data science tools and growing as an R user, including how one project helped me land my dream job
Transcript
This transcript was generated automatically and may contain errors.
I'm not nervous. I'm just excited. All right
So my name is Ryan Timpe. I'm a senior data scientist at the Lego Group on the marketing effectiveness team. Basically, I build marketing mix models in R to help the Lego Group optimize all their different forms of media spending. You can find me on Twitter at @ryantimpe, where I like to tweet about my dog, dinosaurs, and all the fun things I do in R. You can also check out my Twitter for exclusive bonus content to this talk that did not make the cut.
So I'm here today to talk to you about learning R by creating side projects. Side projects have been a critical part of my learning experience in R and I want to share some of those experiences with you today. Whenever I'm interested in learning a new package or a new technical concept, I have to come up with a project design for me to learn it and then once I'm comfortable I can begin using it in my work.
So for example, two years ago when tidytext came out, I used data science to answer this really important question: which season of the Golden Girls should I watch when playing a drinking game?
Yeah, this is silly and stupid, but that's my point. R is always changing and we're always learning, so sometimes it feels like a struggle to keep up with all the new technology and packages. Over my eight or so years of using and learning R, I've found that side projects work really well for me to learn these new tools on my own terms. And then once I'm comfortable I can use them in my work.
So I like to come up with this really ridiculous question like the Golden Girls drinking game. And this works for me because I get to learn the new tool using data and a topic that I'm already familiar with so the only unknown for me is the new tool itself.
Learning tidytext with the Golden Girls drinking game
So this is what I did with tidytext. If you're not familiar, tidytext is a package by Julia Silge and David Robinson that can take text and turn it into a tidy data table. It's a really powerful tool that can make data out of anything, from literature to financial earnings reports or, in my case, TV show scripts. And I just love the idea of these two professional programmers out there making this really powerful tool for the R community and I'm just using it to get drunk.
But you know what? That worked, so I can make this look a lot more like data science with some charts.
So again if you and your friends are hanging out one night and playing the Golden Girls drinking game, which season should you watch to maximize your drink consumption? So if you're responsible and not familiar with drinking games you basically watch a television show and each time one of the characters performs a specific action you take a sip of your drink. So for the Golden Girls when Rose talks about her hometown of St. Olaf, Dorothy talks about her ex-husband Stan, or when any of the women eat cheesecake you're gonna drink.
Looking at this lovely beach-colored bar chart on the left, I proved, with data science, I remind you, that if you're going to play the Golden Girls drinking game and you want to drink the most, watch season 5. And that's because in the later episodes of the season Rose talks about her hometown a lot, and you're gonna have about 10 more drinks that season than in season 6.
That said, I'm not sure how good an idea it is to watch an entire season of a TV show for one drinking game. Definitely not healthy, so instead look at the other chart on the right. This shows the cumulative drinks per minute for each of the seasons, and here you can see that maybe seasons 4 or 6 are gonna be a better idea for you because they ramp up the drinks quickly in the first 100 minutes. If you watch season 5 you need to watch 400 minutes of the show, or 16 episodes, to exceed your drink consumption from season 4 or 6.
So this is how I learned tidytext. I hope you're proud of me.
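The counting behind those two charts can be sketched in a few lines. The talk's analysis was done in R with tidytext; what follows is only a Python stand-in for the same idea (one word per row, then count the trigger words), and the trigger list, the sample dialogue, and every function name here are assumptions made up for illustration, not the speaker's code or data.

```python
import re
from collections import Counter
from itertools import accumulate

# Hypothetical trigger words, loosely based on the rules described in the talk.
TRIGGERS = {"olaf", "stan", "cheesecake"}

# Made-up sample rows: (season, minute into the season, line of dialogue).
SCRIPT = [
    (1, 3, "Back in St. Olaf we had a herring festival."),
    (1, 10, "Stan called again. Who wants cheesecake?"),
    (2, 5, "Cheesecake fixes everything."),
]


def tidy_tokens(rows):
    """Yield one (season, minute, word) row per lowercased word."""
    for season, minute, text in rows:
        for word in re.findall(r"[a-z']+", text.lower()):
            yield season, minute, word


def drinks_per_season(rows):
    """Count trigger words per season (the bar chart)."""
    counts = Counter()
    for season, _minute, word in tidy_tokens(rows):
        if word in TRIGGERS:
            counts[season] += 1
    return dict(counts)


def cumulative_drinks(rows, season):
    """Return (minute, running drink total) pairs for one season (the line chart)."""
    per_minute = Counter()
    for row_season, minute, word in tidy_tokens(rows):
        if row_season == season and word in TRIGGERS:
            per_minute[minute] += 1
    minutes = sorted(per_minute)
    totals = accumulate(per_minute[m] for m in minutes)
    return list(zip(minutes, totals))
```

Calling `drinks_per_season(SCRIPT)` gives the per-season totals, and `cumulative_drinks(SCRIPT, 1)` gives the running total that a cumulative chart would plot for season 1.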
Now I use tidytext all the time in all my projects, both at work and for side projects. And I get it, the Golden Girls is a very old show and you might not be interested in that. So we can use tidytext for a lot of other things, like the Good Place drinking game. Because once you learn how to do it once, it's really easy to repeat.
So here you drink every time Eleanor says fork or Janet reminds someone that she's not a girl, and here you're gonna watch season 1. Or the Jurassic Park drinking game. Literally any drinking game: you give me the TV show or movie and I'll solve it for you using tidytext. And so here we now know that if you watch the two-hour movie you're gonna consume 80 drinks during the course of that movie.
Learning gganimate with Jurassic Park data
I already had the data because of a different mini side project I did when a different new package came out. I wanted to learn how to make animated ggplots with the gganimate package, again using data that means something to me. It's just way more fun. So I spent three days watching Jurassic Park. I paused the movie every few seconds to figure out which characters were in the scene, to count all the dinosaurs on the screen, and to jot down all the locations. And I did this all for data science.
So here we have animated character paths of the main characters in Jurassic Park and where they move throughout the movie. We have three maps: the globe, as they move from the Badlands where they're digging up the bones to the island itself; a map of the island in the middle, where they move between the different dinosaur exhibits; and a map of the visitor center with the interior scenes. And in the middle, every time a dinosaur eats one of the characters a little skull emoji pops up.
So this is a small, silly project, but I learned gganimate this way. I learned all the features and transition elements, and how and when to use them, and that set me up to be able to use it in some of my more serious work.
Building datasaurus: learning new tools to solve a fun problem
Other times I approach the learning experience the other way around. I dream up a really fun project I want to complete with R, but I don't have the tools to do so yet.
So take a look at this chart, this is a rolling average of some mortality data from the United States. It's generally decreasing. So that's a really good thing. Take another look though and look closely. Do you think maybe, does this chart look like a dinosaur to you? And spoiler alert. The answer is yes. Um, it looks just like a dinosaur.
So doodling on charts is a lot of fun. I do it a lot, especially with tablets. But for this project I wanted a lot of doodles, like thousands of them. So in this case getting a computer to doodle for me was gonna be way more fun. So here we have another dinosaur doodle drawn from this data, but this doodle was made with a lot of data science, and data science that can create any dinosaur from any data.
Yeah, problems you never knew you had.
So a few years ago it seemed like everyone out there was building Twitter bots. And a Twitter bot is a win-win, a computer does all the work and a human gets all the credit, all the Twitter likes and all the retweets and I wanted in on this. So I built datasaurus which takes a time series of data and it finds a dinosaur outline that's closely correlated with it. It then redraws a dinosaur using that time series as the outline, colors it in and displays it on this fun poster. R sends out a tweet, rinse and repeat every few hours forever and you have a Twitter bot.
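The matching step, finding an outline that is closely correlated with a time series, can be sketched as follows. The talk does not say how the correlation was computed, so this assumes a plain Pearson correlation after resampling each outline to the length of the series. It is written in Python rather than the R used for datasaurus, and the outline heights, the sample series, and all function names are invented for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def resample(values, n):
    """Linearly interpolate values onto n evenly spaced points."""
    if n == 1:
        return [values[0]]
    out = []
    for i in range(n):
        pos = i * (len(values) - 1) / (n - 1)
        lo = int(pos)
        hi = min(lo + 1, len(values) - 1)
        frac = pos - lo
        out.append(values[lo] * (1 - frac) + values[hi] * frac)
    return out


def best_outline(series, outlines):
    """Return the name of the outline most correlated with the series."""
    scores = {
        name: pearson(series, resample(heights, len(series)))
        for name, heights in outlines.items()
    }
    return max(scores, key=scores.get)


# Made-up outline heights, read left to right across each silhouette.
OUTLINES = {
    "long_neck": [9, 8, 6, 4, 3, 3, 2, 1],  # tall at the head, tapering tail
    "stegosaur": [1, 3, 6, 8, 8, 6, 3, 1],  # humped back in the middle
}

# Made-up, generally decreasing series, like the mortality chart in the talk.
SERIES = [100, 90, 75, 60, 50, 45, 30, 20]
MATCH = best_outline(SERIES, OUTLINES)
```

With this toy data the steadily falling series pairs with the outline that also falls from left to right, so `MATCH` ends up as the long-necked silhouette.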
The thing is when I set out to do this, I did not know how to do this. My knowledge of R at the time was very limited to data manipulation and regressions and I didn't really have the tools to accomplish my goals here. So doing this I learned a lot of new packages.
So I started with what I could do, and then I would plan out the next step whenever there was a roadblock. I would do some research to see what tools and packages were available to accomplish it, get it to work for datasaurus, learn the package along the way, and move on to the next one. Some examples: I used the Flickr API to actually get all these dinosaur images onto my computer.
geom_raster actually lets me draw the dinosaur on a ggplot. gridExtra arranges a lot of ggplots on the same chart. rvest, because I wanted trivia facts to make this more scientific, so on the bottom I had to scrape Wikipedia to display some facts. You see the fun color patterns on that? I had to relearn some basic trigonometry from high school, because that's all sines and cosines. rtweet and the Twitter API to put that into a tweet, and then batch processing so I did not have to hit the enter button every time I wanted to make one of these.
So the output of this is really silly, but I learned a ton of new tools that I use every day at my work as a data scientist. Not the dinosaur drawing part, but everything else. And so solving this silly problem just made me a much better data scientist.
Naming dinosaurs with deep learning
Before I move on, take one more look at that dinosaur. We used data science to make this really cool dinosaur. Right now it only exists on my computer, but pretend it's real for a second. We have a brand new problem: what do we name it? Or if you're a paleontologist and you are about to publish this brand new dinosaur discovery, what are you gonna call it? From what I can tell, and I'm not a paleontologist, naming a dinosaur is a huge privilege. There are these really great, powerful-sounding ones that use Greek and Latin, like Brontosaurus, which means thunder lizard, and Tyrannosaurus rex, which means the tyrant lizard king, and I want to name my dinosaur something really cool, too. But I don't know Greek and I don't know Latin.
So in this case, I'm going to introduce you to my friend deep learning and artificial intelligence, quite possibly the trendiest of the data sciences.
So for this side project, I used someone else's side project, because I saw someone do a really cool side project and I was like, I could do that too. And I did it. So Jacqueline Nolis, you may have seen her give a talk a few hours ago with Heather Nolis, published a really fun project using recurrent neural networks to generate new offensive license plates from a list of plates that had been banned in Arizona. So people in Arizona, I guess, tried to get some vanity license plates with bad words on them, Arizona said no way, and then kept a list of those words.
So she used that as her training set and she generated this list, and a small sample is there. And along with her output she shared her code and a blog post explaining some of the math and tricks behind the algorithm. Thanks to open source, I clicked the clone button on her repo and changed it to work for extinct reptile names.
So after a few minutes my computer spat out a list of brand-new never-before-seen awesome dinosaur names. So here we, I can't say them all. Uh here we have Sarasaurus, metroterosops, Dino-ryrosaurus, Alloraptor, all these really awesome sounding names that don't yet have an actual animal attached to them.
Um, but seriously, I like this project because it lowered the barriers of entry for me to begin using deep learning with Keras and R. So AI and deep learning are buzzwords in data science closely associated with advanced image recognition and natural language processing and that's true. But that association can make it super daunting for someone like me who is new to deep learning to actually start using a deep learning project. So when I saw something using deep learning as simple as license plates, I was really excited to leverage Jacqueline's work with my own data. So I can begin building my own models and begin using deep learning and Keras in my work.
Building the brickr package and landing a job
And then finally, side projects are great because sometimes they can become real things and land you a job. So a while before I joined my current company, I wanted to figure out how to use the tidyverse to build a Lego mosaic, and to build it as cheaply as possible. It's a relatively simple problem. You take an image, you pixelate it, and you change every pixel to a Lego color, so effectively you have bricks. Then you combine adjacent single pixels, or bricks, of the same color into larger bricks, and that saves you money. I shared this on Twitter and in a blog post, and people really liked it, and that motivated me to keep working on it. And so I kept on developing this.
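The mosaic steps just described (snap each pixel to a brick color, then merge neighbors of the same color) can be sketched as below. The original was built with the tidyverse in R; this is a simplified Python stand-in, not the package's algorithm. The palette names and RGB values are illustrative rather than official colors, merging is done only along rows, and the four-stud cap on brick length is an assumption chosen to keep the example small.

```python
# Hypothetical palette: names and RGB values are illustrative only.
PALETTE = {
    "white": (255, 255, 255),
    "black": (0, 0, 0),
    "red": (200, 30, 30),
}


def nearest_color(pixel):
    """Return the palette name with the smallest squared RGB distance."""
    return min(
        PALETTE,
        key=lambda name: sum((p - c) ** 2 for p, c in zip(pixel, PALETTE[name])),
    )


def merge_row(colors, max_len=4):
    """Greedily merge runs of equal colors into (color, length) bricks."""
    bricks = []
    for color in colors:
        if bricks and bricks[-1][0] == color and bricks[-1][1] < max_len:
            bricks[-1] = (color, bricks[-1][1] + 1)
        else:
            bricks.append((color, 1))
    return bricks


def mosaic(image):
    """Convert rows of RGB pixels into rows of (color, length) bricks."""
    return [merge_row([nearest_color(px) for px in row]) for row in image]


# A made-up 2 x 3 pixelated image.
IMAGE = [
    [(250, 250, 250), (240, 245, 250), (10, 5, 0)],
    [(190, 40, 35), (210, 20, 25), (205, 35, 30)],
]
BRICKS = mosaic(IMAGE)
```

Here six pixels collapse into three bricks, which is the cost saving mentioned above: fewer, longer bricks in place of many single studs.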
I added a bunch of new features, some that were requested and some that I just wanted to do. I learned a lot of packages to make it work, I figured out how to put it into a package of its own, and then eventually I ended up with brickr. So brickr is an R package, also a side project, that can take an image and turn it into Lego bricks in a ggplot. You feed the function an image and it returns a Lego version of it, as well as the instructions you need to build it and all the pieces you need to build it.
It also can take a list of instructions and then create a 3D Lego model for you. This in itself was a side project within a side project, because I really wanted to learn Tyler Morgan-Wall's rayshader package, but again using data I want to use, not his data. So instead of maps and topography and boring stuff, I used it to make Lego bricks. But in the process of doing this I learned so much about his package and how to make it work that in later versions I actually switched from relying on his package directly and instead use its underlying source code as a jumping-off point to get my project to work.
But to be honest, the cool factor is pretty high with this and the usefulness factor is pretty low. No, really low. Yeah. But developing this has really helped me to learn a ton of new things about R. There are so many tools needed to get a package working, and working well, and then to ensure that it builds correctly for many people and is as accessible as possible to as many people as possible. Doing this, and I'm still definitely working on it (this is not on CRAN because it's too much work), really challenges me as a programmer.
And then beyond that, building brickr as a side project really helped with my portfolio. Around the time I was working on this, the Lego Group decided to hire data scientists in the U.S. Let me just tell you this: already having a portfolio of work idolizing your future employer can do wonders for your job application.
And now I'm here standing in front of you today at this super serious conference talking about drinking games. I'm pretty sure I professionally peaked.
Closing thoughts
So there we have it. Working on these humorous side projects really opened some doors and developed me as an R programmer. Side projects provide a safe learning environment to enable you to set yourself up so you can succeed and learn on your own terms. And then building side projects can create new resources for yourself and others, making it so much easier for other people out there to learn from the work that you did. And I hope maybe I've inspired some of you to go home with all these new skills that you've learned today and will learn tomorrow, create some really cool side projects, and share them with me.
So thank you. I am happy to take any questions and while we wait. Oh, thank you.
Q&A
Okay, so we have time for a few questions. So I'm not tethered to it, okay, so it looks like one popular question is how do you find time for side projects? Yeah, I was expecting that. I mean, I'm definitely privileged because I have a lot of free time. I'm just saying, I don't have any kids right now. So that's a big thing. But I stopped playing video games at one point. And so I found a lot of free time, but that's it.
Another question we have, from Jacqueline Nolis, who inspired one of your projects: how did you make the background maps for the Jurassic Park gganimate? So I'm not a drawer, but I got a drawing program on my iPad and traced it as an SVG and then loaded the SVG into the ggplot, which I originally had a slide on but I had to cut it.
So maybe one final question and then we might get the next speaker set up. Have you made charts of the variation in how much you have to drink per episode? I mean the one we're showing on the screen right here, still up, is it? Yes, that's actually per episode for The Office. I haven't done it for the Golden Girls. But yeah, in theory, yes. But I've never played this game, or any of these games, just putting that out there. It's purely for academic research.
Well, I'd said that was the last question, but I have one more question that's really good and we have time. You okay? Okay. So where do you get inspiration for new side projects? I was a weird kid. These things come to me and I just have to do it. So like, I don't know. Yeah, it's my brain wiring. Sorry.
So then my question is are you a Golden Girls fan? Okay, good. Okay, good, so what is your favorite character? Okay, mine's Rose but yeah, you got to go with one, okay, thank you Ryan.
