Javier Luraschi | Using pins with Python and JavaScript

Transcript

This transcript was generated automatically and may contain errors.

Hi everyone, I'm Javier Luraschi, and today I'm super excited to talk to you about using pins with JavaScript and Python. But why? You're probably already aware that R is a great programming language for data science, so why should we care?

Well, let me show you something. Stack Overflow runs a developer survey, and it shows that the five most popular programming languages are JavaScript, HTML/CSS, SQL, Python, and Java. These happen to be the languages that run most of the world's software. R is not on that list: it sits at 5.7%, compared to 67.7% for JavaScript. So the likelihood that you will have to interoperate with JavaScript, or collaborate with someone who loves JavaScript, Python, or SQL, is quite high. So how can we get R to interoperate with these languages, and how can we, as R data scientists, collaborate with others? That's what I want to explore in this talk.

And the whole point of this talk is to introduce you to a new project, PinsJS, which is a reimplementation of the R pins package in JavaScript that also supports Python.

So what does this look like? Here we have some HTML code, and it's quite simple. It declares that this HTML file needs the pins library and that it runs in the browser. Then we define some JavaScript callbacks that use pins. In this case, we call board register to register a local board. Next, we create a dataset, which is just the number 42, and save it to the local board. Finally, we retrieve the value and print it into the web page inside a div that holds our result.
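The PinsJS calls themselves are only described in words here, so as a language-neutral illustration of the idea (a board is a named store you pin values to and read back from), below is a minimal sketch of a toy local board in Python that keeps each pin as a JSON file in a directory. The class LocalBoard and its pin and pin_get methods are hypothetical names chosen for this sketch; they are not the PinsJS or pins API.

```python
import json
import tempfile
from pathlib import Path


class LocalBoard:
    """Hypothetical toy board: each pin is stored as <root>/<name>.json."""

    def __init__(self, root) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def pin(self, value, name: str) -> None:
        # Serialize the value to a JSON file named after the pin
        (self.root / f"{name}.json").write_text(json.dumps(value))

    def pin_get(self, name: str):
        # Read the JSON file back and return the original value
        return json.loads((self.root / f"{name}.json").read_text())


with tempfile.TemporaryDirectory() as tmp:
    board = LocalBoard(tmp)
    board.pin(42, "answer")          # save the dataset (just the number 42)
    result = board.pin_get("answer")  # retrieve it again

print(result)
```

Storing pins as plain JSON keeps the sketch readable from any language, which is the spirit of sharing one board across R, JavaScript, and Python.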

We can make this a little more involved. Rather than creating a single dataset, we can create multiple datasets in JavaScript, say with a for loop from 1 to 10, and push them into the local board. Then we can search them and build, perhaps, a paged table in JavaScript that shows all the pins we have stored, which you can see on the top right. And we can go one step further.
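To make the loop-then-search pattern concrete, here is a minimal sketch in Python using a hypothetical in-memory board. MemoryBoard, pin, and pin_find are illustrative names (pin_find only echoes the R function's name; its matching rule here is an assumption), and a plain text table stands in for the paged table from the demo.

```python
class MemoryBoard:
    """Hypothetical in-memory board used only to illustrate pinning and searching."""

    def __init__(self) -> None:
        self._pins = {}

    def pin(self, value, name, description=""):
        self._pins[name] = {"value": value, "description": description}

    def pin_find(self, text=""):
        # Case-insensitive substring match on the pin name or its description
        text = text.lower()
        return [
            {"name": name, "description": meta["description"]}
            for name, meta in sorted(self._pins.items())
            if text in name.lower() or text in meta["description"].lower()
        ]


board = MemoryBoard()
# Pin ten small datasets, mirroring the for loop from 1 to 10
for i in range(1, 11):
    board.pin(list(range(i)), f"dataset-{i:02d}", description=f"Dataset number {i}")

matches = board.pin_find("dataset")

# Render the search results as a simple text table
print(f"{'name':<12} description")
for row in matches:
    print(f"{row['name']:<12} {row['description']}")
```

Zero-padding the pin names keeps the sorted search results in numeric order.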

If you're really into state-of-the-art JavaScript, you can use a library called Babel, which transpiles modern JavaScript into backward-compatible JavaScript. That lets you use, for example, the proposed pipe operator: you get the iris JSON file using pins, pipe it into the function that reads the dataset from JSON, and then create a data table with the entire iris dataset, which is what we see on the top right. So again, you can use JavaScript to different degrees and use pins the way you would expect to use any JavaScript library.

Real-world use case: Game of Thrones

All right. This gives us a broad overview of how the pins package works, but we want to see it in action on a real use case. So let's think about this. Darla, Greg, and Monica get together over lunch, and they're trying to figure out who the most important character in the last Game of Thrones book is. They have a hunch that either Daenerys or maybe Tyrion is the most important character, but they can't figure out exactly which one.

Darla, being the competent data scientist we know she is, gets back to her office after lunch, launches R, and uses the pins package to find out whether there are any interesting datasets on Kaggle. Sure enough, she finds at least three: one containing all the scripts from the HBO series, another with the subtitles, and a third with the actual books, including all the dialogue and content from Game of Thrones. This looks interesting, so she uses pin_get() to retrieve that dataset and finds that it contains five files, one for each book. She loads the first book with pin_get(), looks at the head of the file, and, sure enough, it starts with the title, A Game of Thrones, Book One of A Song of Ice and Fire.

Great, she has the data. What she does next is something you're pretty familiar with: she can use tools like dplyr, tidytext, or even stringr with regular expressions, whatever skills you already have or are learning about during this conference. Darla uses those skills to transform the raw text into a tidy table that captures the relationships between characters. She finds, for instance, that one of the first interactions is between Addam Marbrand and Jaime Lannister, with a weight of three. Maybe Darla accomplished this by parsing each sentence, extracting the characters mentioned in it, and working out how they're connected. Either way, she creates a beautiful dataset containing most of the interactions. She reports back to Greg and Monica: "Hey, I think that in the fifth book there are a lot of relationships between Jon Snow, Tyrion, and other characters, so check it out." To share it, she registers an Amazon S3 board with pins and pins the cleaned-up dataset there for Greg and Monica.
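The talk only guesses at how Darla built her table ("maybe... by parsing each sentence"). Here is a minimal sketch of that guess in Python: split the text into sentences, check which known character names appear in each one, and count how often each pair shares a sentence; the count becomes the edge weight. The helper name, the naive regex sentence splitter, the exact-substring name matching, and the made-up sample sentences are all assumptions for illustration, not the method behind the Kaggle dataset.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrence_edges(text, characters):
    """Hypothetical helper: count how often each pair of characters shares a sentence.

    Sentences are split naively after . ! or ?, and names are matched as
    exact, case-sensitive substrings.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text)
    weights = Counter()
    for sentence in sentences:
        present = sorted({name for name in characters if name in sentence})
        # Every unordered pair of characters in the same sentence gets +1 weight
        for source, target in combinations(present, 2):
            weights[(source, target)] += 1
    # Tidy edge list: one row per (source, target) pair with its weight
    return [
        {"source": s, "target": t, "weight": w}
        for (s, t), w in sorted(weights.items())
    ]


# Made-up sample text, only to exercise the function
sample = (
    "Jaime Lannister rode beside Addam Marbrand. "
    "Addam Marbrand laughed at Jaime Lannister. "
    "Tyrion watched. Jaime Lannister and Addam Marbrand talked with Tyrion."
)
characters = ["Addam Marbrand", "Jaime Lannister", "Tyrion"]

edges = cooccurrence_edges(sample, characters)
for edge in edges:
    print(edge)
```

A real pipeline would also need name aliases (for example, "Ned" and "Eddard Stark") and a better sentence tokenizer, which is where tools like tidytext or stringr come in.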

What's great is that when Greg hears the news, he's super excited, because Greg isn't that interested in doing data science, but he's very interested in creating intuitive visualizations that help us understand how data behaves. So he looks at Darla's dataset, fires up Sublime Text or Visual Studio Code or whatever editor he uses, loads the pins package, and retrieves the dataset from S3. As you can see, he does this with just two lines of code: he gets the data, loads it, and he's good to go. Then he thinks, well, maybe I can use D3, a modern data visualization library for JavaScript, to create a network visualization of how all these characters relate to each other. Sure enough, he uses his skills to create this graph, which looks quite compelling. As you can see in the graphic Greg created, there are two major components. One, at the top, surrounds Jon Snow with characters like Theon Greyjoy and Stannis Baratheon. The other surrounds Daenerys Targaryen, with characters like Tyrion Lannister and Cersei. This is interesting because it brings a different type of skill: a great-looking, intuitive graphic that anyone can understand. Perhaps it's interactive, and it invites other people to understand this dataset through Greg's and Darla's skills combined.
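Greg's code isn't shown. Many D3 force-directed examples read a JSON object with a nodes array and a links array whose source and target refer to node ids. As a minimal sketch, assuming Darla's pinned table has source, target, and weight columns (those column names, the helper name, and the made-up weights below are assumptions), this Python function reshapes an edge list into that nodes/links layout, which a JavaScript visualization could then load.

```python
import json


def to_d3_graph(edges):
    """Hypothetical helper: reshape a tidy edge list into a nodes/links object.

    Each edge is assumed to be a dict with 'source', 'target', and 'weight' keys.
    """
    node_ids = sorted({e["source"] for e in edges} | {e["target"] for e in edges})
    return {
        "nodes": [{"id": node_id} for node_id in node_ids],
        "links": [
            {"source": e["source"], "target": e["target"], "value": e["weight"]}
            for e in edges
        ],
    }


# Made-up weights, only to exercise the function
edges = [
    {"source": "Jon-Snow", "target": "Stannis-Baratheon", "weight": 2},
    {"source": "Jon-Snow", "target": "Theon-Greyjoy", "weight": 1},
    {"source": "Daenerys-Targaryen", "target": "Tyrion-Lannister", "weight": 1},
]
graph = to_d3_graph(edges)
print(json.dumps(graph, indent=2))
```

Because links refer to nodes by id, the JavaScript side would typically tell its link force to look up nodes by that id field.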

Monica is also super excited about this. She looks at the dataset and thinks, wow, I really need to get into this. But what's on her mind is: I want to know exactly who the most important character is. Is it Jon Snow or Daenerys? I want a numeric value that tells me which character is the most important. With pins, she can load the pins library in her Python session, retrieve the dataset that Darla carefully created, and load the characters. Again, with three lines of code she's ready to go and can do more analysis. Taking Darla's dataset and Greg's inspiration, it pops into her head that maybe she should use some graph processing. So she loads the NetworkX library, which lets her process graph datasets, and uses a concept called degree centrality. The way she interprets it, finding the degree centrality of each character in this Game of Thrones graph could tell her which character is most central in each of the books. After running her script, she finds that in the first book, Ned Stark is the most important character. In the fifth book, though, the race between Jon Snow and Daenerys is pretty close: Jon Snow is slightly more central, with a score of 0.19, while Daenerys Targaryen has a score of 0.18. So she concludes that perhaps, objectively, Jon Snow is the most important character in the series, or at least in the last book.
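NetworkX's degree_centrality scores each node by the fraction of the other nodes it is connected to, that is, its degree divided by n - 1, where n is the number of nodes. To show what scores like 0.19 and 0.18 measure, here is a dependency-free Python sketch of that same formula on a small made-up graph; the function below is a hand-written illustration, not Monica's script. Like degree centrality in general, it ignores edge weights.

```python
def degree_centrality(edges):
    """Degree centrality for an undirected graph: degree / (n - 1).

    Edge weights are not used, self-loops are skipped, and duplicate edges
    count once, as in an unweighted simple graph.
    """
    neighbors = {}
    for source, target in edges:
        neighbors.setdefault(source, set())
        neighbors.setdefault(target, set())
        if source != target:
            neighbors[source].add(target)
            neighbors[target].add(source)
    n = len(neighbors)
    if n <= 1:
        # A graph with zero or one node: every node gets a score of 1
        return {node: 1.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


# Made-up toy graph, not the Game of Thrones data
edges = [
    ("Jon-Snow", "Samwell-Tarly"),
    ("Jon-Snow", "Stannis-Baratheon"),
    ("Jon-Snow", "Daenerys-Targaryen"),
    ("Daenerys-Targaryen", "Tyrion-Lannister"),
]
scores = degree_centrality(edges)
most_central = max(scores, key=scores.get)

for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name:<20} {score:.2f}")
```

In this toy graph, Jon-Snow touches 3 of the 4 other nodes, so his score is 3 / 4 = 0.75. With NetworkX installed, building an nx.Graph from the same edges and calling nx.degree_centrality should return the same values.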

But more importantly, what we've just seen is a way for three people, with three different technologies and three very different sets of skills, to collaborate using technologies they all love and common libraries they can reuse, so that together they can tackle bigger, more complex data science problems.

So I'm really excited to see what you do with this new library. For more information, please visit pinsjs.github.io. This is a community library, and it definitely needs the community's help to extend support beyond S3, local boards, and RStudio Connect to boards like Kaggle and GitHub. I also want to thank the people who have been involved in this project, like Natalia Stefanova and Michael Calleghan. Thank you so much for listening to this talk, and I hope you're ready to start sharing your datasets.

Javier Luraschi | Using pins with Python and JavaScript | RStudio
