RStudio's {pins} package: what it is, how it works, and what it can do for you! || RStudio

Transcript

This transcript was generated automatically and may contain errors.

So pins provides a way to easily share things, either across projects or across people. What ends up happening oftentimes in data science work is that we have some asset, and maybe it's data or it's a model or some other R object. We say anything that can be serialized in R. And we either need to reuse this asset later on downstream or across different projects, or we might need to share it with different members of our team. So think about how you might stick something on a corkboard so you can keep it handy, right? Or you'd stick a flyer on a community bulletin board at the grocery store, right? So that other people can see it. And so the pins package pins objects to what's called a board.

This is analogous to this corkboard idea. And so this board is this place where files are written and where they're read from. Examples of boards: you can have a board that's S3 storage, or a shared folder like a network drive or Dropbox. You can have your RStudio Connect be a board, a SharePoint site be a board. Really, it's just a file store, a file share. Nothing magical about that. It's just that pins has a nomenclature and a way of organizing things in this place on the board so that you can interact with them through the package.
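As a rough illustration of the board idea described above, here is a minimal sketch in R. It assumes pins version 1.0 or later is installed; the folder path and the pin name `mtcars-sample` are made up for the example, and a temporary directory stands in for a real shared folder.

```r
library(pins)

# Hypothetical location: a temp directory standing in for a shared folder
# such as a network drive or a synced Dropbox folder.
board_path <- file.path(tempdir(), "shared-board")
dir.create(board_path, showWarnings = FALSE)

# A folder board is just a directory that pins organizes for you.
# Other constructors (board_s3(), board_connect()) follow the same pattern
# but need credentials, so they are not run here.
board <- board_folder(board_path)

# Pin an R object to the board under a name.
pin_write(board, head(mtcars), name = "mtcars-sample", type = "rds")

# Read it back, from this project or any other that can see the folder.
sample <- pin_read(board, "mtcars-sample")

# See what is currently pinned to the board.
pinned_names <- pin_list(board)
```

Pointing the same code at a different kind of board should only mean changing the line that creates `board`; the write and read calls stay the same.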

Okay. So what I'm hearing is I can pin all kinds of things in all kinds of locations.

Yes.

It works nicely with Connect, but there's nothing specifically implemented in pins for Connect. There are things that Connect does to make life for pins a little bit better. Connect will give you access to the same content sharing settings that you have for any piece of content on Connect. So I can put my pin on Connect and I can specify who should be able to access it and share it. And then Connect does have a nice preview of the pinned object. I like that feature, but that's a Connect implementation for how it reads pins, nothing specific in the pins package itself.

Pins use cases

So if I work by myself, if for whatever reason I am a team of one, how might I use pins? A lot comes down to what are your pain points and what do you have available to you already for storing data, sharing data, accessing data downstream, things like that. Right. So in your team of one, do you need to reuse something in other work? Right. Do you have like a reference table or, you know, some kind of other information that multiple projects are going to use? You might need to come back to this.

Pins could be a pretty good fit for that. Right. Or, you know, do you just not have a convenient place to put things? You know, pins comes in and gets to be really helpful when you just maybe don't have another place to store data. Sometimes, you know, teams that don't have ready access to a database find pins to be pretty helpful because it kind of makes them self-sufficient. They can be more autonomous in this way. If you don't have the mechanism to either get ahold of a database or have one set up, or even if the data that you're working with isn't, I'm sorry, worthy of being in the database, then you can use a pin as this nice location where it's organized, it's versioned. I can work through cached versions of it as well. So things go faster.

And it helps to alleviate other pain points. If putting things in Dropbox or in GitHub is working for you already, then, you know, you don't need to bring this tool into your toolbox per se. It's sort of a question of what pain points you have and what needs you maybe have in terms of being able to track versions or share things more readily.

Replacing final_final_01_noreallyfinal.xls

That was sort of my entry point for pins too, is working with Excel files just like that. Right. And what do you do when you've got this Excel file in your workflow? At some point, you're going to get another version of it. Right. And either on the file server where the file's stored or in your code, every time there's a new version, you're going to go in there and rename it. Right. And so it's going to be final version one or latest copy or whatever. Right. Final, final, final.

Right. With your initials on the end. So somewhere along the line, you're going to be putting some kind of hacked versioning nomenclature on it. And this is introducing the possibility for error, of course. Right. How many times have you gone in and run your whole analysis and realized, oh, I forgot to change the read_csv to the latest version? Right. Or, you know, you actually have more rigor and go to the file server and change the file name so that the latest version is always called whatever, you know, data.csv, and then everything else is archived. In some form or fashion, you're implementing this hacky way of keeping things current. Whereas with pins, right, you can pin that CSV. And so pins has different options for versioning. But, you know, fundamentally, you can have pins always pull the latest version.

And so whenever, you know, your code just says pin_read, you know, data from my pin board, it's always going to pull the latest version. Or if you want to get specific, you can say, I want this particular version each and every time, so that even if new versions come online, you're always referencing the specific one.
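The difference between reading the latest version and reading one specific version can be sketched roughly as follows. This assumes pins version 1.0 or later; the pin name `reference-table` and the toy data are invented for the example, and a temporary board is used in place of a real shared one.

```r
library(pins)

# A temporary board with versioning turned on (an illustrative stand-in
# for a shared folder, S3 bucket, or Connect server).
board <- board_temp(versioned = TRUE)

# First delivery of the file.
v1 <- data.frame(id = 1:3, value = c(10, 20, 30))
pin_write(board, v1, name = "reference-table", type = "csv")

# Version ids carry a timestamp, so wait a moment to keep the order unambiguous.
Sys.sleep(1)

# A newer delivery under the same name: no renaming to final_final.
v2 <- data.frame(id = 1:3, value = c(11, 21, 31))
pin_write(board, v2, name = "reference-table", type = "csv")

# List the stored versions (one row per version).
versions <- pin_versions(board, "reference-table")

# With no version given, pin_read() returns the most recent one.
latest <- pin_read(board, "reference-table")

# Pass a version id to keep referencing the same data even after updates.
first_version <- versions$version[order(versions$created)][1]
pinned <- pin_read(board, "reference-table", version = first_version)
```

Downstream code that should always follow the newest data would use the first form; an analysis that has to stay reproducible would record the version id and use the second.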
