Building R packages with devtools and usethis | RStudio

Transcript

This transcript was generated automatically and may contain errors.

All right. Howdy, everybody. Thanks for joining me today. We're just going to get kicked off here, wait a couple minutes for some folks to roll in, but excited to be talking to you today about building packages in R. My name is Tom Mock. I'm a customer enablement lead at RStudio, and I'll be talking a little bit about everything we're doing today.

And let's see what time it is. We're about 9 a.m. Central Standard Time, where I am in Texas, but it'd be great to hear from the chat in terms of where everyone else is coming from or if you're in the U.S. or the U.K. or Africa or Europe or wherever else you're from. I'll probably wait another minute or so, and then we'll jump into the slides. I'll share that link real quick in terms of the GitHub repository with all of my code and slides from today, and then a link directly to the slides themselves.

Awesome. Looks like some folks from the U.S. We've got New Jersey, Maine, some folks from Canada, Switzerland, Chicago, Mumbai, Brazil. Fantastic. It's very cool to get kind of the international experience here, so thanks for dialing in from wherever you are. As a reminder, this is a live stream, so we'll kind of take things as they come, but it will be hosted up on YouTube, on RStudio's YouTube in the future. So if you do have to step out or if you have a colleague who can't make it today, feel free to send them the link in the future.

Awesome. So cool to see all the different groups, and thanks again for joining me today. We're about two minutes after the hour, so I'm going to go ahead and get started. As much as possible, feel free to post questions in the chat. I'll see as much as possible, try to answer some of those. And then again, the slides themselves are going to be linked to in the slide deck. So let me go ahead and start sharing my screen, and we're going to do a couple things today. I do have RStudio open in the back, so we will show a little bit of some live coding, but we have some slides today that we'll be covering for most of the time.

Again, the GitHub repository has a link to the slides as well as to the R Packages book. This is an amazing textbook written by Hadley Wickham and Jenny Bryan from here at RStudio, and they do just a great job of walking through the process of building an R package in obviously much greater depth than we can do even in an hour. For today, we're going to be walking through, end to end, building functions, building packages, and then hopefully sharing them with others, or at the very least using them within your own team or within your own organization.

I also want to give a big shout out to Josiah Parry. He was pretty key in kicking off this presentation and the motivation behind it, as well as the idea of spending quite a bit of time talking about functions, because really that's the part where you have to spend a lot of time thinking: what do I want to build? What is my function going to do? And then lastly, these slides are released under CC BY 2.0, meaning feel free to refactor them or deliver them to other people. You can reuse these slides as you see fit.

Why build R packages?

So you're here today. Hopefully you want to build an R package. Packages, at the most basic, are the fundamental units of reproducible R code. They include reusable R functions, the documentation that describes how to use them, and potentially some sample data. In other words, a package is a home for functions. And functions in R are a home for source code. So if we really want to start talking about packages, we really want to start talking about functions.

So stating this a bit differently, functions in R are just wrappers around longer source code. So a one-line call, with something like "do this" as your function name, is actually calling quite a bit of R code behind the scenes. And packages are just a way of distributing these functions in a structured and consistent way. So taking a function that you've written one time and making it available on multiple projects, or even potentially on multiple computers if you distribute it to your colleagues or to yourself on a different computer.

So in reality, if you want to build a package, you want to build a home for functions. And as far as kind of the motivation for writing these functions and writing packages, it's all about reproducibility or reusability. So reproducibility in code in some ways is actually all about being as lazy as possible in a good way. And we'll talk about that throughout.

So functions are a way to not repeat yourself and be more efficient: you can use a function multiple times and you don't have to type out the same code over and over and over. You just write your function one time and call it, or even apply it multiple times with other functions. It allows you to share workflows and empower yourself, in terms of I write packages that are only for me. Or you could write functions and packages that empower the rest of your team to do their work, their core job functions. They also allow you to test your code and trust your work, or trust the work of others who you may not know and whose way of writing functions you may not be able to learn. So testing allows you to make sure that functions work as intended. And that's another part of writing packages.

So ultimately, functions make your work much easier, faster, and more reproducible. And R packages let you share these functions and be lazier in a good way. That's this idea of reproducibility being about being as lazy as possible: using functions and packages so you can recreate that environment as easily as possible.

Seriously, it doesn't have to be about sharing your code or getting your code on CRAN, although that's an amazing added benefit. It's about saving yourself time.

If you can write a function, you're being lazier in a good way. Rather than having to worry about copy-pasting this code all around, you now have a function with a very specific purpose, with very specific arguments, that you can reuse over and over and over.

I also want to give a shout out to Emily Riederer. She wrote a great blog post about building a team of internal R packages. So if you're thinking about how do I use this at work or how do I go even deeper, she's got a lot of amazing ideas written here about how you can actually integrate this into an enterprise environment and build out a team of internal R packages that all work together. So let's jump back to package development. We'll use the remaining time to talk specifically about packages now that we've covered a good chunk about R functions themselves.

Why packages instead of sourcing

So argument one, and it's something I hear a lot of the time: why do I write a package? Why can't I just source code, you know, just use the base R source() function to pull code from somewhere? I have it on my drive or I have it on GitHub, and I just read the source code in. So source() references a very specific .R file, reads it all in and executes it. And you could use this to add a function to your environment.

But sourcing doesn't know anything about the versioning of the code. You're not using a specific package version, you're literally just reading in text and then evaluating it in R. It doesn't have included testing or structure around it. It doesn't have included documentation that both you and other people who use the code can rely on. And it requires the .R file to be copied into every project that needs it. So if you use it in one project and move to another one, you have to read the file back into that. And what if you change it upstream? Those changes need to be made in every project that needs it. And it could be modified or deleted accidentally by the end user or a collaborator. So maybe your collaborator says, oh, okay, well, I'll just change this one part. It'll make it easier for me. You go back to use it. You read in the .R file and nothing's working. So overall, sourcing is fine if I'm just sourcing a .R file so that I'm being more compact with my code. But packages basically solve all those different problems for you and provide all these different things that you need.
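To make the sourcing workflow concrete, here is a minimal sketch in base R. The temporary file and the add_one function are hypothetical stand-ins for a shared script on a drive, not something from the talk.

```r
# Write a tiny script to a temporary file (a stand-in for a shared helpers.R).
script <- tempfile(fileext = ".R")
writeLines("add_one <- function(x) x + 1", script)

# source() reads the text and evaluates it, so add_one now exists in the
# environment as a plain object.
source(script)
add_one(1)

# Nothing here records a version, tests, or documentation, and anyone who
# edits or deletes the file changes what the next source() call produces.
```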

Anatomy of a package

So we've been thinking about the anatomy of a function; let's talk about the anatomy of a package now. The metadata lives in the DESCRIPTION file: the name of the package, a description of the package's purpose, the version of the package and any package dependencies. So again, this structure allows you to install it and share it with others and have them be able to essentially recreate that same environment that you want to be evaluating things in. It has source code via .R files that live in the R directory. It has special Roxygen comments inside the .R files that describe how the function operates as well as its arguments, dependencies and other metadata. It has the NAMESPACE file, basically saying here are the functions that you're surfacing in your package and here are the imported functions you bring in from other packages. And then it also can have things like tests that confirm your function works as intended. So when you change things in your function or if you add something new, you confirm to yourself and others, hey, I haven't broken the whole rest of my package or the whole rest of my code.

So this is kind of the minimal anatomy of a package. You could go further, you could have data, you could have vignettes, you could have examples, all sorts of other things. But we're going to start with that MVP, or minimal viable package, for what we're going to do today. So all these other things, in terms of installed files, compiled code from C++ or JavaScript or whatever else, we're not going to worry as much about that today, and we'll focus more on these minimal parts of a package.
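As a rough illustration of the metadata piece, the sketch below writes a small DESCRIPTION file to a temporary directory and reads it back with base R. Every field value here (package name, title, version, license) is an assumption made up for the example.

```r
# Create a throwaway directory to hold the example DESCRIPTION file.
pkg_dir <- file.path(tempdir(), "funname")
dir.create(pkg_dir, showWarnings = FALSE)

# Package dependencies would be listed in an additional "Imports:" field.
writeLines(c(
  "Package: funname",
  "Title: Square Numeric Values",
  "Version: 0.0.0.9000",
  "Description: A demo package with one function that squares a number.",
  "License: MIT + file LICENSE"
), file.path(pkg_dir, "DESCRIPTION"))

# DESCRIPTION uses the Debian control file format, so read.dcf() can parse it
# into a one-row character matrix with one column per field.
desc <- read.dcf(file.path(pkg_dir, "DESCRIPTION"))
desc[1, c("Package", "Version")]
```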

devtools and usethis

So while you're writing packages you might be thinking, well, there's four things I have to learn. But the tidyverse team has spent years crafting metapackages to make their life and your life easier when creating other packages. These packages are used every day by thousands of package developers and really do make things easier, because you're writing packages and components of packages with functions that have very specific purposes.

So there are two packages that we'll talk about today. First, devtools: the purpose of this is to make package development easier by providing R functions that simplify and expedite common package building tasks. And second, usethis, which is a workflow package, again automating repetitive tasks that arise during project setup and development, both for R packages and non-package projects. These are both linked in the slides, so you can go to the link for each package and explore it deeper, but we'll talk about them a little bit today.

Live demo: building a package

So we've talked a lot about slides. We've talked a lot about packages, talked a lot about functions. Let's build a demo package real quickly in five minutes. So let's go to RStudio. This is where I was building the slides for today. So what we're going to do is we're going to create a new project. So within RStudio, I can create a package in this way. So I can go new project.

And now I'm going to say I want to create a new working directory. I want to create an R package and I'll give it a fun name. So funname is what we'll call our package. And it's going to do all the cool things we want to do. So I'm going to create this project. It's going to switch over to this new project that we want to work in. So rather than package building, we now have our new project of funname. Because we created it as a package project, it's already pre-populated with a lot of the different things we need. It even gives us a nice friendly little hello world function, saying here's how to write a basic function, and some useful shortcuts in RStudio to install, check, or test these packages. So I'm going to delete my NAMESPACE file because I want to build that from scratch. And I'm actually going to delete this hello world example because we're going to build our own function real quick.

So let's start off. We're going to clear this. We're going to do usethis::use_r(). And this is basically saying create an .R file with a specific name. So let's do squareval. When I do this, it will create this .R file for squareval. And then I can type fun. If I type just that shortcut here in RStudio, it's going to give me the function snippet, which is built in. And I can say squareval is a function of x that returns x squared. Now I have my function, squareval. So squareval(2), and that's 4. squareval(16): 256. Great. But rather than reading it in that way, we can actually use devtools::load_all(). And this will load the entire package and potentially dependencies within it. So I can do the same thing of squareval(2), whatever. It's been loaded, but in the context of a package, as opposed to me just reading in and saving a function as an R object in my local environment.
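Written out, the function from this step would look roughly like the following. The spelling squareval is taken from the file name mentioned later in the talk; the exact name used on screen is an assumption.

```r
# Contents of R/squareval.R, the file that usethis::use_r("squareval") opens.
squareval <- function(x) {
  x^2
}

squareval(2)   # 4
squareval(16)  # 256
```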

Now another part is documentation, and I'm just going to abbreviate this by saying that I'm going quickly and we're going to go through all these steps in the slides. I'm just trying to show you how quickly you can move through this process. So I can use Cmd+Shift+P. And I can say I want to insert a Roxygen comment. So insert a Roxygen comment. And this will give you this scaffolding for all the different things. So, you know, my function's title is "square a value". The parameter is a numeric value to be squared. It returns a number.

And for the example, squareval(2). Perfect. So now when I'm actually ready to go, I can say devtools::install(). And it's going to install my package. And it's going to install everything. It's going to take a second because I'm streaming. Everything is going. So let's enter that.

And because I'm live streaming, I've got to do one step. So let's restart everything. And devtools::document(). devtools::document() is basically going to say, you know, we've done something with this R function. Update the documentation. So it's going to rewrite some of the documentation. Now let's do devtools::install() again. And it's going to take my package and install it locally for me. So now, in the context of this environment that I've set up, I can load funname with library(). And I have my package loaded. And I can do something like tell me about squareval with a question mark, just like any other R package that you install from GitHub or CRAN or anywhere else. I've now gone from no package to creating a package to creating a function to installing my package. And it has documentation about it. So in this short span, I've created something that I can use in any other project. And I'm ready to go. I have my package.

Obviously, there's more we can do with this. We probably want more than one function. We want, you know, better documentation. We want to share with others. But I just want to very quickly show you that, like, in the span of a few minutes, you can create your function, create your package, and have it working for yourself locally.
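For reference, the console steps from the demo can be sketched as the sequence below. It assumes the package project is called funname and the function squareval, and it is meant to be run interactively from inside that project with devtools and usethis installed.

```r
# Run these at the R console from inside the package project.
usethis::use_r("squareval")  # create and open R/squareval.R
devtools::load_all()         # load the package's functions without installing
devtools::document()         # turn Roxygen comments into man/*.Rd and NAMESPACE
devtools::install()          # install the package into the local library

library(funname)             # the installed package loads like any other
?squareval                   # and its help page is available
```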

Walking through the steps

So now that we've hopefully motivated you to take that next step, let's talk a little bit about what we just did and talk about the individual components so you feel comfortable building on it. So we created a blank package. You can either use usethis::create_package() or, from within RStudio, I just clicked New Directory, then R Package, and then gave it a very specific name. And then that actually created my package in my environment that I could work with and then install. So you can get started from there. Either way is fine.

So now that we have a new project that is set up as a package, we're going to go through the whole game, essentially creating a package end to end: creating our function, creating the package, the documentation and some basic testing. So for the whole game, we're going to use those two packages I was talking about, devtools and usethis. Those are available on CRAN, and you can install them with install.packages(). And again, these just make some of the process a lot easier. So rather than having to manually learn all the different pieces, you can just use these functions to build out a lot of the boilerplate or reused components.
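The setup described above might look like this at the console. The directory path is a hypothetical example; any location where you keep projects would do.

```r
# One-time installation of the two helper packages from CRAN.
install.packages(c("devtools", "usethis"))

# Create a new package skeleton (DESCRIPTION, NAMESPACE, R/) in a new directory.
# The path below is only an illustration.
usethis::create_package("~/projects/funname")
```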

I do want to take a break here to say that part of the way that you can share packages is through version control. And, you know, I'm never going to try and shame anyone or throw shade at anyone. But this is the best way to share packages. And you probably should be using version control for your R packages, so you can step back and forth and collaborate. And when you're working on production packages and production code, you can check them into version control and understand how changes were made over time, when they were changed, with documentation about the changes, and collaborate on their development. At RStudio, we typically default to using Git, and a lot of us use GitHub. But other folks use things like SVN, Bitbucket, GitLab; whatever you're using is great. Version control is fantastic, and any way you can get started with version control is great.

A lot of the functions in usethis are closely tied to GitHub, specifically because that's what the tidyverse team uses. So they're writing things that they're familiar with and things that they're using. So in the package we are building, we can actually do usethis::use_git(). And this will add Git to the project and we can start tracking our changes over time. You can also do something like usethis::use_github(), or set up GitLab or whatever you use, to start referencing it on a remote repository. So Git initially is local. The remote side of version control is something like GitHub, which is hosted in the cloud, or GitLab in the cloud, or Bitbucket, which is an enterprise-like, on-premise version of Git. Just in general, version control will allow you to take a package and share it with the world via something like GitHub or GitLab, where other people can install it. To read a lot more about version control and specifically how to use Git with R, please see Happy Git with R by Jenny Bryan, another great resource that can help you get there.
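As a short sketch, the two usethis helpers named here are called like this from inside the package project. The second call assumes Git and GitHub credentials are already configured on the machine.

```r
# Initialise a local Git repository for the project and offer an initial commit.
usethis::use_git()

# Create a matching repository on GitHub and push to it
# (needs a configured GitHub personal access token).
usethis::use_github()
```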

All right. So again, the first step I did in my new project was usethis::use_r(). And this is me basically saying, hey, I want to create a minimal .R file for a function and then open it for interactive editing. So I'm going to do squareval, and this will create squareval.R in the R directory. It will open it up with me basically ready to go. At this point, you could copy over some code you're using or you could write it all inside that file. It's just another .R file. You can do whatever you want in there and interactively code, interactively edit. You just don't have to create the file manually. You can just call usethis::use_r() and get started really quickly.

The next step, once I wrote a function and I was like, I think this is working, let's test it: I could highlight that code and load it into my environment. Or the better option is devtools::load_all(). Because we're working in a package environment, you may actually write functions that depend on other functions you've written. So in this way, you can load all the functions together in the context of the R package, and it's not going to load them as specific objects in the global workspace. It's doing something behind the scenes. But devtools::load_all() loads all the different functions you have in your R package all at once, so you're ready to go. This is really helpful when your R package suddenly has 10 or 15 or 100 functions in it and you want to load all of them and interact with them. Again, on version control: once we've confirmed the minimal function is working, we should probably commit our changes. And you can do that via the built-in Git pane or via the terminal for Git.

So if I'm inside the project for my R package, again, I can run usethis::use_git(), and it's going to ask me if I want to commit. Yes, we've done the initial commit. And then in RStudio, I have this Git pane at the very top for version control. And this will allow me to interact with things like, okay, well, I've changed some parts, and I can do commit messages and pulls and pushes and basic changes there. So you can use that or, if you're comfortable using it from the terminal, you can do all sorts of Git commands from the terminal. There's really no difference in terms of the end result.

Checking the package

All right. So we've created a function. We've loaded it and used it. We've checked it into Git or version control. But, you know, we want to make sure it's working. So how do we go about checking the function? Again, we have evidence that the function works because we interactively loaded it and used it. But how can we be sure that all the different components of the package still work? It might seem silly to check it after you've only written one function, but it's good to establish the habit of checking very often. Because you want small moments of friction, as opposed to writing all these different things, then checking or testing, and being overwhelmed with the output. So we can use devtools::check(). And this will load the package and check it against all these known best practices.

So if we go to our R package: devtools::check(). So this is going to load it. It's going to do a bunch of different things. And you'll see a lot of things passing through. It's meant to be used interactively, in that it's telling you about all the different components and you can read through it as you want. And it's going to tell you about warnings, failures, errors, and all sorts of other things. And it's just checking all this mass of stuff for you so you don't have to manually go through and check all these components. You can actually look at the different things as noted. So it's got a couple different errors. It's looking for the hello function that I deleted, so we're going to have to remove that. It's giving me a warning saying that, hey, there's no license on here. And it's saying that the hello function is documented but not in the code. So we need to fix those things. It's telling us about issues in the package without us manually having to check it with our own eyes. We can look through here and get the actual output.
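One plausible way to clear the hello complaint is sketched below. It assumes the leftover documentation is the man/hello.Rd file that the RStudio package template creates alongside R/hello.R; the talk does not show the exact fix.

```r
# Run at the console from inside the package project.
devtools::check()

# Assumption: the template's help file for the deleted hello() function is
# still present. Removing it means the documentation matches the code again.
file.remove("man/hello.Rd")

# Re-run the check to confirm the hello-related messages are gone.
devtools::check()
```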

So the output of this function is really verbose. It's doing a lot of different things. And for the vast majority of the time, they all go yes. If you set up a project in the way that I've shown you and kind of with RStudio, it's ready to go and it's got all the different components. However, there's still mistakes you can make. You saw I had some errors and warnings and other things. So it's really good to run this frequently so I can make changes and fix things in the moment rather than having to do it a lot later and having this overwhelming amount of changes to make.

Adding a license

Now, the one specific thing that it showed us was this: the function works fine, but we haven't added a license, which check will flag as a warning. And because I'm writing open source code and anyone can see it, I need to provide a license, basically telling people how they're allowed to use it. What are they allowed to do with it? Are they allowed to copy the code and use it in their own package? How do they reference it? It might seem silly to add a license if you're not showing it to other people. But even within your own org, you should at least define the license so that your wishes are respected and it's clear who owns the source code and how it can be reused.

Software licensing is really complex, and luckily we don't have to redefine things. There are common patterns. I typically default to the MIT license, but other people use other licenses. There are a few different resources here that talk about why you should choose a license, how to choose a license, and what they actually do. And I highly suggest, if you're going to use a different license, checking out what they mean at these different resources.
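If MIT is the choice, usethis has a helper that fills in the license for you. A minimal sketch, where the copyright holder name is a placeholder:

```r
# Adds the MIT license files and sets the License field in DESCRIPTION.
# "Your Name" is a placeholder for the copyright holder.
usethis::use_mit_license("Your Name")

# Checking again should no longer report the missing license.
devtools::check()
```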

Documentation with Roxygen

Now, another step you saw me do was I wrote my function, I added the Roxygen comments which were those little indicators or the documentation at the top. Wouldn't it be nice if you documented how your function works and you were able to get help like you were with other functions? So if you did question mark on your functions, they actually told you what they were going to do.

So this requires that you have documentation. But again, rather than having to write all the R-specific .Rd files, which are kind of like LaTeX, you can just use Roxygen to generate those automatically. So in RStudio, you can go to the Code menu and do Insert Roxygen Skeleton. Or what I showed was the command palette: on RStudio 1.4 or later, you can just do Cmd+Shift+P and say, you know, insert Roxygen skeleton, and it will show you the different parts.

So let's go here. If I go to Code, there's a bunch of different things going on. And I can insert a Roxygen skeleton. As long as my cursor is within a function, within these arguments here, when I go to Code, Insert Roxygen Skeleton, it will give me these different comments that I filled in above. Or I can do Cmd+Shift+P to open this command palette. I can say insert Roxygen comment. And as I filter this down, it allows me to do the same thing. Insert Roxygen comment. And if you want to memorize it, there's this longer shortcut. I like using the command palette because I can just type in what I want to do. And I don't have to memorize, you know, 15 or 100 different shortcuts. I just use one and then raw text.

Now, as far as what documenting is doing, you're basically letting your code breathe on its own. You're self-describing the code with Roxygen, and specifically roxygen2, the modern version of this. So, the premise of roxygen2 is simple. Describe your functions in comments next to their definitions. And roxygen2 will process the source code and the comments and generate those .Rd files for you, as well as update the NAMESPACE file and potentially update the DESCRIPTION. So, again, rather than you having to learn these 10 new things, you just learn how to write Roxygen comments. And that generates the downstream documentation that an R package needs to operate. If you want to go really, really deep on Roxygen, there's a nice intro to Roxygen you can go through. But we'll talk about the basic ideas right now.

So, Roxygen items are basically special comments. A normal comment is just a pound sign or a hashtag. And a Roxygen comment is a pound sign with a single quote after it. And then you give it the specific thing that you're describing. So, you do like an @param. This is for the parameters, or the arguments, of your function. So, I say @param, an argument name, and then describe that argument. So, for squareval, it was a numeric input that will be squared.

Now, you can repeat that basic idea over and over, swapping out @param and the argument for all the different @ tags you can use with Roxygen. It can seem overwhelming, like, oh, I still have to know all these different things. But there's really only a few things that are necessary to, again, build that minimum viable package. You're going to give the title of the function with @title. You can give a description of the function's purpose with @description. You can document the function arguments with @param. And you can mark the function for export with @export. And if it requires other packages, you can either globally import all of a package with @import or import specific functions from other packages with @importFrom. And then tell R what the function returns with @return.
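The import tag is the one item in this list that the demo function never needs, so here is a small, hypothetical illustration. The center_val function is invented for this example; stats::median is a real base R function.

```r
#' Center values on their median
#'
#' @param x A numeric vector.
#'
#' @return A numeric vector the same length as `x`.
#' @importFrom stats median
#' @export
#'
#' @examples
#' center_val(c(1, 2, 6))
center_val <- function(x) {
  # median() is imported from stats via the @importFrom tag above.
  x - median(x)
}
```

Using @importFrom for a single function keeps the package's imports narrow, while @import would bring in every exported function from the other package.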

Again, when I inserted this basic Roxygen skeleton, it basically gave me the minimal viable components. So, I didn't have to type out param, return, export, examples. I just described them. You know, it already said, hey, this parameter is x. So, I just describe what x is. And it returns something. So, I say it returns a number. And yes, I do want to export it so when people load my package, this function is loaded. And for the examples, I'll just use the most basic one, squareval(2). But I could also do squareval(16). And however many examples you want to put in there, it will execute those. So, while it's not required to add those examples, they can be really helpful. You know, if I call ?rnorm, for example, it tells me a description, the title, and then it has some usage. But then here in the examples, it also gives me some common ways people use it. You know, like maybe you generate random data and then you plot it or you create a curve in base R. Or if you want to do, like, error functions. So, it basically gives you some context about how people use it as opposed to just documenting what the function does. So, those examples are really helpful.

So, let's show a full-blown example with a title, a description, a parameter, a return, an export, and examples. All the different minimal, common items. So, again, these are special comments. They have the pound sign and then a quote. So, for the title: take a numeric value and square it. For the description: this function takes a numeric value and squares it. It's intended to be used as a replacement for value squared. For the one argument it has, so @param and the argument: a numeric input that will be squared. It returns a numeric value. Yes, I want to export it so when people load my package, this function is loaded. And for the examples, squareval(4), which returns 16. It will actually show in the documentation because it will evaluate that function. So, as you build more complex packages or more complex examples, you might expand upon this, but in short, you're able to add a ready-to-go function that's pretty well described just by adding these specific components. And then when you run devtools::document() and install the package, you can do ?squareval and it will give you all of this in the pretty way that R displays it in the help panel.
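Put together, the documented function described above would look roughly like this. The wording of each field follows the talk; the function name squareval and the argument name x are assumed from earlier in the demo.

```r
#' @title Take a numeric value and square it
#'
#' @description This function takes a numeric value and squares it.
#' It is intended to be used as a replacement for `value^2`.
#'
#' @param x A numeric input that will be squared.
#'
#' @return A numeric value.
#' @export
#'
#' @examples
#' squareval(4)
squareval <- function(x) {
  x^2
}
```

The Roxygen lines are ordinary comments to R itself, so the file still runs as plain code; they only take effect when devtools::document() processes them.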

So, now that we've added these Roxygen comments, we basically need to update the documentation or the metadata. So, if we do devtools document, it will read through all the different .R files and write out that documentation for you. And then, just as expected, you can do, you know, question mark square val and it will show you all that documentation you wrote. So, you get to immediately see the benefits of your labor and other people downstream will be very excited to actually figure out what does this function do and how do I use it. So, while this is great, it's done some more behind the scenes work beyond just, you know, the help panel. It's also added the square val function as an export in the namespace file. The namespace is something we don't want to edit by hand. Again, it's a downstream thing that, you know, devtools document and do for you. But this basically just tells the package, yes, we're exporting square val too.

So, if I go back to R and I go into my package and look at the NAMESPACE, it has export square val. Just like we expected, because we've documented the package, it's added our function as an export. So, when I load the package, it's available. Now, we can devtools check one more time and make sure we haven't missed anything or that something hasn't broken. And then we'll basically, again, go through the whole process of checking it for errors or warnings or issues. And we're going to check it into version control. We're going to commit very early, very often, basically whenever we're making these changes and we're ready to commit them into version control, make that commit. And then you can always go back to that moment in time when things were working if you break something later on.
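As a rough sketch, the document-then-check step could be bundled like this. The helper name `refresh_package` is hypothetical and is not part of devtools; it is wrapped in a function so nothing runs until it is called from inside a package directory.

```r
# Hypothetical convenience wrapper around two real devtools functions.
refresh_package <- function(pkg = ".") {
  devtools::document(pkg)  # rewrite man/*.Rd and NAMESPACE from roxygen comments
  devtools::check(pkg)     # run R CMD check on the package
}

# After document(), the generated NAMESPACE would be expected to contain
# lines like these (assuming the function is spelled square_val):
#   # Generated by roxygen2: do not edit by hand
#   export(square_val)
```

Because NAMESPACE is regenerated on every `document()` call, any hand edits to it would be overwritten, which is why the talk advises against editing it directly.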

Installing and using the package

All right. So, we've documented our package. We've created our functions. Now, we can actually install it and basically make it available in any R session on our computer. So, devtools install, when you run that inside your specific project, will install that package locally. Meaning, even though, you know, I've been doing everything inside fun name, I can actually go back to my package building project, load my package, and use it there. I don't have to source the file in. I don't have to reinstall it. It's just available. So, if I do fun name, I think, yeah, square val 2, I can call this in this package. And in this other project, I can call the package in this environment. Even though it's a different environment, that package has been installed. So, I'm ready to go. I can use this.

For other people, they need to install it from version control. They need to download all the things and compile it and build it themselves, which is fine. But for me, I'm ready to go. I've created a package that's useful for me, and I'm happy with it.

So, as you'd expect, I can load it in any other environment. I can load demo package or package fun name or whatever else I want. And then I can use the functions as I need to.

Adding more functions and tests

Now, in most situations, you probably want to add a lot more functions. You're not just sticking with one function. You have ten functions you want to use or a hundred functions or however many you can think up and get ready to go. So, we can add all of our functions with a similar workflow. usethis use_r, define the function, write the function, make sure it's working, add documentation, and you're ready to go. Again, checking all these different components into version control as you go. But because we've already covered that, I want to jump on to the next thing, which is let's add some tests. So, for the sake of time, and while we can add all those different functions with a similar workflow, we should talk about testing our functions with testthat, because we want to show the whole game. We want to show, like, all the different components. So, testing is basically, in testthat, our way of doing unit testing. And you've been doing unit testing, essentially, all the time, whenever you test your ideas.

If I say, like, 1 plus 1 equals 2, I've now checked myself on making sure 1 plus 1 equals 2. I've just done it in the R console. So, it's only useful for me at that moment in time, and I have to remember everything. Or if I do square val 2, and if I actually do fun name, and actually bring in the function, it gives me 4. And if I do 3, it should give me 9. So, I'm testing it, and it's, you know, going along with what I want it to be doing. But six months from now, I don't remember if it actually did what I wanted it to do. So, by adding tests, I can basically, you know, make these tests occur with code automatically, as opposed to me having to test it in the R console, or manually, or interactively.

So, whenever you're tempted to type something into just, like, a print statement, and be like, okay, yep, it gave me what I want, or a debugger expression, write it as a test instead. So, up until now, we've only tested our function interactively, literally used it in the R console, and checked for package errors via check. We can formalize and expand this with unit tests via the testthat package.

Again, just a user-friendly package, making it easy to write tests in R, and basically check your assumptions. A unit test in R basically means we're expressing very specific expectations about our example. So, we want square val 2 to always generate squared values. And we can test that with a couple of different inputs. As far as unit tests, I came from a neurobiology background. I never used a unit test until I started writing packages. So, it was a new term to me. Unit tests are basically automated tests run and written by developers to ensure that their application or their unit of code behaves as intended. So, by writing these tests for small components, and then adding up all the tests together, you can basically test very complex objects, and with very simple tests.

Setting up unit tests with usethis

So, let's do this real quick. In terms of usethis as a workflow package, just like everything else, there's a use_test. So, we can load usethis, and then usethis use_test. This will create a test file for whatever file you have open.

So, let's go back to our function name. I do see a question in the chat about the console prompt.

I actually have a tweet about that and an example. I need to write a blog post about it. But I will, yeah. So, this is part of my R profile. It basically says, like, what is today's date? It gives me some kind of motivational examples and says, like, what R version are you using? And then every time I add something, or the time changes, and I've executed something in the R console, it gives me the current time, a nice little laptop emoji, and tells me which branch I'm working with.

This is probably beyond today's scope, but I promise I'll tweet out about it again, about how to do this. There was a great example from someone in the community on how to do that. But we're going to focus again on unit testing. So, I'm going to do usethis, use_test. Now, you'll note that before I call this function, I want to go to the file I have open. It will basically say, whatever file I've opened interactively, create a test for that.

And it's going to tell me a few things that are going on. It's adding testthat to Suggests. It's creating a tests folder. It's writing the test file. And now, it opens this up. Now, we have a test_that function, which, you know, defaults to this example, which is test that multiplication works. And I need to load testthat. So, library testthat. We'll call it again.

And it gives me this happy, yeah, multiplication in R still works. Two times two is four. Great. That's good. But we're interested in whether square val actually squares. All right. So, let's do square val. And we want to square two, expect equal four. All right. So, what we're doing here is saying test that these different things are expected. So, we're going to expect that square val two returns four. And when I run that test, let's see what it says. It says cannot find function square val. So, devtools load all. And the reason why I did that is because I moved into this project and didn't do anything yet. But if I run the test again, it will say test passed. And it will always give you a happy little message every time it passes.
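The two tests from this part of the demo might look roughly like this. The `square_val` definition is an assumed stand-in so the snippet runs on its own; inside the package, `devtools::load_all()` would make the real function available instead.

```r
library(testthat)

# Assumed stand-in for the package function (spelling is a guess).
square_val <- function(x) x^2

# The default example that usethis::use_test() writes into a new test file.
test_that("multiplication works", {
  expect_equal(2 * 2, 4)
})

# The test written live in the talk: squaring 2 should give 4.
test_that("square_val actually squares", {
  expect_equal(square_val(2), 4)
})
```

Running either `test_that()` call interactively reports a pass or a failure straight away, which is the feedback loop shown on screen.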

But what if something doesn't match? So, what if we say square val four is 15 instead of 16? So, we run this. And now, it gives me a failure. And it says square val actually squares. So, the text I have up here. The actual is not equal to the expected. The actual should be 16, but it returned 15. So, now, I can basically generate many of these different tests that test different components of my package and tell me, oh, yeah, by the way, like that thing that you thought was working is not working. And it failed in this exact way. And whenever someone opens an issue and says, hey, your package is failing, you can write a test for it to make sure it doesn't happen in the future.

Now, most of the time, when you're writing these functions initially, they just work. And that's great. Like you've written a function that works. But these unit tests allow you to standardize how you're making sure it actually does what you want. Now, again, there's many different things. So, maybe you don't want to expect equal. You actually want to expect failure. So, let's say expect failure. What happens if we square a cat? You know, it's Halloween. We're going to be squaring some cats.

So, the fun times about live streaming. So, now I'm in my own personal package. This is actually a package that I use a lot for writing R functions. So, let's find expect failure.

I can spell failure. Apparently I haven't written one of those before. So, we'll just abandon that for now and go back to the slides. And if I have time at the end, I'll keep exploring. Again, the idea is that we can use usethis use_test and testthat to write our expectations. There is a way to test for errors that I'm blanking on right now, but I want to keep going rather than getting us off track. And to see the way that tests are written, let's go to a test again. So, we're going to go to the testthat folder. Let's do dot bar. So, this is a more complex test in terms of like we're testing SVG and HTML. But the way this is set up is we have a .R file that is doing tests of the specific function. It's creating an object. And then we're doing expect equal. I'm basically saying, like, do these values match my expectation?

That's the way that it's set up. So, I created that with usethis, use_test.

And these expectations are grouped into those tests or that dot R file. So, an expectation is the specific unit of testing. You know, does it have the right value, the right class? Does it produce error messages? What does it kind of give as the output? These expectations are functions that always start with expect. So, I was using expect equal. And the one that I failed on, which is kind of ironic, is expect failure. And then you have a test, which is a group of multiple expectations. So, if I were to, you know, go back to function name, we'll go there for a second. The tests are the overall kind of parent of all those different testing that I'm doing or expectations with the error message. You know, that's one unit of functionality is a test has multiple expectations. And as long as those are all true or all met, you get your happy message out.

And the file is grouping together the tests, which are grouping together all those expectations. So, if we look at this test, this whole file is the test file. test_that is the actual test. And these expectations, or these expect equal calls, are the units of testing. So, again, we'll load testthat. We'll run the test_that with expect equal. And before it will pass, we'll do devtools load all. devtools load all. Going all the way here because we moved into our new project. So, we got it loaded up. Test is passed. Happy days. We got it all the way back. So, we have a test file. We have an expectation, which is expect equal. It's your expression and the expected value. And then the test that you're wrapping around that, with the message if it fails.

Expect error vs expect failure

All right. So, interactively, you can write and check those tests. So, I can, again, load all, load test that, and I can run this interactively as I've been shown in terms of, like, I do this, and it gives me happy message. I do this one, and it expects, ah, this is why I was saying expect error, not expect failure. Here we go. So, good time. So, let's do this one more time. I should have just kept going on my slides.

Test that. And because we haven't written square val two yet, this is plain square val and the binary operator error. So, square val cat. And this is what it gives me. This is the error message. The way that expect error should go is it should give me the exact message because we're trying to test for a message. And it's still going to give me a failure. Let's do one more. Expect error. Now it's passing. So, this expect error, instead of expect failure, which was my failure earlier, is basically saying, like, if I pass something that it can't operate on, in terms of it can't square a cat, it's just going to error. So, if I do this, it generates an error. I'm expecting it to generate an error. And that all passes. So, not only can I square values and get the expectation, I can square the value of something that can't be squared, and it gives me the error that I want to get. So, we went full circle. We made it all the way around. Thanks for sticking with me on that. I'm glad we were finally able to square our cat.
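To make the distinction concrete, here is a small sketch under the same assumed `square_val` definition. Both functions exist in testthat: `expect_error()` checks that the code itself throws an error, while `expect_failure()` checks that another expectation fails, which is why it was the wrong tool for the cat example.

```r
library(testthat)

# Assumed stand-in for the package function.
square_val <- function(x) x^2

test_that("squaring a cat errors", {
  # Passes as long as any error is thrown.
  expect_error(square_val("cat"))
  # Also checks the message against a regular expression. The text below is
  # base R's message in an English locale.
  expect_error(square_val("cat"), "non-numeric argument to binary operator")
})

test_that("expect_failure is about expectations, not errors", {
  # The inner expectation fails (16 is not 15), so expect_failure() passes.
  expect_failure(expect_equal(square_val(4), 15))
})
```

In short, `expect_failure()` is mostly useful when testing custom expectations, and `expect_error()` is the one to reach for when a function should refuse bad input.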

Now, the way that these tests are written, you can see that I've done a few different things. So, for my expectations, I have, you know, squaring 2 should be equal to 2 squared, squaring 4 should be 4 squared, squaring 16 equals 16 squared. So, I've written essentially, like, multiple expectations testing the same idea, but with different inputs. Because while one input might always work, other inputs might not work. And this is probably more impactful for more complex functions, but we want to test our expectations multiple different ways. And then I can do test that non-numeric or missing input should error. So, the letter a should always error with input must be numeric. Factors, data frames, missing values, those should all error out in specific ways. And for expect error, you can either do it as just give me an error, or give me this specific error message. I'm writing this on square val 2, which is the one that had the little helper saying, if it's given a non-numeric input, give me a nice friendly output in terms of input must be numeric, rather than this kind of less helpful error, non-numeric argument to binary operator. So, kind of coming full circle.
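A sketch of what that test file could contain is below. The `square_val2` body and the exact wording and capitalization of the error message are assumptions reconstructed from the description; the slides may differ.

```r
library(testthat)

# Assumed version of the friendlier function described in the talk.
square_val2 <- function(x) {
  if (!is.numeric(x)) {
    stop("Input must be numeric", call. = FALSE)
  }
  x^2
}

test_that("square_val2 squares several different inputs", {
  expect_equal(square_val2(2), 2^2)
  expect_equal(square_val2(4), 4^2)
  expect_equal(square_val2(16), 16^2)
})

test_that("non-numeric or missing input should error", {
  expect_error(square_val2("a"), "Input must be numeric")
  expect_error(square_val2(factor("a")), "Input must be numeric")
  expect_error(square_val2(data.frame(a = 1)), "Input must be numeric")
  # A bare NA is logical, not numeric, so it is rejected as well.
  expect_error(square_val2(NA), "Input must be numeric")
})
```

Note that a numeric missing value such as `NA_real_` would pass the `is.numeric()` check in this sketch; whether that should error is a design decision for the package author.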

Those tests, again, optional but useful, live in tests/testthat. File names must start with test. And as far as the usable test file, this is basically the exact same thing you were doing interactively, but now we've put it into our testing file. And the reason why it's useful in a testing file is I can do devtools load all, devtools test. And now, it's going to go through and run my tests. And it gives me A-okay. I have two passes because it basically evaluated this one and it evaluated square val. So, in this case, when I have the situation of I have dozens of functions, or 10 functions or 15 functions, rather than, again, manually going in there, I'm able to go in and test all of them at once, figure out which ones pass, which ones fail, which ones have warnings or other messages. And when I run devtools check, again, that thing that I should be doing very frequently to say, like, how is my overall package looking? It will, again, run all of the tests that I've written with testthat.

So, what happens if they don't match? Again, it'll throw this error message. So, that was that one, if I would have kept going in my slides rather than trying to live code it all out, expect error, you know, give me this, you know, squaring of this, and the input must be numeric. So, you can still test things that should always fail or should always error in a specific way. So, you can still get those expectations and test both successes and failures in your package.

And if you do something that doesn't match, it will, again, show you that message of, like, you tried to square something, it wasn't a match, and here's the difference between the expectation and the actual value. So, while, again, I've gone through a lot of the different components of testing, and we went through kind of a side path that I appreciate you sticking with me on, writing expect error versus expect failure, you can read a lot more about this in the R Packages book, there's a testing chapter, and in the testthat documentation, and I wrote out kind of a longer form blog post example of robust testing.

Building a pkgdown site

Let's move on to some more documentation, though. So, while you might, you know, be relying on the documentation built into R, you can also build your own pkgdown site, just like the one for devtools that they have. They've got this really fancy, you know, cool devtools pkgdown site with, like, documentation and links and articles and all these cool things. They didn't write all this out manually. This was built through the pkgdown package. So, you can do the same thing. Just with the documentation we've done, we can use devtools build site, and it will actually build out that website for us with the documentation and all the examples. So, we'll let that run for a second. It's going to build out the documentation, but the benefit here is that documentation you've written ahead of time is used here. So, every time you run devtools document and rebuild the site, it will, you know, update this as well. So, I can go into reference, I click on square val, and it says, hey, this is square val, and it shows me how it's supposed to work, and it shows me the examples.
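A minimal sketch of the site build is shown here, wrapped in a hypothetical helper (`build_docs_site` is not a devtools or pkgdown function) so that nothing runs until it is called from inside a package.

```r
# Hypothetical wrapper around real devtools and pkgdown calls.
build_docs_site <- function(pkg = ".") {
  devtools::document(pkg)   # make sure the help files are up to date first
  pkgdown::build_site(pkg)  # render the site from those help files
}
```

`devtools::build_site()`, the call used in the talk, does essentially the same job by handing off to pkgdown. By default the rendered HTML lands in a `docs/` folder, which is what would then be published.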

So, if you do square val 2, it gives you the output. So, again, like, in literally seconds, I was able to go from writing a package, writing some documentation, to now having a website I can publish that other people can go to and get information about how my package works. And this is how, like, my personal packages have their own documentation. So, for, like, gtExtras, when you go to the reference page, I didn't have to manually build out documentation for all these functions. You know, I was able to go in and, you know, write the documentation with roxygen, and it builds out all this complexity for me with, like, examples and all the different parts of the code and everything else. So, very, very powerful things you can do with pkgdown and just the basic documentation.

So, I think this is really powerful. I'm a big fan of that. You can host that pkgdown output anywhere. You can do it on, like, GitHub Pages, which is where one of my examples is. Or you can host it on something like RStudio Connect, which, again, if you're building packages inside your organization, you may not want to show the entire world. You might just keep them inside your organization. But you can host that HTML anywhere. For, you know, if I was hosting an example of, like, one of my basic packages on RStudio Connect, I can, you know, deploy it with this kind of code, which is basically saying, like, hey, this is a lot of different files. Take all of them and deploy to Connect. And then here's one running on our demo server with the hello world example. Operates just the exact same way. But you can take that documentation and host it anywhere.

Referencing external packages

Now, the last component we'll talk about as we're kind of nearing the end here is what if we want to reference external packages? So, I've written everything with just base R. The expectation is that while base R is super powerful, at some point you're probably going to bring in another package, whether you wrote it or someone else wrote it. For example, I wrote a lot of wrapper functions around gt. That's what this whole entire package here is. All these different functions are wrappers around the gt package. So, I have to reference gt as one of the packages I'm bringing in because I'm essentially importing all these functions that other people have written and loading them into my package. So, gt add divider was one of the basic examples we showed when we were talking about tidy evaluation. Now, this only works if I bring in things from gt as well. So, tab style, cell borders, cells body, cells column labels, all of that is actually being brought in from gt. The part that I'm doing is just the gt object and the actual inputs because it's wrapping all those other components.

So, in order for this to work, I have to tell the function, hey, by the way, I didn't write any of this. Please bring in this other package that's really beautiful and amazing. So, now in our documentation, we have the things that you've seen before, title, description, the function parameters, all those cool things. And then import from gt and import gt. Now, I'm importing from gt the magrittr pipe because inside my function, I want to be able to use the magrittr pipe. I want to use it all over because I have really long functions and I want to pipe things together. I'm also importing all of gt, in terms of all the functions that are available in gt are now made available inside my function. So, when I do gt add divider, it's able to load in all of those functions and use them. It also adds gt as an import in the NAMESPACE file, meaning when someone installs gtExtras, it says, hey, by the way, you also have to install gt for this package to work. So, it also downloads gt and installs it. So, again, part of this is that with simple documentation, you can build out all these different components of all the important things in your R package. When you call devtools document or do something like usethis, use_package gt, it will also update your imports in terms of, like, I need to import gt. I want to specifically have people bring in a package version of at least 0.3 because they added in a lot of cool things in 0.3. So, when you install my demo package, it also installs gt.
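The import tags described above could be sketched as follows. `add_divider_sketch` is a hypothetical, cut-down stand-in for the real gtExtras function, written only to show where `@import` and `@importFrom` go; it is meant to live in a package's `R/` directory, where the unqualified gt calls resolve through the generated NAMESPACE.

```r
#' Add a divider to the right of gt table columns (illustrative sketch)
#'
#' @param gt_object An existing gt table object.
#' @param columns Columns to add the divider to.
#' @param color Border color.
#'
#' @return The modified gt table object.
#' @export
#'
#' @import gt
#' @importFrom gt %>%
add_divider_sketch <- function(gt_object, columns, color = "grey") {
  # tab_style(), cell_borders() and cells_body() all come from gt.
  gt_object %>%
    tab_style(
      style = cell_borders(sides = "right", color = color),
      locations = cells_body(columns = {{ columns }})
    )
}
```

One detail worth keeping in mind: the roxygen tags only write `import()` lines into NAMESPACE. The requirement that gt be installed alongside the package comes from listing it under `Imports` in the DESCRIPTION file, which is the part `usethis::use_package()` handles.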

So, again, this structure gives you the ability to do your documentation and metadata and attach all the pieces together in a way that other people know what to do with it and R knows what to do with it when you try to install it or load it in other locations. So, usethis, use_package is what allows you to add specific packages to your DESCRIPTION. Although you can always manually write this out. If I'm in my package, let's see, there's gtExtras, we can show it here.

So, if I pull this down, you can see that gtExtras brings in a lot of things because I need to, like, manipulate data with dplyr. I'm doing a lot of inline plots with ggplot. I absolutely need gt to actually build up all these things. So, this tells it, every time that someone installs my package, you can import the specific package or you can bring in this specific package at a specific version. I do see a question from the chat. I thought that you could actually do a specific version in terms of if you wanted to say equal to 0.5, but typically, I would say greater than or equal to, and most often greater than, because someone could actually install a newer package version. So, this means as long as you have 0.8.5 or newer. dplyr is actually on version 1.0 now. So, as long as you have dplyr 0.8 or newer on your system, it doesn't have to be reinstalled. It can just import that package.
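As a sketch of how a minimum version is declared and what it means in practice: the helper names below are hypothetical, and the version numbers are the ones mentioned in the talk, padded to three components.

```r
# Hypothetical wrapper; run from inside the package project.
declare_dependencies <- function() {
  usethis::use_package("gt", min_version = "0.3.0")
  usethis::use_package("dplyr", min_version = "0.8.5")
}

# The DESCRIPTION file would then be expected to list:
#   Imports:
#       dplyr (>= 0.8.5),
#       gt (>= 0.3.0)

# What ">=" means in practice, using base R's version comparison.
meets_minimum <- function(installed, minimum) {
  package_version(installed) >= package_version(minimum)
}

meets_minimum("1.0.0", "0.8.5")  # a newer dplyr satisfies the requirement
meets_minimum("0.8.0", "0.8.5")  # an older one does not
```

A `>=` constraint is usually the friendlier choice, since pinning an exact version would reject every newer release a user already has installed.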

Workflow summary

Again, the whole kind of summary here as we get to the end, you know, we've talked about documentation. We've talked about building functions, building the package, writing all the different components. The basic workflow within a package, if we're summarizing this, is, remember, small changes committed frequently via version control. If you want to share it with the world, you probably want to check it into version control so they can download and install it. And if you do those in smaller components, if you do break something, you can go back earlier in the process. usethis use_r allows you to add new functions as you go. Control or command, alt, shift, R, or command shift P, or however else you want to do it, to add the roxygen skeleton when your cursor is inside the function. devtools load all lets you load and interactively test your new function.

And then usethis use_package for a specific package you want to import, to add that package as a dependency. usethis use_version when you're updating your package. So, allowing you to change, like, oh, this is the 0.2 package version, or it's 0.2.1, you know, versioning your package so you can install it at a specific moment in time. And then devtools document to document the package with your various changes as you make them. So, any time you update the roxygen comments or other documentation, you need to re-call devtools document to document everything in the appropriate place. And lastly, devtools check all the time and devtools test to check and test the package so you're making sure that all the things are working.
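The summary above can be condensed into a short sketch. The one-off steps are shown as comments so that nothing executes here, and `recheck` is a hypothetical helper name for the part of the loop that gets repeated after each change.

```r
# One-off steps, run interactively as needed:
#   usethis::use_r("square_val")     # create or open R/square_val.R
#   usethis::use_test("square_val")  # create tests/testthat/test-square_val.R
#   usethis::use_package("gt")       # add a dependency to DESCRIPTION
#   usethis::use_version("minor")    # bump the package version

# Hypothetical helper for the repeated part of the loop.
recheck <- function(pkg = ".") {
  devtools::load_all(pkg)  # load the current source for interactive use
  devtools::document(pkg)  # refresh help files and NAMESPACE
  devtools::test(pkg)      # run the testthat suite
  devtools::check(pkg)     # full R CMD check
}
```

Committing after each successful pass of a loop like this matches the "small changes, committed frequently" advice from the talk.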

Now, again, I've shown a lot of different stuff and it can still seem a little bit like there's a lot of different components going on. All these things don't always have to be done. Like, if you're not importing packages, you don't have to do that. If you're not checking into version control, you don't have to check into version control. I'm just trying to show you kind of the different levels of where you can go and where you can go in the future. And your minimal viable package can be done in just a few minutes.

If all is well, at the end of the day, just install the package and you can use it locally for as long as you want to. And then if you want to update the package, you can, again, install the newest version once you've made changes to it, and it will just overwrite that package install.

Now, as far as devtools install, you know, we have been installing it locally. So, no one else can install my package. That's fine. Maybe I'm the only person who's supposed to be using it. But to share with your colleagues, they're going to actually take it from the remote or from the cloud or wherever else you have it checked into version control to install it. So, for example, for my gtExtras package, the way that you install it is you would actually do a remotes install from GitHub, from my specific handle, of this specific package. So, even though the source code doesn't live on their computer today, they can still install that package from GitHub or GitLab or wherever else and install it into their local environment. And if you get it on CRAN, then you can just do install.packages, whatever the package name is. Now, you don't have to get things on CRAN for everything. Again, you might have packages that are only used internally or you're only using it for yourself. So, while it's great to kind of shoot for CRAN in the future, that's not required for every, you know, package that you're building.
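For a colleague, the install step might look like this sketch. The wrapper name `install_shared_package` is hypothetical; the default repository string combines the GitHub handle and package name mentioned in the talk.

```r
# Hypothetical wrapper around remotes::install_github().
install_shared_package <- function(repo = "jthomasmock/gtExtras") {
  remotes::install_github(repo)  # download the source from GitHub and install it
}

# For a package that has been released on CRAN, the usual call applies:
#   install.packages("<package name>")
```

The remotes package also provides `install_gitlab()` and `install_git()` for the other hosting options mentioned in the talk.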

And again, while I'm using install GitHub, because that's where I have it, you could install it directly from GitLab or just from Git if you had like a local environment you're installing it from. Another thing that we use a lot in terms of like RStudio package manager allows you to host like your packages on premise. So, that's really useful in terms of like, again, if you're not releasing it to the world, you're holding packages internally, you can actually host them on RStudio package manager and install from there. So, you're not having to breach your firewall to install.

And Package Manager, whether you're using our Public Package Manager, which is free, or the on-premise RStudio Package Manager, which is a paid product, has all the different versions available. You can install specific package versions. All the packages are binaries. You can install them very quickly, especially in, like, a Linux environment.

Wrapping up

So, to wrap this up, and thanks, everyone, for sticking along with me. We've gone through the end-to-end process of writing functions with both base R and with tidy eval. We've created a minimal package in just a few minutes. We've added more functions to the package. We've documented the package as well as potentially external dependencies. We've even covered unit testing with your package, how to create documentation internal to R, as well as pkgdown for documentation that's external to R, just on a website. So, we've really covered a lot today. And you may only use components of that, and that's fine. Really just trying to show that you can create a package very quickly, and you can build up those components over time with tests or documentation, external documentation, other packages, whatever you need, you can kind of build up over time as you learn more. So, I really do hope that you feel empowered to create your own package.

You can use as many of the best practices as possible. But ultimately, once you get started, you quickly see that the tooling has really been written in a way to make the process very user friendly and really ready to go with it all. As far as follow up, you can read through the whole process again in R packages book. They have a chapter called the whole game, which goes through the entire process, basically what we did today, but written out.

So, this has, like, a basic package you're building and loading different libraries, creating functions, and all the different things that we did today. So, maybe if you want to read through it as opposed to watch a video, you could always go through here. I was really just hoping to provide a video complement to these types of resources where you can see some of my successes and failures in real time, in terms of we struggled a little bit together with that expect failure because I was looking for expect error.

But overall, very, very kind of quick to get started. Again, the slides from today are available publicly. So, you can kind of go to the slides. I'll throw those in the chat again if you just want to look at it all. Or if you want to look at the source code for how I even built the slides or some of that other stuff, I have that on my GitHub at the second link I shared, which is here at github.com slash J Thomas mock slash package dash building. Overall, 90 minutes. We built a package. We looked at a lot of different things. We had a lot of fun. Thanks so much for hanging around. If there are any other questions, I'll hang around for a little bit more and answer some questions. But I think it was a fun stream, and I had a good time. So, thanks for hanging out with me.

Importing tidyverse packages

When importing tidyverse packages, would you recommend importing the whole tidyverse or just the individual packages? I would definitely recommend just importing the individual packages in terms of like the tidyverse is actually like 20 or 30 packages, which have quite a bit going on. And it's a meta package in terms of it's really just loading those other packages. So, for gtExtras, for example, I depend on some components of the tidyverse in terms of like dplyr, ggplot, you know, tidyverse packages. But I'm not bringing in tidyr, I'm not bringing in lubridate, I'm not bringing in all these other different components of the entire tidyverse. And I don't need to depend on those, and I don't need users to install all those packages, even though they may already have them installed locally.
