Kelly O'Briant | Configuration management tools for the R admin

Transcript

This transcript was generated automatically and may contain errors.

Hi, welcome. My name is Kelly O'Briant. I work at RStudio as a solutions engineer, and my talk today is on showing you some configuration management tools if you happen to be an analytic administrator, or as we kind of like to use the term, an R admin.

So I'm really happy that R admin as a concept, as a role, is becoming more recognized in different organizations, and people are starting to identify as this personality in different organizations. But in case you don't know, an R admin is usually, as we define it, a data scientist who has crawled into doing more IT-esque type work. So they take on the jobs of onboarding new tools, deploying solutions, and supporting existing standards in their organizations. They work closely with their IT groups to maintain, upgrade, and scale their analytic environments. They influence others. They train. They try to make things more effective in their organizations. And in general, overall, they are really passionate about making R a legitimate analytic standard in their enterprise.

So if you're not familiar with the R admin, you think you might be one, you want to read more about it, our Director of Solutions Engineering, Nathan Stephens, put out this blog a while ago on our R Views blog called Analytics Administration for R, so definitely check that out if you are interested in reading more.

If you speak the same language as they do, if you can show that you know how to administer the R stack using the set of tools that they're comfortable with, that is a hugely powerful thing.

So some things I like about Ansible are, of course, that I can stand up these servers very quickly in custom ways on demand. But I also really love the rich module ecosystem that it comes with, especially the modules that they have around your basic big players in cloud compute space. So they have great modules for AWS, Azure, GCP, and they have a lot of other modules, some of which I'll talk about today, that are also nice to have and helpful. The other great thing about Ansible is that you write playbooks in YAML, and that's awesome because it is super human readable and super machine readable, and I love that. It's also super easy to install. This is how you install it on a Mac, and then you're up and running.

So I'm going to go through kind of what creating an Ansible project looks like, but as a high level overview before I move on to the next slide, I'll say that what I'm going to talk about is kind of a nested directory of YAML. So the very container-ish level, you have your playbooks, and then within playbooks, you're going to have roles, and then within roles, you're going to have tasks. So I'm going to go through all of that. First with the playbook role structure.

Ansible playbooks and roles

So this is a project that I actually did a couple weeks ago to help check out the new feature that just got announced a couple minutes ago in RStudio Connect 1.7, which is support for these new content management APIs that allow for programmatic deployment of content to Connect. So this is what my playbook role structure looks like, and I'm showing right here the create sandbox playbook, and this playbook contains two roles, one for provisioning my cloud infrastructure and a second one for installing RStudio Connect on it. So I have this playbook, create me a sandbox, and then within that playbook, different things happen and I connect to it in different ways, but I have two roles, one provision, one install and configure.
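As a rough illustration of that structure, a playbook with one role for provisioning and one for installing Connect might look like the sketch below. The file name, the role names, and the `sandbox` host group are all hypothetical and not taken from the actual project. Splitting the two roles across two plays is only one possible way to connect to the local machine first and the new server second.

```yaml
# create_sandbox.yml (hypothetical file name)

# Play 1: talk to the cloud provider's API from the local machine.
- name: Provision cloud infrastructure
  hosts: localhost
  connection: local
  gather_facts: false
  roles:
    - provision_infrastructure   # hypothetical role: roles/provision_infrastructure/tasks/main.yml

# Play 2: connect to the new server over SSH and set it up.
- name: Install and configure RStudio Connect
  hosts: sandbox                 # hypothetical group, assumed to be populated by the first play
  become: true
  roles:
    - install_connect            # hypothetical role: roles/install_connect/tasks/main.yml
```

Each role name maps to a directory under `roles/` whose `tasks/main.yml` holds the task list, which is the nesting of playbooks, roles, and tasks described earlier.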

This again is about the content management APIs for programmatic deployment. It's kind of a two-for-one in this talk, because I was very excited about this feature that's coming into Connect. So if you are working with your IT department and you've ever heard these two questions, "Hey, we can't allow push-button publishing" or "How do we implement a dev/test/prod setup?", I believe that our new stuff around programmatic deployment will be really helpful in having those types of conversations. There are two key resources: the RStudio Connect user guide has just been updated with a cookbook of various server API recipes, and we also have a GitHub repo with example scripts that show how to do programmatic deployment. The scripts in this GitHub repo are going to be really useful, and those are the ones that I used in this Ansible project.

So again, back to roles, as I mentioned before, these roles have tasks, and the basic anatomy of a task is that it's a good idea to name it, and that name should in general be relevant to what the task is doing, and then you also provide the module that you want to use. And then finally, if it's applicable, you'll then provide parameters for that module and plug in your variables.
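To make that anatomy concrete, here is a minimal sketch of a single task. The file path and the `r_system_packages` variable are hypothetical, and `ansible.builtin.apt` assumes a Debian or Ubuntu server, so this is not the task shown on the slide.

```yaml
# roles/install_connect/tasks/main.yml (hypothetical path)
- name: Install system packages needed by R    # 1. a name that says what the task does
  ansible.builtin.apt:                         # 2. the module to use
    name: "{{ r_system_packages }}"            # 3. module parameters, with a variable plugged in
    state: present
    update_cache: true
```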

There are a couple of really cool things that are happening in this particular group of tasks, and this is the entire role that this Ansible project uses to create and upload and deploy the content. So this is programmatic deployment as defined in an Ansible task listing. The first cool thing that's happening here is that I am leveraging the script module, because if you thought, like, this can't possibly be all that programmatic deployment is, that's correct. I'm leveraging some shell scripts, because I didn't want to take the time to transfer all of the commands in those scripts into Ansible code. So the scripts that are provided in that GitHub repo, I grabbed those out, I put them in a scripts directory, I edited them slightly to make them do what I wanted them to do, and then I'm just calling those scripts. It's a really cool way to take baby steps toward moving any of the configuration scripts that you currently have into a more reproducible configuration management type workflow.

So that's the number one cool thing I'm doing. Number two is that you can see after I've run that first task, I'm registering an output object of what came out of that first script, and I'm plugging it in to the second and the third task. The final cool thing is that I remembered to use a debug statement, which is also great.
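The pattern of calling a script, registering its output, and reusing that output can be sketched roughly as follows. The script names, the `content_name` and `bundle_path` variables, and the assumption that the first script prints only a content identifier on standard output are all hypothetical; the real scripts from the GitHub repo may behave differently.

```yaml
# roles/deploy_content/tasks/main.yml (hypothetical path and script names)
- name: Create a content item on the Connect server
  ansible.builtin.script:
    cmd: "scripts/create-content.sh {{ content_name }}"
  register: create_result            # keep what the first script printed

- name: Upload the content bundle
  ansible.builtin.script:
    cmd: "scripts/upload-bundle.sh {{ create_result.stdout | trim }} {{ bundle_path }}"

- name: Deploy the uploaded bundle
  ansible.builtin.script:
    cmd: "scripts/deploy-bundle.sh {{ create_result.stdout | trim }}"

- name: Show the identifier returned by the first script
  ansible.builtin.debug:
    var: create_result.stdout
```

The `script` module copies a local script to the target host and runs it there, which is why existing shell scripts can be reused without first translating them into individual Ansible tasks.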

I'll start moving faster. So I talked about how roles have tasks. These are the tasks for the deploy content role. And finally, when you're ready to get up and running with your playbooks, this is kind of how you run playbooks one by one. I usually have two or maybe three playbooks that I'll run in any given Ansible project, sometimes more, but usually two or three. One will be the playbook that sets up my infrastructure and installs whatever I need on it, and the last one will be the playbook that tears everything down.

This is my favorite part of Ansible sandboxing. Once you have created this thing, you now have the power to write a good README, check it all into version control, and then burn it all to the ground, because you have the ability to reproduce this environment at any time within minutes, which is awesome.
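A teardown playbook for this kind of sandbox could look something like the sketch below. It assumes the sandbox runs on AWS EC2, that the `amazon.aws` collection is installed, and that the instances carry a `Name` tag. The file name and the `aws_region` and `sandbox_name` variables are hypothetical, and the talk does not show which cloud or modules the real project uses.

```yaml
# teardown.yml (hypothetical file name)
# Run with: ansible-playbook teardown.yml
- name: Tear down the sandbox
  hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Terminate every instance tagged as part of the sandbox
      amazon.aws.ec2_instance:
        region: "{{ aws_region }}"
        state: absent
        filters:
          "tag:Name": "{{ sandbox_name }}"
```

Because the create playbook lives in version control, running it again recreates what this playbook removes.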

Interoperability and legitimizing R

So that's a little bit about how I do daily work as a solutions engineer who works with engineering teams and how I stand up these environments very quickly whenever I want, but obviously you would do things differently. You have different needs for sandboxes, and so this slide kind of shows what is available at a high level through docs.rstudio.com and all of the various configurations and integrations that you could possibly use.

This is a little resource that I have available on our sol-eng GitHub. If you aren't ready for the cloud yet, but you want to start using Ansible and creating sandboxes on demand, it covers how you might do that with VirtualBox and Vagrant. And if you're really interested in seeing just the task structure of Ansible and how I have built the data lab sandbox as Ansible tasks, that's available to look through here.
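For a local VirtualBox and Vagrant setup, one way to point Ansible at the virtual machine is a small YAML inventory like the sketch below. The host and group names are hypothetical, and the port, user, and key path are the usual Vagrant defaults for a single machine named `default`, so they may differ on a given setup (`vagrant ssh-config` prints the actual values).

```yaml
# inventory.yml (hypothetical file name)
# Run with: ansible-playbook -i inventory.yml create_sandbox.yml
all:
  children:
    sandbox:                       # hypothetical group name
      hosts:
        vagrant_vm:                # hypothetical host alias
          ansible_host: 127.0.0.1
          ansible_port: 2222       # Vagrant's usual forwarded SSH port
          ansible_user: vagrant
          ansible_ssh_private_key_file: .vagrant/machines/default/virtualbox/private_key
```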

So finally, I want to kind of end on thinking about, like, why did my talk get put in the interoperability section, which kind of seems like an odd fit, but I also really like that it's in this section as well, because going back to the idea of how we get R legitimized in an organization, I often hear the stock answer, like, show the value of R, like, build a bunch of cool Shiny apps, and I love Shiny, I do. I'm writing a book about Shiny in production, apparently, but it isn't actionable if you don't have, like, awesome killer Shiny app ideas.

So my idea for you, as a better way to frame this sort of advice, is to think about linking the tools that you love, R, to the tools that you know other people love in your organization. Powerful sandboxes really leverage interoperability, and we've done a lot of work at RStudio to make cool integrations that are turning into true interoperability opportunities for R. Throughout my career, I have had, like, a lot of success helping get R legitimized by putting it in terms of other people's favorite tools, and so I'll leave you with that. Thank you.

Kelly O'Briant | Configuration management tools for the R admin | RStudio (2019)
