
Miles McBain | Our colour of magic: The open sourcery of fantastic R packages | RStudio (2019)
What does it mean to say software is, to quote one Twitter user, ‘so f***ing magical!’? In the context of our popular community hobby of rating and sharing R packages, the term ‘magic’ seems reserved for our most powerful expressions of visceral approval. Why is this? And what does it say about how we value software? Can this magical quality be quantified? We will consider these questions by examining magical specimens, and in the process reveal the surprising depths at which notions of magic are embedded in the R zeitgeist.
Materials: https://github.com/MilesMcBain/rstudioconf_talk
About the Author
Miles McBain
As an applied statistician, Miles combines theoretical statistical knowledge and computing expertise to help organizations understand their core business and their customers. Miles is a hacker at heart, which he channels into regular contributions to the open source and open science communities. In addition to commercial projects, Miles is always interested in small data/statistics consulting jobs for start-ups and not-for-profits that enable him to expand his applied experience in areas such as A/B testing, experimental design, and statistical power analysis. He does this mainly for the thrill of learning new domains and the opportunity to meet fascinating people.
Transcript
This transcript was generated automatically and may contain errors.
Hello, and with this talk, I want to explore the making of software that feels like magic.
And I want to start by reflecting on people's responses to datapasta. This is a package that I put on CRAN in 2016, and the community's response to it was beyond anything I ever imagined: lots of kind words in emails and tweets, even at this conference. We had some sort of news headline, a crossover with actual pasta in a school, which I don't fully understand. And amongst all that, this little meme emerged, and it brings me a lot of joy.
Even though I've never spoken with anyone about this, I somehow understand what's being said here. I've experienced this myself, I sympathize with it, and I appreciate it.
So what is being said? I think the natural definition of magic is phenomena that violate the rules of our world. What's happening here, I think, is that we're violating people's assumptions about what is within their power to do. That is a pretty intoxicating feeling, when you're suddenly shown you have access to power you didn't realize you had. And mixed with that, I think, is some astonishment and wonder that someone is just handing this to you for free.
So that's what I want to talk about. I want to give you three things: one, the general forms of tools that can do this; two, some ideas about how they do it and how you can do it with the things you work on; and three, some broader ideas about how we can make sure we keep seeing this type of stuff come out of our community. Okay?
R packages are magic
So this is a monthly plot of the proportion of #rstats tweets that mention magic, witchcraft or sorcery. Something pretty interesting happens in 2016, right? The rate at which people liken R to magic has accelerated and is continuing to accelerate. So who has a theory as to what this is due to? Yes? The tidyverse. Yes.
And we can confirm that by tallying up the mentions of CRAN packages within those tweets, which gives this distribution of magical mentions. The dark bars are the ones I think are signal; the gray ones are a bit noisy because of their names. So we've got some household names up the top there, which is more or less what I expected to see. What I didn't expect was this: we've got a really long tail of magic in our universe. And that is intriguing, because it suggests there's a lot of lesser-known stuff out there to be discovered.
Three forms of magical tools
So I've formed them into categories, because I think that's the only way to sensibly discuss so many. I feel like there are probably five or six natural categories, and I'm going to talk about three today. The first one is Zaps. They remind me of the way magic works in Harry Potter: you have this tool and you have this problem, you point and say the magic word, magic happens, and the problem is solved. The magic word might be install_tinytex(), in which case the dependencies to knit a PDF just manifest themselves on your computer.
The reason I'm showing you this code is that I think it's the template for a Zap. Zaps often have one hero function that does all the work, and that function tends to share its name with the package itself, so they're very much one-function packages. This is the largest category. A few other interesting things: the most tweeted about was the here package. And I want to give a special mention to janitor, which, although it has many functions, feels to me like a spell book of these little Zaps.
Now I want to introduce you to genies. I call them genies because they convey magic through some kind of interaction with a program, and their goal is to assist you to do something. They do this by creating additional UIs, with RStudio add-ins or console dialogues, so they only work in an interactive context. I want to highlight an example called mapedit, by Tim Appelhans. mapedit lets you draw geometry directly onto a leaflet map and then export it back into your R session as a geospatial dataset, which is pretty badass.
The third one I want to introduce you to is tongues. In this universe, there's a very strong theme of the power of words, and in particular the power of names, so much so that if you can discern the true name of something, say the wind, you can gain power over it. That reminds me of what's going on in the tidyverse. I don't think we can overstate the contribution of the name "tidy data" alone. It's an accessible name; we don't have the "third-normal-form-verse", right? And that name sets up an intuition that is then confirmed by a definition.
And that pattern plays out all over the tidyverse. It's not all magical, but where it is, you can navigate it with just a cheat sheet and autocomplete, because the function names map so well to the problem domain. This isn't unique to the tidyverse, though. I think the general pattern of a tongue is that it tries to create a new language for you to more easily express and solve a problem.
How magical tools work: the litmus test
So here's an idea. Since we're talking about expectations being violated, maybe we can frame magic as the appearance of being granted power disproportionate to the pain you expected to go through to get that power. I've got the subscript i there because I think it's important to acknowledge that this is individual: people have different skills and experiences, so they're going to see this a little differently.
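The slide's exact formula isn't captured in this transcript. As a hedged sketch of the idea only, with the symbols chosen here for illustration, the ratio for an individual user $i$ might be written as:

$$
\text{magic}_i \;\propto\; \frac{\text{power granted}}{\text{pain expected}_i}
$$

Magic is large when the power is large relative to the pain that user $i$ expected, so the same tool can score differently for people with different skills and experiences.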
But what I noticed when I started to look at the packages in this set was that they all definitely grant power, but they also seem to invest a whole bunch of dedicated effort into taking away user pain. That started me thinking: maybe this is it. Maybe this is the special thing. So I created this little litmus test, and I think it basically holds up. If you can write it out convincingly for a tool, there's probably a good chance that someone will find that tool magical. It's a pretty simple test that goes: the magic of [tool] is that it grants you [power] while saving you from [pain]. You fill in what it does for power and what it saves you from for pain.
So let's try dplyr. The magic of dplyr is that it grants you a vocabulary for wrangling data, right? Now, I haven't told you the pain that it saves you, and because of that, dplyr sounds a bit flat. Base R can do this, right? Well, the magic of dplyr is that it also saves you from mindless typing, because of the pipeable interface and nonstandard evaluation; it saves you from having to read, because the functions are so well named; and it saves you from having to learn SQL, because it will write the SQL for you when your data lives in a database. And that is pretty awesome.
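To make the typing point concrete, here is a small comparison on the built-in mtcars data (the example is an illustration, not from the talk). Base R's subset() already uses nonstandard evaluation, so columns can be written as bare names. The dplyr version assumes dplyr is installed and R 4.1 or later for the native |> pipe, so it is guarded with requireNamespace().

```r
# Standard evaluation: the data frame's name is repeated and columns are quoted
small_std <- mtcars[mtcars$cyl == 4 & mtcars$mpg > 30, c("mpg", "wt")]

# Nonstandard evaluation in base R: columns are referred to as bare names
small_nse <- subset(mtcars, cyl == 4 & mpg > 30, select = c(mpg, wt))

stopifnot(isTRUE(all.equal(small_std, small_nse)))  # same rows, same columns

# The pipeable dplyr equivalent (only runs if dplyr is available)
if (requireNamespace("dplyr", quietly = TRUE)) {
  small_dplyr <- mtcars |>
    dplyr::filter(cyl == 4, mpg > 30) |>
    dplyr::select(mpg, wt)
  print(small_dplyr)
}
```

The database side (dplyr writing SQL) needs a database backend such as dbplyr and is left out of this sketch.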
Let's go to TinyTeX now. The magic of TinyTeX is that it grants you the things you need to knit a PDF while saving you from bloated 4-gigabyte downloads, navigating ancient websites, and reading horrible instructions. There's a catalog of user pain associated with setting up a LaTeX environment, and the author of this package has documented it all on this neat page; I think the link to it is even called the hall of pain. It's a really interesting read if you want insight into the mind of someone who is deeply in tune with user pain and who consistently creates things that people perceive to be magical.
Using the litmus test as a design tool
The real value of this little test, though, I think, is that you can flip it around and use it as a design tool. If you're working on something that you want to imbue with more magic, I think you just have to shove as much as possible into that last section, the pain. But you need some ideas about what you're going to do.
One obvious place to start is: what are my pain points when I'm doing my own R analysis? I don't know yours, but I know mine. I can tell you my top three. I hate mindless typing, which is typing where I know exactly what I've got to type and I'm just waiting for my fingers to finish this verbose thing. I hate reading code, and reading about code. And I hate making decisions about code. All these things take my mind off the analysis I'm trying to do and put me into this code world where I'd often prefer not to be.
Those are a bit abstract, so let's think about some concrete ways to address them. For saving typing, here's the continuum: you could use pipeable functions with nonstandard evaluation; you could use autocomplete-friendly names so people only type part of a name; or you could take the typing away completely. That's what I did with datapasta. I really tried to bank heavily on that for magic and drill into minimizing keystrokes to the absolute minimum.
For saving reading, we've talked about intuitive names. Conforming also becomes really important when you are making a model or a method, because what your users fundamentally want to do with a new model or method is compare it to one they already understand. To do that, it needs to look a certain way: it needs to take a data frame, use a formula interface, and return an object that implements certain S3 methods. If it doesn't, you're foisting a whole bunch of reading and learning onto your users just so they can make the comparison you know they want to make. So don't do that. Conform.
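A minimal sketch of what conforming can look like, with a hypothetical toy_lm() standing in for "your" method: it takes a formula and a data frame and returns an S3 object with coef(), predict() and print() methods, so it can sit next to lm() for comparison. The least-squares fit via qr() is only a placeholder for whatever a real method would compute.

```r
# Hypothetical model constructor following R's modelling conventions:
# a formula and a data frame in, an S3 object out.
toy_lm <- function(formula, data) {
  mf <- model.frame(formula, data = data)      # evaluate the formula in the data
  tt <- attr(mf, "terms")
  X  <- model.matrix(tt, mf)                   # design matrix, intercept included
  y  <- model.response(mf)                     # response vector
  structure(
    list(coefficients = qr.coef(qr(X), y),     # placeholder fit: least squares via QR
         terms = tt,
         call = match.call()),
    class = "toy_lm"
  )
}

# S3 methods, so generic functions users already know keep working
coef.toy_lm <- function(object, ...) object$coefficients

predict.toy_lm <- function(object, newdata, ...) {
  tt <- delete.response(object$terms)          # predictors only
  X  <- model.matrix(tt, model.frame(tt, newdata))
  drop(X %*% coef(object))
}

print.toy_lm <- function(x, ...) {
  cat("toy_lm fit\nCall: ")
  print(x$call)
  print(coef(x))
  invisible(x)
}

fit <- toy_lm(mpg ~ wt + hp, data = mtcars)
fit                                            # dispatches to print.toy_lm()
# The comparison users want to make, with no new interface to learn
all.equal(coef(fit), coef(lm(mpg ~ wt + hp, data = mtcars)))
```

Because the object answers to coef() and predict(), existing comparison code written for lm() can be pointed at it unchanged.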
Also, be flexible. I think potentially the worst kind of reading you can do is reading error messages and then reading Stack Overflow pages trying to resolve them. Base R is actually really good at this: it will happily convert between lists, vectors, matrices, data frames and all sorts of other objects. So if you create an interface that is very rigid by comparison to base R, it's going to feel really non-magical to people.
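A sketch of that kind of leniency, using hypothetical helpers as_tabular() and column_means() (not from any package): the function accepts a vector, matrix, list or data frame by coercing everything to a data frame first, and when it genuinely can't cope, it says so in plain language instead of surfacing an internal error.

```r
# Hypothetical helper: coerce common R objects to a data frame before working
as_tabular <- function(x) {
  if (is.data.frame(x)) return(x)                            # already tabular
  if (is.matrix(x) || is.list(x)) return(as.data.frame(x))   # columns/elements become columns
  if (is.atomic(x)) return(data.frame(value = x))            # a bare vector becomes one column
  stop("column_means() needs a vector, matrix, list or data frame, ",
       "not an object of class '", class(x)[1], "'.", call. = FALSE)
}

column_means <- function(x) {
  df <- as_tabular(x)
  is_num <- vapply(df, is.numeric, logical(1))   # quietly skip non-numeric columns
  colMeans(df[is_num])
}

column_means(1:4)                                    # a vector
column_means(matrix(1:4, nrow = 2))                  # a matrix
column_means(list(a = 1:3, b = 4:6))                 # a list
column_means(data.frame(a = 1:3, b = letters[1:3]))  # a data frame with a text column
```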
Finally, and this is probably one of the key categories: making decisions for people is a really easy way to make them happy. Defaults are one way to do that, but a number of times over the last year I've had an interesting conversation about defaults in which I've heard a counterargument, and I want to share it with you. The argument goes: I could use a default here, but I'm worried that my user will blindly use that default, arrive at an invalid answer, and then blindly use that answer. My rebuttal is: why does your tool give answers whose validity users can't discern? Could you invest in some diagnostics, so your users can tell whether the answer is right? If you did that, you'd have pain being saved on the sad path, where the diagnostics catch a bad answer, and pain being saved on the happy path, because you've got the default, and you'd have a more magical thing overall.
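One way to have both the default and the diagnostics, sketched with a deliberately simple hypothetical function: newton_sqrt() ships defaults for its tolerance and iteration cap, but it also records whether it converged and warns when it didn't. A user who blindly accepts the defaults can still tell when the answer shouldn't be trusted. Storing the diagnostics as attributes is just one design choice; returning a list with a status field would work too.

```r
# Hypothetical example: square root by Newton's method, with defaults plus diagnostics
newton_sqrt <- function(a, tol = 1e-10, max_iter = 50) {
  stopifnot(is.numeric(a), length(a) == 1, a >= 0)
  x <- max(a, 1)                          # starting guess at or above sqrt(a)
  for (i in seq_len(max_iter)) {
    x_new <- 0.5 * (x + a / x)            # Newton update for f(x) = x^2 - a
    if (abs(x_new - x) < tol) {
      # Happy path: return the answer along with evidence that it converged
      return(structure(x_new, converged = TRUE, iterations = i))
    }
    x <- x_new
  }
  # Sad path: still return a value, but tell the user not to trust it blindly
  warning("newton_sqrt() did not converge within ", max_iter,
          " iterations; try a larger `max_iter`.", call. = FALSE)
  structure(x, converged = FALSE, iterations = max_iter)
}

r <- newton_sqrt(2)                       # the defaults are fine here
attr(r, "converged")
newton_sqrt(1e6, max_iter = 2)            # far too few iterations: emits a warning
```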
In the middle there, I just want to cite two examples. There's no way I would have released datapasta were it not for devtools and R-hub (I see Jim and Gábor in the front there). These packages help you navigate the rat run of releasing an R package: they give you little dialogues that help you think about the right thing at the right time. That's something you could emulate.
And finally, I think this applies to all tools: just give people fewer choices. For a user to make a confident decision, they have to parse the choice space, and the more things you toss in there, the bigger that space grows and the more I have to understand to be confident. So you're actually just creating anxiety for me. Shrink that space down, reduce my anxiety, curate the experience, and I think that will definitely make something more magical.
Keeping magic coming from our community
So those are three pain points and three ideas about how to address them. I know you have your own pain points and your own ideas, and I'm really here for that conversation. But now I want to switch to something a bit more important: how can we make sure we keep seeing more of this come out of our community?
I think there's a segment of our community that is really in tune with user pain, and we should listen to them and learn from them: these people in the red zone. They are not professional software developers. They are amateur software developers, but professional analysts and professional scientists, and they are living the pain every day of having a problem and a tool that don't quite meet. I think it will be their insights into those gaps that spawn the next wave of magical tools. And that can become a virtuous cycle, when those tools attract new users to our community, right?
And it's worth remembering that amateurs have already created a bunch of cool stuff for us. datapasta, obviously, was written by a total amateur, but on a more significant level, we've got knitr and ggplot2 and, now, Sybil, a Chambers Award-winning package. These were all created by students. Were they wasting their time writing R packages instead of working on their theses? No. That's garbage, right?
So for anyone who is on the cusp of committing to developing a tool that might take away some pain, I want to dispel a few myths that might get you over the line. The first myth is that for a tool to be magical, it needs a grand vision. Sure, ggplot2 is going to change the way you express graphics. But datapasta is going to help you move text from this window to that window on your screen. And something like glue is going to reduce the code you write and make that code more readable, so long as that code is for string concatenation. But I'm a former heavy paste() user; I use glue all the time, and every time I do, I get a warm, fuzzy feeling. So that myth is busted.
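For readers who haven't used glue, a small comparison (the variable values are made up): the paste0() version scatters the sentence across quotes and commas, while glue::glue() keeps it as one readable template with R expressions inlined in {braces}. The glue part assumes the glue package is installed, so it is guarded.

```r
user   <- "Ada"
n_msgs <- 3

# Base R: the sentence is chopped up by quotes, commas and paste0()
msg_paste <- paste0("Hello ", user, ", you have ", n_msgs, " unread messages.")
print(msg_paste)

# glue: one template, with R expressions inside {braces}
if (requireNamespace("glue", quietly = TRUE)) {
  msg_glue <- glue::glue("Hello {user}, you have {n_msgs} unread messages.")
  print(identical(as.character(msg_glue), msg_paste))  # same string, easier to read
}
```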
The second myth I want to bust is that magical code needs to smell good. Here I'm referencing Jenny Bryan's amazing talk, Code Smells and Feels. In this slide from that talk, Jenny discusses how the level of indentation and nesting can be used as a metric to identify code that could be refactored, going from something like the code on the left to something like the code on the right. Cool. So here's datapasta. Look at all that paste, paste, paste. This is a little bit scary to me. I do feel some pressure to rewrite it, but at the same time I'm a little scared that I don't fully understand what's going on in there anymore. And actually, I don't mind having it around. It's nostalgia for me: this is what I thought writing R code was, only three years ago. So I've come a fairly long way since then.
So it's nice to have this touchstone, and it helps to be in good company. This is one of the most magical functions on CRAN; there are some hints in there. Can anyone pick it? It's auto.arima() from forecast. You just fling a time series at it, and it catches it, churns away, and gives you back the best model it can, which is really awesome. So if this is good enough for Rob Hyndman, and it's good enough for me, it's good enough for you. The properties of your code in no way correlate with the amount of magic it can make someone feel. In fact, they might even anti-correlate, but that's another story.
Now I want to make an observation. I think it would be really cool if we had an organization in our community dedicated to welcoming amateurs and empowering them to become toolmakers. We'd need a forum and a website, of course, but those are details. What I want to discuss right now is the important issue of the logo we'd use for our hex sticker. I want to propose this motif. It has been my progress tracker up until now, but it's also a mana orb, and a mana orb represents a pool of magical energy in computer games and video games. And we're an R community, so we drop an R on it.
But the reason I'm showing you this is that I don't know if this is a coincidence. This is the first rstudio::conf at Harry Potter World, you know? Maybe there's something embedded in our collective understanding of what we're doing here. I don't know. I've got some more examples that I might blog or tweet about.
So this is basically my idea. There are two ways to create software that feels like magic. Number one is directly: you create something that passes this test by taking away user pain. Number two is indirectly: you support our amateurs, listening to them and empowering them to become toolmakers. So that's my talk. Thanks very much for listening.
Q&A
Thank you very much, Miles, for that wonderful talk. The magic of this talk, of course, was it gave excellent insights and saved me from trawling 14,000 packages to do this work myself. So we have two or three minutes for questions.
Hi, great talk. I feel like the distinction between something being called magic and someone calling it API obfuscation comes down to how much trust they have in what you're doing and the functions you're building, right? So as a student developing a package, how do I build that trust in my work, so that people can use it and trust that what's working on the back end is reliable and safe?
Yeah. Okay. That's a really good question. If you develop something magical and no one ever uses it, that's going to be a real shame. I can't really give heaps of advice. When I created datapasta, the thing that really got it off the ground was that GIF. I tweeted that GIF and it sort of went viral from there, and I think the use case was so compelling that trust wasn't really the issue. What's the cost here? If it doesn't work for you, and often datapasta doesn't work for people, you've lost nothing, really. You've lost a couple of minutes, and you go back and do it the normal way. But if it does work, it's saved you a whole bunch of time. So I think it's a combination of making a compelling case and putting it out there.
Sometimes in software development, "deep magic" or "dark magic" is kind of a pejorative, right? You're talking about reducing user options, and I think that's generally a good idea, but one of the worst experiences is when the magic runs out and you find out the tool can't do the thing you want to do. So I'm wondering how you think about balancing that with exposing some of the deeper workings of a thing and allowing people to go deep when they need to.
Yeah. I actually think you just have to be led by your users and listen to them. I wouldn't put those options in speculatively, but I would try to be really responsive and listen to what people want: the issues coming up on my GitHub and what people are tweeting about. So, just-in-time magic, maybe.
