Resources

Alex Hayes | Solving the model representation problem with broom | RStudio (2019)

The R objects used to represent model fits are notoriously inconsistent, making data analysis inconvenient and frustrating. The broom package resolves this issue by defining a consistent way to represent model fits. By summarizing essential information about fits in tidy tibbles, broom makes it easy to programmatically work with model objects. Combining broom with list-columns results in an especially powerful way to work with many model fits at once. This talk features several case studies demonstrating how broom resolves common problems in data analysis.

View materials: https://buff.ly/2FGKFkj

About the author: Alex Hayes is interested in how statistics can help people make better decisions. He's active in the R and data science communities, and particularly interested in improving interfaces to modeling software. In his free time, he tries to get outside to climb and bike.


Transcript

This transcript was generated automatically and may contain errors.

Who am I and why am I talking to you about broom? I interned at RStudio over the last summer, worked on the broom package, and have since taken over maintaining it. So if there are any bugs, send them to me and I will try to fix them. And the person who originally wrote broom, David Robinson, is at RStudio conference, so give him a big shout out if you use broom at some point. I'm going to talk to you about what I'm calling the model representation problem and how broom alleviates it to some degree. If you want to follow along, the slides are available online and you can download them.

The model representation problem

Okay. So what is the model representation problem? If you have spent some time in a stats class, or maybe even reading some tutorials online, you have probably seen things like this somewhere. Here I've said, okay, there's this y_i, and I'm using this little tilde, which means that this y is a random variable with a normal distribution. So you can write this down, and if you've seen it before, you probably know what it means. Similarly, you can pick out one particular normal distribution; we have standard notation for that. And we have a way to represent vectors of parameters, and you can write down estimates.

So in my mind, there are three key objects: we have models, we have individual fits of models, and we have estimators, which I don't want to get into too much. But you can write all of these things down in ways that other people understand. The takeaway is that when you think of these things in terms of math, we have shared notation and community standards for how we write them down.

But in code, that's not the case quite as much as one would hope. Here's a quick example. One thing you might want to do, if you have some classification model, is get class probabilities out of that model. But that's kind of hard to do sometimes. Most of the time you can get class probabilities, but the question is how. What I have here is an example that I have stolen from Max: you have these different objects or models that you'd like to predict from, and if you want class probabilities, there are all of these different interfaces, right?

So it's not a question of whether you can do what you'd like to do. It's really a question of cognitive load. I don't want to have to remember all the different type arguments when I want to get class probabilities. Really, I'd just like to sit down, remember one thing, and always get what I want. So what's happening is there's this extra friction that takes time away from your end goal, which is really just to get the class probabilities and get on with your job, right?
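As a sketch of the kind of inconsistency being described (these particular models are illustrative, not necessarily the ones on the slide), consider how three common packages hand back class probabilities:

```r
# Each modeling package exposes class probabilities differently
# (illustrative examples; not necessarily the exact ones from the slide).

# stats::glm -- ask predict() for type = "response"
glm_fit <- glm(am ~ mpg, data = mtcars, family = binomial)
p_glm <- predict(glm_fit, type = "response")

# MASS::lda -- probabilities live in the $posterior element of the prediction
lda_fit <- MASS::lda(Species ~ ., data = iris)
p_lda <- predict(lda_fit)$posterior

# randomForest -- ask predict() for type = "prob" (package not loaded here)
# rf_fit <- randomForest::randomForest(Species ~ ., data = iris)
# p_rf   <- predict(rf_fit, type = "prob")
```

Three models, three different incantations for the same conceptual operation — exactly the cognitive load described above.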


Okay. So this is what I'm calling the model representation problem: we don't have a standard way to represent these things in code.

How broom helps

Okay. And broom is here to help. Broom is a package written by David Robinson, and there are three key ideas. We're going to work with fits; by "fit" I mean you have some model, you train it, and you get a trained model object back. In the broom philosophy there are three key components of a fit, and all of these components are going to be tidy tibbles. The reason they're tidy tibbles is that we already know how to manipulate tibbles, so we can use tools we already know and there isn't any additional cognitive load in working with them.

So the first type of information about a fit is the components, or the parameters. If you have a linear regression model, think of the regression coefficients: we're going to have a table with all the information about the regression coefficients. There's an S3 generic, tidy, that provides that, and I'll show some examples of how this works in just a moment.

Okay. The second generic is glance. Glance always returns a tibble with one row, and the columns are different goodness-of-fit measures. For example, this could be things like R squared or log likelihood or BIC, or whatever fancy version of information criteria a model returns; it's a one-row summary of the entire model. And then finally there's a third generic, augment. What augment says is: you have this training data set, and each row in it influenced the final model in some way, so any information about how each row influenced the final model shows up in augment. Or, if you have new data, you can also use it to get predictions on new data in a nice way.

Working through examples

Okay. So we'll work through a simple example first. Suppose we have some data simulated from a normal distribution, and we get a fit, where this normal_fit object represents the maximum likelihood estimator of the parameters of this model. The question is: what does this thing look like? If we look at this normal_fit object, what we get is a list, and I've cut off some of it, but we have these things in here: the maximum likelihood estimates themselves, their standard deviations, the covariance, and the log likelihood of this model. So this is actually a pretty nice object, and you can work with it if you choose to, but the big issue is that if you go to a different package, it could look really different.
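The slides themselves aren't reproduced in the transcript, but a minimal stand-in for this step, assuming MASS::fitdistr as the maximum likelihood fitter, looks something like:

```r
library(MASS)  # for fitdistr()

set.seed(27)
x <- rnorm(100, mean = 5, sd = 2)  # simulate from a normal distribution

# Maximum likelihood fit of a normal distribution; a stand-in for the
# fit object on the slides, which may have used a different helper
normal_fit <- fitdistr(x, "normal")

normal_fit$estimate  # the maximum likelihood estimates of mean and sd
normal_fit$sd        # standard errors of those estimates
normal_fit$loglik    # log likelihood of the fitted model
```

The point stands either way: a perfectly reasonable list, but its shape is specific to this one function.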

So as an example of something that looks really different — actually, we'll get to that in just a second; we're going to talk about linear models first. Here's what happens when we want to tidy this thing up. We'd like a table that represents these estimates. So there's this normal_fit, and if we call tidy on it, broom cleans it up for us. What we get is this table with three columns, and these column names are pretty much always going to be the same, which is really nice. There's a term column, which tells us which parameter we're working with; an estimate column, which gives the actual value we've calculated and think we should be using; and a standard error, which addresses our uncertainty in that parameter.

Okay. I told you there were two other generics we can apply to these objects. One is glance. For a simple normal model we only get some simple diagnostics: how many data points we fit, the total log likelihood, and then these other criteria. There isn't actually an augment method defined in broom at the moment for these univariate distributions, but you could imagine one if you wanted — you could probably define something like a per-observation log likelihood contribution.
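Under the same assumption that the fit comes from MASS::fitdistr (for which broom ships tidiers), the tidy and glance calls described here would be:

```r
library(broom)
library(MASS)

set.seed(27)
x <- rnorm(100, mean = 5, sd = 2)
normal_fit <- fitdistr(x, "normal")  # stand-in for the fit on the slides

tidy(normal_fit)    # one row per parameter: term, estimate, std.error
glance(normal_fit)  # one-row summary: logLik, AIC, BIC, ...
```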

Okay. So let's think about a slightly more involved model, where it might become clearer how this is useful in practice. If you want to follow along on your computer, you can; you only need these two lines. We're going to create this ols_fit object, which holds the OLS estimates for a linear model. I can't actually print this object out on a slide because it's really big and complicated, but it's interesting to look into yourself if you want to — or if you just want to be intimidated by how crazy the internals are. I still have no clue how it all works.
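The two lines themselves aren't in the transcript; since augment later reveals the hp, mpg and cyl columns of mtcars, the fit was presumably something like:

```r
library(broom)

# Presumed reconstruction of the two lines from the slides: an OLS fit
# using the hp, mpg and cyl columns that augment() reveals later on
ols_fit <- lm(mpg ~ hp + cyl, data = mtcars)

# The raw object is a big, complicated list -- peek at it with:
str(ols_fit, max.level = 1)
```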

But then when you tidy it, you get something that's a lot easier to work with: you just get this table. Here we have the term again — each one of our linear regression coefficients — and the estimate, but now there's some additional information about linear models that we didn't have before: statistics from t-tests and the associated p-values. If you wanted, you could also get confidence intervals; there's an argument to tidy that will give you those. When you glance at this, you also get a lot of information. Among the metrics you get are R squared and adjusted R squared; sigma is the estimated residual standard deviation, which gives you some measure of how uncertain predictions from this model will be; and so on and so forth. Exactly what you get out depends on the particular model you put in, but the column names are consistent. So if you want to get this information from one model or another, you don't have to remember what the internals of those two models look like — you can just tidy or glance them and always access the one column you care about.
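With that presumed fit, the calls described here would be:

```r
library(broom)

ols_fit <- lm(mpg ~ hp + cyl, data = mtcars)  # presumed fit from the slides

tidy(ols_fit)
# term, estimate, std.error, statistic (the t-statistic), p.value

tidy(ols_fit, conf.int = TRUE)  # the argument that adds conf.low / conf.high

glance(ols_fit)
# r.squared, adj.r.squared, sigma, logLik, AIC, BIC, ... in consistent columns
```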

Okay. Finally, I told you there was this third generic, augment. You can pass data into augment, but if you don't pass any additional data, it's going to root around in the model object and find the training data for your model. In this case, it's found that I used three columns from the original mtcars data set — hp, mpg and cyl — and it puts those into a data frame and then adds additional information about each one of these observations. Here each row is an observation. You can see there's a .fitted column, which is the actual prediction from the model; a .se.fit, which is the standard error of that prediction; and so on and so forth. For linear regression there's a bunch of additional measures added in here, but these are all training-data-level summaries.
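Again assuming the same fit, augment recovers the training data and appends observation-level columns. (The exact column set varies a little across broom versions; .se.fit in particular needs se_fit = TRUE in recent releases.)

```r
library(broom)

ols_fit <- lm(mpg ~ hp + cyl, data = mtcars)  # presumed fit from the slides

au <- augment(ols_fit)  # no new data passed, so it recovers the training data
names(au)
# the three original columns plus .fitted, .resid, .hat, .sigma, .cooksd,
# .std.resid (and .se.fit when requested with se_fit = TRUE)
head(au)
```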

Using broom in practice

Okay. So that's the big picture of what broom does, and now I'd like to show you how I end up using broom in practice. The first thing I do is use it a lot as a quick and dirty reporting tool. If you're familiar with the knitr kable function, it's fantastic: it takes a data frame and, no matter what you're outputting to, gives you a nice table. And if you can get your model into a table — which is exactly what tidy does most of the time — you can report it really fast. This is my super quick and dirty way to avoid just barfing raw output into a homework document or something like that, so I'll use this combination.
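The quick-and-dirty combination, as a sketch (the model here is illustrative):

```r
library(broom)
library(knitr)

fit <- lm(mpg ~ hp + cyl, data = mtcars)  # any model with a tidy() method

# tidy() turns the fit into a data frame; kable() renders it as a clean
# table in whatever format the R Markdown document is knitting to
kable(tidy(fit), digits = 3)
```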

The next thing you can do is use broom to compare different models by applying the glance generic to a list of models all at once. What I've done here is put three different models in a list — fit one, fit two and fit three — and in this case I'm adding more predictors at each step. Then I use a tool from the purrr package called map_df, which applies the glance generic to each one of these models and puts the results into a single data frame, which I arrange by AIC. If I look at the result, I now have a summary for each of my models, and in this case it looks like fit three is the best model in terms of AIC. You could also plot this immediately for a really nice visual model comparison.
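A sketch of this comparison, with illustrative predictors (the slide's exact formulas aren't in the transcript):

```r
library(broom)
library(dplyr)
library(purrr)

# Three nested models, adding a predictor at each step (illustrative choices)
fit1 <- lm(mpg ~ hp,            data = mtcars)
fit2 <- lm(mpg ~ hp + cyl,      data = mtcars)
fit3 <- lm(mpg ~ hp + cyl + wt, data = mtcars)

# map_df() applies glance() to each model and row-binds the one-row summaries
list(fit1 = fit1, fit2 = fit2, fit3 = fit3) %>%
  map_df(glance, .id = "model") %>%
  arrange(AIC)  # best model (lowest AIC) floats to the top
```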

Okay. By far my favorite use case for broom is checking multiple linear regression. I told you that if you use augment, you get the residuals. One thing you might want to do when you're doing some sort of regression is take the residuals, compare them to each of your predictors, and look at these partial residual plots; any systematic deviations there might indicate some sort of model misfit. So if I want a visual check for model misfit, I have about ten lines of code here, and it produces the following plot. That's it. Now I can look at these residuals and see if there are any patterns. If I eyeball this one, I wouldn't be too concerned, but it's a nice way to do quick and easy diagnostics.
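The roughly ten lines in question might look like this (the model and plotting choices here are assumptions, not the exact slide code):

```r
library(broom)
library(dplyr)
library(ggplot2)
library(tidyr)

fit <- lm(mpg ~ hp + cyl + wt, data = mtcars)  # illustrative regression

# Residuals from augment(), reshaped so each predictor gets its own facet
p <- augment(fit) %>%
  pivot_longer(c(hp, cyl, wt),
               names_to = "predictor", values_to = "value") %>%
  ggplot(aes(value, .resid)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed") +
  facet_wrap(~ predictor, scales = "free_x")
p
```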

Working with multiple models at once

Okay. So finally I'm going to work through a more extended example. I don't actually know how much time I have, so if someone could let me know how much is left, that would be great. This more extended example is where I think the real power of broom comes in: when you want to work with multiple models at once. Suppose we have this data — it's the mtcars data again; I'm sorry if you've seen it a hundred times. If we want to think about miles per gallon versus weight, we could say, well, let's use a model here, but not a linear model: it really looks like there's some sort of inverse relationship, so let's do some sort of nonlinear least squares. The particular model here isn't especially important, but let's say there's some inverse relationship with weight, and maybe we'd like to fit this a bajillion times on bootstrap data sets to see what the uncertainty in the parameters k and b is.

Okay. To do this, the first thing we're going to use is the rsample package to create bootstrap data sets — in this case, a hundred of them. What we get is a tibble with a bunch of nested objects. When you start looking at this rsample stuff, you might wonder what's going on. We have this split object, and each split contains two data sets: the bootstrap data set and the held-out data set. Then there's an id column. The idea is that we're going to use mutate together with purrr's map to add new list-columns of models and work with those list-columns.

A general strategy that works really well in these circumstances is to figure out how to do something on one resampled data set first. There's a function in R that will fit a nonlinear least squares model, called nls. nls needs three arguments. I have to define the formula I want to use — in this case, exactly the formula from a couple of slides back. I have to tell it the data to use — here that's one split object from my bootstraps data frame, and I use the analysis function to get the training set out of it. And finally, because it's a different kind of model than usual, I have to give it some initial parameters, which don't really matter here. So this is the helper function that solves the problem for me on one data set.
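The helper, as a sketch: mpg ~ k / wt + b is the inverse-relationship model from a few slides back, and the start values are arbitrary, as noted. The function name is an assumption.

```r
library(rsample)

set.seed(27)
boots <- bootstraps(mtcars, times = 100)  # a hundred bootstrap resamples

# Fit nonlinear least squares on a single resample: a formula, the data
# (the training half of the split, via analysis()), and start values
fit_nls_on_bootstrap <- function(split) {
  nls(mpg ~ k / wt + b,
      data  = analysis(split),
      start = list(k = 1, b = 0))
}

fit_nls_on_bootstrap(boots$splits[[1]])  # works on one split; map() does the rest
```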

Now I'd like to generalize this, which I can do with map. I take my bootstrap data sets and mutate: first I map the helper over the splits, fitting a model on each resampled data set; then, now that I have each of these fit models, I tidy each one. So in the first row I have a data set; an id that says which data set it is; an S3 object — my actual model object, living in this column; and a nested table with my tidied coefficients. At first glance this structure can seem really intimidating, because you have all of these nested hierarchical structures. But this is where you want to end up, and then there's a magical function from tidyr that flattens it out into something we can work with in an intuitive way: the unnest function.

So in this case, if I unnest on coefficient info — going back one slide, coefficient info is this list of tables — I get a flat data frame I can work with, where for each bootstrap data set I have all the information I need about my coefficients. Now if I want to plot this to visualize uncertainty, it's just a one-liner or two-liner in ggplot2. Okay, a five-liner. But you throw it into ggplot2 and you can see it right away. I haven't done a whole bunch of bootstraps here, but hopefully the idea comes across: the workflow is very convenient.
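Putting the pieces together — map the helper over the splits, tidy each fit, unnest, and plot. The function and column names here follow the rsample/broom bootstrapping vignette; the slide code may differ slightly.

```r
library(broom)
library(dplyr)
library(ggplot2)
library(purrr)
library(rsample)
library(tidyr)

set.seed(27)
boots <- bootstraps(mtcars, times = 100)

fit_nls_on_bootstrap <- function(split) {
  nls(mpg ~ k / wt + b, data = analysis(split), start = list(k = 1, b = 0))
}

boot_coefs <- boots %>%
  mutate(fit       = map(splits, fit_nls_on_bootstrap),  # one model per resample
         coef_info = map(fit, tidy)) %>%                 # one tidy table per model
  unnest(coef_info)                                      # flatten to one big table

# The bootstrap sampling distribution of k and b
p <- ggplot(boot_coefs, aes(estimate)) +
  geom_histogram(bins = 25) +
  facet_wrap(~ term, scales = "free")
p
```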

Okay, there's one last use case here. Suppose we want to do the same thing, but visualize the uncertainty in our predictions. In that case, we want to work with the augmented data sets. We apply the same strategy: we already have a table of fit models, we map augment over those fit models, and what we get is again a nested table — this time with a bunch of predictions — and then we can unnest those nested predictions and ggplot them right away. And that looks like this: a bunch of nonlinear fits visualized all at once.
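The augment version of the same pipeline, under the same naming assumptions:

```r
library(broom)
library(dplyr)
library(ggplot2)
library(purrr)
library(rsample)
library(tidyr)

set.seed(27)
boots <- bootstraps(mtcars, times = 100)

fit_nls_on_bootstrap <- function(split) {
  nls(mpg ~ k / wt + b, data = analysis(split), start = list(k = 1, b = 0))
}

boot_aug <- boots %>%
  mutate(fit       = map(splits, fit_nls_on_bootstrap),
         augmented = map(fit, augment)) %>%  # per-observation .fitted values
  unnest(augmented)

# All one hundred fitted curves at once, visualizing prediction uncertainty
p <- ggplot(boot_aug, aes(wt, mpg)) +
  geom_point() +
  geom_line(aes(y = .fitted, group = id), alpha = 0.1)
p
```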

Documentation and resources

Okay, so that's what I personally use broom for most. There's a bunch of documentation on how to use broom. We just released broom 0.5.0 toward the end of the summer, so there's information on the new release and the new tidiers supported there. You can also learn how to implement these methods yourself; we have standards you can follow so that your broom functions — or tidiers, as we call them — behave like other broom functions, which is what you really want in order to decrease cognitive load, which is the goal in the end, right? There's a website with all this documentation, and I'm also happy to answer questions or help with pull requests at any time. My contact information is up here as well. Thank you.

Q&A

So thanks, Alex. We have time for a couple questions.

Maybe stand up and wave at me if you do have one. Yeah, okay, back here.

Can you hear me? Can you hear me, anybody? Okay, here we are. So since you're bootstrapping with a tibble, is that efficient, time- and space-wise?

Okay, so there are two things you have to think about in terms of efficiency here. The first is the data: are you going to create a bunch of copies of the data? The answer is no — Max has implemented this in a way so you're not going to create excessive copies. You do create a little bit of overhead; I think the example is that if you have 50 bootstrap data sets, you end up with maybe three times as much storage as the original. What you're really going to get hit with in terms of inefficiency is the models themselves. Most modeling functions in R save the data they were fit on, and that gets big really fast. So it really depends on how clean the modeling package is, and that's normally not within your control. If you're running into space issues, one standard thing you can do is define a helper function that fits the model, figures out what you actually need from it, and throws away as much junk as possible, keeping a pared-down object. I think we have time for one more question.
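A minimal sketch of that pared-down-helper idea (the model and the kept components are illustrative):

```r
# Fit, keep only what you need, and let the bulky model object be
# garbage-collected (illustrative model and components)
fit_lean <- function(data) {
  fit <- lm(mpg ~ wt, data = data)
  list(coefs = coef(fit),            # the estimates you actually need
       sigma = summary(fit)$sigma)   # plus one goodness-of-fit number
}

str(fit_lean(mtcars))
```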

Hi. Great talk. Can you just in a few sentences say what it takes to implement new tidiers?

Yeah, normally it's about five lines of tidyverse code in terms of effort. The biggest thing is making sure you meet the conventions — if you break with the conventions, it doesn't do anybody any good. There's a lot of documentation; I need to update it to some extent, but the link in the slides here should walk you through everything you need to do. I'd say maybe 15 minutes to read through that, and hopefully 30 minutes or less to create a custom tidier of your own. I think we can do one more quick question.

I have a love-hate relationship with the boot package. But one thing that I like about it is that it has support for computing different types of confidence intervals with better properties. So I was wondering if these tools have anything like that implemented, or if it's just percentile confidence intervals and that sort of thing.

So I didn't actually show a confidence interval example; what I visualized up on the board was the entire sampling distribution. If you can, that's my recommendation: look at the whole sampling distribution. But rsample is kind of low-level infrastructure, so if you're going to do calculations across all of those coefficients at once, you'll need to implement that method yourself. Max has a comment as well.

Oh yeah, there's fancy stuff coming — like the 632 method. So there's fancy stuff coming. That will be really cool.