Resources

Podcast | Not So Standard Deviations Episode 100 | RStudio (2020)

Featuring: Hilary Parker & Roger Peng

In episode 100 of Not So Standard Deviations, the first ever episode prepared in advance, Hilary and Roger discuss creativity, its role in data science, and how it can be fostered through conversation. Also, follow-up on coffee and oat milk.

About Hilary: Hilary Parker is a Data Scientist on the styling recommendations team at Stitch Fix, a personal styling service that uses a combination of human stylists and algorithmic recommendations to help people find what they love. At Stitch Fix, she focuses on what sorts of data to collect from clients in order to optimize clothing recommendations, as well as building out prototypes of algorithms or entirely new products based on new data sources. She is also a co-founder of the Not So Standard Deviations podcast, a bi-weekly data science podcast with Roger Peng that has over half a million downloads. Their topics of discussion include the R ecosystem, recent developments in the data science and statistics field, reproducibility, and the "how" of how data scientists and statisticians work. Hilary recently authored the paper Opinionated Analysis Development based on discussions from the podcast. Prior to her career in the tech field, Hilary received her PhD in Biostatistics from the Johns Hopkins School of Public Health. She lives at the San Francisco Zen Center with her partner, a Soto Zen priest. In her free time, she enjoys exploring her home of two years, San Francisco.

About Roger: Roger D. Peng is a Professor of Biostatistics at the Johns Hopkins Bloomberg School of Public Health, where his research focuses on the development of statistical methods for addressing environmental health problems. He is the author of the popular book R Programming for Data Science and nine other books on data science and statistics. He is also the co-creator of the Johns Hopkins Data Science Specialization, the Simply Statistics blog where he writes about statistics for the public, the Not So Standard Deviations podcast with Hilary Parker, and The Effort Report podcast with Elizabeth Matsui. Roger is a Fellow of the American Statistical Association and the recipient of the Mortimer Spiegelman Award from the American Public Health Association, which honors a statistician who has made outstanding contributions to public health. He can be found on Twitter and GitHub at @rdpeng.

Transcript

This transcript was generated automatically and may contain errors.

I'm really excited to announce that our last, but absolutely not least, keynote speakers are Roger Peng and Hilary Parker, the hosts of Not So Standard Deviations. If you've never heard of it before, it's a really fantastic podcast. They've never invited me to be a guest speaker, but fortunately I'm not bitter about that at all.

But seriously, I'm a very long-term fan of Roger and Hilary. Roger has been a strong supporter of the Tidyverse since before it was the Tidyverse, and I think his useR! 2018 keynote does a better job of articulating the philosophy and the benefits of the Tidyverse than I've ever managed.

I'm also very fortunate to call Hilary a friend. Hilary is a fantastic data scientist, and I've had a number of really insightful conversations with her, including one recently that led me to understand that one of the things I do is design. As well as being a data scientist at Stitch Fix, Hilary has also, very luckily for me, acted as my stylist, and she actually just picked out the clothes I'm wearing today.

So I'd like you to please join me in welcoming Hilary and Roger.

The pre-show checklist

Okay, so you're going to get a little behind-the-scenes action here, because the listeners of our show don't see this, but we have to do this every time, when you're dealing with a high-stakes situation like this, safety comes first. So we have a checklist that we go through every time, so we're going to go through the checklist.

Okay, Hilary, hardwired? Yeah. I guess we don't really need that. VPN turned off? Yeah. My work VPN causes problems with our connections. Yes. Wi-Fi turned off? Oh, actually, I haven't turned off my Wi-Fi. Okay. Reboot your router? Yeah. Almost always my internet is the issue. Yeah.

We did once do a whole podcast episode where we did not record. That's right. So once that happened, like never again. I guess that's how that made it onto the checklist. Yeah. We did a blameless postmortem.

So I think we're go for launch? Yeah. Go for launch. All right. So welcome to Not So Standard Deviations. This is episode 100.

Introducing the podcast

I'm Roger Peng from the Johns Hopkins Bloomberg School of Public Health. And I'm here with Hilary Parker of Stitch Fix.

So this episode is our first ever live episode. It's, I guess ironically, not going to be about R or RStudio, so it's possible we showed up at the wrong conference. But I hope you'll enjoy it.

We also did some prep. Like, unusually. Usually we kind of show up and call each other and say, I was thinking about talking about this. And so this time we actually have some materials.

So I thought there are probably a few people in the audience who haven't heard of or heard our podcast. So we thought we'd just talk a little bit about kind of why we're here, what we're doing, why we're doing this. And so I guess this started in 2015, right after the rOpenSci unconference here in San Francisco, where Hilary and I kind of caught up. And I sent her an email. I said, hey, I think the subject of the email was, I want to broadcast your opinions to the world.

Because, you know, Hilary has a few opinions. And I thought there's nothing the internet likes more than a strongly opinionated person. So I sent her an email. I said, hey, I think we could do a data science podcast. I don't think there's anything exactly like what I'm thinking about out there already in terms of other data science podcasts. And so, you know, do you want to do it? And so she got back to me. She said yes. And I sent her a wall-of-text email saying, here's what I think we should do. And then there was no response.

A week later, I apologized a lot. And it was true that my boss quit that week. So there was like some reason. But yeah, it was mostly the wall of text. And then we just went for it. For five years since. Or almost five years. So 99 episodes. And so here we are.

So we've been talking about a lot of things over the past five years. And one of the things that we realized as we were preparing for this keynote is that we have like a record of everything we've said. So I wanted to make sure there was a visual impact here, so I actually printed out the transcripts of every episode and leafed through them. Because it's kind of rare that you have the ability to see what you were thinking over a period of time like this, aside from kind of like a personal journal. I have joked that the podcast is a little bit of like a personal journal.

So we wanted to look back and see what themes were coming up. And we never set out with an agenda for this, although I kind of thought you did at first. But we just wanted to chat. And then what was surprising was that we actually got some places during these chats. And so part of the reason I wanted to look back through these transcripts was to see how thoughts evolved over time.

Reflecting on the other keynotes

So for example, JJ's keynote yesterday was amazing. And maybe coincidentally or not, it touched on a lot of themes that we had been discussing in terms of open source. And for those of you who are wondering, we're not going to talk about that here. Yeah, we're not going to talk about open source at all. So even though our episodes have been leading up to that, and again, we may have had some information there that emboldened those conversations. Look out for episode 101.

And especially, I mean, JJ's was really good. And we were joking about the fact that I didn't realize what a nerd he was. Or what a policy nerd he was. I told him it warmed my heart when he brought up the case history of the Delaware Supreme Court. You know, nothing more riveting than a little bit of corporate case law, you know?

And it's just so lucky that someone who had so much interest in public policy then became a highly productive coder for many years, and then circled back and brought them together. It's unusual.

And then, yeah, the second keynote with all that data visualization really spoke to me. Because at Stitch Fix, I work a lot with outfit data. There's a lot of imagery in fashion, right? And so I was really inspired by that about different ways to visualize the data.

And actually, JJ also was talking about this. The book was called Shop Class as Soulcraft. And this idea of touching the data and making the data yourself. And looking through this transcript, actually, you had mentioned Scott Zeger talking about writing the data down by hand. And so I thought that was awesome. That actually comes from John Tukey, who talks about scratching the numbers down.

And then Jenny's talk was great, too. She has an incredible ability to just kind of see that higher level of stuff after having been in the trenches for so long. Going into the detail and understanding the nitty-gritty, but then being able to zoom out and create the paradigm for it. It's not easy, so good work.

I feel like this conference has been awesome, and we have fodder for many episodes, following up on this stuff. It's such a great vibe. This is your first rstudio::conf. This is my first rstudio::conf, yeah, that's right.

Looking back at five years of episodes

So one of the ideas that we had for this keynote was to kind of go through our thought process for the last five years, which is made a lot easier by looking at the transcripts, listening to the audio, hearing what we've been talking about. As we did that in preparation, we noticed a number of interesting moments, of course, and kind of recurring themes.

One of the interesting moments that we had, actually early on in the podcast, was our first and only use of profanity. Maybe I just want to ask, do a quick poll. Who thinks it was me, and who thinks it was Hilary?

When people use spreadsheets, they're trying to avoid syntax bullshit, and data structure bullshit. Because we have a lot of that in any scripted language, where you have to know a fair amount of syntax, and get fairly facile with data structures. Thank you, Jenny Bryan.

It was a really popular episode. It was super early days, and Jenny was defending spreadsheets, essentially. Or talking about having empathy. Explaining the attraction of spreadsheets.

Another theme, one of the most controversial things I think we've ever discussed on the podcast, involves this topic here. And I want to point out that you got an oatmeal raisin for me. What is the problem? You're like the 10th person who's called me out on this. You're terrible. Why does anyone choose this? They're delicious.

Oatmeal cookies, to this day, get us a lot of feedback. And apparently there's a huge plate of them outside, ready for me to eat. I also love that episode we literally recorded in a Jimmy John's. That's right. That was when I was in town for my brother's wedding, and we met in the middle between D.C. and Baltimore at a Jimmy John's and recorded it.

The other recurring theme, one that has played out over multiple episodes, is Hilary trying to get me to do stuff. And so one of those things is getting Netflix, apparently.

My only problem actually is that I don't have Netflix. Oh, come on. You can shell out. It's like $8 a month for Roger.

That's episode 3. Episode 28. Oh, the other piece of news that's really important is that I finally got Netflix. And 28 is like a year later, right? Yeah, because we do one every two weeks. So it took you a while.

Episode 36. So if you have access to HBO, highly recommend. Oh, HBO. You're like, I have to buy another service? I bought Netflix for you. Yeah, this one's an HBO exclusive. It's $14.99 a month. So we went from $8 to $14.99.

Episode 6, you say, I love movies. Episode 13, you say, I'm a film nerd. And then many times, you talked about this script book thing, which was predicting whether a movie would be successful based on its script. And you had algorithms for doing that. And then Walt Hickey came on, and you read his blog post about books versus movies. And you know about Steven Spielberg table reads?

I feel like I want the visual of handing you, like, here's all the times you've talked about movies. But do you not watch movies? How does this work? Are you just interested in movies intellectually and never watch them? I infer their content.

The role of design in data science

But what we really wanted to talk about was this kind of overarching theme that actually Hadley kind of alluded to, which is the idea of the role of design in data science. And we... this was something we chewed on for so long of essentially, like, how do you do data analysis?

And one thing that really struck me looking at it was that literally in episode one, we talk about it a lot. And actually, for a long time, I thought that this was why you wanted to do the podcast. It was, like, to solve this problem. I had no agenda. And it is surprising to look at that first transcript to see how many seeds were planted. Like, we actually talked about... we hit words and themes that were, like... were kind of where we got to, but I don't think we had a way of recognizing that those were going to be important at the time.

We have this habit of, like, telling people... like, giving people a bunch of choices, right? Like, you can do this, you can do that, you can do regression, you can do smoother, you know, whatever. Just pick... there's, like, five different models that you can... or strategies that you can implement. But we often don't, like, tell people how to choose between those types of things. You know, like, that's almost like... we explicitly don't do it.

You had an experience teaching a stat class where, at the end, someone essentially said to you, like, as a statistician, you... I think you did a good job of telling me things to do, and you told me things... definitely things not to do. But now how do I know what to do? Basically, like, thanks for teaching me all this stuff. I still don't know anything. You sent me down to the data set. I don't know, like, how to proceed. And this person got an A in the class. By far and away.

I think I came away with a few bullet points. You know, I had a lot more thinking to do, apparently. That was about, I guess, ten years ago now.

So I was thinking when you were saying that, is that, like, part of a successful data analysis is convincing someone of something? And that's, like... that is inherently, like, a one-on-one process. So we touched on, like, this... like, what does success look like? Yeah, like, that was kind of what this woman was getting at. It's like, okay, I have to do something. I have to accomplish something. Like, what is that thing? And how do I decide how to do it?

I think it's this human element that, you know, that is missing, I think, from a lot of, kind of, talk about data analysis. It's hard. And actually, that gets into, like, the next topic of kind of, like, who's building successful tools. And I think it has a lot to do with, like, people who genuinely have empathy with the user.

So that was empathy. But, yeah, like, these were, again, kind of things we threw out and didn't think about for a long time, but ended up being central themes to what we were talking about later.

DevOps, blameless postmortems, and data analysis

So, like, throughout, so we kind of have this, like, timeline of what this looks like, like, evolving over time. And one of the things, and, again, this was, like, four and a half years ago. I was still working at Etsy. And Etsy had this kind of amazing DevOps team. And so, for those of you who don't know, DevOps stands for developer operations. And it's this field where, essentially, it was, like, kind of taking IT to the next level.

So it was saying the people who run the website, the people who keep the servers on for the website, the people who do everything to kind of keep this, you know, shopping website up, they should also be the people developing the tools to keep the website up. And so it's, like, developer operations. And at Google, they kind of have a different word for it, site reliability engineer, but it's the same thing.

And I was really inspired by that work and kind of how they approach problems. One of the central tenets was this idea of a blameless postmortem, where if you ran into a problem, like, let's say your system failed and the website went down, there is a human tendency to, like, blame the person who wrote the code that failed or whatever. But Etsy focused a lot on, instead, doing this thing called a blameless postmortem, where you talk about, like, you frame the problem as though the system failed the operator rather than the operator failing within a good system. And then that opens up the conversation to, like, talk about iterating on the system and saying, you know, okay, this person went to work this day, said they wanted to do a good job. They did not say they wanted to take down the website. But then the system, like, allowed them to take down the website.

And so, like, based on this kind of idea of, like, how do we know what to do, there were sort of these two threads that started. One was how do we decide, like, how to build the artifact, like the dashboard or the email or the report, whatever. And then there was this kind of, like, how do you build a narrative? And so I think, like, the theme of the early part of the podcast, really for the first year, was focused on that artifact question of, like, how do you build a system that doesn't fail you?

And I was really energized by, like, connecting these two things and being like, well, the way we can talk about it as a community is building systems that help you avoid errors and articulating what those errors are, putting cost with those errors, and then making design decisions for your system around that.

I think a lot of the struggle that we had was in terms of kind of identifying the constraints and identifying the frame, you know, kind of the framing of this problem. And otherwise, it's so open. Yeah, and, like, if you don't define this problem that way, from, like, kind of more first principles, you end up in these, like, nasty language wars, or you get people who are just saying, like, you should use knitr, and you don't talk about why, and it's like, okay, when is it appropriate to use knitr if you want to avoid these types of errors? Like, you don't want to update your data but have your report be stale, you know? Like, that's a huge one.
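
A minimal sketch of the kind of error being described: if the report is a knitr/R Markdown document that reads its data at render time, the rendered numbers can't go stale relative to the data. Everything below (file names, values) is made up purely for illustration.

```r
library(rmarkdown)

# Hypothetical data file that gets updated regularly.
write.csv(data.frame(value = c(31, 28, 44, 35)), "results.csv", row.names = FALSE)

# A tiny R Markdown report: the data is read at render time,
# so the numbers in the rendered report are never stale copies.
writeLines(c(
  "---",
  "title: Example report",
  "output: html_document",
  "---",
  "",
  "```{r}",
  "dat <- read.csv('results.csv')",
  "mean(dat$value)",
  "```"
), "report.Rmd")

# Re-render whenever the data changes; the whole report is rebuilt from the current data.
render("report.Rmd")
```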

I think the biggest thing I realized sort of almost right after was that in some ways the sort of blameless postmortem stuff is a design process, and it's a way of almost, like, making a system to force everyone to be empathetic. Yeah, because the point of the blameless postmortem is essentially to say, you know, here, listen to the problem. Don't blame the user. Like, try to put yourself in their shoes. Like, they essentially force you. Like, you can't, like, use you statements, or, you know, like, there's a lot of kind of, like, rules of engagement. And so like, it all kind of, like, started coming together.

The design thinking book club

And the path ends at episode 63. It's the first milestone. Which was the start. Some of you, if you were listeners, will remember the seven-part book club that we had for Nigel Cross's Designerly Ways of Knowing. Which some liked, some thought was too long.

And all I know is, like, I got these, like, messages that are, like, I guess you took a picture of the Kindle with your phone and then, like, texted me the picture. Is that right? That is correct. And so I'm, like, reading your photo of a Kindle.

And I think what ensued was, like, a series of, like, all-caps exchanges between you and me that were, like, oh, my God, I can't believe this is, like, exactly what we're talking about. Yeah. No. It was, like, oh, well, this articulates what we've been trying to say in, like, a few sentences versus, you know, you've written a series of long blog posts. I mean, I think those are so valuable. But not concise.

So I feel... this was the moment where, you know, we were talking about these... well, I guess it was... the book's called Designerly Ways of Knowing. And we were kind of reading it, and she's sending me photos of the book, and I'm reading the book along with her, and I'm, like, send me the next page. And it was, like... so this was a book essentially about the theory of... like, academic theory around design and design work, and it was just, like... to us, it was this huge aha moment of, like, oh, it's all starting to really make sense.

And what was making sense is that design... the way that designers work and the way that they structure the work, the way they talk about it, you could essentially, like, swap in the word data analysis, and it was the same thing. In so many ways, yeah. Like, the way that designers talk about getting briefs, where it's, like, oh, you get a design brief, and then the... you don't just... as a designer, you don't just do what the person asks you, because that will be, like, subpar. The person doesn't know what they want. Like, they have figured out some way to articulate it, but your job as a designer is to, like, zoom out and be, like, identify what they actually need, and then make the best solution about that.

And that's the same... I mean, I think about that all the time with, like, someone asks you for a number, you know? They're, like, can you get me this number? And then you have to, like, as a data analyst, you have to zoom out and be, like, no, okay, what problem are you trying to solve? Why don't you catch me up on what you're doing, and maybe I'm going to solve this in a totally different way. Some of my collaborators, I think, would want me to just give them the number. I know, yeah. They don't stay collaborators for long.

But, so, that's kind of, like, the end of our audio clips. But, like, essentially we got to this place where it's, like, okay, data analysis is a type of, essentially, like, either an independent thing or a type of design thinking, where it's, like, the answer to the student's question is essentially, like, adopt this whole other way of, like, approaching the world that's totally different than science. And we haven't, like, totally, as a field, we haven't totally, like, addressed that difference.

I kind of wonder what I would have told that student now, you know, more than 10 years later. And, you know, I think the way that I kind of think about it is that when you go outside, you don't see a data analysis walking around, right? It doesn't naturally occur like a tree or, you know, what else naturally occurs? A rock, right? So if it doesn't naturally occur, it has to be built by someone, or it has to be designed by someone, it has to be built. And so why shouldn't we use the same ideas there for a data analysis as for, you might, for a chair or a bridge or whatever?

And then I also think what's interesting is that whether you're building, like, a production machine learning pipeline or you're building an analysis, like, the tools are different, the types of testing you'll do is different, like, kind of, like, the technical requirements are different, but ultimately you're doing the same thing, which is, like, you're creating something that's gonna do something for someone. Like, either it's gonna be a recommender system that creates recommendations for a website or it's, like, this analysis that some sort of person's gonna consume and make a decision based on that. Might just be a PDF document? Yeah, it could be, I mean, even, like, an email, literally even an email with, like, one number in it is still, you have to decide, like, okay, this problem is only, this problem can be sufficiently addressed with an email and that person doesn't need much context and, therefore, if I just send them the email, that'll be enough to, like, convince them to do something.

And so that put us on a really interesting path of, like, okay, so if we think that data science is, like, a type of design, then how do designers work? Like, if you look at architects, how are they trained? If you look at, you know, other designers, how do they work in companies, et cetera?

And so one of the things in this design thinking book by Nigel Cross is talking a lot about, like, you just have to do it, like, over and over, and you're gonna get better every time you do it. You're gonna go through different thought processes, but you have to literally, like, exercise the part of your brain that does this constructive thinking rather than, like, the deductive thinking of science. So it's like, okay, so in data science, we kind of don't do that apprenticeship model so much. I mean, I think we wish we didn't have to. Yeah, like, we wish we could just, like, teach the stuff and be done with it, because it's a lot of work to, like, go through someone's work and tell them if it's working or not. But ultimately, like, if we kind of accept that this is a type of design and construction, then you need to be able to, like, practice it.

So we did some cool stuff with that where kind of, like, very different than the types of challenges you do with, like, Kaggle and stuff, where they're like, here's a data set. Analyze it. Instead, it was like, okay, let's say you had to build your system end to end. Like, you needed to answer some question. We did, like, commute times. Right. Yeah, my commute. So it was like, in San Francisco, it's like, I want to know exactly how many minutes and the variance in those minutes for different commute methods so I can know, like, the last possible minute that I can leave my apartment to make it to a meeting.

And so it's like, okay, how would you solve that? How would you get the data to solve that problem? How would you store it? How would you access it? How would you analyze it? How would you display it? And then how would you model it? What are the fixed effects? What are the random effects? It's amazing how just a simple kind of formulation of a problem can bring in every single aspect of design, of analysis, you know, presentation, communication.
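
A rough sketch of the modeling end of that commute question, assuming the lme4 package: commute method as a fixed effect, day of week as a random intercept, and the variance components giving the spread you would need to pick the last possible minute to leave. The data below are simulated purely for illustration.

```r
library(lme4)

# Simulated commute log: one row per trip.
set.seed(1)
days    <- c("Mon", "Tue", "Wed", "Thu", "Fri")
day_eff <- rnorm(length(days), sd = 2)   # day-to-day shifts (traffic, weather)

commutes <- data.frame(
  method      = rep(c("bike", "bus"), each = 30),
  day_of_week = rep(days, times = 12)
)
commutes$minutes <- 30 +
  8 * (commutes$method == "bus") +                # pretend the bus takes ~8 minutes longer
  day_eff[match(commutes$day_of_week, days)] +
  rnorm(nrow(commutes), sd = 4)                   # trip-to-trip noise

# Fixed effect: commute method. Random effect: day of week (random intercept).
fit <- lmer(minutes ~ method + (1 | day_of_week), data = commutes)

# Per-method estimates plus variance components: the mean and spread per method
# are what you need to decide the last minute you can leave the apartment.
summary(fit)
```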

And what I like about it, too, is that the things like how you store the data is sort of on the same par as, like, what models you choose. Like, all of those things are equal. You have to end up spending time on all of it instead of... I feel like as a field, and what I like about this conference is that, less so here, but as a field we focus so much just on the methods, and that really bothers me, in case that hasn't been clear.

And I was actually talking with someone last night about that, where I'm like, you know, in some ways like, the way that normal data science conferences are structured, it's like, everyone wants to talk about the best method, like the latest thing. And literally, in that system, you're motivated to have bad data, because it gives you more opportunity to do fancier models. And so, like, any system where it's like, oh, okay, this person could work on worse and worse data and be equally happy because they get to do all these fancy things they learned at this conference, is like, that's not solving the problem. That's like choosing not to solve the problem.

So by focusing on the whole thing, it's like, no, okay, can you home in on exactly what data you need, rather than just making do with what might have fallen into your lap somewhere. I think one of the hardest things to do in general, but also in data science, is to kind of pull back from trying to maximize in a single dimension. And I think, to me, the way I interpreted even JJ's talk, is like, the way that corporations are structured is they maximize on a certain one dimension. And I think it's hard, that can end up with some good and some bad. And I think in data science, it's very tempting to kind of go for the optimal approach, go for the optimal method, go for the best prediction. But there are other elements, other stakeholders, other trade-offs to be made.

I really like that because in the second talk, she just touched it briefly at the end, but it was like, what are UX problems and what are designer problems? And then the one that was like, this is actually a design problem, was the loss function. It seems like it's just a data science problem, but that's actually the core user experience, is what loss function you use. And what system are you actually building?

I feel like it would be great if we could move to a place where instead of saying, this is the best thing, we could say, I really appreciate this set of trade-offs.

Creativity in data science

One other thing, kind of at the end of this timeline, and this has only come up briefly, but essentially by getting to this place, it now opens up a whole other set of fields to look into, which is everything around creativity.

The book I'm mentioning, that we're mentioning here, is called The Creative Curve, and we had a one-episode book club on that. We learned our lesson. The idea with that was that this guy, Allen Gannett, essentially empirically studied creative people and how they operate. I thought it was awesome. I really liked it. It really made me feel like, oh, I can apply these principles to my data science work.

The principles were... there was consumption. Creative people will frequently... people who are engaged in movie making, they watch movies 20% of the time. You look at... just again, from empirically looking at people in these creative fields, consuming other people's work is a big part of it. You digest it, you think about it, and you iterate on it. If you want to write, you read a lot of books. If you want to write music, you listen to a lot of music.

One of the aspects that's difficult about data analysis is it's not always easy to read a lot of data analyses because they're not out there. Especially in the corporate world, data science specifically. If you have a big team, you can read whatever other people are doing, but otherwise, basically only at conferences. Or David Robinson's livestream. It's hard to consume a lot of data analysis is what I think it comes down to. I think that's a key step to becoming an expert in something.

Iteration. Iteration is this idea that you just keep going. You keep doing things. You do it over and over. Again, that is touched on in the design literature too. You've got to keep doing things. Ideas are going to evolve over time. Part of why we wanted to go through those clips and put up this timeline is that even the creativity that we got from this podcast was something I totally wasn't expecting. It took us four years. This timeline is over four years. The number of ways that we attacked this problem and thought about it was a lot. It never felt like it was work necessarily. It was just iterating and going on.

Then there's the community. Another one is creative communities. You look at artists. Andy Warhol had The Factory, where it was just a bunch of artists in a loft, painting and giving feedback. With that, surrounding yourself with other creative people helps you be creative.

Coming to conferences like this, they're fun, they're energizing. I do think they embody this creative community. Especially because so many people are isolated as data scientists. Not every company has 100 data scientists. Most companies don't. Being able to get together and bounce ideas off and see what other people are doing. I guess what I'm trying to say is that is the work. It's not like this isn't productive, if that makes sense.

There was one agenda item for starting the podcast. Part of it was to produce that community of people who may be sitting in a department somewhere or in a company somewhere by themselves. They're forced to be the lone data analyst. They don't have anyone to talk to or they don't have the ability to hear what other people are working on and how they're approaching it. That's some of the most meaningful feedback I hear. When people are like, I'm alone and it's really nice to be able to hear how other people are working. That makes me feel really good.

And then feedback. Feedback. This one is definitely hard. A big part of the Creative Curve book, and just in general with design, is that you are actively soliciting user feedback all the time. That's a huge part of it, and you cannot be attached to what you've produced, essentially. You have to let the users tell you what to do. I think that that is extremely hard. Getting feedback is hard. Definitely not something that people are trained to do.

Maybe I'm just curmudgeonly because I see it the most, but I feel like, especially in academic stats and in data science in general, people do take it really personally. There's that quote that I kind of hate where it's like, photographers and statisticians both fall in love with their models. You see people really dig their heels in and it just gets personal.

What I like is that, at least with the user, I think in the design field there's a lot more focus on that accepting feedback.

Also, another thing, Hilary's personal corner, the one thing I did not expect from starting this kind of meditation practice and going down that path is that it genuinely made me better at getting empathy and getting feedback. The whole idea is dissociating yourself from who you think you are. It allows you to not feel personally threatened if someone challenges you. People take it personally when your whole identity is that you're a statistician and you're smart. If someone says your model is wrong, it's like, oh no, I'm not smart anymore. My whole identity is gone. Then you're defensive.

Figuring out ways, and for me, meditation, just figuring out ways to detach yourself from that makes you able to take feedback. Then that actually makes you better at the thing. It's kind of like a paradox. It's not like a magical thing. It's something that can be practiced. Exactly. The thing about the meditation practice that was a big paradigm shift for me was that these things aren't just fixed properties. It's not just like, oh, this person's good at getting feedback or this person has a thick skin and that's a set character trait and mine is not and that's done. It's like, no, there are ways to engage in practice that makes you more robust to that.

Again, looking at the neuroscience and Buddhism stuff, you have neuroplasticity and you can make new neural pathways. It's not just like, oh, yeah, I swear. There are ways that you can essentially change your brain. Actually, the Nigel Cross stuff goes into that. You look at people who are cab drivers, and the spatial regions of their brains are much more developed.

Doing creative work is hard and it's very different than doing scientific work and it's also what we're doing. There are ways to get good at it. There's ways to practice it. There's ways to get good. Doing this whole podcast, being okay with reiterating and showing things that will be wrong in the future is like, it's okay and actually it'll make you better.

That's kind of why I feel like I've wanted to talk so much about the design thinking is because I want people to feel empowered to engage in that part of honing their craft and not feeling intimidated by it. I think probably a lot of people in this room don't think they're creative. I didn't think I was creative. It's just like identity. It's like I'm a scientist. Hilary does math and science. I want people to go through that same process I went through of opening up that side of myself and being like, this stuff all makes me better at this work and it's more fun because of it. You can do it too. Anyone can do it.

Coffee follow-up

Anyway, that was like a sitcom ending. We do have one more. That was like the big recurring theme. We have one more recurring theme that we are not open to feedback on.

Now we're bringing tea time to the internet. Exactly. I actually have tea today though. I've also gotten into coffee now. No! I know. Maybe I'll bring coffee time. That doesn't sound right.

The lead-up to this podcast, in part, was that I went to grad school where Roger was a professor, and we had this tea time thing, and we had all these conversations and enjoyed them. I'm into coffee. We can no longer have tea time.

On the x-axis is episode number. On the y-axis is the number of mentions of the word coffee. It's good to have transcripts. Thank you, stringr package. Hilary switches from tea to coffee. Episode 36 is where I say people shouldn't talk about drinking things on podcasts. You go on a kind of long rant about listening to other podcasts and the fact that they're talking about the bourbon they drink. That doesn't belong on a podcast.
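
A minimal sketch of how counts like that could be produced with the stringr package, assuming the episode transcripts are stored as one plain-text file per episode; the directory layout here is hypothetical.

```r
library(stringr)

# Hypothetical layout: transcripts/episode-001.txt, transcripts/episode-002.txt, ...
files <- sort(list.files("transcripts", pattern = "\\.txt$", full.names = TRUE))

# Count case-insensitive mentions of "coffee" in each episode transcript.
mentions <- vapply(files, function(f) {
  text <- paste(readLines(f, warn = FALSE), collapse = " ")
  str_count(str_to_lower(text), fixed("coffee"))
}, integer(1))

coffee <- data.frame(episode = seq_along(files), mentions = mentions)

# Episode number on the x-axis, coffee mentions on the y-axis.
plot(coffee$episode, coffee$mentions, type = "h",
     xlab = "Episode number", ylab = "Mentions of \"coffee\"")
```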

Based on feedback, I was open to it. I got this moka pot to make coffee, and I literally couldn't give it away. We had a whole episode where I said if anyone wanted it, I was willing to give it away, and no one emailed us. No one wants the moka pot. Last night we got some more feedback about a new device. If you don't like coffee, don't talk to us. We're going to keep on going.

I did have a tie-in for this. I was thinking about it in relation to the Shop Class as Soulcraft idea, where it's kind of like you just have to do stuff. I started making coffee and then I wanted to talk about it more. The more that you do something, you can kind of fall in love with anything; you just have to do it a bunch and you start to care about it. Again, back to the creativity point, it's like just start trying and you'll start to like it more, and you'll start to feel more empowered to talk about it, and then you'll bother everyone around you by talking about it way too much. And then you'll do a keynote at rstudio::conf.

So that's our episode. Thanks for listening. And thanks to rstudio::conf for having us here.