David Robinson | The unreasonable effectiveness of public work

Transcript

This transcript was generated automatically and may contain errors.

So I appreciate the bit of history, because this talk is going to start with a story, going all the way back to the year 2012.

It was a time when I was a graduate student, I was programming a lot, and like a lot of people that program, I spent a lot of time on Stack Overflow. So there was a time that I ran into a question on Stack Overflow that hadn't been answered. It was a Python question, and I realized that I was able to answer it. So that was the first question I answered on Stack Overflow. And I found in the years afterwards, I was doing my PhD work, I was writing some papers, and I was teaching, but I was also answering some questions on Stack Overflow, and I discovered a couple years later that one answer I left turned out to have a rather large impact on my life.

So I answered about, let me see, 450 Python questions and 450 R questions, and one question about statistics, which was, what is the intuition behind the beta distribution? This is a question I saw, and I realized I had been giving an answer to this individually in a couple of courses that I taught and to people that I talked to, and this was a chance to make it public. So I wrote an answer in response that explained the beta distribution using baseball statistics.

And I found out a couple years later, an engineer at Stack Overflow, the same company that I was answering these questions on, discovered this question when he was working on improving A/B testing. So he ended up tweeting about that answer, and then he followed it shortly with, I don't know how much you're enjoying your PhD, but if you want an interview here, you can have one.

So a couple of weeks later, I interviewed with him, and that was my first data science job at Stack Overflow. I later learned the conversation that had gone on internally was to the effect of, wow, what if we just hired that guy?

This is what I'd call a freak accident. It's not something I would necessarily want to learn too much from. I tell the full story in my blog post, One Year as a Data Scientist at Stack Overflow, but it is something that has profoundly affected my philosophy around public work. You see, when I was in graduate school, I thought of my goals like this. I thought, well, I'm going to start with an idea, and that's just the beginning. Then I'm going to start working on it, get some preliminary results. Then I'm going to draft a manuscript, and it'll be most of the way there. I'll complete the manuscript, and finally, one day, I'll have a valuable published paper.

What I realized is I should have been thinking of my goals a little differently. Anything still on your computer is, to a first approximation, useless. It's not going to be shared. It's not going to be used by anyone else. When I look back at grad school, the things I did that ended up not being publicized, things that stayed on my computer, even I've forgotten them. But if something is out in the world, it could be a published paper. It could be a product, like a web product. It could be a blog post. It could be an open source contribution, or it could be as small as a tweet. All of those are way more valuable than anything that you didn't share.

Fundamentally, I say it's a talk about public work, but it's really a talk about sharing. It's a talk about taking what we have and what we've gotten good at, taking our skills, and then sharing them with other people.

They say that 80% of success is showing up. And similarly, I'd say to some extent, 80% of success within public work is getting your work out there.

Thank you.

Q&A

I certainly hope that we have the fun throwable mics, maybe, out there. But we have definitely some time for some questions for David.

One of the things that I think can be really hard if you're trying to write blog posts about data science, especially if you're sort of throwing it out into the void, is getting sort of, like, editing advice or maybe, like, any kind of feedback on your post before or when it's sort of, like, in draft form. Is there any kind of way in the R community we can sort of support that?

I think that's a really interesting question. I generally recommend, even when you're writing your first few posts and feel that you could use some feedback, still getting them out into the world. The R community is generally very welcoming, I've found, and is a good place where people can give some feedback. But I think that's the kind of initiative. I also think you can certainly go to your local community. So when I was in grad school, my lab mates and my advisor were fantastic resources. Your manager or colleagues are also great resources. I think it's the kind of thing that's probably worth formalizing in some kind of program. I don't know much about it. I don't know of anyone that's tried that. But, like, I don't know, Feedback Fridays. Feedback Fridays. You heard it here first.

My name is Reina Harris. That was a beautiful talk. Thank you. The one thing you said, you need to have advice and experience to write a book. So what's maybe the harm in writing a book too soon?

I don't think I've ever found a case where someone wrote a book too soon. One of the reasons I'm so encouraging throughout the talk of doing public work is I think it's very rare that people fall on the side of publishing too often. Obviously, there are exceptions like data breaches and such. But I don't find that people, say, are writing too many blog posts analyzing data when they're too early in their career. It's certainly the other direction. I think the risk of writing a book too early is that you wouldn't have enough to fill it. I think the kinds of people I find, when they want to write a book, are generally the kind of people that could fill up a university course with material. That might even be a little bit too much, but they could certainly give a long talk, maybe a two-day workshop, and can think of enough facets of the problem where they'd want to give useful advice.

I particularly think in this sense about both technical and professional help advice. I think if I imagine myself trying to write a book when I, let's say, just started grad school, I feel like I would run out of things to say.

One thing I found is I keep running into people that have published books before they started their PhD or during their PhD. It's really not a thing that has a hard age or career cutoff. I really recommend thinking in terms of, if you, say, write a lot of blog posts, how they can be linked together into long-form documentation.

Thanks for the presentation. Do you have any opinion on the importance of updating older blog posts versus investing in writing new ones?

I'd say never do it. I'll tell you why you never update old blog posts. Because if you do, it stops you from writing future ones. It's not simply because you're spending time you could be using to publish new ones. I would say it's because when you write a blog post, one of the amazing things about it is that it gets to be done. The cult of done is a popular description of this. So obviously you should fix errors or other problems in old blog posts. But if you find yourself treating a blog post as a resource that has to be maintained, for example, if I went back to my... I think I have a very old blog post that uses data.table. If I went back and used dplyr, if I went back and fixed the various things that have broken thanks to new dplyr releases, that would mean every new blog post that I wrote would become an obligation. And that would discourage me from publishing them in the first place. So it's a bit of an extreme answer, but I'd actually say I would not prioritize that very heavily, with clear exceptions such as if someone finds something truly wrong in it or dangerous or other things like that. I'd say put it out there and let it be done. Work on the next thing.

Thanks, David. I have a really important question for you. Did I pay you to promote the two books of mine?

Was that an option?

Alright, the real question. So I actually have a problem with tweeting, which is discoverability. I mean, many times I see there are excellent tweets sharing excellent ideas, but I feel like if I miss such a tweet, I feel it's just gone forever. So do you think that is a problem? And if you do, how would you solve that?

Yeah, I would say the advantages of Twitter are very ephemeral. It's very much, during this conference, see everything that's happening. It's not an ideal way to crystallize knowledge for future use, not the way a blog post or especially a book would be. Having said that, there are a few ways that it makes things permanent that I really enjoy. So if you haven't tried this, you should try it. I'm not going to do it on my computer, but you could go to Twitter, type from: followed by a username, and search only within one person's tweets. So if you know, oh, I said something once, or Emily said something, or I want to find a tweet where Jenny shared this really good piece of advice, you can search just within one person's tweets. One way that I use that is for finding my own conference tweets. During a conference, I generally treat Twitter as a public diary. I'm not very good at note-taking, so instead I live-tweet each of the talks with what I find most important, and afterwards I can go back and discover the things that I wanted to share. So throughout this talk, I do include a few of my old tweets, and I did find them in this way.

So I think after hearing you talk, we all seem very motivated to write blog posts, but how would you recommend that people stay, like keep the momentum and continually publish blog posts, not just like after this conference right now, but maybe in like three months, six months, even a year?

You know, when I started my blog, it was late 2014, and it was around the time also that I started the broom package, and I waited a few months, and I wrote four blog posts because I assumed that once the blog was up, I'd be out of ideas, and I could never blog again, so I wanted to have four saved up that I could publish. And every single time I've published a blog post since then, I've thought, yep, this is the last one, I'm out of ideas. What I'd say is the best way to keep motivated is to start that feedback loop: once you put ideas out in the world, you get feedback on them, you get people excited about them, you get compliments on them, sometimes you get recognized, and these are ways that build your own excitement and keep you motivated to keep doing work online. So I'd say there isn't a shortcut to starting and putting your work online.

David Robinson | The unreasonable effectiveness of public work | RStudio (2019)