Resources

David Robinson | The unreasonable effectiveness of public work | RStudio (2019)

In this talk, I'll lay out the reasons that blogging, open source contribution, and other forms of public work are a critical part of a data science career. For beginners, a blog is a great accompaniment to data science coursework and tutorials, since it gives you experience applying practical data science skills to real problems. For data scientists at any stage of their careers, open source development offers practice in collaboration, documentation, and interface design that complements other kinds of software development. And for data scientists more advanced in their careers, writing a book is a great way to crystallize your expertise and ensure others can build on it. All of these practices build skills in communication and collaboration that form an essential component of data science work. Each also lets you build a public portfolio of your skills, get feedback from your peers, and network with the larger data science community.

Materials: https://bit.ly/drob-rstudio-2019

About the Author: David Robinson

David is the Chief Data Scientist at DataCamp, an education company for teaching data science through interactive online courses. His interests include statistics, data analysis, education, and programming in R. David is co-author with Julia Silge of the tidytext package and the O’Reilly book Text Mining with R. He is also the author of the broom, gganimate, and fuzzyjoin packages, and of the e-book Introduction to Empirical Bayes. David previously worked as a data scientist at Stack Overflow, and received a PhD in Quantitative and Computational Biology from Princeton University.

Transcript

This transcript was generated automatically and may contain errors.

So I appreciate the bit of history, because this talk is going to start with a story, going all the way back to the year 2012.

It was a time when I was a graduate student, I was programming a lot, and like a lot of people that program, I spent a lot of time on Stack Overflow. So there was a time that I ran into a question on Stack Overflow that hadn't been answered. It was a Python question, and I realized that I was able to answer it. So that was the first question I answered on Stack Overflow. And I found in the years afterwards, I was doing my PhD work, I was writing some papers, and I was teaching, but I was also answering some questions on Stack Overflow, and I discovered a couple years later that one answer I left turned out to have a rather large impact on my life.

So I answered about, let me see, 450 Python questions and 450 R questions, and one question about statistics, which was, what is the intuition behind the beta distribution? This is a question I saw, and I realized I had been giving an answer to this individually in a couple of courses that I taught and to people that I talked to, and this was a chance to make it public. So I wrote an answer in response explaining the beta distribution in terms of baseball statistics.

And I found out a couple years later, an engineer at Stack Overflow, the same company that I was answering these questions on, discovered this question when he was working on improving A/B testing. So he ended up tweeting about that answer, and then he followed it shortly with, I don't know how much you're enjoying your PhD, but if you want an interview here, you can have one.

So a couple of weeks later, I interviewed with him, and that was my first data science job at Stack Overflow. I later learned the conversation that had gone on internally was to the effect of, wow, what if we just hired that guy?

This is what I'd call a freak accident. It's not something I would necessarily want to learn too much from. I tell the full story in my blog post, One Year as a Data Scientist at Stack Overflow, but it is something that has profoundly affected my philosophy around public work. You see, when I was in graduate school, I thought of my goals like this. I thought, well, I'm going to start with an idea, and that's just the beginning. Then I'm going to start working on it, get some preliminary results. Then I'm going to draft a manuscript, and it'll be most of the way there. I'll complete the manuscript, and finally, one day, I'll have a valuable published paper.

What I realized is I should have been thinking of my goals a little differently. Anything still on your computer is, to a first approximation, useless. It's not going to be shared. It's not going to be used by anyone else. When I look back at grad school, the things I did that ended up not being publicized, things that stayed on my computer, even I've forgotten them. If there is anything out in the world, it could be a published paper. It could be a product, like a web product. It could be a blog post. It could be an open source contribution, or it could be as small as a tweet. All of those are way more valuable than anything that you didn't share.

Fundamentally, I say it's a talk about public work, but it's really a talk about sharing. It's a talk about taking what we have and what we've gotten good at, taking our skills, and then sharing them with other people.

Types of public work and why they matter

In particular, I'm going to walk through a few types of sharing that I think are incredibly effective, everywhere from blogging to writing a book. I'm going to share my observation that no matter where you are in your career, these are probably helpful, though each of these media probably becomes more helpful at later stages of your career. So I'll start at areas where, even if you're an undergraduate student or an aspiring data scientist, you can still get into some of these media, like blogging and tweeting, and if you're someone with a lot of experience and expertise, some other technologies that you could use to share your work.

So I'll give three reasons for spending time in public work, and they're going to keep reappearing throughout this talk. The first is to advance your career, really to build a public portfolio of your work and let other people know about your skills. It led to a job for me, but it's also led to other opportunities and many other things that wouldn't have happened if I hadn't been sharing my work publicly. Certainly this keynote wouldn't have happened if I'd just been working without sharing anything that I did online.

Another is practicing good habits. I find that public work builds particular kinds of habits that you wouldn't build, or at least builds them more effectively than you would, working within one company, one team, or just on your own. I'll go through a few examples.

And finally, it helps you contribute to the community. There's selfless reasons that we want to share public work, to make sure that everything that we build, other people can use.

And a couple of disclaimers. One is that, as I've already described in my own job story, a lot of this talk is what worked for me. And that means we need an obligatory XKCD about survivorship bias. The fact is, we do need to observe that if I talk about just what has been working for me and my particular successes, you don't see the people that tried that and it didn't work for them. I think this is very much worth keeping in mind and maybe taking it with a grain of salt, but I'd also say that I've seen a lot of people who have had a lot of success by sharing their work online.

And throughout this talk, I'm going to share particular examples of other people who, through blogging or tweeting or other types of public work, made a difference for themselves and their community.

The second disclaimer is that this talk is called The Unreasonable Effectiveness of Public Work, not The Unreasonable Necessity of Public Work. There are other ways to succeed and there are other ways to have a really fulfilling career. Some people just aren't going to have the time, maybe the ability or the interest in doing public work, but I'd like to convince as many people as possible that it might be interesting for you.

Blogging

Why start a blog? Well, you can see a couple of reasons. One of the most important is building a public portfolio. If you're someone early in your career who might start doing job interviews, a blog is not just a great way to get employers to hear about you, it's a way to put your best foot forward. You can take whatever kinds of analyses you're most interested in doing, whatever skills you're interested in showcasing, and put them online so that other people can use that to understand what you'd bring to a company.

Another is that blogging is an incredibly effective way to practice writing, visualization, and other kinds of communication. In Hadley Wickham and Garrett Grolemund's book, R for Data Science, they talk about the importance of communication within the data science workflow. And it can often be hard to practice communication within a typical data science pattern. Sometimes it's hard to fit into a usual job. And blogging is all about what visualizations can I create, how can I structure my argument, how can I structure these points, and I've found that blogging has been one of the ways I've most improved my own writing skill.

Third is that it's a way to teach or give advice in a way that scales, a way to change the community for the better by thousands of people at a time. So I've shared a thought that when you've written the same code three times, write a function. When you've given the same in-person advice three times, write a blog post. So the reasons for writing a blog post are similar to the reasons we'd write a function in code: to avoid constantly repeating yourself. How often do you find yourself in a conversation, maybe at this conference, where you say to someone, oh, I have very strong opinions on that, let me try and walk you through them. And you meet someone else, and you walk them through those opinions as well. If you have a blog post, that's the opportunity for thousands or tens of thousands of people to read that opinion. And any time in the future you might want to change someone's mind, you can point them towards that.

So how do you start a blog? There's the extraordinary blogdown package by Yihui Xie, as well as Amber Thomas and Alison Hill, who wrote the blogdown book, about creating websites with R Markdown. So if you've used R Markdown for creating reports, you can now use one R Markdown file for each post in a blog. There's a really fantastic guide to starting your own, and it's one that lets you integrate R, going from R directly to code, results, and figures in your blog posts.

Naturally, I give a lot more advice on how you can start a blog in my blog post, Advice to aspiring data scientists: start a blog, which I'm now pointing all of you towards and thereby proving my point.

One of the points that I make in this blog post is that it can feel like you're screaming into the void when you first publish online if you don't already have people that are following you. So my recommendation is if you start a data-related blog, tweet me a link, and I'll tweet about your first post. In the year plus since I made that offer, I've had the real privilege to get to retweet and tweet about a lot of people's work, a lot of people from all over the spectrum of a data science career. I'll share a couple examples of those in a moment.

So if you're wondering what could I blog about, the place I'd always start is analyzing a data set. I think this is the bread and butter of data science blogging: I've created a new post, I'm going to start with a data set, I'm going to share some things that I learned. And for that, I think one of the most remarkable projects that enables that is TidyTuesday, run by the R for Data Science online learning community. Each week, TidyTuesday releases a new data set on Monday, and then on Tuesday, the community generally discusses it and shares the results. So a fantastic approach to starting a blog is go to TidyTuesday, find the latest data set, or even an older one that you're interested in, and share what you can learn from it.

I think there's a lot of data sets you can use, though, and I want to highlight an example of someone who did it to remarkable effect. One of the people who shared their blog post with me was Jeff Kao, who was at the time an aspiring data scientist at the Metis boot camp. And he wrote an article discovering that more than a million net neutrality comments were likely faked. He used natural language processing to discover enough repetition in these comments to show they were created by bots. And that actually got national news coverage and a lot of really notable attention. So one good place to look for data to analyze might be current events that you're interested in and have a data-driven perspective to add to.

Another really good place to look for blog entries, if you're an academic, is your papers: take every paper and turn it into a blog post. This is great advice from Philip Guo, where he shares that since his Ph.D., he's taken each of his papers, turned it into a blog post, and encourages students to do the same, because 10 to 100 times more people will read blog posts than read a published paper. So peer-reviewed papers are a very important part of the public work ecosystem, but they're very well complemented by blog posts.

Another great thing to do in a blog post is to teach a concept. So Mohamed was an aspiring data scientist who shared this post with me, and it was a really excellent look through some topics that he'd been learning about performance metrics for machine learning. So as he was learning these concepts, as he was taking classes and mastering these skills, he was also publishing this work online. When I talk to people that are taking DataCamp courses and are learning to become data scientists and they're asking me what's next, that's almost always my advice, to try creating blog posts where they share what they've learned.

Here's another fantastic example of teaching a concept. This is a blog post by Heather and Jacqueline Nolis about creating APIs in R. So this is the kind of tutorial where you include code, sometimes backing it up with some open source work, and therefore enable lots of people to be able to take this same approach.

If you want to see loads of examples of people, everywhere from aspiring data scientists to experts, who ask me to tweet about their blog posts, you can check out the data blog hashtag on Twitter. Like I said, it's been really inspiring.

Twitter

Speaking of Twitter, the next kind of medium that I would recommend sharing your work in is Twitter. Who here has a Twitter account, let's say an active Twitter account? That's what I like to see. I think probably at least half. It's really exciting.

Twitter is a short-form medium, that means it's not necessarily where you'd publish your work in the first place, but it's an amazing place to promote and discuss your work. So each time I come out with a blog post, I generally share it on Twitter, along with a graph from it, and point people towards it. It's a good way to start discussion, get some comments, get some other opinions.

But it doesn't have to be your own work. It's also a fantastic way to promote the work of others. This is a really fantastic example from Mara Averick, where in December, she decided to run a dev advent calendar, where every day she highlighted another data scientist and some of their work. So this was a fantastic way to use this platform of sharing to better strengthen the community and promote some people. If you follow one person in the R Twitter world, you should certainly follow her.

It's also an amazing place to analyze datasets with the community. So I already mentioned the TidyTuesday project. Once people have finished analyzing a week's dataset, they can take that dataset, publish it, and discuss it, and share it on Twitter, and there are a few accounts, like the R4DS community account, that will generally retweet it. So if you're worried, I'm doing an analysis, is anyone going to see it, or is anyone going to think this is a topic worth analyzing, this is a really fantastic place to start.

Another great example is sharing what you've learned at a conference. Who's been live-tweeting this conference? Solid, solid, yeah. So I'm going to share my favorite example of someone live-tweeting RStudioConf, who's Brooke Watson.

Brooke's been doing these amazing posts where she takes notes, does a drawing of a speaker, and then shares it online. So this is definitely another fantastic person to follow. But there's lots of other creative ways to share the talks you're seeing: take a picture of a slide, share something you've learned, or a particularly funny joke. That's another place where Twitter is really useful. I think it's especially amazing because we have to remember, not everyone is going to this conference. Even people that aren't watching the stream of it still have the opportunity to learn something from it.

Finally, just because Twitter is a short-form medium doesn't mean you can't share work on it. Here's a great example from Paige Bailey, who's at Google, who shares sets of code and then visualizations that come from it. So this happened right after Twitter got 280 characters and this started being a little more possible. So it's a really good place to share an example of code, an R tip, maybe just a link to a really good data set, something that is useful in itself.

As we'll see later, I think there's a lot of relationship between being able to communicate well in a short-form like this and being able to work in longer-form media. So in fact, one thing I discover is that sometimes if you have a really short post, even the fastest thing you do, sometimes, unfortunately, it can be much more popular than anything you put a lot more time into.

Contributing to open source

So let's move on to the next area. This is something for once you've gotten at least some comfort programming in R; you certainly don't have to be an expert. But once you're ready, an amazing next step is to contribute to open source. So this is a slide from Mara Averick's talk, Contributing to Tidyverse Packages, that points out the motivation for open source is that it helps both yourself and the larger community. So when you find there's something you're working on that you generally find useful, that's an amazing time to turn it open source. And the reason it's so helpful is that when you contribute to the community, you'll find the community contributes back.

So this was an example of some code that I wrote while I was in graduate school, when I discovered that I'd often work with a linear model and I'd want to turn the coefficients into a data frame. So I had this code and I was finding it pretty useful for a few different problems I was working on. I decided to release it as an open source package, broom. So I ended up promoting it on Twitter, as described before, it's a great way to share projects like this. And I found within the following years, it has gotten more than 200 pull requests. Other people improving this package. It's now grown, last year, in fact, it was taken over by Alex Hayes as a full-time maintainer. So this project has gotten so much bigger than I could have ever built, thanks to the fact that I was able to share it online.
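
As a rough illustration of the idea behind that package (not the package's actual code, and written in Python rather than R), the sketch below fits a one-variable linear model by ordinary least squares and returns one row per coefficient instead of a model object. The function name `tidy_linear_fit` and the sample data are hypothetical.

```python
def tidy_linear_fit(x, y):
    """Fit y = intercept + slope * x and return one row (dict) per coefficient."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Ordinary least squares for a single predictor.
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # A "tidy" result: a table of terms and estimates, easy to filter or plot.
    return [
        {"term": "(Intercept)", "estimate": intercept},
        {"term": "x", "estimate": slope},
    ]


# Hypothetical data lying exactly on the line y = 1 + 2x.
rows = tidy_linear_fit([1, 2, 3, 4], [3, 5, 7, 9])
for row in rows:
    print(row["term"], row["estimate"])
```

Returning rows rather than a model object is the design choice being illustrated: the output can be combined with the results of other fits, filtered, or plotted with the same tools used for any other table.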

This kind of contribution can start just with a pull request. So here's a great article from Nic Crane on 10 Steps to Becoming a Tidyverse Contributor. So these are examples of how you can go into Tidyverse packages, discover some issues that need resolving, and help out. It could be a bug fix, it could be a new feature, it could be as simple as a typo, but it's a great way to start working in open source. I think a particularly good initiative done by some Tidyverse packages, including Alex Hayes on the broom package, is to create a list of beginner-friendly issues. These are issues where we would say, if you're just starting on GitHub, try going to this list, finding one that you can approach, and work your way up from there.

If you're somewhat more comfortable, it's a great time to create a package. Who here has created an R package? Nice. Who's published it on GitHub? On CRAN? Still pretty fantastic. Creating a package is an amazing way to get a lot of people to use your code, but it's also a great way to practice skills in documentation, unit testing, and creating intuitive user interfaces. So you can have a lot of experience coding for yourself on your own projects, but you'll still be missing the kinds of practice that open source provides when you're writing code designed for other people.

So Hadley's book, R Packages, is a fantastic resource for learning all the way from creating the code to writing the tests and documentation and publishing on CRAN.

Giving talks

Moving on, the next recommendation is to give talks. So why should you give a conference talk? Well, you're all listening to me now, aren't you?

But seriously, consider all the amazing things we've learned at this conference. Over the last two days, these are a few of the amazing talks I've been to. I learned a lot about Shiny. I learned a lot about teaching. I learned about gganimate. I learned about R and APIs. I learned about what we've been doing in the ACLU. And this kind of conference is an opportunity to share what you're working on. A conference talk isn't a tutorial. It's not a resource. It's not an opportunity to go in front of people and make sure you change everything about them. It's not a chance to take all your expertise and put it in. It's a sales pitch. It's an opportunity to convince people that something is worth looking into.

Some great advice for giving a data science talk comes from my sister Emily's blog post, Giving Your First Data Science Talk. Some of the useful recommendations that she adds include that you should imagine giving the talk to yourself from three to six months ago. So if you're thinking, I don't have anything I'd want to talk about, think, what do I know that six months ago I wish I'd known? And then try turning that into a talk.

Another important aspect is to practice giving the talk out loud. It's even better if you can practice giving it to other people, but even just giving it to yourself out loud lets you notice where transitions might be awkward. And finally, she recommends you should not rely on bullet points. Oops. Sorry, Emily.

You might think talking's not for you. You might say, what if I'm an introvert? Well, there's some really useful advice that's here from Kevin Goldsmith, which is one of the reasons speaking at conferences is great is it makes networking a lot easier. People know who you are and have something to talk to you about. So recommendation for introverts is to try becoming speakers. I should know. I'm an introvert myself, and now I have something to talk to you all about. But if you're still nervous, Emily has some great advice. Just imagine that your audience uses p-values of 0.25 for significance.

There are a lot of fantastic places that you can speak. Some of them include local meetups. So you can find those often on Meetup. R-Ladies is a global organization that promotes gender diversity within the R community and has meetups all around the world. And here's a list of more than 350 R conferences and meetings in all kinds of countries that's really worth examining. So you can find a local meetup and give a talk there, but you can also give a talk at a larger conference like this one. Often, it's a chance to reach a larger audience and also a chance to network with the larger R community. When I've given talks at conferences like these, it's been really fantastic to meet people that are working on similar problems and are interested in talking to me about them.

Finally, giving a talk doesn't necessarily qualify as public work unless there's a permanent artifact. This is something to really avoid: you give a talk, you feel like you've got it out there into the world, but you never share it online. That makes it fundamentally ephemeral. You want to make sure you publish your slides. You can find this talk at this link. I'll be sharing it on Twitter afterwards as well.

Recording screencasts

This next medium of public work is one that I only became familiar with and started really thinking about in the last few months, but it's one that I'm really excited about, particularly recording screencasts. This is an example of a screencast where I've been taking TidyTuesday data sets each week and opening them without ever having looked at the data before, then going into them, exploring, making graphs, trying statistical models, and thinking about my conclusions for about an hour.

So I found this a really exciting way to teach and to generally narrate my thought process and share it, particularly because it takes minimal preparation and a fixed time window. I'm at a point in my career where it can be difficult to schedule a lot of time to write, but I can always schedule an hour and a half to set up and record a screencast.

It's also really great for letting you teach tricks that you wouldn't have thought of teaching. So for example, I use functions from the forcats package for working with factors a lot during these analyses. I hadn't necessarily thought, let me go teach the forcats package, but I discovered that a lot of viewers pick up on these tricks and these small things that they can do in R.

Other things like the style of how I assign and name objects or tricks like manipulating variables within group by, again, these are things that I wouldn't write a blog post or a tweet about. I wouldn't think of making them a tip, but they get to fit within a screencast.

The limitation of screencasts is that you need to be pretty capable and confident enough to improvise your way through one. This isn't because it'll stop you from making mistakes. You will make mistakes. It's about being comfortable enough embarrassing yourself sometimes. So for example, in one screencast, I found myself crashing RStudio four times in an hour. In another, I forgot about lubridate's mdy function and found myself rewriting it from scratch. And in probably the most embarrassing one, I started using gganimate and realized I didn't know how to use the new version, which since I was the one who published the initial version, is a little bit embarrassing.

I don't know a lot of people who are doing screencasts. I've been encouraging some people I know to try out the format. One person who's doing a great job of it is Rachael Tatman from Kaggle, who doesn't just do screencasts, she does live coding on Twitch. So that's something I've thought about: should I try doing the coding live? People can ask questions, I can try and respond to feedback, they can point out the mdy function to me, and so on.

So I'm really excited to see where screencasts might go in the future as a medium of teaching.

Writing a book

Finally, I shared this advice before, when you've written the same code three times, write a function. When you've given the same in-person advice three times, write a blog post. Or, as Hadley advises, try writing a book.

So writing a book is generally something that you'd want to be more advanced in a career. Who here has written a book? E-books count as long as you feel like they're finished. Who's written a book?

Who thinks they plan to write a book sometime in their career? I'd encourage everyone to think about it seriously, because writing a book can seem somewhat intimidating, but it really fits as part of a continuum with a lot of these other media of communication. I think to write a book, the two things you need are a good amount to say, so you need to be a little farther in your career, enough to have a lot of advice you'd want to give on a technical or professional topic, and you'd have to have some practice in saying it. And that second part, the practice in communicating, is what all of this public work has been about.

So we talked about short-form communication, like a tweet. So in a tweet, I might be able to share a small bit of R code that's designed to demonstrate a concept, here updating a beta distribution, and it's enough to fit in a tweet. And then there's medium-form communication, and this can be everything from blog posts to package documentation, like vignettes, and building more practice doing this kind of writing helps one to get to long-form documentation. So in this case, I took a series of blog posts that I'd written, each teaching concepts around empirical Bayes estimation, and turned them into an ebook, Introduction to Empirical Bayes.
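
The slide's code isn't reproduced in this transcript. As a minimal sketch of what updating a beta distribution means, here in Python and with made-up numbers: a Beta(alpha, beta) prior over a batting average becomes Beta(alpha + hits, beta + misses) after observing new at-bats. The function names and the prior values are assumptions for illustration.

```python
def update_beta(alpha, beta, hits, misses):
    """Return the posterior Beta parameters after observing hits and misses."""
    return alpha + hits, beta + misses


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical prior: roughly "a typical batter hits about 27% of the time".
prior = (81, 219)

# Hypothetical new evidence: 100 hits in 300 at-bats.
posterior = update_beta(*prior, hits=100, misses=200)

print("prior mean:", beta_mean(*prior))
print("posterior mean:", beta_mean(*posterior))
```

The posterior mean lands between the prior mean and the observed rate of 100 / 300, which is the intuition the baseball framing is meant to convey: new evidence shifts the estimate without discarding what was believed beforehand.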

So another great example of taking public work and building on it to create a book is from Julia Silge. So Julia started by doing some blog posts about natural language processing, particularly sentiment analysis on Jane Austen novels. So she did some really fantastic blogging and was invited to the rOpenSci hackathon, where she met me, and we together built the tidytext package. So she did blog posts, then maintained an open-source package. She followed this up with more blog posts that explained the philosophy of the package and how to use it, as well as a sequence of vignettes that go with the package that are medium-form documentation. And in 2017, we turned that into a book, Text Mining with R, available wherever books are sold as well as at that link.

Do you notice how I plugged my two books, two slides in a row? Take notes, kids.

How do you write a book? Well, luckily, Yihui is on the case, because there's the bookdown package. So just as blogdown allowed each Rmd file to be one blog post, bookdown has one Rmd file for each chapter. And bookdown handles the enormous amount of formatting headaches and the nitty-gritty of references and indexes and all these problems, and it handles it for you so you can focus on the content and write something awesome. I find when I'm working with bookdown, it feels more like I'm writing a sequence of blog posts than as if I have an enormous book that I have to contend with. In fact, all the books that I've described during this talk were developed in bookdown.

Why work publicly

So why work publicly? Earlier I showed this slide, and I'm somewhat aggressive when I say work that's still on your computer is useless. Of course it's not useless. It could be something that you're working on within your company. It could be something that's helping you within your life. But why do I say it's so important to share your work? Well, imagine if nobody did.

Throughout this entire series of slides, I've been highlighting a lot of people that have done great examples of public work, as well as people that have shared guidelines and resources and tutorials that will help other people do public work. And every time along the way, I've been putting the link on the bottom right so that afterwards you can go through and try some of these resources yourself. Can you imagine if these hadn't been made public? I would have had to call my talk, The Unreasonable Effectiveness of Keeping Stuff on Your Computer. Later you would have watched it. It would have had slides like, my friend Bill did something cool, and he just told me about it at lunch last week and said, maybe I'll publish it someday.

Think about those conversations you have. Think about when you talk to someone, you say, oh, I've got something really cool I've been working on. I've realized this interesting thing about this machine learning method, or I've found a really cool data set. If you never publish it, where does that end up? Can you say, my colleague Rachel made a great graph. She showed it to me on her computer. I wish you'd been there.

They say that 80% of success is showing up. And similarly, I'd say to some extent, 80% of success within public work is getting your work out there. So whatever you have, I really recommend thinking through these approaches and sharing your work.

Thank you.

Q&A

I certainly hope that we have the fun throwable mics, maybe, out there. But we definitely have some time for some questions for David.

One of the things that I think can be really hard if you're trying to write blog posts about data science, especially if you're sort of throwing it out into the void, is getting editing advice or any kind of feedback on your post before you publish, when it's still in draft form. Is there any kind of way in the R community we can support that?

I think that's a really interesting question. Even when you're writing your first few posts and you feel you could use some feedback, I generally recommend still getting them out into the world. The R community is, I've found, very welcoming and is a good place where people can give some feedback. But I think that's the kind of initiative. I also think you can certainly go to your local community. So when I was in grad school, my lab mates and my advisor were fantastic resources. Your manager or colleagues are also great resources. I think it's the kind of thing that's probably worth formalizing in some kind of program. I don't know much about it. I don't know of anyone that's tried that. But, like, I don't know, Feedback Fridays. Feedback Fridays. You heard it here first.

My name is Reina Harris. That was a beautiful talk. Thank you. The one thing you said, you need to have advice and experience to write a book. So what's maybe the harm in writing a book too soon?

I don't think I've ever found a case where someone wrote a book too soon. One of the reasons I'm so encouraging throughout the talk of doing public work is I think it's very rare that people fall on the side of publishing too often. Obviously, there are exceptions like data breaches and such. But I don't find that people, say, are writing too many blog posts analyzing data when they're too early in their career. It's certainly the other direction. I think the risk of writing a book too early is that you wouldn't have enough to fill it. I find that the people who are ready to write a book are generally the kind of people that could fill up a university course with material. That might even be a little bit too much, but they could certainly give a long talk, maybe a two-day workshop, and can think of enough facets of the problem where they'd want to give useful advice.

I particularly think in this sense about both technical and professional advice. If I imagine myself trying to write a book when I, let's say, had just started grad school, I feel like I would have run out of things to say.

One thing I've found is I keep running into people that have published books before they started their PhD or during their PhD. It's really not a thing that has a hard age or career cutoff. If you, say, write a lot of blog posts, I really recommend thinking in terms of how they can be linked together into long-form documentation.

Thanks for the presentation. Do you have any opinion on the importance of updating older blog posts versus investing in writing new ones?

I'd say never do it. I'll tell you why you never update old blog posts. Because if you do, it stops you from writing future ones. It's not simply because you're spending time when you could be publishing new ones. I would say it's because when you write a blog post, one of the amazing things about it is that it gets to be done. The cult of done is a popular description of this. So obviously you should fix errors or other problems in old blog posts. But if you find yourself treating a blog post as a resource that has to be maintained, for example, if I went back to my... I think I have a very old blog post that uses data.table. If I went back and used dplyr, if I went back and fixed the various things that have broken thanks to new dplyr releases, that would mean every new blog post that I wrote would become an obligation. And that would discourage me from publishing them in the first place. So it's a bit of an extreme answer, but I'd actually say I would not prioritize that very heavily, with clear exceptions such as if someone finds something truly wrong in it or dangerous or other things like that. I'd say put it out there and let it be done. Work on the next thing.

Thanks, David. I have a really important question for you. Did I pay you to promote the two books of mine?

Was that an option?

Alright, the real question. So I actually have a problem with tweeting, which is discoverability. I mean, many times I see there are excellent tweets sharing excellent ideas, but I feel like if I miss such a tweet, it's just gone forever. So do you think that is a problem? And if you do, how would you solve that?

Yeah, I would say the advantages of Twitter are very ephemeral. It's very much, during this conference, see everything that's happening. It's not an ideal way to crystallize knowledge for future use, not the way a blog post or especially a book would be. Having said that, there are a few ways that it makes things permanent that I really enjoy. So if you haven't tried this, you should try it. I'm not going to do it on my computer, but you could go to Twitter, type from: followed by a username, and search only within one person's tweets. So if you know, oh, I said something once, or Emily said something, or I want to find a tweet where Jenny shared this really good piece of advice, you can search just within one person's tweets. One way that I use that is for finding my own conference tweets. During a conference, I generally treat Twitter as a public diary. I'm not very good at note-taking, so instead I live-tweet each of the talks with what I find most important, and afterwards I can go back and discover the things that I wanted to share. So throughout this talk, I do include a few of my old tweets, and I did find them in this way.

So I think after hearing you talk, we all seem very motivated to write blog posts, but how would you recommend that people keep the momentum and continually publish blog posts, not just after this conference right now, but maybe in three months, six months, even a year?

You know, when I started my blog, it was late 2014, and it was around the time also that I started the broom package, and I waited a few months, and I wrote four blog posts because I assumed that once the blog was up, I'd be out of ideas and could never blog again, so I wanted to have four saved up that I could publish. And every single time I've published a blog post since then, I've thought, yep, this is the last one, I'm out of ideas. What I'd say is the best way to keep motivated is to start that feedback loop: once you put ideas out in the world, you get feedback on them, you get people excited about them, you get compliments on them, sometimes you get recognized, and these are ways that build your own excitement and keep you motivated to keep doing work online. So I'd say there isn't a shortcut to starting and putting your work online.