
Jesse Mostipak | R4DS online learning community: self-taught data science & DEI | RStudio (2019)

The first iteration of the R4DS Online Learning Community was created as an online space for learners and mentors to gather and work through the "R for Data Science" text in a collaborative and supportive environment. The creation of this group was inspired by my own success in transitioning to a career in data science coupled with the resources that I wanted to see in the R programming space. This talk will go through the learnings of creating an online learning space focused on R programming for data science, and how future iterations of similar groups can more proactively center on bringing about diversity, equity, and inclusion to data science spaces. About Jesse Mostipak: Molecular biologist turned public school teacher who eventually fell in love with non-profit data science. Harvard SDP, Datanaut, and perpetual #rstats noob


Transcript

This transcript was generated automatically and may contain errors.

It is an absolute trip to be here at RStudio conference talking to all of you about the R for Data Science online learning community, sharing a little bit about my personal story, and then talking about diversity, equity, and inclusion in data science. And that's a lot to cover in the next 15-20 minutes, but we can do it.

The reason this is so kind of wild to be here is that this is not something that kids like me traditionally grow up and do. I grew up in a poor working-class family outside of Buffalo, New York. I am the first and only person in my family to both go to and graduate from college. I really liked college so much that I ended up going to graduate school twice. I did college at the education-at-any-cost model.

To give you some perspective, I graduated from high school in 1999, but I took eight years to finish my undergrad, which means I graduated into the recession of 2008. There were lots of jobs for people with degrees. I have a ton of student loans. I went through a variety of careers. I did academia for a little while, and I ultimately ended up teaching high school science, and I loved it. But I'm not sure if you know this. Teaching doesn't pay the bills.

It is an amazing career. I recommend it for everybody, but if you are carrying six figures worth of student loan debt, teaching is probably not going to help you make ends meet.

Becoming a data scientist

So I was in this position in about 2014 where I was evaluating my life, and I knew a couple of things. One, I really liked video games. And to be very specific, I really liked World of Warcraft. And to be even more specific in the spirit of vulnerability, I could play World of Warcraft for like 12 to 15 hours a day. So there are some things going on in my life, but I really liked video games.

I really liked video games, and data science was becoming a thing. And it didn't seem like a fad, but data science was like, okay, I have some background that could maybe put me in a position where I could do data science, and guess what? I don't think I need to go back to school to become a data scientist. So I think that maybe I could do this thing, I could be a data scientist, and I could change my financial future.

So through a variety of kind of weird coincidences and situations, I found myself interviewing for this nebulously defined data-ish position with an esports startup based out of Dallas, Texas, called PVP Live. And to this day, I don't know where I got the courage to tell them I was a data scientist.

I was registered in the Johns Hopkins Data Science Specialization from Coursera, but I took none of the classes. Like most of you here, I'm sure. And I had taken a graduate-level statistics course for one semester, and I knew a little bit about this language called R. But the fact remained that I knew more about data science than anybody at the company, so nobody knew that maybe I wasn't a data scientist. So I said, I'm a data scientist, and they hired me.

So I worked remotely for a while, and then they told me to move to Dallas or find a new job, so I moved to Dallas. And my first day on the job, they were like, congratulations, you're a manager. And I was like, cool, maybe I don't even need to learn data science. I'll just manage data scientists. So that was really cool. And then six months on the job, I'm spending more and more of my time managing data scientists and helping them learn data science. And I'm learning a little bit as I go, but really, I'm learning a lot about managing and being at a startup, and this is six months in the job, and it's new, and it's exciting. And this is kind of amazing. I told this company I was a data scientist, and they hired me as a data scientist, and then they promoted me as a manager of data scientists. I was like, this is so easy. Like, you just tell people you're a data scientist, and they hire you.

So 12 months into the job, without warning and without notice, they let us go without pay. And that is something of a misnomer, because they told me I could have equity. But if your company makes zero revenue, your equity is still zero. Like, I knew enough math to know that this was not going to work out in my favor. So I'm in Dallas. I've renewed my lease. I've gambled everything on data science and this esports startup, and I've kind of lost.

So what do I do? Do I go back to teaching? I know teaching is not going to pay the bills, but I know I love it. But then I've got to get certified in Texas, and that's a process. Or do I try to find something data-related or data-adjacent? And I decided to see what I could do with data, and maybe let go of the data scientist title. Does it matter if I'm called a data scientist, or does it matter if I'm working with data? And I was like, well, I guess it matters that I'm working with data. But this time, I'm going to work for a company that I believe in.

I did believe in the esports startup for a while. But I was like, I need someone with a little more clout. So I ended up working for the Girl Scouts of Northeast Texas, and ultimately at Teaching Trust, which is where I am now. And I do strongly recommend working for nonprofits to kind of start getting your feet wet with data-related careers. I've had all sorts of titles. I've been an analyst. I've been in research and evaluation. I've been in outcomes. And as long as you let go of that data scientist title, I think that you can get started with data science just about anywhere.

Creating the R4DS online learning community

So I was in this position where I'm working in nonprofits, and I'm doing analytical work, but I'm moving into management. It turns out once you're a manager, it's hard to not be a manager ever again, because people are like, oh, no, we just want you to manage. So I'm here, and I'm leveling up on the side. And I'm like, I want to get better at R. I kind of know this language. I think it's really useful. I think it can make a lot of what I do at work better.

So as an educator, I was like, I'm going to teach myself R. I'm going to get better at this, but I'm going to be transparent about my process, and I'm going to share that process with other people. So I took to social media and Twitter, and I was tweeting about things, and I was writing blog posts. And my first blog posts were helping people navigate that gap between using your computer for Netflix and using your computer to do data science. And forget the gap between Excel and R. The gap between using your computer for Netflix and navigating RStudio is very, very real. And people need those resources.

So as I was doing this, two things happened. Someone on Twitter was like, there's this really good book called R for Data Science, and it's free. And I was like, oh, cool. I will read the book. But then Hadley retweeted me. And this is the most popular I've ever been on Twitter. It's probably the most popular I will ever be on Twitter, and that's OK. But I was benefiting from something that, for some people, we informally refer to as the Hadley bump, right? For 15 minutes, this is my 15 minutes of fame, right? And I'm like, I can take this and run with it, or I can forget about it. People are interested in what I have to say for a limited amount of time. So I have the opportunity to take this book that I'm going to read anyway, and why not invite other people to read it with me?

So that was how the R for Data Science online learning community, the concept, started. But when Hadley retweets you, you have a limited amount of time to get people still interested in what you're saying, right? I can't wait two weeks and come back and be like, hey, I have this cool idea. You have to move. So it's all about finding your opportunities, people.

So I have to come up with this plan. And my idea is like, well, I don't want to spend any money on this, and I don't want anybody else to have to spend any money on this. We shouldn't have to spend money to learn how to do data science. We're repurposing things we already know how to do. So it's like, it's going to be free. It's going to be open to whoever wants to be a part of it. And so the original idea was, let's create a Slack group, and let's follow this curriculum map that goes through maybe two chapters of the textbook every week. And we'll finish in three to four months. And we'll all be data scientists at the end. We'll be pros. We'll be total experts. It's going to be amazing.

And one thing we did with this group that I think is different from some other online learning communities is that we made it very clear that you had to be nice to people. And you would be kicked out if you weren't nice. This was going to be the kind of place where you could come, and you could say, I only have two panes in RStudio, and I know I'm supposed to have four, and I don't know what to do. And instead of someone being like, loser, you're so new at this, people would be like, hey, you know what? I can help you with that, and I'm going to stick with you until we figure that out. And so that was the idea for the kind of people you were going to encounter.

And then lastly, we had this idea of there being mentors and learners. So you self-identified. Am I here to mentor other people in their learning journey, or am I here to learn? And maybe you were a learner for the first three chapters, and then you became a mentor for other things. So it was this very fluid definition. We weren't going to say, oh, we've done this survey, and you are a mentor, and you are a learner. You got to identify. You got to take responsibility for your role in the community.

Expectations versus reality

So expectations versus reality. I could go on about it. I've got three hours of material on this. I will keep it brief. Essentially, I was like, no one's going to want to do this with me. This is an awful idea. But why not try? I figured at the end of two weeks, I'd be begging my friends to do this with me. There'd be 12 of us, maybe 25, maybe 50 in this online learning community. And I'm like, that's fine.

At the end of two weeks, when the invites went out, we had over 600 members, which for me meant that I'm not there as a learner anymore. I'm there as a community manager. And that was a surprise. That was something I wasn't expecting.

But as we moved into things and I started thinking about what worked and what didn't work, there was a lot. But I would say of the 600 people, within the first two weeks, we lost half of our members. And we lost our members for one of two reasons. One, Slack was overwhelming. Most of us probably use Slack and are like, oh, I'm so tired of this tool. It's amazing. But stop sending me stuff. But you know how to navigate it. Not everybody knows how to navigate Slack. Not everybody knows how to turn their notifications off. And when you log in and all of a sudden there's 450 messages, you're done. You are out.

But we also had people who couldn't install or navigate R and RStudio. And that's where we lost 50% of our members was because we couldn't help them get over those hurdles. And we didn't even think of those as necessarily being the first hurdles. So again, this gap from your computer for Netflix to your computer for data science is very, very real.

There was also a misalignment and probably some miscommunication on my part about what we were doing. So some of the biggest things was people showed up and they were like, what time do classes start? I was like, they don't. We're just going to read the book together. And like, you're on your own. And I'm not going to tell you when to do things or how to do things. So that was a big thing. We had people who were like, what book? Like, what is the book? So that was a little bit lost in translation. And then another thing was that not everybody was tidyverse-aligned. Now, listen, if you're having trouble installing R and RStudio on your computer, I don't expect you to know the difference between base R and the tidyverse. But if you're not there to help people learn the tidyverse, which is kind of a central organizing principle of the book, it's going to be some rough sledding.

So there was a lot to work out. And I didn't provide any training for mentors. I was like, yeah, you know how to be a mentor. Forget the fact that we send teachers to school to be certified in teaching. I was like, you're fine. Just be nice. It'll be good. It'll all work out.

So at the end of round one, I was exhausted. This was like, I just want to say people are like, data science is so hard. If you want a hard job, go be a community manager, because that is exhausting. I, like, if you know a community manager in your life, give them a hug. Thank them for what they do. That is a thankless career. And it is incredibly difficult.

So there is, this happened, right? And I was exhausted. I didn't read the book. I hadn't finished the book. I was spending all of my time answering Twitter DMs, creating content, answering Slack messages, helping connect people. So I had really moved into a community manager role. And it wasn't quite what I expected.

I will say that we did have one person who copped to finishing the entire curriculum. The guys at Hopkins tell me that is well within the margin of error and that I should be very, very proud of that one person. And I am. I'm also proud of myself for doing something before I was ready. And I'm really, really proud of the people who are like, yeah, this is a crazy idea, but I'm going to do it. Like, everybody who came along on that first adventure, I am incredibly proud of for taking a risk and trying something new.

Round two and community growth

So then people are like, hey, when's round two? So in round two, I was like, I can't do this by myself. Something needs to change. And I said, what do you all want to see in this community? And people came back with all sorts of ideas. And when they said, I really want to see this, I said, that is a fantastic idea. How can I support you in making that a reality? And what that did is it resulted in people being like, oh, no, I don't want to do the thing. And okay, that's great. Then it's not going to get done. But it also resulted in people saying, well, I need you to help me to research software. Or I need you to retweet something. Or I need you to just kind of like help me talk in the group and recruit people for this. And those were things that I could do.

So in round two, we had a lot of amazing things happen. We had commitments to office hours, where people would give up an hour of their week and just show up and answer questions. We did GitHub. We did the chapters out of the book. We had YouTube tutorial videos that community members made. And then three things that you're probably most familiar with is we have a website, we have a Twitter account, and we have Tidy Tuesday. These are not things that I built. I made sure everybody got in Slack and that everybody was happy and well behaved and nice to each other. These are things that community members have built and that community members to this day maintain. If you want to get involved in this online learning community, you can go to that website and sign up. And I strongly encourage you to do so.

So when we're getting through round two, I'm realizing that the group isn't what I set out to do, right? We're not going through a text together and necessarily learning at a specific pace. But it's also an amazing community, and people are doing amazing things. They're just not quite the things that I want to do. So I am very excited to say that the group was handed off to a new set of leaders who are enthusiastic and incredibly talented at what they do in keeping this group running. And they are just, I think they were saying that there were over 2,500 members today, right? So the community continues to grow. It continues to be active. It continues to be a place where you can go and give and get help in R.

Data science, diversity, and vulnerability

So in my time away from the group, I started to think about a lot of things. And I thought a lot about my life and how data science has changed my life, right? Just doing this community, just learning data science, just being a data scientist has changed my life. It's changed the people that I have access to. It has given me opportunities that I would never before have had in my life. And it has completely changed my socioeconomic status. Instead of living in a financial crisis every single day of my life, I'm like two emergencies away from a financial crisis, right? And that's a big deal when you grow up in poverty and you can't put food on the table, like to have that security.

So I was like, if data science has done this for me, then it can do it for other people. And it's my responsibility to help make it easier for other people to move into data science. And I think we can all agree that data is used, data and data science is used every single day about, by, and for every single one of us to make choices and decisions for us. Data is integral to our lives and it will be for the foreseeable future.

So as I started to look around data science as a whole, and I was like data science affects each and every one of us. So each and every one of us deserves a seat at the table.


So what I'm asking people to do, there's like a laundry list of a million things I could ask you to do to start getting more people into data science and making it easier to bring people into data science and removing some of the barriers. But I'm only going to ask you to do one thing. I'm going to ask you to be vulnerable. I'm going to ask you to help me demystify data science. Share publicly on Twitter, on your blog posts, in person, the things that you don't know. Share the things that you're learning and share the mistakes that you've made.

Data science isn't special. Data science isn't even necessarily a difficult career. There's no checklist for data science where we say, oh, great, I have done every item on this list and then Hadley Wickham descends from the heavens and knights you a data scientist and everything, now you are free to practice data science. Go do data science. And share with people, because what data science is, is it's about constantly learning and growing and taking risks. I would argue that that is one of the defining characteristics of being in data science, right, that you're constantly learning, you're constantly making mistakes, and you're constantly learning from mistakes. And the more that we normalize that and we say, hey, this is what us, the people who get paid to be data scientists, do, I think the more that we can start bringing in diverse voices and making data science as a community more equitable and inclusive.


With that, I think I might be able to take one question. I don't know, Amina is my timekeeper. Okay. Thank you.

Or you don't have any questions, and that's fine, too. You can always reach out to me on Twitter.

Hi, thanks for your talk. Thanks for being brave and vulnerable and all those things. So when you stepped away from the community, who was it that took over and what was a part of that decision?

Yeah. So two of the people are in this room. One is a gentleman named Dennis, who is, I can't remember, I think he's in Kenya. And then we have Thomas Mock in the back. He has got Tidy Tuesday stickers. I'm just putting that out there. And then John Harmon, also over here on the end. So these are the gentlemen that are currently running the community and keeping it alive and making sure everything is going as planned.