Tan Ho | Project Immortality: Using GitHub To Make Your Work Live Forever | Posit (2022)
If you've invested a lot of time and energy in a data science project, you might be ready to move on to new and exciting things. Don't let your old projects wither away and die! There are some powerful and free resources from GitHub that you can leverage to help pay it forward to the next person looking to use your work. In this talk, I'll showcase how you can transform ordinary R scripts into self-sufficient, robust projects by converting your code into a package, adding some GitHub Actions, and storing data in GitHub Releases. This will help make your projects more useful - now and long after you've stopped working on them! Talk materials are available at https://github.com/tanho63/project_immortality_with_github/ Session: Generating high quality data
Transcript
This transcript was generated automatically and may contain errors.
My talk today is going to be about Project Immortality: using GitHub to make your work live forever. You can follow me on Twitter at underscore Tan Ho. The slides are themselves a GitHub repository, and they're uploaded there.
So I'd like to introduce you to the life and times of a data science project, okay? This might sound familiar to most of you, certainly the story of most of my projects if not all of them. You start by importing some data, okay, whether that's from scraping, APIs, a database, pre-existing data, stuff that your company has, whatever it is, okay? You start by importing some data. Then you get to know this data. You do some wrangling, you do some feature engineering, you start doing some exploratory data analysis, trying to understand how messy the data is and how bad your data is, shout out to Jim.
And then once you're done with that - once you understand the data a little bit from a human perspective - you start trying to teach your computer to do something with it. You try to model: you try to get some clustering done, or regressions, or try to predict a number. And then you study the results of that output, you plot it, and then you tweet the plot, okay? So, you know, you share it with the world: you've learned something, here's what I've learned, get some feedback on it, validation, 15 minutes of fame, hopefully.
Hopefully everyone likes it, you get some feedback, and then most projects die here, right? And rightfully so, because you've done what you set out to do: you've learned something from that project and you're ready to move on.
The problem of abandoned projects
Sometime later, this happens. Relevant XKCD, as ever: never have I ever felt so close to another soul, and yet so helplessly alone, as when I Googled this problem and there's one result - a thread by someone who studied the exact same problem I'm interested in, last posted to in 2020. (2020 feels like 2003.) Who were you, DenverCoder9? Who were you, Tan? What did you see? What did you learn?
This story happens a lot, right? Whether it's a Stack Overflow answer, a problem on GitHub - sometimes now you'll Google and you'll find a Twitter thread. How do you help this person? Actually, I'm not that interested in how you help that person, because the answer is: they'll reach out to you, and if you're available, you'll spend the time and energy and give it back to them - if you have time, right? What I'm actually more interested in is: how can your project help this person? Because at the end of the day, you being available isn't always possible. Your time has limits, your energy has limits. Your project should be able to help them get what they need from it and move on.
So my question - my talk today - is going to be about how you can help your project help this person. And I think that GitHub has a bunch of resources that can really help you along this path.
So who is this talk for? It's for people who are doing personal hobby projects, people who are doing academic research, and people who are interested in public and open source work. More broadly, it's for people who don't have a budget for their work, and people who are interested in helping others use their work and helping their projects live on.
So why do I care so much about this? This is me - this is literally my profile picture on every single Twitter and Twitch and everything else about me. In a past life, I was a property manager: grew up in the family business, absolutely hated that job, okay? So on the side, I started doing fantasy football, started analyzing it, started teaching myself some data analysis to go with it. I started with Excel and Power Query, eventually learned R, and taught myself football analysis - NFL, fantasy football, et cetera. Eventually that led to a data science career: I got a job at a home building company doing data science and programming, and recently I've accepted a position with Zelus Analytics to work on pro soccer. So you can make that career switch, if you're interested, by doing this sort of thing.
And today, along the way, I've become a maintainer of public NFL data: I maintain the nflverse and ffverse R packages for NFL and fantasy football, respectively. So why do I care so much about this? I've been the hobbyist programmer. I've been the person who's broke and wants to do stuff without a budget. And I care, even today, about making sure that people can build on my work and can use my work in their own projects.
FF Opportunity: a case study
So let's talk about one of my projects, FF Opportunity. It's a project that uses nflverse play-by-play data to study expected fantasy points. Now, I'm not going to go into the details about the NFL or expected fantasy points - you don't need to know anything about that - but if you're interested, expected fantasy points help measure the value of play opportunities in fantasy football. And you can find my project at github.com/ffverse/ffopportunity.
So this might sound familiar: I start by importing some nflverse play-by-play data, then I wrangle some features together, train an XGBoost model, and then use that model to predict fantasy points, right? This is the story of every machine learning project - every data science project, really. Now what? How can I make FF Opportunity live on?
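The pipeline just described has a familiar shape. As a rough sketch only - not the actual FF Opportunity code, and with a deliberately tiny, made-up feature set - it might look something like this (assumes an internet connection and the real `nflreadr` and `xgboost` packages):

```r
library(nflreadr)
library(xgboost)

# import: nflverse play-by-play data for one season
pbp <- nflreadr::load_pbp(2021)

# wrangle: a hypothetical, deliberately tiny feature set for illustration
features <- data.frame(
  yardline = pbp$yardline_100,
  down     = pbp$down,
  ydstogo  = pbp$ydstogo
)
target <- pbp$yards_gained

keep   <- stats::complete.cases(features, target)
dtrain <- xgboost::xgb.DMatrix(as.matrix(features[keep, ]), label = target[keep])

# model: train a basic XGBoost regression
model <- xgboost::xgb.train(
  params  = list(objective = "reg:squarederror"),
  data    = dtrain,
  nrounds = 50
)

# predict: apply the model back to the features
preds <- predict(model, as.matrix(features[keep, ]))
```

The real project's features, target, and tuning are all more involved; the point is only that the import, wrangle, model, predict structure is the thing the rest of the talk is about keeping alive.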
I'm actually going to frame this question as: what is the person who stumbles on my project later interested in, and what kinds of questions are they going to be asking? First, how can they use and improve on my model? No model is perfect, but models are useful - and understanding the past work in a field helps you improve on it: where the gaps are, how to use it, how to improve it, that kind of thing. Second, can new predictions be automated every week? Maybe they don't care about the model itself, but they're interested in the predictions - what the model says about games that haven't happened yet and games that are going to happen soon. There's a new slate of NFL games every week from September through to January or February, so can you automate new predictions every week of the season? Lastly, where can they find these predictions? Where can I put them so people can access the predictions without having to ask me to email or tweet or otherwise send them? It's no good if it all lives on my computer, right?
Making your model reusable: R package infrastructure
So let's go through these questions one by one. Can someone use and improve on my model? There's a lot of work in the R space now in terms of reproducibility: renv is a great option, Docker is a great option. The one I actually like to use is making it a package - and I'm not talking about making it for CRAN, but there are two elements of packages that I think are really underrated and help build things going forward. One of them is adding a DESCRIPTION file, and the other is wrapping code into functions.
One of the things adding a DESCRIPTION file does is force you - or ask you - to add a license. And a license is really important, especially when you're doing public work, because it tells people what you're okay with them doing with your work, right? If you're okay with them doing anything with it, say so: add an MIT license, that's basically what it's there for. If you only want them to use this work to extend other public work, you can use a license like GPLv3. So just choosing the license helps communicate how you want them to use the work, and by sharing the work under that license, people can understand what you're okay with them doing with it.
The other thing a DESCRIPTION file does is make installation of dependencies really easy, because you're forced to list all the dependencies - and if you're also able to list the minimum versions, it makes tools like remotes or pak or pacman work, because they can read the DESCRIPTION file from GitHub and install all the dependencies for you while installing the package. Even if the package doesn't have any functions in it, it'll still install the dependencies for you, and then you can run the script after you've cloned it from GitHub. Of course, usethis has this covered, so you don't actually need to type out a DESCRIPTION file by hand. There are four functions you need to know to do exactly what I'm talking about: use_description(), use_mit_license(), use_package() for each package that you're using, and - once you've added all the packages - use_latest_dependencies(), which version-locks to whatever's on your machine. It'll put those in as the minimum versions, and then you've set this up for success.
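In code, those four usethis calls look like this (run once, from inside your project; the `dplyr` dependency is just an example):

```r
# One-time project setup with usethis:
usethis::use_description()          # create a DESCRIPTION file
usethis::use_mit_license()          # declare what others may do with the work
usethis::use_package("dplyr")       # record a dependency (repeat per package)
usethis::use_latest_dependencies()  # pin minimum versions to what's installed
```

Once that's pushed, someone else can install everything in one step with something like `remotes::install_github("yourname/yourrepo")` (repo name hypothetical), and the dependency installation comes along for free.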
The other thing I like to do - this is kind of the next level - is to wrap logic into functions. It makes things really easy for users. From your end, what you need to do is convert the hard-coded variables and data frames into arguments. Extract Function is a cool feature in RStudio: if you select the entire code chunk and press the shortcut - Ctrl+Alt+X, or on a Mac I think it's Cmd+Option+X, I don't use one - it'll automagically create a function, put the skeleton of your code inside it, and turn all the variables into arguments. So that already basically does it for you. The other thing you can do is add some usage notes - and I say usage notes rather than documentation, because documentation scares people - but really explaining how you use the function and what it's meant to do will help other people understand where you're going with it. The goal is to make it easy for users to run, right? R users love functions: they understand exactly what to do when you give them a function - you put arguments into it, and you go from there.
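As a small illustration of that hard-coded-variables-to-arguments step (the function and helper names here are hypothetical, not from the real project):

```r
# Before: a script with values baked in
#   season <- 2021
#   pbp    <- nflreadr::load_pbp(season)
#   preds  <- predict(model, build_features(pbp))

# After: the same logic wrapped into a function, with the hard-coded
# variables promoted to arguments (build_features() is a stand-in helper)
predict_expected_points <- function(season = 2021, model) {
  pbp      <- nflreadr::load_pbp(season)
  features <- build_features(pbp)
  predict(model, features)
}
```

A user now knows exactly how to run it for a different season - `predict_expected_points(season = 2022, model = my_model)` - without reading or editing the script's internals.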
Automating predictions with GitHub Actions
Okay. So we talked about packages. That's the bare minimum - there's a whole bunch more on packages, there's a whole book on it - but from my end of things, that's the bare minimum. The next thing I'm going to talk about is: can you automate predictions every week? Because the users who don't care about the model itself are really only interested in its output. So can you automate giving them predictions every single week? The answer is yes, of course - with GitHub Actions.
This is literally the 20-line file that automates the FF Opportunity predictions, and it uses the DESCRIPTION file we just talked about. It runs every night after every game - that's 4 or 5 a.m. on Monday, Tuesday, and Friday, from September to January. It installs R on the virtual machine, installs the packages from the dependencies I listed in my DESCRIPTION, and then it runs a script. And that's it. That's all you need to do. Obviously there's a whole bunch of magic that happens on the back end here, but from your perspective as someone automating your projects, you're leaning on a bunch of work by the r-lib/actions team that's out there and ready for you to use, right? And GitHub Actions is free: if you've got public projects, there are no Actions limits.
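A workflow of that shape might look roughly like this - a sketch, not the actual FF Opportunity file; the schedule, job name, and script path are illustrative, while the `r-lib/actions` steps are the real reusable actions being described:

```yaml
# .github/workflows/predict.yaml - a sketch of a scheduled R automation
name: automate-predictions
on:
  schedule:
    - cron: "0 9 * * 1,2,5"   # 09:00 UTC (~4-5am ET) on Mon, Tue, Fri
  workflow_dispatch:           # also allow manual runs

jobs:
  predict:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: r-lib/actions/setup-r@v2                 # install R on the VM
      - uses: r-lib/actions/setup-r-dependencies@v2    # install from DESCRIPTION
      - name: Run predictions
        run: Rscript scripts/update_predictions.R      # hypothetical script
```

The `setup-r-dependencies` step is where the DESCRIPTION file pays off again: it reads the dependency list and installs everything, so the workflow itself stays short.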
Storing data with GitHub Releases
The last question - and the one I think more people need to know about - is: where can I make predictions accessible? We're already talking about GitHub, so the common approach is to just commit things to your GitHub repository, right? Well, there are three common problems with that: file size limits, inefficient binary data storage, and commit history chaos.
We'll start with file size limits. This is approximate, but I think the GitHub file size limits are something like 75 megabytes per file and 2 gigabytes per repo. In today's modern age, that's basically nothing, right? So it's very easy to blow through. If you have a CSV, for example, you might think: okay, maybe what I should do is use an RDS file, or a Parquet file with Arrow, or whatever gives a small file size - my friend Seb likes to use qs. Same idea: they're all binary files.
The problem with that is that version control is incredibly bad at dealing with binary files. Here's an example of a repository with a daily automation that basically overwrites an RDS file every single day. The text is really tiny, so I've annotated it for you: the actual data in this repository is 60 megabytes in size, and Git - since this ran over the course of six months to a year - has automatically backed up 6 gigabytes' worth of data that I didn't know about until I decided I needed to clone it down and do something with the code. And once I'd cloned it down, I couldn't push it back, because GitHub has a 2-gigabyte limit.
I actually rage-tweeted a whole bunch about this when I found out - if you're really keen, you can go find it. So it's incredibly bad. What's happening is that version control is literally tracking versions, right? It compares the differences and tries to log just the differences - but a binary file is entirely different every time you write it, so Git naturally stores a whole new copy. That's why Git is really bad at dealing with binary files. So even if you're into Parquet and the latest and greatest there, it's very easy - especially when you pair it with Actions - to kill your repo like this.
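You can watch this happen in miniature with nothing but `git` and `dd` - a small demo (not from the talk) that commits the same 5 MB binary file twice and then checks how much Git is actually storing:

```shell
# Overwriting a binary file makes git store a full second copy
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q repo && cd repo
git config user.email "demo@example.com"
git config user.name  "demo"

# "day 1": commit a 5 MB file of random (incompressible) bytes
dd if=/dev/urandom of=data.bin bs=1M count=5 2>/dev/null
git add data.bin && git commit -qm "day 1 data"

# "day 2": overwrite and commit again - binary files don't delta-compress
# like text, so git keeps a whole new 5 MB object
dd if=/dev/urandom of=data.bin bs=1M count=5 2>/dev/null
git add data.bin && git commit -qm "day 2 data"

git count-objects -v   # "size" is now roughly double the working-tree size
```

Two commits in, the repository already stores about 10 MB for a 5 MB working tree; a daily automation just repeats that compounding forever.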
Lastly, this is another problem: if you're committing to the same repo that your code lives in, you end up with, like, 6,000 commits. Now, pop quiz: which commit did I make a typo in and automatically write to the wrong file? Well, it was, like, 4,222 commits ago. How do I revert to that? Right? It's impossible. So: commit history chaos.
There are obviously better solutions - but first, here are some meh solutions. There's Git Large File Storage. It takes care of some of the large file size problems, but it also wrecks your Git repository: it's impossible to collaborate with, and it's really hard for users to work with, because your data isn't actually stored on GitHub as files anymore - it's sharded everywhere. Amazon S3 buckets: two problems. One, they cost money - again, we're talking about free, sustainable projects. And the other problem is that the data is stored away from your repository, so you have no idea what data is available for that repository. Someone's got this bucket and it's got things on it, and it's really hard for the novice user - someone who isn't coming at it from a terminal, from a programming perspective - to visually understand what data is there. Lastly, people still use Dropbox for this sort of problem, but that's also really difficult to work with - now from the command-line side. How do you get files back and forth from there?
The real solution, again pairing with the GitHub theme: GitHub Releases. Releases are kind of a lesser-known thing, but you can make a new release, and releases are stored right next to your repo - there's a Releases section right beside your repository. You make a new release, you upload files to the release, and then, when you're ready, you can update those files as desired. How is this different from committing data to the repo? Well, there are friendlier file size limits: each file you upload to a release is understood to be bigger, so the limit is something like 2 gigabytes per file, and you can upload as many of these files as you want - much more generous than 2 gigabytes in total and 75 megabytes per file. You version by choice: when you upload, you overwrite the file unless you make a new release, and if you're ready to archive that data, you make a new release, upload to that new release, and you still have the old version. Your commit history stays clean: you're not committing, you're uploading - manually or through an API. And the history stays with your project: if I'm interested in what data is available in the nflverse data releases, I can go through the releases, figure it out, and point and click to download. Best of all, it's free - for both public and private repositories.
Of course, R is wonderful, and there's a package for everything. The piggyback package - which I've started working on, and which is originally by Carl Boettiger - covers that whole process with three functions. pb_release_create() creates a new release. pb_upload() uploads a file to a specific release. And pb_download() lets you download all of the files, some of the files, or just one file from a release.
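Put together, the piggyback workflow looks something like this (requires a GitHub token such as `GITHUB_PAT` in your environment; the repo name, tag, and file name here are illustrative):

```r
library(piggyback)

# create a release to hold the data
pb_release_create(repo = "tanho63/project_example", tag = "predictions-2022")

# upload the latest predictions; re-uploading the same file name
# overwrites it in place, so the release always holds the current version
pb_upload("predictions.csv",
          repo = "tanho63/project_example",
          tag  = "predictions-2022")

# anyone can then pull the data back down
pb_download("predictions.csv",
            repo = "tanho63/project_example",
            tag  = "predictions-2022")
```

Calling `pb_download()` without a file name grabs every asset attached to the release, which is handy when a release holds one file per week of the season.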
So, to recap: today I presented three tools for project immortality. I talked about R package infrastructure, to make things easy to install and to communicate what you want people to do with your work; GitHub Actions, to schedule and automate things; and GitHub Releases, as an awesome and wonderful way to store data. With those three tools under your belt, I'd like to leave you with one last question: how can you help your projects help others, now and into the future? Thank you.