Getting the Most Out of Git - posit::conf(2023)

Transcript

This transcript was generated automatically and may contain errors.

Excellent. Right. Thank you very much for coming along. So, as introduced, my name's Colin Gillespie. I'm one of the co-founders of Jumping Rivers. A few years ago, I got roped into writing a book. The way that happens is publishers say you can write a book and you've got 18 months, and you go, yeah, I'll do that, because in 18 months' time I'll have lots of time. 18 months comes around flipping quickly.

Jumping Rivers, clearly the name tells you everything about us, but just in case it doesn't, we do lots of data science, all the data science stuff you think of. You know, R, Python, Shiny apps. We do lots of machine learning, managed Posit services. Come and have a chat round by the exhibitors. We've got a stall there.

Keep it simple, right? You know, you've got the main branch. You create a feature. You do some commits. You merge it back into the main. It's called the GitHub workflow, right? You know, it's nice and simple. It's not that hard. It just sort of works.

So that's what I'd say to start. So here, a little Downton Abbey guy. Well, he now tries to push to main, and he gets a very sad face. So he's not very happy now. But he can make a little dev branch. So here I'm pretending it's a little Shiny app. So I've made a little drop-down menu. I push to a branch. And then the CI passes. That's good. It can get merged into main.

And when you're doing this in GitHub and GitLab, you can do things like if the CI passes, it automatically gets merged. Right, you can sort of do that sort of stuff. So you don't have to sort of go to GitHub and then start clicking lots of buttons, because if you're clicking lots of buttons, you're probably doing it wrong. But you can set it up so you can actually push from the command line. It creates that branch. It runs a CI. If the CI passes, it automatically merges. And everything just disappears. So you're sort of still at that nice, simple step. But you've got that safety net of not shooting yourself and then digging a big hole for yourself and then throwing it in head first. You're a bit more safe.
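One way to get the "push from the command line and let it merge itself" behaviour is GitLab's push options. The sketch below only builds the `git push` command; it assumes a GitLab remote, and the option names are taken from GitLab's push-option feature, so check them against the GitLab version you run. The function names and the branch name are hypothetical.

```python
import subprocess


def build_push_command(branch, target="main", remote="origin"):
    """Return the git argv that pushes `branch` and asks GitLab to open a
    merge request that merges itself once the pipeline succeeds."""
    return [
        "git", "push",
        "-o", "merge_request.create",
        "-o", f"merge_request.target={target}",
        "-o", "merge_request.merge_when_pipeline_succeeds",
        "-o", "merge_request.remove_source_branch",
        "--set-upstream", remote, branch,
    ]


def push_for_review(branch):
    # Runs the real command; needs a GitLab remote and push access.
    subprocess.run(build_push_command(branch), check=True)
```

Calling `push_for_review("dropdown-menu")` from a feature branch would then create the merge request, and the branch is removed after a successful merge, with no buttons clicked.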

Code owners and scheduled CI

So let's suppose we're getting a bit more important. So we've got a larger team. So you might have two, three, four, five people. So there's more than one developer. And we're wanting to keep track of who owns what.

So I used to work in academia. And I used to work with a professor writing a paper. And we used to use Git. And his idea of using Git, he would phone me and say, right, I'm about to do some commits. And then he would hang up the phone, and he would commit. He was more senior to me, so you just have to agree. He was like, yes, OK. So that was our Git workflow, was a phone call. Not recommending that workflow, in case anyone's not paying attention. Don't do the phoning part.

So keeping track of who owns what. So there's a CODEOWNERS file. This is a really simple idea. And I think lots of your Git repos should have it. So it's a very simple text file. GitHub and GitLab understand this natively. So that means that GitHub actually understands what this file does and means and all that sort of thing. You basically create a little file called .github/CODEOWNERS. Just Google it. And it looks something like this. So star just means everything, and the group notes admin, so that would be a group of users, can do what they want. They can merge, join, and all that sort of stuff. And this person, Amy, has also got those superpowers. Then we've got another directory. So website might be a directory inside my Git repo. And website admins and Tim are only allowed to change that website, but they're not allowed to change other stuff. There's a few more options, but keep it simple.
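To make the shape of the file concrete, here is a minimal sketch that parses a CODEOWNERS-style file and looks up who owns a path. The owner names (`@org/admins`, `@amy`, `@org/website-admins`, `@tim`) are placeholders loosely based on the description above, not the actual slide. The matching is deliberately simplified: it only handles `*` and `/dir/` patterns, whereas the real format supports fuller glob syntax.

```python
EXAMPLE_CODEOWNERS = """\
# Hypothetical owners; placeholder names for illustration.
*          @org/admins @amy
/website/  @org/website-admins @tim
"""


def parse_codeowners(text):
    """Return a list of (pattern, [owners]) in file order."""
    rules = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        pattern, *owners = line.split()
        rules.append((pattern, owners))
    return rules


def owners_for(path, rules):
    """Simplified lookup where the last matching rule wins.
    Only "*" and "/dir/" patterns are handled here."""
    matched = []
    for pattern, owners in rules:
        if pattern == "*" or ("/" + path).startswith(pattern):
            matched = owners
    return matched
```

With these rules, `owners_for("website/index.html", rules)` gives the website owners, and any other path falls back to the `*` line.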

And this is great because now we can start locking down this workflow. And also, if you're working in a team, so we deal a lot with governments and those sorts of organizations where people move around teams. Because this is a file, it's a programmatic file, you can now write queries such as, what Git repos is Tim involved in? So Tim is moving from organization A to over here. So before Tim leaves, we can say, Tim, what repos are you involved in? And we've now got a programmatic way of pulling that information out. And then we can rearrange responsibilities. And that's nice.
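The "what repos is Tim involved in?" query could be sketched like this. It assumes you have already collected the CODEOWNERS text for each repository (for example from local checkouts); the repository names, contents, and the `repos_for_owner` helper are all hypothetical.

```python
def repos_for_owner(owner, codeowners_by_repo):
    """Map each repo that names `owner` to the patterns they own there."""
    found = {}
    for repo, text in codeowners_by_repo.items():
        for line in text.splitlines():
            fields = line.split("#", 1)[0].split()  # ignore comments
            if len(fields) > 1 and owner in fields[1:]:
                found.setdefault(repo, []).append(fields[0])
    return found


# Hypothetical repos and file contents, for illustration only.
REPOS = {
    "shiny-dashboard": "* @org/admins @amy\n/website/ @org/website-admins @tim\n",
    "internal-pkg": "* @org/admins\n",
    "blog": "* @tim\n",
}
```

Running `repos_for_owner("@tim", REPOS)` before Tim moves teams lists the repos, and the paths within them, that need a new owner. Note that it only finds direct mentions; membership of a group such as `@org/website-admins` would need a separate lookup.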

And so essentially, we've got exactly the same workflow, except we've got code owners who are people doing the merging part. Not that hard, quite simple.

Next one, so we're moving along this line. Scheduled CI. So you've got your CI up and running. So scheduled CI, hopefully you can figure this part out. It's a CI that's scheduled. So what we do is we've got a whole bunch of internal R packages, probably like most of the people here. And they're built once a month. It's on the 10th, and I've got them scheduled to run sometime between midnight and 8 o'clock. So when things break, you just get all these random messages through this period of eight hours. And so we've got internal R packages. Any errors are sent to a Slack channel. So some months, everything passes, not a problem. Other months, things start to fail.

So on the 10th of every month, we get all these checks. And then crucially, we assign someone at Jumping Rivers to fix that. We give them actual time in their workload. We say three hours a month, you've got to fix anything that comes up. Some months, not a problem. They don't have anything, so don't know what to do. Go to the pub or something. Other months, it's a bit more painful. But there's that time where things are just kept routine.
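The Slack part of a scheduled check can be quite small. The sketch below assumes a Slack incoming webhook, which accepts a JSON body with a `text` field; the environment variable name `SLACK_WEBHOOK_URL`, the function names, and the message wording are assumptions for illustration, not the setup described in the talk.

```python
import json
import os
import urllib.request


def build_slack_payload(results):
    """`results` maps package name -> True (check passed) or False (failed).
    Returns the JSON-ready dict for a Slack incoming webhook, or None
    when everything passed and there is nothing to report."""
    failed = sorted(pkg for pkg, ok in results.items() if not ok)
    if not failed:
        return None
    lines = [f"Scheduled check failed for {len(failed)} package(s):"]
    lines += [f"- {pkg}" for pkg in failed]
    return {"text": "\n".join(lines)}


def notify(results):
    payload = build_slack_payload(results)
    if payload is None:
        return  # a quiet month: nothing to send
    # SLACK_WEBHOOK_URL is a hypothetical environment variable name.
    request = urllib.request.Request(
        os.environ["SLACK_WEBHOOK_URL"],
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(request)
```

Batching the failures into one message like this is a design choice; sending one message per package as each job finishes would reproduce the trickle of messages over eight hours described above.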

Also, a useful thing is when something breaks, we then have to think about, do we care about this? Or should we just get rid of this stuff? It's very good for doing legacy software. If something breaks, and you actually have to fix a silly thing, you think a lot more carefully of, does anyone actually use it? Do I really care about this? Or could I just archive it? So that's also just a nice way. Especially, I think, in many organizations, it's quite easy to create this stuff. And you look a year back, and you've just got stuff everywhere. It keeps you honest.

Continuous deployment and pedantic CI

Right. Next thing. So we're moving on. Continuous deployment. That tends to go along with continuous integration. So here, what I'm wanting is I've created a new feature. And when I create that new feature and push that to Git, I want something deployed without me having to press buttons. I want magic to happen. Okay? So what I'm meaning here is we've got the setup as before. Code owner, can't push to main. When I push this new drop-down menu to my app, I automatically launch that Shiny app with that new feature.

So I'm pushing to the branch. I've created this branch. And the Shiny app is automatically created for me. And then that means that my wonderful code owner can then go in, can review the code, and can also start clicking on things in the Shiny app. You're taking that little bit of effort away from them. And then if everything passes, the code owner is happy, it passes the CI, then it would automatically deploy onto the production server. So no one is having to press deploy, go, move forward. And then when I've merged that, the dev app would just sort of magically disappear without anyone having to worry about it.

This part is starting to take a bit more infrastructure. This part is starting to take a bit more maintenance. Doing a code owners file is dead easy. It's ten minutes of Googling, five minutes of writing. This stuff you're having to think about, where you're launching a Shiny app, for example, it's going to Connect or it's going to wherever you want. And then you have to try and destroy it when that branch is then merged. And so it does get a little bit more tricky. But it's a really nice workflow. And that's what we do internally. So it works really well.

The last part, and I'm just going to sort of lump this into pedantic CI. And this is just sort of wherever you want to go. Full honesty, I did this one first seven years ago and really annoyed the whole company. But that's a side problem. So a NEWS file: you can have a little CI job that checks, is the NEWS file formatted properly? Has it been updated? So one of our CI jobs is, whenever you update the package, it checks that the package version has been updated. So you've made a change to the package, so the version must be updated. Otherwise install.packages just breaks. So it checks that. And it goes, if you've updated the version number, have you updated the NEWS file? Because it's really easy to forget this stuff. It's a little CI job that just checks this. It checks, is the DESCRIPTION tidy as well? It does other things like, are file names lowercase?
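A version-and-NEWS check of this kind might look like the following sketch. It works on the text of the old and new DESCRIPTION files and the NEWS file, and how a CI job obtains those (for example by reading them from the target branch) is left out. The function names are hypothetical, and the NEWS check is a plain substring match, which is cruder than a real job would want.

```python
import re


def read_version(description_text):
    """Pull the Version field out of an R DESCRIPTION file's text."""
    match = re.search(r"^Version:\s*(\S+)", description_text, re.MULTILINE)
    if match is None:
        raise ValueError("no Version field found")
    return match.group(1)


def version_key(version):
    # R versions are integers separated by "." or "-", e.g. 0.3.1 or 0.3.1-2.
    return tuple(int(part) for part in re.split(r"[.-]", version))


def check_release_hygiene(old_description, new_description, news_text):
    """Return a list of problems; an empty list means the job passes."""
    problems = []
    old, new = read_version(old_description), read_version(new_description)
    if version_key(new) <= version_key(old):
        problems.append(f"Version not bumped: {old} -> {new}")
    elif new not in news_text:
        problems.append(f"NEWS has no entry for {new}")
    return problems
```

A CI job would fail the pipeline whenever the returned list is non-empty and print the problems so the author knows what was forgotten.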

We could do some stuff on commit messages, which I'll talk about later. But the world's your oyster, right? You could start thinking about, actually, we've got 50% unit test coverage. If you make any changes, it must be at least 50% or greater unit test coverage. You're not allowed to degrade that experience. So you can start adding on all these hoops to go through.

This stuff here starts to make it hard to onboard team members. All right? So all this stuff will say, well, you can't just get someone jumping in to start doing commits, because you've got all this stuff to sort of make things proper. But it depends where you are in that line. All right? You know, if you're making software where, if it breaks, lots of people are going to be upset, then, well, that's what you have to do. You don't want mistakes.

So, I mean, it's where you are in that line. Something like commitlint; that's not an R package, by the way. So if you Google it, it's an additional sort of add-on. Essentially, whenever you do a commit, it checks your commit message. It checks that it follows particular standards. Right? So for the first part, for example, we have a bunch of keywords. So it might be chore, fix, feature, CI, docs, you know, those sorts of words. If you do anything other than those words, you'll get a little shifty message. Then it does some stuff on the right-hand side. So does it start with a capital letter? Is it too long? Is it too short? So again, it does that.
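The kind of check described here can be illustrated with a few lines of Python. This is not commitlint itself, just a sketch of the same idea: the keyword list comes from the talk, while the `type: Subject` layout, the length limits, and the function name are assumptions.

```python
import re

ALLOWED_TYPES = {"chore", "fix", "feature", "ci", "docs"}  # keywords from the talk
MIN_SUBJECT, MAX_SUBJECT = 10, 72  # hypothetical length limits


def lint_commit_message(message):
    """Return a list of complaints about the first line of a commit message."""
    header = message.splitlines()[0] if message.strip() else ""
    match = re.match(r"^(\w+): (.+)$", header)
    if match is None:
        return ["Expected the form 'type: Subject'"]
    commit_type, subject = match.groups()
    problems = []
    if commit_type not in ALLOWED_TYPES:
        problems.append(f"Unknown type '{commit_type}'")
    if not subject[0].isupper():
        problems.append("Subject should start with a capital letter")
    if len(subject) < MIN_SUBJECT:
        problems.append("Subject is too short")
    if len(subject) > MAX_SUBJECT:
        problems.append("Subject is too long")
    return problems
```

A message such as `feature: Add a dropdown menu` passes, while a bare `bump` is rejected, which is exactly where the fix-bump-fix-bump habit runs into trouble.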

This can be really annoying, right? This works well when you've got this mythical place, which I always live in, where your first commit just works. Because the first commit you do, it's feature, you know, adding a new dropdown menu. But then you've got another 15 commits after that in order to fix a thing that didn't quite work. And so I tend to then write, fix, bump, fix, bump, and then you're into rebasing and all that sort of stuff. So it can be really annoying, but it can also be useful. I really do mean it can be really annoying.

Summary

Right. So just to summarize, so if you're at the far left, and it's only you, who cares? Right. Basically, you know, use something simple, get stuff done. Right. But then as you sort of make your way along, you know, people are actually using your software, and that could be future you in a year's time. Right. I hate past me. He's a complete, right? Then you want to start thinking about protecting main, adding in CI, thinking about code owners, if there's more than one person, thinking about scheduled CI. Scheduled CI, again, five-minute job to set up. Right. You can stick in a Slack hook, or stick in an email. Not that hard. You know, once you've got that, quite easy. Not so much fun when you get 15 Slack messages telling you all your packages are broken, but side point. Continuous deployment, absolutely wonderful, but that's not a five-minute job. That also, you know, from my experience, takes maintenance. It's not a sort of, you just do it, and then you walk away, because you're deploying it to a server, and there's authentication, and credentials, and stuff. Right. There's stuff there. Worthwhile, but there's stuff. And then after that, you can just, you know, go where you want.

So, I think we can trust Jenny Bryan. So, that's good. I'm safe. Not so sure she's right about setwd, if I'm perfectly honest. You know, so she's got quite strong opinions on set working directory. So, I'll think about that for next year. But, thank you very much for listening, and I hope you found it enjoyable.

We do have time for a little question. So, I do find that it is easiest to maintain good Git ways in life if there are multiple people working on a project. Often the exact opposite, but oh well. I find it hard to be my own, like, boss when it comes to that. Are there any, like, good ways when you're alone to keep the momentum going there?

So, something I didn't touch on is you can start templating this stuff. So, rather than constantly adding little GitHub YAML files with your CI part, your GitLab YAMLs, you can put it in one place and essentially template it all the way through. So, then when you're setting up this stuff, you're essentially just sort of pointing, say, use this stuff over here. So, then your CI, you know, you can have a very simple CI file, which is just use stuff over there. And then essentially the computer is a person keeping you on the straight and narrow because the computer will say no, you're not allowed to push, no matter how much you swear at the computer.

Okay. Fantastic. From the Burma Bihar, do you like Git? I do. But that's because everybody else is using it wrong now. It's just, yeah. But yeah, I do, but it's taken so long to understand what's going on. You know, it's taking a lot of time. Fantastic. Thank you, Colin.
