Resources

Using paired programming to have fun & sell your solutions (Kris Fabick & Kristin Carr)

Self-sufficient deployment: using pair programming to have fun and sell your solutions

Speaker(s): Kris Fabick; Kristin Carr

Abstract: Are you on a business team lacking resources to get data science projects actually deployed for your non-technical end users? Join us on a case study journey involving Posit Connect, Vetiver, and Streamlit. We will discuss how to successfully deploy "department-level", bilingual data science solutions quickly enough to solve problems before they become obsolete by focusing on how to harness the power of pair programming. Come see how to make your work life simpler, more fulfilling, and more fun!

Materials: https://github.com/kfabick03/positconf2025

posit::conf(2025)

Subscribe to posit::conf updates: https://posit.co/about/subscription-management/

Transcript

This transcript was generated automatically and may contain errors.

All right, great, thanks for coming to our talk today. We're gonna talk about self-sufficient deployment and how you can use paired programming to have fun.

Okay, I know we just had a good lunch. We'll have a slow start. Let's start with our imagination a little bit. All right, so you hear the crunch of the gravel as you pull into a makeshift lot in the foothills of the Appalachian Mountains.

You look out and you see the billboard. There it is, the course map. 25 obstacles over 12 miles. Seven of those obstacles are mystery obstacles. And the course is gonna take place over 1,600 feet of elevation change. This is exactly where Kris and I, two data scientists from Nissan, found ourselves one crisp fall morning. We came ready to take on an intense obstacle course race, but what we actually got was a real-world lesson in paired programming.

How? There's actually a lot more in common than you think. So the 25 obstacles over 12 miles took a lot of preparation. We call this stability in our data science projects. The mystery obstacles, arguably more important, it's that adaptability to those problems or obstacles that come up. The elevation change, that's the backdrop of every race, of every problem. It's the difficulty, and you need to come with the right mindset.

So sometimes that's just getting started, like we did with our wristbands. So at the end of the day, we found that projects at work, as well as obstacle course races outside of work, we worked better together than alone. I'm gonna hand it over to Kris to talk about our Spartan races at work.

The case study: self-sufficient deployment at Nissan

So I'm gonna tell you about a case study that we did at work at Nissan. But the details of the case study aren't important. We're just using it to show you that we actually really did use paired programming in a real project. So like Kristin said, we work at Nissan, and we both work in supply chain. So sometimes we have to bring in parts, we have to bring in shipping containers, and we were approached with a problem of predicting a thing called monthly container fill ratio.

And container fill ratio is just the KPI we use in supply chain, and it's the simple ratio of the space the package takes up divided by the full space in the container.
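As a minimal sketch of the ratio described above: the function name, the units, and the sample numbers are assumptions made for illustration, not the calculation Nissan actually uses.

```python
def container_fill_ratio(package_volume: float, container_volume: float) -> float:
    """Return the fraction of a container's space taken up by packages.

    Both volumes must use the same unit (cubic metres is assumed here).
    """
    if container_volume <= 0:
        raise ValueError("container_volume must be positive")
    if package_volume < 0:
        raise ValueError("package_volume must not be negative")
    return package_volume / container_volume


# Hypothetical numbers: 57 cubic metres of packages in a 76 cubic metre container.
ratio = container_fill_ratio(57.0, 76.0)
print(f"{ratio:.2f}")  # prints 0.75
```

A monthly figure would then be some aggregate of this ratio over a month's containers; the talk does not say which aggregate is used.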

So we needed to deploy this self-sufficiently. We are a business team of data scientists. We are not an IS team. We were not gonna hand this off to an IS team. We're deploying this solution to some other people in supply chain, and we needed to be able to do that self-sufficiently. If you've ever tried to deploy a project self-sufficiently, you may have encountered, there are sort of these two competing project components that you have to figure out.

There's reactivity. You need to be able to respond quickly enough to solve the problem before the problem becomes obsolete. And you also need to have stability. So you kind of need to go slow enough so that when you deploy it, it can last over time.

Let's take a look at this first problem. Solutions need to be reactive. So when we're developing projects, we need to not develop like this. We don't wanna wait until the very end to have a working solution. We wanna make sure that we're actually having little tiny working solutions as we go. I think everybody agrees this is the right way to develop. I'm gonna argue this is also a deployment type.

So each part of this deployment, we wanna have something working from the beginning, right? So we have a skateboard and it can actually work. It may not be your full project. You're gonna add a component to it and then it's gonna be working and you're gonna keep going like that.

In our case study, right, that simple machine learning problem that we were gonna predict, this may have looked like in the first stage we have developed a few models and then we go back to the business and say, hey, do you want the model that's most interpretable or the model that is the most accurate? And then we can go forward and now we know that piece and we can add some handlebars to it and go forward with the project. So you may be starting to see how working with a second programmer is starting to benefit your projects.

The second problem that deployments often have is that solutions need to be stable over time. So I'm currently teaching my daughter how to ride a bike. If you've ever done this, you understand the importance of support to develop that stability.

At the end of the process of teaching her how to ride a bike, the goal is that she can ride by herself, right? I know how to ride a bike, she can ride a bike, either one of us can ride the bike.

Stability requires sufficient support or else everyone is angry crying. She was very happy she could be in my presentation and she did not mind me sharing this unflattering photo of her.

This is true not only in learning to ride a bike, it's also true in data science projects, right? So for our project to be deployed, this little simple CFR, we wanna predict something, we needed to be able to walk away from it and go on vacation or come to Posit Conf and speak to you. And so we needed stability in that deployment so that we could do that.

Balancing reactivity and stability

I'm gonna take a minute and help you see the interaction of these two. You have to hold them in tension, right? You can't go quickly and go slowly at the same time. Across the bottom here, we have different deployment types, the simplest being on the left, the most complex deployment type being on the right. And we are, again, a business team, so we are only in control of sort of these three on the left here. These are self-sufficient deployment types.

So a manual ad hoc analysis, I know you may be arguing that's not really deployment, but remember we're doing these quick iterations, so we're counting that as a deployment type. A local deployment is where you have it on your laptop, maybe the business tells me, hey, can you go get those predictions again? And I'm like, sure, tick, tick, tick, tick, and I email them back the predictions. And then the third level here is a team server deployment. That's where maybe I'm sending predictions up to a database or to a server, and the end users can actually go access that without having to come through me.

So what does reactivity look like on these different deployment types? It's gonna start out really high, right? You're gonna be able to respond really quickly to changes that the business needs. So side note, as it turns out, that CFR calculation, that really simple one, they actually changed it on us once we had it deployed, and so we had to go back and quickly update that, and we were able to do that because we had a self-sufficient deployment.

So reactivity's gonna start really high, it's gonna decrease as you go to the right. But support, stability is gonna do just the opposite, right? It's gonna start out really low, you're probably the only one working on the project, but as you get to more complex deployments, you need to make sure that you have more people involved.

So the goal is you wanna retain maximum reactivity with sufficient support to ensure stability. So back to our case study, right? The simple machine learning prediction. We started here, it was pretty easy to get to this local deployment. I had it on my laptop, I could email back and forth with people, but I needed to move it up to a team server deployment so I could come speak to you. But I couldn't do that by myself. I physically can't have the support that I need to have stability if it's just me.

Thankfully, we had just read an article about pair programming, and it gave us the tools we needed to sell our leaders on using pair programming to solve this problem. I'm gonna hand it back to Kristin now to tell you about that article.

Making the case for pair programming

Great, so we now have a product that's both reactive and stable, but to get there, we need to justify to our leadership two resources on one project. So we started doing some research, and this review article, the one we're gonna quote today, came up, and it really helped us strengthen our argument.

The first thing is 15% more time is gonna save you 15% in defects. So we're not GEICO, but the concept applies. 15% more time and defects are very different. That's not only defects in your code and your project, but that's any downstream process that you have hooked up to the product that you delivered. So that's really valuable.

The second thing is, is it fun? Yeah, after programmers tried pair programming, 90% preferred it. Does it work? Solo developers only passed 75% of test cases. Paired developers passed about 90. So this was a really strong argument, to come in and say, yep, we're gonna have a more reliable solution to the business and create that faith in our products. We're gonna have fun while we do it, and we're gonna get better and more creative solutions.

Pair programming strategies

Okay, so we were convinced, and we're ready to tackle this. So how do we do pair programming? Before we get into strategies for pair programming, let's talk about what it's not, in case you've heard this term before. So the first thing is, pair programming is not backseat driving. The second thing is, it's not a 24-7 grind. This isn't sitting next to each other, looking at the screen for eight hours a day. It's also not keyboard hogging. So it's sharing the ideas, sharing the coding, all of it. And then, lastly, it's not nap time. So this is coming to the problem, engaged with that mindset, ready to take on the difficulty. So now that you know what it's not, I'm gonna hand it back to Kris to talk about some of the strategies that you can use to use pair programming in your projects.

So there are a lot of approaches to pair programming. The most common one is called driver navigator. It is just what it sounds like. One person is driving at the keyboard. The other person is helping navigate, looking for speed bumps, left turns ahead, helping you get through that.

A second approach is called ping pong. This is more common in software engineering, but I think it's useful for data science as well, especially if you have a lot of functions in your code. Basically, one person starts with the idea for the function, the second person will implement the code for that function, and then the first person will go back and refactor that function, and then the next time they switch roles.

I wanna introduce a really important term here called refactoring. This is a process I've sort of always done, but I didn't actually know it had a name for it. It's the process of making the code more readable, maintainable, and efficient. So you've probably done it too, but it's really important to call out here because it's a hugely critical component of pair programming.
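To make the term concrete, here is a small before-and-after sketch of refactoring. The function names, the row layout (month, package volume, container volume), and the sample data are all hypothetical and are not taken from the speakers' project.

```python
from collections import defaultdict


def monthly_ratio_before(rows):
    # First draft: it works, but grouping and averaging are tangled together
    # and the index-based access hides what each column means.
    out = {}
    for r in rows:
        m = r[0]
        if m not in out:
            out[m] = [0.0, 0]
        out[m][0] = out[m][0] + r[1] / r[2]
        out[m][1] = out[m][1] + 1
    for m in out:
        out[m] = out[m][0] / out[m][1]
    return out


def monthly_ratio_after(rows):
    """Average fill ratio per month, with named fields and separate steps."""
    ratios_by_month = defaultdict(list)
    for month, package_volume, container_volume in rows:
        ratios_by_month[month].append(package_volume / container_volume)
    return {
        month: sum(ratios) / len(ratios)
        for month, ratios in ratios_by_month.items()
    }


# Hypothetical sample data: (month, package volume, container volume).
rows = [
    ("2025-01", 30.0, 60.0),
    ("2025-01", 45.0, 60.0),
    ("2025-02", 15.0, 60.0),
]
# Refactoring should not change the result, only how readable the code is.
print(monthly_ratio_before(rows) == monthly_ratio_after(rows))
```

Checking that the old and new versions agree on the same input is one simple way for the pair to confirm a refactor did not change behavior.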

A third approach is called unstructured synchronous. Unstructured here does not mean like just do whatever you feel like, and like there's no plan. It just means that it's not so formal. You don't have to go your turn, my turn, your turn. You can lean into your strengths and your interests a little bit more. Synchronous means you're sharing time and space. So a problem being solved with this approach might look something like this, where you are working on different project components together, you're sharing time and space, and you are doing that refactoring together.

A fourth approach is called unstructured asynchronous. So same meaning of unstructured, asynchronous here just meaning you're not sharing time and space. And I affectionately call this parallel programming because I am a dad and I have to have my dad jokes.

So this approach could look like this, where you have one person doing maybe these two project components, another person doing this project component, and maybe they're doing them at the same time, but they are not doing them together. They're not sharing time and space. And I know you think that I'm totally cheating now because now I'm not even talking about pair programming. You're like, that's just regular programming, except we're not done with the project, right? Because there's this really important step of refactoring. And if you will pause and go back and refactor together, maybe even switching who takes the lead, you're gonna achieve the pairing that you need to make this successful. The goal at the end of whichever approach or approaches you choose is that everybody can ride the bike.

Applying pair programming to the case study

So we're gonna go back to that case study. Remember that simple machine learning model we're trying to predict for Nissan. We used both of these approaches, driver navigator and unstructured asynchronous. We're gonna show you how we did that to solve this problem. If you came to this talk for the technical details, the raw code about how we did this, we have that in, you can access that in a QR code at the end of the presentation. That's not what I'm doing right now.

I'm gonna give you a high level architecture of how we solved this problem to show you the tools we used and how we used pair programming. So we used tidymodels to build the machine learning model in R. tidymodels, if you don't know, is a framework, a package built and maintained by Posit that helps you create machine learning models and then build all the workflows you need to deploy them.

The most important piece of architecture that we had to deploy this self-sufficiently was Posit Connect. If you haven't heard enough about Posit Connect yet at conference, it is one of Posit's commercial offerings. We are so thankful that Nissan has a Posit Connect server that we pay for. And so this allows us to sort of, I think of it as a bulletin board, right? And I can make all my cool data science stuff and I can like pin it to my bulletin board and other people can walk by and like do stuff with it. And so we have a Posit Connect server.

We use a package called Vetiver, also a Posit maintained package that allows us to take that model from a laptop, right? And put it up on Posit Connect. It's like the push pin in the bulletin board. And it also created an API for us so we could interact with that model really nicely. Finally, we used Streamlit, which is a Python package for building web applications. It is not a Posit product, but it works really, really nicely with Posit products. And so we use that Streamlit app to call the API and display the predictions to the end user.
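The last step, an app calling the model API, can be sketched roughly as follows. This is not the speakers' code (theirs is linked from the slides). It uses only the Python standard library and leaves Streamlit out. The server URL, the API key, the feature names, the `/predict` path, and the `Authorization: Key ...` header are all assumptions about how such an endpoint might be set up on a Posit Connect server.

```python
import json
import urllib.request


def build_predict_request(api_url, api_key, records):
    """Build a POST request carrying one JSON record per row to score."""
    body = json.dumps(records).encode("utf-8")
    return urllib.request.Request(
        url=api_url.rstrip("/") + "/predict",  # assumed endpoint path
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Key {api_key}",  # assumed auth scheme
        },
        method="POST",
    )


def fetch_predictions(request, timeout=30):
    """Send the request and return the decoded JSON response unchanged."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical URL, key, and feature names. Nothing is sent over the
# network until fetch_predictions(request) is called.
request = build_predict_request(
    "https://connect.example.com/content/1234",
    "MY_API_KEY",
    [{"month": "2025-10", "planned_containers": 120}],
)
print(request.full_url)
```

Keeping request construction separate from the network call lets the pair review and test the payload without a live server; a UI layer would call `fetch_predictions(request)` and display whatever comes back.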

So how did we do this with pair programming, right? So I took the lead on developing the model while Kristin was developing the UI in Streamlit. And so this is that unstructured asynchronous approach. We came back together and refactored each of those code sets together. And so we both knew what was going on at the end.

We used the driver navigator approach for the Vetiver portion because it was new to us. We had gone to posit::conf(2024) and went to some really good workshops with Vetiver. So we knew what it could do, but we had never actually done it in the wild. So we felt a little more comfortable doing that like together, sharing brainpower together.

So in the end, we were able to solve our problem at Nissan. We were able to successfully deploy this machine learning algorithm self-sufficiently through these tools, and we achieved the reactivity that we needed because we worked together, and we achieved the stability that we needed because of pair programming. So now that we have talked about how we solved our problem with pair programming, Kristin is gonna talk you through a few lessons learned so that you can solve a problem with pair programming.

Lessons learned

Right, so it worked really well for us. How does it work for you? So the biggest thing to take away is pairing is a mindset. It's not a technique. There is no recipe, there's no textbook that's gonna tell you exactly how to do it because everyone's different, and the dynamic between two people is gonna be different every time. Follow the vibe. What gets you traction? Roll with it.

The next thing is if I could do every project with my work bestie, of course I would, but friction will find you no matter what, and that's not bad. We've heard in other talks that that is what helps creativity and makes better solutions. So embrace it, roll with it. I mean, of course, be respectful, but it's a good thing.

And finally, slow down to speed up. So you've heard a lot about refactoring. This is also like when you're working with your teammate and you're talking through problems, take that time so you both fully understand it and move forward with the progress.

So the second is know your strengths and adapt for your weaknesses. So for our particular project, our skills were very isolated, and in fact, Vetiver was outside both of our circles of knowledge. After the project, we were both experts on everything. So now you see where that reactivity, that stability, that paradox of delivering a solid project comes in.

And then finally, the final lesson is create this checkpoint. So you've made this progress. How do you save it? And so refactoring is really important. This is your code-level checkpoint. Make sure that your code is good. Take out all the leaks so that you don't have that catastrophic damage way down the line.

So let's take that code and let's expand it to our team, our pair programming team, right? So let's document this together. We're gonna create standards. For us, this looked like a readme. We created a standard readme, and that's all those quick facts that you need to know about the project, all of the things that you've discussed documented in one place. So if I'm on vacation, the other person can work on it.

And in fact, let's expand that to our team. So now that we have a readme, not just the two of us that worked on the project, but our whole team can support this. So now you see where those iterative approaches in development can happen quickly, and they're also reliable.

All right, so have we convinced you? If not, here's some more resources. So the slides that we went through today are here in our QR code. At the end of those slides is the actual raw code, in case that is what you came here for, to look at the actual app that we built or the model we built behind it using tidymodels and vetiver. We also have links to the article about pair programming. So the review article we talked about earlier, as well as some other articles that we used to help bolster our argument for pair programming. The readme template that you saw briefly, it's there on this site as well. In the technical details, we've kind of called out a few different steps that we've broken down, in case any of these are of interest to you.

So at the end of the day, it might get a little bit messy, but we really encourage you to give pair programming a try for any of your self-sufficient deployment projects. Thanks.

Q&A

We have time for a few questions before our next speaker. So the first question from Anonymous. Why is paired programming only 15% more time? My intuition is it would be 100% more time.

Well, when you're working with someone else, you're able to go quicker, right? Like somebody else is gonna know maybe something that you don't know, and you're able to debug a little bit quicker. So that's an understandable question.

You know, and we didn't do the research. We're just reporting what the research said. But yeah, it's a good question. But I do think, we do definitely experience that you can code quicker. It's kind of like having an AI assistant, right? Makes your code quicker. Having like a real life assistant also makes your code a little quicker.

Did that hold true for you, the 15% savings? It definitely did. I mean, I think this whole project was actually, I don't even think it took us 15% more time because we did that unstructured asynchronous approach. We did have to come back and do that time to refactor, but to be honest, the refactoring doesn't take as long as the initial coding, right? So I actually think we were quicker than that. But yeah, I would say it definitely held true for us.
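One way to reconcile the 15% figure with the 100% intuition from the question is sketched below. It assumes the 15% counts total person-hours across both programmers rather than calendar time. That reading is an assumption; the talk does not say which is meant. The function name and the 40-hour task are hypothetical.

```python
def pairing_cost(solo_hours, overhead=0.15, people=2):
    """Return (total person-hours, elapsed hours) for a paired task.

    Assumes the overhead applies to total person-hours and that the
    work is shared evenly across the people pairing.
    """
    total_person_hours = solo_hours * (1 + overhead)
    elapsed_hours = total_person_hours / people
    return total_person_hours, elapsed_hours


# Hypothetical task that would take one person 40 hours working alone.
total, elapsed = pairing_cost(40.0)
print(f"{total:.1f} person-hours over {elapsed:.1f} elapsed hours")
# prints: 46.0 person-hours over 23.0 elapsed hours
```

Under that assumption, the 100% intuition corresponds to `overhead=1.0`, where the pair finishes no sooner than one person would alone.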