
Data Science in Production Has Never Been So Easy | Feat: Posit Connect (Adam Wang)
Speaker: Adam Wang (NMDP)

Abstract: Data science is most impactful when it's in production. However, there is often a disconnect between local development and the production system. I'll show how to leverage **Posit Connect** to reduce the friction between development and production, automate and reproduce your data science at scale, and empower decision makers. We'll uncover the production architecture that powers data science at NMDP.
image: thumbnail.jpg
Transcript
This transcript was generated automatically and may contain errors.
Thank you everyone for coming. I'm really excited to follow up on the spotlight video we saw yesterday morning where we got to see Pierce's story and his battle against blood cancer.
And so every year there are tens of thousands of patients like Pierce whose best hope for a cure is often an unrelated donor transplant, from someone who at that point in time is essentially a total stranger. And so it is our job as data scientists to use our data superpowers to help identify these donors and better serve patients like Pierce. But one of the challenges I find data scientists typically run into is: how do we get these insights into production, into the hands of decision makers (in our case, physicians who are selecting donors), and how do we get the insights to them when they need them?
And so over a couple of years we've been on a journey to make this process much easier and reduce the friction between local development and production environments. It all starts around five years ago: we were a young data team doing lots of ad hoc analyses in R. And it was kind of a cautionary tale of "be careful what you wish for," because we did a lot of great analyses. They were so great that stakeholders wanted them refreshed every morning. So I would set an 8 o'clock alarm, run a series of 20 reports, make sure everything was working, and rinse and repeat.
And so that got very, very tedious, and so, as one did back in the day, we were feeling lucky and searched the web for how to automate the running of these R scripts. The first result was Windows Task Scheduler. And it actually made a lot of sense conceptually: it replaced me and my alarm clock with a Windows server managed by our IT department, running a task scheduler. But it had quite a few pain points, one being that we didn't always know when reports failed, which is a big problem to have. And as we put more and more infrastructure and reports on this server, syncing files became quite difficult. Pushing from our laptops to our Git repository was straightforward, but then we had to build out a whole process on the Windows server to sync with Git, and inevitably there were fire drills where people were hotfixing on the server itself, so you had to sync in the other direction as well, which is quite a pain to manage.
And so we were wondering: there's got to be a better way to do this, right? That's when we realized we had reinvented the wheel, only our wheel was a little more squarish than we would have liked, whereas software like Posit Connect just works, plug and play, and you get high-quality, enterprise-grade software. And part of the reason I'm giving this talk is that even in 2025, with AI, the very first search result for automating R scripts is still Windows Task Scheduler, so this is my contribution to fighting back against its monopoly on our automation.
Why the team loves Posit Connect
So why does my team love Posit Connect? The first thing we like is that scheduling and monitoring come out of the box; we don't have to worry about that. So no more waking up early and running reports by hand, and more importantly, we can have proactive alerting. We built an automation status report on top of the Posit Connect API, which every single day summarizes our entire catalog of reports and flags which ones failed, and there's even conditional logic to alert specific people if certain reports fail.
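The conditional alerting at the end of that status report can be sketched in a few lines. This assumes the per-report statuses have already been fetched (for example, via the Posit Connect server API); the report names, owner mapping, and default team below are purely illustrative:

```python
def failed_reports(statuses):
    """Return the names of reports whose last run failed."""
    return [name for name, ok in statuses.items() if not ok]

def alerts(statuses, owners, default="data-science-team"):
    """Map each failed report to the person (or team) to notify.

    Reports without an explicit owner fall back to the default team.
    """
    return {name: owners.get(name, default) for name in failed_reports(statuses)}

# Example: one of three nightly reports failed, and it has a named owner.
statuses = {"donor-readiness": True, "storefront": False, "match-rates": True}
owners = {"storefront": "adam"}
print(alerts(statuses, owners))  # {'storefront': 'adam'}
```

The same pattern scales to "page this channel if any finance report fails" by keying the owner map on groups of reports instead of individual ones.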
Secondly, and I think this is my favorite aspect, we don't have to change how we work. We can develop in our favorite IDE of choice, whether that's RStudio, VS Code, or Positron, as long as we all agree to push back to our Git repository every so often. And we get to use our favorite code-first tools: R and Python, Shiny applications, Jupyter Notebooks, Quarto, just to name a few.
And to really appreciate why this is such a good benefit to have, I think it's insightful to compare alternative approaches to deploying something into production. A very common framework is Docker containers, which package all your code and your application into one container that, in theory, you can ship anywhere and run anywhere. But in practice there's a big learning curve. If you understand everything in this Dockerfile, props to you. But again, we're trying to make the friction as low as possible from local development to insights in production, and with Docker in particular it's hard to inspect inside the container unless you really know how to probe around in there; it can be quite a learning curve.
A second approach is to try some cloud infrastructure, like serverless AWS Lambda functions. Again, they have their time and place, but when you're talking about reducing the friction from local code to production code, Lambda functions have to be structured in a particular way. They all have a handler function that takes two arguments, event and context, and those are runtime-dependent: wherever the function runs in production, those events come in from the runtime, so it's a little hard to mock up your local environment to have exactly those events and context. Of course you can do it, but then you're increasing the differences between your development code locally and your production code deployed to AWS.
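That handler shape looks roughly like this (a minimal sketch; the event fields are whatever the triggering service sends, which is exactly the local-mocking problem described above):

```python
def handler(event, context):
    """AWS Lambda-style entry point.

    In production, `event` and `context` are supplied by the AWS runtime,
    and the event's structure depends on the trigger (S3, API Gateway, ...).
    The `record_id` field here is illustrative.
    """
    record_id = event.get("record_id")
    return {"statusCode": 200, "body": f"scored record {record_id}"}

# Locally you end up fabricating the runtime inputs by hand:
fake_event = {"record_id": 42}
print(handler(fake_event, context=None))
```

That fabrication step is the gap between development and production that the talk is arguing against.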
And a third benefit of Posit Connect is something we didn't ask for, and didn't realize we really enjoyed, until we saw it: it gives you an entire user interface to show your reports. So far we've been thinking entirely in terms of scheduling and automating compute somewhere off my laptop. But it turns out it's really nice when you can automate that compute and, at the same time, show a really high-quality report. In this case, we have a screenshot of our data science storefront, a Quarto document that acts as a kind of meta-report highlighting all the other content we want to feature, and we can build a whole bunch of dashboards and Shiny applications for people to use right within the same tool.
Architecture and the data science lifecycle
So those are some reasons we really enjoy Posit Connect, but how does it fit into our architecture, or more generally the data science lifecycle? This is probably not a surprise, but it almost always starts with an idea or a problem you're trying to solve. For us, that was trying to predict the likelihood that a donor is willing and able to donate; we call that the donor readiness score. And once we have that idea, we do what we do best as data scientists: exploratory data analysis locally, we train a model, we run some validation to make sure the model is performing as we would expect, and we do it all in our favorite IDEs of choice.
But once we get to the step where the code looks like "take our model, take some new data, calculate some new predictions," that's when you want to start thinking about deploying to production, so you don't have to run this four times a day, every single day, for a long time. And that's where Posit Connect fits into our architecture. It's hooked up to our GitLab repository through a Git integration, a one-time setup that gives the Posit Connect server read access to everything we push to Git. So things like our data science storefront, which is a Quarto document in our GitLab repository, are accessible to Posit Connect, along with the usual open-source packages.
And the one change you have to make to your workflow with Posit Connect, unlike Docker or Lambdas, is to create a single manifest.json file. You can do that automatically with one command, depending on whether you're using Python or R (`rsconnect write-manifest` from the rsconnect-python CLI, or `rsconnect::writeManifest()` in R). The manifest is basically an instruction manual for the server: which version of Python or R you're using, which packages, and their versions, so it can set up that environment. And once you have that manifest.json file, you go to the Posit Connect UI, find the file or the folder the manifest is in, click the blue button to deploy your content, and then you have your awesome report that you can link to and share right away.
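For reference, a generated manifest looks roughly like this (abridged and illustrative; the exact fields vary by content type and rsconnect version):

```json
{
  "version": 1,
  "metadata": {
    "appmode": "jupyter-static",
    "entrypoint": "report.ipynb"
  },
  "python": {
    "version": "3.11.4",
    "package_manager": {
      "name": "pip",
      "package_file": "requirements.txt"
    }
  },
  "files": {
    "report.ipynb": { "checksum": "..." }
  }
}
```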
And it's really helpful for maintenance, too, because Posit Connect automatically checks all your content with a manifest every 15 minutes to see if there are updates, so we don't have to implement any extra CI/CD on our end to make sure the changes we push get reflected in the latest version of the report. That's all done under the hood; all that complexity is abstracted away.
Connecting to the database
So the next critical piece of our architecture is our database, because you can work with CSVs locally, and you can work with pins for a time, but eventually you want to connect to your database so you have a live connection in production. So let's zoom in there a little bit. Connecting the database to Posit Connect is, again, a one-time setup, and the included Posit professional drivers support most common databases, around 15 the last time I checked. Once you have this two-way connection, where you can read data from Snowflake and also write data back to Snowflake, you can do things like score millions of records per day and schedule that to run as often as you need.
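The shape of that scheduled scoring job is simple. Here's a minimal sketch with the warehouse I/O replaced by in-memory lists (the real version reads from and writes back to Snowflake; the toy model and field names are hypothetical):

```python
def score(record, model_weight=0.8, model_bias=0.1):
    """Toy stand-in for the donor readiness model."""
    return model_weight * record["feature"] + model_bias

def score_batch(records):
    """Read-score-write loop.

    In production the records come from a warehouse query and the scores
    are written back to a warehouse table instead of returned in memory.
    """
    return [{"id": r["id"], "readiness": score(r)} for r in records]

# New data arriving since the last scheduled run:
new_data = [{"id": 1, "feature": 0.5}, {"id": 2, "feature": 0.9}]
print(score_batch(new_data))
```

Posit Connect's scheduler then runs this script as often as needed, with no handler signature or container packaging required.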
And one interesting pattern we'd recommend for your database of choice is to leverage the database's compute for the occasional job that doesn't fit into memory on your Posit Connect server. Databases are a great place to do that, because they deal with big data all the time and your data already lives there, so your security department will be really happy that there's one less transfer of data moving around. For us, that's Snowflake and its Snowpark functionality, so we can write Python user-defined functions that run natively in the warehouse. When we occasionally have a huge compute job, we just scale up the warehouse, use some Python, and there you go. For databases that don't support Python or Python-like languages, you can get more creative by converting your model objects into SQL; if you're interested in that, look into Orbital. There are creative ways to leverage your warehouse compute when your server isn't up to the job.
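To make the "convert your model into SQL" idea concrete, here is a toy translation of a fitted linear model into a SQL expression. This illustrates the general technique only, not Orbital's actual API; the table and column names are made up:

```python
def linear_model_to_sql(coefs, intercept, table="donors"):
    """Render a fitted linear model as a SQL SELECT so the warehouse
    can score rows itself, with no Python runtime involved."""
    terms = " + ".join(f"{w} * {col}" for col, w in coefs.items())
    return f"SELECT id, {terms} + {intercept} AS score FROM {table}"

# Hypothetical fitted coefficients for two features:
sql = linear_model_to_sql({"age": -0.02, "contact_recency": 0.5}, 0.3)
print(sql)
# SELECT id, -0.02 * age + 0.5 * contact_recency + 0.3 AS score FROM donors
```

Real tools handle much richer models (trees, pipelines, preprocessing), but the payoff is the same: the scoring runs where the data lives.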
Internal packages and connecting data holistically
Okay, so I think those are the key components of production workflows that aren't too different from your local development, and I think that's a great place to start. But once you have ten or more pieces of content on Posit Connect, you might start noticing that you're copy-pasting a lot of code; for example, connecting to your database is the same 20 lines of boilerplate every time. That's a good opportunity to think about internal packages. We personally use Posit Package Manager for two main reasons. One, it automatically builds new package versions whenever we push to Git, so again, we don't have to do any CI/CD to update package versions; that complexity is all abstracted away from us. And from a user perspective, it's also super nice to install internal, completely custom-made packages with the same commands you would use for any other package.
Okay. And there's one more piece of the diagram that's missing, and it's going to depend a little on your specific use cases and organization. When we talk about data and machine learning, our models are only as good as the data that comes in, and if no one has access to our models, they're not useful at all. So how do you connect data in a holistic way? The general advice I would give is to keep everything within one central database as much as possible.
So, for example, we have a lot of input data, such as from our donor registry, and a lot of other data sources, but we make sure that everything eventually flows into Snowflake, our central database, somehow. We also work closely with our engineering and integration teams to make sure there's a pathway for the data we send back to Snowflake to connect to downstream applications. For us, our physicians use an application called MatchSource that we built in-house, which is what they use to select donors for their patients. So we have a process where all the data we produce ends up in Snowflake and has a clear path into that application.
And so, with that, we're able to put all the pieces together into this architecture diagram that we use for data science, one that really focuses on the development experience of the data scientist, so we can take all the awesome insights we develop locally and push them into production through a really clean process, without compromising on features. We're able to generate useful insights that people can use, running not on our laptops but in an automated fashion. And with that, I'm happy to take a couple of questions. You can read more about the slides and about what we do at these links, and I'm always happy to connect and chat more. So, thank you.
Q&A
All right. Thank you, Adam. We have time for a few questions. So, the first question for you: could you elaborate more on Git-backed pushing to Posit Connect? Does the report or app update on commit? Yes, the short answer is yes. It updates on commit to the branch within 15 minutes. You can think of Posit Connect as running a check every 15 minutes to see whether there have been changes to any piece of content on the server. If it picks up any new changes, it folds those in and automatically deploys a new version without you having to go in and click buttons yourself. If you want it to be immediate, you can go in and click a button to do it faster, but otherwise we generally have a workflow of push to Git, review the changes, merge, and once they're merged into the main branch, we know the content will update in Posit Connect.
Okay. Another question: what features are missing from Posit Connect? So, I have one nitpicky one that comes to mind. When you schedule reports on Posit Connect, the common cases are covered: every day, every month, every 15th of the month. But sometimes we get odd requests, like "can you run this on the 19th business day of the month," or at a handful of times per day that don't fall on a nice periodic interval. There are ways to work around that with a parameterized report, but it would be nice if you could just point and click and say, this is the schedule I want. But that's a nitpick. Anything broader, I'll have to think about; connect with me and we'll talk more. Okay. That's all the questions. Thank you so much. Adam, another round of applause, please.
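The workaround mentioned above amounts to scheduling the report daily and letting the report itself decide whether today matches the odd schedule. A minimal sketch of an "Nth business day of the month" check (holidays ignored for simplicity):

```python
import datetime

def is_nth_business_day(day: datetime.date, n: int) -> bool:
    """True if `day` is the n-th weekday (Mon-Fri) of its month.

    Holidays are ignored; a real version would subtract a holiday calendar.
    """
    count = sum(
        1
        for d in range(1, day.day + 1)
        if datetime.date(day.year, day.month, d).weekday() < 5
    )
    return day.weekday() < 5 and count == n

# A daily scheduled report can simply exit early unless today qualifies:
print(is_nth_business_day(datetime.date(2025, 9, 25), 19))  # True
```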
