
Oops! I accidentally made a production dashboard (Jonathan Keane, Posit) | posit::conf(2025)
Speaker(s): Jonathan Keane

Abstract: As data scientists we love making decisions with data. But we don't always do this with our own work. Ever wonder if that dashboard you are about to spend hours updating from line charts to 3D pie charts is actually being used? With usage metrics it's easy for you to see, analyze, and show just how much traction your app is getting. Not only does this let you decide where to prioritize your efforts, it can help you demonstrate the impact you have on your business. This talk will explore how to be data-driven with your own work and the tools we have for maintaining our work when it becomes important.

Subscribe to posit::conf updates: https://posit.co/about/subscription-management/
Transcript
This transcript was generated automatically and may contain errors.
Thank you, everyone. And I'm going to talk to you about a time that I accidentally made a production dashboard. I think this is something that will be familiar to many people in this room.
So a bit of a story. I was working at a job, and I was working on a dashboard. We were looking at benchmarking reports, and it ran every night. I thought it was pretty cool at first. We shared it with the engineering team. We started getting some questions. People seemed interested in it. And I thought that was that. And then one morning, I was in a meeting, and the CEO of the company said, oh, yeah, that thing, that thing's great. I look at it every morning before I even had my coffee. And that hit me like a ton of bricks. This was not a proof of concept anymore. This was something that I needed to maintain. I needed to make sure it had accurate numbers, or else I was going to get a call before I had my coffee, and they were one time zone ahead of me, so that would be even worse.
This is very common in data science. Projects transition very quickly from a proof of concept (is this even possible? let's try it out) to something that is suddenly actually production, something that people depend on day to day, sometimes in ways that we don't realize, and that's where things get dangerous. A lot of what we've talked about in this session, about how to productionize and what it means to be production, matters most at exactly that point.
Living in the slush
So I like to think about this as living in the slush. I live in Chicago, and one of the things that I had to learn when I moved to Chicago was that walking in slush can be really dangerous. And so if you're walking in the rain, it's pretty easy. You don't have to take many precautions. You can just wear normal shoes, maybe have an umbrella, and you're all good. This is kind of the POC phase. Things are flowing. Things are great. You don't have to worry about anything. When you're walking on slush, things are usually pretty fine as well. There's no big deal. You can walk. You're not worried about anything. But the thing that's dangerous with slush is that sometimes underneath that slush, there's ice. And you can slip on ice if you're not careful or you're not wearing the right shoes. And data science is exactly like this. A lot of data science operates in this slushy area where you're going from flowing with a proof of concept to something that you actually need to harden like ice or else you're going to slip and fall and you need to do something about that. So how do you tell that you're walking on ice?
Usage monitoring
The first and most important thing is usage monitoring. This is a lesson I have personally learned over and over in my career, and I think everyone else here does the same thing. We as data scientists bring data-driven decision making to our organizations, but we don't always take that same skill and that same approach with our own work. And this is what bit me with my production dashboard. I didn't have any usage monitoring. I had no clue who was looking at it. If we do usage monitoring, figuring out who is looking at our reports, our dashboards, our apps, when they're looking, and how widespread that usage is, that tells us whether this is just a cool proof of concept that we can let be, or something that is production and that we actually need to maintain.
Usage monitoring is not new. It's used throughout the software world, and not just to see who is looking at something, but also at a detailed feature level: who's clicking on this button, how long are they staying on this page? This might sound scary, but it's easier than you would think. If you've got a Shiny app, you can write events to local storage when people click on things, and there are packages like shiny.telemetry that make it super easy to collect this type of usage data. If you've got Quarto documents or other static things hosted on some CDN, AWS, or Posit Connect, you can use access logs to see who's viewing them. That's a bit lower fidelity, but you can get really good data out of something pretty simple.
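As a sketch of the Shiny route mentioned here, using the shiny.telemetry package (the app name and log file path below are placeholders, and this follows the package's documented pattern rather than anything specific from the talk):

```r
library(shiny)
library(shiny.telemetry)  # community package for Shiny usage analytics

# Events (session starts, input changes, clicks) get written to a
# local log file that you can analyze later.
telemetry <- Telemetry$new(
  app_name = "benchmark-dashboard",  # hypothetical app name
  data_storage = DataStorageLogFile$new(log_file_path = "telemetry.txt")
)

ui <- fluidPage(
  use_telemetry(),  # injects the JavaScript that reports browser events
  actionButton("refresh", "Refresh data")
)

server <- function(input, output, session) {
  telemetry$start_session()  # logs the session and tracks inputs
}

# shinyApp(ui, server)
```

The same storage backend can point at a database instead of a flat file if several apps should report to one place.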
If your reports are public, or your organization allows it, you can use things like Google Analytics, and there are other analytics-style tools that are basically JavaScript that logs when someone has actually loaded a page. Quarto even has a really easy way of connecting this: you just add your Google Analytics identifier, and Quarto inserts the JavaScript for you. You don't even have to think about it, and then you know exactly who's looking at your dashboards. And if you have Posit Connect, this is actually baked in. Earlier today, Toph Allen shared the Connect Gallery, and one of the items in the Connect Gallery is this Usage Dashboard. It lets you see on Connect how popular the different items you've deployed are, who's looking at things, and you can even drill into the detail and see the time of day that people are looking. If I had had this at the time, I would have seen: oh, the CEO is logging in every single morning. I need to actually take this seriously.
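For a Quarto website, that Google Analytics hookup is a single option in the project file. A minimal sketch (the measurement ID below is a placeholder):

```yaml
# _quarto.yml
website:
  title: "My Dashboard"
  # Quarto injects the Google Analytics JavaScript for you
  google-analytics: "G-XXXXXXXXXX"
```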
And so, in summary, once you've figured out that you've got a thing that is popular, you land on the happy side of this. It's great. You're celebrating. You've got a thing that's popular. You're providing value. You're excited. But there's also a lot of chaos going on. You're going to start getting feature requests. You've got to deal with a critical bug fix. Someone wants some extra data in there. So there's a real mixture of emotions: you've done something successful, but now you've got all this extra work. On the other hand, if you have something that was a cool proof of concept, you said, hey, look, we can do this, this is the answer to this question, you don't have those feature requests. You don't have to put in the effort to make it production grade. And that's great, because you can sit and relax. The easiest work is work you don't have to do at all. Of course, that's also a little bit sad, because people aren't really using it. But the nice thing is you can take that energy and put it into the things that are actually important to maintain.
Modularity and testing
So, what do we do now that we know we've got something that we have to maintain? We really need to invest time in it. I'm going to share a couple of techniques from the realm of software engineering; bringing them into data science can really help improve the way we maintain our apps. The first two are interrelated: modularity and testing. And much like Tom's talk, a lot of this will not be new to anybody, but it is important to think about it, and especially to do these things only when you actually have to.
So, what is modularity? It is taking a long script, a very procedural set of code with a lot of nested if-elses, and breaking it down into logical chunks. When you're prototyping something, it's really easy to just write out the code, and that's one of the superpowers: you can just write and see if it works, without worrying about abstraction layers or anything. But once you move into that production realm, you do want to factor things out: pull your data in one chunk, clean your data in another chunk, fit a model, things like that. Frankly, even those chunks are probably a little too broad, and you should have chunks smaller than that, but they illustrate the point. Modularizing your code makes it much easier to figure out what's going wrong if something's going wrong. It also helps you focus in. If what you're working on is your data cleaning code, you don't have to think about how you're getting data from the database or where it's going in the model. You can say, okay, I just need to transform these columns, I need to fill in NAs in this way, and focus on that. Modularity helps you do that at a code level, but also at the level of what you're thinking about while you're working on that code.
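As a minimal sketch of what that factoring might look like (the data and function names here are hypothetical, not from the talk):

```r
# Each pipeline stage becomes a small function with one job.

load_sales <- function(path) {
  # Pull the data in one chunk
  read.csv(path, stringsAsFactors = FALSE)
}

clean_sales <- function(df) {
  # Clean in another chunk: fill missing amounts, drop rows without an id
  df$amount[is.na(df$amount)] <- 0
  df[!is.na(df$id), , drop = FALSE]
}

summarise_sales <- function(df) {
  # Aggregate in a third chunk
  aggregate(amount ~ region, data = df, FUN = sum)
}

# The top-level script then reads as a sequence of named steps:
# sales   <- load_sales("sales.csv")
# cleaned <- clean_sales(sales)
# report  <- summarise_sales(cleaned)
```

Each stage can now be reasoned about, and tested, without thinking about the others.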
And the best part of decomposing things is that it makes testing a lot easier. You could test the thing on the left: take a big set of inputs, run them through your whole script, and check the outputs at the end, but that gets really messy really quickly. At their core, tests are: you've got some known inputs, you've got your function, you put the known inputs into the function, and you compare against known outputs at the end. If you're testing smaller chunks of code, the number of inputs and input options can be relatively large, but the tests run quickly and stay constrained, because the domain is just your data cleaning step or just your model fitting step. Testing at the modular level is much easier than testing one giant, long script.
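The known-inputs-to-known-outputs pattern could be written with testthat, which the R ecosystem commonly uses for this (the cleaning function and its behavior here are hypothetical examples, not code from the talk):

```r
library(testthat)

# A small, self-contained stage to test: fill missing amounts with 0,
# drop rows that have no id.
clean_sales <- function(df) {
  df$amount[is.na(df$amount)] <- 0
  df[!is.na(df$id), , drop = FALSE]
}

test_that("clean_sales fills NAs and drops rows without an id", {
  input <- data.frame(id = c(1, NA, 2), amount = c(NA, 3, 7))
  result <- clean_sales(input)
  expect_equal(nrow(result), 2)          # the id-less row is gone
  expect_equal(result$amount, c(0, 7))   # the NA amount became 0
})
```

Because the domain is just the cleaning step, the inputs stay tiny and the test runs in milliseconds.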
The other nice thing is that I have found that using your tests as a diagnostic for whether you factored your code correctly is really powerful. If something is difficult to test, sometimes that's because the interface, the handoff from one function to the next, isn't quite clean. You've got a couple of things interplaying, so you've got to test both at the same time. For me, that's a sign that maybe I need to move where things happen between those two functions. It's a virtuous cycle of testing and modularity: something you can cleanly test is generally cleanly factored.
Continuous integration
Okay. Now that you've got your tests, you can run them locally. That's great. That will confirm you don't have a bug. And that's what this person is doing: running a bunch of tests all the time. They make a change, they run a test. But if your code is going to be shipped off somewhere, hosted on a service and run somewhere else, you can't just test on your local laptop, because you might have installed specific dependencies without realizing they matter to how the code actually runs in production. So one of the first things you need when testing is to have your tests run on some other computer. You could call a colleague and say, hey, can you run these tests for me? That will work a few times. But if you want that every single time you make a change, you're going to turn a friend into an enemy very, very quickly. Ultimately, you want a way to run those tests on some computer that's somewhere else, not your computer. And that's exactly what continuous integration is: a container, effectively a laptop in the cloud somewhere, that runs your tests on another computer so you don't have to bug your colleague to do it for you.
And that is effectively what continuous integration is: running your tests every time you commit to your GitHub repository or other Git repository, spinning up a container that's totally isolated, and running them there, every time. And that is fantastic. Continuous integration these days is typically connected to your source control system. If you're using GitHub, GitHub Actions is very integrated; it will spin up those cloud laptops for you to run your tests. If you're using GitLab, GitLab runners are the same thing; they're very similar, just with a different configuration syntax. In a larger organization, you might see other things like Travis CI or CircleCI or Jenkins, and other teams might already have these set up. You can hook into those too. They're not quite as integrated with your version control system, but they can do basically the same thing as GitHub Actions or GitLab runners.
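For an R project on GitHub, the workflow might be sketched like this (the file name and test directory are assumptions; the setup steps come from the r-lib/actions collection):

```yaml
# .github/workflows/test.yml (hypothetical name)
on: push

jobs:
  test:
    runs-on: ubuntu-latest  # the "laptop in the cloud"
    steps:
      - uses: actions/checkout@v4
      - uses: r-lib/actions/setup-r@v2
      - uses: r-lib/actions/setup-r-dependencies@v2
      - name: Run tests
        run: Rscript -e 'testthat::test_dir("tests")'
```

Every push then gets a green check or a red X, with no colleague involved.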
Keeping dependencies up to date
Okay. So, now that we've got our tests going, we have a dashboard that's running, with a bunch of dependencies like ggplot2 and dplyr and plumber, et cetera. What do we do as time marches on? One of the things I have always found arduous on a long-lived project is staying up to date with my dependencies. One approach you could take is to constantly upgrade the dependencies bit by bit: every day, check whether there's something new, and bump it if there is. This is really fantastic, because when you take small steps with your dependencies, frequently there's nothing you have to change in your code. There are no breaking changes, and you can just move on with your life. And if there are breaking changes, going from version 1 to version 2 usually means only one or two things changed; it's quick and easy to make those changes and keep up with the upgrade. But if you're jumping from version 1 to version 8, that can be a huge pain. A bunch of things changed in your dependency, your code interacts with all of them, and you have to parse through the dependency's changelog.
But the nice thing is you don't have to sit there every day wondering, do I need to update my dependencies today? With modern CI, like GitHub Actions, you can use what's called Dependabot. It scans your dependencies and sends you a PR every time there's an update or a critical bug fix, and you can just accept it. If you've got tests that check whether your code runs, the tests run on the Dependabot PR, and if everything is green, you can just merge it. That's basically a bot doing the constant updating for you, and you don't have to think about it. This is just one of the many, many fabulous things you can do with modern CI. It means you're not worrying about dependency upgrades except when you actually have to dig in and figure out what changed.
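A minimal Dependabot configuration might look like the sketch below. One caveat worth noting: Dependabot covers ecosystems like GitHub Actions, npm, and pip, so for an R project this example keeps your workflow actions current rather than your CRAN packages:

```yaml
# .github/dependabot.yml
version: 2
updates:
  - package-ecosystem: "github-actions"  # keep workflow actions up to date
    directory: "/"
    schedule:
      interval: "weekly"  # one PR batch per week instead of every day
```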
And there are many other things that you can do once you realize you've got a production app. Many of the topics that we talked about here, and Joe will talk about what to do when you've got REST APIs that are slowing down because you've got a lot of people that are poking at them suddenly.
And so ultimately, what I want you to walk away with is going from "oops, I made a production dashboard" to "yay, I made a production dashboard," and actually getting to be happy about it, because you have tools to maintain it, and you're not spending time maintaining dashboards that aren't actually production, that aren't actually popular. With these tools, you can monitor the dashboards you're building and know whether they're popular, and for the ones that are, you can keep them running stably with modularization and testing, and keep them current with things like dependency updates. But again, most importantly, you only need to do that if it truly is production. If you've got a proof of concept that was cool and answered some question, but nobody is looking at it, you can go home. You're good to go. Thank you.
Q&A
Thank you. I think we have time for a couple of questions. First one: do you have any advice for someone who's trying to start modularizing their data code, but is trying to balance speed with best practices? Oh, that's complicated. I actually kind of want to know: is that speed of your development or speed of the code itself? But I know you can't clarify. Yeah, that's how I would interpret it. Yeah, yeah, yeah. I would say start small. And, it sounds a little religious in some ways, but doing your modularization and then writing your tests, and using your tests as confirmation that you've modularized well, is super powerful. Once you fall into that cycle, I have found it helps a lot.
Awesome. Do you have a favorite tracker for web analytics, and why? Favorite tracker for web analytics? Yeah, for web analytics. Or do you tend to use open source tools like Shiny Logger and stuff? Yeah, it really depends on the context you're in and where you're deploying your app to. If it's something that's public on the internet, I think Google Analytics is really fantastic. It's basically free and it works with a bunch of different things. If you've got a dashboard that's internal to an organization, that can be more difficult and might be against security practices, so you'd use some of the tools where you're building it out yourself, or, like I said, if you have Posit Connect, usage tracking comes built in. But it really depends on whether it's public or internal to an organization. Right. And I just wanted to say thank you, because honestly, I use Posit Connect every day and I do use the built-in features for that. But I didn't know that Shiny package and that Quarto package for logging existed. So thanks a lot for letting me know about that. Awesome. Yeah. Thank you. Thank you.
