Resources

Election Night Reporting Using R & Quarto (Andrew Heiss & Gabe Osterhout) | posit::conf(2025)

Speakers: Gabe Osterhout; Andrew Heiss

Abstract: Election night reporting (ENR) is often clunky, outdated, and overpriced. The Idaho Secretary of State's office leveraged R and Quarto to create a better ENR product for the end user while driving down costs using the open-source software we all know and love. With help from Dr. Andrew Heiss, R was used in every step of the process, from the {dbplyr} backend to visualizing the results with {reactable} tables and {leaflet} maps, combining the output into a visually appealing Quarto website. Quarto was the ideal solution due to its scalability, quick deployment, responsive design, and easy navigation. In addition, Dr. Heiss will discuss the advantages of using a {targets} pipeline and creating programmatic code chunks in Quarto.

GitHub repo: https://github.com/andrewheiss/election-desk

Transcript

This transcript was generated automatically and may contain errors.

Okay, hello everybody. My name is Gabe Osterhout. I'm the data viz guy for the Idaho Secretary of State. So as my title suggests, I work for the Secretary of State himself, and by his title he's the chief election official for the state of Idaho. Our office provides many services to the state, but one of them happens on election night: when you're getting your results from CNN, the New York Times, or the state's own website, we're the ones who aggregate that data. Usually when you vote, at least in a state like Idaho, the county administers the election. The precinct or polling place reports the results to the county, and the county reports them to us. So on election night, we're the ones responsible for aggregating that information and then publishing it so the public can see it and the media can grab that data as well.

So we tried something new for the 2024 elections where we used R & Quarto for that process.

Context: the 2020 shadow and taking office

Remember the 2020 election? It was not a fun time. We were not in office during the 2020 election, thankfully, but my boss took office in 2023, and my position actually didn't exist yet. There was no data visualization person, no data analyst in our office, believe it or not. But my boss, thankfully, is a data nerd. I don't know how many of you work for data nerds, but it's very nice to have allies in powerful places. This was the world we took office in, right? There were still a lot of consequences of the 2020 election, a lot of conspiracy theories and misinformation. Valid or not, this was the world we were working in.

So everything election related in my world is extremely scrutinized. Going into the 2024 election, we had these ghosts of 2020 in our heads, even though we didn't work that election. At the beginning of 2024, we launched this great new voteidaho.gov website. It recently won an award, it was highly acclaimed at the time, and voters really liked it. It became a one-stop shop for all kinds of voter information: check your polling place, view your sample ballot. And we thought, hey, this is 2024, we're entering the cycle, voters have all these great resources. Why not also make it a one-stop shop for election results on election night?

The challenge of improving election results

So my role in all of this, obviously: now I'm tasked with improving our election results for 2024. Can't be that difficult. My boss wanted to remind me that we have a lot going on in the data visualization world. As we all know, there are competing priorities and lots of questions that are super interesting to me. Now that we've got a bunch of dashboards, visuals, and insight into past elections, since my position never existed before, there's a ton of outside interest: media questions, questions in the legislature, what about this, what about that? And unfortunately, one of my weaknesses is thinking, well, that sounds really interesting, I want to pursue that question too.

But we had a real time constraint, right? We're not operating in a vacuum where we have time to do everything. My boss was worried about me getting distracted, so he put this priority list on my desk. Election results were the second most important thing I could be working on at all times. They were also the third most important thing, and the fourth, fifth, sixth, and seventh. So that was helpful for me. I show you this as a reminder that this was an important thing. We couldn't screw it up, obviously, given what happened in 2020.

So as I was thinking about ways to improve our election results, we looked at what other states were doing. Election night reporting, I'll call it ENR, is something every state does. It's always been secure and it's always been accurate, as a lot of us know, but the data visualization part isn't usually something other states focus on. They want to make sure they get the numbers out there, and they might have a JSON file, or in our state an XML file. The media grabs that, and then the media does the really cool visualization part. I've gone to the New York Times for my election results, I've gone to CNN, and I'm sure a lot of you have as well, because they have really cool results and interactive maps. A lot of states don't.

Idaho had never used maps for official election results. Being an election junkie myself, in this job and previously, I had looked at a lot of historical results, visualized them, and created really fun interactive maps, but never for the actual official results. We wanted to incorporate that, and obviously in R we have access to all kinds of cool mapping packages and things that create beautiful tables. So I wanted to leverage that expertise, but also have it update in real time and be a resource for people on election night.

So the only problem with the New York Times. I already mentioned I'm a data team of one, right? My position never existed before, and I'm about to double my number by bringing in Dr. Andrew Heiss. The only problem with the New York Times is that they have more than one person. I'm sure that's a huge surprise to hear, but this is 2020, this is just the Idaho election results page, and if you scroll to the bottom of that page, they credit over 40 people as working on just Idaho's election results. I didn't know it at the time, but 2024 is obviously what we'd end up competing against: they had 63 people listed as working on that. That tells me a couple of things, but most importantly, it shows the New York Times was also taking 2024 more seriously, realizing there was a lot more attention and scrutiny. And that's just for Idaho, right? Not a lot of people go to the New York Times for Idaho's election results, especially in the presidential race. It's usually a foregone conclusion: Idaho hasn't voted for a Democratic presidential candidate since 1964.

Constraints: accuracy, timeline, and scale

Okay, in terms of constraints: I've already mentioned it needs to be perfectly accurate, right? Otherwise, it's January 6th at my house. I already mentioned it needs to be visually appealing, because we want people to go to our site, go to the source, rather than just go to other organizations, and we want election results that Idahoans can be proud of and confident in. But there's this other constraint, which is the timeline. Our new VoteIdaho.gov website launched in February of 2024. The election's not till November, so I've got quite a few months to work on this, right?

Now, the general election, for our non-U.S. friends, is the one you always see in the news, where the president is on the ballot. The only issue is, and I keep saying "the only issue is" when there are lots of issues here, as you can tell: in Idaho, we have a primary election on May 21st. In a primary election, Democrats face off against Democrats, Republicans face off against Republicans, and the winners within each party go on the ballot for November. The primary election in Idaho is more important than the November election, because Republicans in Republican districts are likely to win in November, and Democrats in Democratic districts are likely to win in November. So that's where all the attention is, and that's where all the money is.

Another challenge: the New York Times shows the nationally relevant races, like president and Congress. We have to show those plus the legislative races, the ballot measures, and all the local stuff, because as the official place to go for election results, we can't just show the ones that seem interesting. We have to show them all. So we had over 400 races to show in May. And I mentioned the accuracy part, right? You can't promise 99.9 percent, the way you might with uptime in the data science world. If 399 races are right and one is wrong, the person in that race is watching that page, they're going to notice, and they're going to take a screenshot.

Our results also have to update in real time. Unlike most of the data analysis a lot of us in the room have done, where you have a static data set, maybe super messy, maybe clean, but static, or at most data that updates daily, this data updates every few minutes. As the results come in, we can't just wait until the end of the night to show them. That's also how conspiracy theories start: where are all the results?

So we needed it to update in real time. That was the real challenge at this point, after figuring out all these other things: how do we get it to update in real time? And so I brought on Dr. Andrew Heiss, a professor at Georgia State here in Atlanta. A lot of you probably know him from his great blog posts and YouTube videos. He's just a really cool dude, so come talk to him. We met 40 days before the primary election.

Given these timeline constraints and all the other constraints I've mentioned, we didn't necessarily have time to learn new tools to incorporate into this workflow. We're really comfortable working in R, and if you've used Quarto, you know it's a great product for publishing your visuals. We wanted to stay with something familiar. So I'm proud to report we were able to do this whole project without ever leaving RStudio. We didn't need a separate process for SQL, we didn't need a separate process for publishing the website, and all the data wrangling and visualization in the middle we were able to do within the R ecosystem.

Preview of the final product

Okay, so I've listed a bunch of challenges. If you're not freaked out yet: before I turn it over to Andrew to show some of the cool stuff he figured out for the data pipeline, I wanted to show you a quick preview of the final product, so you're not sitting there wondering, man, did these guys actually pull this off?

This was the November 2024 election. This was our most basic page, just the presidential race, but there were a few things that were really exciting about using Quarto for this project. Having the different district types at the top for navigation was a huge win for our state. A lot of other states don't have that kind of ease of navigation; you usually have all the races on one page and have to scroll through, which is also why people seek out the media, who have thought these things through a little better. There's also the status bar at the top showing whether results were official or unofficial. I'm not sure if you can see it, but there's a callout block in green that says 44 of 44 counties have reported. That updated throughout the night, so every few minutes as the results came in, people could use that drop-down to figure out which counties were still coming in, and once the bar was green, you knew there weren't going to be any more results.
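As a rough sketch of how a status callout like that can be generated from live data (the counts and class choice here are illustrative, not from the actual repo), an R chunk with `output: asis` in the Quarto page can emit the callout markup itself:

```r
# Inside a Quarto chunk with `#| output: asis` (hypothetical summary values)
n_reported <- 44
n_total    <- 44

# green "tip" callout once everything is in, yellow "warning" otherwise
kind <- if (n_reported == n_total) "callout-tip" else "callout-warning"

cat(sprintf("::: {.%s}\n%d of %d counties reported\n:::\n",
            kind, n_reported, n_total))
```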

Real quick: you have the race breakdown here. It's a {reactable} table with a nice spark bar, and for every race you have the raw results summed, like you can see there, and then a tab that has a map. For every one of the 400-plus races, there was a map. We love leaflet maps, they're interactive, and we know all the exciting things about them, but it was really cool to have them updating throughout the night. Another thing that's great about Quarto, as you know, is the table of contents. We were able to throw one on the left, so being able to jump between districts without scrolling up and down was huge, especially with as many races as we had.
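A minimal sketch of a results table with an in-cell bar, in the spirit of what the site showed; the data and styling are made up, and this uses {reactable}'s documented pattern of returning htmltools tags from a cell renderer:

```r
library(reactable)
library(htmltools)

# Hypothetical race results
results <- data.frame(
  candidate = c("A. Smith", "B. Jones", "C. Lee"),
  votes     = c(5400, 3200, 1100)
)
results$share <- results$votes / sum(results$votes)

reactable(results, columns = list(
  votes = colDef(format = colFormat(separators = TRUE)),
  share = colDef(name = "Share", cell = function(value) {
    # simple horizontal bar sized by vote share
    bar <- div(style = sprintf(
      "background:#4472c4;height:12px;width:%.0f%%", value * 100))
    div(sprintf("%.1f%%", value * 100), bar)
  })
))
```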

Real quick on leaflet: I learned a really cool thing, and we'll have a repo at the end where you can check it out. For the leaflet tooltips, you can actually put custom HTML in a column and then just refer to that column for the pop-up, and that was able to regenerate programmatically every time the results came in as well.
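The trick he describes, storing ready-made HTML in a column and pointing the popup at it, looks roughly like this; `precincts` and its columns are hypothetical stand-ins for the real data:

```r
library(leaflet)

# precincts: an sf polygon layer with hypothetical result columns
precincts$popup_html <- sprintf(
  "<b>%s</b><br/>%s: %s votes",
  precincts$precinct_name,
  precincts$leader,
  format(precincts$leader_votes, big.mark = ",")
)

# regenerate popup_html whenever results update, then rebuild the map
leaflet(precincts) |>
  addPolygons(popup = ~popup_html)
```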

All right, finally, on the website, this was something new we did for November, not for May, which was really cool. If we had done none of the other stuff I showed you and just did this, we would have gotten a lot of credit in our state. We created a close races page that, every time the site updated, dynamically checked which races were close, as the name implies. What's funny about this is that my boss and I, the Secretary of State being the data nerd he is, went back and forth on what the threshold should be, what counts as a close race. I looked at some historical races and thought three percentage points would be a great margin, a nice balance of not having too many races but not being empty. He thought, no, it really should be eight percentage points, we need a wider net. So we went back and forth, and as in any great compromise with your boss, we ended up going with his number, eight percentage points. And he was right. If we had gone with my threshold, there would have been only one race on this page at the end of the night, and we ended up having about half a dozen. When we were following the analytics on election night, this was the page everybody was looking at, which was really cool, and we got a lot of credit for it.
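The close-race check itself is simple arithmetic. A hedged sketch with {dplyr}, where `race_totals` (one row per candidate per race) is an illustrative stand-in for the real data:

```r
library(dplyr)

close_races <- race_totals |>
  group_by(race) |>
  mutate(share = votes / sum(votes)) |>
  # margin = winner's share minus runner-up's share;
  # single-candidate races produce NA and drop out in the filter
  summarize(margin = max(share) - sort(share, decreasing = TRUE)[2]) |>
  filter(margin <= 0.08)  # the boss's 8-point threshold, not my 3
```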

The data pipeline

Okay, before I turn it over to Andrew: if you think about an organization like the New York Times, where these people are doing really cool stuff, yeah, they have 63 people, but they're really divided into three roles. You've got the data engineers who process the data and do the ETL. You have the data analysts who take that data, visualize it in tables and maps, and get insights into the election. And then you have your web developers who publish the website. Those are the three roles Andrew and I were seeking to replicate, and I'll let him talk about the data pipeline component.

Yeah, so between the two of us, we had to turn into three separate teams, which was intense, and we had to do it all really quickly with live-updating data. What we settled on was creating two different pipelines that took all of the data, processed it, and then made the maps. For all of you, we've actually made a fake state with fake elections and fake candidates. If you want to see the process, we have a whole GitHub repository with all of this working; you can download it and have your own fake election. The QR code will be up at the end as well if you want to see it.

The way we did this: we had the ETL pipeline, extract, transform, and load, for getting the data and doing stuff with it. This pipeline had to run in the Idaho State Capitol, which was tricky for me because I don't live in Idaho, I live here, so we'll talk about some workarounds we had for that. In general, it started by grabbing the most recent results from the official vendor-provided Idaho state database, and then we had to clean and process that and make it all work together nicely. To speed up the building of the website, we pre-built all of those 400-plus maps and tables and then stored those objects so the website pipeline could pick them up.

With the website pipeline, we had to grab those objects, build the Quarto site, and then deploy, and we ended up using Netlify because they're cool. Building the website could happen anywhere in the world; it wasn't locked to the State Capitol. It happened to be running in the Capitol, but that wasn't necessary.

The way we automated these pipelines was just running them with cron jobs. Or, for the May election, we were the cron jobs, because we didn't trust the computers. For November, we did have it go on a regular cadence. In May, it was running every 10 to 15-ish minutes, but because of some really cool performance gains that we'll look at really quick, we were able to shrink that down to about every 2 minutes; 90 seconds is how long it took to update the website. So it was just going consistently all throughout election night, which was really cool.
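The scheduling itself can be as plain as a crontab. This is a hypothetical sketch, not the actual deployment config; the paths, schedule, and two-pipeline split are illustrative:

```
# ETL pipeline (in the Capitol): pull results, rebuild changed objects
*/2 * * * * cd /path/to/etl && Rscript -e 'targets::tar_make()'

# Website pipeline: offset by a minute so it picks up fresh objects
1-59/2 * * * * cd /path/to/site && Rscript -e 'targets::tar_make()'
```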

The magic behind all of this is a package called {targets}, which you should all be using. It even works with analyses as packages, as we saw in the first talk here, so you should all use targets. The magic of targets is that you can build that type of workflow, where you have different objects being created, and if anything changes upstream, it's smart enough to know what has to change downstream. If nothing needs to change, like if the data doesn't change, then nothing later on needs to rebuild, so it goes really quickly. It manages dependencies and all sorts of stuff like that.
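A minimal `_targets.R` in that spirit might look like this; the helper functions are hypothetical stand-ins for the real cleaning and plotting code:

```r
# _targets.R (sketch)
library(targets)

list(
  tar_target(raw_results, fetch_results()),          # query the live database
  tar_target(clean_results, clean_up(raw_results)),  # tidy and validate
  tar_target(maps, build_maps(clean_results)),       # leaflet maps
  tar_target(tables, build_tables(clean_results))    # reactable tables
)
```

Running `targets::tar_make()` rebuilds only the targets whose upstream dependencies changed; if `raw_results` comes back identical, everything downstream is skipped.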

What I want to do really quick is highlight some of the cool things we had to do to make these pipelines work efficiently. The first is getting the data from the database. This was tricky initially because, again, it had to run in the Capitol and I wasn't there, so helping develop it was hard. Initially, for the May primary, we were just working with RDS-based extracts from the database and using raw SQL commands to grab stuff. Raw SQL is gross, so for November we switched to {dbplyr}, because that's better, and created a mirrored version of the schema of the actual database, with all of the tables they had internally but with our own simulated data, just so we could practice connecting to the real database. Then on election night, we just had to change one environment variable from "use fake database" to "use real database", and it worked. It was kind of cool and magical. So we were able to fix that.
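A hedged sketch of that environment-variable switch; the variable name, DSN, table, and drivers are illustrative, and the point is that the same {dbplyr} code runs against either backend:

```r
library(DBI)
library(dplyr)  # dbplyr translates the verbs below to SQL

con <- if (Sys.getenv("ELECTION_DB") == "real") {
  dbConnect(odbc::odbc(), dsn = "vendor_results_db")       # real vendor database
} else {
  dbConnect(RSQLite::SQLite(), "simulated_mirror.sqlite")  # mirrored schema, fake data
}

results <- tbl(con, "precinct_results") |>
  filter(election_date == "2024-11-05") |>
  collect()
```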

Another trick was getting these two pipelines to talk to each other, especially because they were in different locations. In the May primary, we saved every one of those objects as an RDS file and got them communicating using network drives, and that was fine. In November, though, we used the fact that targets can write to Amazon S3 buckets natively as part of the pipeline, so we just sent the objects off to S3 and grabbed them from S3 in the website pipeline. That was actually a lot faster than the network drive approach.
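In {targets}, that S3 handoff is a storage setting rather than custom upload code. A sketch with an illustrative bucket and prefix:

```r
# In _targets.R of the ETL pipeline: store target objects in S3 so the
# website pipeline, running elsewhere, can read the same targets back.
library(targets)

tar_option_set(
  repository = "aws",
  resources = tar_resources(
    aws = tar_resources_aws(bucket = "election-objects", prefix = "enr")
  )
)
```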

Doing stuff with the results from the database was also tricky. We were not lazy, but we were constrained for time in May. So in May, every X minutes, however long it took to pull the database, we would take whatever the latest database results were and rebuild every single map, even though we were using targets, where you technically don't need to do that. For the sake of expediency, we just told it: even if nothing changed, rebuild all the maps, rebuild all the tables, and then rebuild the website. For November, though, I learned how to use dynamic branching in targets, which lets you create individual targets on the fly. If you look at the ETL pipeline at the top, instead of saying "have any of these results changed? if so, build the whole thing again," we were able to say "if the precinct-level results for any candidate have changed, then only update those tables or those maps." It changed that top pipeline from this to this, which was intense. There are thousands of possible targets there, but it was able to go really quickly, and again, this let us shrink the pipeline down to only a couple of minutes per run.
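Dynamic branching in {targets} means putting a `pattern` on a target, so each race becomes its own branch and only the branches whose inputs changed get rebuilt. A sketch with hypothetical helpers, assuming an upstream `clean_results` target:

```r
list(
  tar_target(race_ids, unique(clean_results$race_id)),
  tar_target(
    race_maps,
    build_map(clean_results, race_ids),
    pattern = map(race_ids)  # one dynamic branch per race
  )
)
```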

And then finally, building the actual Quarto site. For the May election, we had all of those panel tabsets with the tables and then the maps, and we just copied and pasted hundreds of times, because we were trying to go fast. For November, though, we figured out that you can dynamically generate markdown chunks using {purrr}, so we were able to use 19 lines of code to insert each of the tables and maps into a template. It just ran and generated all 500-ish elections in the whole state really, really fast, which was magical.
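One way to do that kind of programmatic generation (a sketch, not necessarily the exact 19 lines from the talk) is to knit a small template once per race with {purrr} and knitr::knit_child, inside a chunk with `output: asis`; `race_ids`, `tables`, and `maps` are assumed to exist from earlier chunks:

````r
template <- c(
  "### Race {id}",
  "",
  "::: {{.panel-tabset}}",
  "",
  "#### Results",
  "```{{r}}",
  "#| echo: false",
  "tables[['{id}']]",
  "```",
  "",
  "#### Map",
  "```{{r}}",
  "#| echo: false",
  "maps[['{id}']]",
  "```",
  ":::"
)

out <- purrr::map_chr(race_ids, function(id) {
  # glue fills {id} in; doubled braces stay literal in the output
  knitr::knit_child(
    text  = glue::glue(paste(template, collapse = "\n")),
    envir = environment(), quiet = TRUE
  )
})
cat(out, sep = "\n\n")
````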

Deploying with Quarto and election night

That got us finally to the web developer part, where we had to build the site and deploy it, and we were able to do that magically through Quarto. Okay, really fast, the way we can sum up the web developer part and why we landed on Quarto: we love Shiny, and it would be really fun to filter by races and things like that, but Shiny probably wouldn't scale to the traffic we're talking about, thousands of people refreshing at once. This kept me up at night even once we landed on Quarto, although with a static site that's less of a problem. We had it hosted on Netlify, and I have to give their engineers credit. I don't know if any of them are here, but I had emailed them: hey, here's our site, do you think this thing's going to crash on election night? Because if it does, all the stuff we've talked about and worked hard on isn't going to matter. We're going to have a couple of problems on our hands. I probably wouldn't be up here speaking to you, because I'd be dead.
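Deploying a static Quarto site can be as simple as Quarto's built-in publishing command; a sketch, assuming the Netlify site has already been linked to the project (the first `quarto publish` run prompts to set that up):

```
quarto render
quarto publish netlify --no-prompt --no-browser
```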

So let's fast forward. It's election night, it's nine o'clock, the polls have closed, we've built all this out, our site is live with zeroed results, and now people are starting to hit it, because the polls have just closed and the results are about to come in. I'm sitting in the Capitol, this is me, this is Andrew, counting down to the results coming out. We hit the button, and the site does not crash. The results are accurate. November ends up being the largest election in Idaho's history; we had almost a million ballots cast. We got a ton of credit for the website, and like any other engaged citizen, I ended up getting to just enjoy the election results, rather than worrying about whether the site would work or not.

"We will go through the Secretary of State's website. They have these great tools. Gabe and the great team over at the Secretary of State's office have built these maps, so we'll be able to go through the state of Idaho and see, county to county, precinct to precinct, what areas were voting in favor of Proposition 1, and what some of the splits were like in those legislative districts that could be swing areas." It was really cool that they got to use our website, not the New York Times.

So here's that QR code. It's not showing on here; it's on my computer. There it is. A QR code link if you want to check it out: the actual website and the code. Andrew and I will stick around if you have any questions for us. Thank you.