Resources

Election Night Reporting Using R & Quarto (Andrew Heiss & Gabe Osterhout) | posit::conf(2025)

Speakers: Gabe Osterhout; Andrew Heiss

Abstract: Election night reporting (ENR) is often clunky, outdated, and overpriced. The Idaho Secretary of State's office leveraged R and Quarto to create a better ENR product for the end user while driving down costs using the open-source software we all know and love. With help from Dr. Andrew Heiss, R was used in every step of the process, from the {dbplyr} backend to visualizing the results with {reactable} tables and {leaflet} maps, combining the output into a visually appealing Quarto website. Quarto was the ideal solution due to its scalability, quick deployment, responsive design, and easy navigation. In addition, Dr. Heiss will discuss the advantages of using a {targets} pipeline and creating programmatic code chunks in Quarto.

GitHub repo: https://github.com/andrewheiss/election-desk

Transcript

This transcript was generated automatically and may contain errors.

Okay, hello everybody. My name is Gabe Osterhout. I'm the data viz guy for the Idaho Secretary of State. So as my title suggests, I work for the Secretary of State himself, and by his title he's the chief election official for the state of Idaho. Our office provides many services to the state, but one of them happens on election night: when you're getting your results from CNN, the New York Times, or the state's own website, we're the ones who aggregate that data. Usually when you vote, at least in a state like Idaho, the county administers the election. The precinct or polling place reports the results to the county, and the county reports them to us. So on election night, we're the ones responsible for aggregating that information and then publishing it so the public can see it and the media can grab that data as well.

So we tried something new for the 2024 elections where we used R & Quarto for that process.

Context: the 2020 shadow and taking office

Remember the 2020 election? It was not a fun time. We were not in office during the 2020 election, thankfully, but my boss took office in 2023, and my position actually didn't exist yet. There was no data visualization person, no data analyst in our office, believe it or not. But my boss, thankfully, is a data nerd. I don't know how many of you work for data nerds, but it's very nice to have allies in powerful places. This was the world we took office in, right? There were still a lot of consequences of the 2020 election, a lot of conspiracy theories and misinformation. Valid or not, this was the world we were working in.

So everything election related in my world is extremely scrutinized. Going into the 2024 election, we had these ghosts of 2020 in our heads, even though we didn't work that election. At the beginning of 2024, we launched this great new voteidaho.gov website. It recently won an award, it was highly acclaimed at the time, and voters really liked it. It became a one-stop shop for all kinds of voter information: check your polling place, view your sample ballot. And we thought, hey, this is 2024, we're entering the cycle, voters have all these great resources. Why not also make it a one-stop shop for election results on election night?

The challenge of improving election results

So my role in all of this, obviously: now I'm tasked with improving our election results for 2024. Can't be that difficult. My boss wanted to remind me that we have a lot going on in the data visualization world. As we all know, there are competing priorities and lots of questions that are super interesting to me. Now that we've got a bunch of dashboards, visuals, and insight into past elections, since my position never existed before, there's a ton of outside interest: media questions, questions in the legislature, what about this, what about that? And unfortunately, one of my weaknesses is thinking, well, that sounds really interesting, I want to pursue that question too.

But we had a real time constraint, right? We're not operating in a vacuum where we have time to do everything. My boss was worried about me getting distracted, so he put this priority list on my desk. Election results were the second most important thing I could be working on at all times. They were also the third most important thing, and the fourth, fifth, sixth, and seventh. So that was helpful for me. I show you this as a reminder that this was an important thing. We couldn't screw it up, obviously, given what happened in 2020.

So as I was thinking about ways to improve our election results, we looked at what other states were doing. Election night reporting, I'll call it ENR, is something every state does. It's always been secure and it's always been accurate, as a lot of us know, but the data visualization part isn't usually something other states focus on. They want to make sure they get the numbers out there, and they might have a JSON file, or in our state an XML file. The media grabs that, and then the media does the really cool visualization part. I've gone to the New York Times for my election results, I've gone to CNN, and I'm sure a lot of you have as well, because they have really cool results and interactive maps. A lot of states don't.

Idaho had never used maps for official election results. Being an election junkie myself, in this job and previously, I had looked at a lot of historical results, visualized them, and created really fun interactive maps, but never for the actual official results. We wanted to incorporate that, and obviously in R we have access to all kinds of cool mapping packages and things that create beautiful tables. So I wanted to leverage that expertise, but also have it update in real time and be a resource for people on election night.

So the only problem with the New York Times. I already mentioned I'm a data team of one, right? My position never existed before, and I'm about to double my number by bringing in Dr. Andrew Heiss. The only problem with the New York Times is that they have more than one person. I'm sure that's a huge surprise to hear, but this is 2020, this is just the Idaho election results page, and if you scroll to the bottom of that page, they credit over 40 people as working on just Idaho's election results. I didn't know it at the time, but 2024 is obviously what we'd end up competing against: they had 63 people listed as working on that. That tells me a couple of things, but most importantly, it shows the New York Times was also taking 2024 more seriously, realizing there was a lot more attention and scrutiny. And that's just for Idaho, right? Not a lot of people go to the New York Times for Idaho's election results, especially in the presidential race. It's usually a foregone conclusion: Idaho hasn't voted for a Democratic presidential candidate since 1964.

Constraints: accuracy, timeline, and scale

Okay, in terms of constraints: I've already mentioned it needs to be perfectly accurate, right? Otherwise, it's January 6th at my house. I already mentioned it needs to be visually appealing, because we want people to go to our site, go to the source, rather than just go to other organizations, and we want election results that Idahoans can be proud of and confident in. But there's this other constraint, which is the timeline. Our new VoteIdaho.gov website launched in February of 2024. The election's not till November, so I've got quite a few months to work on this, right?

Now, the general election, for our non-U.S. friends, is the one you always see in the news, where the president is on the ballot. The only issue is, and I keep saying "the only issue is" when there are lots of issues here, as you can tell: in Idaho, we have a primary election on May 21st. In a primary election, Democrats face off against Democrats, Republicans face off against Republicans, and the winners within each party go on the ballot for November. The primary election in Idaho is more important than the November election, because Republicans in Republican districts are likely to win in November, and Democrats in Democratic districts are likely to win in November. So that's where all the attention is, and that's where all the money is.

Another challenge: the New York Times shows the nationally relevant races, like president and Congress. We have to show those plus the legislative races, the ballot measures, and all the local stuff, because as the official place to go for election results, we can't just show the ones that seem interesting. We have to show them all. So we had over 400 races to show in May. And I mentioned the accuracy part, right? You can't promise 99.9 percent, the way you might with uptime in the data science world. If 399 races are right and one is wrong, the person in that race is watching that page, they're going to notice, and they're going to take a screenshot.

Our results also have to update in real time. Unlike most of the data analysis a lot of us in the room have done, where you have a static data set, maybe super messy, maybe clean, but static, or at most data that updates daily, this data updates every few minutes. As the results come in, we can't just wait until the end of the night to show them. That's also how conspiracy theories start: where are all the results?

So we needed it to update in real time. That was the real challenge at this point, after figuring out all these other things: how do we get it to update in real time? And so I brought on Dr. Andrew Heiss, a professor at Georgia State here in Atlanta. A lot of you probably know him from his great blog posts and YouTube videos. He's just a really cool dude, so come talk to him. We met 40 days before the primary election.

Given these timeline constraints and all the other constraints I've mentioned, we didn't necessarily have time to learn new tools to incorporate into this workflow. We're really comfortable working in R, and if you've used Quarto, you know it's a great product for publishing your visuals. We wanted to stay with something familiar. So I'm proud to report we were able to do this whole project without ever leaving RStudio. We didn't need a separate process for SQL, we didn't need a separate process for publishing the website, and all the data wrangling and visualization in the middle we were able to do within the R ecosystem.

Preview of the final product

Okay, so I've listed a bunch of challenges. If you're not freaked out yet: before I turn it over to Andrew to show some of the cool stuff he figured out for the data pipeline, I wanted to show you a quick preview of the final product, so you're not sitting there wondering, man, did these guys actually pull this off?

This was the November 2024 election. This was our most basic page, just the presidential race, but there were a few things that were really exciting about using Quarto for this project. Having the different district types at the top for navigation was a huge win for our state. A lot of other states don't have that kind of ease of navigation; you usually have all the races on one page and have to scroll through, which is also why people seek out the media, who have thought these things through a little better. There's also the status bar at the top showing whether results were official or unofficial. I'm not sure if you can see it, but there's a callout block in green that says 44 of 44 counties have reported. That updated throughout the night, so every few minutes as the results came in, people could use that drop-down to figure out which counties were still coming in, and once the bar was green, you knew there weren't going to be any more results.
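As a rough sketch of how a status callout like that can be generated from live data (the counts and class choice here are illustrative, not from the actual repo), an R chunk with `output: asis` in the Quarto page can emit the callout markup itself:

```r
# Inside a Quarto chunk with `#| output: asis` (hypothetical summary values)
n_reported <- 44
n_total    <- 44

# green "tip" callout once everything is in, yellow "warning" otherwise
kind <- if (n_reported == n_total) "callout-tip" else "callout-warning"

cat(sprintf("::: {.%s}\n%d of %d counties reported\n:::\n",
            kind, n_reported, n_total))
```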

Real quick: you have the race breakdown here. It's a {reactable} table with a nice spark bar, and for every race you have the raw results summed, like you can see there, and then a tab that has a map. For every one of the 400-plus races, there was a map. We love leaflet maps, they're interactive, and we know all the exciting things about them, but it was really cool to have them updating throughout the night. Another thing that's great about Quarto, as you know, is the table of contents. We were able to throw one on the left, so being able to jump between districts without scrolling up and down was huge, especially with as many races as we had.
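A minimal sketch of a results table with an in-cell bar, in the spirit of what the site showed; the data and styling are made up, and this uses {reactable}'s documented pattern of returning htmltools tags from a cell renderer:

```r
library(reactable)
library(htmltools)

# Hypothetical race results
results <- data.frame(
  candidate = c("A. Smith", "B. Jones", "C. Lee"),
  votes     = c(5400, 3200, 1100)
)
results$share <- results$votes / sum(results$votes)

reactable(results, columns = list(
  votes = colDef(format = colFormat(separators = TRUE)),
  share = colDef(name = "Share", cell = function(value) {
    # simple horizontal bar sized by vote share
    bar <- div(style = sprintf(
      "background:#4472c4;height:12px;width:%.0f%%", value * 100))
    div(sprintf("%.1f%%", value * 100), bar)
  })
))
```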

Real quick on leaflet: I learned a really cool thing, and we'll have a repo at the end where you can check it out. For the leaflet tooltips, you can actually put custom HTML in a column and then just refer to that column for the pop-up, and that was able to regenerate programmatically every time the results came in as well.
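The trick he describes, storing ready-made HTML in a column and pointing the popup at it, looks roughly like this; `precincts` and its columns are hypothetical stand-ins for the real data:

```r
library(leaflet)

# precincts: an sf polygon layer with hypothetical result columns
precincts$popup_html <- sprintf(
  "<b>%s</b><br/>%s: %s votes",
  precincts$precinct_name,
  precincts$leader,
  format(precincts$leader_votes, big.mark = ",")
)

# regenerate popup_html whenever results update, then rebuild the map
leaflet(precincts) |>
  addPolygons(popup = ~popup_html)
```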

All right, finally, on the website, this was something new we did for November, not for May, which was really cool. If we had done none of the other stuff I showed you and just did this, we would have gotten a lot of credit in our state. We created a close races page that, every time the site updated, dynamically checked which races were close, as the name implies. What's funny about this is that my boss and I, the Secretary of State being the data nerd he is, went back and forth on what the threshold should be, what counts as a close race. I looked at some historical races and thought three percentage points would be a great margin, a nice balance of not having too many races but not being empty. He thought, no, it really should be eight percentage points, we need a wider net. So we went back and forth, and as in any great compromise with your boss, we ended up going with his number, eight percentage points. And he was right. If we had gone with my threshold, there would have been only one race on this page at the end of the night, and we ended up having about half a dozen. When we were following the analytics on election night, this was the page everybody was looking at, which was really cool, and we got a lot of credit for it.
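The close-race check itself is simple arithmetic. A hedged sketch with {dplyr}, where `race_totals` (one row per candidate per race) is an illustrative stand-in for the real data:

```r
library(dplyr)

close_races <- race_totals |>
  group_by(race) |>
  mutate(share = votes / sum(votes)) |>
  # margin = winner's share minus runner-up's share;
  # single-candidate races produce NA and drop out in the filter
  summarize(margin = max(share) - sort(share, decreasing = TRUE)[2]) |>
  filter(margin <= 0.08)  # the boss's 8-point threshold, not my 3
```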

The data pipeline

Okay, before I turn it over to Andrew: if you think about an organization like the New York Times, where these people are doing really cool stuff, yeah, they have 63 people, but they're really divided into three roles. You've got the data engineers who process the data and do the ETL. You have the data analysts who take that data, visualize it in tables and maps, and get insights into the election. And then you have your web developers who publish the website. Those are the three roles Andrew and I were seeking to replicate, and I'll let him talk about the data pipeline component.

Yeah, so between the two of us, we had to turn into three separate teams, which was intense, and we had to do it all really quickly with live-updating data. What we settled on was creating two different pipelines that took all of the data, processed it, and then made the maps. For all of you, we've actually made a fake state with fake elections and fake candidates. If you want to see the process, we have a whole GitHub repository with all of this working; you can download it and have your own fake election. The QR code will be up at the end as well if you want to see it.

The way we did this: we had the ETL pipeline, extract, transform, and load, for getting the data and doing stuff with it. This pipeline had to run in the Idaho State Capitol, which was tricky for me because I don't live in Idaho, I live here, so we'll talk about some workarounds we had for that. In general, it started by grabbing the most recent results from the official vendor-provided Idaho state database, and then we had to clean and process that and make it all work together nicely. To speed up the building of the website, we pre-built all of those 400-plus maps and tables and then stored those objects so the website pipeline could pick them up.

With the website pipeline, we had to grab those objects, build the Quarto site, and then deploy, and we ended up using Netlify because they're cool. Building the website could happen anywhere in the world; it wasn't locked to the State Capitol. It happened to be running in the Capitol, but that wasn't necessary.

The way we automated these pipelines was just running them with cron jobs. Or, for the May election, we were the cron jobs, because we didn't trust the computers. For November, we did have it go on a regular cadence. In May, it was running every 10 to 15-ish minutes, but because of some really cool performance gains that we'll look at really quick, we were able to shrink that down to about every 2 minutes; 90 seconds is how long it took to update the website. So it was just going consistently all throughout election night, which was really cool.
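The scheduling itself can be as plain as a crontab. This is a hypothetical sketch, not the actual deployment config; the paths, schedule, and two-pipeline split are illustrative:

```
# ETL pipeline (in the Capitol): pull results, rebuild changed objects
*/2 * * * * cd /path/to/etl && Rscript -e 'targets::tar_make()'

# Website pipeline: offset by a minute so it picks up fresh objects
1-59/2 * * * * cd /path/to/site && Rscript -e 'targets::tar_make()'
```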

The magic behind all of this is a package called {targets}, which you should all be using. It even works with analyses as packages, as we saw in the first talk here, so you should all use targets. The magic of targets is that you can build that type of workflow, where you have different objects being created, and if anything changes upstream, it's smart enough to know what has to change downstream. If nothing needs to change, like if the data doesn't change, then nothing later on needs to rebuild, so it goes really quickly. It manages dependencies and all sorts of stuff like that.
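A minimal `_targets.R` in that spirit might look like this; the helper functions are hypothetical stand-ins for the real cleaning and plotting code:

```r
# _targets.R (sketch)
library(targets)

list(
  tar_target(raw_results, fetch_results()),          # query the live database
  tar_target(clean_results, clean_up(raw_results)),  # tidy and validate
  tar_target(maps, build_maps(clean_results)),       # leaflet maps
  tar_target(tables, build_tables(clean_results))    # reactable tables
)
```

Running `targets::tar_make()` rebuilds only the targets whose upstream dependencies changed; if `raw_results` comes back identical, everything downstream is skipped.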

What I want to do really quick is highlight some of the cool things we had to do to make these pipelines work efficiently. The first is getting the data from the database. This was tricky initially because, again, it had to run in the Capitol and I wasn't there, so helping develop it was hard. Initially, for the May primary, we were just working with RDS-based extracts from the database and using raw SQL commands to grab stuff. Raw SQL is gross, so for November we switched to {dbplyr}, because that's better, and created a mirrored version of the schema of the actual database, with all of the tables they had internally but with our own simulated data, just so we could practice connecting to the real database. Then on election night, we just had to change one environment variable from "use fake database" to "use real database", and it worked. It was kind of cool and magical. So we were able to fix that.
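A hedged sketch of that environment-variable switch; the variable name, DSN, table, and drivers are illustrative, and the point is that the same {dbplyr} code runs against either backend:

```r
library(DBI)
library(dplyr)  # dbplyr translates the verbs below to SQL

con <- if (Sys.getenv("ELECTION_DB") == "real") {
  dbConnect(odbc::odbc(), dsn = "vendor_results_db")       # real vendor database
} else {
  dbConnect(RSQLite::SQLite(), "simulated_mirror.sqlite")  # mirrored schema, fake data
}

results <- tbl(con, "precinct_results") |>
  filter(election_date == "2024-11-05") |>
  collect()
```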

Another trick was getting these two pipelines to talk to each other, especially because they were in different locations. In the May primary, we saved every one of those objects as an RDS file and got them communicating using network drives, and that was fine. In November, though, we used the fact that targets can write to Amazon S3 buckets natively as part of the pipeline, so we just sent the objects off to S3 and grabbed them from S3 in the website pipeline. That was actually a lot faster than the network drive approach.
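In {targets}, that S3 handoff is a storage setting rather than custom upload code. A sketch with an illustrative bucket and prefix:

```r
# In _targets.R of the ETL pipeline: store target objects in S3 so the
# website pipeline, running elsewhere, can read the same targets back.
library(targets)

tar_option_set(
  repository = "aws",
  resources = tar_resources(
    aws = tar_resources_aws(bucket = "election-objects", prefix = "enr")
  )
)
```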

Doing stuff with the results from the database was also tricky. We were not lazy, but we were constrained for time in May. So in May, every X minutes, however long it took to pull the database, we would take whatever the latest database results were and rebuild every single map, even though we were using targets, where you technically don't need to do that. For the sake of expediency, we just told it: even if nothing changed, rebuild all the maps, rebuild all the tables, and then rebuild the website. For November, though, I learned how to use dynamic branching in targets, which lets you create individual targets on the fly. If you look at the ETL pipeline at the top, instead of saying "have any of these results changed? if so, build the whole thing again," we were able to say "if the precinct-level results for any candidate have changed, then only update those tables or those maps." It changed that top pipeline from this to this, which was intense. There are thousands of possible targets there, but it was able to go really quickly, and again, this let us shrink the pipeline down to only a couple of minutes per run.
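Dynamic branching in {targets} means putting a `pattern` on a target, so each race becomes its own branch and only the branches whose inputs changed get rebuilt. A sketch with hypothetical helpers, assuming an upstream `clean_results` target:

```r
list(
  tar_target(race_ids, unique(clean_results$race_id)),
  tar_target(
    race_maps,
    build_map(clean_results, race_ids),
    pattern = map(race_ids)  # one dynamic branch per race
  )
)
```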

And then finally, building the actual Quarto site. For the May election, we had all of those panel tabsets with the tables and then the maps, and we just copied and pasted hundreds of times, because we were trying to go fast. For November, though, we figured out that you can dynamically generate markdown chunks using {purrr}, so we were able to use 19 lines of code to insert each of the tables and maps into a template. It just ran and generated all 500-ish elections in the whole state really, really fast, which was magical.
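One way to do that kind of programmatic generation (a sketch, not necessarily the exact 19 lines from the talk) is to knit a small template once per race with {purrr} and knitr::knit_child, inside a chunk with `output: asis`; `race_ids`, `tables`, and `maps` are assumed to exist from earlier chunks:

````r
template <- c(
  "### Race {id}",
  "",
  "::: {{.panel-tabset}}",
  "",
  "#### Results",
  "```{{r}}",
  "#| echo: false",
  "tables[['{id}']]",
  "```",
  "",
  "#### Map",
  "```{{r}}",
  "#| echo: false",
  "maps[['{id}']]",
  "```",
  ":::"
)

out <- purrr::map_chr(race_ids, function(id) {
  # glue fills {id} in; doubled braces stay literal in the output
  knitr::knit_child(
    text  = glue::glue(paste(template, collapse = "\n")),
    envir = environment(), quiet = TRUE
  )
})
cat(out, sep = "\n\n")
````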

Deploying with Quarto and election night

That got us finally to the web developer part, where we had to build the site and deploy it, and we were able to do that magically through Quarto. Okay, really fast, the way we can sum up the web developer part and why we landed on Quarto: we love Shiny, and it would be really fun to filter by races and things like that, but Shiny probably wouldn't scale to the traffic we're talking about, thousands of people refreshing at once. This kept me up at night even once we landed on Quarto, although with a static site that's less of a problem. We had it hosted on Netlify, and I have to give their engineers credit. I don't know if any of them are here, but I had emailed them: hey, here's our site, do you think this thing's going to crash on election night? Because if it does, all the stuff we've talked about and worked hard on isn't going to matter. We're going to have a couple of problems on our hands. I probably wouldn't be up here speaking to you, because I'd be dead.
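Deploying a static Quarto site can be as simple as Quarto's built-in publishing command; a sketch, assuming the Netlify site has already been linked to the project (the first `quarto publish` run prompts to set that up):

```
quarto render
quarto publish netlify --no-prompt --no-browser
```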

So let's fast forward. It's election night, it's nine o'clock, the polls have closed, we've built all this out, our site is live with zeroed results, and now people are starting to hit it, because the polls have just closed and the results are about to come in. I'm sitting in the Capitol, this is me, this is Andrew, counting down to the results coming out. We hit the button, and the site does not crash. The results are accurate. November ends up being the largest election in Idaho's history; we had almost a million ballots cast. We got a ton of credit for the website, and like any other engaged citizen, I ended up getting to just enjoy the election results, rather than worrying about whether the site would work or not.

"We will go through the Secretary of State's website. They have these great tools. Gabe and the great team over at the Secretary of State's office have built these maps, so we'll be able to go through the state of Idaho and see, county to county, precinct to precinct, what areas were voting in favor of Proposition 1, and what some of the splits were like in those legislative districts that could be swing areas." It was really cool that they got to use our website, not the New York Times.

So here's that QR code. It's not showing on here; it's on my computer. There it is. A QR code link if you want to check it out: the actual website and the code. Andrew and I will stick around if you have any questions for us. Thank you.