
Data Wrangling for Advocacy: Tidy Data to Support the Affordable Connectivity Program - posit conf

We sought to create a dashboard to highlight some frequently asked statistics about the Affordable Connectivity Program. Although the program data were "available" to the public, they were not accessible. Wrangling messy datasets is not new to data scientists, and our work included pretty straightforward summary statistics, yet this work became a leading resource for advocacy groups, academics, and policymakers, and ultimately led us to a meeting where we shared guidance about the program funding needs with the White House. While I give a shout-out to a few of my favorite R packages, in this talk, I focus on the strategies you can employ to make your data tool have an impact.

Talk by Christine Parker

Slides: https://www.canva.com/design/DAGKFckt2-Y/Lkf8VC3nCfYfwD1hYOydDg/view?utm_content=DAGKFckt2-Y&utm_campaign=designshare&utm_medium=link&utm_source=editor

ACP Dashboard: https://acpdashboard.com/

Oct 31, 2024
18 min


Transcript

This transcript was generated automatically and may contain errors.

Hello, thank you everyone for being here today. I know there were a lot of good talks in this time slot, so I really appreciate you coming to this one. I'm Christine, and today I'm going to tell you kind of a data fairy tale of sorts. And it starts not on a dark and stormy night, but on just a regular day in the life of a data wrangler, sometime last fall. I'm doing some regular work and drinking some microwaved coffee that's now cold. And I get this email from a work colleague asking if I could take a call with the White House today.

And I immediately had some imposter syndrome, as I'm in the pocket office in the basement of my rural Maine home, wondering how this could be happening. And before I knew it, I was in fact on a call with the White House that afternoon, informing them that they probably underestimated how much funding would be needed to get the Affordable Connectivity Program through the end of 2024. And I'll explain in just a minute what that program is.

So it was a very weird situation to be in. And you might be wondering, like, Christine, how did you find yourself in this moment? And it's because of this dashboard we built. No, this isn't going to be a talk about how to build a dashboard. There are many excellent talks about that that I've seen this week and they're online. Rather, this is going to be a talk about how you can take or how you should take whatever data viz tool thing that you are making and how to make it as impactful as possible.

Background on the Affordable Connectivity Program

And we're going to rewind time a little bit further back to 2021. So the pandemic is still hanging on. People are isolated, quarantining, and it's never been more apparent around the world how important it is to be able to get online to access the Internet. And in the U.S., the Federal Communications Commission, which is the government entity responsible for all things telecommunications, including the Internet, recognized that a lot of people couldn't afford to get online. And this was a big problem because at this point, we're still, you know, a lot of people are working remotely, they're schooling remotely, telehealth is a rising technology. And so for folks not to be able to afford to get online was an issue.

And so they started the Affordable Connectivity Program, which had a bank account of $14.2 billion. And it didn't have a timeline for when it would end, because it felt like a very pandemic-centric program. But it also had no plan for when more money would be added. There were a lot of questions about this program, but they got it stood up quickly and helped a lot of people afford their Internet bill each month. It gave each eligible household a $30 discount, unless you lived on tribal lands, and then it was $75.

Now, because this was a federal program, the enrollment and claims data, so how much was being spent and how many people were enrolling, were made publicly available. But much like a princess in the tower, the data were technically available, but they weren't really accessible. And what I mean by that is there were national data in a web table, there were more national data in little web tables on a separate web page, and then this whole list of spreadsheets that were split up by geography and year.

So for any given person curious about what enrollment rates looked like in their community, they would have to somehow navigate all this. And also, all these different data sets were updated on different schedules. So you also had to keep that in mind as you're navigating all this. And around this time, as things are kind of getting going with the program, I started to get a lot of questions about, you know, what is enrollment like in Cleveland or LA or New York State? And initially I was kind of doing these as one-off things, and then it started to become the majority of my job. And I thought, perhaps this is a great time to figure out how to build a dashboard.

Choosing the right tools

I was still kind of new in my like data science job. And at the time, I was very proficient in data wrangling in R. But then when I thought about trying to create a dashboard in R, it gave me a bit of anxiety. I don't know how to use Shiny yet. It's part of my learning plan for this year, and I'm very excited to learn Quarto. But at that time, those things weren't accessible to me. We did have some experience on the team working with Tableau, which is like another data visualization tool. And I had played around with it a little. So we decided to use that. So we combined data wrangling in R and then visualized those elements that we created in Tableau.

And so this is my first point. And I've seen this in other talks this week, and I think it's a really, it's simple, but you know, use what is available and accessible to you now. It's fine if you have the time and bandwidth to learn that new tool, learn Quarto and Shiny and other like really cool things that are out there. But if you don't have the bandwidth, make a plan to learn it later. You don't have to do it now. Give yourself a break.

Cleaning the data

Since we decided on that plan, we first had to like clean up all this data. It was really a hot mess. If anyone's played with federal data, you can probably attest to this. Even state data, really. And so this is just like a tiny little clip from one of the spreadsheets. And you can see on the far right, well, each record is supposed to be a single state line, but we've got data grouped under a header indicating month and year on the far right. So we had to like pull out all this header data to create additional fields and, you know, clean up all the field names and just a lot of like labor going into like getting these data into a usable, workable state.
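A minimal sketch of that header-to-field step, written in Python for illustration rather than the R used in the talk. The row layout, the "Month YYYY" header format, the column names, and all values are assumptions made up for this example; the real spreadsheets may be laid out differently.

```python
import re

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")
# Matches a cell such as "January 2023" (an assumed header format).
HEADER = re.compile(rf"^({MONTHS})\s+(\d{{4}})$")

# Hypothetical rows: a month/year header in the far-right cell, followed by
# the state records that belong to it.
raw_rows = [
    ["", "", "January 2023"],
    ["Alabama", "1200", ""],
    ["Alaska", "150", ""],
    ["", "", "February 2023"],
    ["Alabama", "1250", ""],
]


def find_header(cells):
    """Return (month, year) if any cell looks like a month/year header."""
    for cell in cells:
        match = HEADER.match(cell)
        if match:
            return match.group(1), int(match.group(2))
    return None


def tidy(rows):
    """Carry each header down onto the records beneath it."""
    tidy_rows = []
    current = None
    for row in rows:
        cells = [cell.strip() for cell in row]
        header = find_header(cells)
        if header:
            current = header
            continue
        if current is None or not cells[0]:
            continue  # skip blank rows and records with no header above them
        month, year = current
        tidy_rows.append({
            "state": cells[0],
            "enrolled": int(cells[1]),
            "month": month,
            "year": year,
        })
    return tidy_rows


tidy_rows = tidy(raw_rows)
for record in tidy_rows:
    print(record)
```

Each output record is one state for one month, with the month and year stored as ordinary fields instead of living in a separate header row.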

And the tidyverse has been, you know, I'm a big fan of the tidyverse. So all the underlying tools and functions in there did a lot of the heavy lifting here. And for anyone in here that's involved in spatial data wrangling, this was something that was new to me and it was a very big pain. But as I had gotten the data cleaned up and I was joining it with another set of data, I found that for these nine states, for some reason it wasn't working, but this is one of those unfortunate problems in R that doesn't give you an error. So you don't necessarily know what's happening until you see this weirdness where you have NAs occurring. And it's because, in the left-hand column, you'll see that for each of those states, the ID number starts with a zero. These leading zeros tend to get gobbled up in Excel files, and that was causing the mismatch.

So it was really frustrating. But once I figured it out, I also found this neat little function in stringr that allowed me to say, you know, all these IDs should have two digits, and where they don't, pad the left side with a zero. And it was like magic. But ultimately we didn't do anything magical in this dashboard as a whole. And you'll see it along the way, but it's a lot of just sums, totals of data. We had one linear model that was like the fanciest thing we had in here. And this is my next point. When you're creating these things, it doesn't need to be super fancy. You want to keep your end user in mind, what they will need it for, and how they can use it. You want it to be something that is accessible to them without you having to go in and explain it every time. That will make it more easily shareable.
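The leading-zero problem and its fix can be sketched as follows. This is a Python illustration, not the speaker's code: the talk does not name the stringr function (presumably `str_pad`), and here Python's built-in `str.zfill` plays the same role. The field names and enrollment numbers are made up.

```python
# Hypothetical sample data: two-digit state IDs after a spreadsheet round
# trip that dropped the leading zeros.
enrollment = [
    {"state_id": "1", "enrolled": 1200},   # should be "01"
    {"state_id": "6", "enrolled": 5400},   # should be "06"
    {"state_id": "23", "enrolled": 300},
]
state_names = {"01": "Alabama", "06": "California", "23": "Maine"}


def pad_id(raw_id, width=2):
    """Left-pad an ID with zeros to a fixed width."""
    return str(raw_id).strip().zfill(width)


def join_names(rows, lookup, fix_ids):
    """Left join: attach a state name, or None when the key has no match."""
    joined = []
    for row in rows:
        key = pad_id(row["state_id"]) if fix_ids else row["state_id"]
        joined.append({**row, "state_id": key, "state": lookup.get(key)})
    return joined


broken = join_names(enrollment, state_names, fix_ids=False)
fixed = join_names(enrollment, state_names, fix_ids=True)

# The join never raises an error; the mismatch only shows up as missing values.
unmatched_before = [r["state_id"] for r in broken if r["state"] is None]
unmatched_after = [r["state_id"] for r in fixed if r["state"] is None]
print("unmatched before padding:", unmatched_before)
print("unmatched after padding:", unmatched_after)
```

Counting unmatched keys right after a join is a cheap way to catch this kind of silent failure early, whatever the language.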

Estimating eligibility and collaborating

And then next, once we had all of our data cleaned up and put together, we started to work on our eligibility estimate. So in order to calculate enrollment rates across the country, we had to determine how many people were actually eligible for this program. So we used tidycensus to pull in data from the American Community Survey, and using the guidelines from the FCC for which households were eligible, we came up with this kind of basic calculation.
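The talk does not spell out the eligibility calculation itself, so the sketch below covers only the final step: dividing enrolled households by an eligibility estimate and ranking areas by the result. It is written in Python for illustration; the ZIP codes are placeholders and every number is made up.

```python
# Hypothetical ZIP-level counts. In practice "enrolled" would come from the
# cleaned program data and "eligible" from an ACS-based estimate.
zip_data = [
    {"zip": "00001", "enrolled": 900, "eligible": 3000},
    {"zip": "00002", "enrolled": 1500, "eligible": 2500},
    {"zip": "00003", "enrolled": 400, "eligible": 4000},
]


def enrollment_rate(enrolled, eligible):
    """Share of eligible households enrolled, or None if it cannot be computed."""
    if not eligible:
        return None  # a missing or zero denominator stays missing, not guessed
    return enrolled / eligible


rates = {
    row["zip"]: enrollment_rate(row["enrolled"], row["eligible"])
    for row in zip_data
}

# Lowest enrollment first: candidate areas for outreach.
lowest_first = sorted(
    (z for z in rates if rates[z] is not None),
    key=lambda z: rates[z],
)

for z in lowest_first:
    print(f"{z}: {rates[z]:.0%}")
```

ZIP codes are kept as strings here so that leading zeros survive, the same concern that applies to the state IDs.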

From there, you know, as the lone data person on my team, I often work in isolation, but in this situation, this was intended to be a very public-facing tool. And I really needed feedback on what I'd done so far and how I had worked with the data. And so we started reaching out to anyone that we heard had been working with this data set. And so it included other advocacy groups like ourselves, other research teams, academic folks, whoever we could get a meeting with. And we shared with them how we came up with this. And this eligibility estimate, ultimately, for any folks creating tools like this, became like the secret sauce. So it was really exciting that people were actually welcoming us to talk with them and giving us insights into how they calculated their eligibility estimates. And so we were able to take those insights back and improve our calculation and make it much more robust.

And this is my next point. I know a lot of us work in isolation and I know I've talked with people about it this week, but think about how nice it feels to be here this week and talking about data issues and package things this week. Like it feels really good and you get such good feedback on things. So keep that in mind as you're working on these data products. It can only really make it better ultimately.

Publishing and promoting the dashboard

And so finally we had all of this information put together. We put it into Tableau, fine-tuned and published it. We didn't stop there though. Once you get something online, you should never just stop there because it's ultimately a tool. It's not going to get itself in front of people's eyes. So there's a lot left to be done from here.

And we had a few natural avenues that allow us to do this. So we have a couple of podcasts. Our team has a website. We also have an overarching org website. So we wrote up an article on our page about the dashboard. It also included the GitHub where we have all the cleaned up data as well as the code that I use to calculate all these different elements. And that went out, this article was sent through all of our listservs, all of the collaborators that we had previously worked with. We, in some cases, met with them again and walked through the dashboard and asked for feedback on all the different elements. Like, is there anything weird here? Is there something that's like not intuitive or not working?

And we got a lot of good feedback and requests for more things. And so not too long after, we had a major update, and conveniently, following this major update, we were all attending a big national conference, where we printed off these little postcard-sized handouts of the dashboard with a little info on the back. And we had a booth, just like at this conference. And ACP, this program, was a big topic at the conference. There were whole sessions dedicated to it. So it came up a lot. And anytime someone would stop by to talk to us and that would come up, we'd ask, have you heard of our dashboard? And so we would hand them the postcard and we would explain to them what is in the dashboard and how it can be useful to them.

And so it was a lot of this word-of-mouth, in-person networking with the dashboard, and kind of the central theme here is, just keep talking about it all the time. In fact, my husband at one point jokingly kind of banned the word dashboard in our house for a time, because it was a constant in my life for quite a while.

Impact and outcomes

And unfortunately, this fairy tale kind of has a bittersweet end. Congress did not reallocate funding for the program. And so as of June of this year, the program ended and 23 million or more households lost that financial assistance to be able to get online.

And you might be wondering, well, Christine, how successful really was this dashboard then? And it was really successful, because that was not our intention. Our intention was not to support this program and keep it going. Our intention with this dashboard was to support our users, who were other advocacy groups, from folks that are doing research and more policy advocacy to folks on the ground that are actually trying to enroll people in the program. We have a zip code map, and we can hand this to people and say, you know, in your city, these zip codes have really low enrollment rates. This is where you should spend your limited time, funds, and people effort on getting people enrolled. And so in that way, it was super effective.

We were also cited in letters between congressional members trying to get the program re-funded, Senate hearings, academic literature, and a few state digital equity plans or proposals for federal funding. So it was really impactful in the ways that we had hoped.

Key takeaways

And just to kind of wrap up and remind you of the things that you can do to make sure that your tool is really impactful. Again, use the things that are available and accessible to you now, unless you really want to learn that new thing along the way. If not, you can always use the thing that you do create as a template. Now that I have this dashboard, I can use it as a template for when I want to create one in Quarto. When you're designing the dashboard itself, or other tools, make sure you keep in mind the audience and their use case. Having a lot of intense information and all the possible widgets is not necessarily what your end users need. They need the information that can help them do their best work.

So keep it simple. Collaborate, make sure you're reaching out to other folks in the space, sharing information with them, asking questions, improving, getting feedback to improve your work. And finally, make sure you're getting it out there. Now I shared some ways that we went about it. Another idea is to, you know, kind of do some like cold emails. If you know of people that are like reporting regularly on a particular topic that you're working on, you can reach out to them and say, hey, I noticed you report on this a lot. We created this thing. And if you ever need a subject matter expert or want to talk about this thing we created, you know, I'm available. That's another way you can kind of get your product out there and in front of more eyes.

And finally, I just want to thank you all again. I really appreciate being here. And if you want to check out the dashboard, it's at acpdashboard.com. That's my final tip. Come up with a great URL. And that's also like really handy.

And thank you. Happy to take any questions if there's time.

Q&A

Thank you so much. And we do have time for a few questions. The first one is, how did you get involved with this type of work?

Good question. So my background was actually in ecology. I studied birds, but I really enjoyed the spatial data work. And so my focus is on internet access. And there's a lot of internet data that's inherently spatial, geographic. And I focused my job search on GIS and data work. That's how I got here.

And in addition to the tidyverse, what other data wrangling or data cleaning packages do you use?

Janitor. The clean_names function is one of my favorites.

I think you answered where the dashboard is hosted. And with government data, what is your process for data imputation for missing data and things like that?

That is a tough one. Because in this data set, and I didn't touch on this, they did redact data in some places where they have to ensure privacy for folks, where there are very small population sizes. And it's kind of tough. And it kind of depends on what we're doing. In some cases, we include an asterisk or some explanation that, you know, in some cases, there's going to be missing data. And we're just upfront about it, and explain that they redact these data, and so these are not available. It doesn't mean there isn't something happening here, but we're not inferring what we don't know, really.

And although funding for the program ended this year, do you know if there's a push to re-fund the program in the future or now?

I have heard some rumors. They are still trying to get it through somehow. While I think it would be great, I'm not entirely optimistic, just because of the way Congress is kind of going right now.

Fair enough. Well, I think that's all the questions that we have for now. So I'll go ahead, give it up for Christine. Thank you.