Resources

Take it in Bits: Using R to Make Eviction Data Accessible to the Legal Aid Community - posit::conf

Presented by Logan Pratico

One in five low-income renter households in the US fell behind on rent or was threatened with eviction in 2021. Yet most are unrepresented when facing eviction in court. The complex and fast-paced legal system obscures access to timely information, leaving tenants without assistance. In this talk, I discuss the Civil Court Data Initiative's use of R alongside AWS Cloud and SQL to analyze disaggregated eviction records. I focus on the integration of RMarkdown with Amazon Athena and EC2 to create weekly eviction reports across 20 states for legal aid groups working to assist tenants. The upshot: accessible eviction data that helps legal aid providers better address local legal needs.

Presented at Posit Conference, September 19-20, 2023. Learn more at posit.co/conference.

Talk Track: End-to-end data science with real-world impact. Session Code: TALK-1146


Transcript

This transcript was generated automatically and may contain errors.

My name is Logan Pratico. I'm a data engineer at the Legal Services Corporation, where I work on a relatively new project called the Civil Court Data Initiative. So just to sort of start out and center this talk a little bit on the issues that I'm going to be discussing for the next 15 to 20 minutes, let's just start with, you know, what is the civil justice system?

Civil court is somewhat different from criminal court, which I would imagine a lot of you in this room are probably much more familiar with. That's going to be the cases where, you know, you commit a crime like robbery, you're often arrested by a police officer, you go to court, you plead guilty or not guilty, you're handed a verdict, and then you go to jail. Civil court is the other side of that, right? Civil court is going to be the cases that you appear in court for, but where you didn't necessarily commit a crime. So this is going to be things like eviction, which is the focus of what I'm talking about today, but also things like debt collection, guardianship, cases like that, right?

These areas are slightly different, but in a lot of ways they can be equally harrowing. Eviction, for example, can uproot a fundamental human right for a lot of individuals: their right to shelter, their right to a home. It can force folks to leave behind their sense of community, their sense of belonging. A lot of their possessions are often left behind as they enter a world of uncertainty, where you don't necessarily know where you're going to be sleeping next, or where you're going to call home next.

The organization that I work for, Legal Services Corporation, was stood up in the 1970s with the explicit purpose of providing legal assistance to individuals navigating this system, because a key difference between criminal and civil court is that folks aren't guaranteed the right to a lawyer. So that classic image you all might have from something like Law & Order, where a person's Miranda rights are being read and they're told that if they can't afford a lawyer, one will be appointed for them? That doesn't happen in civil court. Individuals are often left to navigate an incredibly complex legal process entirely unassisted. And so our goal is essentially to provide funding to legal aid organizations across the country that work to represent these individuals who qualify in court.

Now, the project that I work on within Legal Services Corporation is known as the Civil Court Data Initiative. And we essentially believe that court data, when standardized and analyzed in aggregate, can be an incredibly powerful tool in helping to allocate limited resources and finite time for these legal aid organizations working on the ground. So you can see here is our current map of coverage. This was taken from our website, which you can find the URL up in the corner there. And the first thing you might notice is that there's a lot of white on the screen. That's because we don't have all of the data yet. We have a lot of states for which we just don't know what the eviction landscape looks like.

To put that in context, imagine if we just couldn't say how many people died of heart disease every year. Imagine what that would mean for policy and for general decision-making in the medical field. That's what the eviction landscape looked like in 2019, when this map was completely white, and in a lot of ways it's what it still looks like today. Our goal is to be able to answer questions like: out of all renters in Virginia, how many individuals are facing eviction? And the answer might surprise you. Relative to the number of renter households in the state of Virginia, about 12% were facing eviction filings. These numbers are high. And that's what we're trying to present by collecting this data.

The data pipeline

To get technical for a second, and not too technical, our data pipeline essentially looks like this. And this is pretty simplified, but it's the basic idea. We have a data source, which for us can be court websites or bulk data downloads. There are only three up here, but to put it into context, there are over 1,200 different unique jurisdictions across the entire country. And we're working to collect all of that data and aggregate it into a standardized system. We use a variety of cutting-edge data orchestration platforms like dbt and Prefect and AWS services; the list goes on, and it would make me sound smarter than I am to just say them all out loud. But the end goal is to have a set of standardized data sets that we can put in a data lake.
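To make that standardization step concrete, here is a minimal sketch in R, not the initiative's actual code: the raw exports, column names, and helper functions below are entirely hypothetical, but they illustrate the idea of mapping each jurisdiction's idiosyncratic export onto one shared schema before it lands in the lake.

```r
library(dplyr)

# Two hypothetical raw exports with jurisdiction-specific column names
raw_vt <- data.frame(docket = "123-45", town = "Chittenden",
                     filed = as.Date("2023-01-09"))
raw_ct <- data.frame(case_no = "CV-678", municipality = "Hartford",
                     filing_dt = as.Date("2023-01-10"))

# Map each source onto one standardized schema
standardize_vt <- function(df) {
  df |> transmute(case_id = docket, county = town,
                  filing_date = filed, state = "VT")
}
standardize_ct <- function(df) {
  df |> transmute(case_id = case_no, county = municipality,
                  filing_date = filing_dt, state = "CT")
}

# One long-format table, ready to load into the lake
filings <- bind_rows(standardize_vt(raw_vt), standardize_ct(raw_ct))
```

With every jurisdiction funneled into the same four columns, a single query can then answer the same question across all covered states.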

And this is, you know, overall relatively simple. The idea is to basically just have queryable data that our team of research analysts and other folks can use. The standardization itself can be incredibly complicated, but the end goal is essentially just to have databases that are accessible to folks. And for a while, what our data pipeline looked like was that we were basically just subsetting that data and providing it to the end user, the legal aid organizations. When we were first starting out, this pretty much exclusively took the form of spreadsheets. Really boring stuff, to be completely honest.

The challenge of reaching non-technical users

I mean, I know that a lot of folks in this room are familiar with R and other data science tools. And you might look at a data set like this and say, there's a lot that I can do with that. It's standardized, it's in a long format; you could port it into R and have some visualizations going. There's a lot of fun stuff you can do with that. But remember, the folks that we're trying to get this data to aren't data scientists. They're lawyers. They don't come from a computer science or data science background, or even any sort of coding background. They've spent their lives in courts studying legal text. If you were to ask me to just read some legalese and interpret it, I would look at you with wide eyes. The same thing happens if you give someone with a legal background a spreadsheet and basically say, you know, answer whatever question you have.

But that's kind of what we wanted to do: we wanted to provide folks with the tools to answer the questions that they had. We didn't want to decide for them what questions they had and hand them the answers. We wanted these folks to be able to answer questions that were specific to their context.

A story that I like to tell with this, or that I don't really like to tell but think is a good example here, is from when we were using this method of just sending off data sets to different legal aid organizations. For a particular grantee, a legal aid organization we were working with, we had prepared a data set and uploaded it to Box, our file sharing platform. We emailed the link, sent it off, and didn't really hear back for about two weeks. Then in a team meeting, we were internally talking about it and circled back: whatever happened with that, did they use it? So we went back and looked at that Box link, and we realized that we hadn't actually shared it externally. We had uploaded it to Box, but it was only shared within our organization, so nobody outside of us could actually see it. And the folks we were sharing it with never reached out to us. They never said, hey, we clicked on the link and we can't actually access it.

And what better way to show that the data you're sending off isn't being used, or isn't important, than the fact that they didn't even try to access it, or that when they couldn't, they didn't care enough to follow up? That was a real eye-opening moment for us, when we realized this data isn't actually being used. We have this really complex pipeline going from court website all the way to standardized data sets. It's getting 95% of the way there. But then we just weren't going the last 5%. We weren't asking what would actually help us meet these legal aid organizations where they are and get them the data that they need.


Switching to RMarkdown weekly reports

And that's when we revised our approach. This was a little before the Quarto phase, so I know it's a little out of date. But we used RMarkdown to essentially generate weekly reports that took the same data from the spreadsheets I was showing you on the last page and put it into a really easily readable, easily digestible format that answered surface-level questions: in our eyes, the lowest common denominator of questions that you could answer across all of our court data. And we provided that to our end users, the legal aid organizations.
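A driver script for such weekly state memos might look like the sketch below, using RMarkdown's parameterized reports. The template name `memo.Rmd`, the state list, and the output naming are all assumptions for illustration, not the talk's actual code; the `file.exists()` guard just keeps the sketch runnable where no template is present.

```r
# Hypothetical weekly driver: render one Word memo per state from a single
# parameterized template ("memo.Rmd" would declare a `state` param in its YAML)
states  <- c("VT", "CT", "VA")
outputs <- sprintf("memo_%s_%s.docx", states, Sys.Date())

if (file.exists("memo.Rmd")) {
  for (i in seq_along(states)) {
    rmarkdown::render("memo.Rmd",
                      output_format = "word_document",
                      output_file   = outputs[i],
                      params        = list(state = states[i]))
  }
}
```

One template, many states: each render receives a different `params$state`, so adding coverage for a new state is just one more entry in the vector.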

You can see here just a couple of examples. For Vermont, I'm not sure if you can read that text, but essentially what it says at the top is just a brief introduction: we collect data across 14 counties in Vermont. And this graph, generated in ggplot, shows what the landscape looks like across the board, what eviction filings have looked like over the past year. Similarly, you have an outline for Connecticut there, a heat map that shows eviction filing hotspots. Areas in darker red are the areas that have the most eviction filings, or more eviction filings than the areas in a lighter red.
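The filings-over-time chart in those memos can be reproduced in spirit with a few lines of ggplot2. The data here is simulated (a random weekly count), since the real standardized court data isn't public; column names and styling are assumptions.

```r
library(ggplot2)

# Simulated weekly filing counts for one state (illustrative only)
set.seed(1)
trend <- data.frame(
  week    = seq(as.Date("2022-09-05"), by = "week", length.out = 52),
  filings = rpois(52, lambda = 40)
)

# Bar chart of filings per week, the kind of surface-level view the memos lead with
p <- ggplot(trend, aes(week, filings)) +
  geom_col(fill = "steelblue") +
  labs(title = "Eviction filings by week, statewide",
       x = NULL, y = "Filings")
```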

So yeah, I kind of alluded to this before, but the questions that we were answering, or trying to answer, with these memos, which were only two pages in length and didn't have a lot of text, were pretty basic. Just, you know, what do eviction filing trends look like over the past year in a particular state? What counties have the most eviction filings? And how does representation differ between landlords and tenants in these cases? The answer might surprise you: it's wide. There is a huge disparity between representation for tenants and landlords.

And really, in creating these memos, I just want to hammer the point home that we weren't trying to answer every question that they might have, which is something we had been doing with our previous data sets. Instead, we were just providing a taste of what we had in the data and welcoming the organizations to come back to us, ask us questions, and say things like: I had some idea that my county had the highest rate of eviction filings in the entire state, but this confirms it, and I'd really like to know who the top filers are, the top companies that are evicting folks in my state. Or: what are the judgments, the outcomes of these cases, and what does that look like?

Outcomes and impact

And so that took us from this data pipeline to a slightly revised version where we still had the data source and the data lake, but rather than going from standardized data sets directly to the end user, we were going by way of the RMarkdown data memos and passing those along. Again, this diagram is a little oversimplified, because I think it implies that we stopped sending data or spreadsheets, which isn't true. We had the data memos, but we really used those as just a jumping-off point, a way to get the ball rolling, to get the conversation going around sharing data.

And what ended up happening is, like I said, folks came back to us and asked, can I have this data? And then the spreadsheets that we were sending them were things that were actually important, things that they actually wanted, which is obviously always key when you're sending off data. We went from sending off data that nobody had asked for and nobody was looking at, to not being able to send data off fast enough. We had to grow our team because we couldn't respond to all of the specific data requests that we were getting, which is always a great problem to have.


And so there were a couple of different outcomes from this work. Principally, we had improved data accessibility and understanding. Folks were no longer having to work in Excel to find an answer to their specific question; they could just have that analysis done for them. This also improved access by sharing the information directly.

Beyond that, we had time efficiency, on our end, because we were no longer coming up with custom data sets to send off in anticipation of a specific legal aid organization's needs. Instead, we were zooming out to the state level and automatically generating these reports. For those of you familiar with Amazon Web Services, we put them on an EC2 instance and just let it run in the cloud, every week. So every Monday, I log on to my computer and double-check the output of the Word documents and make sure everything looks good. We also have more efficient tests that run programmatically, but as a final check, that's really what I do every Monday: over my cup of coffee, I look at each individual document and make sure it's looking good. And that's the extent of it. Then we can just let the reports live there, share them, and revisit things on a quarterly basis, rather than having to hand-generate new documents every week or for every individual occasion.
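The programmatic checks mentioned above could be as simple as the sketch below; the function name, paths, and thresholds are hypothetical, not the initiative's actual tests. The idea is that after each weekly render on the EC2 instance, a script verifies that every memo was produced, is non-trivially sized, and is fresh, before the Monday eyeball pass.

```r
# Hypothetical post-render sanity check for the weekly memos
check_memo <- function(path, min_bytes = 10000) {
  info <- file.info(path)
  !is.na(info$size) &&                       # the render produced a file at all
    info$size >= min_bytes &&                # and it isn't an empty shell
    as.Date(info$mtime) >= Sys.Date() - 7    # and it came from this week's run
}

# In the weekly job this would run over every generated report, e.g.:
# reports <- list.files("reports", pattern = "\\.docx$", full.names = TRUE)
# stopifnot(all(vapply(reports, check_memo, logical(1))))
```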

On the part of the legal aid organizations, again, not having to do the analysis on our data themselves is a huge time saver. And the last thing is just increased conversation, which is really what our end goal is: to be able to talk to folks and get their insight into how the data can best help them. And that's definitely downstream of this.

So, as a recap: aggregating, cleaning, and standardizing this data was really just the first step in the process, but we thought it was the whole process. We thought that if we could just collect all of this disaggregated information, people would naturally be interested in it. And that was something we learned just isn't the case. Instead, we had to take a less-is-more approach, answering more superficial questions initially, but seeing that that would lead to increased conversation and more data sharing down the road. And in the end, this really improved engagement with and understanding of the data. So, thank you very much.

Q&A

Thank you so much, Logan. We have time for at least one question. So, what are your planned or aspirational next steps for data sharing with your stakeholders?

Yeah, that's an excellent question. I think the big thing is creating interactive visualizations in sort of a dashboard format. So we have our website, which you can see at the URL there, and we have these static memos. But both of those are very public facing, which can limit the amount of data that we really want to publish, because we don't want to just give out all of the data, for security and privacy concerns. And so the end goal would be to create a dashboard that allows specific folks, like these grantees (we call them grantees because we do fund them; that's the main work that LSC does), to access that information in a sort of more private way.