Carrie Wright - Supporting Social Good Through Community-Based Data Science Education

In this data-centric era, the demand for responsible data science practitioners is more crucial than ever. However, many data science education programs don’t adequately emphasize data ethics. To address this need, my colleagues, Ava Hoffman, Michael Rosenblum, and I have developed a course at Johns Hopkins, offering students hands-on experiences collaborating with community-based organizations on diverse data science projects. We've partnered with organizations championing various causes, including youth leadership, voting rights, transportation advocacy, and community tool banks. We've gained valuable insights about hands-on data ethics education and demonstrated that even data science education itself can support social good. Talk by Carrie Wright. Slides: https://bit.ly/posit_community_data_science

Oct 31, 2024
20 min

Transcript

This transcript was generated automatically and may contain errors.

Hello, everyone. I'm a data science educator at Fred Hutchinson Cancer Center and Johns Hopkins, and today I'm going to talk about supporting social good through community-based data science education. So I think this crowd can agree that data science is very empowering and we're all excited about that. But something that we may not think about as often is how society wants us to do responsible data science. An example of this is the executive order on safe AI from October 30th, 2023, which said that AI holds extraordinary potential for promise and for peril.

And so for me, data science is really a multifaceted thing and ethics is one of the core components of that. But despite that, studies show that we don't actually support learners that much when it comes to ethics. So this study was looking at undergraduate programs that support data science, and very few programs have much, if any, training on ethics.

And on top of that, there's very little hands-on data ethics training.

So this creates a chasm in terms of where learners start and where we actually need them to go. And I believe that training is really how we can get them there. So one of the great powers of data science is to tell people compelling stories.

And community organizations have great stories to tell, but they don't necessarily have the data science skills or personnel to support them to do that in an optimal way.

So for example, a program that supports youth can say simple statements about how they do that, but backing it up with quantitative and qualitative data, like "giving $200k to youth" or "HeartSmiles saved my life," that tells a better story.

And data stories are empowering to community-based organizations because they can help them justify additional funding, encourage more participants to be a part of the organization, advocate for the change that they're trying to support, and they can help them to identify internal improvements.

So CBOs similarly start with this chasm between where they are and where they need to be. And data storytelling can go wrong. We know that we can distort data and tell things in a way that we maybe shouldn't. So here's an example of a comparison between two towns looking at the percentage of population with access to safe drinking water. And this truncated axis on the y-axis makes it seem like there's a much bigger change or difference between these two towns.
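The truncated-axis effect can be put in numbers. The following is a minimal sketch, not something shown in the talk: the two percentages and the axis minimum are hypothetical values chosen for illustration, and `bar_height_ratio` is a helper name introduced here. It compares how tall the two bars are drawn relative to each other when the y-axis starts at zero versus when it starts just below the smaller value.

```python
def bar_height_ratio(a, b, axis_min=0.0):
    """Ratio of the drawn bar heights for values a and b
    when the y-axis starts at axis_min instead of zero."""
    if axis_min >= min(a, b):
        raise ValueError("axis_min must be below both values")
    return (b - axis_min) / (a - axis_min)


# Hypothetical percentages with access to safe drinking water
# (illustrative only, not the values from the slide).
town_a, town_b = 95.0, 98.0

# With a zero baseline, bar heights are proportional to the data.
honest = bar_height_ratio(town_a, town_b, axis_min=0.0)

# With the axis truncated at 94%, the same data looks far more different.
truncated = bar_height_ratio(town_a, town_b, axis_min=94.0)

print(f"zero baseline:  town B bar is {honest:.2f}x the height of town A")
print(f"truncated axis: town B bar is {truncated:.2f}x the height of town A")
```

Under these assumed numbers, a difference of three percentage points is drawn as one bar four times the height of the other, which is the kind of distortion the slide illustrates.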

The Baltimore Community Data Science course

So we were left with maybe wanting to try an experiment. How could we train students in more ethically conscious practice and also empower community organizations? How could we get community organizations with data science students to work together to tell data-empowered stories?

So we embarked on this experiment, myself and Ava Hoffman, who's in the audience, and another colleague, Michael Rosenblum, to create a course called the Baltimore Community Data Science Course, or lovingly BCDS, at the Johns Hopkins Bloomberg School of Public Health. And we did this in partnership with SOURCE, which is the Community Engagement and Service Learning Center at Hopkins, which is focused on doing engagement that actually strives for social change, authentic relationships, and redistributing power.

So Michael and I got specialized training that we continue to get from SOURCE to really learn how to engage with the community in a mindful way. So the idea was we wanted to do good while teaching the students to do good. So work with a community-based organization to help them with a data science project, but also have the students learn to perform data science in a more intentional and reflective way.

So our philosophy was that the community would come first, and data science would come second. And ultimately, we would empower those organizations to tell their own data science story.

And we wanted this to be a reciprocal exchange where students respected and recognized the CBO's specialized knowledge and experience, and to challenge our own assumptions when working with them. There were a lot of benefits to the students, the idea would be that they would contribute to some meaningful project, they'd have a data science product that they could put on their CV, they'd have hands-on experience in a different context, and they'd get training not only in data science, but also in data ethics.

So this was not your typical data science class in many ways. So James really nicely described the threshold concept, and indeed this is one of those times where people had to experience this. They needed to learn some lessons. It would cause some discomfort, and it would create a lasting shift in their understanding that they could not unlearn. So we had them at the edge of their comfort zone, and they did get uncomfortable.

The structure was typically one instructor with three to four students in a group working with one community-based organization. So we had a lot of oversight from the instructor to make sure that things went appropriately. And we had a variety of different kinds of students. We had biomedical engineering students, applied math and statistics, public health, and biostatistics students. And the students were vetted. We had to find out what their data science expertise was beforehand. And of course, people had different types of expertise, so that also posed a challenge.

And again, this was not your typical data science class. So we had them do a lot of writing and a lot of discussion, which was uncomfortable for some of the students.

Course structure and preparation

So this is an actual slide from one of our slide decks, where on the first day, we tell them that this will feel very different. And the class structure looks like this. It's a 16-week course, and we spend a lot of time preparing the students to work with the community before we let them do so. Then we have them work on the project and have them think about sustainability.

So the first week, we talk about the historical and current context of Baltimore and Johns Hopkins and the community. It's important that the students learn about the context of who they're working with. Then we have them learn about critical service learning, the idea that we're not doing typical community service. This is a different type where we're really reflective on our potential impact long-term. And this involves critical reflection. So they need to learn this process of considering our perspective and assumptions and where it comes from. And sustainable design. We want to create change that lasts. We don't want to create a product for a CBO, leave them, and they can't use it.

And also think about things like data privacy, security, transparency, so practices that would support the students to do this in a safe manner.

And then, and only then, are the students allowed to work on the projects. So this is really about making a commitment because we have to produce something that's useful for the community organizations. And so we spend a lot of time talking to the organizations ahead of time to make sure that this is actually feasible. And we've done this for two years. We've worked with a variety of organizations. The first year, we worked with a program called HeartSmiles, which does leadership opportunities for underserved youth. The No Boundaries Coalition, which was working on a narrative on gun violence. And the Baltimore Transit Equity Coalition, which works for equitable access to public transit. In the second year, we again worked with HeartSmiles. We also worked with the Tool Bank in Baltimore, which provides access to tools for community projects. And the League of Women Voters, which works to enhance voter participation.

So through this process, students got to learn not only hard skills, but also soft skills. They worked on teamwork, organization, problem solving. They worked on a variety of different hard skills like text extraction, sentiment analysis, data visualization, Shiny dashboards, you name it.

HeartSmiles example

So as an example, with the HeartSmiles program, the woman that runs the program gets a lot of feedback from the HeartBeats, which are the youths in the program. And she has a lot of testimonials in those text messages and was describing this to me, but didn't necessarily realize that this was a potential source for data. So we discussed that she could take screenshots of them, and we could extract the text and do sentiment analysis.

And through that process, we created a testimonial database, which is very useful for them. And so now they can easily find the information that they're looking for. And so now they can easily find the testimonials that they might want to share. And they can have numbers like this to show how many positive testimonials they have. And we didn't tell them to add this to their website. This is something they did on their own, which was really exciting.
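As a rough illustration of what a testimonial database with sentiment labels might look like, here is a minimal sketch. It is not the course's actual pipeline: it assumes the text has already been extracted from the screenshots, it uses a tiny hand-written word list in place of a real sentiment model, and every name in it (`POSITIVE`, `NEGATIVE`, `score`, `label`, `database`) is hypothetical. Only the first testimonial is quoted from the talk; the other two are invented examples.

```python
import re
from collections import Counter

# Toy sentiment lexicon (an assumption for illustration, not a real model).
POSITIVE = {"saved", "love", "great", "helped", "amazing", "thank", "thanks"}
NEGATIVE = {"bad", "hate", "worse", "boring", "unhelpful"}


def score(text):
    """Count positive words minus negative words in a message."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def label(text):
    """Map the numeric score to a coarse sentiment label."""
    s = score(text)
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "neutral"


# Text assumed to be already extracted from message screenshots.
testimonials = [
    "HeartSmiles saved my life",   # quoted in the talk
    "The workshop was boring",     # invented example
    "See you on Tuesday",          # invented example
]

# A very small "database": one record per testimonial, searchable by label.
database = [{"text": t, "sentiment": label(t)} for t in testimonials]
counts = Counter(row["sentiment"] for row in database)

positive_only = [row["text"] for row in database if row["sentiment"] == "positive"]
print(counts)
print(positive_only)
```

Filtering on the `sentiment` field is what makes it easy to pull out testimonials to share, and the counts give the kind of summary number an organization could put on its website.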

Five major lessons

So through this process, we learned five major lessons. The first one is that context really matters, especially in a place like Baltimore. There's a long history of structural racism, and that impacts the way things work today. And our students need to know that. For many of our students, it's the first time that they're learning about it. They may be from outside the United States, or they may be from some part of the U.S. that has a different experience. And this context leads them to more appropriately work with our CBOs. So never assume that your students already know this information.

That's the takeaway here. Number two is that people come first. So data science products need to meet people where they are, and they need to meet their actual need. And it's about what's useful to the CBO. It's not about what's cool for the students to work on, which was also a source of tension. So students often wanted to create something more exciting that would involve more infrastructure that the CBOs would need to learn. So for example, developing an LLM is not necessarily the first thing that's a priority for the CBOs.

So this requires flexibility, and we have to be flexible and ready for changes.

Next is critical reflection. So we taught the students to reconsider, question, challenge why things are the way that they are. Why do we need these CBOs to even be working on the problems that they're working on? And consider how our perspective came to be what it is, and how the CBO's perspective came to be what it is. And ultimately recognizing and respecting that we're never fully going to understand one another, but striving to is beneficial.

So because we're working with STEM students, and we ourselves are also from STEM, we like things to be kind of concrete. And we were working with some people that had trained us about critical reflection in a much more mushy kind of way. And so we decided to make a framework that would be more comfortable for the STEM students. And it's here if you're interested in using it. The idea is that we would first have the students think about what was resonating with them. What do they feel curious about? Second step is to think about the context. That could be from time, space. Consider different viewpoints. How might different people think about this? Assess how changes may have happened over time. Challenge assumptions. And then finally evaluate both yourself and how others are experiencing this.

And as an example of one thing that we did is we have them reflect on their reflections after we learn this process with them. And so really learning critical reflection is actually a process. And they find that activity of reflecting on how they initially did such a thing very meaningful.

Number four is that challenges will happen. Mistakes will happen. Anytime you're working on anything particularly sensitive, mistakes will happen. And it's about mitigating them and preventing them. And of course, taking responsibility and teaching those lessons to students is really valuable. So one of our examples is we are working with these underserved youth and teaching them about data science. And we had an experience where some jargon came up that could be unexpectedly offensive, which we did not realize. But luckily, we were working with the CBO, and they were checking all of our materials ahead of time. And we learned that the abbreviation for POE, which is one of the platforms for AI, could be an offensive term.

Next is that sustainability and impact matter. So much like a trail, we want to leave things better than we found them. So we want to enable people to continue without us as well as possible. So as an example, we helped make a mock-up website and said, you know, this is something you could do. Now we have all this data that you could work with. And then the organization took it and did a really incredible job. And so we want to leave space for the CBO to use that data themselves.

Resources and essential requirements

So if you're excited about this and you're interested in this idea, we have a lot of resources that can help you. But I'd also like to leave you with some essential things that are really required. So first, we need to take a lot of time to work with the CBOs to identify their data needs and figure out where their data is, because sometimes they don't necessarily know. And it takes time getting that data. Sometimes there's data privacy or security issues, and you need to be ready to adapt to adjustments.

Next, we always have to work with CBOs consciously and remember what they're doing comes first. They have a mission, and we don't want to interfere. We want to support that. So their schedule might change based on what they need to do. And we need to be flexible with that and honor that. We also need to honor their perspective and sometimes gently challenge when we recognize that there's a problem, like for data privacy, for example. And we need to, again, remind students about being flexible, which we did several times.

It's also important to have instructor oversight. So we had one person for each of our teams, and I highly recommend that. And most importantly, instructors need specialized training like from SOURCE. It needs to be continued because we need to be reminded ourselves. And we need those additional experts at CBOs to help check our work.

The most exciting things were our reflections. The CBOs recently that I've been talking to who might work with us have said, "I can't believe I'm so excited about a conversation about data." And the students have said, "This class has broadened my academic perspective in spades. I've done more reflection in this class than I've probably done in my entire academic career to date, which I find both wonderful and horrifying."

So my call to action here is that we need more responsible data scientists. And so we can all do more. We can consider how our data science teaching can inform students more broadly. We can use examples and data sets that inform our students. We can consider using community data to help with our teaching. We can consider teaching about reflection for our students. And we can always reflect more on our own work.

Q&A

So we definitely have questions from the virtual audience. So the first one, this seems like work you care a lot about and also a lot of work for you organizers. So how do you protect yourself from overextension and burnout?

Oh my gosh, starting with the hardest question. Absolutely. We really definitely care about this work. It's probably one of the most meaningful things I've ever worked on. And so in that respect, it gives back to me. I feel energized by it, but I do have to be careful and think like, oh, because sometimes I'm like, I can make this so much better. I could spend six hours on this. That's fine. And I have to remind myself, no, I let the students do their thing and the CBOs, I have to have a time limit and that's as good as we get.

Awesome. And it does look like a lot of content for 16 weeks. Do you have any tips on how to decide what you include and what you don't include in that time period?

Yeah, that's a great question too. It is very challenging. Certainly we prioritize the reflection and the learning exactly how we want to engage with the community. The data science components, we have a big resource on our page, which you can see if you go to our website. So people can access resources specific to their project. So not everyone is making a Shiny app, for example. And if they are, they can find the resources for that.

Right. A bit about like the operational aspect about students entering the class. So how do you assess students' data science skills before the class?

Yeah. So we have to talk to each one of them individually. Many times it's not that hard though, because they can send their transcript or their CV and we can see the type of work that they've been doing. And it's easy to mark them off as, yeah, you're good to go. But there are a few students where you have to discuss with them and find out a little bit deeper.

One more question here. So you mentioned talking to CBOs about delivering something useful to them. That's an important aspect of what you're doing. So these students, I'm assuming that they don't always have too much industry experience when it comes to data science. So how do you take that concept of delivering something that's useful to the stakeholder and really get that message across?

Yeah. So Ava has industry experience and we teach them about the minimum viable product. That's one of our early lessons. And explain that, you know, we focus on that. And then once we have that, we can maybe add things later, but, you know, we have to just focus on first getting that minimum thing done.

Yeah. It's really, it really resonates with me. It's something I'm always learning and, yeah, continue to learn throughout the career. So, yeah. Thank you so much. Great talk.