Mary Rudis | How R and Posit are revolutionizing Stats Education in Community Colleges | Posit

Talk from rstudio::conf(2019)

There is no doubt that Posit has had an impact on how introductory statistics is taught in colleges today. When we consider the sheer dominance that giants like Texas Instruments, IBM, and Pearson Publishing have had in academic curriculum development, it’s no small wonder that tools like R and Python have been able to gain a foothold. Projects like DataCamp, ModernDive.com, “Introductory Statistics with Randomization and Simulation” courtesy of openintro.org, Wickham’s “R for Data Science” and Peng’s “R Programming for Data Science” are great resources for the student who already has some fundamental math or statistical background and has become comfortable around computing and applications-driven computational exercises. But many of us know that Data Science cannot simply be relegated to the privileged few that stumble into it by virtue of circumstance.

My passion, and the purpose of my talk, is to provide educators with a digestible guidebook that would be appropriate for introduction to statistical concepts in high school, college, and under-resourced schools looking for ways to increase diversity in STEM. Organized in small, adaptable activities designed to be the amuse-esprit enticing both the timid and the skeptical to the proverbial banquet table that is Posit, this exploration into the world of statistics education should be of interest to a wide audience. My hope is to increase data literacy in real-world context, with primary emphasis on descriptive statistics and distributions.

About Mary Rudis: After graduating from Lehigh University in Bethlehem, PA with a degree in Mathematics, Mary began as a high school mathematics and computer science teacher, developing technology infrastructure for a small, private high school in Pennsylvania. Throughout her career, she has brought innovative approaches and enjoyed the role of trailblazer. Mary's most recent accomplishments as math department chair included developing mathematics curriculum and coordinating engineering, bioengineering and data science degrees at Great Bay Community College in Portsmouth, NH. Mary's primary interests are learning and instruction, developing data science curricula for two-year colleges and four-year liberal arts colleges, and working with area high school students in STEM at the University of NH Tech Camp each summer.

Transcript

This transcript was generated automatically and may contain errors.

Okay, good afternoon and welcome to Catching the R Wave, how R and RStudio are revolutionizing statistics education at community colleges and beyond.

So a little bit about me: I've been teaching for 25-plus years, with various hats that I've worn throughout those years. I've taught K through 12, I've taught at a four-year college, I've taught at a two-year college, I've taught computer science, I've taught mathematics at all of the introductory undergraduate levels, so it's a lot of teaching.

I spent five years at Great Bay Community College in Portsmouth, New Hampshire, and while there I was the chair of the math department and in charge of the engineering program, and I also created one of the nation's first certificates in practical data science at a community college.

Also I'm married to hrbrmstr, if you know him on Twitter.

So I have created a GitHub that contains all of the materials that are relevant to this talk. It's also a work in progress. I'm actually trying to put together some materials that can turn into one cohesive thing that a teacher can then just use as a handbook for any stats class and any textbook that they might choose to use.

So just a quick survey, raise your hand if you've either attended a community college or have taught at a community college. Wow, it's more than I was expecting. That's really cool.

Why community colleges matter

It was about half the room, so let's do just a real quick overview of just kind of a history of why I became passionate about community college education. Why is this such a big deal?

There are over a thousand community colleges in the United States. That's a map of them, made in R. Only 20 of them, maybe more, maybe a few more, we keep adding a couple every year, but it's under 20 that actually have a program right now, whether it's a certificate or an associate's degree or something, that's either called data science or data analytics. Compared to the over a thousand, we have a little bit of a ways to go.

As a comparison, we did not make this graph in R. This is copied off of a report that was put out by the government, but comparing two-year to four-year colleges, we have about 36% of all students who were undergraduates in the year 2016, 36% were in two-year colleges, which is more than a third, and it's kind of surprising to hear that number. Most people think that it's not quite that much, because we think about all the undergraduate colleges out there, you're like, well, it can't be, but it is.

As far as the numbers of students who receive bachelor's degrees, I think this was 2015-16. Of all the bachelor's degrees awarded in that year, 49% of those students had started out at a community college, 49%, almost half.

State by state, it does differ. The highest percentage is Texas, where we are right now, at 75%. They're really pushing community college education. The lowest is Rhode Island, 24%.

Why did I become passionate about this? Because what happens is a student comes out of high school, and even adults who might want to go back for a credential, but especially high school students, let's focus on them for a second. They start out at a community college, and they get a list of programs that they could sign up for.

Very few of them, or not many of them, are really comfortable with math or science. You have a few engineering people who just are determined to do engineering. That's kind of a unique crowd. Most students pick a program, and then they sort of stick with that. They're not going to diverge from that, hence the road.

What I would like to see is I would like to see more students exposed to data science in the community college setting, so that then when they transfer to a four-year college, they're sort of on a track to continue that kind of work, and maybe do research-driven work, and maybe choose a STEM major.

The practical data science certificate

I just wanted to show you the program sheet for the practical data science as it exists right now at Great Bay. Great Bay Community College is located in Portsmouth, New Hampshire. It's about an hour north of Boston. So I was able to use the fact that it's close enough to Boston as a sort of justification for creating the program.

All of those courses are liberal arts courses. They can transfer to any four-year college as a liberal arts course. I deliberately got them, you know, categorized as that.

So along with this work that I was doing to create the certificate, I was also thinking, you know, how can we make it easier for students to make the transition from Excel or from their Texas Instruments, whatever technology they're used to, how do we get them comfortable with R, especially in that the first course that I teach in the data science curriculum is called Elements of Data Science. And typically it takes three weeks out of a 15-week semester just to get students up and running in R, believe it or not.

So I'm seeing a lot of nods. Well, so what if we started introducing R in some of our other coursework, like even in a biology class or in a chemistry class or a physics class? And in my case, since I was in charge of the math department, I could point to the statistics class and tell all my statistics instructors, this is what we're going to do. And this is how we're going to do it.

And I'm with you, I'm supporting you, I'm going to teach you, we're going to work on this together. It's going to be, we're going to stumble a little bit. You know, the first couple of times we tried it, it was terrible. And especially here, because in 2014, that's when we started doing it. And the feedback we were getting from students was, oh man, you know, you're asking me to learn programming and statistics at the same time, and I can't do it. It's too hard.

But fast forward to 2018, 2019, and now I'm like, wow, how can we get R into the classroom? How can we get it into more classrooms? It's really cool. It's really working.

And just a shout out for RStudio Cloud here, RStudio Cloud has been invaluable for the Elements of Data Science class. I don't use it in the statistics class, and after the talk, if you would like to know why, I can explain.

But RStudio Cloud is used in our Elements of Data Science, and most of the program outcomes for that, or I mean the course outcomes, involve wrangling data, finding the data sets that will answer a question that you want to answer, and then coming up with that tidy data set at the end. And it takes a whole semester to do that, we'll just be honest.

Why change how we teach stats

Before we talk about how to incorporate R into the statistics classroom, let's talk a little bit about why. So here's an example that actually, this was a slide from a lecture that Pearson Publishing produces for instructors of just intro stats. So this is just a page out of a lecture for intro stats.

And you know, yes, you're going to learn something if you answer this question correctly, you've learned something. But have you learned what we want you to learn? I mean, what have you learned? You've learned how to use a formula, how to convert from an X to a Z value, so I know how to use a formula. And you've learned how to use the Z table. That's it.

There's no kind of questioning here that involves any kind of deep learning, or any kind of analysis, like why are we getting what we're getting? You know, just nothing like that.
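For readers who have not seen this kind of exercise, the two steps described above (convert an x value to a z value with the formula, then look up an area in the z table) can be sketched in a few lines of base R. This is only an illustrative sketch: the mean, standard deviation, and x value are made-up numbers, not the ones on the slide, and `pnorm()` stands in for the printed table.

```r
# Hypothetical values for illustration only; the slide's actual numbers
# are not given in the transcript.
mu    <- 100   # assumed population mean
sigma <- 15    # assumed population standard deviation
x     <- 130   # assumed observed value

# Step 1: convert x to a z value with z = (x - mu) / sigma.
z <- (x - mu) / sigma

# Step 2: pnorm() gives the area to the left of z under the standard
# normal curve, which is the number a printed z table provides.
p_below <- pnorm(z)
p_above <- 1 - p_below

cat("z =", z, "\n")
cat("P(X < x) =", round(p_below, 4), "\n")
cat("P(X > x) =", round(p_above, 4), "\n")
```

Because the lookup is one function call, class time can go to changing `mu`, `sigma`, or `x` and asking why the area moves the way it does.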

So the math department at Great Bay Community College, namely myself, and just so you know, I'm a collaborative leader. I was not just somebody who said, we're going to do this. We all met together, I met with all of my adjuncts, and I said, okay, here's what we're faced with, this is what I'm thinking about, what do you think? And just getting everybody on board with it is a big, big, big deal. Getting your stakeholders and the people that you're going to be asking to do all of the work, it's important to get them on board and to get them enthusiastic about it, or you're not going to go very far.

But anyway, we decided to make technology, period, a big part of the curriculum throughout the math department. So one of my favorite courses to teach is finite math. You do input-output analysis. There's these things that you can actually do in R, in a finite math class, it doesn't even have to be statistics. And we were doing matrices in R. I mean, you know, it makes sense.
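As a rough idea of what input-output analysis with matrices can look like in R, here is a minimal sketch of a two-sector Leontief model. The consumption matrix `A` and the demand vector `d` are invented for illustration and are not taken from the course materials; the model assumed here is the standard textbook one, where total production `x` satisfies x = A x + d.

```r
# Hypothetical two-sector economy (values invented for illustration).
# A[i, j] = units of sector i's output needed to make one unit of sector j.
A <- matrix(c(0.2, 0.3,
              0.4, 0.1), nrow = 2, byrow = TRUE)

# External (final) demand for each sector's output.
d <- c(10, 20)

# The Leontief model x = A x + d rearranges to (I - A) x = d.
# solve(a, b) solves the linear system a %*% x = b directly.
x <- solve(diag(2) - A, d)

print(x)

# Check: production must cover internal use plus external demand.
print(all.equal(as.vector(A %*% x) + d, x))
```

Using `solve()` on the system directly avoids forming the inverse of `I - A` by hand, which is the tedious part of the calculator version of this exercise.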

Whenever a student comes up to me and says, oh, it's too hard, I just point to this. I'm like, and that's easy? Tell me what about this is easy? And what about it is, okay, I've got this tiny little thing. If I happen to be sight-impaired, there's no way that I'm going to be able to get to read anything off of that tiny little screen on that Casio. If you've never seen one of those Casio calculators, I should have put a picture up for you.

Whenever I show instructors, when I am in a room with a bunch of community college instructors and I show them this, and usually, I'm going to be honest with you, I usually go to that rdrr.io. I don't know, it's a snippets thing. So I usually just go there, and I copy and paste this in, boom, here comes the output. And I'm like, you can do this. You can do this.

And usually, the instructors are, wow, can I do that on my phone? And I say, well, yeah, because you have a browser on your phone. If it's a smartphone, you have a browser. So you bring up a window that actually has that executable code, a window in it. Just copy and paste, and yeah, of course. So they're amazed. And they begin to buy into this idea.

And I'm also going to admit, this is an actual image of a spreadsheet template that I used in my intro stats class. Yeah. Apologies to anybody who had me at that time, because yeah, so I mean, you know, it was designed so that you could just put in the values that you needed to, and out would come all of the values, right? So you didn't have to use a table.

And I'm sorry, but after that talk this morning, you heard the keynote, right? This is programming, OK? I just want to put that out there. This is programming, all right?

The same output in R, with only a few lines of code, basically is just, and you'll see this in my GitHub, the code for this is in the GitHub. It's just amazing when I show the contrast with educators about how much more useful you can get, or more use you can get with this kind of output.

Shiny apps and StatPrep

And then, so here I was, at that point, you know, when we had the cloud, and I was using just these little snippets of code, I was all excited, and now all of a sudden Shiny is here, and now I have to admit, I haven't played with Shiny yet, but I'm getting the point across from many people that I need to. So everyone's telling me, yeah, yeah, you really need to.

But it's actually going to be helping in this whole effort to try to get community college instructors on board with us, because Cal Poly, Penn State, and I'm sure there's more, they have published Shiny apps that are specifically for the intro stats class.

And then, in addition to that, StatPrep is an effort that began as a collaboration between the Mathematical Association of America, and then this organization called AmStat, which is American, no, sorry, AMATYC, which is American Mathematical Association of Two-Year Colleges, AMATYC. And then also NSF is supporting the funding for this, and ASA as well is involved.

And what are they doing? They are, they have these traveling workshops, or workshops that are at various locations around the country. Anyone can apply if you are an undergraduate educator. You don't have to be full-time, you can be part-time, you can just apply. And look to see if there's a workshop that's somewhat close to you, but, you know, definitely, you can actually get a whole group of people to go, like, that's what they want. They want, they want, like, colleges to send, like, a whole team of people, of educators.

And I sort of put my own text over there so you could read it. The next webinar is coming up. It's this Friday, January 25th, and it's about multivariable data, introducing students to multivariable data, students who've never seen it before. So, you know, it's a really good effort that Katie is doing, and, you know, if you go under the about and the people, like, it's all on my GitHub as far as who's involved and how you can get involved to help as well.

One of the Shiny apps that the StatPrep team uses is, this is just an example for you, where they use the Shiny apps for their training. So they're actually showing instructors what a function is, what parameters are. You know, what are these things inside of the parentheses? And, like, seriously, if you've ever used Excel, like, there's a natural connection between what goes on with functions in Excel and what goes on with functions in R. It's not that difficult to make that transition.
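To make the Excel-to-R connection concrete, here is a small sketch that pairs a few base R calls with the Excel formulas they resemble. The data and the particular pairings are chosen for illustration; they are not taken from the StatPrep app itself.

```r
# Made-up data, standing in for cells A1:A5 in a spreadsheet.
scores <- c(72, 85, 90, 66, 78)

# A function name followed by its arguments in parentheses, as in Excel.
avg <- mean(scores)                     # Excel: =AVERAGE(A1:A5)

# Arguments can be named, which documents what each value means.
rounded <- round(3.14159, digits = 2)   # Excel: =ROUND(3.14159, 2)

# Several named parameters: the value, then the mean and sd of the curve.
p <- pnorm(130, mean = 100, sd = 15)    # Excel: =NORM.DIST(130, 100, 15, TRUE)

print(avg)
print(rounded)
print(round(p, 4))
```

In both tools the pattern is the same: a function name, then the inputs inside the parentheses. R adds the option of naming each input.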

Also, the American Statistical Association has provided materials as well. There's data. If you're looking for data, go there.

A call to action

Here's what we need. We need your help. Community colleges need your help. If you've ever been in a community college, you know. We can make a difference, and we can make data literacy a reality in our culture. We want to make smarter, better educators. Go out there to your local community college, make a relationship with the instructors there. Start a relationship. Start a conversation. Find out what their needs are.

Whole lot of educating going on. We want to make all these dots programs with something relating to data science or some kind of intro. And if you're a developer, make apps that are available for any device, any device.

Bottom line, is this the R experience we want or this?

Thank you for having me. I want to thank all of these people that reached out to me, and special thanks to my husband who seriously really helped me prepare this talk. There's my GitHub link again if you want to take a picture of it. And here's where you can find me. So thank you.