Resources

Mine Çetinkaya-Rundel | Feedback at scale | RStudio

As enrollments in statistics and data science courses grow and as these courses become more computational, educators are faced with an interesting challenge -- providing timely and meaningful feedback, particularly with online delivery of courses. The simplest solution is using assignments that are easier to auto-grade, e.g. multiple-choice questions or simplistic coding exercises, but it is impossible to assess mastery of the entire data science cycle using only these types of exercises. In this talk I will discuss writing effective learnr exercises, providing useful and motivating feedback with gradethis, distributing them at scale online and as an R package, and collecting student data for formative assessment with learnrhash.

About Mine: Mine Çetinkaya-Rundel is a Professional Educator and Data Scientist at RStudio, as well as a Senior Lecturer in the School of Mathematics at the University of Edinburgh (on leave from the Department of Statistical Science at Duke University). Mine's work focuses on innovation in statistics and data science pedagogy, with an emphasis on computing, reproducible research, student-centered learning, and open-source education, as well as pedagogical approaches for enhancing retention of women and under-represented minorities in STEM. Mine works on integrating computation into the undergraduate statistics curriculum, using reproducible research methodologies and analysis of real and complex datasets. She also organizes ASA DataFest and works on the OpenIntro project. She is also the creator and maintainer of datasciencebox.org, and she teaches the popular Statistics with R MOOC on Coursera.

image: thumbnail.jpg

Transcript

This transcript was generated automatically and may contain errors.

Hello, I'm Mine Çetinkaya-Rundel. I'm a data scientist and professional educator at RStudio, as well as faculty at the University of Edinburgh and Duke University. If you're also an educator teaching data science, you might find yourself in the following scenario, particularly this year.

Your department chair says, "Your data science course is going spectacularly. We'd love to expand it," and you say a big "Yay!" But then they say, "Oh, and you're going to need to do it all online," and you say, "Yay?" And they follow up with, "And we can't provide any additional support; you get the same number of TAs as before." And you start thinking to yourself, am I supposed to say yay?

As enrollments in statistics and data science courses grow, and as these courses become more and more computational, educators are faced with an interesting challenge: providing timely and meaningful feedback, particularly with online delivery of courses. I'm sure we all agree that feedback should be meaningful, but traditionally, meaningful, helpful, constructive feedback requires human effort and can take a significant amount of time for large courses, especially if they're under-resourced, which, let's face it, they tend to be. Timeliness of feedback is just as important as meaningfulness. The more time that passes between when a student turns in an assignment and when they get feedback on it, the lower the utility of that feedback.

For certain assignments, like an open-ended project, this trade-off is absolutely reasonable, because it's really difficult, if not impossible, to replace human feedback with something else in such an assignment. But for others, there are alternatives.

Introducing learnr tutorials

One such option is a learnr tutorial, and if you've ever used learnr before, this tutorial probably looks a little different from what you're used to. That is just to say that with a little bit of CSS and theming, you can make your learnr tutorials look however you like. I like starting my learnr tutorials with a bit of narrative, usually a little longer than this, about the data and the analysis we're going to work through, and introducing the students to the packages. I like using the progressive reveal option, so they have to deliberately click through the material. I also like providing some ready-to-run code at the beginning, so they don't need to start working on exercises right away, but they do need to start interacting with the document.
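The options mentioned here all live in the tutorial's YAML header. A minimal sketch, assuming file names of my own choosing for the title and stylesheet; `progressive` and `allow_skip` are learnr's own options for progressive reveal:

```yaml
---
title: "Airbnb listings in Edinburgh"
output:
  learnr::tutorial:
    progressive: true      # sections unlock as students click through
    allow_skip: true       # but students aren't forced to finish each exercise
    css: "css/custom.css"  # hypothetical stylesheet for custom theming
runtime: shiny_prerendered
---
```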

Here they're working on a dataset on Airbnb listings in Edinburgh, and the first thing we're asking them to do is to take a look at the variable names. It's not necessarily meaningful interactivity, but it introduces them a little bit to the structure of a learnr tutorial and how to interact with it, and it potentially also helps build a little anticipation around what the result is going to look like when they hit Run Code, as opposed to providing them with a static document where the code output is already there. But where learnr tutorials really shine is in these coding exercises, where the students can try out some code, submit their answer, and you can provide custom feedback to them.
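In the tutorial's R Markdown source, that ready-to-run code is simply a pre-filled exercise chunk. A sketch, where the chunk label is my own and `edibnb` is assumed to be the Edinburgh Airbnb dataset used in the talk:

````markdown
```{r view-names, exercise = TRUE}
# Pre-filled, ready-to-run code: students only need to hit "Run Code"
names(edibnb)
```
````

Setting `exercise = TRUE` is what turns an ordinary chunk into an interactive code box with Run Code and Submit Answer buttons.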

So the first exercise here asks how many Airbnb listings are included in this dataset. Let's imagine the student decided to look at the number of columns as opposed to the number of rows. The feedback they're going to get sounds just like the kind of thing I would say if I were working through the exercises with them and trying to nudge them in the right direction: "Did you calculate the number of columns instead of the number of rows?" So you can write very precise feedback messages based on the types of mistakes you anticipate students making.

And obviously, it's not possible to anticipate all possible mistakes. So what if a student tries out some code that I didn't anticipate? For that, I like providing some canned feedback that's still constructive and still nudges them in the right direction. It says something like, "Each observation is represented in one row. Can you remember which function we used to calculate the number of rows?" And finally, let's get this question correct. I'm going to use the nrow() function to do that. Let's run our code and submit our answer. You'll see that we've got the green banner, which is great, but it doesn't just say "Correct, well done." It says a little bit more. So I like using the space for the correct-answer message to give them a little more information about what's to come next.
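Both kinds of feedback can be sketched with gradethis in the exercise's `-check` chunk. This is only a sketch: the chunk labels and messages are mine, `edibnb` is assumed to be the dataset from the talk, and `pass_if_equal()`, `fail_if_equal()`, and `fail()` are gradethis helpers that compare the student's result against values you specify (by default, against the `-solution` chunk):

````markdown
```{r listings, exercise = TRUE}

```

```{r listings-solution}
nrow(edibnb)
```

```{r listings-check}
grade_this({
  # Anticipated mistake: counting columns instead of rows
  fail_if_equal(ncol(edibnb),
    "Did you calculate the number of columns instead of the number of rows?")
  # Correct answer: the student's result matches the -solution chunk
  pass_if_equal(message = "Correct! Next, let's think about what each row represents.")
  # Canned catch-all for mistakes we didn't anticipate
  fail("Each observation is represented in one row. Can you remember which function we used to calculate the number of rows?")
})
```
````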

I also like following my coding exercises with some multiple choice questions, something that allows the students to take the code output they saw and put it in the context of the dataset they're working with, which is so important for teaching statistics and data science. So what does each row in the edibnb dataset represent? That's an individual Airbnb listing. And again, we can use the feedback to give them a little more information or something to think about.

If you would like to see the code for this learnr tutorial, both in terms of creating the exercises and also the theme and the look, you can test drive it on RStudio Cloud or you can view the code on GitHub. And I'll be providing links to all of this at the end of the talk. For now, I'd like to talk a little bit about writing effective exercises.

Writing effective exercises

So let's start with an exercise in mind: the three most expensive neighborhoods in terms of mean nightly price are New Town, Old Town, and West End. Calculate the median number of reviews in these neighborhoods and arrange them in descending order. One option would be to provide no scaffolding whatsoever, an entirely empty canvas for students to work with. The thing about this is that we've really not given them any direction on how to get started. Instead, what we've given them is this giant button that says Solution. And it's so tempting to simply go, all right, let me take a look at the solution, copy it to my clipboard, paste it, run my code, and submit my answer. And lovely job, I got it right. But did I learn anything from this experience? Not really.
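For reference, one possible solution to this exercise as a dplyr pipeline. The column names `neighbourhood` and `number_of_reviews` are assumptions about the dataset, and here I stand in a tiny mock tibble for the real data so the sketch is self-contained:

```r
library(dplyr)

# Mock stand-in for the Edinburgh Airbnb data (column names are assumptions)
edibnb <- tibble::tribble(
  ~neighbourhood, ~number_of_reviews,
  "New Town",     12,
  "New Town",     30,
  "Old Town",     45,
  "Old Town",      5,
  "West End",     20,
  "Leith",       100
)

edibnb %>%
  filter(neighbourhood %in% c("New Town", "Old Town", "West End")) %>%
  group_by(neighbourhood) %>%
  summarise(med_reviews = median(number_of_reviews, na.rm = TRUE)) %>%
  arrange(desc(med_reviews))
# With this mock data: Old Town (25), New Town (21), West End (20)
```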

This is not to say your students are all going to peek at the solution code. But one thing to keep in mind about providing no scaffolding whatsoever is that there are better venues for such exercises. Assignments where the students already have to work in the RStudio IDE, either in an R Markdown document or in an R script, are a better venue for a more open-ended question with no scaffolding.

Here we've given them no scaffolding, and we've also done something else: we're doing strict code checking. So let's say that your student, with no scaffolding provided to them, actually decided to move the filter. They group by and summarize first, and then they filter. If they run their code, they're going to get the exact same result, but their code is not marked as correct. That's because with the grade_this_code() function, what's happening under the hood is that the gradethis package compares the code your student wrote against the solution code you provided and looks for a one-to-one match. So it says, "I expected you to call summarize where you called filter. Give it another try." Sure, you might say, no, this is exactly how I want my students to do things, but I think that's not entirely fair, especially because they actually got the right answer. So strict code checking with no scaffolding whatsoever can lead to situations where students get the right answer but are not necessarily marked correct, if you will.
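Under the hood, strict checking is just a one-liner in the exercise's `-check` chunk; the chunk label here is my own:

````markdown
```{r top-neighbourhoods-check}
# Strict check: compares the student's code, piece by piece, against the
# -solution chunk, so an equivalent pipeline written in a different order fails
grade_this_code()
```
````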


Another option would be to provide a little bit of scaffolding, though here we're still doing the strict check. We've given them a really nice structure for how to write their code, and obviously we've made the exercise a lot easier for them, but that might be okay if this is one of the first few times they're seeing these multi-line dplyr chains. These learnr tutorials are a great place to provide additional opportunities for drill-type exercises students can work through when they are first introduced to a new topic. And you can also provide hints for them. Let's take a look at these hints. I like writing them in a progressive way, so students can scroll through and get a chance to think about what their answer should look like at each stage. And sure, they could see the solution at the end if you want them to, but at least it makes them think along the way as well.
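A sketch of what that scaffolding and the progressive hints might look like in the source. The chunk labels, the blanks, and the column names are illustrative; learnr picks up chunks suffixed `-hint-1`, `-hint-2`, and so on as progressive hints for the exercise chunk with the matching label:

````markdown
```{r top-neighbourhoods, exercise = TRUE}
edibnb %>%
  filter(___) %>%
  group_by(___) %>%
  summarise(___) %>%
  arrange(___)
```

```{r top-neighbourhoods-hint-1}
# Start by keeping only the three neighborhoods of interest
edibnb %>%
  filter(neighbourhood %in% c("New Town", "Old Town", "West End"))
```

```{r top-neighbourhoods-hint-2}
# Then group and summarise the number of reviews
edibnb %>%
  filter(neighbourhood %in% c("New Town", "Old Town", "West End")) %>%
  group_by(neighbourhood) %>%
  summarise(med_reviews = median(number_of_reviews, na.rm = TRUE))
```
````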

So let's go ahead and run this code and submit the answer. With this structure there is no worry that they would be putting their filter later on, because we had already provided the scaffolding. However, since we're doing a strict check, what if the student forgot to put things in descending order in the last line? Let's go ahead and run that code. Obviously this is not the correct answer, but the feedback we get tends to be not very informative when we use the grade_this_code() function here. It basically says, "I expected you to call desc() for descending where you wrote median rating." And in fact, here on line five, that statement is correct. But does it actually sound like the kind of thing you would say to your student if you were standing right behind them and trying to nudge them in the right direction? Not really. It doesn't seem like very human-friendly feedback.

Checking results instead of code

So my recommendation overall would be to provide some scaffolding and also some flexibility. This is another approach to scaffolding: perhaps if this tutorial comes later on, when they've already done a bunch with dplyr and you don't want to give them the whole structure, you might at least give them a little bit of scaffolding to say, hey, I want you to use a dplyr pipeline for answering this question. Let's give that a try. Here I am going to forget to put things in descending order and run my code. And let's take a look at the feedback I get. This time I actually get human-friendly feedback: "You've successfully calculated the median number of ratings, but did you forget to arrange them in descending order?"

How did I do this? Here, instead of checking their code, we're actually checking their results. And one nice thing about doing that is, well, let's go back and put the right answer here. What if they made that one change where they do group by and summarize first and then filter? Let's have them run the code and submit their answer. And in fact, this time it is marked as correct, because again, we're checking the results and not the code.

So what does this look like under the hood? It's a bit more work to write the code checking if you're using the grade_this() function, because you actually need to think about the various answers students might get and write some pass_if()- or fail_if()-type statements. Here I have one pass_if() statement that says if their result is this table in this particular order, we're going to say, yep, you did the right thing. But if they have put them in ascending order, then we're going to give them a specific message for that. Or if they have not ordered them at all, in which case the order will be the alphabetical order of the neighborhoods, we're going to give them another message. And you can imagine stringing along as many of these as you think are relevant for the exercises you develop. Then you probably want to write a catch-all statement that's like, "Not quite, take a peek at the hints." If the students are making mistakes that you can't anticipate when you write the tests, you can leave the hints to take care of that.
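Stringing those result checks together might look like this. A sketch, not the talk's exact code: the chunk label and messages are mine, and `solution_ascending` and `solution_unordered` are hypothetical objects you would compute in a setup chunk to hold the anticipated wrong answers:

````markdown
```{r median-reviews-check}
grade_this({
  # Correct: the table in descending order, matching the -solution chunk
  pass_if_equal(message = "You've successfully calculated and sorted the median number of reviews!")
  # Anticipated mistake: sorted ascending instead of descending
  # (solution_ascending is a hypothetical object built in a setup chunk)
  fail_if_equal(solution_ascending,
    "Close! Did you arrange in ascending rather than descending order?")
  # Anticipated mistake: not sorted at all, i.e. alphabetical by neighborhood
  fail_if_equal(solution_unordered,
    "You've calculated the medians, but did you forget to arrange them in descending order?")
  # Catch-all; the hints take care of mistakes we didn't anticipate
  fail("Not quite. Take a peek at the hints!")
})
```
````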

And if you use these tutorials over multiple semesters or multiple workshops, multiple learning or teaching engagements, you're probably going to learn about the various mistakes students make based on their questions. And then you might iteratively build these tests to capture more mistakes that they might make.

Collecting student data

So we've talked about writing these exercises. What about collecting data? learnr and gradethis alone don't give you a ready-made way to collect data from students. Technically, the learnr package makes this possible, but it doesn't itself offer the functionality out of the box. So for that, I use a different package called learnrhash, which hashes the student results and then allows you as the instructor to decode them. A setting where I might use this looks something like this, where I give them a coding exercise first. It says, make a histogram of the prices. So let's go ahead and actually do that. The dataset is edibnb. The variable is price. We're going to have them make a histogram.

And here are the labels x, y, and title. They go ahead and run their code and they get some warnings and submit their answer. It says you got it right. And note that there are a couple of warnings. We'll get to those in a little bit because here our focus is a little bit different. So we follow this on with some multiple choice questions. Which of the following describes the shape of the distribution? It's right skewed and unimodal. And let's go ahead and pick another answer for the second question. This one happens to be incorrect.
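The histogram code the student submits might look like this. The axis labels and title are assumptions, and the warnings mentioned are ggplot2's usual messages about the default binwidth and rows with missing values being dropped:

```r
library(ggplot2)

# Histogram of nightly prices in the Edinburgh Airbnb data
ggplot(edibnb, aes(x = price)) +
  geom_histogram() +
  labs(
    x = "Price per night",
    y = "Number of listings",
    title = "Prices of Edinburgh Airbnb listings"
  )
```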

Okay, so when it comes time to submit their answers with the learnrhash package, students can generate a hash, which is going to look very cryptic to them, which is exactly what it's meant to do. They can select it and copy it. And then I like embedding, directly in the learnr tutorial, something like a Google form or another form your institution might use, where the students can submit their information. If you're using a form your university provides, for example, it could ask them to authenticate first, if you want to make sure that whoever is answering the questions is actually who you think it is. And you can ask them to paste the hash they generated here. I also like using this form to collect some free-form information from the students as well, with a question phrased as, "Write about one or two questions you didn't get right initially, but were able to solve after a few tries. What was difficult about them? What did you ultimately learn?" And let the students write some free-form answers and submit their results.
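Wiring learnrhash into a tutorial is, as I read the package's documentation, a matter of adding its encoder chunks near the end of the document. This is a sketch; check the learnrhash README for the current interface:

````markdown
```{r context = "server"}
# Server-side logic that hashes the student's submitted answers
learnrhash::encoder_logic()
```

```{r encode, echo = FALSE}
# Renders the button and text box where students generate and copy their hash
learnrhash::encoder_ui()
```
````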

On your end, what you're going to see is the hash each student submitted. You'll be able to decode that and take a look at the multiple choice questions and the answers they gave for those, and at the exercises, that is, the coding exercises, and the answers they gave for those as well. For summative feedback, for grading, I usually tend to mark only the multiple choice questions, but you could choose to do both.

These data are collected in a Google spreadsheet, so you can view the spreadsheet itself. I've pre-populated it with some mock student data. What you can do, using the googlesheets4 package, is read that data and calculate the students' scores by matching their answers to a key, and then also take a look at the free-form answers. I like looking at bigrams, for example, to see which concepts were mentioned many times as being difficult for the students, so you might tailor your next lesson accordingly.
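A sketch of that workflow with googlesheets4 and tidytext. The sheet URL and the `reflection` column name are hypothetical, and the hash-decoding step is left to learnrhash's own helpers:

```r
library(googlesheets4)
library(dplyr)
library(tidytext)

# Read the form responses collected in the Google Sheet
# (hypothetical sheet URL)
submissions <- read_sheet("https://docs.google.com/spreadsheets/d/...")

# Decode the hashes with learnrhash and score the answers against a key
# (see the learnrhash documentation for its decoding helpers)

# Bigrams in the free-form reflections: which concepts keep coming up
# as difficult? (reflection is a hypothetical column name)
submissions %>%
  unnest_tokens(bigram, reflection, token = "ngrams", n = 2) %>%
  count(bigram, sort = TRUE)
```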

Distributing tutorials at scale

Lastly, let's talk about distributing these at scale. One option, obviously, is to deploy them to something like shinyapps.io or another Shiny server from the RStudio IDE with push-button deploy, just clicking on Publish. It works out of the box. It's great. But something you might want to be careful about is that if you have a large class and a strict deadline, many students might try to access the tutorial at the same time. So you've got to make sure that the parameters for your deployment are set up properly to handle that. You might want to give it more memory. You might want to create more instances. And at some point, if you have a really large course and you expect students to be doing things at the same time, you may need to think about using one of the paid tiers of shinyapps.io, and that may or may not be feasible for your setting.
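Push-button deploy is equivalent to calling rsconnect from the console; memory and instance settings are then adjusted per application in the shinyapps.io dashboard. The directory name here is an assumption:

```r
# Deploy the tutorial's directory to shinyapps.io (or another Shiny server)
rsconnect::deployApp("tutorials/edibnb")
```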

Another option is to distribute them within a package. Here I'm going to give an example from the dsbox package, which accompanies the Data Science in a Box curriculum. And yes, I am saying that you can make an R package, and you might be new to doing that, so you can use this as a skeleton for doing so. The thing you really want to focus on is that the tutorials go in the inst folder of the package, in a subfolder called tutorials. Inside that is a single folder for each one of the tutorials. This package happens to have eight of them, and inside each is the R Markdown file where you developed your tutorial, plus any accompanying files that go along with it. Chances are you may not want to put this package on CRAN, maybe it's not for wide distribution, but you can ask your students to install it directly from GitHub, or if you're using something like RStudio Cloud, you can pre-install it for them. Within the RStudio IDE, the tutorials will be ready to launch as soon as the students start that RStudio session. I usually tell my students, once you launch the tutorial, just maximize that tutorial pane, and it looks just like interacting with it in the web browser. But now you're not relying on a service like shinyapps.io to host your tutorial, so it really doesn't matter how many students are using it at the same time, as long as each student's RStudio session allows it.
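The layout described above, plus how students would install and launch a tutorial. The GitHub path is my assumption for the Data Science in a Box repository, and the tutorial folder name is illustrative:

```
dsbox/
├── DESCRIPTION
├── R/
└── inst/
    └── tutorials/
        └── edibnb/
            ├── edibnb.Rmd
            └── (accompanying CSS, images, data, ...)
```

```r
# Install the package, tutorials included, directly from GitHub
# (repository path is an assumption)
remotes::install_github("rstudio-education/dsbox")

# Launch a tutorial in the RStudio Tutorial pane or the browser
# (tutorial name is an assumption)
learnr::run_tutorial("edibnb", package = "dsbox")
```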

The barrier to entry for getting started with learnr is pretty low, especially if you're already an R Markdown user. In fact, the tutorials in the dsbox package were co-developed with students with only one semester of introductory data science under their belt, so when I say it has a low entry point, I really mean it. We all know that students get better at coding, or whatever they're learning, with practice. We also probably mostly agree that formative feedback that nudges learners in the right direction while they're working on the exercises, as opposed to summative feedback that comes weeks after they're done, can have positive effects on their learning and enjoyment. Plus, providing individual feedback to students, especially for exercises narrow in scope, can turn into dreadful busywork for us educators.

So while developing meaningful automated feedback can be quite tedious and time-consuming, I think that's a much more intellectually engaging way to spend your time as an instructor. It's also a great opportunity to develop software engineering skills, like writing robust tests, bringing together your pedagogical and technical expertise. You can find materials for this talk, including the source code for the learnr tutorial we worked through, and links to resources to learn more at rstd.io slash global 2021 slash mcr.


Thank you for watching! And remember that comment from your chair about no more TA support? With what learnr, gradethis, and learnrhash can offer, you might get closer to responding, "Bring it on!"