Chelsea Parlett-Pelleriti | Hands-on ways to remotely teach data science are invaluable | RStudio

Full title: With more learning occurring virtually or in hybrid mode, hands-on ways to remotely teach data science are invaluable

Guided simulation exercises in R allow learners to explore concepts deeply, on their own time, and with others. They can also experiment with the simulations, try out edge cases, and challenge their assumptions, leading to more fruitful discussions. The comparison between coefficient estimates in regular, LASSO, and ridge regression, or how PCA performs when variables are related, are great examples of concepts where guided simulations can encourage learners to build intuitive knowledge. This talk explores how to use simulation exercises in R to help learners explore data science concepts and provides examples.

About Chelsea: Chelsea Parlett-Pelleriti is a PhD Candidate and full-time instructional faculty teaching Data Science at Chapman University. Her research centers around how we can use statistics and machine learning to improve the way we analyze behavioral data. In her free time, you can find Chelsea on Twitter making stats memes or stats TikToks. She also writes about statistics, machine learning, and using R for various blogs.

Transcript

This transcript was generated automatically and may contain errors.

Hi, my name is Chelsea Parlett-Pelleriti and I'm on Twitter at @ChelseaParlett, in case you want to follow any of the fun stats content that I make. And speaking of stats content, something that I've spent a lot of time making in the past year is guided simulation exercises.

I teach data science to mostly undergraduates and it can be an intimidating subject to them. I often get asked for rules of thumb that they can use to make good data science decisions. And while there are some rules of thumb that work in a lot of situations, I want my students to come away with a deep understanding of the concepts so that when they're in a situation where that rule of thumb doesn't apply, they're not just stuck.

I decided that guided simulation exercises were a great way to create that deeper understanding. By my definition, guided simulation exercises are activities where you're asked to simulate data, you apply an algorithm or a computation to that data, and then you're given specific guided questions that can help you figure out what that simulation can tell you about whichever algorithm or computation you applied to the data.

Why simulation exercises work

Simulation exercises are helpful for three main reasons. First, I think they encourage an attitude of exploration when learning. Second, they give you tools to test your intuition or try weird edge cases that you think of and see what happens. And finally, I think they empower a deep understanding of the concepts that you're learning.

A lot of people learn data science because they think they should, but my goal is to help people really deeply explore the concepts that they're learning. And sure, they could write their own simulations, but I think the guided part is really important. I think these exercises are a little more practical than just throwing people in the deep end on their own.

It makes simulating data in R feel more approachable, and it gives people tools to try and figure things out themselves rather than just relying on what they're told. Relatedly, the guided nature of these exercises gives people the tools to learn and try things on their own and tailor their experience to things that specifically make them curious.

For example, maybe you're wondering, what would happen if I ran principal component analysis on a set of variables that are completely uncorrelated? Well, try it out. Or maybe you're wondering why we would ever use k-fold cross-validation rather than the simpler, more computationally efficient train-test split. Well, run a bunch of simulations and see what the difference is.
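As one illustration of the second question, here is a minimal sketch of such a simulation. It is not taken from the talk's exercises, which are written in R; it is a Python version that uses only the standard library, and every function name, the sample size, and the linear data-generating model are assumptions made for the example. It repeatedly re-partitions one simulated dataset and compares how much a single 80/20 train-test estimate of prediction error moves around versus a 5-fold cross-validation estimate.

```python
import random
import statistics


def fit_line(xs, ys):
    """Closed-form least squares fit of y = a + b * x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def fit_on(xs, ys, idx):
    """Fit the line using only the rows listed in idx."""
    return fit_line([xs[i] for i in idx], [ys[i] for i in idx])


def mse(model, xs, ys, idx):
    """Mean squared prediction error on the rows listed in idx."""
    a, b = model
    return statistics.fmean((ys[i] - (a + b * xs[i])) ** 2 for i in idx)


def holdout_estimate(xs, ys, rng, test_frac=0.2):
    """Error estimate from one random train-test split."""
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    n_test = int(len(idx) * test_frac)
    test, train = idx[:n_test], idx[n_test:]
    return mse(fit_on(xs, ys, train), xs, ys, test)


def kfold_estimate(xs, ys, rng, k=5):
    """Error estimate averaged over k folds of one random partition."""
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        scores.append(mse(fit_on(xs, ys, train), xs, ys, fold))
    return statistics.fmean(scores)


# Simulate one dataset (assumed model: y = 1 + 2x + standard normal noise).
rng = random.Random(2021)
xs = [rng.uniform(0, 10) for _ in range(100)]
ys = [1.0 + 2.0 * x + rng.gauss(0, 1) for x in xs]

# Re-split the same data many times and record each estimate.
holdout = [holdout_estimate(xs, ys, rng) for _ in range(200)]
kfold = [kfold_estimate(xs, ys, rng) for _ in range(200)]

holdout_sd = statistics.stdev(holdout)
kfold_sd = statistics.stdev(kfold)
print(f"spread of train-test estimates: {holdout_sd:.3f}")
print(f"spread of 5-fold estimates:     {kfold_sd:.3f}")
```

A guided question to pair with it might be: which estimate depends more on the luck of the split, and why? Changing the sample size, the noise level, or the number of folds is a natural next step.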

In my opinion, the best data scientists are the ones who have a little bit of doubt about whether the established method is the best way, necessarily. And these guided simulation exercises encourage people to test things out, to push on the boundaries of established methods, and to figure out how our favorite algorithms work in atypical situations. After all, our data will not always be well-behaved.

Guided simulation exercises are not a replacement for the math and theory that go behind data science topics, but they can help support them. And I think the examples, the specific questions, and the established code help make simulation more accessible, especially to people who don't have the skills yet or maybe don't have the time to set all of it up on their own.

PCA guided simulation overview

I hope you'll check out the guided simulations that are already on the conference GitHub and all of the ones that we make in the future. But for now, in my last minute, I would love to give you the world's quickest overview of my principal component analysis guided simulation exercise.

When I teach principal component analysis, one of the main ideas that I emphasize is that principal component analysis takes advantage of relationships between variables in order to create more efficient axes that better define the variation in the data. So it shouldn't be a surprise that when we look at principal component analysis on completely uncorrelated variables, the scree plot shows us that each variable pretty much contributes its own independent information. On the other hand, when we have highly correlated variables, the PCA model as well as the scree plot tell us that a lot of the variation can then be explained by one single component because there was a lot of shared information in the data originally.
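The contrast described above can be reproduced with a short simulation. The sketch below is not the exercise from the talk, which is written in R; it is a standard-library Python approximation, and the function names, the shared-factor data model, and the sample sizes are all assumptions for illustration. Instead of a full PCA routine it uses power iteration on the correlation matrix to get only the largest eigenvalue, which is enough to compute the share of variance explained by the first component (the height of the first point on a scree plot, as a proportion).

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n, p, loading, rng):
    """Return p columns; each is loading * shared factor + independent noise.

    loading = 0 gives uncorrelated variables; loading near 1 gives
    highly correlated ones (pairwise correlation is about loading ** 2).
    """
    noise_weight = math.sqrt(1 - loading ** 2)
    cols = [[] for _ in range(p)]
    for _ in range(n):
        factor = rng.gauss(0, 1)
        for col in cols:
            col.append(loading * factor + noise_weight * rng.gauss(0, 1))
    return cols


def first_pc_share(cols, iters=500):
    """Share of total variance explained by the first principal component.

    Uses power iteration on the correlation matrix (a stand-in for a full
    eigendecomposition). The eigenvalues of a p x p correlation matrix sum
    to p, so the share is the largest eigenvalue divided by p.
    """
    p = len(cols)
    corr = [[pearson(a, b) for b in cols] for a in cols]
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v' C v with unit-length v estimates the top eigenvalue.
    top = sum(v[i] * sum(corr[i][j] * v[j] for j in range(p)) for i in range(p))
    return top / p


rng = random.Random(42)
share_uncorrelated = first_pc_share(simulate(500, 4, 0.0, rng))
share_correlated = first_pc_share(simulate(500, 4, 0.95, rng))
print(f"first component share, uncorrelated variables: {share_uncorrelated:.2f}")
print(f"first component share, correlated variables:   {share_correlated:.2f}")
```

With four uncorrelated variables the first component should explain only a little more than a quarter of the variance, while with a strong shared factor it should explain most of it. Varying `loading` between 0 and 1 shows how the first component's share grows as the variables share more information.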

Thank you so much for listening to my talk. I would love to connect with all of you over the internet and hear whether you use any of these guided simulation exercises, either for yourself or for people that you teach. And definitely let me know if you create any of your own. I would love to hear about it. Thank you.