
Teaching Data Science in Adverse Circumstances: Posit Cloud and Quarto to the Rescue - posit::conf
Presented by Aleksander Dietrichson. The focus of this presentation is on the challenges faced by teachers of data science whose students are not quantitatively inclined and may face some adversity in terms of the technology resources available to them and potential language barriers. I identify three main areas of challenges and show how at Universidad Nacional de San Martín (Argentina) we addressed each of them through a combination of original curriculum redesign, production of course materials appropriate for the students in question, and the use of open-source (OS) software and some Posit products, i.e. posit.cloud and Quarto. I show how these technologies can be used as pedagogical tools to overcome the challenges mentioned, even on a shoestring budget. Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference. Talk Track: Teaching data science. Session Code: TALK-1094
Transcript
This transcript was generated automatically and may contain errors.
all right so we'll start with aleksander here who's going to be talking about teaching data science in adverse circumstances
it's a little awkward but well this is plan b apologies so
adverse circumstances and this was an example uh of
so i live in argentina that is another part of the adverse circumstances it's a nice country at the very south of south america a middle income country we have a rather high poverty rate and we have inflation at 113 percent so think about that the next time you complain about the six percent that you have here it could be considerably worse and everything is relative as we all know
i live in the capital city buenos aires and i'm going to pull this down a little it's home to about a third of the population rather concentrated and i work at a public university there called universidad nacional de san martín a public university in argentina means that the students don't pay any fees so it's completely financed by the government or the state so during covid we actually surveyed the students here and we found that more than half of them do not have access to laptops or computers at home this became relevant because everything was online and most of them actually work as well while they're studying
i also work at escuela de humanidades which is essentially a liberal arts college i work in a communication program so my students often want to be journalists or pr people it's not really the math and computer science crowd so that poses some challenges lack of resources financial resources most importantly there are some language barriers arithmophobia and technophobia to a lesser degree as well and why care about this
we shouldn't exclude people based on language or financial resources and i also think it benefits society in general if journalists are actually statistically literate that's for the greater good we can discuss whether it matters for pr people
we have a host of english majors who are fresh out of work because of chatgpt if we give them some data science they have some marketable skills we can help them out but first and foremost this diversity enriches our community they bring in perspectives and they bring in new inspiration and that's good for us
so the resource situation was very easily solved when posit cloud became available there's a free version which i used with my students and it runs on everything what you see here is an 18 year old one laptop per child computer they were distributed by the government almost two decades ago now the only use they have is to browse the internet and that is essentially all you need for posit cloud and i have another example here this is me checking in on a training job on my cell phone while i'm out to dinner not recommended but it does actually work if it's absolutely needed and then i want to talk a little bit about language barriers
Language barriers in R
in R itself take a look at this code this is what R code might look like if you see it for the first time and your first language is not english this is essentially a trivial data analysis from the documentation of dplyr but think about that for a moment if you don't have semantic access to the function names then you know it's going to be a little more difficult this is not something that i think we can solve by forking R and rewriting all the functions but it bears thinking about we should keep in mind that there is going to be a steeper learning curve for this population for any programming language in fact because they're all going to be english-based
there's interface issues that's easier to solve and in fact it has been solved for french not for my use case but i'm sure that's going to be updated shortly easy enough to fix and it does help but the most important
i think the most important one is to have teaching materials in the language of the students and here i am trying to make a contribution by publishing everything that i use and create it's freely available online and there are more people in the community who do this so there is a growing not enough but there is a growing body of teaching materials that we can work on
so arithmophobia the fear of numbers i have three areas where i've been trying to work on that the first is curriculum this is sort of a standard setup for intro to stats
Curriculum redesign for non-quantitative students
and this is how i reshuffled it for my purposes i'm not saying this is the only way to do it but i moved certain things around hopefully that makes it easier for the student to get into this and i'm going to go through the top two here the first one is hypothesis testing with karl popper and i bring him up to week three for two reasons first of all i can tell my students these are humanities students and karl popper is a philosopher he's one of your lot yet he is fundamental to everything we do in empirical science and the other reason is there's no coding homework in week three and that reduces the likelihood that the students drop the class they've just been exposed to what you saw and they need a little break so that's the other reason
and then i've moved the chi-square analysis up it requires very little data to get going show up with four numbers and you're essentially done visualization is easy not beautiful but it is easy to get going and it's a very intuitive imbalance that we're analyzing and it doesn't have a lot of requirements if any of you have done a t-test lately you will remember that there are up to seven assumptions attached to that here there are very few and you know arithmophobia there's only four numbers you have to deal with here and you get a result
so the other thing i like to do is use relevant data what you see here is data that gets published by the government on income among other things and with very little analysis you get here that there's a gender gap in pay the mean salary is actually quite low ridiculously low and there's a huge standard deviation so you have a lot of income disparity with very little analysis you describe argentine society and i think this is more interesting than penguins and mtcars for this population because it speaks to issues that matter to us in the society where we live and work and study
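The chi-square analysis on "four numbers" described above can be sketched as follows. This is a minimal illustration written in Python rather than the R used in the course; the counts are hypothetical and are not data from the talk, and the function name `chi_square_2x2` is made up for this example.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test of independence for the 2x2 table [[a, b], [c, d]].

    Returns the test statistic and its p-value (1 degree of freedom).
    No continuity correction is applied.
    """
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    n = a + b + c + d

    statistic = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom the upper tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: two groups (rows) by two outcomes (columns).
statistic, p_value = chi_square_2x2(30, 10, 15, 25)
print(f"chi-square = {statistic:.2f}, p = {p_value:.4f}")
```

In R the equivalent call would be `chisq.test()` on a 2x2 matrix; note that it applies a continuity correction by default for 2x2 tables, so its result would differ slightly from this uncorrected sketch.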
so i think this is interesting if anyone is interested in live coding and schadenfreude as a pedagogical device i do live code a lot to demystify the process it's a lot of trial and error and it's very iterative and it takes forever that's something that should be known and schadenfreude the way i use that as a pedagogical tool is to take some bad research some bad data analysis and ask my students to have at it and they do that with gusto and it's worked very well for me because you can learn from the errors of others before you have the confidence to learn from your own errors so from that point of view it's useful
and i want to take a moment to think about what it is that we teach when we teach data science you've surely seen this before this venn diagram was omnipresent in data science presentations at the beginning when we sort of had to justify ourselves and i feel we've sort of used it up but anyways so as a teacher
i think we're doing this to these two parts and i really think that it is the student who is the domain expert so we create a data science team here providing the different components on either side and i want to show two examples
Student domain expertise in action
the first one is matias salto he studies billboards in the subway system in the capital city and is interested in warm and cold colors and how they relate to the product that's being advertised when you pry open an image you get a bunch of color codes and then he took those and created color profiles for the different images for the different products that he was studying here you have headache drugs there's candy in the middle and cell phone service on the right there and then he created a classifier model using the colors as predictors essentially which i thought was very clever and i don't know anything about color as you can tell from my slide deck so in this case it really is the student who brought that expertise to the fore and we were able to do this i can help with code and i can help with methods and then there's camilla ramirez
she works on animated shorts small films that have no dialogue so it's all visual language and she's interested in studying color as a semiotic resource and when you pry open a movie download it from youtube you get a series of frames you get 29.3 frames per second or something like that and then you can do the same thing you can do a color analysis on each frame and now you have a color timeline for the movie and when i saw this i said i don't know what's going on in this movie but i know when it's going on because there's a timeline along the x-axis and so this is quantitative semiotics i thought it was absolutely brilliant again the student is the expert here i've provided some method and some code but this is really the work of a domain expert in collaboration so i don't think we should be using this diagram anymore and i think i'm gonna skip to my conclusions now
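The idea behind both student projects, turning the color codes of an image into a profile and repeating that per frame to get a timeline, can be sketched roughly as below. This is an illustration in Python, not the students' actual method or code: the warm/cold hue boundaries, the saturation cutoff for ignoring grey pixels, the function names and the tiny `frames` example are all assumptions made for this sketch.

```python
import colorsys


def is_warm(rgb):
    """Classify one (r, g, b) pixel with 0-255 channels as warm or cold.

    Assumption: hues from red through yellow (below 90 degrees) and
    pinkish reds (330 degrees and up) count as warm.
    """
    r, g, b = (channel / 255 for channel in rgb)
    hue, _saturation, _value = colorsys.rgb_to_hsv(r, g, b)
    return hue < 90 / 360 or hue >= 330 / 360


def warm_share(pixels, min_saturation=0.1):
    """Share of warm pixels among the clearly colored pixels of one image.

    Near-grey pixels (saturation below min_saturation) are ignored.
    Returns None if no pixel is colored enough to classify.
    """
    colored = []
    for rgb in pixels:
        r, g, b = (channel / 255 for channel in rgb)
        _hue, saturation, _value = colorsys.rgb_to_hsv(r, g, b)
        if saturation >= min_saturation:
            colored.append(rgb)
    if not colored:
        return None
    return sum(is_warm(rgb) for rgb in colored) / len(colored)


def color_timeline(frames):
    """One warm share per frame, in frame order (the x-axis of the timeline)."""
    return [warm_share(pixels) for pixels in frames]


# Hypothetical three-frame "movie", each frame reduced to two pixels.
frames = [
    [(255, 0, 0), (255, 128, 0)],   # red and orange
    [(0, 0, 255), (255, 0, 0)],     # blue and red
    [(0, 0, 255), (0, 200, 255)],   # blue and cyan
]
print(color_timeline(frames))
```

For a single billboard image, `warm_share` alone gives a one-number color profile; richer profiles (for example a histogram over hue bins) could serve as predictors for a classifier like the one described for the first project.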
if you teach data science you should try to be relevant and be real and be there and if you don't teach data science you should keep supporting those of us who do thank you
Q&A
thanks aleksander should i leave this yeah jill will be next and she'll be virtual so we do have a minute or two for questions if anyone does have one you could just raise your hand at this point if you don't have slido up
i have one i was thinking the code is in english like you were saying and it's hard to get that translated and still work right is there anything that you do to try to get over that barrier like what are the tools that people use to help people understand if english isn't their first language
well first of all the function names are arbitrary but they haven't been assigned randomly i mean they have been assigned by a programmer to be able to retrieve the information that's encoded in the function i mean there's a reason we name functions and as i said i sort of want to stick to that i don't think we should try and overcome this because then you would disconnect a linguistic minority from the general community and that's division that's not diversity and it would be counterproductive in ways that are too many to list so it is more about if you write your own function write it in any language you want if you use somebody else's functions you're gonna have to learn enough english to be able to read it so just be aware that there's this steeper learning curve i think that's my answer to that problem know it and be sensitive to it
cool
any other questions real quick yeah she's got one
hi thank you for the talk my question is what can we as contributors to the R universe do in terms of like our brand and our documentation to make things easier for folks that maybe don't operate in english
so the question was for the virtual audience what can we in the R universe do to help make it easier for people who don't speak english as a first language
yeah i think publishing teaching materials online and i guess we need to do a better job marketing them so that other people get a hold of them and can use them i would love to have a curated collection of textbooks in my case textbooks in spanish for statistics i haven't found that i found bits and pieces that are useful but i think that's it you have to put your stuff online and i guess you have to make sure it's licensed correctly with a free for all license what is it share alike creative commons i think it's the most useful license so that other people can use it and contact people and collaborate i think we're also doing that but we could do more of it
great thank you so much aleksander absolutely
