
Riva Quiroga | Learning to program in R with a "communicative approach" | RStudio
Full title: How to do things with words: learning to program in R with a "communicative approach" Textbooks for learning a new language always start the same: you learn to say hello, to introduce yourself, and some simple and useful sentences to communicate with others. In language teaching, this is called a “communicative approach”, and is based on the idea that learning a language successfully comes through having to communicate real meaning to real people. This is what I expected to find when I first tried to learn R seven years ago. Sadly, I got stuck in resources that started with definitions of abstract concepts and no real examples of how to say things with data. In this talk I will discuss the benefits of adopting a communicative approach and how to implement it when teaching/learning R, writing documentation, and writing code that will be read by other human beings. About Riva: I like to organize R related things, like meetups (RLadies Santiago & RLadies Valparaíso), conferences (LatinR, satRday Santiago), book translations (R4DS in Spanish), and data projects (#DatosDeMiercoles). I am an editor at The Programming Historian, and I am currently pursuing a PhD in Linguistics
Transcript
This transcript was generated automatically and may contain errors.
Hi, my name is Riva and I'm a linguist who uses R. Seven years ago I tried to learn R and I failed. I wasn't sure where to start my R journey, so as someone with a background in humanities I thought the best option was reading a book. I wanted to make one of those cool plots with a great background I've seen around to communicate some results. So I picked a book that promised me that I would learn the most important elements of R.
On page 99 I was still reading about defining classes and methods, trying to understand what an array was and how to multiply its elements, but I hadn't done anything with real data yet. The problem, I think, was my expectations. I was expecting learning resources that helped me say things with data, but all I found were descriptions of rules that didn't make much sense to me as a beginner. It was very difficult for me to see how I could use these rules with the data I had, and how I would jump from these abstract descriptions to, for example, making a plot. It felt like trying to learn a second language only by reading a grammar book and not by interacting with actual human beings.
Programming languages and second language learning
The thing is that learning a programming language has many similarities with learning a second language. But what does this mean? How can this idea be applied to the design of learning resources? Or when choosing them to guide your learning path? And how can you take this into account when developing a package, for example?
First, let's talk about learning. Not everyone learns a second language for the same purposes. The same happens with programming languages. I failed at learning R the first time because I wasn't the expected audience for the book I chose. At that moment, I just wanted to be an R user, not a programmer. Not everyone is learning R to be a linguist or a grammarian of the language. There are surely people who want a deep understanding of how the language works. And those are the people who are contributing to its development.
But there are tons of people who just want to use R to do things, like making a plot, or running a statistical model, or creating reports. I failed the first time I tried to learn R because the book I chose for learning was the equivalent of a grammar book of R: descriptions of how the language works and the names and purposes of all its different parts. And as a beginner, or as someone who just wants to learn the language to communicate your data to others, that is not what is most useful. What you need is something that works more like a textbook.
The communicative approach
Textbooks for learning a second language always start the same way. You learn how to say hello, how to introduce yourself, and some simple sentences to communicate meaning to others. Even books for learning dead languages have this approach. I had to learn a little bit of Latin as an undergrad. And the first thing we learned was how to ask someone who they are and how to respond to that question. I even learned how to insult someone in Latin, a language whose last native speaker died centuries ago.
Textbooks start this way because they are built around people's needs and knowledge, not around language rules. In textbooks, you learn first what you can do with a language, not how the language works. In language teaching, this is called a communicative approach, and is based on the idea that learning a language successfully comes from having to communicate real meaning to real people.
Throughout the different lessons in a textbook, you learn new words and structures, and new contexts in which to use those words and structures. The grammar rules behind those real-life examples are only explained after you learn how to use them, and if and only if it is really necessary for you to know them. In textbooks, scaffolding works as a spiral. You learn something in one lesson, and then come back to that content in the next lesson, but now showing more ways to use what you learned, how to adapt it to different situations, and how to combine the words and structures you know in new ways.
It's like copying and pasting, and then adapting. You learn how to use a new language structure, like how to ask a question, and then you learn how to adapt it to your own needs, to solve the problems you are interested in. It's this ability to adapt language structures to new situations that makes us flexible users, in both natural languages and programming ones.
For example, we first learn how to create one type of plot using all the default settings, and then we come back to that content to add more complexity to the visualization, like mapping more variables, using different geoms, or making annotations. And we build those new skills on top of the ones we learned previously. This copying, pasting, and adapting is also the strategy we use with examples we find on Stack Overflow or RStudio Community. We bring into our script something that worked in another context, and then try to figure out how to adapt it to our own needs.
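The spiral described above could look roughly like the following sketch. It assumes the ggplot2 package is installed and uses its built-in mpg dataset; the talk does not name a dataset or a specific plot, so the variables, the annotation text, and the object names here are illustrative choices only.

```r
library(ggplot2)

# Lesson 1: one type of plot, all default settings.
p_basic <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

# Lesson 2: come back to the same plot and map one more variable.
p_colour <- ggplot(mpg, aes(x = displ, y = hwy, colour = class)) +
  geom_point()

# Lesson 3: build on the previous step with an annotation and clearer labels.
p_annotated <- p_colour +
  annotate("text", x = 6, y = 40, label = "An illustrative annotation") +
  labs(x = "Engine displacement (litres)", y = "Highway miles per gallon")

# Printing a plot object draws it, for example: print(p_annotated)
```

Each step reuses the previous one, which is the copy, paste, and adapt pattern: the first call is kept intact and only one new idea is added at a time.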
Sometimes we are not even sure why it worked, but after doing it a couple of times, we start inferring what is going on behind the scenes. You don't need to know all the grammar rules to be understood in a language, and you don't need to know the name of all its structures to be a competent user. Even native speakers don't know the name of all the parts. You probably don't know what a paratactic or a hypotactic clause complex is, but you definitely know how to use one.
If you are a native English speaker, you know how to correctly pronounce every case of two O's together in words like moon, book, floor, flood, something very, very difficult for us non-native speakers. But you probably don't know why they are pronounced differently. The adults who raised you didn't write you a grammar book for you to acquire your native language. They didn't teach you the grammar rules. They talked to you and let you talk. They probably read you books. You sang together. You did things with words that were progressively more complex.
In the same way, you don't need to know all the rules of a programming language to be able to do things with it. What rules do you need? The ones that can help you achieve the purposes you're seeking. If you need to customize a plot in a very precise way because the default options are not what you need, then learning how a theme works in ggplot is relevant, not before. If you need to work with arrays, then learning what they are and what kind of things you can do with them is relevant, not before.
Learner personas and modern R resources
One more thing about textbooks. They usually have learner personas. If you're learning a language because you want to travel, the first chapters will probably show you how to get along in the airport and ask for directions in a city. If you're learning the language because you're going to study abroad, the first chapters will probably be about students trying to find something on campus or dealing with university bureaucracy. The same happens with programming learning resources.
Fortunately, in the past couple of years, a lot of online resources have appeared to learn R and other programming languages, and many of them are focused on the needs of specific users. So all the data and examples are connected with what those specific users need and know. So taking into account the communicative approach when teaching a programming language involves doing real things from day or page one, building complexity as a spiral, revealing rules after you know how and when to use them, and only the ones that can help you achieve what you're looking for. And taking into account who your learner personas are.
In the last couple of years, a lot of R resources have been created taking into account these principles. Learning resources that let you experiment first and then build explanations. Books that show you the whole game first and then explain the details behind every step. Publications that have in mind different learner personas with different backgrounds and that show you how to do relevant things with relevant data for your field.
But hey, what about the other books? The ones that explain how R works? Well, those actually are very cool books. I'm a linguist myself, so I really love grammar books. But for both natural and programming languages, they're not the first step to approach a language when what you want is to learn how to use it and how to communicate with other people.
Package development: error messages
But is this communicative approach only relevant for people teaching or learning R? What about programmers? What about people developing packages? Well, thinking about package development from a communicative point of view can also help us with issues like developing more useful error messages and documentation. Let's take a look at error messages first.
Whatever you're learning, having feedback is the kind of thing that can help you improve. This applies to any human activity, from sports to cooking, from knitting to language learning. Having this in mind, let's take a look at two error messages addressing the same problem. We have all been here, putting one instead of two equal signs to filter a column. This is the error message you get in base R. It says "Error: unexpected '=' in", and then the part of the code where the unexpected equal sign is. The error message here is pointing to where your mistake is, but there is no clue on how to solve it. Why is the equal sign unexpected? Should I put something else? And if you're not an English speaker, it is even more enigmatic.
Pointing out errors is not the same as giving feedback. To learn from your mistakes, you not only need to know where the error is, but you also need some clues on how to solve it, so next time you know what to do. Let's take a look at the message for the same error, but using the filter function from dplyr. The message says: error, problem with filter() input 1. Input 1 is named. This usually means that you have used one equal sign (=) instead of two (==). Did you mean country == "Chile"? Run rlang::last_error() to see where the error occurred.
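The mistake discussed above can be reproduced with a small sketch in base R. The countries data frame and its values are made up for illustration (the talk only mentions a country column and the value Chile), and the exact wording of the messages may vary with the R and dplyr versions installed.

```r
# Toy data, invented for this example.
countries <- data.frame(
  country = c("Chile", "Peru", "Chile"),
  year = c(2019, 2019, 2020)
)

# The mistake: a single = where a comparison was intended.
# Base R cannot parse this line, so it is wrapped in parse() to capture
# the message instead of stopping the script.
parse_attempt <- tryCatch(
  parse(text = 'countries[countries$country = "Chile", ]'),
  error = function(e) conditionMessage(e)
)
cat(parse_attempt, "\n")

# The fix: two equal signs make a comparison.
chile <- countries[countries$country == "Chile", ]

# With dplyr installed, the equivalent calls would be:
# dplyr::filter(countries, country = "Chile")   # the mistake, reported with a suggestion
# dplyr::filter(countries, country == "Chile")  # the fix
```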
This message lets us know what the problem is, its probable origin, a suggestion for how it can be solved, and if and only if you want to know where your code failed, how to take a detailed look. What we have here is feedback. We not only learn the origin of the problem and how to solve it, but also how we can take a deeper look at the code to see where the error occurred. And if you're not an English speaker, you can put this into Google Translate and you will get something useful.
These couple of extra lines in your function code can make the lives of many people easier and help them fail in a way they can learn from. Giving useful feedback is the best way to help someone learn how to overcome difficulties. And this is something we do all the time when someone is learning our native language. If you don't understand what they are trying to communicate to you, you usually say things like, sorry, did you mean... and try to rephrase what they said to check if you correctly understood their intention. We don't go around telling people that they lack subject-verb agreement or have misplaced modifiers. We don't do that because it is not really helpful for learners. It's more effective to give cues on how to fix the problem and show what a correct version might look like.
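As a rough sketch of what those extra lines might look like in a function, the example below checks a column name and, when it is missing, suggests the closest existing name in a "did you mean" style. The function pull_column and the toy data are hypothetical and written for illustration; this is not how dplyr builds its messages, just one way to apply the same idea with base R.

```r
# Hypothetical helper for illustration; it is not part of any package.
pull_column <- function(data, column) {
  available <- names(data)
  if (!column %in% available) {
    # Suggest the existing name with the smallest edit distance to the input.
    distances <- utils::adist(column, available)
    suggestion <- available[which.min(distances)]
    stop(
      "Can't find column '", column, "' in the data.\n",
      "Did you mean '", suggestion, "'?\n",
      "Available columns: ", paste(available, collapse = ", "), ".",
      call. = FALSE
    )
  }
  data[[column]]
}

# Toy data, invented for this example.
countries <- data.frame(
  country = c("Chile", "Peru"),
  year = c(2019, 2020)
)

# A typo in the column name triggers the feedback-style message,
# captured here so the script keeps running.
feedback <- tryCatch(
  pull_column(countries, "contry"),
  error = function(e) conditionMessage(e)
)
cat(feedback, "\n")
```

The message states what went wrong, proposes a likely fix, and lists the valid options, so the user can correct the call without reading the function's source.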
Package development: documentation and vignettes
So if you're a package developer and want people to be happy users of your package, you can think about how your error messages can give guidance. But error messages are not the only part of a package that can adopt a communicative approach. Let's take a look at documentation. Function documentation is great. Sometimes you forget the name of an argument and checking the help page can allow you to solve this kind of issue. But to use it, you need to know the actual name of the function you're after. Otherwise it's kind of useless.
What if you know the problem you want to solve, but not the name of the function that can help you solve it? Function documentation is like a dictionary. It has the definitions of the words you can use in a programming language. But we don't learn a language by memorizing words from the dictionary. We need to learn how to use the words, in what context they are useful, and what kind of meaning we can construct with them. What if you don't know the name of the function you need? What if you know what you want to achieve, but not what functions can be useful for that? Well, this is where vignettes come in.
Vignettes are an awesome resource to describe the problem that your package is designed to solve and to show people how to solve it. They are a great resource to show how to do things with your package with real examples. And they are easy to find using a search engine. So they are also a great way to promote your package. In a vignette, you can explain in detail the different categories into which your functions are divided and how you can coordinate multiple functions to solve problems, and you can show extended examples that people can copy, paste, and adapt to meet their own needs. The best part is that vignettes are easy to create using the usethis package. With usethis::use_vignette(), you can create one.
So thinking of error messages as a way to give feedback to the users of your package and writing vignettes that explain how to do things with the functions you create are great ways to think about package development from a communicative approach.
Closing remarks
Two closing remarks. Is R the best programming language to learn? Isn't Python better? Or what about Julia? With programming languages, the same thing happens as with natural languages. No one is intrinsically better than another. The best programming language is the one that can help you solve more effectively the problem you are trying to solve in the context you are currently in. Sometimes, things like processing time or number of lines of code are important to take into account. But sometimes they are not, and what is easier or faster for you to write, not for the computer to process, is more important. And sometimes it's more relevant what programming language your audience can understand.
The same happens with natural languages. Right now, I would really prefer to be speaking in Spanish, but it's definitely more effective if I speak in English. Why? Not because of any characteristic of the English language itself. Actually, for this subject in particular, Spanish is way more precise, because we have more words to speak about language. We make distinctions like lengua, lenguaje, idioma, all of which have different and precise meanings. So I need to use more words in English to explain those concepts. But right now, being able to convey my message to a broader audience is more important than the lexical variety or precision that Spanish offers for this subject.
So which programming language should you learn? The one that can best help you solve the problems you are interested in, and that allows you to speak with the people you are interested in communicating with at that moment. And, the same way as with natural languages, knowing more than one allows you to communicate with more people and do more things with words.
And this leads us to the second closing remark. Communication and community have the same etymology. A communicative approach in programming languages is important because it can help us build stronger communities where everyone can be part of the conversation. New people learning the language can offer new and diverse perspectives, and we should cherish that.
There is no one correct way to learn a programming language, but there are many things that can be done to make it more effective and enjoyable. Failing to offer people opportunities to learn in an easier way, or thinking that the only way to learn a programming language is the hard way, is not only kind of a gatekeeping issue, but it's also dangerous. Because the most dangerous thing that can happen to a natural or a programming language is not having a vibrant community of users to keep it alive and evolving.
