
Translating R for Data Science into Portuguese: A Community-Led Initiative (Beatriz Milz, UFABC)
Speaker(s): Beatriz Milz
Abstract: How can open-source collaboration help make data science more accessible and expand Posit’s global impact? The book "R for Data Science" by Hadley Wickham, Mine Çetinkaya-Rundel, and Garrett Grolemund is a key resource for learning R and the tidyverse. In a collaborative effort, volunteers from the R community translated the second edition into Brazilian Portuguese, making it freely available online. This talk explores the translation journey, the challenges of adapting technical content, and key lessons learned to support future translation teams.
Materials: https://beamilz.com/talks/en/2025-posit-conf/
posit::conf(2025)
Transcript
This transcript was generated automatically and may contain errors.
Hi, everyone. Can everyone hear me? Yes? My name is Beatriz, featuring baby. I'm pregnant. And I came all the way from Brazil to share with you a project that I did with other people from the community I'm part of: a translation of the R for Data Science book into Portuguese, which is the language that we speak in Brazil. And this is not working.
So I guess most of you already know which book this is. It's the R for Data Science book. And this is a great book to learn how to program in R and also about the tidyverse in general. And this book is available online for free. So anyone can access the website and read the book and study. And you can also buy a printed version if you want. And in this talk, I want to tell you how we translated this book into Portuguese and how we got people to contribute.
Background: the first edition and the dados package
So how did we translate the book into Portuguese? First of all, there was already a translation of the first edition of the book, which was translated in Brazil in 2019, some years ago. But one of the problems is that it was available only in a printed version. So you could not access it for free. And I'm not sure how much you all know about Brazil, but it's a country with a lot of income inequality. And this book now costs around 10% of the monthly minimum wage in Brazil. So for students, this is quite expensive.
And the story of this translation starts back in 2019 at the useR! conference. I had an opportunity to attend with a diversity scholarship. And there I met a lot of people from the Latin American community. Brazil speaks Portuguese, and most of the rest of Latin America speaks Spanish, so we hadn't actually connected before. So that was a great experience. And Riva Quiroga was presenting about the translation of the book into Spanish. And she was also presenting about the development of a package called datos, which has the translations of the datasets used in the book into Spanish. And that presentation inspired me a lot, because I didn't even know that it was possible to do a translation if you were not, like, a big publisher.
And Riva and I got in touch. And in 2020, she said, okay, you cannot translate the book, because it was already translated. But what if we translate the datasets of the book? So you can use them in tutorials, classes, and so on. So datos got a sibling, which is the package dados. And in English, it translates to datasets. And we got 11 volunteers to work on this project. Everyone with a purple border is from an R-Ladies group, not only from Brazil, but from other Latin American countries too. So you see, it was not an R-Ladies project, but a lot of people from R-Ladies were part of it.
And now the package is available on CRAN. You can download and install it and use the datasets that you already know from the book, like Star Wars, Palmer Penguins, and New York City flights. And we translated, like, the names of the variables, the categories in text variables, and also the help pages. All the help pages are also translated into Portuguese.
Getting permission to translate the second edition
I'm quite a GitHub user. And if you don't know what GitHub is, it's a platform that you can use to store your code and work collaboratively with other people that develop things. And this book is actually built in a GitHub repository. And during 2021, I noticed that there was some movement in the book's GitHub repo, and that the authors were developing the second edition of the book. So that was our chance to maybe try to have a translation of the book into Portuguese.
So at that moment, some imposter syndrome thoughts came into my mind. Like, oh, you're just a PhD student. The authors of the book have a lot of things to worry about. They don't have time to worry about, like, translating stuff into Portuguese. Or there are a lot of people that might be more experienced than you, excuse me, to do this translation.
But after a while, I thought about a phrase that my mom always said. Which in Portuguese says... And the translation would be something like... If you don't ask, the answer will already be no. Like, let's try. So inspired by my mom, I went to the GitHub repository and wrote an issue saying... Hey, Hadley, who is the first author of the book, it would be great if we could have, like, a translation of the second edition by the community. I'm part of the team that did the translation of the package dados into Portuguese. Let us know if we can do it. And after a while, Hadley responded to me.
If you don't ask, the answer will already be no.
He said that... Yeah. You cannot do the first edition. But maybe we can get an authorization to do the second edition. And eventually he said that... O'Reilly said yes. Yay! But we couldn't start just yet, because they were still writing the book. So we had to wait a bit. So during 2022 and 2023, we had to wait until they finished the book. And this picture is from PositConf 2022. I had the opportunity to come here. And I said... Hey, Hadley, I'm a real person. I'm the person that wrote that issue. I'm really interested in doing that. It's not like a bot from GitHub or something.
And in the middle of 2023, he said that the book was finished. And he asked if we wanted to ask O'Reilly again. And then in September, we got the permission. Yay! So from September 2023 until November 2024, we had more than 20 people working on this translation. Back then, I think, like, stuff was quite bad. It was not really good to be used. So people had to translate it. And we had, like, one person to translate the book and two people to review the translation. And these people are from, like, really different areas of knowledge. Not only statisticians. And at the end of 2024, almost a year ago, we got the Portuguese version of the book online. So you can access it at pt.r4ds.hadley.nz. So it's an official domain from Hadley. And now everyone can access the second edition of the book for free and online. Yay!
How we got people to contribute
So how did we get so many people to contribute to this project? There are three main things that I thought about when preparing this talk. The first is to widen our reach. I needed to find people to contribute. So the first thing is that communities are a great way to find people. We found a lot of people in R-Ladies, as I said before. But also LatinR, which is the community for programmers from Latin America, which has people that speak Portuguese, English, Spanish, Portuñol, and so on. So it's really mixed. And also, like, local communities. So we found a lot of people in communities. And a lot of people also came from a call for volunteers on social media. You can see that Bluesky is not here, because most people use, like, LinkedIn and Instagram in Brazil. So we called for volunteers on the social media we use most there. And a lot of people that I had never met before that were part of the translation team came from these posts on social media that other people shared.
And the second thing that was important for us in this project was to show the path. As I said before, the book is stored in a GitHub repository. So we made a fork of that GitHub repository and did all the translations there. But it turns out a lot of people don't know how to use GitHub. So some people were interested in contributing, but they had never used Git or GitHub before. Especially things like issues, pull requests, and projects. Sometimes they knew how to, like, open up a repo and download it and clone it. But they didn't know how to use these other features that we used a lot.
So at the start of the project, I was doing one-on-one calls with the volunteers that needed help, explaining to them how we could use Git and GitHub. But as the team grew, it was quite hard to keep up with one-on-one calls all the time. So we started writing a translation guide, which is a Quarto book. So in this translation guide, we wrote things related to the translation itself. Some patterns that we were following. For example, Portuguese and Spanish are languages that have gender in all the words. Like, simple things that you say in English and have no gender at all. Like, author. In Portuguese. So we had to think about how we would translate in a way that's more neutral. And not so, like, marked by gender. So we wrote all these ideas there. But we also wrote a bunch of chapters about what Git and GitHub are, how you can start, like, reviewing a chapter, how you do a pull request, and things like that. And that translation guide was, like, growing as the project was growing. Because new questions made me do, like, new contributions to this guide. And that helped us a lot.
And I think this part was really important. Because if we didn't do, like, these trainings, these guides, and one-on-one calls, a lot of people would be locked out of the translation project. And for a lot of people that were part of this, this was their first contribution to a community project or to open source or, like, using GitHub in general. So that was really important for us.
For a lot of people that were part of this, this was their first contribution to a community project or to open source or, like, using GitHub in general.
And at this moment, I think it's really important to emphasize that we are standing on the shoulders of giants. So as I said before, the Spanish speakers that are part of the Latin American community had already done the R for Data Science translation into Spanish. So especially Riva, who was the leader of that translation, helped us a lot with a lot of things: how they did the translation, the workflow, and what went wrong. So for us, it was much easier, because we didn't have to repeat the problems they had before. We didn't have to reinvent the wheel.
Being patient and the impact of the project
And this is a community project. Not only for our Brazilian community, but it has collaboration from the community in Latin America in general. So the third thing that I think is important in this kind of project, and it was important for our project, was to be patient. So you see that it was not quick. The whole process, starting from datos and dados until the translation of the book, took around four years. And the book has 30 chapters, from small chapters to longer chapters. And we had around 30 volunteers developing the translations of dados and the book. And the people involved are volunteers. So a lot of them were doing the contributions during their weekends, the time they could be with their families, and they chose to be there. So it was important for us that it be a good experience for everyone. Sometimes people could stop responding or it took a long time. We just gave them gentle nudges, like, hey, do you still want to do this translation or this review? And if they said no, there was no problem, because we could bring in other people as reinforcements, because we had, like, this big pool of people that wanted to participate.
And, as I said, it was important to be patient in this process, and it was really worth it, because some weeks ago, I was at a conference in Brazil, and I was making an effort to share about the book, because some people that are not part of the community still don't know about the book. And a lot of people thanked me at that moment, and I said, it's not a project that I did. There were, like, a lot of people involved. But they thanked us a lot, because now they can use the book, like, in classes, and I know some teachers at universities that are already using it. So it was totally worth it.
And as I said, the book is being used in university courses and activities in R-Ladies and other communities, and we are also doing an R-Ladies book club. And this book club actually started while the book was being translated. So we have already had 16 meetings. Each meeting is focused on a chapter, and we're planning to do the whole book. All the meetings are also recorded on YouTube, so if people cannot attend the meeting at that time, they can watch after. And some of them had, like, 600 views, 300 views, 900 views, so a lot of people cannot attend at that time, but they can watch later.
And, yeah. So this project was a community project, and I would like to give a big thanks to some people. First, the authors of the book. So the book is written by Hadley Wickham. So thank you, the writers of the book, for not only writing the book, but saying okay to the translation, and also O'Reilly, the publisher, for saying okay to this translation as well. I would like to thank the volunteers in this project, like, all the 30 people involved. And also the communities. R-Ladies, LatinR, especially Riva, and the local communities. And also Bianca and Ariana, because they did some of the art that is in the slides, and they're also part of the translation project. And so thank you. If you have any questions, I'll be around the conference. And you can also send me an e-mail. This QR code sends you to the presentation and also to the book. So thank you for listening to me.
Q&A
Thank you so much. We do have a couple of questions here. So how important is it to have an active leader in a project like this? It seems like a lot of effort from yourself, and how much of the work can be delegated?
I think it's really important to have a person or a group of people leading. Otherwise, people can end up, like, with no focus. And I think there are some things that we could delegate, but it was a lot of effort from me to keep up with everyone: if they were still on board, if they wanted to join, if they needed help, and things like that. Yeah. We definitely need one leader in the project.
I can say that from my participation in R-Ladies. I'm part of R-Ladies as well, and there we don't have, like, one leader. We have a lot of co-organizers. And sometimes, like, a lot of people have other things going on, and it's hard to, like, have regular meetings and continue, because you don't have, like, one person saying, hey, let's do it, because everyone's, like, oh, maybe another person can do it. And that doesn't work so well sometimes.
Right. Totally. One other question here. Is the translation guide Quarto book, or some lessons learned or tips from authoring a translation guide, available in English? Or is the original available online still?
It's only available in Portuguese, because we only did it in Portuguese. Yeah. We only have it in Portuguese.
Okay. But that is available online.
Yeah. Yeah. It's available online. There's a link in the slides.
Great.
But I know there are other translation projects that also have some guides. Not sure if they have, like, these GitHub tutorials, but, yeah. There are also other translations, like, for Bazar or things, where people have some guides for the translation.
Amazing. Is there any tracking on the number of views of the book?
Oh, no. No. I didn't think about, like, setting up, like, Google Analytics or something. And every time someone asks me something like this, I'm, like, oh, I didn't think about setting up Google Analytics. So, I don't know how many people are accessing the book.
Amazing. Thank you so much.
Thank you.


