Michael Chirico | Making .pot-ery with R: Translations in R Packages | RStudio

The R community is globally distributed and R itself is available with messages in 14 languages. Adding translations for non-native English-speaking users of your package can ease their experience and empower them to build better things with less frustration (though please note that "object of type 'closure' is not subsettable" is equally inscrutable in all human languages). In this talk, I will cover translations in R packages -- how to implement them, why to do so, and how to maintain them. This will summarize and extend learnings based on our experience adding Mandarin translations to data.table and culminating in the potools package. About Michael: Michael Chirico is a data scientist working on compute memory efficiency at Google. Before that he worked at Grab in Singapore and earlier got his PhD in Economics at the University of Pennsylvania. He is passionate about making tools to empower others who work with data (most of this energy is directed towards data.table) and loves learning languages (at various middling levels of proficiency in Japanese, Spanish, and Mandarin, with goals to learn Cantonese, Hokkien, Vietnamese and Bahasa).

Transcript

This transcript was generated automatically and may contain errors.

Hi everyone, my name is Michael Chirico. I'm currently a data scientist at Google and I've been contributing to data.table in my free time for several years. I'm here to talk to you today about how to add translations to your R packages.

So why would you want to add translations to your R packages? We're very blessed in the R community to have a very global community. Just from a snapshot that we see in the R-Ladies directory and in the RStudio community survey, we see that the R community is a very global one, on all the continents, with usage from all over the world. So by the time that you release your package, it will start to be used by people whose native language is not English. And for a lot of package developers themselves, their native language is not English.

And kind of motivated by this, and over the years seeing error messages reported on the data.table issue tracker from people whose R session was clearly not in English, we took up the challenge of trying to add translations to data.table. We landed on doing Chinese and we got together a team of over 20 translators to work on this. It ended up being kind of a monumental undertaking because data.table is a pretty old package. Coming with that, there are over 1400 error messages in data.table itself. I think typical packages have fewer than 100. A lot of packages have fewer than 20 even. So 1400 actually took quite a lot of time to figure out even how to go about translating. And I'd love to have time to thank all those people individually. It's really an honor to have been a part of it.

But in the short time we have, I want to thank Hongyuan Zhao, Zhiyang, and Guangcheng Yu, who really helped get the team together and get it over the finish line.

Introducing potools

The whole time I was working on this, I was kind of taken aback by how much friction there is in the process. There are a lot of pain points in having to learn a whole new language, which is this language of gettext and .pot files, .po files, .mo files, all these things, which you shouldn't really need to understand that much to add translations to your package. So over the last couple of months, I've been working on a package called potools. The goal is to really eliminate as much of that friction as possible to make it easier to add translations to packages. And as of right now, there's only one user-facing function. It's called translate_package(). It does all the legwork for you of tying together all those gettext utilities, and it just walks you through adding translations to all the messages it finds in your package.
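As a rough sketch of what that single entry point looks like in use: the call below assumes potools is installed, and the argument names `dir` and `languages` reflect my reading of the potools documentation rather than anything shown in the talk. The `pkg_dir` and `target_languages` variables are hypothetical placeholders.

```r
# Hypothetical inputs for illustration: the package source directory and the
# language code to translate into (zh_CN for Simplified Chinese).
pkg_dir <- "."
target_languages <- "zh_CN"

# translate_package() prompts for translations interactively, so only call it
# from an interactive session where potools is available.
# Assumption: the arguments are named `dir` and `languages`.
if (interactive() && requireNamespace("potools", quietly = TRUE)) {
  potools::translate_package(dir = pkg_dir, languages = target_languages)
}
```

Run non-interactively, the guard skips the call, so the snippet is safe to source from a script.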

Setting up your package for translation

So with that in mind, we really need to think about how to set your package up to be translated in the first place. There are some general rules about how to do development on your project that will make it easier for translation. The first one is about templating. The stop() message that we see here is something very common, and it's used all the time in my own development. But in terms of translation, it makes things a bit harder. What a translator would see from the first instance is three strings: "Found", "columns, but", and "are needed". Out of context, those are very hard to translate. And there are some issues with duplication where it would be even harder to translate. If it's in the templated form, the translator is free to rearrange things, which has to be done in, for example, Japanese, which has a totally different grammar.

And, yeah, just the templated form makes it a lot easier for translators to really make the message appear as it naturally would in their own language.
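A minimal sketch of the difference, using base R only. The message wording, the variable names, and the `check_columns()` helper are hypothetical stand-ins for whatever was on the slide; `gettextf()` is the base R function that looks up a translation for a template and then fills it in like `sprintf()`.

```r
# Hypothetical values, for illustration only.
n_found <- 2L
n_needed <- 3L

# Fragmented form: a translator would see three disconnected pieces.
fragmented <- paste0("Found ", n_found, " columns, but ", n_needed, " are needed")

# Templated form: one complete sentence with placeholders.
templated <- gettextf("Found %d columns, but %d are needed", n_found, n_needed)

# Numbered placeholders let a translated template reorder the arguments.
# This English string only demonstrates the reordering mechanism.
reordered <- sprintf("%2$d columns are needed, but %1$d were found", n_found, n_needed)

# Using the template inside stop(); domain = NA tells stop() not to
# translate the already-translated message a second time.
check_columns <- function(n_found, n_needed) {
  if (n_found != n_needed) {
    stop(gettextf("Found %d columns, but %d are needed", n_found, n_needed),
         domain = NA)
  }
  invisible(TRUE)
}

msg <- tryCatch(check_columns(2L, 3L), error = function(e) conditionMessage(e))
```

With no translation catalog installed, both forms produce the same English text; the benefit of the template only shows up once a translator supplies their own version of the whole sentence.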

It's a similar thing for pluralization. English is blessed to have a pretty simple pluralization system. When n equals 1 in this message, the error would become "Found an issue". When n is different from 1, the error message would become "Found some issues". And East Asian languages, for example, are typically even easier: there's not really any pluralization in the first place. But for some languages like Slovenian and Arabic, there are up to four or even six types of pluralization that have to be handled by translators. And for that, there's the ngettext() function that is part of base R: you provide it the number and some template messages, and then your translator does the legwork of providing what the translations should be based on what n is.
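A small sketch of that pattern with base R's `ngettext()`. The `report_issues()` wrapper is a hypothetical helper, and the two message strings follow the wording used in the talk.

```r
# ngettext() takes the count plus the singular and plural English messages.
# Without a translation catalog it returns the first message when n is 1 and
# the second otherwise; with one, the translator's plural rules decide.
report_issues <- function(n) {
  ngettext(n, "Found an issue", "Found some issues")
}

one_issue <- report_issues(1)
many_issues <- report_issues(3)
```

Keeping the singular and plural variants as whole sentences, rather than pasting an "s" onto a noun, is what lets languages with more than two plural forms supply as many variants as they need.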

And that's all we'll have time for today in the lightning talk. So, thanks to everybody. Thanks to RStudio for the invite. And thanks to all the contributors to the languages. And thank you.