
Kate Hertweck | R training and documentation for different levels of expertise | RStudio
Full title: Making the jump from learning to applying: R training and documentation for different levels of expertise How does someone make the leap from learning R to actively applying R in professional work? At what point (if ever!) do we get to call ourselves "experts" in R? This talk explores what differentiates novice, practitioner, and expert R programmers, and how transitions between these stages occur. I'll discuss the type of support required for R users to move from one level of expertise to the next, and how different types of training and documentation can support R users at each level. Understanding variable levels of education among R practitioners supports our own professional work, from collaborative coding to package development, and helps build a bigger, more inclusive R community. About Kate: Kate Hertweck is the bioinformatics training manager at Fred Hutchinson Cancer Research Center, where they develop and teach courses on reproducible computational methods as a part of fredhutch.io. Kate's graduate training at University of Missouri in genomic evolution of plants was followed by a postdoctoral fellowship at the National Evolutionary Synthesis Center (NESCent) at Duke University, where they fell in love with R and began working exclusively in computational biology. Kate then spent four years as an assistant professor teaching bioinformatics, genomics, and plant taxonomy before transitioning to biomedical research training. Kate has been involved in The Carpentries, a non-profit organization that teaches reproducible computational methods, since 2014, serving as a leader in community governance since 2016. When not being an overenthusiastic instructor, Kate likes to spend their time doing fiber arts (knitting, crochet) and enjoying all things science fiction.
image: thumbnail.jpg
Transcript
This transcript was generated automatically and may contain errors.
Welcome, my name is Kate Hertweck, I'm the bioinformatics training manager at Fred Hutchinson Cancer Research Center in Seattle, Washington. My career focus is helping scientists learn the coding and data skills they need to apply cutting edge computational methods and accelerate the rate of science.
I'm excited to tell you about one of the best parts of my job, helping researchers unlock the mindset and tools that help them make the most of their data and get more work done in less time.
Here's how the general process of teaching someone R usually happens. I work for a biomedical research center with a huge breadth of research topics, but everyone wants to make figures for publication. My community loves R, especially the tidyverse, because it offers an approach to achieving those goals that makes sense to them.
In my introductory R class, I start by teaching some general skills in data manipulation and filtering. My participants learn to create elegant, effective data visualizations that help them explore their data.
After a few hours bent over their keyboards, I can see the slowly dawning realization in their eyes: "With the power unlocked by coding, I will surely obtain a statistically significant result supporting my insightful hypothesis, yielding fame and glory, but more importantly, a paper in Nature or Science, as well as future grant funding."
In case you're not a scientist and this doesn't resonate with you, let's think about it from a layperson's, well, lay dog's perspective. For scientists, unlocking the power of coding is like being a dog who gets a one-way ticket to the dog park, complete with all of our favorite toys and lots of interesting things to sniff.
That moment when the dog realizes the dog park is near is like that moment when an important concept about data or computational work clicks for someone learning R, and they see a world of opportunity open up for them.
Unfortunately, all too often, when the time comes to apply their skills to their own data, they're confronted with a console that looks more like this. These red error messages are sometimes scary, often confusing, and always frustrating. The prevalent feeling they experience is disappointment, tinged with a touch of, how am I supposed to achieve fame and glory when I can't even get R to acknowledge my data exists?
That crushing defeat in the face of such promise is also something a dog can relate to. It's the equivalent of jumping in the car, thinking we're heading to the dog park. We can almost feel the breeze, taste the rubber ball in our mouth, and smell the grass, but instead we end up at the vet. This is exactly the opposite of what we wanted. It's disappointing and demoralizing, so much so that we don't want to get in the car again, even if we might end up at the dog park someday.
From learning to applying
So how do we jump over this barrier, restoring a belief that we can achieve those lofty goals? When I accepted my current position to develop and implement a training program in reproducible computational methods two and a half years ago, I thought the most important part of my job would be time spent in a classroom. I had spent the previous four years teaching computational skills to students at the university level, and after an entire career in academia, I thought that the highest impact I could have at a research center would involve teaching formally structured short courses.
During my career, I've had the privilege of personally teaching hundreds of people to code in R. Especially in my last few years focusing on adult learners outside of formal university courses, I've learned a lot about what it takes to help people succeed in learning R, and that success is not solely reliant on formal short courses.
Now I define success as whether people continue to work with R after leaving my classroom, or at least if they gain some general literacy in data and computing skills so they can work more effectively with computational staff.
Given the breadth of research topics members of my community pursue, I am faced with the monumental task of helping people apply R skills to many different types of data and research questions. The question I then continually ask myself as I prepare to deliver training materials is this, what prevents people from using these tools after they've learned them?
Levels of expertise
Let's take a quick tour of the process of gaining expertise and think about what this looks like with coding. Regardless of the prior education someone possesses in a different field, I consider a person who has no prior experience learning to code in R a novice. In my community, that means they probably don't have any experience coding in a different language and probably also don't know much about how to structure and store their data effectively.
I integrate general concepts in coding and data management into my intro R course and focus on directly applicable skills like those I showed in code earlier. By the time someone has completed my introductory R class, I like to think that they're no longer a novice, since they have now transitioned to being a competent practitioner.
I define a competent practitioner as someone who applies R code to their own research problems. A practitioner probably doesn't know if the way that they code represents the most elegant or efficient solution to a problem, but they get the job done. I consider expert-level coders to be people who write code in a way that makes it easy for other people to use. They encapsulate their projects in fully documented R packages and try to apply best practices in coding to make it robust and reproducible.
To be clear, I'm less concerned about formally categorizing people into these groups. I think more about how to help people follow the arrows to the right. In practice, most people I encounter are practitioners of some type, most often interested in getting the immediate job done. Regardless, this model is a really useful way to think about a few key questions. What do people need to learn? And how are they most likely to learn these things?
Meet Ash, Avery, and Quinn
We're going to investigate these questions by visiting a few colleagues of mine who generally represent each of these categories. Please meet Ash. Ash is trained in software engineering and currently works in data management and software development in a research lab, writing software for scientists to use for research purposes. Ash is writing a package named Borker. Borker creates maps indicating density of dogs being walked in certain areas of a city at certain times. Ash hopes that Borker will be used for both public dog health tracking, as well as integration with the dog social media network, Snoot Bork.
However, Ash hasn't worked much with maps lately, and Ash needs to know the best approaches in R to achieve their goals. Ash asks on Twitter and in some local Slack communities for recommendations. They are able to narrow down the options available based on what they and their friends understand about the project requirements. Ash goes ahead and references the relevant documentation for each option, and given their breadth of knowledge, Ash can quickly identify the relevant functions and how to connect them together to implement mapping in the context of Borker.
One of Ash's colleagues is named Avery. Avery is a graduate student with some formal coursework in data science, but is still learning the best methods for optimizing workflows in R. Avery is developing documentation for a group of summer interns and wants to streamline the process of creating projects with them.
Avery goes and asks Ash, who recommends the usethis package. Avery is really excited, goes and takes a look at the main documentation page, and is really intimidated. They haven't worked with a package like this before, and they also haven't worked much with Git or GitHub, and integration with those tools is one of the really awesome features of usethis. They especially aren't confident in their ability to use this package and help other people use the tool as well.
So Avery comes to my office hours and asks for help. I point them towards a project from our colleagues' lab that actually applies usethis. This particular project is a template that includes working with GitHub, starting an analysis, and explaining why their particular approach works for data analysis projects. The explanation Avery needed didn't just include the code to run, but an example of how all the pieces would fit together in a final project.
Avery is really excited about this breakthrough and uses the package to support their summer intern, Quinn. Quinn is an undergraduate in biology who has recently started learning computational methods. Quinn thought my intro R class was quite intuitive and is eager to apply their skills to their summer project.
However, they ended up encountering the same problem I showed at the beginning of this talk, errors associated with read_csv. Now Quinn knows how to use the help function to look up information about it. And then they encounter what is, in my opinion, one of the great contradictions of computing. The simpler a function seems, the more complicated the help documentation appears, especially to someone new to coding.
Quinn's reaction is similar to the general response I get from a novice learner's first glance at this help documentation, a bit of abject horror, thinking, what have I done? However, Quinn goes and talks to Avery, who helps them think about the basics of R coding, the structure of their data, and how their project is organized, ultimately helping them understand how to piece together that information to import their data.
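The kind of check Avery walks Quinn through can be sketched in a few lines of R. This is a minimal illustration under stated assumptions, not code from the talk: the project folder, the file name samples.csv, its columns, and the helper import_data() are all hypothetical, and base R's read.csv() stands in for read_csv so the sketch runs without extra packages.

```r
# Build a tiny throwaway project so the sketch is self-contained.
# The folder layout and file contents are hypothetical.
project_dir <- file.path(tempdir(), "quinn_project")
dir.create(file.path(project_dir, "data"), recursive = TRUE, showWarnings = FALSE)
write.csv(
  data.frame(sample_id = c("s1", "s2", "s3"), weight_kg = c(4.2, 11.5, 30.1)),
  file.path(project_dir, "data", "samples.csv"),
  row.names = FALSE
)

# Hypothetical helper: confirm the file exists before trying to read it,
# and report where R is looking if it does not.
import_data <- function(path) {
  if (!file.exists(path)) {
    stop(
      "Cannot find '", path, "'. R is currently working in: ", getwd(),
      call. = FALSE
    )
  }
  # Base R reader used here; readr::read_csv(path) would slot in the same way.
  read.csv(path, stringsAsFactors = FALSE)
}

# The correct path, built from how the project is organized, imports the data.
samples <- import_data(file.path(project_dir, "data", "samples.csv"))
str(samples)

# A path that skips the data folder fails with a message that points
# at the likely cause instead of a wall of red text.
result <- tryCatch(
  import_data(file.path(project_dir, "samples.csv")),
  error = function(e) conditionMessage(e)
)
print(result)
```

Running ?read.csv (or ?read_csv once readr is loaded) opens the help page Quinn found so intimidating; in practice, knowing where the file lives relative to the working directory often resolves the error before any of the optional arguments matter.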
Prioritizing support and documentation
These of course are the problems with which we all grapple on a nearly daily basis. How do we prioritize how to exert our energy, especially when we have different goals with competing interests, and it's not clear whether or not the effort we're exerting is going to make a huge difference in the long run? These questions include things like, do I write a vignette or add a new feature to the package I'm developing? Do I keep trying to get this function to work for my analysis, or do I give up and try something else? How do I even find the help I need to address this key issue?
As it turns out, even small differences in how support materials are developed can make the difference between someone struggling and wrestling with implementing a new approach or being able to take the ball and run with it. This is because, in my experience, learning specific technical skills is easy. However, once you've learned how a technical skill generally operates, being able to apply that to your own problem is much more difficult. The usefulness of training and documentation depends on both what information is provided and how it's delivered.
Now that we have an idea of the problem we face, how do we actually meet these needs? So far, I've made some generally vague references to training materials and documentation. I think it's worth highlighting what some of these options include. And if you'd like more detailed descriptions of these types of materials, head over to the resources I've included in RStudio's collection of conference materials.
First, manuals. We have a baseline expectation in the R community that a standardized user manual documenting features of a package is included. This particular documentation provides basic information like how does a function work and what options are available for working with the function. In my experience, manuals are most useful for expert users or people who have some familiarity with how the particular type of function can work.
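As a rough illustration of what a manual entry records, here is a sketch of a documented function. Many R packages generate their manual pages from roxygen2 comments like these. The function walk_density(), its arguments, and the example data are hypothetical and are not taken from Borker or any real package.

```r
#' Count dog walks per neighborhood and hour
#'
#' Hypothetical example for illustration; not part of any real package.
#'
#' @param walks A data frame with one row per walk and the columns
#'   `neighborhood` and `hour`.
#' @param min_walks Minimum number of walks a neighborhood-hour
#'   combination needs in order to be kept. Defaults to 1.
#' @return A data frame with the columns `neighborhood`, `hour`,
#'   and `n_walks`.
#' @examples
#' walks <- data.frame(neighborhood = c("north", "north"), hour = c(8, 8))
#' walk_density(walks)
#' @export
walk_density <- function(walks, min_walks = 1) {
  # Give every walk a count of 1, then sum the counts within each group.
  counts <- aggregate(
    n_walks ~ neighborhood + hour,
    data = transform(walks, n_walks = 1),
    FUN = sum
  )
  # Keep only the combinations that meet the threshold.
  counts[counts$n_walks >= min_walks, , drop = FALSE]
}

# Usage with made-up data.
walks <- data.frame(
  neighborhood = c("north", "north", "south"),
  hour = c(8, 8, 17)
)
density <- walk_density(walks)
busy <- walk_density(walks, min_walks = 2)
print(density)
```

The comment block answers exactly the manual-style questions: what the function does, which options exist, and what comes back. It does not say how this function fits into a larger analysis, which is the gap the other types of material fill.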
Another type of material is vignettes. Vignettes answer questions like how do functions work together to answer general questions. These tend to be more like demos or tutorials for implementing code, which ends up making them much more accessible for a broad variety of users. A third type of support material is example projects or code. These demonstrate what an applicable, authentic project incorporating a tool looks like. And these end up being crucial, especially for practitioners. Fourth, formally structured, in-depth short courses are generally in highest demand for novices and often something that most people are familiar with. They would like someone to tell them what the answers are.
Now, each of these different approaches requires a different amount of effort to both find, if you're wanting to learn from it, or develop, if you are creating it. And sometimes these resources do blend together. They're also generally supplemented by conversations with colleagues, which I've found is one of the most valuable ways to fill in knowledge gaps.
Now that we've considered how types of support vary based on levels of expertise, let's return to Ash. Ash is excited that their package is nearing completion, so they share the manual with Avery and Quinn. Avery has trouble understanding how some of the Borker functions work, but learns a lot in the process of trying to figure it out. Quinn doesn't understand any of it, including that Ash actually wrote the package, and then Quinn tries to Google it to find more information. They come across search results for a Chinese death metal band by the same name, and Ash ultimately regrets not Googling the name before deciding on it for their package.
At this point, Ash has a choice: continue the package development to integrate with Snoot Bork, or make it clearer what the package is doing right now. Ash decides to ask Avery to collaborate on explaining how functions fit together more clearly in a vignette. They are supported by another of Ash's colleagues, who has developed some code using Borker, which Ash is happy to reference in the documentation for the package. Quinn keeps attending meetings and learns a lot about the process of code collaboration, and eventually even clarifies a few instructions in the manual.
Guidance without a leash
This story from my colleagues is one example of the process of developing documentation. What does it mean in broader practice, though? There are countless online classes, tutorials, demos, and other resources that can help us learn R. These are approaches that walk us through the process of some coding tasks, essentially leading us by a leash. Given that availability, why do people keep coming into my classroom?
What I've learned really makes or breaks progression to higher levels of expertise is access to information that helps fill in the biggest gaps in our knowledge. My job as a trainer is to provide guidance that fills in those gaps, which allows learners to keep moving in the same direction, but without a leash, so that they have guidance, but more importantly, autonomy.
So while I'm developing support materials, I consider identifying what knowledge is assumed in the materials I'm developing, as well as collecting curated examples of code that apply methods commonly desired in my community. In fact, one of the most effective things we can do is share high-quality training materials with the people for whom they are most suited.
The ecosystem of tools available in the R community really reflects the interests of the people comprising that community. With so many tools to maintain and develop, is it worth developing and maintaining support materials, too? Well, we share our code so other people can benefit from our hard work, and to increase the impact of our efforts. If sharing our code results in a multiplier effect for our effort, then making it easier for people to use the code multiplies it yet again.
At a time when disparities in the world feel more stark than ever, access to information makes a huge difference in promoting equity for developing these skills. These values are why we support open science projects, and that means considering how even our small actions can continue to uphold these values. In many cases, we don't even have to create these materials ourselves. Knowing what resources exist and helping raise their visibility can have a huge impact, too.
Advice for every level
Regardless of what level of expertise you think best describes you, please consider the following. For those of us beginning to code in R, it's likely we'll encounter a piece of documentation we don't understand. That's okay, and it's not our fault. That information probably isn't written to be accessible for people with our type of expertise. There's almost certainly a better resource out there, or materials that can help us bridge that gap.
For those of us who are practitioners, worrying that, even if our code works, it just isn't good enough, it's okay. Every expert coder has been where we are now. There's also no requirement that we keep learning more advanced skills, but the community is here to help us, and it'll probably be very satisfying to learn.
Finally, those of us who are experts developing R packages, unsure how to make the information accessible to a broad audience, it's okay. This is a great opportunity for us to collaborate with someone with a different type of expertise. These relationships can help us share our tools more broadly while supporting other community members in advancing their own skills.
For all of us struggling to communicate with people who possess different knowledge and skills, remember this is something everyone confronts. Given that we're all continually learning things, and yes, also forgetting things periodically, our own knowledge and that of those around us is constantly changing too, making for a sometimes rapidly moving target. One of the best things we can do for each other is understand that we can't expect a resource to be one size fits all, and that helping someone else learn is a way of supporting the entire community, as well as yourself in the future.
I learned about the basic framework of levels of expertise from The Carpentries. This nonprofit group teaches reproducible data and computing skills, including R, to communities worldwide and has a number of resources related to training diverse audiences in technical skills. My current work at Fred Hutch builds on methods and materials from The Carpentries. I've adapted their workshop content to directly apply to the research community I serve. We otherwise apply the concepts I've discussed here to training and community development.
I'm also proud to be advising MetaDocencia, a group focused on a different type of targeted community development, supporting educational practices for Spanish-speaking communities, and this includes teaching technical skills. Each of these groups facilitates training for diverse audiences and demonstrates how even small changes can mean a lot for individual learners in the R community.
If you'd like more information about the ideas I've mentioned here, check out the resources I've included in RStudio's collection of conference materials. If you'd like to talk about computational training and customizing R resources, follow me on Twitter. I can also recommend my Twitter account if you'd like to see pictures of my favorite coworker, Loki. He wouldn't consider himself to be even a novice R coder, but he does listen very intently to all of my thoughts about training and community building.
Thanks for watching, and for helping support a bigger, more inclusive global R community.
