You Can Lead a Horse to Water...Changing the Data Science Culture for Veterinary Scientists

Presented by Jill MacKay

A retrospective look at supporting data science skills in a research-focussed veterinary school

This is a talk about environment management, but not in the way you're thinking. In many industries, domain-specific experts need enough understanding of data science to support their work and communicate with data scientists, but often have insufficient training in these skills, and limited time with which to obtain data science skills and practice them. This is particularly challenging for those who are interdisciplinary and have limited control over their workload, such as medics and field scientists. In this talk, an educational scientist describes the previous 10 years of supporting veterinary scientists to adopt open science practices surrounding data science. What worked, what failed miserably, and reflections on why it can be so hard to get a horse to drink.

Materials:
- https://github.com/jillymackay/positconf2023_vetdata
- https://jillymackay.com/post/positconf2023/

Presented at Posit Conference, between Sept 19-20 2023. Learn more at posit.co/conference.

Talk Track: Teaching data science. Session Code: TALK-1095

Transcript

This transcript was generated automatically and may contain errors.

My name is Jill MacKay and these are my reflections from working for the last 10 years in veterinary science. This is actually a talk about environment management, but when I'm talking about environment I want you to think about culture: that is, the unwritten rules that a group of people agree upon when they work in close proximity or exist in close proximity to one another.

The central question for this talk is how do we make the vet school environment that I work in compatible with the PositConf environment. Not the same, you guys at PositConf are not going to be vets and my vets are not going to be fully data science practitioners but they should be able to talk to one another a little bit more than they currently do.

I'm not going to spend a huge amount of time telling you guys what the PositConf environment is like, you're there, I'm here in Edinburgh, but I think we do share some key ideas about good data science practice, its importance and we probably share some ideas about what it looks like.

The Edinburgh vet school environment

Let me tell you a little bit about the Edinburgh vet environment. We are a very old vet school. We are called the Royal Dick School of Veterinary Studies. We have the word Dick in our title and we just own it: we're that old, we're that respected, we are the Dicks. We also have with us the Roslin Institute. You may know them from their work with Dolly the sheep and from their work with COVID over the last couple of years, and they have some of the best bioinformaticians in the world. I'm not talking about teaching data science to these guys, they have forgotten more than I would ever learn.

Well we also have our vets, these are both our students who have to be day one competent vets when they graduate, but also our clinicians who are amazing in the field of one world health, the field of animal health, but they are very time poor and at Edinburgh we have both international impact but we also believe it's important that we make local impact as well. This might seem really complicated and specific in terms of environments but I suspect it's really really similar to any of your workplace environments. We have subject specific experts who are really really good at something that they do, who need to know enough data science to talk to, for example, our bioinformaticians who are really really good at their data science part of their role and not everybody has the same amount of time available to them to do data science, although it is creeping into everybody's role more and more.

I'm going to do three things in this talk, I'm going to talk about how we've changed our formal data science teaching, I'm going to talk about how we've changed workplace culture and I'm going to talk about how we're trying to change some of our operational practice. At the end of this talk I'm really hoping there's going to be a long uncomfortable silence and then somebody's going to have to clear their throat and say we're really sorry Jill we've actually solved this already, it's all here on this lovely GitHub repository. That's perfect, Posit will have paid for Edinburgh to get a consulting session and I get a lovely big tick in my performance management review at the end of the year.

I suspect we're going to have a little bit more of a discussion about some of the barriers to institutional change that occur particularly within the data science context.

Formal data science teaching

We teach a huge number of students. We teach public engagement with massive open online courses, we're very big on open educational resources. Our undergraduates, we have our Bachelors of Veterinary Medicine and Surgery which takes on about 200 students a year-ish. We also have our Bachelors of Science in Agriculture and I just think it's important to highlight that we also have a Bachelors of Science in Veterinary Science which is what's called an exit degree. So if somebody enrols into the veterinary degree, gets to the clinical stage and they realise they don't want to do it, they can still exit recognising their learning.

While we don't offer that as a degree to enrol on, it is a degree we offer and I think it's a really important one actually. And that also includes the research project if they want to graduate with the BSc Honours, so there could be our teaching in there for example.

Our postgraduate taught programmes are many and varied. We have a host of online programmes, distance learning programmes which we've been doing for many years and they all sort of serve a professional development kind of purpose. Often our students are those who are already working in the sector and are looking to upskill particularly their research because we are a research intensive university. We also have our postgraduate research, that's a little different from the North American context. Our postgrad research students are not enrolled in any classes, they sort of learn by doing the practice of research embedded within a research group.

Within our undergraduate programmes, for the last 10 years we have seen a lot of curriculum overload with our vets, so we don't have the opportunity to completely change how we teach their data science. We do however offer optional R teaching within their research projects if the student and their host research group want to go down that path. For our agricultural BScs, which we started in 2016, we do do all of our stats and data teaching in R from first year and that was really important to us. We recruited a lot of staff for that very purpose.

Currently our ag BScs are no longer recruiting, partly because of COVID, and that's meant that we've lost those staff members who had that expertise in R teaching. Hold on to that, this is going to become a common narrative.

Within our taught programmes, as we were expanding our postgraduate taught programmes, we realised that we really needed to provide more research stats support, so we created a new role called a stats guru, and they were pretty influential in trying to encourage every new programme and every new research course to do their teaching in R at a distance. I'm sure you're all familiar with some of the challenges that that can bring. We did try to support the campus with this, with our R at RDSVS resource, say that three times fast. Part of our issue here is that our initial stats guru has since left us and our new stats guru is not quite such an R aficionado.

So you may think we taught R forever more, but the big problem we're running into is that we need skilled people to teach R and they keep leaving. We don't have the staff time for those who remain on programme to take up that R teaching, because they can't themselves learn R quickly enough. We also have a recurring issue, which our programmes feed back to us, that student satisfaction suffers when they teach R, and I think this is really important when we're talking about those students who are doing this for professional development.

Changing workplace culture: the Data Methods Club

We realised that we kind of wanted to look at how we were talking about R in our culture and one of our very skilled people, Ian Handel, created something called the Data Methods Club. If you've worked in any kind of university environment, a lab environment, you'll know how common it is to get an email saying I need more of this particular reagent to work at my experiment. Data Methods Club aims to make it as easy to ask for help with an analysis as it is to ask for an extra reagent.

We have a range of approaches. We use a book club, which is more for slightly more comfortable R members, for example. We use PARTYs, which are Protected Analysis in R Time, Yay, and these were really effective. This was booking out a room, inviting people to come to the room to do their analysis away from their emails, away from interruptions in their office, and we would always have some people who were pretty comfortable in R in the room just floating around ready to troubleshoot if needed. This was hugely, hugely successful. We also run occasional sessions and there's our aforementioned email help. If you take one thing away from this talk, take away the Recurse Center social rules. We explicitly encourage all of our Data Methods Club members to follow them and it helps to make a really constructive learning environment.

So we supported R forever more on campus. Well, not really. The global pandemic really impacted how we ran our Data Methods Club and staff time meant that we do struggle to make this run. It was supposed to be self-organizing, but it just hasn't really worked out that way. It does require somebody to organize it. Ian retired and is now building a house and sailing on his boat, and I tried to take some of this over, but I then went on mat leave. So this is a recurring challenge that we have, this staff time problem.

Operational practice and quality assurance

Something that I wouldn't have thought of 10 years ago, but has become more and more clear to me is that we need better data science in our operational practice. As a vet school, we have a lot of quality assurance. We have to guarantee that those vet students we were talking about earlier are what we call day one competent vets. So we need to know how we're teaching them. Are we teaching them effectively? Are they happy with our teaching? All of these sorts of things.

I, for my sins, am the vet school's new director of quality assurance and enhancement, and we're really interested in what data we can use to better inform our teaching. Teaching data is often managed by teaching organizations, and it is highly confidential. It cannot be shared in any capacity. So this, to me, is a really obvious place where we can use workflows to better support how we share practice. In Scotland, our quality assurance is what's called enhancement led. That means that we are all meant to share best practice and to just really reduce the workload of everybody else.

This is starting to slide away from veterinary science and starting to talk more about the UK higher education sector as a whole. But something we are really trying to do, for example, is to use Shiny apps to show the data processing that we have got, so that other institutions can use our Shiny app to analyze, for example, the student survey data that they have got from the same place that we've got ours.

Reflections and key issues

So where are we in veterinary science with data science? Well, remind me in 10 years. I don't know. I don't know if I'm going to be here in 10 years, considering we keep losing skilled people. But my two key points here are that we do keep losing skilled people and we do need more time to support staff so that they can teach data science practice.

If we were to think about this as a Git repository, these are the issues that I would log. I would say that we're losing skilled people too often: that package updates far too often. Well, this is not actually a bug. This is a feature of working in a university. We are supposed to nurture talent and let that talent go off and do wonderful things.

One of the things that we need to do better, and I would encourage anyone teaching data science to think really carefully about this, is to utilize extensive documentation and make it as easy as possible for somebody to take your course and run with it. I think Glasgow does this really, really well with their PsyTeachR resource. Go and check that out. I want to steal it and use it for worlds. And I think we're doing it reasonably well with our quality assurance work.

The second issue, and this is one that comes up again and again when I'm talking to staff about whether or not they want to bring in more data science teaching, is that anything to do with R or coding is really complicated and time consuming and the students hate it. I think we need to make use of the many, many tools available. There's so much wonderful stuff going on at this conference and we need to use it. I personally am completely putting aside the tidyverse versus base R for teaching argument. I think we need to make it as easy as possible for those people who are time poor. I want to highlight here that our Global Academy of Agriculture and Food Security does a really good short course on data science for farmers. And just take a look at how basic that is. This is aimed at the public and it is really, really good. But this is the kind of level of data science that we're actually talking about when we are bringing in these postgraduate students. So making it as easy and accessible as possible is so important.

My final issue that I'm going to log is I don't actually think this is currently possible in neoliberal higher education. Maybe this is a little bit controversial. I can't see the room. But I think this is actually a replication of the capitalism issue. If you're familiar with UK higher education, we are in a bit of crisis at the moment. We have been on strike since 2018, off and on. And we have a lot of issues with staff leaving, with staff being burnt out. And the traditional benefits of working in academia in the UK are being eroded. And a lot of very, very good people are going somewhere else because of it.

I don't know that there's an easy fix for this. And I don't know that anybody in this room necessarily has the power to change this. But it would be entirely remiss of me to talk about my reflections in the last 10 years without saying that UK academia is not currently a very friendly place for this. I am able to be here. I am able to do this because I have been incredibly lucky in my career to always have secure employment. Not everybody in academia has this. And without secure employment, people are not going to be able to have the time to develop these skills to a level where they can then teach them to others.

I hope I'm preaching to the choir there, but I do think it's something that I have to explicitly say, if anyone's ever going to look at this talk again, this needs time and it needs support from a management level.

OK, so now it's time for Edinburgh's free consult time. Remember, I want that long, uncomfortable silence. And then I want somebody to tell me where you've already solved all of these issues. And thank you very much for listening.

Q&A

Thank you, Jill. I wish you could see, we're very happy with this talk. You did a great job. Thank you so much. We did have a question on Slido here. We had one that said: just who leads those party sessions that you described? Are they student led or is this something else?

So, no, unfortunately, our parties are mainly led by me at the moment. We have really passionate PhD students who are really good at this sort of thing, but they rotate in and out of this so quickly. And these parties are mainly aimed at staff, supporting staff to get better and to do their analyses there. We do offer it to our postgraduate PhD students. They're just kind of treated as staff. Our MRes students, our master's by research students, kind of tend to come in. We have had the odd master of science on a taught programme come in, but not too many, because they tend to be more on campus based. And so many of our MSc students are distance learning.

So, yeah, it's mainly me, to be honest. I've actually just set up a further three today, and it's me and anybody else who likes to hang around. And I think this is one of the big things: this isn't really part of my role. This is just something I'm interested in. And again, without giving somebody like me the space to do something like this, you're just not going to upskill a group of people.

And I see there's another question on Slido, which is any tips to persuade academics who have been using paid commercial software for a long time to switch to R?

For me, the really obvious argument is reproducibility. And I really like R Markdown and Quarto for those kinds of things. I've been trying to get people into doing sort of package development to show how that can support a workflow. And we have a big divide in the vet school between those who do really intensive data science and those for whom data science is just a very small part of their role. And it's really, really difficult to get those for whom data science is just a small part of their role to move away from where they started way back when.

This is something I would love to talk about in a huge amount of depth, and I will try to pull myself back. I think a lot of this is actually how we teach science, because we teach science as this immutable thing. And I think we need to think more critically about the underlying philosophies of science and show how important it is to adapt as we go. My biggest tip is to lead with reproducibility and the time saving of that. But to be honest with you, a lot of academics are still at the stage of, like, color coding an Excel spreadsheet to demonstrate who coded what data or something like that. And that's a problem. That is a fundamental knowledge problem that we just need to keep talking about over and over and over again with these communications. So not a great answer, I'm sorry.

Thank you so much, Jill, we really appreciate you calling in.