Integrating video & data in sports analytics | Arielle Dror | Data Science Hangout
ADD THE DATA SCIENCE HANGOUT TO YOUR CALENDAR HERE: https://pos.it/dsh - All are welcome! We'd love to see you!

We were recently joined by Arielle Dror, Director of Data and Analytics at Bay FC, a team in the US National Women's Soccer League (NWSL), to chat about working out loud, integrating video with data analysis, technology they use in sports analytics, and quantifying intangible player traits. Oh, and ICE CREAM!

In this Hangout, Arielle discusses how she works to integrate data into Bay FC’s decision-making processes, including recruitment, tactical analysis, and game preparation. She uses tools like Quarto and Posit Connect to automate weekly match reports. To enhance understanding among non-technical staff, Arielle’s team also uses proprietary sports software (Sportscode) to build dashboards on top of timestamped game footage with specific events tagged in it. This allows end-users to click on specific data points, such as those related to chance creation, and immediately view the corresponding video play that demonstrates the data's meaning. This visual context is essential for translating data results to coaches.

Resources mentioned in the video and zoom chat:
R and AI Conference → https://rconsortium.github.io/RplusAI_website/
Bay FC Data Video on LinkedIn → https://www.linkedin.com/posts/wearebayfc_bayfc-activity-7390100562731712512-xKIY?utm_source=share&utm_medium=member_desktop&rcm=ACoAACSkp_4BRz9mhkQZvnAk0Wdehn749sDDYJY

If you didn’t join live, one great discussion you missed from the zoom chat was about the "passion penalty". Attendees discussed whether working in a field you love, such as sports, typically comes with lower pay than other industries, especially given the high supply of passionate people who want to work in the space. Do you think the passion penalty exists? 🤔

► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu

Follow Us Here:
Website: https://www.posit.co
Hangout: https://pos.it/dsh
LinkedIn: https://www.linkedin.com/company/posit-software
Bluesky: https://bsky.app/profile/posit.co

Thanks for hanging out with us!

Timestamps
00:00 Introduction
03:19 "What traits in players do you wish you were able to quantify?"
06:48 "How is the data science work implemented on the field?"
10:50 "How do you translate results to coaches?"
12:44 "What tools do you wish you had access to?"
18:27 "Tell us about how you began working out loud and what that led to."
23:26 "Why is it so hard to recruit for sports?"
24:31 "Do you feel like there's a passion penalty for working in the sports world?"
28:31 "Are you working with PyTorch, scikit-learn, or OpenCV computer vision?"
30:28 "Do you ever correct anyone and tell them it's football, not soccer? Is the same set of data available for everyone?"
34:08 "Were there any sporting myths you've been able to analyze, like home ground advantage?"
37:28 "What's the best way to break into sports data roles?"
40:11 "Are athletes willing to share personal health data from wearables?"
42:06 "Is there commercially viable technology for 3D modeling movements?"
43:43 "How do you handle scenarios where play on the field contradicts data predictions?"
45:33 "How did you pick the analytics stack you use?"
47:41 "Do you have any advice for new people learning data science skills?"
48:44 "Is there a danger players will optimize individual statistics over team performance?"
50:22 "Is there a 'Believe' sign in the locker room?"
50:59 "What's the most important thing you learned about communicating analytics to stakeholders?"
image: thumbnail.jpg
Transcript
This transcript was generated automatically and may contain errors.
Hey there, welcome to the Posit Data Science Hangout. I'm Libby Heeren, and this is a recording of our weekly community call that happens every Thursday at 12pm US Eastern Time. If you are not joining us live, you miss out on the amazing chat that's going on. So find the link in the description where you can add our call to your calendar and come hang out with the most supportive, friendly, and funny data community you'll ever experience.
Can't wait to see you there. I'm super excited to welcome our featured leader today, Arielle Dror, Director of Data and Analytics at Bay FC, which is a soccer team in the US National Women's Soccer League. Arielle, welcome. Thank you so much for being here.
Hi, Libby. Thanks for having me. I'm really looking forward to today.
Me too. Okay, I would love it if you could give a brief background about yourself. Tell us a little bit about what you do, and also something that you like to do for fun.
So I am currently the Director of Data and Analytics at Bay FC. We're one of the newer expansion teams in the NWSL, National Women's Soccer League. I work on the football ops side, where I work to integrate data into all of our decision-making processes, whether that be recruitment, tactical analysis, game prep, match review, salary cap analysis. If there's something that the sporting side is working on, I'm trying to integrate data to try and make our decision-making processes more informed.
And what I like to do in my free time, well, I have a puppy. His name is Bingo. He takes up all my time these days. He's not here right now, unfortunately, otherwise I would have him make an appearance, but maybe I'll post him on Bluesky after. I'm also an avid knitter, currently working on my second sweater of the season, and I also really enjoy swimming.
Quantifying intangible player traits
So we do have an anonymous one that says, what traits in players do you wish you were able to quantify, and do you think we'll ever be able to quantify traits like drive in athletes?
So I will say, the data that we have in soccer is slowly evolving, and it might be helpful to just give an overview of what we have available to us right now. So most of the data we have is event-level data. So if you think of a soccer game, there's one player that is usually on the ball, and then every other player on the pitch is not on the ball. And so this event data only collects on-ball information, which means we are missing like 95% of what happens in a game. So we have things like passes, shots, defensive actions, things like that. We don't necessarily have off-ball runs, where people are in space, et cetera. And so most of what you can do in soccer is actually quite limited.
We're now getting to the point where we have that off-ball information, where you can actually quantify off-ball impact. So we're slowly getting to the point where we can quantify a lot of things. I do think that often in recruitment decisions, we start going into these personality level questions, and things like drive, or dog, or how fancy they are. I would love to be able to quantify those things to be able to actually give an informed discussion to actually drive what we're doing. So I might actually just take the example that you gave here, which is drive, or just tenacity. That's something that often is really talked about in our discussions that I really can't contribute to, and I have to try.
It's tough. I think that anybody who works with people data, data about people, qualitatively or quantitatively, will say that anything that's a personality trait is extremely hard, and the results are dubious. Humans are complex systems, and anything that humans participate in are complex systems. I don't think we can ask, based on how recruitment works, we can't give them psychological evaluations either.
Integrating data into tactical decisions
Yeah, that's a great question. So I'm really lucky at Bay. We have two video analysts that work under me who do a lot of the work in terms of integrating data into our tactical decisions and sort of in our weekly scouts. And one of them is particularly data driven. He worked with the U.S. Women's National Team for several years, won a World Cup with them. And so they had a lot of data folks on their team. So he's actually pretty data literate. And so he, whenever I build our weekly pre and post match reports, he will take a lot of that data and integrate it into our video scouts that go to the players and to the coaches. And he uses it to inform some of the tactical recommendations that he makes each week.
We will occasionally put like data in front of the players to sort of emphasize a point. We went through a period this year where we were dropping points where we shouldn't, like we're probably, the probability of us dropping the points we did was like less than 1%. So we put it in front of them to sort of emphasize the point that they were doing well. So we do it a lot. We do a lot of work like using data to increase the confidence of our players.
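The talk does not say how that "less than 1%" figure was computed. As a rough illustration only, here is a minimal Python sketch of one way such a number could be estimated, assuming goals in each match follow independent Poisson distributions whose means are the expected goals (xG) for and against. The xG values, function names, and the Poisson assumption itself are all hypothetical and are not from Bay FC.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """Probability of exactly k goals when the mean is lam."""
    return exp(-lam) * lam ** k / factorial(k)


def match_outcome_probs(xg_for, xg_against, max_goals=15):
    """Return (win, draw, loss) probabilities under independent Poisson goals."""
    win = draw = loss = 0.0
    for gf in range(max_goals + 1):
        for ga in range(max_goals + 1):
            p = poisson_pmf(gf, xg_for) * poisson_pmf(ga, xg_against)
            if gf > ga:
                win += p
            elif gf == ga:
                draw += p
            else:
                loss += p
    total = win + draw + loss  # renormalize after truncating at max_goals
    return win / total, draw / total, loss / total


def points_distribution(matches):
    """Distribution of total league points (3/1/0) over a run of matches."""
    dist = {0: 1.0}
    for xg_for, xg_against in matches:
        win, draw, loss = match_outcome_probs(xg_for, xg_against)
        new_dist = {}
        for pts, p in dist.items():
            for gained, q in ((3, win), (1, draw), (0, loss)):
                new_dist[pts + gained] = new_dist.get(pts + gained, 0.0) + p * q
        dist = new_dist
    return dist


def prob_at_most(matches, observed_points):
    """Probability of collecting observed_points or fewer from these matches."""
    dist = points_distribution(matches)
    return sum(p for pts, p in dist.items() if pts <= observed_points)


# Made-up (xG for, xG against) pairs for a four-match run.
matches = [(2.1, 0.6), (1.8, 0.7), (2.4, 0.9), (1.9, 0.5)]
p_low = prob_at_most(matches, observed_points=1)
print(f"P(at most 1 point from these 4 matches) = {p_low:.4f}")
```

A small probability here would support the message that the underlying performances were good even though the results were not.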
But we do use it, like I said, for tactical decision-making too. So a good example is we have a pre-match report that I have automated that goes out every week. And one of the plots that I show is who the top chance creators on the team are, based off of a lot of different things that we do. There was one opponent where nearly all of their chances went through two players who were not forwards. And so we designed an entire tactical plan around keeping them off the ball. And they did not get on the ball. And then we were able to win the game. And we were able to also use data afterwards to check our process. So we were able to see that they actually did not have any chance involvements. So that was like probably the cleanest example we have of where data is used to find the information, inform the decision, and then actually evaluate after whether it worked.
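The find, decide, evaluate loop described above can be sketched in a few lines of Python. This is only an illustration: the event rows, player names, and the `chance_involvement` flag are hypothetical, and the real report is built from provider data that the talk does not detail.

```python
from collections import Counter


def top_chance_creators(events, team, n=2):
    """Rank a team's players by number of chance involvements."""
    counts = Counter(
        player for t, player, involved in events if t == team and involved
    )
    return counts.most_common(n)


def chance_involvements(events, player):
    """Count chance involvements for one player in a set of events."""
    return sum(1 for _, p, involved in events if p == player and involved)


# Hypothetical rows: (team, player, chance_involvement)
pre_match_events = [
    ("Opponent", "Player A", True),
    ("Opponent", "Player A", True),
    ("Opponent", "Player A", True),
    ("Opponent", "Player B", True),
    ("Opponent", "Player B", True),
    ("Opponent", "Player C", True),
    ("Opponent", "Player C", False),
]

# Pre-match: find who the opponent's chances run through.
targets = top_chance_creators(pre_match_events, "Opponent", n=2)

# Post-match: check whether the plan to keep them off the ball worked.
post_match_events = [
    ("Opponent", "Player A", False),
    ("Opponent", "Player C", True),
]
post_match_check = {
    player: chance_involvements(post_match_events, player)
    for player, _ in targets
}
print(targets, post_match_check)
```

Using the same metric before and after the game is what makes the evaluation step possible: the plan is judged on the number it was designed to change.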
Automated reports with Quarto and Posit Connect
Yeah, so we use Posit Connect. We love it. And it's just a Quarto document that is scheduled to run every Sunday. And then I should automate sending it also. I would love to. But I want to be able to check it first. So I download it. And also, something that I've learned, and I'm sure this is true in other industries, is that the less technical you are, the less likely you are to actually log on to the portal. So it's usually better to just download it and send it to them manually. But our analysts will go onto the platform and look at it.
Yeah, be really open to questions. I think I was really lucky with this coaching staff, who's unfortunately moving on at the end of the season. They had never worked with a data person before. So they had no preconceived notions about what was possible, or really about a lot of concepts. So I was an open book, and they were really eager to learn from me. Last year, I actually sat in the office with them. So just being able to integrate my thoughts and share my insights, and allow for conversation, was a big help for us. But yeah, it's just been a long-term process of conversation, teaching, informal moments, and just trusting that we both have the best interests in mind.
Integrating video with data using SportsCode
So in terms of using data and video together, there's a program that's used in sports called SportsCode. It's a proprietary software that's used specifically in sports, and it allows you to basically take a video feed and then tag it with specific events. Something that's really cool about SportsCode is that you can actually import your own, like, specially formatted profile, like a file that is timestamped to the video, and then build things on top of it that really just look like dashboards. They're called SportsCode windows. They really just look like dashboards, and so our analysts and I have worked together where all of the data that we care about is timestamped, and then I format it in the format of this SportsCode format that is necessary, and then he's able to actually build dashboards on top of it.
So, for example, we have one around chance creation, so he can click all the chance creations that are created, or all the chances that are created via through balls, and he can click on it, and then it'll pull all of this data that we have on all the through balls, and it'll give us, at the timestamp, he can watch the video along with all the data tidbits that I add at the bottom for information for the coaches. And at the beginning, I mean, I've talked a lot about collaborating with coaches, and I found out halfway through last year that when I talked with just pure data terms, they had no idea what I was talking about.
And so we were able to build these SportsCode dashboards, basically, where they had the video, but they also had the data, and it worked incredibly. So, some of our reports don't have video, but the ones that our coaches use most often do, and then I was able to automate all of that, and so now, because of Posit Connect, we have a Shiny app that generates all these files and downloads them, and our analysts can put them on our lovely SportsCode windows, which is great.
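To make the timestamping idea concrete, here is a minimal Python sketch that turns timestamped events into clip windows and writes them to a file. The actual SportsCode import format is not described in the talk, so this uses a generic CSV with hypothetical column names (`start_s`, `end_s`, `code`, `label`, `note`) and made-up events; it is not the file Bay FC produces.

```python
import csv
import io

FIELDS = ["id", "start_s", "end_s", "code", "label", "note"]


def to_clip_rows(events, lead_s=5.0, lag_s=5.0):
    """Turn timestamped events into clip windows around each event."""
    rows = []
    ordered = sorted(events, key=lambda e: e["video_time_s"])
    for i, ev in enumerate(ordered, start=1):
        rows.append({
            "id": i,
            # Start a few seconds early so the build-up is visible,
            # without going before the start of the video.
            "start_s": max(0.0, ev["video_time_s"] - lead_s),
            "end_s": ev["video_time_s"] + lag_s,
            "code": ev["code"],
            "label": ev["label"],
            "note": ev["note"],
        })
    return rows


def write_clip_csv(rows):
    """Serialize clip rows to CSV text (hypothetical layout)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Made-up events, with times in seconds from the start of the match video.
events = [
    {"video_time_s": 754.2, "code": "Chance created",
     "label": "Through ball", "note": "example data note for coaches"},
    {"video_time_s": 3.0, "code": "Chance created",
     "label": "Cross", "note": "example data note for coaches"},
    {"video_time_s": 2210.5, "code": "Chance created",
     "label": "Through ball", "note": "example data note for coaches"},
]

rows = to_clip_rows(events)
# Filtering by label mimics clicking "chances created via through balls".
through_balls = [r for r in rows if r["label"] == "Through ball"]
clip_file = write_clip_csv(rows)
print(clip_file)
```

Because each row carries both the clip window and the data note, a viewer can jump to the moment in the video and read the accompanying context in one place.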
I appreciate it. I know it's easier said than done. It also helps that we have a really small org, right? There's 20 of us in the sporting organization, so I can actually have touch points with people. Scaling things is tough. It is really, really tough.
What the NWSL could use with more resources
More funds and also a league office that is able to do technological innovation. I think that's the big one. The NFL has Next Gen Stats and has a whole internal data science team, and my understanding is that those products are available to teams.
There's a lot of really advanced data. I've talked hand-wavy about this tracking data, which is essentially sensor or GPS data. It's a lot harder to work with than event data, and most teams don't have the capacity to work with it to the scale that they would want to, and so having a league office that is able to actually process that data and give out-of-the-box metrics and maybe even some dashboards would go a long way, not just for us, but also for any team in the league. We probably have one of the most built-out analytics departments in the league, and even we are struggling to think about how we would approach that, so just having people at the league office who are able to process data is really a huge help.
Working out loud and breaking into sports analytics
So I got this entire job by just obnoxiously working out loud until it paid off for me. I went to Smith College in Northampton, Massachusetts, a historically women's college, also one of the first schools to have a data science program, or at least one of the first liberal arts colleges. So I didn't go in thinking I was going to be a data science major, ended up being a data science major.
So my senior year, I was taking a class, an advanced programming class in R with my professor, Ben Baumer. We had to build an R package. It was right after the 2019 World Cup. I was feeling really inspired. I had started watching the NWSL, where I now work, and I wanted to learn more about soccer through data. I basically scraped the website, all their stat pages, put it into an R package, and then over winter break, I just put it on Twitter. I didn't really think it was going to pop off or anything. I just put it up there to see what would happen. I wanted people to use it. I wanted feedback. It went mildly viral, so that was really helpful.
From there, I was able to get some opportunities working for American Soccer Analysis, which is the main open-source soccer writing blog. I was writing for them and then doing some work for them behind the scenes, and then that turned into my first job. While I was doing that job, working for a multi-club organization, doing data analytics and some data engineering, I was also putting up post-match reports for NWSL every single weekend, just to keep my name out there.
They weren't really that great in the beginning, but I was getting lots of feedback from other people on Twitter who were using them. After doing that for two-ish years, the Bay Area team was announced, and I really, really wanted this job. I made sure everyone knew I wanted this job, and it paid off.
I do endorse working out loud. Even if it doesn't turn into a job, you make lots of really great friends, which is maybe even better. I've made so many great friends just through working out loud, and people who eventually became my co-workers, and people who I've traveled with, people who I use as professional sounding boards even now, now that I work alone again.
Recruiting challenges and the passion penalty
Honestly, I think the pay is just not on the level of what we would expect for people with technical skills, and that's something we're trying to change. Also, we often want people who have some level of sports experience. It's really hard to get that experience, so working out loud is really helpful. Those are probably the two big ones. I find the most talented people that I would really want to come work here are just not willing to work in sports, which is super fair. It's really hard. It's definitely not your typical tech job or typical data job.
I would say definitely. Actually, let me backtrack. I majored in data science. I also majored in political science. I was really interested in doing data science for social good before this. I feel like every job I've had has paid a passion penalty to some extent. I would say it's true. It's definitely true. I think it's getting a lot better compared to where it was when I started. It's definitely gotten better, but there definitely is some level of passion penalty, especially because those jobs that you're comparing it to, the work-life balance is just not the same. You're usually working in a smaller team.
Something that I really love is that I get to see my work actually put into play every single day. I know I'm making an impact with the work I'm doing. There's also the fun stuff of working in a team. You don't have to be experienced all the time, but I travel with the team. I've gone to see the world, quite literally the world, and meet really interesting people, work on really novel ideas. Those things I don't think I necessarily would have had the opportunity to do in a different industry.
Technology and modeling in sports analytics
I would love to be doing really fun machine learning and cool novel work like that. Unfortunately, we have a lot more low-hanging fruit that we need to meet first before we can do really cool stuff. I actually don't do a ton of modeling in my day-to-day. It's a lot of data engineering and a lot of much more simple data analysis, data viz. One day, I would love to do something like that. I do also think we probably need to get a larger quantity of data or video before we could feel comfortable doing something like that, but it sounds really cool. If someone does that publicly, tag me because I would love to see it.
Data availability and live data
In terms of live data, we get a little bit from the league from Opta, mostly just in the form of a dashboard where there's nothing we can really ingest, though that might be changing next year, fingers crossed. All the data we get is post-game. So we get stuff from our data providers ingested via API. We get more data than what is available publicly, but it's not from the league. So some teams in the league do not work with those data providers at all.
Home field advantage and sporting myths
So we don't do a ton around audience, home field advantage, etc. Most of the stuff we use there is just stuff that's publicly available, because I try and use as much publicly available work as possible because it's just me. There's been some stuff around different tactical decisions or in-game moments that has been interesting and proved me wrong. I want to share it, but I feel like those things are kind of competitive advantages, so I don't want to share too much. But I will say for a lot of those types of questions, we use publicly available information. Just again, there's not a lot of time to do some of those analyses.
I know that this sounds like a fun thing for like fans to tackle. There are some questions that like if you are curious and if you think this is a fun thing and you can find data that is publicly available, it would be really fun. I'm sure there are some home team advantage analyses out there for some sports, right? I think that would be super fun.
Breaking into sports analytics
So, first of all, I'll say that a lot of clubs now also do have BI roles. So, that is totally an avenue that you can take. I do think like in terms of networking, working out loud, et cetera, there's a couple conferences that I've gone to that I've found particularly friendly. America's Soccer Insights Conference Summit. I can't remember the full name. In Houston, I went last year. It's only the second year. It was the most friendly conference, except for maybe Posit Conf that I've ever been to. And they especially, they like particularly plan things so that industry professionals are talking to people that are interested in breaking in through like lots of breakout sessions.
I've gone to a lot of like women's sports data type conferences. I've also found those to be really friendly. So, Women in Sports Data, it didn't happen this year, but in the past, it's been super friendly and lovely. I also highly recommend following people on Bluesky. Like following people on Bluesky is often how I did most of my networking and just talking to people. In terms of like projects that people do, I find the stuff that is quite interactive, and not just like a single report or article, is the stuff that draws me to profiles. So, packages, dashboards, that sort of work is, to me, just much more compelling than like a single analysis.
Wearables and athlete performance data
Yeah, that's a great question. So, to answer your first part, we have a sports scientist on staff who handles all of our sports performance data. We're slowly getting to the point where we can collaborate together more, but she handles most of that day-to-day. All of our players wear GPS units in-game and in training for us to track things around sprints, accelerations, decelerations, heart rate, all that kind of stuff. And in some leagues, that's required. In other leagues, it's voluntary. It's all up to the CBA. I think in our league, it's required. I've never seen any player not wear one. So, that's how they collect most of their data. The players also do fill out daily wellness surveys around sleep and whatnot. I don't think we're taking any data from their personal wearables. I think that might be a privacy violation.
Motion capture and tracking technology
Yeah, so I will say I'm not familiar with any 3D motion capture through sensors at all, so I don't want to talk about something I don't know much about. It might exist. There's tons of 2D motion capture stuff. So there's obviously the GPS units we've talked about, and then there's also optical tracking systems where the cameras are set throughout the stadium, and they're collecting all the information from all the players and also the ball. And then what we use more often is broadcast tracking, which does the same thing but uses broadcast video. So it's basically using computer vision to track the players and the ball that are on camera. The challenge there is that in some leagues, the broadcast quality is not as good, and they have to use models to impute the off-camera players, which is a challenge for many reasons.
Handling false positives and data-video collaboration
Yeah, I mean there's plenty of moments where that can happen. So we always try, like every six months, to look at our reporting and see where we were right, where we were wrong, what we might need to add to capture things. Sometimes things happen in games that are not quantifiable in data, or at least not easily. You know, going into this, I only wanted to use data to make all of our decisions, because that's sort of how I operated in my last job. And I still live in that camp a lot of the time, but I've also learned that using video analysts and using their expertise in collaboration with the data is really where our best work is done. So it's a lot of interrogation of where those things might diverge, fixing things, seeing where things might be breaking, and also accepting that everyone's best skills work together to give us that answer that we're looking for.
Building the analytics stack from scratch
Someone asked how I picked the analytics stack, and I really like that question because I was in a pretty interesting situation when I joined Bay, where I joined in October, and our expansion draft where we picked players from existing teams was in December, and when I joined, we had absolutely nothing. We had no contracts signed except for one single data contract, and we were already making decisions on player transactions, which is really stressful. So I was tasked with deciding really early what that stack was going to look like, and so I did my best to try and find things that I knew were really flexible because I didn't really know how we were going to grow. So that's why we ended up with Snowflake, for example. I felt of all the warehouse solutions, that was the most flexible for what we wanted at the time.
At first, I just built R pipelines, which is still great. I've worked in full R stacks for a long time, but I built everything working off of my computer. There was no orchestration. Nothing was deployed, really, and so in the second year, I was finally able to build a more coherent, error-proof tech stack using Astronomer and Python and R and a lot of different things, but the point I wanted to make here is that I worked really hard to build a flexible tech stack because I didn't know how things were going to work and also was okay with having something really crappy at first with the knowledge that I was eventually going to go back and make something more robust. It's okay that things aren't perfect right away. Sometimes things just need to work. Sometimes you're already making decisions without any data and you need things to give you context.
Players optimizing for stats and communicating analytics
That is such a fun question. First of all, I do want to say we have some players that are super interested in data. One of the cool but also sad things about women's soccer is that most of our players will have to have jobs after they retire, and so we have players that have majored in computer science or math or are just interested in data, so that's been really cool to give them fun high-level intros on what we're doing. I'm not worried about them trying to optimize for stats because we use lots of things in our decision-making processes, to be honest. I'm more concerned about them — I mean, it's good that data is publicly accessible. I would love them to just use us as the source of truth for data because the reality is that we can contextualize it more, so that's my bigger concern is that they're just trying to get data information without context from other people.
Yeah, well I think, to me, the big thing is meeting people where they're at, right? Like, I spent the whole first six months of my job, or even more, building dashboards and reports, and, you know, I would talk to them and be like, do you understand this? And they would be like, yeah, I understand, but it turns out, like, I wasn't meeting them where they were at. I was looking at things with the total data lens. And once I found out that I had to use the video to contextualize what I was doing, it opened up this, like, communication that we didn't have before. So yeah, meeting people where they're at and just working with them slowly. Eventually, hopefully, they feel confident enough to tell you when things don't make sense to them, and I make it really clear to them, like, that I am a thought partner. Like, I'm not here to be, like, the, like, nerd in the corner that's telling them what to do. Like, I want everything we do to be really collaborative and that we build together, because they also have expertise, you know? I learn a lot from them just as they learn from me.