Biogen Data Jam Team | Data Science Hangout
We were recently joined by the Data Jam team at Biogen to chat about running an engaging data competition, connecting people across the organization of different skill sets, and generating enterprise-wide "buzz" about data and analytics. Their recent Data Jam had 260 participants across 22 teams!

Featured Leaders:

Anuja Das is a Principal Analyst within the Technology, Strategy, and Innovation group at Biogen. She enjoys understanding and contributing to cross-functional technology strategies and using automation-focused techniques to create efficiencies in the teams' ways of working.

Dan Boisvert is Director of Technology Strategy and Innovation at Biogen. He is interested in the intersection between process, data and technology and how they can all be leveraged to improve human health.

Danielle Cloutier is the Associate Director of Decision Analytics at Biogen. She is passionate about modeling, analyzing & visualizing data to generate insights to help shape the portfolio and ultimately change patients’ lives for the better.

Subscribe to Our Channel Here: https://bit.ly/2TzgcOu

Follow Us Here:
Website: https://www.posit.co
LinkedIn: https://www.linkedin.com/company/posit-software
Twitter: https://twitter.com/posit_pbc

To join future data science hangouts, add to your calendar here: pos.it/dsh (All are welcome! We'd love to see you!)

Thanks for hanging out with us!
Transcript
This transcript was generated automatically and may contain errors.
With that, thank you all so much for taking the time out of your day to join us today. And welcome to the Data Science Hangout. If we have never met before, I'm Rachel Dempsey, I'm the host of our hangout here and I lead our customer marketing at Posit.
If it's your first time joining, this is our open space to chat about data science leadership, questions you're facing and getting to hear about what's going on in the world of data across all different industries. And so we're here every Thursday at the same time, same place. So if you're watching this recording on YouTube later, and maybe want to join us live, there'll be a link below where you can add it to your calendar.
At the hangouts, we're all dedicated to making this a welcoming environment for everyone. We love to hear from everybody no matter your years of experience, your titles, your industry or even the languages that you work in. There's always three ways you can jump in and ask questions or provide your own perspective on topics too. So you can jump in by raising your hand on Zoom. You can put questions in Zoom chat and just put a little star next to it if it's something you want me to read out loud instead. And then lastly, we have a Slido link where you can ask questions anonymously too.
Introducing the Biogen guests
With that, I am so excited to be joined by a few different co-hosts this week. We have three people joining us today from Biogen. So I'm so happy to be joined by Dan, Danielle and Anuja from the Technology Strategy and Innovation Group at Biogen. And I first connected with Dan, I think on LinkedIn, when he reached out about a data competition they were running. And I thought we could just start with some introductions on your side.
My name is Dan Boisvert. I work at Biogen, which is a biopharmaceutical company based in Cambridge, but we're worldwide. I've been at Biogen for 12 years, which makes me sound old. I do a lot of technology strategy. I do a lot of data strategy. I really think about our clinical trial data and how we can best leverage that both for our business, but also just for humanity for us to develop better medicine, for us to understand disease better.
Hi, everyone. My name is Anuja Das. I also work at Biogen, have been there about two and a half years. I have worked over the past few years on a variety of projects, primarily within technology. And I contribute typically from a project management and strategy contribution point of view, but I've also been able to delve into the technical aspects a little bit. My favorite part of my job so far is kind of bringing people together who speak different languages and have a variety of skillsets to work towards the same kind of end target and solution. So the data jam was like a large scale embodiment of that, and it was fabulous to see.
I'm an avid disc golfer. And I also like to bake a lot.
Hi, everyone. I'm Danielle Cloutier. And I've been at Biogen for almost 11 and a half years. I've had many different roles at Biogen. But my most recent one is decision analytics, specifically portfolio and program modeling analytics to drive strategic decision making. And as for something I like to do outside of work, I obviously love to travel, specifically to eat different foods. But more recently, I took up learning how to golf.
The origin of the data jam
So we have a couple of different communities of practice at Biogen. One of them, which Anuja and I lead, is the data and analytics one, and really the whole goal was just to get people across Biogen that are passionate about data and analytics together, just to learn sort of the different ways that we work across the different groups, find areas of opportunity and collaboration, as well as network.
And really the way the data jam came about is things have been pretty interesting at work. And so we wanted to try to find a way to have a little bit of fun, try to get people together. So we decided why not try to throw together a data jam, and something that we thought would just be something small and fun turned out to be something actually very big and really showed the passion that people at Biogen have for data.
Danielle and Anuja come to me in one meeting and say, hey, I think we should do, like, a data competition. And you know, the day you sent it out, I myself was hoping 30 people would show up. I think in the first hour, 50 people signed up and then we just kind of went into panic mode of how we're going to organize this.
I think a lot of people like have ideas for things like this, or just like ideas in general that they'll bring to meetings, but you all actually made it happen. How did you go from just an idea for something to making it this real life event?
I think actually executing to the point of like, getting a data set and sending out a form for signups was fairly straightforward. We had a couple of ideas of data sets that we could use. And so we landed on one and then just kind of decided to go for it. Where the challenge came in was when we saw the rate of signups because people, we realized we knew on some level, but we really saw that people love data and are passionate about it and want to work on a hackathon of sorts or a data jam of sorts.
So when we saw how many people there were, we kind of decided let's do two rounds instead of one. And so it was part of the excitement, but it also was part of the challenge of figuring out how to accommodate the growing number of participants.
I think part of it too was that we actually had support from our leadership to actually go forth and do this. And I think part of it too was the fact that there have been a lot of eyes just on data in general for many different reasons. And you know, I might clarify what Danielle says, because it's not like we were given a task to go run a data jam. At some point we just had to make a decision. There's a little bit of a leap of faith there. I think you just kind of, you know, the worst that can happen is no one signs up and we just go back to our day jobs. The best thing that can happen is, you know, a whole bunch of people sign up.
Team structure and randomization
We decided to randomize it. And so we had people sign up and part of the signup was asking what their kind of capabilities were. So is it that they've never really coded before? Are they like experts or gurus? Do they fall somewhere in the middle? So we kind of defined four different levels of competency and had people describe or kind of self identify within one of those four. And then what we tried to do is when we did randomize teams, we tried to kind of spread the capabilities out.
Honestly, most people fell in the middle, but we didn't want to put all the experts together, but also didn't want to have a team that was all novices. And so the data jam, of course it was about data, but another kind of primary element was for people to meet each other. So this was an opportunity to network outside as well.
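As an editorial illustration of the approach just described, here is a minimal Python sketch of randomizing teams while spreading self-identified skill levels across them. It is not the organizers' actual method: the function name `assign_teams`, the 1 to 4 numeric scale, and the placeholder signups are all assumptions made for the example.

```python
import random
from collections import defaultdict


def assign_teams(signups, n_teams, seed=None):
    """Randomly assign people to teams while spreading skill levels out.

    signups: list of (name, level) pairs, where level is a self-identified
    competency from 1 (never coded) to 4 (expert). The scale is hypothetical.
    """
    rng = random.Random(seed)

    # Group names by their self-identified level.
    by_level = defaultdict(list)
    for name, level in signups:
        by_level[level].append(name)

    teams = [[] for _ in range(n_teams)]
    slot = 0
    # Shuffle within each level, then deal people out round-robin.
    # Continuing the same counter across levels keeps team sizes even,
    # and no team ends up with all of the experts or all of the novices.
    for level in sorted(by_level, reverse=True):
        names = by_level[level]
        rng.shuffle(names)
        for name in names:
            teams[slot % n_teams].append((name, level))
            slot += 1
    return teams


if __name__ == "__main__":
    # Placeholder signups, invented for illustration only.
    demo = [(f"person_{i}", (i % 4) + 1) for i in range(12)]
    for number, team in enumerate(assign_teams(demo, n_teams=3, seed=1), start=1):
        print(number, team)
```

Dealing each level out in turn means the number of people at any one level differs by at most one between teams, which matches the goal of not stacking a team with only experts or only novices. Spreading people across time zones or departments would need extra grouping keys.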
We don't know any of the people, because Biogen's a big company, but the teams end up kind of wild, you know, multiple time zones, people in like Buenos Aires and China and Switzerland and the US. And they gave them like three weeks to go analyze some data. They don't know each other. They're working in wildly different departments, you know, some in manufacturing, some in sales, you know, some in R&D. I feel like you get a lot of benefit when people are forced to problem solve together because that creates like a little bit of conflict. And when you get that conflict, then you get people's different opinions coming out and the different viewpoints, then you build on something together.
What teams made and how the event unfolded
So we actually started with an internal data set. It was a survey that was done a few years ago, specifically around data maturity at Biogen. And so we had a data set that was kind of easy, somewhat easy to use and understand. And then the way we set it up was we gave the teams a few weeks to analyze the data set in whatever fashion or whatever type of tool they wanted to use. And the idea was to generate insights. There were a few key questions that we had asked, but other than that, it was pretty much open. And then present it not in PowerPoint. So the challenge here was to not use PowerPoint and to be creative.
And so what we did was we had a few rounds of sort of battle rounds, like kind of posed it as like a Eurovision, The Voice type of, you know, head to head, knockout battle rounds where we had groups of, I think it was five or six teams. And then they would vote for the one team to move forward to the finale. So at the end we had, I think 22 teams down to five teams.
After that we had like the big finale where it was something that we did in one of our auditoriums and we had all the teams come present. We were actually able to get some executive committee members to come and do some opening and closing remarks. And we got our CEO there. It was a really big event where we then had people present in a Shark Tank style pitch about the data sets and sort of what they would recommend to have better data stewardship and different types of strategies with data at Biogen. And then we crowned a winner at the end, we had some trophies and medals.
Leadership buy-in and participation
Did the competition happen during work hours? And if so, how did you pitch the idea to leadership to get them on board?
We didn't ask. We threw it out there and said, you know, we just threw a data set out there and said, look, find some insights. We left it really broad. People came back and said, well, how much time should I spend on this? And of course, data scientists could go nuts on something like this. So we tried to like, you know, look, you only have three total weeks to work on this thing. For the question that was just posed, we didn't ask leadership. We asked people to volunteer, and just kind of try to keep it manageable.
I think part of it too was, I mean, there was support for this. I think, you know, we've been going through a lot of changes and again, there was a lot of focus on data. So part of this too was a culture piece. And I think the recognition that this is a vital part of, you know, working at a company is the culture and the experience. So I think a lot of that also allowed for this to happen and not really need permission per se.
Data governance and stewardship insights
You know, my role here is around data stewardship. I mean, people talk about data governance, but personally, everyone hates data governance across the board, executives, everyone, but so I rebranded it into data stewardship, which I think evokes a little bit more of we're all in this together and we do our own stewardship of the information.
You know, so like what we heard was probably not surprising to a lot of people, but data is hard to find. It's hard to get access to data. And then we had a lot of great recommendations on how we could improve that.
I would say 80 to 90% of the teams, which were incredibly cross-functional, came to the same conclusion, which was that data is hard to find and access. And so it's not a problem that's kind of isolated to a single group. It's pretty cross-functional, which I thought was fascinating. It was expected, but it was still like very interesting to actually see it.
I know there was a lot around kind of metadata driven cataloging, ensuring that, you know, your data isn't all over the place. There was a lot of conversation about the fact that, you know, copying data is something that happens on occasion and is obviously not best practice. So I think the big ones that we saw were essentially a catalog to understand what data even exists. So that we know what's there and then find a way for people who need to access it to be able to access it. And the other piece really was actually having the enterprise wide strategy. I think that was one of the key things that was identified was just needing some sort of consistent across enterprise and not just specific to each of the different individual functions.
One of the ones that resonated with me was, you know, not to kind of create like a partial catalog or something like that, where as soon as you create it, it's out of date and then end up being another stale data asset that you have. So really trying to weigh, like Danielle was saying, to enterprise it or productionalize it and then find ways to make sure that it's sustainable long-term, because I feel like that's where we always end up with a hiccup of like, how do we get it going, but then how do we actually keep it going?
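To make the catalog and staleness points concrete, here is a small Python sketch of a metadata catalog entry with a freshness check. It is an editorial illustration only, not anything the teams built: the `CatalogEntry` fields, the `stale_entries` helper, the 180-day threshold, and the sample entries are all assumptions.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class CatalogEntry:
    """One data set described in a (hypothetical) metadata catalog."""
    name: str            # what the data set is
    owner: str           # who to ask for access
    location: str        # where the single authoritative copy lives
    last_verified: date  # when someone last confirmed this entry is accurate


def stale_entries(catalog, today, max_age_days=180):
    """Return entries nobody has verified within max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return [entry for entry in catalog if entry.last_verified < cutoff]


if __name__ == "__main__":
    # Invented entries, for illustration only.
    catalog = [
        CatalogEntry("survey_results", "analytics", "warehouse/surveys", date(2023, 12, 1)),
        CatalogEntry("legacy_extract", "unknown", "shared_drive/old", date(2022, 3, 15)),
    ]
    for entry in stale_entries(catalog, today=date(2024, 1, 1)):
        print("needs review:", entry.name)
```

Recording an owner and a verification date for each entry is one simple way to notice when a catalog is drifting out of date, which is the sustainability concern raised above.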
Technology and tools used
From a technology perspective, how did teams interact on the data jam? Did they use common tools?
There was a variety, and I think it, the randomization of teams kind of spoke to what tools ended up being used. So for instance, what ended up happening was there was a kind of data communication person, a person in charge of creating the output, a person who was more of the programmer, and then the person who was kind of describing what the approach would be to actually solving the problem. So it really depended on what programming tools that people within that specific team were most comfortable with, which kind of made it more exciting because we had people working in R, people working in SAS, people working in Power BI, amongst others.
And so folks did rely a lot on what we had in-house just because it's kind of, you know, sensitive data. So we didn't want to kind of put it on external tools, but there was still quite a variety of what was used to produce the end results and the solutions.
Participant feedback and formats
So we did actually do a post-event survey. A lot of it was just understanding, you know, thoughts and feedback on the format. Would they participate in something like this again? I think at the end of the day, there was a lot of really positive feedback on the actual data jam and wanting to do it again. And there are just obviously some logistical things that we had to deal with that we could always make better for next time.
I think one of the challenges was the different geographic location. I think that was tough trying to find, you know, teams that were kind of spread across when they could actually meet because of the different time zones. So there are a lot of things that we have sort of on our list to think about for Data Jam 2.0.
There was also feedback around the type of data set that we could potentially use for upcoming data jams. So for instance, for this one, we did surveys and some people love analyzing survey data. Some people despise it. So there was some feedback there around, you know, what else can we do moving forward and what data sets could we use? A lot of people had offered like, hey, we're going to do this again. Can you please use our data? So that was pretty cool, too.
Yeah, I like that idea of forcing them not to use PowerPoint because I definitely find PowerPoint presentations, I definitely learn less from PowerPoint presentations than any other format, I think, that exists. But I'm curious, like, what other formats people ended up using, you know, instead of PowerPoint.
Some people use the Power BI dashboards directly and went through that. Others use some forms of, like, Canva or animations. There was one really cool group that did sort of these animations to tell their story. So a lot of Power BI, Canva, animations. There were a couple of Shiny applications.
So when you put this question out there, you never know what you're going to get, right? It has to be a presentation under five minutes, I think. It has to explain, like, these three things. I mean, we had people, like, dressing up in costumes. We had people do, like, a set of, like, trivia questions. And then we definitely had some animation, like, to tell the whole story. People walking through the Power BI dashboard, people walking through Shiny apps, stuff like that. And I think people had fun with it, even if they didn't have, like, a big, you know, epiphany of data science in this. But they were just able to present something, because we know that you need a lot of skills for data science, right? And so part of the storytelling and everything is part of it, too, getting the message across.
I would say the funnier, the better, personally, for this one. But I think it really depends on what message you're trying to convey. For, like, the big, broad 300-person audience, I think you want to go big and go creative.
I feel like it was less about the output, like one type of output being more effective than the other, but more so about how you communicated what you were trying to get at, how you communicated your story. So I feel like I saw effective presentations across all formats.
And I was just going to say, we're trying to make it as inclusive as possible. So it's not, you know, only for PhD statisticians, of which we have a lot here. It's like, really, you know, it could be a novice, have no idea anything about data, but you'd be maybe more creative for the presentation side. So, you know, a lot of, like, cross-pollination with different people and really keeping it open, use whatever you want, do whatever you want.
Outcomes and follow-through
I know a lot of people reached out to some of the people who were either Shiny developers or Power BI to get consults on how to do stuff. Our CEO reached out to us and asked, how do we fix this whole data problem that we have at Biogen that he just heard for two hours during the data jam?
And one of the metadata catalogs that someone recommended, they called it DASH. But it was like an idea of just a metadata catalog. And that's being taken up by our computational biology in research. And they're going to try to push that forward next year.
I think you definitely hear of some interactions that never would have taken place before. And I think we run these two communities of practice, which I think are kind of like this, where you get people who are passionate about something together. We have one around data stewardship that I run and one, the Biogen analytics and data community of practice, which they call BADCOP, which Danielle and Anuja run. But we've definitely had more people volunteer for leadership in there. And we're getting their voices in to continue this conversation.
I mean, I can say, too, that I've gotten a lot of feedback on, oh, thank you so much for having this. I was able to connect with this person. And I was able to solve whatever issue or whatever problem where I was able to get extra help on this. So there has been a lot of connections made that then help somebody in their current role or responsibility.
Responding to the CEO's question
So your CEO asked you, how do you fix the whole data problem? I'm just curious, what did you say or what are you planning on getting back to him with an answer for that question?
We were kind of like, oh, OK, yeah, we'll come back to you. We thought about it for a long time. And I think we've seen a lot of these initiatives come and go and a lot of hype about, you know, fix the world through data or fix the entire business through data. And the idea, everyone wants the answer to this. OK, well, the answer is it's complicated. But I think the answer that we really need to do is where do we want to be as an organization with data? And we need to start from there and then go down through like exactly how do we want to improve our business and then try to connect our data scientists, our data innovators into those real business decisions or business questions that are there.
So that's my five-minute Shark Tank pitch on what you need to do. You need to align the business so you get them to state their position on data. Because otherwise, people like us, we always just like we do our initiatives, we try to do our things, and it all hits some glass ceiling at some point, which we're all too familiar with.
I think, you know, one of the things that we had recommended, and so we had worked on putting sort of this proposal together, but I think the first thing was really just actually having an enterprise-wide data strategy goal. And it's, you know, something on as a goal for our metrics to at least have some traction against at that enterprise-wide level.
Sharing outputs and planning Data Jam 2.0
We did a couple of things at the end to essentially make people's solutions accessible to all of Biogen. So just for some context, the finale following all the battle rounds was open invite to everyone. And so one thing we did was collect links to everyone's Power BI's and dashboards and Shiny applications and kind of final findings and put them in an accessible location for everyone to view and see. Because again, because it was a five minute Shark Tank pitch, we couldn't necessarily delve into the details. And so this gave people the opportunity to look into how people did the analysis and what the more detailed findings were.
We would love to hold another data jam because this one has been so successful. We haven't necessarily planned for one yet, but we're, I think, still hoping to do one. What has happened is I think Danielle mentioned earlier is people have been coming to us with the data sets that have the potential to spin off another data jam. And so we're kind of playing with the idea of potentially doing one that's a little more maybe hardcore data science for specifically folks who want to dive into the more intense data science techniques on a specific data set. And so instead of doing a big data jam like we did before, maybe in the interim, we do a mini data jam of sorts.
Marketing and making it fun
The marketing is key. So Sam, who's not here today, she's our marketing expert. And I do think marketing makes a big, big difference. You know, I think when we sent out, well, we called it a data jam. It sounds kind of fun. We had a lot of like fun things, like we had like a logo with, you know, toast with like jam on it. I know "it is my jam" was the theme. I know when we had the finale, the music was "Jamming" by Bob Marley as people were walking in.
And I think when we sent it out, I know we had like a one page flyer to describe this. It had very little detail, you know, it said like, you know, work with your colleagues. And we really try to say what's in it for them. What did they get out of it? Instead of like really explaining what we're doing, we're just kind of saying, you know, what's in there for you. Get to meet your colleagues, get to work on something interesting.
Oh, don't forget the seven C's of jamming. Collaboration, creativity, collaboration. There's a bunch of C's, and for the seventh C we just jokingly used "fun." We couldn't come up with a seventh one. But yeah, I think it was fun for our team too. Because like all of us, all four of us love data and love getting people together. So we got really kind of creative and crazy with the way that we were pitching it too. And so yeah, the logo, for instance, we did a couple of stickers. We had kind of swag set up in the office as much as we could and tried to make it fun.
Everyday work and broader initiatives
A big one that my team is working on is reuse of clinical trial data. So clinical trial data is personal information, obviously. But it's like protected personal information. It's very, very valuable, right? I mean, we know, you know, we study diseases that don't have cures. We study diseases that don't have medicines. We're trying to find better ways to treat the disease, trying to find better ways to measure the disease. And that's all, you know, held together in this clinical trial data that we run.
So a big thing that I'm working on right now is how do we best leverage this? It is immensely complicated. There's many privacy regulations, many GCP, good clinical practice regulations around this. But I have this feeling that we need to find a way to use the data up to its maximum potential or maximum allowability and no further. But if we just say, hey, it's complicated, I'm not touching that, I feel like we're like losing a big piece of medicinal development.
So for me specifically, a lot of what I do is really focused on understanding sort of the business risk for some of our portfolio and programs. So really understanding what our portfolio looks like, what are the decisions that we're trying to make to whether we progress programs or we terminate programs. So a lot of it has to do with really modeling and understanding our current portfolio, the different decision points, the investment points. What's the risk? What's the value? And ultimately coming up with insights for leadership to then decide what our portfolio should look like.
I am helping Dan with part of his secondary use project. But I think my biggest project is around automation for our end to end clinical reporting and analytics space. And so I help to manage, but also contribute to the strategy of understanding, you know, what can be automated and also what should be automated, because there is a lot that can be automated, but we want to make sure that, you know, the resourcing that we put into it is worth the impact that it will eventually drive. And so part of that is, you know, playing with the fact that, you know, people work in SAS, but how do we bring in additional languages? How do we bring in more additional analytics techniques? When is it worth doing that? And what is the best way logistically to also bring that in? So it started off as like a smaller scope of work, but it's kind of ballooned out to be this, you know, massive initiative that has been very fun to work on.
AI and ChatGPT usage
Was there any application of AI or ChatGPT or large language models in this?
So we got a lot of questions of, can I take this data and copy and paste it into ChatGPT? To which the answer is no, just FYI. If you work at a big company, that answer is no, but we have an internal instance of ChatGPT that's secure. And people did use that. I think, I forget how many people took the survey, 300 something. And people wrote comments in there. And so people were doing like sentiment analysis of the comments and trying to pick out some insights that were in the comments, which was fun.
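For readers unfamiliar with the idea, here is a minimal Python sketch of sentiment analysis on free-text survey comments. It swaps in a tiny word-list scorer rather than a language model, so it is not what the teams ran on the internal ChatGPT instance; the word lists, the function names, and the sample comments are all invented for illustration. It runs locally, so no data leaves the machine.

```python
import re
from collections import Counter

# Tiny hypothetical word lists; a real analysis would use a proper lexicon or model.
POSITIVE = {"easy", "great", "love", "helpful", "clear"}
NEGATIVE = {"hard", "difficult", "confusing", "slow", "outdated"}


def score_comment(text):
    """Count positive words minus negative words in one comment."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def label_comment(text):
    """Map the score to a coarse sentiment label."""
    score = score_comment(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    # Made-up comments, not real survey responses.
    comments = [
        "Data is hard to find and access",
        "The new dashboard is great and easy to use",
        "We store data in many places",
    ]
    print(Counter(label_comment(c) for c in comments))
```

A word-count scorer like this misses negation and sarcasm, which is one reason people reach for a language model on comments like these.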
I mean, that's tough, right? Because you have an initial estimate of, yeah, I'll be able to develop the macros or the scripts needed to be able to automate this the next amount of time. But then there's other pieces that we need to factor in around, you know, how long is it going to take to adopt? How long is it going to take to scale? And then there's a learning curve associated with it as well. So as more people become aware of the automation initiative, it's automatically leading to more ideas flowing in. And so we're really having to put a structure in place to try and gauge from the very beginning how much resourcing is going to be used in order to make this.
Community of practice and training
Yeah, I can speak for the community of practice that we run. So we try to have forums. But we're trying to have it, you know, at least every other month. And so there's a series that we have called data bites. And basically, what it is, is either bringing somebody from IT or somebody from the community to talk about the work that they're doing. And more recently, we had one that was actually somebody that participated in data jam and wanted to leverage the data jam network to look at their data and help generate insights.
And I know we've also talked about doing some more collective meetings across the different communities. So kind of gathering all the different communities together on a more regular basis, did bring a lot of those communities together. And so there are definitely plans to start to continue that. It's been a little bit hectic at Biogen lately. So we haven't been able to do as much as we wanted to. But I think now that things are somewhat settling, we are definitely looking at sort of the next few key events.
A lot of it really is on-the-job training. So it's really just once you, you know, working with the team, working with your manager. But at the same time too, I know for some of what we hire for it, we try to sort of be a little bit diverse. So a lot of times there are some technical aspects that we are looking for. But a lot of the times for anything really data related in my world, I look for people that are just genuinely curious and want to solve problems. So I feel like as long as you have those two skills and that you're willing to learn, that's pretty much how we get things done.
Yeah, I think in the data science groups, we hire a lot of more senior people. There's not like a lot of entry-level roles there. But I think maybe you're also hinting at like a data literacy kind of effort of how does every single person in the entire organization know not to present a pie chart? And that's something that we need to look at. And I think these communities of practice that we have are the best way that we can really do it. It's organic. It's leaning on people already wanting to learn and then like kind of helping them get up to the next level with what they already want to learn.
I know kind of speaking to the communities of practice, but there are some like ours are a little more general, but there are specialized ones. So there's a machine learning group that dives into that a little bit more. And we're about to start a data science conference, which will bring in kind of speakers and such for people to learn a little bit more there.
Well, thank you all so much for joining us. It's really awesome to hear how well the data jam went and how many people were interested in the event too. There's been a number of people who've reached out to me around holding their own sort of data competitions. And I'm so excited to be able to share this as a resource with them too.
Thank you so much for having us. This was fun. You know, if anyone has any questions, please feel free to reach out. You can find us through this or on LinkedIn for sure.