Natalia Andriychuk @ Pfizer | Hosting an *internal* data science hangout | Data Science Hangout
We were joined by Natalia Andriychuk, Statistical Data Scientist at Pfizer, to discuss running an internal Data Science Hangout, building a supportive R community, discovering and learning new R packages/tools, and contributing to open-source initiatives.
Resources shared:
Rachael's guide for hosting your own data science hangout: https://docs.google.com/document/d/1uKgLRacVyUJA1dH4f675dKP1gnbaw--QztxwIIiLAAA/edit?usp=sharing
RTP useR Group: https://www.meetup.com/rtp-r-user-group/
Data Science Hangout LinkedIn Group: https://www.linkedin.com/groups/12610075/
Shiny app submitted to FDA Pilot: https://www.r-consortium.org/blog/2022/12/07/update-successful-r-based-package-submission-with-shiny-component-to-fda
56:41 - What's something that you've learned from that process of starting your own internal data science hangout?
Summarized from below:
1. Start with a list of 10 or so people you’d like to invite
2. Ask colleagues you have an established relationship with to be your first featured leaders
3. Create a calendar of events that you can invite people to
4. Use a form to collect the speaker's bio, photo, etc.
5. Schedule a 30 minute call with each featured leader the week before
6. Try to plan out 3 sessions in advance
7. Start inviting people :)
Natalia’s response:
It was a little bit overwhelming to think about how to start it, but one thing you would need to start is a list of 10-15 people who you'd like to invite. First I would create the calendar of events. At Pfizer, we do our data science hangouts every other week. I specifically put the dates and send people an invitation asking them to be a part of the hangout. I actually started the hangouts with Mike Smith and Douglas Robinson, who are part of our core team and are moving this initiative forward. For your first few hangouts, it can be easier to have colleagues you already have an established rapport with join as your co-hosts.
I set up a Microsoft Form that I send out, asking about occupation, bio, certain questions they want me to ask, etc. I also ask them to provide their headshot. I think it makes the hangout invitation a little more colorful. For the featured leaders, I schedule a 30 min call with each guest the week prior to go over any questions that person might have if they have never attended before. It helps me to know more about what they're doing, especially if this is the first time I'm having a 1:1 with this colleague. I'm pretty new to the company as well, so we chat a little bit about their interests, and then we have the hangout session. I try to have at least 3 people booked in advance. So find your featured leader, and then start inviting people.
____
Having an internal hangout has been very valuable because we are able to talk more about certain topics within our organization. For example, if I come to this hangout and we discuss certain topics, sometimes I might have to say, “Well, this is a great question, but I can't talk about it because the information is proprietary.” With your internal hangout, you do not have to worry about that. This is how certain problems get solved faster. It also brings people together that wouldn't necessarily work together. I think this is a very valuable part of it.
To join future data science hangouts, add to your calendar here: pos.it/dsh (All are welcome! We'd love to see you!)
Transcript
This transcript was generated automatically and may contain errors.
Welcome to the data science hangout. Hope everybody's having a great week. I'm Rachael. I lead our pro community at Posit. I am based in Boston and I have some bad allergies right now. So if I sound different, that's why.
The data science hangout is our open space to chat about data science leadership questions you're facing and getting to hear about what's going on in the world of data across all different industries. And it happens every Thursday at 12 Eastern time, same time, same place. So if you're watching this recording on YouTube later, the link to join us live next time will be in the details below.
At the hangouts, we are dedicated to making this a welcoming space for everybody. So we love to hear from you no matter your level of experience or area of work or industry. It's also totally okay to just listen in if you want. You can also join in the conversation and jump in and ask questions by raising your hand on Zoom. You could also put questions in the Zoom chat, and always feel free to put a star next to it if you want me to read that question instead of calling on you to jump in the conversation. And then third, we also have a Slido link where you can ask questions anonymously.
But with all of that, thank you again for joining us today. I am so excited to be joined by Natalia Andriychuk, a statistical data scientist in the R Center of Excellence at Pfizer, and Natalia is a long-time data science hangout attendee too. So, so excited to have you here. We'd love to have you kick us off with just introducing yourself and sharing a little bit about your role and also something you like to do outside of work.
Thank you. Yeah. Thank you, Rachael. And thank you for having me. You know, it's very interesting when you usually join hangouts as the person who's just coming to listen to other people speak, and then you join as the guest. It's kind of a change of roles, and it's very exciting to me to be here today. So, yeah, I'm Natalia Andriychuk, I'm a statistical data scientist at the R Center of Excellence at Pfizer. I joined Pfizer a year ago.
Prior to that, I worked at a CRO, a contract research organization, for about seven years. I was in a variety of roles, and this was the company where I actually transitioned from more of a project management type of role to application programmer and then became a part of the data science team there. Now that I've joined Pfizer, I'm a part of the SWAT team. So the R Center of Excellence has an R core team and a SWAT team. SWAT stands for, and I really like the acronym, Statistical Workflows and Analytic Tools. We primarily use R for everything we do. We're developing training and providing support for business lines on various projects. Projects can vary in scope and in the support we provide; the only common denominator is that they're all written in R. A significant part of our job is to provide technical expertise and also promote best practices for the projects that people are already developing, and community building is a big part as well.
And outside of my work, last year, I discovered paddleboarding and I really loved it and I live in North Carolina, so it's pretty hot in here now. So I think we can already start a season for paddleboarding this year. So I'm excited to do that with my friends.
The SWAT team and community building at Pfizer
Well, so excited to have you here with us today. And as we wait for questions to come in from everybody, I thought it might be helpful to learn a little bit more about the SWAT team at Pfizer, how many people are on that team, and kind of how you came about creating it too.
So I wasn't the one who came up with the idea. I came later, you know, onboarded as a SWAT team member. Mike is here on the call; you can see him under the name Smith MK. So if I say something wrong, please, Mike, feel free to jump in and correct me.
So basically, before I joined Pfizer, there was an initiative to bring together the R users, because when I joined Pfizer, I think there were just shy of 1,000 people on the Microsoft Teams channel at Pfizer who either already knew R or were learning R, creating projects in R and trying to learn by themselves or find colleagues to learn together. And there were a lot of projects going on in R. The idea was to develop a team to help bring all those people together, to streamline the training part of it, and also to provide support to business lines: how to approach certain projects and answer certain questions.
Also, one piece of it is, as I heard Mike say on a call another time, that they were guardians of the production R tools. So we have the professional suite of tools at Pfizer, the Posit suite, Workbench and Package Manager. You can also use a local version of R, and people were a little bit confused about that, so we're still providing that support. And yeah, streamlining the learning aspect of it as well.
So we do have some pockets of organizations that use Python. My impression from what I've seen is that maybe there are fewer users, or they're less engaged in building their own community. But the user channel that we have, where we also provide a lot of documentation (we have a Confluence space as well), is mostly dedicated to R users. But anyone with Python experience can join and help us out.
I've been really impressed to see all the different aspects of community across the SWAT team as well. And like all these different events and how you connect with people across the company. And I know that's something that I guess happens for a lot of, like a lot of big companies. They may have like pockets of users in different areas, but it's hard to really like get everybody together and start sharing that knowledge across the team. And I'd love to just hear your feedback on how you've been able to do that or some things that have been helpful for you in getting people together.
Yeah, that's a very good question. So the SWAT team is a 100% dedicated consultancy group. And then we have the R core team, and people in that team can be pharmacokineticians, stat programmers, biostatisticians, but they also help with this initiative as well. And then within the core team, we have multiple initiatives: internal outreach, external outreach, advances in R, communication, education. So we have all those different streams that are targeting different levels of engagement with R.
So for example, within internal outreach, we have the hangout sessions, very similar to what we're doing right now, but within Pfizer. And then we have the book club. And then we have the community of practice events. It's an event that happens every month where colleagues can come and present their various projects. And it also varies in expertise level. Certain presentations are very suitable for the beginner level, and others are more complex and advanced.
So approaching R learning and bringing people together from all the different perspectives kind of helps, because even if colleagues can't come to hangouts this week, maybe they can attend the community of practice. And at the community of practice, they would hear about hangouts and who the guest is next time. Maybe they will have time to do that next time. And then the book club is also a big initiative. I think we have around 140 people now all around the world, like in the Philippines and Japan and the U.S. and Mexico.
And then we have the external outreach, where we track people who are part of different external working groups, and we're trying to identify the gaps in representation. So if I try to answer your question in short, it's targeting the community aspect from different sources and different angles.
Internal data science hangouts
It's so cool to hear that you have the internal data science hangouts as well. I believe you are the first company to ever do that that I know of. But it was giving me some ideas that maybe I should put some documentation together for how to host your own internal data science hangout too.
Yeah. And I really want to acknowledge your help with it, because what happened is we reached out to you asking if you could share your expertise, your knowledge, your experience, because coming to the hangout is a totally different thing from facilitating it. And I know there's probably so much going on in the background to actually put this whole thing together. So you were very responsive when I reached out, and we set up a meeting and asked a few questions.
Having the hangout, I think, within the organization is very valuable because of the added aspect of actually talking about certain things within the organization. So, for example, if I come to this hangout and we discuss certain questions, I get certain questions, we discuss certain topics. Sometimes I might say, well, this is a great question, but I can't talk about it because the information is proprietary, like I can't really share it, which if you do the hangout within your organization, you don't have to worry about that. All people work kind of on the same project or on different projects, but you can share the information together. And this is how certain problems get solved faster or issues people are running into. And then it also brings people together that wouldn't work together necessarily. And I think this is a very valuable part of it.
R for submissions and pharma initiatives
Yeah, this is, I guess, the question that every pharma company is asking themselves. So, I'll kind of take a step back. I know there are a lot of great initiatives happening in the pharma space, and specifically within the R Submissions Working Group within the R Consortium. And I think we have Eric here from that group, Eric Nantz. And then my co-worker, Sam, he's also part of one of the pilots, or a couple of the pilots. And we, as a company, are certainly thinking about it.
But if I'm talking about my opinion, this is where the industry is probably moving. And this is definitely something that companies are going to pilot. And I think the fact that companies are actually coming together and working together open source within working groups says a lot about where the industry is moving and that everybody is interested in it. And eventually, it might come to this point.
Yeah, so one of the teams within the organization is actually leading the admiral vaccine extension package. This is part of the pharmaverse initiative. I'm not necessarily involved with the study teams, so this might be a harder question for me to answer.
So yes is the short answer, Travis. But there are pockets throughout the organization using R. So global supply is one, pharmaceutical sciences, and, you know, not just the kind of GxP stuff, so good lab practice, good manufacturing practice, good clinical practice. All kinds of dashboards are being built for all kinds of uses. R has been used for quite a long time in the clinical pharmacology and pharmacometrics space for predicting PK concentrations and PK profiles for drugs and different dosage administrations. Those tend to be used kind of internally, so for internal teams to address questions of what do we do next, or for reviewing data as it comes in.
But, you know, those are not the things that we submit to regulators at this point. But yeah, at the R in Pharma conference two years ago, I did a quick look-see, and over 1,500 people had downloaded R. So, you know, that's across the whole organization. And I can't tell you what they're using it for.
Leadership support and return on investment
I was wondering if leadership has been supportive. It sounds like they have been. Or has there been any encouragement for people, R users, you know, not necessarily the SWAT team people, to spend a portion of their time doing these community events, time that's kind of taken away from their heads-down work? I know that having leadership support to say, hey, spend 10% of your time or 20% of your time doing that stuff that's important is always helpful. So I was wondering how you guys had approached talking with leadership and getting kind of buy-in from them.
So I consider myself a lucky one, because all the heavy lifting about talking to leadership is done for me by my managers. But I can say that from where I sit, I see the leadership support, because even in the last semester there were certain goals set towards having R training. Certain parts of the organization were encouraged to start programming and to take the R training, and we developed the so-called mini projects. Hopefully we can open those to the world one day. They're basically little projects that show you how to work with the data, create plots, manipulate and wrangle data, all sorts of things, and create functions. And then we encourage all the programmers to start working on the mini projects. So I think that shows that the leadership is interested in investing in colleagues learning R and using it within their day-to-day work.
And I think it's kind of an anticipated question as well. And in terms of, I just wanted to add, I think the idea that we have a SWAT team, a dedicated team that was hired also shows leadership support because we do have those FTEs. And in terms of how many people are in our team right now, so the SWAT team is six people. Yep, six people for right now.
So for starters, for the projects that are starting and for bringing the community together, I think this is the perfect size. The way the organization is going to build on it is kind of under discussion. As for showing the return on investment, we set certain goals every semester that we are moving towards, and then we try to show the leadership that it's worth it.
And sorry, Rachael, I was thinking maybe it would be easier for me to talk through an example, because I really want to come back to this return on investment question and how we measure success. So one example could be a project we did last year, I think. We developed a Shiny framework. It's basically a process where the project team can specify and refine content within an interactive environment, like maybe a Shiny application, and then capture certain settings and output in the production environment. So this was the framework.
And right now, after we published it last year, certain teams kind of picked it up and went with it and implemented within their projects. And now it helped not one, not two, but maybe more teams to actually facilitate their projects and make the workflow more reproducible. So this is how I guess it works in this setting right now, where we can set up certain tools or maybe moving forward, develop the packages that then can go within the organization within Pfizer and project teams can pick it up and actually use it within their production environments.
Diversity of speakers and topics
So I think some of the initiatives I mentioned are kind of helpful with it. So within Hangouts, I can ask my coworkers, since I'm pretty new to the organization, and this is a huge organization. I can go to my managers or coworkers who have been with the company for a longer time and ask them for names. Who do you think should come to the next Hangout? What experiences do we want to share? Because we might have people from stat programming coming, people from digital coming, and they all have very different experiences, but they all use R. So the experience of using R within digital would be totally different from the stat programming world.
And this helps to also diversify the audience, because certain people would join the Hangout to support their coworker, which is happening to me right now. And thanks for all of you for joining, who are my coworkers.
And then the community of practice meetings that we hold every month, they're also helpful. I think at some point when we started establishing these events, we were kind of asking certain people, knowing that these colleagues who would be eager to share, that they're in front lines of our community building. And then after that, when this initiative has started, it brought more people who actually expressed interest into sharing, or certain presentations inspired them to actually go back to their team and say, hey, we need to share something, or we're doing this great thing, we need to share with the world. So the community of practice, when they present, we have two 20-minute presentation talks, and we try to diversify more like beginner level, and more expert level, or more advanced level talks.
And talks can also be not only technical, but around best practices, like mishaps, stuff like this, because it's all internal. You can share a lot of stuff internally.
Yeah, that nugget right there, I just want to emphasize again, it's not just the speakers themselves, but it's the range of topics too. One of the biggest hurdles I've had is people thinking that, oh, I have to have this perfect to talk about it. No, no, no, no. Sometimes my biggest mishaps end up leading to an innovative solution, but you don't get the full story just by looking at the end product, getting the process there, and being, I hate to say vulnerable, but being comfortable with sharing those stumbling blocks.
Yeah, this is actually a very important point, because in my mind, it's kind of a GitHub mentality. So you're committing your work, even if it's not perfect, and then you're pushing it when it's perfect. So it's the same. You don't have to wait until the last moment to present. Maybe for an external presentation, but definitely not internally. And we had presentations where people were presenting work that maybe wasn't 100% finished, and they were asking for feedback. And this led to further collaboration with other folks, maybe from other pockets of the organization or from the SWAT team.
Yeah, I just want to kind of chime in there to say that we've had a huge broad range, as Natalia is saying. And the presentations I love are from people that say things like, I just discovered R in the last year, and it's awesome. And this is the things that I find really nice to use, because that's kind of opening the door and saying, look, I'm not an expert, but here's what I found, which is fantastic. But the other great thing is to showcase the variety of uses of R throughout the organization. We had a Hangout session with someone from the IT security world, R user for years and years and years, and yet still kind of curious about new things like Quarto and dashboards. But he's dealing with things like, what was it, his API using R had been hit millions of times per day. And it's like, oh, that's next level compared to clinical trial reporting, right?
Yeah, and we have certain people who kind of met that way. So for example, I'm leading one of the groups in the book club, and I have pretty experienced programmers in my team as well. And just by reading the chapter, something comes up, like data formats or how to deal with a big amount of data. And we just start sharing the packages and the workflows. Even people who have been working with R for a while might not use certain tools that others, who maybe attended certain sessions, have heard about. Then the sharing process becomes inevitable and leads to greater collaboration as well.
Spreading the word and marketing resources
And I was sort of taken aback. Like, oh, my God, how have you not heard about all these resources we have at the company for, you know, not just the Posit professional products, but also what R can do in 2023? And it dawned on me that I still struggle with finding these pockets of individuals or organizations in other parts of the company to educate them on modern R usage. So it's kind of an open-ended question, but do you have any tips on how to spread the word to these types of people? Do you have maybe a landing page or a Quarto page that your team has opened up for the entire company, or a Confluence page?
Yeah, we have, and this is a very good question, because I'm asking myself that a lot, maybe once a month. So we have a lot of resources. We do have a Confluence page, and again, the Teams channel. We also have the Microsoft Teams website that is associated with it, where you can come, and there are tons of links that are cross-linked between Confluence, this website, and Teams. For certain announcements, we tag the whole Teams channel so that people receive the notification. There are multiple channels within the whole channel, multiple streams: questions and answers, general, R in Pharma. So a lot of resources there.
One thing that I think is a little bit hard is that you can work hard on actually creating those resources, and this is what we're doing. We've created so much documentation since last year on various topics. So the rule of thumb is, if somebody comes to us with a question and we answer it, we write a Confluence article about it, especially if it may be helpful for other colleagues within the organization. But I think the hard part is also marketing, because the resources can all be there, available at any time, no passwords required, but people maybe just ignore them, not because they want to, but because there are so many other things happening. They're working on their projects. The deadlines are approaching.
I even had one colleague who asked me how to opt out of notifications. He's like, you're sending too many. How can I opt out? And it's kind of a situation where the same person can later come and ask why he or she didn't see the announcement about the meeting that they were interested in. So I think there is a very fine balance where you should notify people about upcoming events, but not overdo it, because then it becomes white noise and they just start ignoring all the notifications.
What we chose to do is that during some of the events, for example, in the community of practice events, we make an announcement about hangouts, and at hangouts we can make an announcement about the book club or the community of practice. So we're kind of cross-linking all the events and making announcements here and there, because a person can join one meeting but then skip another one, and constantly reminding people about the things that are going on is very helpful. And then we also discovered that it helps to have easy access to certain resources. For example, if you have a Teams channel, make sure you pin certain messages or links in Teams so that it takes people one click, not two, to get to the resources they need. And eventually, I think with a lot of training, this will become a pattern, and people will start sharing the knowledge and those links with each other.
I was just going to add on to that, that after the user community meetings, we kind of save the slide decks and, you know, that becomes a showcase for what R can do. So that's one way of kind of having that, you know, internal place for people to see what R can be used for. The nice thing, again, because that's internal, you can be as specific as you like and you can, you know, talk about real kind of detail of the project you were working on and, you know, where to find resources, where to even find the code to make things happen that maybe you can't share externally.
About kind of influencing it upwards to let people know what R can be used for, I think you need a handful of cases where you can demonstrate someone being superhumanly efficient, and that will get management's attention really quick. It's like, you went from data to final result in how long? You know, that will garner attention like nobody's business. So that's where things like Markdown and Shiny are gateway drugs for R, because managers will just look at that and go, that's amazing. How do you even think about doing that?
Book club and bridging learning to practice
Yeah, and this is one of the questions we also asked ourselves, and I think this is where those mini-projects came in, because the mini-projects actually focus on the data, like SDTM or ADaM data, that we're using. So basically those concepts of plotting data, wrangling data, and creating functions are applied to something that people would do in their regular job: creating different listings and tables and the specific plots that they would create.
But I guess there's also this component of bridging the gap where you attended the book club and maybe you haven't heard about the mini-projects, so there's a lot of advertising going on in that space as well. And since the book club is joined by so many different people from different parts of the organization, they may even do different things. Like we have one person in my book club who is part of the real world data team and another person who is part of the stat programming team. So they pretty much face different challenges as well. But I think some of them are also supported by their own teams and given tasks within their own teams as well.
Pharmaverse and open-source contributions
Yeah, thank you, Nick, for this question. And I also want to let people know, I know Nick from the RTP R user group that we're creating together. We had our first meeting this week. The attendance was a little bit lower than I would have wanted, but it also was just the first meeting. So if you're in the Research Triangle Park area in North Carolina, go to the Meetup page, search for the R user group, and join us next time.
Yeah, so pharmaverse. As a SWAT team, we started looking at the pharmaverse packages to see what each of them is doing and where we can help and contribute. This is an ongoing discussion among us. And when I'm thinking about pharmaverse, I know it's a subset of packages. When I'm thinking about all packages that do clinical research or are connected to clinical research, it's way bigger than, I guess, pharmaverse. But we do have some people contributing to the packages. One of them I want to mention is Sam Parmar. I think he's on the call. He already started contributing to one of the pharmaverse packages, logrx. And I think he's making some good progress there.
And I think Nick was also involved in that package as well. As I already mentioned, there's the admiral vaccine extension package; we're contributing to that one. And I know there are so many great packages that we potentially can contribute to. But we first need to kind of assess what we're dealing with and then create a plan on how to move forward, because there are a lot of open-source packages. We also looked into teal. I don't think it's a part of pharmaverse, but it was used in the R Submissions Working Group. So yeah, there are a lot of packages to look at, and we need to figure out a plan for how to implement them internally, and whether there is a need.
Tips for starting your own internal hangout
Yeah, so that's a good question. It was a little bit overwhelming to think about it and how to start it, but one thing you would need is a list of people, at least 10 or 15 from the start. And then, I'll just give my example. First, I created a calendar of events that I thought would make sense. We do hangouts every other week, so I specifically put down the dates I would invite people to, and then I send them an invitation asking them to be a part of the hangout.
I actually started the hangouts with Mike, you know, and Douglas Robinson, who is also part of our core team that is moving the initiative forward, so I thought it was a great idea to start with people who are actually interested in doing the hangouts too. It makes for an easy first hangout, especially if you work with colleagues that you already have a rapport established with, so do that and then invite people. Then I set up a Microsoft form that I send out asking about, you know, regular information: occupation, bio, maybe certain questions they want me to ask. They fill out the form, and I also ask them to provide their headshot. I think it makes the hangout invitation a little bit more colorful. And what I still do now is schedule a 30-minute call with each guest the week prior or so, to go over any questions that person might have if they have never attended the hangout, or anything that can come up. It actually helps me to know more about what they're doing, especially if this is the first time I'm having a one-on-one with this colleague, because, you know, I'm pretty new to the company as well. So we chat a little bit about what interests them, and then we have the hangout session. And I try to have at least three people booked in advance.
That's awesome. I love the point about kind of meeting as a smaller group first so that when you got to the first hangout that you also had some people there to kind of help you with the conversation and get people talking too. Well, thank you so much, Natalia, for joining us today, and it's been a pleasure learning about how you're running these hangouts internally, and it's given me a lot of ideas and things we could share to help others do that, so thank you again for joining. If people do want to get in touch with you, is the best way LinkedIn or what do you prefer?
Yeah, I think LinkedIn and Twitter will both work.
Awesome. Well, thank you all so much. Have a great rest of the day and hope to see you back next Thursday.
Thank you so much. Thank you for having me and for all the questions.