Resources

Data driven decision making in pharma | Ning Leng @ Roche-Genentech & Jing Huang @ Veracyte

We were recently joined by Ning Leng, Global Head, Data Sciences Acceleration at Roche-Genentech, and Jing Huang, SVP of Bioinformatics and Data Science at Veracyte, to chat about embedding modernized data science solutions in pharma to enable data-driven decision making.

Links mentioned in the chat:
- R Consortium R Submissions Pilot 4 documentation portal: https://rconsortium.github.io/submissions-pilot4/
- Bay Area Biotech-Pharma Statistics Workshop: https://www.bbsw.org/
- San Francisco Bay Area Chapter of the American Statistical Association: https://sites.google.com/view/sfasa-org/home?authuser=1/
- Dahshu Data Science Symposium: https://dahshu.wildapricot.org/
- DS Hangout Survey/Feedback: https://docs.google.com/forms/d/e/1FAIpQLSdqXKPCPOwC7tSLygc43gf2ezkNO4bhM4TAgjUQMeoPe_GOmg/viewform

Speaker Bios:

Ning Leng: Ning Leng is the ad-interim Global Head of the Data Science Acceleration Enabling Platform, under Roche Product Development Data Sciences. Ning joined Roche-Genentech in 2016 as a statistician and worked on both early and late phase oncology development, with a special interest in utilizing diverse data sources and advanced methodologies to generate insights for personalized healthcare. Since then she has been driving a number of internal and cross-industry projects on modernizing data science solutions in pharma. Prior to joining Roche-Genentech, Ning obtained her PhD in Statistics from the University of Wisconsin-Madison and worked at the Morgridge Institute for Research.

Jing Huang: Jing Huang holds a B.A. in Statistics and Probability from Peking University and a Ph.D. in Statistics and an M.S. in Epidemiology from Stanford University. With over 20 years in the biomedical field, her research focuses on statistical methodologies in clinical trial design, genomic analysis, and machine learning. As SVP of Bioinformatics & Data Science at Veracyte Inc., she oversees bioinformatics pipelines, algorithm development, and statistical analyses for product development. Jing has co-authored over 30 peer-reviewed articles, with more than ten thousand citations, and is a co-inventor on over 20 patent filings. She actively promotes data science through volunteer work, including as founding president of DahShu, 2024 president of BBSW, and chapter representative of the American Statistical Association San Francisco Bay Area Chapter (SFASA). In 2023, she was elected a Fellow of the American Statistical Association in recognition of her outstanding contributions to the medical research community in the field of statistics, her numerous statistical innovations in genomic tests, and her exemplary leadership and community service to the profession.

► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu

Follow Us Here:
- Website: https://www.posit.co
- LinkedIn: https://www.linkedin.com/company/posit-software

The Hangout is a gathering place for the whole data science community, held every Thursday at 12 ET, to chat about data science leadership and the questions you're all facing. To join future data science hangouts, add it to your calendar here: https://pos.it/dsh We'd love to have you join us in the conversation live! Thanks for hanging out with us!

Aug 7, 2024
58 min

image: thumbnail.jpg

Transcript

This transcript was generated automatically and may contain errors.

Hi everybody, welcome to the Data Science Hangout. I'm Rachel Dempsey. I lead Customer Marketing at Posit. This Hangout is our open space to hear what's going on in the world of data across different industries and connect with others facing similar things as you.

So we get together here every Thursday at the same time, same place. So if you're watching this as a recording and want to join us in the future, there's details to add it to your calendar below. And just make sure it adds it for 12 Eastern time so you can join us live. And I know from our Hangout survey, people really enjoy getting to connect with others here in this Hangout space. So if you are interested in connecting with others, I want to encourage people to say hello in the chat, maybe introduce yourself, your role or your base or if you want to share your LinkedIn, it's a nice way to make friends here in the Hangout.

We're all dedicated to keeping this the friendly and welcoming space that you all have made it and love hearing from you no matter your years of experience, titles, industry or languages that you work in. Something I sometimes forget to add here is if you're hiring, please feel free to share those roles in the chat as well. And also 100% OK if you just want to listen in today, although we love getting to hear from you live.

So there's three ways that you can ask questions or provide your own perspective. So first, you could raise your hand on Zoom and I will call on you to jump in. Second, you could put questions in the chat and just put a little asterisk next to it if you want me to read it. Maybe you're in a coffee shop or walking your dog or something. And then third, we have a Slido link, which I'm sure Curtis shared already, where you can ask questions anonymously as well.

So with that, I am so excited to introduce our two co-hosts today. So we have Jing Huang joining us, SVP of Bioinformatics and Data Science at Veracyte, as well as Ning Leng, Interim Global Head of Data Sciences Acceleration at Roche Genentech. And Ning and Jing, I'd love to have you both kick us off with having you introduce yourself a little bit and your role, but also something you do outside of work, too.

Introductions

Very nice to meet everybody. Hope you can hear me okay. My name is Ning. I currently work at Roche Genentech, based in South San Francisco, and I'm part of this huge data science organization under Roche Genentech. We have about 900 people in our organization working on clinical trial data, everything related to basically patient, human-level data: clinical trial data, real-world data, biomarker data, et cetera. Currently, I'm the Interim Global Lead of a sub-team of this large organization called Data Science Acceleration. We are about a 40-person team, and we are focusing on this transition to open-source software, to modernized solutions and infrastructure, et cetera.

And outside of work, I like outdoor activities. Maybe one fun fact: earlier this year, we did a Machu Picchu four-day, three-night hike, which was amazing.

Sure. So my name's Jing, and I work for Veracyte. Veracyte, if you don't know, is a much smaller company compared to Roche, but we are a public company. We specialize in providing diagnostic insight using genomic data. All of our models are machine learning models that use genomic data as an input and then give critical information to patients and physicians to better manage the patients. And we have about a dozen products on the market, fully regulated, and all that.

So as a senior VP, I deal with data almost everywhere, all the time: biostatistics, clinical studies, and, as I mentioned, our machine learning models are the core engine of our products. My team develops all those machine learning models and the bioinformatics that go through sequencing and all that to derive the genomic input to feed into training of those machine learning models. We also do all the experimental design and assay optimization. So literally everything, everywhere about data, we work on, we lead.

Outside work, I like outdoorsy stuff too. We just came back from a small trip to Europe: a couple of days in Vienna, a couple of days in Budapest. I also love gardening, although my passion is disproportionately higher than my skill set, so there have been a lot of dead plants along the way. But nevertheless, there are also ones that are thriving.

Really happy to be in this forum and get to know you all. One more thing I do need to say: I see it's recorded, so I do want to mention this will be my opinion, and my opinion only; it doesn't represent Veracyte or any of the non-profits I work for. Oh, last thing: we also love to work in non-profits, just like Ning. That's how we met each other. We were both leaders in BBSW, the Bay Area Biotech-Pharma Statistics Workshop. And if you're interested, we can tell you more about it. We have our annual conference coming up in October.

Data-driven decision making in pharma and diagnostics

Sure. I can kick it off. So what I realized is that data science and statistics are fundamental and crucial for product development. Everybody knows that, right? Either in drug development, like in Ning's world, or in diagnostic development in my world. However, we can do much more, right? We can help almost every step of the business, and we should, right? Nowadays, data is so readily available, from the supply chain to how to be more effective in patient claims.

So there are many more things, and a smart business decision should be one made based on data and evidence, right? And we have a unique advantage of sitting at that juncture, with access to the right data and with the skill set to derive the important data insights to help better the business. So I feel it's an opportunity, but also a wonderful responsibility, for us to make our business better and benefit the patients.

Yeah, totally. Totally agree, Jing. Maybe just to dive a little bit more into the pharma space. Exactly as Jing mentioned, I think traditionally statistics and data sets have played a very important role in the pharma industry for many years. And if we look at a traditional job description, a lot of the emphasis is on how we design a trial: once we have a successful phase one or phase two, how we design a confirmatory trial based on the scientific findings, et cetera.

But nowadays, I totally agree with Jing that there is so much data out there and so much we can do to influence our stakeholders. For example, we have manufacturing data. We have site performance data, in terms of enrollment speed and data quality. And we have competitor data, research data, and real-world data to let us know whether the control arm assumptions we had from earlier phase trials still hold in the current situation. And also for reimbursement: in some cases there is a statistically meaningful difference, but is that magnitude of improvement sufficient for reimbursement in certain countries, et cetera? So there is so much we can do by bringing different types of data, different modalities of data, together.

Accelerating data science

Ning, your job title is something around accelerating data science. So what does accelerating involve? How do you accelerate things?

Yeah, that's a very good question. So my team is really focused on data science software and data science infrastructure. In the past, reflecting on what Jing and I said, we oftentimes followed a trial in a more linear way, and there were very well-defined outputs to be generated: tables to be generated, graphs to be generated. But now, with the richness of data on one side, and with the opportunity of open-source tools, automation, and advanced technologies on the other, there is a way for us to automate many of these things. That's one thing my team is focusing on.

So if we have very good standards in a certain disease area, then we can automate a lot of things, so we don't need to do this more transactional type of analysis repeatedly. On the other side, with all those modern technologies, modern databases, and also tools and methodologies, there is a great opportunity for us to bring different data together. At Roche, we do have a big initiative trying to build this large curated data set by combining data from different molecules together, so that we have a very rich data set to help us do this reverse translation for the next generation of drug target discovery, et cetera.

Getting buy-in for data-driven strategy

I can chime in with some experience, right? So first of all, it's usually easier to do quick wins, right? Whether it's small or big, quick is important, right? And also, hopefully, at least the initial investment is small, or you can phase it, right? Otherwise, if you say, hey, I need $3 million for three years before you can see the result, at least for a small company with shallow pockets, the answer is like, oh, OK, maybe later, right? So you want small wins.

And also, you have to translate, or at least speak, in the language the leadership is more open to listening to. For example, our company went through a leadership change. Previously, the leadership had more of a science background, so convincing them of the scientific breakthrough and how that directly benefits patients was important, right? It was easier to get buy-in. But recently, our leadership has a much stronger finance and business background. The end goal is always to benefit patients, but you have to come in from operational efficiency: how quickly you can reach more patients, increase the volume of the tests, right, from that angle, to get buy-in. So I found that quick (not necessarily small) wins to build the trust, and speaking the language that the leadership is open to and more familiar with, are very important to get into the strategic space and have data insight impact.

I totally agree, Jing. I actually have a recent example where my biggest learning was basically being specific, being concrete, and identifying those quick wins that actually drive the impact. The example I want to share started maybe a year or two ago. Within Roche, we have this very strong interest in making our trials more inclusive. We realized that nowadays, because we measure more stuff, clinical trials have longer and longer inclusion criteria. And that means you have a smaller and smaller patient population. So basically, we really wanted to make our trials more inclusive, and then use data to pressure test whether this is really a concern or just a hypothesis.

So in the very beginning of the initiative, we experienced a problem someone called the nodding head problem. When we presented this concept, everybody agreed with it, but we didn't see the needle being moved. We didn't see an actual difference in those studies. People all agreed with the concept, but they didn't know what action to take. So what has been very helpful is that we provided this white-glove service to one particular study team, sat together with them, and pressure tested all the inclusion and exclusion criteria, using real-world data to really pressure test that.

And we came up with some very concrete suggestions. The next step was basically to look at all the protocols in other trials with similar molecules and see whether they included similar language. And then we showed the management team: look, we have this number of trials with exactly the same language, and there is no strong justification for including this language. What additional evidence do you want to see for us to remove this language from all those trials, and how many more patients can we potentially enroll by removing it? I found that was much more helpful than the kind of hand-waving statement saying that we all should optimize our inclusion and exclusion criteria.

Hiring for data science roles

Yeah, so because we are a huge organization, with about 900 people globally, you can imagine that we definitely have people who are more like generalists. There are people we want to be equipped to take different roles, et cetera. But we also hire very, very specialized talent: for example, imaging talent, real-world data experts, biomarker or genomic experts, et cetera. So I will say that for our hiring, for junior roles, especially when we hire master's grads, we don't really look into a specific specialty. What we look into is more curiosity, ability to learn, collaboration skills, et cetera. But for those more specialized roles, we may look into more senior people.

Yeah, I can add a little bit. It's definitely a similar theme for our company, which is much smaller. We are definitely very focused on the business need, and it can change not only from year to year but quarter to quarter. Sometimes there is a very focused need in a specific area, and we foresee that it will last at least two to three years, for example. There is a big need for assay optimization right now, and then we need bioinformaticians who are just extremely familiar with RNA sequencing assays, right? And then there are other needs in different functions where we need someone who wears all kinds of hats and is fluent across areas. Then it is less about a specific skill; there is just a general focus on data science or data engineering.

Breaking down data silos

Again, it actually echoes back to the quick win. Usually, just like Ning said, if you come up and say, oh, we need to standardize, we need to connect, people will nod their heads, but they're not going to really commit. It's like, yeah, sounds like a good idea, no reason not to play along, but whatever, right? But usually what we do, and it's out of necessity for a smaller company, is a concrete project, right? It turns out that in our company, we want to really leverage the resources and talents across sites. We went through several acquisitions in the last couple of years, and we really diversified our talents, and we realized, oh my god, there are resources and talents at this site, although the project was initiated by this legacy company.

And just enabling that talent to access that data is a concrete problem you have to solve, instead of just naming it. So basically, it's almost bottom-up. Start with a project to say, hey, see how much efficiency we can create instead of hiring an outside consultant, blah, blah, blah. We have internal talent and resources to tackle it together and break silos, and then that becomes an ecosystem. You get a lot of buy-in from cross-functional stakeholders, and then it's time to maybe start an initiative. That's how we came about it.

Yeah, Jing, I totally agree. I think starting from something concrete matters. And Jing, you and I also discussed a little bit whether it's a matter of bringing the data together or a matter of bringing the people, the data scientists, together. I resonate with that a lot, because Roche is a huge company, and it's impossible to bring all the data together. Even integrating data from clinical trials, which seems like a straightforward task, is not that easy.

On the other side, I feel like a quick win, to Jing's point, is maybe bringing the data scientists together. For example, in Roche, in pharma, we have data scientists wearing different hats. There are people who are really, really good at clinical trial design and clinical trial data analysis. There are people really good at real-world data analysis. There are people really good at genomic analysis. And sometimes we see that each person is only representing their own specialty, and then they only share their results with the stakeholders, for example, clinicians.

And then for the clinicians, it could be confusing if they see some results from real-world data, some results from clinical trial data, and some results from biomarker data. We all know that there is no 100% consistency across different data sources: all models are wrong, and no data source is perfect. So how to triangulate and navigate through those data sources is not an expertise of the clinicians, of our stakeholders. Actually, it is our expertise. It's an expertise of data science to take all those diverse results in and find an interpretation while acknowledging the limitations of each data source.

And I can maybe add one thing. It may sound mundane, but when we say "bring data together," it is very worthwhile to figure out, for each stakeholder, what that means. And you will be very surprised that it may mean very different things. One person may mean we have to standardize literally into one database. Another person may mean, oh, I just need the data to be linkable, conjoined, whatever; I don't care where they sit. It could be even looser, as Ning said: I just need different data scientists at different sites to be able to access the data so they can analyze it. So figure out what "bring data together" means to each person, and understand those differences.

And then, as a data science leader, you come in for that specific problem to say, what makes the best sense for this particular project? Right. Then you educate all the cross-functional stakeholders to say, I know you initially thought it means this, but for this project, because of ABC, we really need to do either just this or more of this, right, to enable the project to go forward. That really helps establish your technical leadership and strategic leadership, because it is very important for the cross-functional leaders to feel they have been heard. If instead we just rush into our own assumptions, they would be like, what? This is not what I thought bringing data together even means, right? So spending the time at the beginning on this understanding and alignment will benefit you tremendously down the road.

Addressing the nodding head problem

I can also chime in, right? It depends, right? If it is really just, oh, bring the awareness, bring data intelligence across functions without a need for a concrete output, nodding heads are great. Then you follow up with a quick survey: oh, 90% support, right? That just wraps the project up. It's a great success; we increased the awareness. However, if it's an urgent project, I think instead of relying on other people to be the strategic leader, we have to step up and do more than the technical things. I encourage my team members, including myself, to have action items prepared beforehand. We say, if you agree, please do this, this, this by this week, right? Or if you need to review, here's the timeframe where we can answer questions, blah, blah, blah. But we expect you to take this action by this time if you agree with the general plan.

If you have other things you need to sort out or clarify, whatever, we are here to help, right? And then you have a framework, a timeframe. Then you start nudging people: you said you're going to deliver in three weeks; that's what we decided in this meeting, followed by meeting minutes; where's the progress, right? So you set up the framework that helps make everybody accountable instead of just nodding heads. You prepare ahead of time what is required, or implied, by nodding your head.

Yeah, I totally agree. I also feel like sometimes when I talk to my team, I see some of my team members, including myself, with the perception that we will present a problem in a meeting and, magically, someone in the meeting will chime in and find a solution. But 90% of the time, it won't happen, because the people in the meeting only think about this specific problem for 30 minutes, an hour, and you have been thinking about it probably for weeks and months. And, to Jing's point, if you don't have a very clear expected outcome or expected next step for the project, no one in the meeting will come up with a better plan.

Automating workflows and managing change

Yeah, in my mind, when you talk to people about their day-to-day job, there are definitely things they enjoy doing and things they don't enjoy doing. And oftentimes, the things they don't enjoy doing are the ones we try to automate: the repetitive tasks, the things that can be standardized, et cetera. So I totally agree with you; there is a fine balance there. It's about aligning with people, making them realize that automating certain tasks allows them more time to work on the things they enjoy, and also assuring people that in the portfolio, there is a sufficient amount of work that will be more interesting and more impactful than doing a repetitive job every day.

Totally agree. I think, you know, from a personal level, make sure it's more enjoyable. And definitely in both Ning's and my case, we just have way more work than we can handle. And I think from a business level, right, there are two ways. One is the proactive, or offensive, side, right? By automating a lot of the tedious, repetitive work, we will be more innovative, right? We'll be more at the cutting edge and more competitive in our own business. And we'll be able to use our insight, instead of just the technical capability of doing repetitive work, to create not only more work for ourselves but more data insight value for the company, right?

And also from a defensive point of view: if, just coming out of fear, you say, oh, I don't want to lay off people, so let's not automate, you become obsolete. You become much less efficient compared to your competitors in the same space, right? Then the company will suffer. Versus thinking about being replaced by a more modern technology: if you're a competent data scientist, you should be able to create more value than just doing repetitive work.

Open source and FDA submissions

Sure. Yeah. So I think my example may be more toward the deliverable, not so much about the design. I know you mentioned synthetic control, et cetera. Within Roche there are also a lot of colleagues working on that, working closely with FDA on those more modernized designs and getting buy-in for them. For myself, my involvement is more in the deliverable, the actual filing of, say, a phase three trial of a product.

So basically I've been working in this space trying to enable the industry to use open-source languages, and especially the R language, for FDA submissions. And I would say it is a process. It's definitely a process. Exactly as you said, I think from, I don't know which year, maybe Mike remembers which year, I think it was about 10 years ago, FDA had this guidance saying that FDA does not require any specific statistical software for drug filing. However, in reality, every single company is using commercial software for their filings, and we don't see the needle being moved. So it's similar to the nodding head problem.

So we started the R Consortium R Submission Working Group probably four years ago. The idea was to have publicly available examples showing that sponsors can actually use open-source software to do drug filing. Similar to a previous example, we had a connection at FDA; on the FDA side, they also have people interested in open source in their group. So we identified those people, collaborated with them, and did those pilot filings in the public space, showing people how to do that, and also learned what the best practices are.

The other thing I learned in those cross-industry collaborations: it's a very good learning experience to understand what's in it for them, what's in it for FDA. Because at FDA, they have people who want to use open-source languages, but they also have limitations in their systems, et cetera. There is so much learning in those cross-industry collaborations for us to realize how we can make their lives easier when they review the application.

Yeah. So, I mean, I have to do this whenever I see Ning, because she and the R Consortium Submissions Working Group, I think, are really transforming the future of submissions to regulatory authorities. The great thing about it is that it's bipartisan: it's the industry and the regulators both looking at this together and saying, what shape do we want this to take in the future?

Maybe not to derail the topic, but I can share some learnings there. So basically we are wrapping up our pilot number three right now. And I think one learning, going back to James' point, is, again, start small. When we did pilot number one, there was so much unknown, so the scope was really, really small. For pilot number one, we decided to only do an experiment to submit four tables and graphs, and we did not do any data set submission. We wanted to test out different tools, et cetera. It was really just feasibility testing at that time. And I think that was great, so that we could wrap up pilot one in several months and get a formal FDA letter saying that this is feasible.

After that, I think we became bolder and bolder. Our pilot two was actually submitting a Shiny app, an interactive app, to FDA so that they could redeploy the app on their system to do their review. Our pilot three adds the data set component, to show that the data sets generated by the open-source language can also reproduce what the commercial language produced. And now we are looking to pilot four, where we will, again very boldly, introduce a container and also a WebR component, which will be really, really interesting. So the journey has been really nice, with this iterative learning, and I see our goals become bolder and bolder as we get to know each other better.

Yeah. I talked about this at PositConf last year: over 30 years in industry, we've gone from paper submissions to PDF submissions, which are essentially the same thing but with hypertext links. But what I think we're getting to with WebR and with the submission pilots is something utterly different, really fundamentally different. And I think that's really exciting, because it's getting into the 21st century. The pharma industry is typically known for not moving terribly quickly, but I think this is an example where we are.

Infrastructure and languages

At Roche, right now, we are finally moving to a cloud infrastructure. Previously in pharma, oftentimes we used commercial software, proprietary systems, et cetera. So it's a little bit of a black box sometimes. And you can imagine that it's really hard to add new features when new AI models come in or new languages pop up, like Julia, et cetera. It's really hard to make them work on a commercial platform or together with a commercial language.

Yeah. So we are in this process of moving to an AWS-based infrastructure. And on that infrastructure, we enable data scientists to use whatever language they find appropriate for their project: R, Python, Julia, et cetera. We are in the middle of that. This year, in Roche product development data science, we are hoping to move 90% of our active molecules to this new platform. And I hope next year, once people settle in with the platform, we will see more innovations and more automated solutions.

Yeah. We're in a different regulation space, I would say not as rigorous as drugs, and that provides us some flexibility. So R and Python have always been the building blocks, the fundamental languages we always use. And then we use specific languages to tackle, for example, sequencing, or sometimes imaging analysis. We just use whatever is the best fit and the most advanced, right? That can change from year to year as well. We are really agnostic to the language we use.

And infrastructure-wise, there is definitely a move from on-prem to cloud. Interestingly, to add to the complexity, because of our recent acquisitions, not only do we leverage different cloud platforms, there can also be regulatory constraints, like GDPR restrictions, since we acquired two European companies. What data can flow from the U.S. to Europe versus Europe to the U.S., and how to control that in AWS, can be a surprisingly complex problem. So we are definitely also on that journey.

Balancing business needs and data science innovation

I can speak to this, because our product is machine learning models. I will say that in the healthiest organizations, and I certainly believe Veracyte is one, it's never one or the other. It's a joint decision, right? Usually it's an iterative process. One example: our commercial folks will come and say, we continuously hear feedback that it would be really helpful to add a feature to our product. Or they will say it would be really helpful to add A and B and C to our product all at once. Then we'll say, hey, guess what? From a data science point of view, A needs six months of development before you can deploy. B needs about two years. C, we are uncertain about; we need a phased approach. What is the investment? Remember, finance people are in the leadership. What is the investment in each? What is the ROI? What is the uncertainty? Let's decide as a team, right?

And then, you know, that's the initial evidence we provide with our expertise, right? How long will it take to develop? How large is the investment? And then commercial comes back to say, guess what? Now that I see A, B, C, each will give us this much more revenue, or volume increase, or stickiness from the customer. And with those two pieces of information, we decide. Sometimes it's a selection: let's just do A because it's so quick. Or they will say, let's try A and B at the same time and hire more people, because it sounds so important. So it's really a dynamic, cross-functional decision for our business.

Yeah, I agree with Jing. I also feel like it's rarely A versus B, functional interest versus business need; often it has both components. At Roche, our culture actually encourages a lot of grassroots effort. We encourage people to innovate in their daily work, so there are a lot of ideas popping up. And oftentimes those ideas come from an actual study need. And to the earlier point, there are also the things people feel bored by, like the repetitive work.

And a larger question is that some of those needs are tied to only one particular project or study, while others are applicable across the whole portfolio or across different projects. So what our leadership team tries to do is still encourage innovation, but if they see an innovation that can be applied as an amplifier across a large number of projects, they give that project a push.

Taking an example from myself: when I joined Roche about eight years ago, I was a statistician working on clinical trials, doing design and clinical trial reporting. And at that time I realized there was an opportunity for us to adopt open source languages and do better code sharing, code standardization, et cetera. It was kind of a hobby project for several years, until maybe three or four years ago, when Roche decided to double down on this strategy of adopting open source tools and platforms. And then it became my day job. So there is definitely an incubation period, but at our company, we encourage all the new ideas from grassroots efforts.

Career advice

I can start. This is about communication, just to set the context. We're usually very technical people, but the audience, especially for strategic decisions, is usually not technical. Sometimes we want to be so rigorous that whatever we claim, we humbly add 20 caveats, right? Do the 20 caveats really need to be said in the meeting? So I always say, really think about the audience. It's rarely about what you're about to say or what you said; it's about what they hear, right? What do you want to get out of that conversation? Then structure your communication. Be more confident on the positive claim if there is one, and maybe put the caveats in the notes section.

You are no longer a student. You don't have to demonstrate your capability; we hired you, and that's already enough. You should demonstrate what the result means. What is the next step? How can it move the business forward? What is the impact to each of the cross-functional leaders, beyond the technical caveats and technical details? That's always the communication tip I give people who work as data scientists or in data analytics in an industry setting.

I really like that. And I know from the BBSW conference, we also talk about this on my side. Basically, to James' point, the main goal is to enable decisions, to enable actions. It's not showcasing our capability, that I can generate 200 outputs or build a really complex model. It's really about enabling the next decision.

And I remember reading a book called The Manager's Path. One section talks about how to manage your one-on-ones with your manager. Earlier in my career, I always treated my one-on-ones as reporting meetings, just a laundry list of things I did. That part really emphasized treating every one-on-one as an opportunity to ask your manager to help you with certain things, kind of like managing up: to ask them to take certain actions to help you, or to share certain context that can help with your work. I found that really insightful.

And maybe another thing to add, also learned from BBSW: I learned so much from those nonprofit organizations. It's kind of like finding a network of mentors. BBSW gave me the opportunity to meet a number of leaders in the Bay Area, and everybody had their own perspective and their own career journey, people working in different types of companies and different types of groups. And I found that the opportunity to get mentorship and coaching from a diverse group of people has been really, really helpful. Because everybody has their blind spots, and what your manager or your leadership tends to believe may be slightly different from what other leaders believe. It's really helpful to get those diverse perspectives.

Community resources

So Ning and I both work on BBSW, the Bay Area Biotech-Pharma Statistics Workshop. If you just go to bbsw.org, you'll get there. We have our exciting annual conference coming up at the end of October in Foster City in the Bay Area. If you happen to be in the Bay Area, that is, I'm biased, but I still say, one of the best conferences. Not only do we talk about cutting-edge methodologies, we now have a data science track and a statistical track, so we cover broader topics. And we also have soft-skill sessions, with an emphasis on leadership coaching. Many people come back to say that not only are the technical and soft-skill sessions helpful, it's also the best way to network. We have a lot of opportunities during breaks to network with each other and really broaden your horizons. And the food, I think, is otherworldly.

So that says something. It always helps to have yummy food when you network. That's one. We also have SFASA, the Bay Area ASA chapter; sfasa.net is another non-profit I'm working on, also with a Bay Area focus. If you are not in the Bay Area, don't worry. There's another one called dahshu.org, D-A-H-S-H-U dot org. That's a data science community, a global community. We have monthly seminars, and we have just restarted our annual in-person conference, in Michigan in May, after the pandemic. So that's another one where you can connect and contribute. If you have a topic you want to present, go for it. It's a platform for every person in the data science community.

Yeah, just on BBSW: if you are not in the area, I'm also co-hosting the BBSW meetups. We have probably three or four virtual meetups every year, and the topics vary a lot, from data science topics to soft-skill topics to disease area knowledge topics. So you're more than welcome to join the Zoom meetings. And if there are any topics you're interested in, please feel free to share them with me, and we can try to reach out to the experts in the field.

When projects don't go as planned

Maybe I can get started. I feel like the really hard problem is to stop a project when it is no longer promising or no longer relevant. If you ask me for practical advice, I would say that if you feel like something is not right, then probably something is not right; speak up and have an honest conversation with leadership to see whether you want to stop it and how to stop it. Sometimes stopping is not easy, especially for a cross-functional project. And if nothing works, my advice is to try to move to another project. Try to find a more impactful project if you really don't see value in this one.

Very true, and it echoes the earlier point about managing up. This is a perfect discussion to bring to your manager. I always tell my team, and I practice this with my manager: don't always just do the status report and give me the good news. I can read that on a Confluence page, right? Use me as a thinking partner. Bring the questions. And as a data scientist, one important but sometimes awkward question you don't want to ask is: is this still a data science question to solve? Even if everything works well on the data science side, putting the resources and time aside, say I can give you that data science answer right now. Would that solve the fundamental problem? That may help people think, because otherwise the human psychological tendency is to just keep going, not admitting the project is already beyond the point of saving. You may be asking small questions when there is a larger problem there.

Thank you all so much for joining us today, and for all the great questions for Ning and Jing. This has been awesome. I've loved this conversation, and I have so many lessons to go and think about for my own role too. I really appreciate you taking the time to join us.

Our pleasure. Thank you. Very nice to meet everybody. Yes, indeed. Let's keep in touch. Have a great rest of the day.