Resources

David Sluder @ Institute of Nuclear Power Operations | Data Science Hangout

We were recently joined by David Sluder, Data Science Sr. Program Manager at the Institute of Nuclear Power Operations, to talk about what he's learned from helping build out a new data science capability in a nuclear power organization.

Bio: David is a data science senior program manager at the Institute of Nuclear Power Operations, a non-profit company that sets standards for safety and reliability across the US nuclear power industry. A long, winding road led him to his current position, having worked as a database administrator, engineering educator, event organizer, and glass bead maker. He enjoys solving complex technical problems, being a champion for open-source software, and finding opportunities to laugh and connect with others.

To join future Data Science Hangouts, add them to your calendar here: pos.it/dsh (All are welcome! We'd love to see you!)

Nov 21, 2023
58 min

image: thumbnail.jpg

Transcript

This transcript was generated automatically and may contain errors.

Happy November, everybody. Welcome to the Data Science Hangout. I actually just shared our 100th Data Science Hangout recording to YouTube yesterday, and it made me realize that we never actually celebrated that. So thank you all for being here and making this space what it is. I can't believe we're already over 100 Data Science Hangouts. If this is your first time joining us, hi, so nice to meet you. I'm Rachel. I lead customer marketing at Posit. This is our open space to chat about data science leadership, the questions you're facing, and what's going on in the world of data across different industries.

We're here every Thursday at the same time, same place, so if you're watching this recording on YouTube later and you want to join us live, you can use the link below to add it to your calendar. Here at the Hangout, we're all dedicated to making this a welcoming environment for everybody. We love to hear from everyone, no matter your years of experience, your title, your industry, or even the languages that you work in. It's totally okay to just listen in here. If you're out walking the dog, or maybe you're at lunch, it's okay to just listen in.

But there are also three ways to jump in and ask questions or add your own perspective. You can raise your hand on Zoom, and I'll keep an eye out. You can put questions in the Zoom chat, and just put a little star next to it if it's something you want me to read out loud. And lastly, we have a Slido link where you can ask questions anonymously too.

And thank you, Curtis, for sharing that there in the chat. Sometimes I forget to say this, but if anybody is hiring, feel free to share any open roles in the chat as well. That's definitely not spammy at all to me. I think it's great to share those jobs in the chat too. But with that, I am so excited to introduce my co-host for today, David Sluder. David is data science senior program manager at the Institute of Nuclear Power Operations. And David, I'd love to have you just introduce yourself a bit and share a little bit about your role, but also something you like to do outside of work too.

Introducing David and INPO

Sure. Thank you, Rachel. And good morning, good afternoon, good evening, depending on where you are. I'm really excited to be here. I've got two disclaimers I have to start off with, though. Number one, I just want to publicly say I'm here to represent my own opinion. I am not here to represent my employer, the Institute of Nuclear Power Operations, or the broader nuclear industry. Of course, I'm going to talk about them to give the conversation some context, but this is purely my opinion. And the second piece is that I really need to make sure everyone knows that anything I talk about today is not just my work. I work as part of a broader team, and the team has worked so hard to build out data science at INPO. I work with a bunch of smart, passionate people, and it's really important that they get a lot of credit here too.

So with that said, my name is David Sluder. I'm a data scientist here at INPO. I've been at INPO for almost 12 years now, only about two years in the data science space and about 10 years in the IT space. It's a really interesting company, and I didn't know anything about INPO until I started to work here, so maybe I'll give you a briefer on what we are and what we do. INPO is an independent nonprofit that is funded by our members, and our members are the nuclear power industry. Within the US, I think we have 54 nuclear power stations, and they all pay us a member-based fee to perform our service. And our service is that we set and assess safety and reliability standards for the nuclear power industry. So we set these standards, we send out evaluation teams to determine how well the different stations are adhering to these standards, we give them a score at the very end of that, and we do that on a regular cadence. Along with that, we also facilitate the sharing of knowledge across stations, which is really interesting in an industry where you have different companies competing with each other but also sharing information with an independent organization, which allows them to share what they've learned so that we're all able to learn from each other's experiences, mistakes, challenges, that sort of thing.

I've been doing data science for about two years now. Before that I worked in IT, but my background, my bachelor's, is in humanistic psychology, which is kind of the most qualitative thing that you could do. So it's been a fun road to figure out how I got here. Outside of work, I like to do a lot of things. I like to read, I like to laugh, and I like to travel. This past weekend I was able to indulge all three. Posit Conf this year was in Chicago, and it was the first time I had been to Chicago with any chance to explore it. So I got back home and told my wife, Katie, hey, we need to go to Chicago. We went to Chicago this past weekend, and we saw our favorite comedian, we went to see some improv, we found kind of a punk bookstore, which was a whole lot of fun, and we got to do a lot of wandering around the city. It was an awesome trip, and at this point I recommend Chicago to anyone, especially the weekend before Halloween, because then you just get to ride around on the L train and watch everyone's costumes.

Moving from IT to data science

Well, David, thanks so much for the intro and sharing a little bit about your background there. I know when we were chatting, you shared some more info on how you had previously worked in IT as well. And so I was just curious to kick off the conversation with that because it's something that comes up quite a bit in this space, but having moved from IT to data science, I know you have probably pretty unique perspectives on both sides. What would you recommend to some of us who might be struggling to communicate across those lines?

That's a good question. A challenge that I hear in the Hangouts, and also in conversations I've had with other people, is that a lot of times you have these challenges and it really just boils down to building a good relationship. That can pan out in a lot of different ways, and it's really dependent on your organization, how it's set up, and the general size of it. The way that we did it was, we built data science not quite from scratch; we had one organization at INPO that did some sort of data science-y work, and then our senior leadership team identified it as a strategic priority for the company to actually build out that capability. So there was a lot of emphasis and a lot of focus on it, and that was right after I had moved into the data science organization. So we knew that data science and IT needed to work together because, I mean, every single project that we work on has an IT component to it.

So some of what we did is we started having meetings between data science leadership and IT leadership on a regular cadence, just to talk about what the roadmaps look like, what projects are on the horizon, what challenges there might be, and to put everything out in the open as much as you can and have those conversations. On a more personal level, I think it's really important for the two groups to learn more about what each group does. So sit down with a network engineer and understand why they might be a little cautious about opening up their infrastructure to all of the packages on a repo, or have a network engineer sit down with a data scientist and understand why that's a challenge for them. It really, to me, just kind of comes down to having conversations and being open and honest and transparent and figuring out a way to move forward together.

It really, to me, just kind of comes down to having conversations and being open and honest and transparent and figuring out a way to move forward together.

That's great. How did you actually first start those conversations?

Well, data science became an emphasis for INPO because the way that we were doing business was changing. We go out and evaluate stations every two years; that's been the regular cadence for about 30 years. But a lot can happen in between those two-year assessments, right? So we understood that we needed to get into more of a continuous monitoring sort of fashion. And we don't want to go out and evaluate more often, because it's a really costly thing to set up an evaluation team and send it to a station. So instead, it meant that we needed to figure out a way to use the data that we have to do some modeling, some analysis, and try to understand what their quote-unquote performance looks like in between those evaluations. So that was where it landed at a strategic level, and then it became our job to actually implement that. And I was real lucky, having recently moved right from IT to data science and being able to bring that perspective to both sides; I think that put us in a really good situation. But I know that's a rare situation to be in, too. So, yeah, just sitting down, putting your chairs in a circle, and saying, what's going on? What can we work on? How do we figure this all out together? That's just kind of the right way to start.

Humanistic psychology and data science

Yeah, thank you. Russ, I see you asked a question in the chat. Do you want to jump in here?

Yeah, thank you. I have a master's in counseling. I love humanistic psychology. I could talk to you about that all day. Does that help you in data science? If so, can you say something about that?

I thought about that in preparation for this talk, and the more that I thought about it, the more threads I could see. A lot of data science, and a lot of working in an organization in general, is working on relationships and communication. And actively listening: that was something I remember taking a class on when I was in college, and I feel like I still try to use those skills to this day, where you're actually sitting there in front of a person, you're hearing what they say, and you're thinking about what they say instead of figuring out what you're just going to say next. So having a real active-listening sort of conversation. And then understanding that people are people, and we can be inconsistent and hard to understand, and really all it takes is maybe a little bit of openness and kindness to give space to the sorts of conversations that need to happen.

The sorts of crucial conversations that we have. "Crucial conversations" is a term that we use at INPO for these sorts of intense conversations that need some sort of resolution. In my opinion, it's about being open to that and understanding that there are going to be some emotional reactions, and that's A-OK because we're all human beings. But it's important to not just react to them and fan the flames. You need to figure out what the source of the issue is and address that.

Domain expertise and nuclear industry metrics

Love that. Thank you. Alan, I see you have a question here. You want to jump in?

Yeah, sure. Hi. Hey, David, I'm really curious. My instinct is that performance metrics in nuclear power are really, really specific, like a super specific domain. If that's the case, I wonder how that domain expertise comes into play in the stuff you do day to day, or if maybe that's the wrong assumption and really the metrics are around efficiency and people metrics, things that aren't really industry-specific. I wonder how specific what you're doing is to that industry and how you operate in that kind of environment.

That's an excellent question. On the industry metrics, we get hundreds of data points from every station every month. And because every station can be a little different, you've got to go through some sort of normalization scheme. So we have these indicators that are a way that we normalize the data, and then that's what we can actually do some modeling on. And I have learned just how little I know about nuclear power by moving to the industry side of things and needing to understand exactly what every label, every acronym means. It'll take a really long time to get to that point, and luckily I don't need all of that to do my day-to-day job. But whenever I'm working on a really important model, I've got to make sure that I'm not making some really bad assumptions about it. So that means I've got to sit down and talk with my manager, director, or some other subject matter experts in some of the functional areas here at INPO, just to make sure that whatever I don't know isn't going to actually bite me sometime.

Introducing data science tools to the enterprise

I know you mentioned data science is relatively new for INPO. How have you been successful in introducing data science and data science tools?

Great question. Introducing a new tool to the enterprise is a really challenging thing. You can't just say, here are some model results, and expect everyone to understand what that means. So one of the changes that we've made, and it's actually part of the governance that we've written, is that anytime a new tool of some sort is delivered, we have to figure out a way to explain it or interpret it, maybe with some other tool or some kind of help function. One of the things that we've learned is that delivering model results that require some subtlety and understanding is hard to do for people that don't use them on a real frequent basis. So we've added a lot more help functions into the tools that we build, the ways that we expose these model results. And then we've also built some other tools, I've got to talk generally here, to help you understand why the model is saying what it's saying. And we've had a lot of success in thinking about things from the end-user perspective and not the data science perspective, to help us figure out exactly what we can do better to make sure that they use the tool correctly and get the insight that they want out of it.

Yeah. So what I alluded to earlier is that the impetus for data science at INPO was trying to understand performance in between the assessments every two years (biennial, biannual, I never remember the word). We get data from the industry every month, and we have a model that helps us gauge that sort of performance in between the assessments, right? The assessments are kind of our gold standard and the way that we know if the model is right or wrong, but it gives us an estimate of their performance in between. And then, from there, there are other things where you can maybe come up with some risk models to see if something has a high probability of happening. And we obviously don't want it to happen, so we want to figure out that risk and try to get ahead of it.

So, data science at INPO: our department is actually called data science, and it's composed of three groups. One is data visualization, one is data analytics, and that's where I live, and one is data quality, or data management. So whenever I build a model, I'm just kind of having the results end up in a database somewhere, and then we rely on our incredibly capable data visualization group to actually build the visualizations that deliver it to an end user. No one ever really gets those raw results out of the database; they get something within a broader context to help make sure that they are interpreting it correctly.

Regulatory environment and data sharing

That exists within the nuclear industry, but it doesn't affect INPO. So there is, I remember from when I worked in IT, a group called NITSL, N-I-T-S-L, the Nuclear Information Technology and Strategic Leadership group, which is kind of a nuclear IT working group. And they have a focus on quality assurance, because there are very strict regulations that go into how you validate that a software system is going to do what you say it's going to do, that it's going to work correctly, and that there's no way it can get it wrong. But that mostly affects the safety systems at a nuclear station, as far as I understand; this is obviously not my wheelhouse. Because we at INPO don't work directly with supporting the safety systems at a nuclear power station, we don't really have to go through that very, very rigorous process.

Sure, yeah. I was just curious, the data that you're producing from the assessment, is that ever shared with other federal agencies?

We do share some of the content from the reports. The one place that we share the report, I believe it's the report, but it's definitely the assessment number, the assessment score, is the company that issues insurance to the nuclear power industry. So our assessment score is directly linked to a nuclear station's insurance rate, which makes it very important that everything is right. We share some of what we call operating experience with some other groups, but I don't think that we share the actual report with them. So it's a very tightly guarded thing that we share with the senior leadership at a station, and we share at least the assessment number with the insurance group. But for the most part, we hold on to the actual report findings, because there are a lot of ways that sharing that can go wrong.

Rolling out Posit Team and learning best practices

David, I feel like this might be a helpful conversation to open up with the group. I know when you and I were chatting at Posit Conf, we were talking a bit about how when you rolled out Posit Team, it would have been helpful to know a bit more about what some other teams were doing with the products, to get ideas and share use cases with each other. I don't know if you want to expand on that a bit and maybe ask the group too.

Yeah. Okay. So this is kind of my question for the group. We recently rolled out Posit Team, and the modeling work that we've done at INPO has been a very particular kind of modeling work, where you build a big model, you deliver some results, and then it gets sent to a visualization. And that's mostly it. But I know that Posit Team and other platforms offer a lot more capabilities than that, beyond just building a model and putting results in a database. And coming relatively new into this space, it's been hard to figure out, what are the different ways to use it? What are the things that I don't know, first? And then what are maybe the best ways to solve certain problems that I do know about? At a real technical level, we have a lot of Microsoft SQL Servers, and trying to use those with Posit has some challenges. So what are the best ways to solve that? I know that Posit has a lot of really good documentation online, but I'd love to hear from people that have actually solved this problem and learn what they've learned in the process, if that makes sense. So where would someone go to learn these best practices or have this sort of conversation outside of a weekly data science hangout?

One of the things we had a lot of success with in a former life of mine was when we deployed reports, and a lot of these reports were meant to monitor model scores over time. This is not helping your Microsoft issue, I'm sorry. But in terms of use cases in general, we would do things like query the back end. So we could run a report every day or every week, perform a query, bring in new data that we haven't seen yet, run that through our models, and get some model scores. And then we could track those model scores compared to yesterday's, last week's, whatever it is. And then we could use blastula to send out alerts, and we could essentially create a custom report based on what we were seeing. And that could be sent out to certain individuals associated with the account in question, that sort of thing. So it's a nice, handy monitoring tool.
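For anyone who wants a starting point for that monitor-and-alert pattern, here is a minimal sketch using blastula. The table, threshold, and email addresses are all hypothetical placeholders, not details from the conversation; it assumes the code runs on a schedule, for example as a scheduled report on Posit Connect, and that SMTP credentials were saved ahead of time with `create_smtp_creds_file()`.

```r
library(blastula)

# Hypothetical scores produced by the model on this run
scores <- data.frame(account = c("A", "B", "C"),
                     score   = c(0.91, 0.42, 0.77))

# Flag anything below an assumed threshold of 0.5
flagged <- scores[scores$score < 0.5, ]

if (nrow(flagged) > 0) {
  email <- compose_email(
    body = md(paste0(
      "The following accounts dropped below threshold:\n\n",
      paste("-", flagged$account, collapse = "\n")
    ))
  )
  smtp_send(
    email,
    from    = "alerts@example.org",   # placeholder addresses
    to      = "team@example.org",
    subject = "Model score alert",
    # credentials file previously created with create_smtp_creds_file()
    credentials = creds_file("smtp_creds")
  )
}
```

Because the email only goes out when something is flagged, the scheduled report stays quiet on normal days, which matches the "alert only on change" behavior described above.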

I could pop back in, but yeah, David, I don't know if you're using dbplyr. Oh yeah. Yeah. Okay, so you know about that. That's something where I'm surprised how many R users at my company don't know of dbplyr, and so they're creating SQL files, or they're doing things in like a 2010 view of the R world. And that's okay, because a lot of them are more like proper statisticians, and DevOps and MLOps isn't really in their wheelhouse. Are you using, well, I assume you're using some kind of git-backed platform like GitHub or GitLab. Something we've recently been exploring with Posit Connect is setting up a lot of CI/CD pipelines to make the whole deployment of content hands-off across different Connect instances. So if you've got different dev, test, or production instances, we're trying to create a process of rolling everything up through CI/CD, and Posit has excellent documentation on this stuff. So if you're not doing that, I'd definitely recommend something like that too. And it plays nicely with dbplyr also.
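For the SQL Server question specifically, a common pattern is DBI plus the odbc package underneath dbplyr, so dplyr verbs get translated to T-SQL and run on the server. A minimal sketch, assuming an installed Microsoft ODBC driver and placeholder server, database, table, and column names:

```r
library(DBI)
library(dplyr)
library(dbplyr)

# Connect through an ODBC driver; server and database names are placeholders,
# and credentials come from environment variables rather than being hard-coded
con <- dbConnect(
  odbc::odbc(),
  Driver   = "ODBC Driver 18 for SQL Server",
  Server   = "sql.example.org",
  Database = "indicators_db",
  UID      = Sys.getenv("SQL_UID"),
  PWD      = Sys.getenv("SQL_PWD")
)

# dbplyr translates this pipeline to T-SQL; nothing is pulled into R until
# collect(), so the filtering and aggregation happen on the server
monthly <- tbl(con, "indicators") |>
  filter(report_month >= "2023-01-01") |>
  group_by(station_id) |>
  summarise(avg_value = mean(value, na.rm = TRUE)) |>
  collect()

dbDisconnect(con)
```

Calling `show_query()` on the pipeline before `collect()` prints the generated T-SQL, which is handy for debugging translation quirks against SQL Server.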

I guess like kind of even at a higher level. So we have this platform, you know, this weekly hangout that we could use to, you know, ask questions if it's the right context, but where do y'all go outside of this to learn about the right ways to do things or people's experiences with different ways to do things?

Yeah, for me personally, I feel like I'm asking questions all the time in public places, internally at least. I mean, I say public, but we have so many Teams channels around data and data quality and data science and analytics and all this stuff, and I'm pretty shameless when it comes to asking a bunch of questions in these things. At the same time, I feel like I'm constantly helping other people, so there's give and take. But I think for me, it's a lot of on-the-job exposure and experience, asking people stuff that maybe I wouldn't be able to ask publicly, and just working together to get through some of these hurdles. I mean, every team has really unique challenges. They might be using R, but they're trying to map to some network drive or something, and it's like, well, we can't really do that, or not as easily, but they want to fix it. Sorry, I feel like I'm hijacking the conversation here, but internal networking, I guess, is how I would summarize my comments.

That really helped a lot for us. And I think Catherine from my company is on here too; I don't know if she has any additional comments.

Yeah, Javier is totally shameless about asking tons of questions and everything. But I do think you're right in asking where you go to get best practices, because internally your team, or your company, has a way that they do things. So getting exposure to the ways that other companies do things, knowing what else is out there, can be really difficult. I don't personally do too much research, because I try not to work outside of work. But that does make it difficult to figure out what else is out there and then bring those ideas back to the conversations, if you're setting up those data science and IT conversations, because you don't know what they are. Yeah, I don't know what I don't know.

Yeah, the challenge for us is that we at INPO are pretty new to this, so there isn't that sort of institutional knowledge that we can reach out and question other people about. At this point, it's mostly talking to other people in the nuclear industry, and we do have a data science working group that we've set up at INPO, across the nuclear industry, and we're starting to get some good traction with that. But even past that, it's good to have conversations with people in different industries, just to understand how they might think about things differently.

NLP use cases in the nuclear industry

I'll tell you the little that I know. I know that some places have set up their own private instances of ChatGPT, because nuclear data is tightly regulated, so you have to be really careful about what infrastructure it lives on. Generally speaking, we're going to want to keep things on premise, in our own domains, rather than use something on Azure or AWS, if we can. So people have set up private instances of ChatGPT to try to understand how it could be used for their organizations. And there has been some success, but as you all probably know, not everything that comes out of ChatGPT is correct, even if it sounds like a good, well-formed sentence.

And then stations have something called a corrective action program, or CAP. Generally speaking, any time there is an event, something goes wrong, whether that's someone tripping on the sidewalk walking into the station or a reactor scram, there is a report written up about it. And for a single-unit station, I think there are probably 600 people working there, which means there are a lot of these reports that get written. At a certain point, you have a whole lot of text data and not enough people, or the right technology, to actually trend it and track it well. So there's a lot of interest right now in understanding how to use this history of CAP records to understand the behaviors that might influence performance. So those are the two big use cases I've heard so far.

Hiring for data science roles

Well, actually, let me shift gears for a second. I know when we were chatting before, you were talking a little bit about how you're hiring for a role, and I thought it might be good to discuss a little of what you've learned recently in trying to fill a data science role too.

It's challenging, as a lot of you probably know. We did actually fill the role. The person hasn't started yet, but everything's been signed, so we're good there. And I feel like I learned a lot about what it takes to hire a data scientist in that process.

We found that we got a lot of success using recruiters versus just having an open job posting, because the market is pretty well flooded. And we wanted someone that had some particular natural language experience. So instead of going through the mountain of resumes, it was a little easier to have people help us find candidates who were looking and who had the right sort of resume. And then from there, going through the technical screen to make sure they had the right understanding we were looking for. And especially when I don't know enough about natural language processing to put it on my own resume, but I'm trying to hire people who have that experience, you've got to do enough research so that you can ask the right questions. And that was pretty interesting. That was a lot of fun, to be honest with you.

But it was surprising to figure out exactly what you needed to look for on a resume to get the right level of experience, because having some spaCy or some Hugging Face experience doesn't necessarily mean that you are right for the role. And the thing that I think surprised me most was when we started looking for people who didn't just have a master's degree in data science or analytics, but who came from what I'm calling more applied roles, where you were getting a master's or PhD in something besides those particular fields but had to use machine learning, data science, or NLP in order to actually achieve that PhD or master's. I found those conversations to be surprisingly useful, because this person is not just someone who went through a master's program in analytics, where for the capstone it was, okay, you get to figure out how to put all of this together and then do it on your own. People with that applied perspective on data science have already been able to do that, and it was really useful to suss that out in the process. So that's something I'd recommend: maybe expanding the pool of applicants past just those people that have the master's degrees in analytics and data science that I think we've probably all seen, or have ourselves.

Building skills and putting models in production

Hey, David, thanks for the time. What are you currently working on to build your skill set? I'm thinking more on the technical side; obviously you're doing a lot on the soft skills. But what are some things, maybe stuff that has historically been in your industry or some other things outside of your industry, that you're working on that's going to advance your skill set?

So I got to learn a lot about Linux, little tongue twister, because I was the one that got to do the Posit install earlier this year. That was interesting, challenging. I know a lot more about infrastructure now than I ever did; I'll file that away for a rainy day. On the data science side of things, I am, where's my book? I'm reading through Max Kuhn's book, Applied Predictive Modeling, and just trying to wrap my head around the different sorts of models and approaches, things like cross-validation and statistical machine learning that we'll need, because we tend to use smaller data sets at INPO. We don't have anything that I would call big data at the moment, so it's a very particular flavor of data science and machine learning. So I'm reading through that, and then trying to figure out different ways to stay up to date with new models and new algorithms, finding the right papers to read to keep up with those sorts of advances. I haven't really found a great place that centralizes all that information yet, so if anybody else has some ideas about where to look, let me know. I think I learned about R Weekly at Posit Conf this year.

An anonymous question from a bit earlier was, what are the biggest challenges you face when putting data science solutions in production?

So, to be honest, we haven't faced a whole lot of that yet, because we are still pretty fresh. My brain goes back to the way that we created a model that estimates performance in between evaluations and delivered those results. That model is a tightly guarded secret; I consider it probably the most critical piece of intellectual property we have at INPO, because if anyone finds out what goes into that model, there's no way to build a better one. So delivering those results meant that we couldn't tell people what goes into the model, which means that when a station's value changes, they, of course, want to know why. That's just a natural thing to ask. And we didn't really give people a great way to explain that at a general level, so every time we would run this model, we'd get this mountain of emails asking why this and that changed. So last year, we built a tool to help them interpret that score and the changes to that score. And then we built this tool that helps give context to the changes. And now it's kind of crickets every time we publish the model results, which is good news for everyone.

And then we built this tool that helps give context to the changes. And now it's kind of crickets every time we publish the model results, which is good news for everyone.

Past that, we're going to be building more tools, models, and APIs, but we haven't done it yet, so I'm not really sure what the challenges are. If anyone has any lessons learned, I'm so ready and willing to hear what y'all have learned as part of pushing out Posit Team or any other sort of data science platform.

Estimating ROI for data science projects

Yeah, sure. Hi, everyone. So basically, I'm trying to relate this to something I'm currently going through at work, where we just set up an analytics team. Before the business is willing to commit any time or resources to help us with a project, we keep getting this ask to estimate the ROI we expect from it. So I'd like to ask you for ideas: how do you approach estimating the ROI from a project before you even have the resources or the buy-in to start looking into it? And a follow-up question: once a project is implemented, how do you look back and estimate the ROI from a project you've implemented already?

So this is a really challenging question for us, because we are a nonprofit with a mission to make sure that the industry performs to the highest standards of excellence that we can set. Coming up with an ROI, it's really hard to quantify that. We have different ways that we do it, but I don't really think they're generally applicable to what you're talking about from a pure project management standpoint. Ideally, I feel like what you would want to do is put together... I was going to try to answer it well, but I don't have a good answer for you. It's just not something that we deal with in the same way that I think you do.

I can share one from another hangout we had, with Natalie O'Shea at BetterUp. Natalie shared a bit about how, when she was going through this process of getting approval for Posit Team, she focused in on one specific problem that the team had. The use case she gave was their consulting organization: their sales team had to make these PowerPoints over and over again, with new data and based on different industries, and actually making a great deck for a presentation was taking hours and hours. So they sent out a poll to all of the sales reps to ask how much time they were taking building these, and they actually put a dollar amount to it; I think it ended up being something like $1.2 million a year in people cost. And then she showed how she could automate that with a Shiny application: the sales team would go to the Shiny app, put in whatever they needed for the customer, and automatically generate the PowerPoint, which was tied directly to their database as well. In using that one example, she was able to put a dollar amount to it. So maybe one tip, when you're building this business case, is to think about one team that you might be able to help, because then they might be the ones joining onto your business case too.
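For anyone curious what that kind of automation looks like mechanically, here is a minimal sketch of the Shiny-to-PowerPoint pattern. It is not Natalie's actual implementation; it assumes a hypothetical parameterized R Markdown file, deck.Rmd, whose output format is powerpoint_presentation and which declares a `customer` parameter.

```r
library(shiny)

ui <- fluidPage(
  textInput("customer", "Customer name"),
  downloadButton("deck", "Generate deck")
)

server <- function(input, output, session) {
  output$deck <- downloadHandler(
    filename = function() paste0(input$customer, "-deck.pptx"),
    content = function(file) {
      # Render the parameterized report straight to the download path;
      # a fresh environment keeps state from leaking between renders
      rmarkdown::render(
        "deck.Rmd",
        output_file = file,
        params = list(customer = input$customer),
        envir = new.env(parent = globalenv())
      )
    }
  )
}

shinyApp(ui, server)
```

The data pull would live inside deck.Rmd itself, so the generated slides always reflect the database at render time, which is what made the manual deck-building disappear in the story above.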

Yeah, thanks for that. Only thing is that I think when it comes to automation of daily routine tasks and activities, it's a bit easier to estimate ROI as opposed to analytics modeling where you're dealing with uncertainty, right? And sometimes you're not even sure if you get anywhere with the time you invest, right? So yeah, it's not an easy task. And I would really appreciate any thoughts or feedback from anyone that's gone through that experience before as well.

Nick, this is Eduardo Castillo, and actually I'm from the nuclear industry; I know Dave, and we're in a working group at INPO. One thing I can offer from a data science perspective is that it's important to go for big targets. One of the big projects we're working on right now is around reducing diving frequencies for cleaning some of our intake structures. That's a very big, $1 million a year project where we had no ongoing data science efforts to improve performance of that evolution. So having a big target, I think, was important, because even if we could cut 5% of that, it becomes a good hard-dollar ROI. And the second piece is looking at things like risk. With an activity like diving, you have a lot of occupational risk, and we're actually able to tie out what the occupational risk is and how it compares to other activities that we do at the plant. So it allows us to capture both the hard savings and also the qualitative soft savings. And again, just starting with a big target, a project that is targeting a big budget, I think is important.

Career advice and looking ahead

So a question that I love to get to ask everybody towards the end is, is there a piece of career advice that stands out to you, whether it's been something that you've received or advice you've given that you'd like to share with us?

I knew you were going to ask this, so I had to think about it in advance, and I had five or six different things that I came up with. The one that I think I want to stick with is: be open to doing things that make you feel uncomfortable. As a human being, but also professionally, that's kind of where the growth happens. If you just do the same thing day in, day out, you're just going to tread water, in a way. But if you lean into the things that make you uncomfortable, in a healthy way, obviously, then that's where you can grow your professional relationships, your technical capabilities, even your mental health, in a way. It's leaning into that and being okay with the discomfort, knowing that you're going to learn something from it.

The one that I think I want to stick with is: be open to doing things that make you feel uncomfortable. As a human being, but also professionally, that's kind of where the growth happens.

Absolutely. Can you share an example with us of where you're doing that?

Being asked to do a data science hangout?

Yeah. Yeah. And the other piece, and I've been a part of them for seven years, is Toastmasters. Working on your communication skills is critical as a data scientist. You can create the coolest model, the next big app, that sort of thing, but if you can't really communicate it well, then you're going to have a hard time getting other people to understand why it's so important. So figuring out ways to hone those skills is really, really important, just as much, in my opinion, as the technical skills.

So this year was focused on building out what we call our data science environment, getting Posit built. And now it's built, it's stable, and we're ready to use it. I'm just excited to figure out all the different ways that we can deliver these really incredible data science products to the industry at large, and also to INPO internally. It's something that we've never really had, and I'm really excited to put it to good use.

Awesome. Well, thank you so much, David, for joining us today. And thank you all for all the great questions and spending your Thursday with us. I'm trying to get better about saying this, but I talk to a lot of people from companies who might not know that other teams within their company are using our tools. So if you are ever just curious and want to chat with us about it, I made a little form for myself so that it was in one place. So feel free to put your name there, and I'm happy to connect you with others within your company too. But thank you again for spending time with us today. I really appreciate it, David.