Resources

Data Science Hangout | Unity Health Toronto | Deploying & Monitoring Models Across a Hospital

We were joined by three leaders from Unity Health Toronto: Derek Beaton, Jamie Beverly, and Sebnem Sahin Kuzulugil (surprise special guest! We will be updating the hangout image!) ► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu Follow Us Here: Website: https://www.posit.com LinkedIn: https://www.linkedin.com/company/rstudio Twitter: https://twitter.com/posit To join future data science hangouts, add to your calendar here: rstd.io/datasciencehangout (All are welcome! We'd love to see you!)

Oct 27, 2022
1h 4min


Transcript

This transcript was generated automatically and may contain errors.

Hi friends, welcome to the Data Science Hangout. If you're joining us for the first time today, it's nice to meet you. I see a lot of familiar faces, so I know a lot of you have been here before. The Data Science Hangout is an open space for the whole data science community to connect and chat about data science leadership, questions you're facing, and what's going on in the world of data science. The sessions are always recorded and shared to YouTube as well as the RStudio Data Science Hangout site, which we can share in the chat here too. So you can always go back and re-watch or find helpful resources too. We do also have a LinkedIn group for the Hangout if you ever want to continue a discussion with someone, ask for feedback, or see a summary of the past week's sessions.

I will encourage people to post in there and start conversations so it's not just me talking there, but together we're all dedicated to creating a welcoming environment for everybody. So we love when everyone can participate in these sessions and we can hear from everyone, no matter your level of experience or the area of work that you focus on. There's always three ways that you can ask questions. So you could jump in by raising your hand on Zoom. You can put questions into the Zoom chat and feel free to just put a little star next to it if you want me to read it out loud instead, or I can call on you to introduce yourself and add some context too. We also have a Slido link, which I'm sure Hannah is sharing here in the chat, where you can ask questions anonymously too.

I am so excited to be joined by two co-hosts for today for the first time ever. Today we are joined by Derek Beaton, Director of Advanced Analytics, and Jamie Beverley, Director of Product Development at Unity Health Toronto. I would love to have you both introduce yourself and maybe tell us a little bit about each of your roles, the organization, and maybe also something you like to do in your free time outside of work. Derek, do you want to start first?

Introducing the team

Sure, sounds good. So I'm Derek, the Director of Advanced Analytics in a group here called Data Science and Advanced Analytics. The overall group, and I'm sure Jamie and I will share different details on this, it's a data science unit in Unity Health Toronto, which is three different hospitals here in the city. We have four different teams across the Data Science and Advanced Analytics team. Advanced Analytics is the team I'm part of, Jamie's is the Product Development. We have a Data Engineering team and a Project Management team. For Advanced Analytics, my team uses lots of exciting tools, less exciting tools, to understand data, bringing predictive analyses to different clinical problems or different resource problems in the hospital, scheduling assignments, alerting, a whole bunch of different things. And we're diving into kind of new domains, including medical imaging and a few others that we can chat about in a little bit. Yeah, and then I guess for me, stuff I like to do, I like to go running, I like to go hiking. My internet right now is probably very bad, because I'm actually at a cottage and I'm about to go hiking after this.

Sure, yeah, thanks. Yeah, I probably don't have a ton to add to the broader structure of our team. So I'll talk to my particular team, the Product Development team, which is in Data Science and Advanced Analytics. So our team is more focused on the software engineering and design side of things, so not so much data science. But once we have a model that's developed by Derek's team, how do we get that model into production and into a frontend that our end users use? So for doing that, we work a lot with sort of HTTP APIs, like wrapping models in Plumber or Flask or FastAPI, and then building frontends, Shiny or React or some of the other platforms that are supported on RStudio Connect. And then we also do some design work, so creating mockups and working with our end users to converge towards the design, if that makes sense. About my sort of like background and interests, I was more sort of on the humanities and cultural studies, science and technology studies side of things in school, and then pursued a master's in computing to get more oriented towards the software side of things. And then in my free time, I spend a lot of time with music stuff. I have a lot of synthesizers beside me here that are quite visible.
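As a rough illustration of the pattern Jamie describes, wrapping a model behind an HTTP API that a front end can call, here is a stdlib-only Python sketch. The "model" is a hypothetical stand-in (a fixed logistic score over made-up vital-sign features), and in practice this role would be played by Plumber, Flask, or FastAPI as he mentions.

```python
import json
import math
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

def score(features):
    # Hypothetical stand-in model: a fixed logistic score over two vitals.
    z = (0.8 * features.get("heart_rate", 0) / 100
         + 0.5 * features.get("resp_rate", 0) / 20 - 1.0)
    return 1 / (1 + math.exp(-z))

class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON feature payload and answer with a JSON risk score.
        body = self.rfile.read(int(self.headers["Content-Length"]))
        payload = json.dumps({"risk": score(json.loads(body))}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):
        pass  # silence per-request logging

def serve(port=0):
    # port=0 lets the OS pick a free port; the caller reads it back.
    server = HTTPServer(("127.0.0.1", port), PredictHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A client can then POST JSON features to the endpoint and get back a risk score; the front end (Shiny, React, or otherwise) only needs to speak HTTP.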

Cool, thank you both. So I know this is the first time we've ever had two leaders on together, so I thought it might be good to ask how both of your teams work together and how you started to have this great relationship, and why it's so important that both of you are on the Hangout together.

We actually have three of us here on the call. We've got someone in the audience here as well. So Derek mentioned we're kind of comprised of four teams, which starts with our data integration and governance team, which is led by Sebnem here. So composed of data engineers, ETL developers, data governance specialists, all the folks who know how to get the data from very messy legacy source systems into a format that's accessible for both modeling and for real-time applications. And then once we have data in those formats, that's where Derek's team does all the work on the modeling side of things, so training models on top of that, and analytics and reporting. And then once we have a model that's ready for deployment in a production application in one of our hospitals, that's where my team comes in to build those front-end applications. And then there's a fourth team that's composed of project managers that keep us moving on all pieces.

No, that covers it. I guess one point to add to the project management team, our VP Mohamed is frequently reminding us that data scientists aren't good at the management part. So the project management team are fundamental to making sure we can get done what we do.

That's great. And Sebnem, I didn't know you were joining us as well. A special welcome to you. Thank you for joining from the team as well. If you want to jump in and introduce yourself.

Yeah. Well, I'm not one of the speakers. I'm actually here to listen, but I was pulled. Hi, my name is Sebnem. I'm the Director for Data Integration and Governance at Data Science and Advanced Analytics. I work with Jamie, Derek, and our indispensable project management team. Yes, Derek is right, and Mohamed is right. We need somebody to keep us in sync. Otherwise, we would probably fall apart. What I want to add to Jamie's really good summary is that the things he talked about, like getting the data, modeling, and then front-end development, it kind of looks like a linear process, but it's not actually. There is a lot of overlap. So as soon as we start looking at the data, Derek's team starts looking at the modeling possibilities, while Jamie's team starts looking at the deployment options. So we kind of work in parallel, and it's the project management team that actually kind of makes the pieces click together. So that's why we have to keep in touch all the time with great communication.

MLOps and model monitoring

That's great. Niall, you asked a question in the chat. Do you want to jump in and ask that?

Sure. It's pretty simple, but maybe you can go into some more depth. I'm just curious about how your teams are managing MLOps. You've got these models in production. How are you monitoring them? And then do the user development and product development teams play a role in creating that monitoring process, or is that fully with the advanced analytics group?

I'll jump in for part of this. So on a lot of the monitoring side, we have, I think a lot of it is on our side for now, where we have a lot of reports that come out to us. We're building a dashboard to actually centralize all of our different deployments, so we can watch what's happening with data, model performance, any sort of drift or changes. I would say for a lot of the MLOps type things, we're mostly focused on the practice as opposed to any particular tool sets right now, but we are trying to rally behind quite a few tool sets. I think a lot of these are largely out of some of the movements we've seen recently in Vetiver and the other RStudio packages that are coming out from Julia, where monitoring models, getting model cards, and having these quickly accessible to us is where we're moving. And we're moving in this direction because I think there's been a significant expansion in a lot of what we do across a lot of different data sources in the hospital with a lot of different problems.

Yeah, I would echo, I think, what their sentiment was there. I think we're kind of growing in that domain still and figuring out what that looks like for us. And I think simultaneously, partly overwhelmed, skeptical, and excited by all the things out there. Lots of things that run on a Kubernetes cluster that we don't really have the use case to have a massive Kubernetes cluster. That said, I'd say we have a pretty core set of requirements for most of our projects where we have a model, we want to version that model, we want to save predictions, all the predictions that that model generates. We typically want to save some data alongside that model. We want to have monitoring on all the predictions that model generates and saves to a database. And we want to host those models behind some HTTP API, typically. So those set of criteria, I think, are good criteria for engineering a system that can handle our use cases.
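The core requirements Jamie lists (version the model, save every prediction it generates, and keep data alongside for later monitoring) can be sketched in a few lines. This is a minimal illustration using SQLite; the model name, version, and schema here are hypothetical, not Unity Health's actual setup.

```python
import json
import sqlite3
from datetime import datetime, timezone

def init_store(conn):
    # One row per prediction, tagged with the model name and version that
    # produced it, plus the input features for drift monitoring later.
    conn.execute("""CREATE TABLE IF NOT EXISTS predictions (
        model_name TEXT, model_version TEXT, ts TEXT,
        features TEXT, prediction REAL)""")

def log_prediction(conn, model_name, model_version, features, prediction):
    conn.execute(
        "INSERT INTO predictions VALUES (?, ?, ?, ?, ?)",
        (model_name, model_version,
         datetime.now(timezone.utc).isoformat(),
         json.dumps(features), prediction))

conn = sqlite3.connect(":memory:")
init_store(conn)
log_prediction(conn, "chartwatch", "1.2.0", {"heart_rate": 110}, 0.62)
rows = conn.execute(
    "SELECT model_version, prediction FROM predictions").fetchall()
```

Because every prediction carries its model version, a monitoring dashboard can later slice performance by version and watch for drift after each deployment.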


Use cases across the hospitals

I think it might be helpful to all put us in the mindset of the problems that your team is working on. What is an example use case across the hospitals?

Let's start with one of our flagships, ChartWatch. So it's an early warning system for deterioration. From the advanced analytics side, the models for that one were developed a while ago. They were wrapped up for Jamie's team to help deliver alerts. We get our pipelines from Sebnem's team. And then on the monitoring side, we have fairly frequently updated reports on what is happening, when alerts are sent out, or what kind of model performance we're looking at. So a bunch of different metrics to see how this early warning system is actually working on a regular basis.

And can you just expand on it for me? What does the early warning system mean?

So if individual patients are potentially deteriorating, this will send an alert off to clinical teams that are set up in the alert system, or those that are there that day, to get an alert that a patient needs more attention.

We've got a few other cases, too. So I'd say there's a suite of tools we have that are around this kind of alerting-about-a-patient use case. We also have a few that are more focused on optimization models, like generating schedules for nurses. Very complicated and tool-laden task. And we optimize that to reduce the number of times a nurse is assigned to the same location, as an example. So optimization models, and working on medical imaging as well. So detecting intracranial hemorrhage in DICOM images, traumatic brain injuries, other projects.

Yeah, and a kind of suite of other ones that work more on the chart record.

That's really interesting. I see there was a Slido question about the early warning flagship product. What kind of models do you use? Are these some calculated metrics, or some kind of predictive models?

They are predictive. There were a whole bunch of candidate ones a long time ago, and then we landed on one of the best performing ones. And fairly recently, this early warning system has been moved away from us and to a company that we're collaborating with. But it's predictive modeling.

Legacy systems and data integration

Mark, I see you just put a question into the chat. Do you want to jump in?

Sure, yeah. I was just wondering, I know from personal experience that healthcare is pretty slow to update to more modern tools. So we end up with a lot of technical debt. I'm wondering how you, can you go into more detail about how you integrate some of those legacy systems with some of the more modern tools that you're using? Is it like workarounds, or is it kind of like pipelines into? Just kind of wondering what your approach is there.

I think I'd better jump in at this point. So yeah, our hospital systems are very fragmented, and no, we are not on Epic. So we don't know if these things play nice or not. We're going to find out in a couple of years, I hope. We are looking for a new EHR. We don't have one yet. So it may or may not be Epic. I'm not really sure. Right now, what we have is pretty old systems. And we actually started the journey by having coded pipelines, where code was attached to the modeling code. So the first piece of code goes into the systems, pulls data, merges it, cleans it up, and then passes it to the model, which was not really good, honestly. Because the same kind of information, say the patient's temperature, is relevant to like 10 different models. So we would be pulling that same temperature 10 times for each patient, which overwhelmed our already old and overwhelmed systems. So a year and a half ago, I guess, we invested in a logical data warehouse, a data federation system, where we pull the data and cache it for a little while until the next scheduled pull. And all of the models are getting their integrated, cleaned up, harmonized data from the logical data warehouse, the federation system. Does that answer your question?

Yeah, so it's kind of, just from understanding, it's kind of like a staging warehouse in between.

Sort of, yes. So it's actually a smart, AI-supported federated system. So it optimizes queries on the go, we can pull data on the fly for not so frequently used systems, or cache the data for the systems that we use frequently. And we can do all kinds of data wrangling in there. So yeah, you can call it a staging layer between the actual data and the models.
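The caching idea Sebnem describes, pull a value once and let many models read it until the next scheduled refresh, can be sketched with a simple time-to-live cache. This is a hypothetical, stdlib-only illustration of the principle, not their federation system.

```python
import time

class TTLCache:
    """Cache values from a slow source system for a fixed time window."""
    def __init__(self, fetch, ttl_seconds):
        self.fetch, self.ttl = fetch, ttl_seconds
        self._store = {}  # key -> (expires_at, value)

    def get(self, key):
        now = time.monotonic()
        hit = self._store.get(key)
        if hit and hit[0] > now:
            return hit[1]          # fresh: serve from cache, no source hit
        value = self.fetch(key)    # stale or missing: pull from the source
        self._store[key] = (now + self.ttl, value)
        return value

calls = []
def fetch_vital(key):
    # Stand-in for an expensive query against a legacy source system.
    calls.append(key)
    return 37.2  # hypothetical temperature reading

cache = TTLCache(fetch_vital, ttl_seconds=60)
# Ten models asking for the same vital produce only one source query.
readings = [cache.get("patient-42/temperature") for _ in range(10)]
```

This is the shape of the win they describe: the legacy system is queried once per refresh window instead of once per model.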

Travis, I see you had a similar question or a follow up question. I know the part about Epic was answered, but do you want to jump in?

Yeah, like Mark, asking from a place of experience with a couple of large hospital systems where Epic or Cerner type products don't tend to integrate well with more modern applications, like RStudio products, let's say, or even, you know, Flask type Python things. And so we've had ML teams trying to produce outputs. It sounds like you solved the input problem well enough with this sort of staging area. That's a cool solution, because that was another thing. You can't really hit the live database of the EHR system, because that's needed for inpatient care. And if you hit it with a large model, things will fall over and bad things happen. That's cool, staging works. But then once you produce the model, how do you get it back in front of the care team, like the physician or the coordinators? If it's not like a pop-up firing inside of Epic or another EHR system, they tend not to see it. And they definitely tend not to want to go to a second browser window or somewhere else to find information, because they simply don't have time for those things. So how do you solve the accessibility portion on the clinical side?

A few thoughts to add to that. I think it's an unsolved issue for us as well. We try a lot of different things for different projects, and a lot of it is kind of project dependent. So yeah, with our older legacy systems, it's typically pretty hard to get any kind of data into the systems. So there are times we do have to provide a separate website that, in some cases, we can link from the EMR. In some cases, we can actually embed kind of actions within the EMR. We're exploring this for one of our medical imaging projects, actually, where if a radiologist is in a viewing tool, looking at a patient's CT scan, they can hit a shortcut on their keyboard, or right-click and click a button, that will open our website with context. And then that's a domain we can control. But it's, I think, an uphill battle constantly.

Alerting is especially challenging, perhaps, because there isn't really a consistent alerting approach across the organization. Some units of the hospital have, you know, a designated cell phone that they provide to the lead nurse for a shift. On another unit, they don't have that. Or, you know, the resident has an application on their personal phone that's rated for, you know, secure information, and we could target that. But it's very heterogeneous.

I guess one other tool we've been developing over the last year and a half, it's called our Operation Center, which is kind of an in-house hospital command center type tool, similar to some of the market products that are out there. So it's a place that folks in the hospital can go to see patient flow information where patients are being admitted, discharged, transferred, census of current patients and their needs, which is all fed from the EMR systems, but it's our own front end. So we've kind of developed that as a front end platform where we're hoping to centralize a lot of our analytics outputs. So it's not two different dashboards you have to go to for your, for our like data science outputs and predictions, but can be at least just kind of one centralized tool.

To add like a little bit more to that too, we work really closely with the individual clinicians or clinician groups, or other staff, on understanding what might work. So there is a lot of customization that will come from Jamie's team. And in other cases, sometimes it's really tricky to get, say, some other website or some other alerting mechanism that's outside of the EHR. So we call in for help from other teams that will help handle what this means for individuals with these deployed tools. So how do you get everyone up to speed? How do you get them trained? How do you get that information back out there? And we also rely on basically kind of leaders that are on the other side of these projects. So the people that come to us with these projects kind of have to be the champions to then get everyone in their units or their groups to be on board with what will actually go in place.

Project intake and prioritization

Alan, you had a question, and I agree with you. Yes, the range of these cases is really fascinating. Do you want to jump in and ask that?

Yeah, sure. Hey, everybody. I think I'm mostly just reacting to the breadth of things that you're describing that the team works on from stuff that's operational to like really like much more deeply clinical stuff and then super technical stuff like imaging, and it's a really impressive range. And so I wonder, you know, I imagine that within the team, you probably don't have the like domain expertise to be really deep in a lot of those things. So how do you get the partnership that you need? And this may stem from, this may sort of bounce from, you know, the last comment that Travis was making about, you know, champions of the work and stuff. So what does that partnership look like? And what does, because there's, it's something to have so many customers, understanding like who to work with and when given limited resources, I've got to think is a challenge sometimes. So this probably gets to the project management team that you work with, but also speaks, I think, to like, how are you situated in the organization such that you're the place where the work comes to, like at the right time, you know, strategically, how does that work? You know, thinking just all those aspects of like how do you operationalize projects coming in and doing work that people need?

So I'll speak a little bit about some of the data and analytics part where, so on the team, actually across like several of the teams, we do have some like subject matter experts that have experience on the other side of data science. So we have like a really diverse group of people that have a lot of experience in this domain. And I guess going back to the previous point about the partnerships, when clinicians or staff or other individuals in the hospital, in the network come to us, we require really like a lot of time and we spend quite a bit of time like working out the early parts of the project, frequent working group meetings to make sure that both sides are in very frequent communication to ensure that the technical, the clinical, the operational sides are all in touch and actually understand the problems with the data, the problems that we're trying to solve and say some of the challenges of bringing some of these things to deployment.

And like another small note here too, I guess, building off of like the breadth of stuff here, we all kind of have like little mini teams inside our teams too. I think it's fair to say sometimes the mini teams are like one person, but we have like little mini teams of expertise within each of the teams.

I'll just maybe add to that briefly. So just to call it out explicitly, all the projects we work on originate from our end user groups, from those clinical champions. So we have an intake process, an intake form. So our end users come to us with this form filled out where we ask, you know, what's the current state? Like what's the problem you're trying to solve? What's the desired future state? What kind of metrics would you measure to say whether we've improved on something or if we've gone the opposite direction and made it worse? And then where data science fits into that, because not all projects are solved with data science. I think that's kind of a bit of a luxury to have, because we have very engaged clinical end users and we don't sort of have to do that sales pitch at the end of a project. As Derek said, there's lots of time invested from the clinical side, so there's inherent interest in seeing a successful deployment as we get there.

I may add a few points about the prioritization part you asked. So we are a strategic part of the hospital. We are not research. So being in the hospital as a part of the hospital puts us under the mandate of the strategic objectives of the hospital. So when we're deciding on prioritizing the pipeline of work that is coming to us, as Jamie and Derek mentioned, we take into consideration stuff like feasibility and things, but then we also evaluate them based on the strategic goals of the hospital. And then we have two criteria. Are we improving patient outcomes or are we improving hospital efficiency? If a project does none of these, we just don't touch it.

Are we improving patient outcomes or are we improving hospital efficiency? If a project does none of these, we just don't touch it.

That's great. Thanks for all of those. I really appreciate all those different lenses on that process and hearing more about the organization and how things work. It gives me a bunch to think about with respect to our teams here. Thanks.

Yeah, I thought that the outline there is really helpful, too, like how you get the projects from the clinical champions, like what's the problem you're trying to solve? What's your desired state? I see there was a Slido question from a bit earlier, anonymous one. Could you give us some idea about the scale of your operation? So like how many models and how many data apps do you currently have in production?

So our team in total is about 30 right now, that's including the four teams. And in terms of applications, we have in production, I think, like 30 or 40. And projects on the go right now, I would say 8 to 10.

React, RStudio Connect, and the tech stack

And I know something that we talked about previously was one of the eye-opening moments for you was when your team discovered you could publish React applications. And I was just curious of the use cases that you described, which is the one that's using that framework.

Yeah, I can speak to that. So I kind of came from a more software engineering, like full stack background where React is pretty ubiquitous. We were developing this operation center project. And we were originally developing it in R Shiny, and then it felt like we wanted to leverage some of the React ecosystem for one particular component. We used, I forget what it is, is it Shiny widgets or something similar? An R package that lets you author React components and bind them as Shiny widgets to use in Shiny applications. And we were using that. And then that was really doing the bulk of our UI front-end complexity. And we decided, OK, might as well just write this in React. So yeah, I'm not sure why it took me so long to realize you could do this. But we discovered you could deploy React applications to RStudio Connect and have them talk to APIs just with fetch requests. So our front-end stack now is a TypeScript React front-end that makes API requests. And we find it to be nice and quick.

I want to check to see some of the, for one of the Slido questions there was, for legacy systems integration, do you depend mostly on custom scripts or third-party platforms and add-on? For scripting, what's your preferred language and why?

For legacy systems, as I mentioned earlier, we are using this data federation tool. And I think we managed to connect to almost all of our legacy systems other than one that was very badly set up. So even the vendors themselves couldn't figure it out. So for these one-off cases where we can't actually use the legacy system, we either go with R or Python pipelines: scripts that run on a server and then feed into either the model or the logical data warehouse itself.

Cybersecurity and IT relationships

Travis, I see you just put a great question into the chat there. Do you want to ask that?

Sure. I think we all have these. I was asking, can you tell us a good cybersecurity says no story and how you got around it? This is particularly prevalent in hospital systems.

I'll toss over to both Sebnem and Jamie in a minute, but it's not many no's. Because we do a lot of stuff completely internal. So we're in the data center. We're on this infrastructure that is approved by IT and security. Where there's some no is some cloud stuff. And it helps to have a liaison or spy, someone from our team, tightly connected to IT and security to make sure that we know what's going on. And we have frequent communication with them.

Yeah, actually, I have system administrators in my team, apart from IT, because the hospital is mostly a vendor shop, but our team inside is a Linux island. We have our own sysadmins, and one of our sysadmins has the task of keeping our relationships with the rest of the hospital at a very pleasant level. And you may have noticed that my title is data integration and governance. So we actually took over the data governance portion in the hospital as well. So we are managing that. And that kind of helps us do what we need to do in a safe and secure way, while also helping us get fewer no's. It's extra work for extra permissions, if you want.

That's great. I see Richard, you said you do that exact same thing. I'd be curious to hear a little bit more from you too.

Yeah, it's very similar, actually, strikingly similar. I'm in a large home care agency in New York City that is integrated with a health plan and a hospice and a long-term care system that's managed at home. And a lot of the no's that we have from cybersecurity have to do with cloud. Having data having to cross that boundary, getting translated from one virtual environment to another. But yeah, so our cloud engineer on our team sits in on all of the cloud engineering conferences for the larger IT department. I work inside of a data science department in this organization. So he sits in both roles as the liaison and keeps us on board with what their concerns are and vice versa. And that way, we in data science don't try to do a lot of things that we know they're not going to go for. And so that keeps us focused on the things we can do and kind of keeps that friction to a minimum.

Public-facing deployments and infrastructure

I had a question as a follow-up from when you were talking about the whole conversation of on-premise or in the cloud. Are there any use cases from anybody on the call, as well, where you have both internal applications that are behind a firewall, but then also other applications that may be exposed to the public or other stakeholders?

I think we have one such deployment. We developed a tool a while ago for predicting ED volumes, the number of patients waiting in the ED, which we deployed at one of our hospitals—actually, two of our hospitals, and then also created a deployment for a hospital outside of our network. That's the one that was running in a Docker container in a special part of our infrastructure that we could open up to the internet to be accessible from this other site. I don't know much of the specifics about how all that was configured, though. I think we had something on Shiny apps at one point, too, which was more public-facing.

Because, Rachel, that's a great question, and it's definitely one of the cyber-says-no areas that I've never found a good solution to in a hospital system. They usually say, even if there is no data in this application, we can't make it outwardly-facing because it's running on our servers, which then do connect at some point to our data. So, they want to stand up a whole new infrastructure to host, in some cases, what is an entirely computational simulated calculator or something like that. So, is that what you all's strategy is, or is it different?

Correct me if I'm wrong, Jamie, but I think our public-facing or external-facing applications are running on a VP LAN. I think that's correct, yeah. So, it's a virtual private LAN. So, IT and security manages it to their own satisfaction, and we make use of it. So, reducing the friction between us.

Dependency management and tooling

I see somebody asked on Slido, I'm assuming you use the RStudio platform, but before that, how did you manage dependencies, for example, package versions?

I think that was before my time. Everyone was pretty good at it when I got here. We're getting better.

Yeah, I guess, I mean, all of our R projects use renv very heavily, and Conda and stuff for Python, and Yarn for JavaScript. And yeah, RStudio Package Manager as well. I'm not sure about before RStudio Package Manager. Sorry, I'm the oldest on the team. And yes, we invested in RStudio very early on, so I don't think we have anything that's not deployed through RStudio of one version or another. We have one Docker deployment. We also use GitLab, like on-premise, which is kind of important for us, too, because we can't really afford the risk of accidentally pushing sensitive things to GitHub or other cloud-based providers.

Model monitoring and drift

The question on Slido is: what does model monitoring, such as data and model drift, look like in your work? What tools do you use, and what kinds of metrics do you look at?

COVID ruined everything. So anything that was from before COVID, COVID ruined. We see different types of drift there; we see it with waves. What we're monitoring in terms of metrics is more project-specific, so we don't have a singular metric. In many cases, for a lot of the predictive algorithms, it's very important to have good performance at avoiding false negatives, or at making sure you're capturing true positives. So we'll focus on PPV and NPV, or false positive, true positive, and false negative rates. But there's no singular way of doing this. It really does depend on what the problem dictates. So we do have different ways of monitoring different models, which is why we're trying to bring it all into one dashboard right now. That's some work we're doing behind the scenes, where we try to bring a lot of these into one spot so that it's a little more centralized, instead of living inside each project individually.
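As a rough sketch of the metrics mentioned here (PPV, NPV, and true/false positive rates), assuming raw confusion-matrix counts are available for a monitoring window; this is illustrative only, not the team's actual monitoring code:

```python
# Illustrative: common clinical classification metrics from confusion-matrix
# counts (tp/fp/tn/fn), as mentioned in the discussion above.

def confusion_metrics(tp, fp, tn, fn):
    """Compute PPV, NPV, TPR, and FPR; None where a rate is undefined."""
    return {
        "ppv": tp / (tp + fp) if (tp + fp) else None,  # positive predictive value
        "npv": tn / (tn + fn) if (tn + fn) else None,  # negative predictive value
        "tpr": tp / (tp + fn) if (tp + fn) else None,  # sensitivity / recall
        "fpr": fp / (fp + tn) if (fp + tn) else None,  # false positive rate
    }
```

Computing these per time window (e.g., weekly) and plotting them on a shared dashboard is one straightforward way to make drift across COVID waves visible.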

Team structure and evolution

I see, Niall, you had asked a really great question earlier that I missed, and I'd love to go back to that on the team structure, if you want to ask it live.

Sure. I'm interested in some of the benefits and drawbacks of having the teams broken out the way you do. Yeah, your thoughts on it. And also, whether this is something any of you helped create and were part of that process. We sort of have a one-stop-shop team structure, and it sometimes works, and sometimes I'm asked to do things I have no idea how to do. So yeah, curious about your take.

I can add a few comments, maybe, about how it formed, and Sebnem can check me if I say anything that's wrong. The team started, I guess, roughly four years ago. It was primarily, or entirely, data scientists. I believe the original team was three or four data scientists, positioned more within research. And then the second team to grow was Sebnem's team: data integration and governance, building out the data warehouse. Following that, I believe, we realized the need for project management, and that formed kind of organically. And then the last to join was my team, focused on design and software engineering.

Yeah, I'd say that the structure works fairly well for us. I'd say there are places at the edges of our teams where I think we're still kind of figuring out what that looks like. There's some questions about MLOps earlier, and I think that's sort of one space that touches modeling, it touches data engineering, it touches software engineering. And we're not really sure where to pass that ball.

Yeah, I just want to add that until... How long have you been with us, Jamie? Derek? I think until that time, probably two or three years ago, we were actually one big happy team, with no solid distinction between modeling and data engineering and product development. But then we had a lot fewer projects. Almost all of us were jacks-of-all-trades. But then the number of projects just exploded once we moved from research into the hospital, as Jamie mentioned. And that's when we had to delineate some of the responsibilities, because being a jack-of-all-trades means you actually have to context-switch every three minutes or so, which is not productive at all.

Just to follow up, Sebnem, you said it was one big happy team before, but what was the process for splitting out the teams?

Yeah, we mainly divided ourselves based on our strengths. And our project manager was with us from day one, which is a really big comfort.

Yeah, absolutely. I see Alan just commented on that too and said it was really interesting to hear how the PM team was relatively early in the formation.

Derek, sorry, I just cut you off. No, I was going to say that I was brought into the team with this structure already in place. I've liked it, compared to how I used to do things before, when, like Sebnem was saying, I'd have to do lots of things all over the place. It's nice to have that focus. But as Jamie said, especially around MLOps, where it feels like we may be missing a little something is bridges and hooks between the teams for those critical points in development or monitoring or maintenance, where it's good to have people who know a little bit of, say, two sides, instead of trying to have someone who understands everything. But we have quite a few people who are interested in crossing different domains and expertise across our teams.

Docker and versioning

There's one other question. I know we're just at the top of the hour, but it's okay if I ask one more. Someone asked: curious, I think you mentioned Docker. Do you use Docker, or do you depend solely on RStudio's versioning? In which cases do you use Docker?

I can probably speak to that and start off. I probably use Docker as a catch-all for application containers; we may be using Singularity or Apptainer or something other than that. Where we're interested and see use in Docker, I think, has a lot to do with reproducibility of environments: having your dev, staging, and prod all consistent and built from the same image. Also some of the flexibility it allows with regard to stack and technology, so it's pretty quick and easy to spin up a new service through a Docker container. What we're figuring out how to do is orchestrate containers; that becomes the next piece, so we don't just have a mess of statically coded ports on the various servers. In terms of versioning, I'm not sure Docker has a big impact. We've been trying to do everything through source control, where I think we're trying to get closer and closer to infrastructure as code. So anything such as a Dockerfile that you can keep under version control, and that defines how your environment is created from those configurations, is sort of what we're aiming for.
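The "Dockerfile under version control" idea can be sketched with a minimal, hypothetical example; the base image, packages, and paths below are illustrative, not the team's actual setup:

```dockerfile
# Hypothetical example of an environment defined as code: everything the
# container needs is declared here and versioned in Git, so dev, staging,
# and prod can all be built from the same image.
FROM rocker/r-ver:4.2.1

# Pin system dependencies, then restore R packages from a committed lockfile.
RUN apt-get update && apt-get install -y --no-install-recommends \
        libcurl4-openssl-dev \
    && rm -rf /var/lib/apt/lists/*
COPY renv.lock renv.lock
RUN R -e "install.packages('renv'); renv::restore()"

# Copy the application and run it on a fixed, documented port.
COPY app/ /app/
CMD ["R", "-e", "shiny::runApp('/app', host = '0.0.0.0', port = 3838)"]
```

Because the Dockerfile and `renv.lock` both live in source control, rebuilding the image reproduces the same environment on any server, which is the infrastructure-as-code direction described above.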

Great, thank you. I just want to double check that I didn't miss anybody's questions. I think I got to everything that was on Slido. But again, feel free to raise your hand if there was something that we missed too.

Thank you so much, Sebnem, Jamie, and Derek, for sharing your insights and all your experience with us. It's awesome to see how all the teams work together too. If people wanted to get in touch or reach out to you, what's the best way to do so? Is it LinkedIn or Twitter?

Either, I'll drop an email here too. Okay, awesome. Yeah, LinkedIn is great. I think it's on the website already. Yes, it will be there too. We will update it with Sebnem's information too. Thank you so much for joining as well. Appreciate all the great questions too. Hope everybody has a great rest of the day. Thank you so much for joining us.