Matthew McDonald @ KBRA | Data Science Hangout
We were recently joined by Matt McDonald, Senior Managing Director at Kroll Bond Rating Agency (KBRA), to chat about “punching above your weight and running a small, effective data science team.”

Speaker Bio: Matthew McDonald is a Senior Managing Director at Kroll Bond Rating Agency, responsible for managing the Quantitative Modeling team. Matt joined KBRA in 2015 to build out KBRA’s Model Risk Management framework. Before joining KBRA, Matt held various modeling roles at GE Capital, IBM Global Financing, priceline.com, and PricewaterhouseCoopers. Matt holds master’s degrees from Columbia University and the University of Connecticut, and a BA in Mathematics from Colgate University.

___________________

► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu

Follow Us Here:
Website: https://www.posit.co
LinkedIn: https://www.linkedin.com/company/posit-software
Twitter: https://twitter.com/posit_pbc

To join future data science hangouts, add to your calendar here: pos.it/dsh (All are welcome! We'd love to see you!)

Thanks for hanging out with us!
image: thumbnail.jpg
Transcript
This transcript was generated automatically and may contain errors.
Hi, everybody. Welcome to the Data Science Hangout. Happy December, everybody. If this is your first time joining us today, so nice to meet you. I'm Rachel. I lead Customer Marketing at Posit. If it's your first Data Science Hangout, I encourage you to say hi in the chat so we can all welcome you in here.
The Hangout is our open space to chat about data science leadership, questions you're facing, and getting to hear about what's going on in the world of data across different industries. So we're here every Thursday at the same time, same place, unless it's a holiday. But if you're watching this recording on YouTube later, the link to add it to your calendar will be in the details below as well. We're all dedicated to making this a welcoming environment for everyone. So we'd love to hear from everybody, no matter your years of experience, titles, industry, or languages that you work in. It's totally okay to just listen in here if you want, or maybe you're out on a walk or on a lunch break or whatever time of day it is for you. But there are always three ways you can jump in and ask questions or provide your own perspective too. You can raise your hand on Zoom and I'll keep my eye out here. You can put questions in the Zoom chat, and if it's something you want me to read out loud instead, just put a little star or asterisk next to it so that I know. And then lastly, we do have a Slido link where you can ask questions anonymously too.
Introducing Matt McDonald
So thanks so much, Rachel, and thanks to everybody for joining today. And thanks to Posit for hosting the Data Science Hangout. So my name is Matt McDonald and I live in Stamford, Connecticut with my wife and daughters. And I tell people I'm a statistician when they ask me what I do. And I've been involved in data science for quite some time, coming up on 20 years, I would say. I work for Kroll Bond Rating Agency. So I think we rebranded recently. So we are now known as just KBRA, I would say. And so what does KBRA do? So we provide credit opinions. So we're a financial services company providing credit opinions in the form of credit ratings. We've been around since 2010. So we are a post-financial crisis credit rating agency. And our competitors are companies like Moody's, S&P, and Fitch.
I've been at KBRA since 2015. So I just surpassed my eight-year anniversary. And when I started here, I was in charge of model risk management, which is something that, if you're in financial services or you work for a bank, might be familiar to you. Because a lot of the companies I've worked for in the past, and the one I work for now, are regulated, there's a need for controls on the models. So we have people who are dedicated not to building the models, but to evaluating the models. So I started my career here at KBRA in the model risk management space. I created that framework and rolled it out in the company. And now I'm in charge of the quantitative modeling group. So now I'm sort of on the other side of the fence, and I'm in charge of building the models.
So I run a small team of data scientists, econometricians, statisticians, and financial modelers. And our job is to provide models to credit analysts. So these are models that are used in the credit rating process. A lot of them are statistical models. You can think about classification models, logistic regression-type approaches. And other ones are stress testing models. So lots of times we'll be looking at scenarios that could happen but are bad and plausible. And we want to get a sense of what would happen to some financial portfolio or something along those lines. So we're looking to see when things get bad, how bad will things get, and how does that translate into our analysis of the credit.
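As an illustration of the kind of classification model described here, a logistic regression mapping obligor characteristics to a default probability, here is a minimal Python sketch. The feature names and coefficient values are invented for illustration, not KBRA's actual model:

```python
import math

def default_probability(debt_to_income: float, years_operating: float) -> float:
    """Score one obligor with a fitted logistic regression.

    The intercept and coefficients below are made-up illustrative values,
    not a real rating model.
    """
    intercept = -3.0
    b_dti = 2.5      # higher leverage -> higher default probability
    b_years = -0.1   # longer operating history -> lower default probability
    z = intercept + b_dti * debt_to_income + b_years * years_operating
    return 1.0 / (1.0 + math.exp(-z))

# A highly leveraged young firm scores riskier than a seasoned, low-leverage one.
risky = default_probability(debt_to_income=1.5, years_operating=2)
safe = default_probability(debt_to_income=0.3, years_operating=25)
```

In practice the coefficients come from fitting on historical default data; the logistic link is what keeps the output interpretable as a probability between 0 and 1.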
Sometimes we get engaged to sort of help people sort of develop these scenarios, like what would an adverse scenario for interest rates look like. And we get involved in that. And then there's sort of cats and dogs type stuff around data analysis. So we'll be drawn in and engaged to help people analyze sometimes large data sets, sometimes not large data sets. I'd say that we're not really in the huge data space. We're kind of dealing with data sets that fit into CSV files that we can look at. So that's kind of what we're doing. Those are the types of models we're doing.
Team building and tools
I always follow the sort of guidance of like hire people who are good and are talented and that you like to spend time with. And so far, I've been quite lucky.
And, you know, shameless plug for Posit: we use the Posit Team platform. Right. So that allows us to do our work in a common environment. And we have controls in place that help us make things repeatable. We're using Git. We're using renv if we're building in R, making sure that our environments are reproducible, or at least that we can build them back up. I mean, we're not 100 percent there, but we've been experimenting with things like vetiver and plumber for deploying models via API that are then usable by people.
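To make the deploy-a-model-via-API idea concrete, here is a toy Python sketch of the request/response contract such an endpoint might expose. In practice this logic would sit behind a framework like plumber or vetiver as mentioned above; the handler, feature names, and weights here are all hypothetical:

```python
import json

def predict_handler(request_body: str) -> str:
    """Toy request handler illustrating the shape of a model-scoring API.

    A real deployment would wrap this in a web framework; the model here
    is a hypothetical linear score, not an actual KBRA model.
    """
    payload = json.loads(request_body)
    features = payload["features"]          # e.g. {"x1": 0.4, "x2": 1.2}
    weights = {"x1": 0.8, "x2": -0.3}       # stand-in fitted coefficients
    score = sum(weights[k] * v for k, v in features.items())
    return json.dumps({"score": round(score, 4)})

response = predict_handler(json.dumps({"features": {"x1": 0.4, "x2": 1.2}}))
```

The point of the JSON-in, JSON-out contract is that any client, including an Excel spreadsheet with a bit of VBA, can call the model without knowing R or Python.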
One of the things that we've really been working on and focusing on is establishing ourselves. I mean, the conundrum that we have is that we're a small team. Right. And we kind of sit between a technology organization, which is building software, and credit analysts, who are doing the work of credit analysis. Right. And the tools those two groups use are kind of radically different. I would say the analysts are largely using Microsoft Excel for a lot of their work. The technology folks are using Python, mostly in a software development context. And, you know, I'm an advocate of using the right tool for the job. So we're using Python and R in more of a scripting context to do our analysis and build our models.
Breaking down barriers between teams
Yeah, I mean, it's a balancing act, I would say, between reaching out to people and operating on their turf, talking their language, but also working very hard to retain some line of ownership of things. Right. So, like, when we do deploy an API, we've had some success and done some experimentation with providing people with some VBA code in their Excel spreadsheet to call our API, so that they don't need to really learn a new skill to use our work.
And going the other way, you know, some of our projects, some of our models, are not really that regression-based kind of model where all I have is a set of data and I'm going to create a prediction. Some of them are really more software-type models. They're more complicated. These are simulation models, Monte Carlo simulations, which is a pretty standard approach in credit risk modeling. And so you need to build software to do that. And I don't want to build, own, and maintain software. That's sort of a job unto itself. So we do engage with technology.
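A minimal sketch of the Monte Carlo style of credit risk model described here, assuming independent defaults for simplicity (real credit models typically add correlation between obligors); all parameters are illustrative:

```python
import random
import statistics

def simulate_portfolio_losses(n_sims=10_000, n_loans=100,
                              default_prob=0.02, loss_per_default=1.0,
                              seed=42):
    """Monte Carlo sketch of portfolio credit losses.

    Each loan defaults independently with probability `default_prob`.
    All parameters are made up for illustration.
    """
    rng = random.Random(seed)
    losses = []
    for _ in range(n_sims):
        defaults = sum(rng.random() < default_prob for _ in range(n_loans))
        losses.append(defaults * loss_per_default)
    return losses

losses = simulate_portfolio_losses()
expected_loss = statistics.mean(losses)                 # near 100 * 0.02 = 2
stress_loss = sorted(losses)[int(0.99 * len(losses))]   # ~99th percentile tail loss
```

The tail percentile is the "when things get bad, how bad will things get" number: the stress loss sits well above the expected loss, and that gap is what the stress-testing analysis is after.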
I'd say that we've had a lot of success with learning the tools that are employed by technology and trying to figure out how we apply those to our work. So, like, everybody on the team has been learning Git. We really are getting very good at Git, and we find that Git is a super useful tool. Like I said, we need all of our work to be repeatable, because we could be engaged by a whole bunch of people who are asking us to explain what we've done. So Git is super helpful for that. And all of our code is transparent, inspectable. People can see all the changes that have happened over time.
But also we're using some of the project management tools that they have, things like Jira and wiki-based documentation, which we've had a lot of success with.
Explainability and model choices
Yeah, no, I think there's no regulatory need or anything dictating what kind of algorithms we can use to build our models. But I would say we definitely stay on the side of simpler, maybe slightly less performant models that are explainable, because not only do I have to explain them to the credit analysts, but they probably need to explain to a lot of other people how they arrived at their decision. So if we have just an absolute black box, that doesn't really work for us. Although we have used some of those approaches to help us benchmark our models. So, you know, there's kind of a matrix of model performance versus interpretability, a graph I've seen in the past, and we're on the interpretable end of the spectrum. But lots of times we'll ask the question, what are we leaving on the table as far as model performance when we make that decision? So we'll employ a more black-box-type approach to say, look, if we didn't care whether we could explain this to anybody, what could we achieve? And we use that as an input in our decision-making process.
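The trade-off described here, benchmarking an interpretable model against a more flexible black box to see what performance is being left on the table, can be sketched on toy data. Both models below are stand-ins invented for illustration: a simple linear rule versus a 1-nearest-neighbour "black box" on a deliberately nonlinear problem:

```python
import random

def make_data(n, seed):
    """Toy data whose true boundary is nonlinear: label = 1 iff x1 * x2 > 0."""
    rng = random.Random(seed)
    pts = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
    return [((x1, x2), 1 if x1 * x2 > 0 else 0) for x1, x2 in pts]

train, test = make_data(300, seed=1), make_data(100, seed=2)

def linear_predict(x):
    # An interpretable (but misspecified) linear rule: sign of x1 + x2.
    return 1 if x[0] + x[1] > 0 else 0

def nn_predict(x):
    # A 1-nearest-neighbour "black box": flexible, but hard to explain.
    nearest = min(train, key=lambda ex: (ex[0][0] - x[0]) ** 2 + (ex[0][1] - x[1]) ** 2)
    return nearest[1]

def accuracy(predict, data):
    return sum(predict(x) == y for x, y in data) / len(data)

linear_acc = accuracy(linear_predict, test)
nn_acc = accuracy(nn_predict, test)
headroom = nn_acc - linear_acc  # performance left on the table by staying interpretable
```

On this contrived data the headroom is large; in practice the interesting case is when it is small, which is evidence that the interpretable model is not giving much up.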
Using Quarto and publishing results
Another thing that we've had a lot of success with is when we're doing more of an analysis, or helping somebody, or writing research pieces (we've actually written a few, mostly internal ones): we've been using Quarto quite a bit. So Quarto is a very, very useful tool to create documents that are very closely interwoven with the underlying data.
And that's been a super useful tool. And that kind of gets us out of the business of emailing spreadsheets and Word documents and that type of stuff. I much, much, much prefer the pattern of publishing a document, publishing it to our Connect server, and sharing a link with somebody. And then if I need to update it, I can update it there. And there are even some really nice features we've seen, like you can have the same code, the same Quarto document, generating the web page that someone's looking at and a PDF they can download. And the PDFs look really nice. So we've had some really good success there.
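A sketch of what that same-source, two-output pattern might look like in a Quarto document's front matter (illustrative, not KBRA's actual setup): listing both formats makes `quarto render` produce the web page and the typeset PDF from the one document.

```yaml
---
title: "Internal Research Note"
format:
  html: default   # the web page published to Connect and shared as a link
  pdf: default    # the same document, downloadable as a nicely typeset PDF
---
```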
Learning Git from the ground up
I would say we've been learning from the ground up. And, you know, I was thinking about this recently, that I must have, 10 different times, said, all right, I'm going to learn Git. And I'd go take some sort of online course and learn the fundamentals. And it never stuck. I just felt like, I don't really get this too much. It wasn't until we were actually getting help from the folks on our technology team on how to set up our repos. None of it really stuck until I was put in a position, or I put myself in a position, where I had to use it. Once we started collaborating across the team, that's when I really saw the power of it. Because, you know, I'm from a place before Git. Back in my day, you'd have some code, and if you needed to change it, you were just praying that it didn't break. And if it broke, you were sitting there thinking, oh boy, I hope I remember what I did so I can undo it. And it's really liberating to not be carrying around that mental luggage all the time about, what have I changed over the last two hours, and could I recover what I had before? So, I mean, I'm a huge Git advocate right now.
There's a really good paper out there, Good Enough Practices in Scientific Computing. I think Jenny Bryan, who's a Posit employee, is a co-author of that. And she also has a Git for R book, Happy Git with R. Yeah, that's the one. Even nowadays, if I'm doing something and I'm not collaborating with anyone, and I'm in RStudio starting up a new project, I'm clicking that Git repo button every time, because that's my safety net. So I'd say, I mean, we were lucky we had people in our organization that helped us get over the hump. But, you know, the best way to get familiar with it is to use it. So it's kind of a catch-22, I think.
Using Shiny for model communication
Yeah. Yeah, we do use Shiny. One of the things that I get a little bit nervous about with Shiny is it's such an amazing tool. I would say that we've kind of limited our use of that to more of the model development piece. So when we're building a model and we want to communicate to the users or the model owners or the subject matter experts, the people who know this stuff, how this model works, if I put in this input, what kind of output, what kind of sensitivities am I seeing? Shiny has been a really great tool for that.
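A sketch of the computation behind that kind of slider: a one-dimensional sensitivity sweep of a model across a range of one input, which is what a Shiny app recomputes and plots each time the slider moves. The model here is a hypothetical stand-in with made-up coefficients:

```python
def model_output(rate: float) -> float:
    """Hypothetical stand-in for a fitted model: expected loss as a
    function of an interest-rate input (made-up quadratic shape)."""
    return 2.0 + 0.5 * rate + 0.3 * rate ** 2

def sensitivity_table(lo: float, hi: float, steps: int):
    """What a slider sweep computes: model output across a grid of one
    input, everything else held fixed."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return [(round(r, 2), round(model_output(r), 4)) for r in grid]

table = sensitivity_table(0.0, 5.0, steps=6)
```

In the app itself, the slider bounds map to `lo` and `hi`, and the resulting table feeds a plot so the subject matter experts can see the input-output sensitivity directly.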
My concern about Shiny is if I build something that is super awesome, which is really accessible with Shiny, like you can build really amazing things, my concern would be that I don't view myself as somebody who writes production code. So if it were to become part of a core part of people's day-to-day lives, I'd get nervous about something along the lines of getting a call at 11 p.m. that the Shiny app's down and can you fix it, Matt? And I'm the only guy who knows how to fix it. So it's a bit of a double-edged sword. So we do try to really leverage our technology teams when it comes to building core functionality, software functionality that people need. So it's like with great power comes great responsibility type of a message I'm saying there. But we've had great success with Shiny. I think Shiny, it's a really great tool. But mostly in the prototyping and sort of early results stage, I would say.
Collaborating with Excel users
Well, we have experimented a little bit with writing VBA code that we can share with people, which gives them access to an API in a non-programmatic way. Like, if I were to publish the best API in the world, I think for probably 80 to 90% of the people I work with, it wouldn't be particularly helpful, because they don't know how to interact with that.
JD Long did a keynote speech at posit::conf recently. And, you know, he's a big advocate of being mindful and empathetic toward people, because ultimately they're just trying to do their job and they just want to get things done. And my job is to try to help them. It's definitely a challenge, though, because I do really want to maintain some sort of line between my work and their work. And, you know, I don't really want to become their Excel support line type person. So that's kind of a thing I'm always trying to balance.
LLMs and security concerns
Yeah. So I think that's in our future, and that's something I'm really eager to engage with and have that kind of assistance with. ChatGPT and GitHub Copilot I've dabbled with at home. But, you know, as an organization, I think there are some security concerns. We are sometimes dealing with non-public information. So we are very locked down.
I work at a law firm, so I feel your pain. We have our own version of ChatGPT hosted in Azure that's just for us. And I will tell you, it is so much better at writing roxygen comments for package functions than I ever could be.

Oh, I can't wait. We have not yet engaged with that. That sounds like an awesome use case of that technology.
But we haven't done that yet. And, you know, I sit right next to the head of security, and I'm talking to technology and so on. I do think that at some point, everybody's going to have that tool, because if you don't, then you're just not performing as well as your competitors. Yeah. We're just not there yet.
Establishing trust and ownership
A lot of the difficulties we've encountered, being in this role between the credit rating analysts, who are doing their job, they want to do the credit analysis, and technology, who are also doing their job, come down to people not really understanding what our role is. Right. So lots of times we really do get confused with technology. Like, I'll get a call from senior leadership, people who are very senior in the company, saying, I need this thing that does X, Y, and Z. And they're not describing a model. They're not really describing anything that I do or am really good at.
So, you know, trying to continually hammer home to people what it is that we do (what would you say you do here?) is something I've encountered a lot: that clarity, making sure people understand it. And then, because we're a smaller team and we do have limited resources, just establishing those lines of ownership, and helping people without doing their job for them, is something that can be difficult too.
Yeah, I mean, you know, I've had the opportunity to do things that have that overlap between stuff my company needs to get done and stuff I'm interested in doing. So that's been really good. And it's been nice to be able to call the shots along the way.
Building trust through demonstration
Yeah, I'd say early on, and I used a Shiny app to get it done, right? So early on, this was back when I was doing the model validation stuff, I was starting to get insights into: what are the sensitivities? What is driving this model? How does this thing actually work? And I had just learned Shiny. So I took that knowledge and built a Shiny app that allowed people to move a slider, press the button, see the graph, that type of stuff. And I would say people's socks were knocked off. They were blown away. So those types of moments are useful not just because now they have something where they can look at the sensitivity, but, from a more personal perspective, suddenly people were saying, oh, Matt kind of seems to know how to get things done. Maybe we should trust him. So I think it's about looking for those types of opportunities. And there's no playbook there. You just have to keep your head up, know what you're capable of, and deliver. And that's really useful.