Transcript
This transcript was generated automatically and may contain errors.
Hey there, welcome to the Posit Data Science Hangout. I'm Libby Heeren, and this is a recording of our weekly community call that happens every Thursday at 12pm US Eastern Time. If you are not joining us live, you're missing out on the amazing chat that goes on. So find the link in the description where you can add our call to your calendar and come hang out with the most supportive, friendly, and funny data community you'll ever experience.
I would love to introduce our featured leader this week, Jim Weiss, Chief Risk Officer, Commercial and Executive at Crum & Forster. Jim, I am so glad to have you here today. Could you introduce yourself? Tell us a little bit about what you do, and something you like to do for fun.
Yeah, hey everybody. Thanks for having me on. Good to be in the hot seat. It's getting a little warm. Yeah, I'm Jim, a 44-year-old data science person in the U.S. property casualty insurance industry. I lead a small group of data scientists focused on ML and MLOps. They're all way better at data science than me, and we also have responsibilities for monitoring emerging risks and longer-time-frame issues. What I do for fun when I'm not doing all that stuff is watching professional wrestling. Love WWE, wearing an AEW hoodie right now. So if there are any professional wrestling fans in the house, holla.
Background on Crum & Forster and insurance data science
Sure, Crum & Forster. A couple of good books you might check out are The Once and Future C&F by our CEO, which is the least self-referential corporate biography you're ever gonna read, and also The Fairfax Way, about Fairfax, Crum & Forster's parent company, which is a little bit more self-referential. But essentially, we're a specialty property casualty insurer, and we mostly serve commercial businesses. So if you buy insurance on your home or your car, it's like that, but for businesses.
And data science factors in a lot of ways, but one place my team often factors in is just trying to find a good, fair, equitable price point for each policyholder who applies for insurance with us. So take a pizzeria, for example: I want a price that basically aligns with its risk. So I might be looking at stuff like how much pizza do they make? How many customers do they have? Where are they located? How much foot traffic is around the pizzeria? And we're typically using a regression model. I told Libby before this: love regression models. So feel free to challenge me on that.
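To make that concrete, here is a minimal sketch of what a risk-based pricing regression could look like in R. The data frame, column names, and family choice are all hypothetical illustrations, not Crum & Forster's actual model.

```r
# Minimal sketch of risk-based pricing via regression (hypothetical data).
# Each row of `policies` is one policyholder, e.g. a pizzeria; `loss_cost`
# is historical claims paid per unit of exposure.
fit <- glm(
  loss_cost ~ annual_revenue + customer_count + foot_traffic + territory,
  family = Gamma(link = "log"),  # positive, right-skewed losses
  data   = policies
)

# The predicted expected loss cost for a new applicant feeds the quoted price
predict(fit, newdata = new_applicant, type = "response")
```

One caveat: a Gamma family assumes strictly positive losses, while many policies have zero claims, which is part of why the Tweedie distribution discussed later is so popular in insurance.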
I actually do have a background in P&C insurance, but I think most of us are from wildly different backgrounds. So what kinds of data would you or your team see in your day-to-day?
Sure. Like no problem is ever truly solved permanently, right? I guess to take it back up a level to the data question, insurance is a super data-rich industry. Historically, a lot of the data has been dark. Taking it to the world where I work and operate, the US property casualty commercial insurance industry, a lot of the data we're getting starts in just unstructured documents. So we're getting PDFs, we're getting spreadsheets, Word documents with information about what kind of insurance the person or the business wants, the attributes of the business, et cetera. Pretty non-standardized.
Over the industry's previous several-century history, a lot of times that's just been keyed in by people. As Gen AI in particular has proliferated throughout the industry, Gen AI is doing more of that work, but ultimately it coalesces into more structured data sets, which contain what policyholders paid for their insurance, some of their risk attributes, and then, most critically in a lot of cases and for a lot of my team's projects, whether or not they had a claim and how much that claim cost.
So for the types of projects I'll typically plug into, the type of problem I'm trying to solve might be looking at a portfolio of insurance that hasn't been profitable for us for some reason or other, and trying to figure out why using data science techniques such as regression. And then if we can isolate factors that are contributing to unprofitable pockets of our portfolio of policyholders, we can either make pricing adjustments, potentially decide to exit that market, or, most ideally for everybody, identify things about their risk profile that they can adapt and evolve.
And maybe to give a specific example from one of the many, many types of insurance Crum & Forster sells, I would point to cyber insurance, because there might be unfavorable risk attributes in a policyholder applicant. They might have poor website security. But we can look at all the attributes of the website, build a model that figures out why a particular type of website or website hosting structure is prone to losses, and then either introduce some sort of discount or surcharge based on your hosting footprint, or alternatively recommend ways you could change your hosting footprint to make your website safer and reduce what we're going to have to pay out in insurance claims.
Regression modeling and the Tweedie distribution
Love the question. Thank you, Russ. I think the most popular loss distribution in the history of insurance would be the Tweedie. Has anyone heard of the Tweedie distribution, named after Tweety Bird, the Looney Tunes character? (It's actually named after the statistician Maurice Tweedie.)
It's really good for insurance problems, because when you're looking at insurance profitability, there are at least two dimensions driving it. One is whether or not your policyholder is going to have an insurance claim, which logistic regression might drive at. Or you could look at a Poisson, which gets into how many claims your policyholder is expected to have. But then, oftentimes of greater significance in our world, is how much the claim is going to cost. So what makes Tweedie a beautiful thing is it kind of combines those into a single distribution with a power parameter that decides how much weight each of those two things gets. And you can either grid search that or judgmentally apply a selection. The most popular selection in the history of insurance is 1.7.
But at the end of that, you get kind of a scorecard of the different risk factors in your model and what they're contributing. The one downside, I would say, is the model coefficients don't really provide attribution into whether it's the claim propensity or the claim cost that's driving your outcomes.
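For anyone who wants to try this, here is a minimal sketch of a Tweedie GLM in R using the statmod package's tweedie() family. The data frame, predictor names, and exposure weights are hypothetical; the 1.7 power is the judgmental selection discussed above.

```r
library(statmod)  # provides the tweedie() family for glm()

# Minimal sketch of a Tweedie GLM (hypothetical data and column names).
# var.power = 1.7 sits between Poisson (1) and Gamma (2), blending claim
# frequency and claim severity into a single distribution.
fit <- glm(
  pure_premium ~ building_age + territory + industry_class,
  family  = tweedie(var.power = 1.7, link.power = 0),  # log link
  data    = policies,
  weights = exposure
)

summary(fit)  # the "scorecard" of risk-factor coefficients
```

If you would rather grid search the power parameter than fix it at 1.7 judgmentally, the tweedie package's tweedie.profile() function estimates it by maximum likelihood.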
Unusual variables in insurance models
So essentially, in insurance models, especially ones that are used to determine rates, oftentimes they have to be vetted by regulators. And that's why the ability to explain, or at least speculate about, cause and effect is very useful. And of course, regression is a very helpful tool in helping you articulate potential cause and effect underlying your model predictions.
So I remember one model I had filed with a regulator at my prior employer. It wasn't like a Jim Weiss production; there were a number of people involved in it. And it used a ton of variables: weather, geodemography, vehicle information from the area. Pretty much every external data asset under the sun was factoring into this model. And for one in particular, the exact variable that landed in the regression model was called kurtosis of average daily humidity. And this was predictive of how likely a car was to crash. And we had a little bit of internal debate, like, should we file this one? What does this even really mean? But we eventually filed it in all 50 states. And I think we got approval in all 50 states, or close to it, because there was strong statistical evidence surrounding it.
But I just so happened to be at a conference. It was not PositConf, unfortunately. You should all go to PositConf. But it was not that conference; it was actually an actuarial conference. And one of the regulators who had actually reviewed this filing was at the conference. And they were mocking this variable as the dumbest variable they had ever seen in any of the hundreds of insurance filings they had reviewed over the years. So I thought, A, better to be known for something. But also, what are the chances that I would be at that conference when they were mocking my variable? I have to think that's pretty statistically improbable.
Handling model changes and stakeholder management
Yeah, great question. Results significantly different across models: I think you can look at that question in two ways. One is differences between your models over time, and the other would be differences between candidate models that both solve the problem equally well. Maybe the one I encounter more typically in practice is changes in our model over time. I work for a big org, a couple thousand employees, but it's only something like 1% of US property casualty insurance industry revenue. So we're not huge, right? So the data sets I'm dealing with tend to be on the thinner side. And that's not a friend of model stability.
So when we go from model version to model version, oftentimes we'll see, oh gosh, the dynamics in the model massively changed. And we have to figure out, hey, did we screw it up, or did the world change, or was it a spurious correlation? And it's usually a combination of those things. But I guess from my perspective, the question is how we articulate it to our stakeholders. And the stakeholders hate it, right? They hate when they make a decision on a policyholder's price point and then the model changes two years later. And they're absolutely justified in hating it. So I think the best answer is, if we have a reasonable hypothesis for which of those three possibilities it is (the world changed, we screwed up, we overly anchored to a spurious correlation), just say that, right? Even if it means we screwed up and leaned too far into a finding. And if we don't know, we just say we don't know.
I guess maybe to hit the other angle of the question in a nutshell: if we have multiple candidate models that are giving us different views of the same problem, probably the most statistically valid approach would be to ensemble them and come up with some combination that maximizes their respective contributions in an optimized way, bringing them all into the equation. We don't typically do that, because it comes at a cost in complexity that would make the result harder for our stakeholders to work with and increase the maintenance burden on me and my team.
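As an illustration of the ensembling he describes (and typically skips), here is a minimal sketch in R that blends two candidate models' holdout predictions with a single optimized weight. The prediction vectors and the squared-error objective are hypothetical stand-ins.

```r
# Minimal sketch: blend two candidate models with one optimized weight.
# `pred_a`, `pred_b`, and `actual` are hypothetical holdout-set vectors.
blend_error <- function(w) {
  blended <- w * pred_a + (1 - w) * pred_b
  mean((actual - blended)^2)  # swap in Tweedie deviance, Gini, etc. as needed
}

# Search for the weight in [0, 1] that minimizes holdout error
best_w <- optimize(blend_error, interval = c(0, 1))$minimum
```

Even this two-model blend adds a weight that has to be maintained, monitored, and explained, which is exactly the complexity cost he cites.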
Bayesian vs. frequentist approaches
Awesome question, Greg. Thank you. Totally feeling the heat. Maybe I'll start with the secret third thing and then get back to the first two options presented. I think the secret third thing in insurance is making stuff up. And I don't say that in a bad way. Sometimes in insurance, you've got to make stuff up. I'll give an example: AI risk. A lot of orgs, a lot of our policyholders, are using AI, but we didn't really have Gen AI in the world in a scalable way until two years ago. So there's no data to analyze. You have to find similar data sets. You have to make a lot of assumptions. Sometimes you just have to make stuff up in a way that's based on solid reasoning.
Actually, let me correct the transcript. Jim Weiss does not make stuff up. Make stuff up is a colloquialism for making expert judgments based on past historical experience and analysis in a way that best addresses stakeholder needs.
And meanwhile, of the other two options, I would say I'm more spiritually Bayesian; in practice, I'm more of a frequentist. I forget if I mentioned I'm an actuary by background. And basically, the entire academic history of P&C actuarial science is kind of Bayesian in nature, because it's anchored toward what's the present state of the world as reflected in the way you've deployed your business, and then developing incremental changes on top of that, for which you have a prior. The prior is the current state of the world.
With respect to the actual answer to Libby's question about which models we use: am I doing an MCMC to get the answer to the underwriting profitability problem I'm working on today? Absolutely not. A, because I don't have time to set that thing up, even as much as Posit tools have democratized it. But B, because from my perspective, setting up the prior in a way that's effective in the MCMC requires so much judgment and touch that it essentially takes you back into that making-stuff-up category. And if I'm essentially just making stuff up, so to speak, then I'd rather do that in a way that's conversational than try to force it into a prior elicitation approach.
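To make the "prior is the current state of the world" idea concrete, here is a minimal sketch of what encoding that judgment might look like with the brms package. As Jim says, this is not what his team actually runs; the variables, prior values, and family are all hypothetical.

```r
library(brms)

# Hypothetical sketch: center the coefficient prior on the current rating
# relativity, so the posterior only moves away from "the present state of
# the world" if the data insist. All values are illustrative.
fit <- brm(
  log_loss_cost ~ building_age,
  data   = policies,
  family = gaussian(),
  prior  = set_prior("normal(0.05, 0.02)", class = "b", coef = "building_age"),
  chains = 4, iter = 2000  # the MCMC setup he mentions not having time for
)
```

The judgment sits entirely in those prior values, which is his point: choosing them well is its own form of expert "making stuff up."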
Tools and software
I get so much shade for my toolset. But embarrassed? I don't think so. We're all friends here. Love RStudio, the greatest IDE in the history of human creation prior to Positron, which is now the greatest IDE in the history of human creation, once we work out some of the bugs with Bailey and the team in our company deployment of Positron.
But I'd say the reason I love R is, A, the IDE, and B, the lack of finickiness in the syntax. I just want to jump off a ledge when I'm coding in Python because it's so temperamental. Vibe coding, of course, has solved temperamentalness for all languages to a very large extent, so this commentary largely dates from two years ago. But the non-temperamental nature of R, and the statistical focus, I think, have made it very good for my job.
The reason I get a lot of shade is because my team and I often use R, maybe not just for what its core capability and strength is, but also for a lot of our MLOps: deploying our pipelines, deploying our applications, deploying our APIs, having a lot of agents running in the background all the time to do our work. And while I'd say R can do those jobs, it can't necessarily do them in a way that complies with business partners' SLAs. If you're doing a plumber API, for instance, that's not necessarily going to get the same response time as the exact same thing in Flask, for example. So love R, get a lot of shade from my coworkers, some of it earned, some of it because they're haters.
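For anyone unfamiliar with plumber, here is a minimal sketch of the kind of R model-scoring API he's describing. The endpoint, parameter, and model object are hypothetical.

```r
# plumber.R -- minimal sketch of a model-scoring API (hypothetical endpoint).
# `fit` would be a model loaded once at startup, e.g. fit <- readRDS("model.rds")

#* Score a policyholder with the pricing model
#* @param building_age:numeric Age of the insured building
#* @get /score
function(building_age) {
  newdata <- data.frame(building_age = as.numeric(building_age))
  predict(fit, newdata = newdata, type = "response")
}

# Launch with: plumber::pr("plumber.R") |> plumber::pr_run(port = 8000)
```

His SLA caveat is about response time under load rather than correctness: the same endpoint can work fine in R and still miss a latency target that a Flask equivalent would hit.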
Fraud detection and co-opetition
Raise your hand if you like fraud. Nobody likes fraud, right? Because fraud drives up the cost of insurance, and of any service, really, for everybody. There might be a very, very small pocket of society that says, hey, insurance fraud is good because insurers are these big evil corporations. Which we're not; we're making the world a better place every day by providing valuable protections in people's greatest time of need, right? But that argument doesn't hold up, because ultimately what the fraudsters are doing is taking money, putting it in their pockets, and charging that money to everybody else, because that cost ultimately gets borne by all the other consumers of that service, whether that service is insurance or anything else.
Now, to solve problems like fraud, you need a lot of data. As data scientists, a lot of us think we can solve any problem. That's not quite true, right? If given all the tools needed to solve a problem, we probably could; oftentimes, we don't have all the tools. And for something like fraud, yeah, Gen AI makes it light years easier to take a variety of disparate spreadsheet or PDF formats and reshape those into a single uniform format that's a lot easier to latch onto and use as a jumping-off point for your fraud model. But your fraud model is going to be based on probably a half-percent or 1% sliver of the universe, whereas the fraudsters are exploiting the fact that nobody sees the whole universe.
So the way the insurance industry has addressed this (and I've only worked in one industry, but my guess would be other industries go about it in a similar way) is coopetition, which essentially means it makes sense to partner with your competitors to solve a societal problem. So in our industry, there's an organization called Verisk, also my prior employer, that maintains a massive data set with nearly 100% industry participation, where their data scientists do pattern recognition across an entire industry's data. So sometimes the best way to solve the problem is to realize you're not the person to solve the problem.
Bias, confounding effects, and DAGs
So, shameless plug: I actually wrote a paper on bias management in property casualty insurance. Either just Google Jim Weiss bias, or go to my LinkedIn and click the link. It looks at confounding effects between protected classes in society and insurance outcomes. There's an evolving body of legislation and regulation in the US that essentially challenges insurers, especially personal lines insurers, insurers who are selling to individuals, to identify these potential correlations with protected classes and root them out of their models and business practices. So from that perspective, identifying potential confounding effects, and specifically ones that are deemed by regulation, legislation, and just general morality to be problematic, is becoming critical to insurance data scientists' work.
That's kind of an outlier example, though, where we actually need to give specific, dedicated thought to things like confounding variables. In a more general sense, for regression models, how we typically deal with confounding effects is: we might put a control variable, a dummy variable, in the model to siphon off some of that signal. For general confounders we do want in the model, we'll look to avoid multicollinear variables, whether that's through selection or some sort of dimension reduction like PCA. And probably most common in our world, because we're stretched, right? We're not going to do a PCA; we can't explain a PCA to our stakeholders. We'll probably just do a penalized regression, and that in and of itself will weed out some of the multicollinearity for us.
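Here is a minimal sketch of that penalized-regression approach in R using glmnet. The data frame and outcome column are hypothetical, and the elastic-net mixing value is just one reasonable choice.

```r
library(glmnet)

# Minimal sketch of penalized regression for multicollinear predictors
# (hypothetical data: `policies` with a `loss_cost` outcome column).
x <- model.matrix(loss_cost ~ ., data = policies)[, -1]  # drop intercept
y <- policies$loss_cost

# Elastic net (alpha between 0 = ridge and 1 = lasso): the penalty shrinks
# correlated predictors, weeding out much of the multicollinearity for you.
cv_fit <- cv.glmnet(x, y, alpha = 0.5)
coef(cv_fit, s = "lambda.1se")  # the surviving risk factors
```

cv.glmnet picks the penalty strength by cross-validation, so the only judgment left is the alpha mixing choice, which keeps the workflow explainable to stakeholders.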
Now, with respect to an ongoing basis, and hey, did we pick the right stuff, and do we have to take remediative action? From there, I think it just gets back into classic model life cycle management: looking at your drift, looking at whether the model is still maintaining its predictive power over time. And as soon as we see red flags in our leading indicators, getting back into that model and taking remediative action as soon as possible.
Career advice
Yes. This might be self-evident, but it wasn't self-evident to me until I was in my mid-30s, having my midlife crisis, when I was under a mountain of backlog at my prior company and all my stakeholders were mad at me. And I was pulling all-nighters to keep up with my backlog. And I guess what I found when I was going through it was I really found out who my friends were, and it surprised me a little bit, because it wasn't necessarily who I thought it would be. It wasn't even necessarily STEM people. I remember one of my coworkers at my prior company who really helped me through it was a lawyer named Bob Becker, who is not on this call, I'm pretty sure. And he basically gave me a lot of advice on how to manage a backlog. And it turns out there's a lot of knowledge and practice and discipline in the legal profession that can actually be applied back to managing a STEM backlog.
So I guess the advice I would distill that into is: if you see somebody struggling, help them. And that doesn't need to be STEM help. That can also just be emotional help, right? And that pays back in a lot of different ways, because, A, they're going to have your back when you're going through that midlife crisis. But B, you establish a more diverse network that brings a broader set of perspectives into your context. So if you provide a lawyer emotional help, then you now have a lawyer in your network who broadens your context and gives you more tools in your arsenal to solve problems. So that would be my advice.
Using AI at work
A lot. I mean, we're vibe coding everything now. I would say, I don't know how weird we are. My sense from the public reporting is we're not super weird. But at least in my work and with my team, we're still looking for ways to go from kind of POC to production and develop those really high-value use cases at an enterprise level.
Libby, I'll give a quick, kind of weird use case I have that might be helpful to people. So we have Copilot and all the foundational LLMs in our org. I like to use the LLMs to psychoanalyze myself. It's very important to me, as I interact with my stakeholders, that I have a consistent philosophical core that I'm articulating and presenting to them. So what I like to do is expose all my transcripts and communications to Copilot, have it evaluate the ethos behind how I'm interacting with them, and use that to understand my own philosophical core and evolve it. So I think it's a good introspection tool and thought partner in a way that folks might overlook if they're purely focused on transactional use cases they deploy to their customers. You can be a customer as well, and it can help you on a personal growth level.
Absolutely. All right. Well, I will end us here. If you can, lightning round this last one. It's an anonymous one that says: how does your team span expertise in such varied fields? For example, you talked about cybersecurity versus climate change and all kinds of things.
I think working in a diversified org helps a lot. My team is all actuaries, who are basically insurance statisticians. So we're not diverse at all, right? I've never worked at a small company, so if you're in a smaller company, I don't have a lot of advice. But I guess my advice is: work for a big company where there are a lot of subject matter experts, and utilize them.
Fantastic. Yeah. And meet people, make friends, right? And talk to a lot of different people about a lot of different things. I'm very lucky that I get to do that. Well, this was so much fun, Jim. I think that you being so wacky and fun and like loosey goosey made this just the most fun ever. Everybody asked fantastic questions. Thank you for indulging me in my very specific property and casualty regulatory question that nobody else cared about. Thank you to everybody who asked a question today, who answered our polls.
I am so excited to say that we have a fantastic guest next week as well. If you would like to join us, it's Nikolai Breikov. He is the manager of advanced analytics and outcomes at Children's Healthcare of Atlanta. So come join us next week. And then next week at the Data Science Lab, which is on Tuesday, we have Joey Marshall, who is our own Data Science Hangout crew member. He is going to be using Claude Code live to work through a Tidy Tuesday data set. And the way that Joey works is fascinating to me. He doesn't type anything; he uses dictation. It is, like, next wave, so fun. Come join us on Tuesdays at the Data Science Lab, pos.it/dslab. Take care, everybody. I cannot wait to see you at Conf, whether you are in person or virtual. Don't forget that registration is open. And I will see you on Tuesday or next Thursday. Bye.