
Data platform modernization in insurance | Kshitij Srivastava @ Milliman | Data Science Hangout

We were recently joined by Kshitij Srivastava, Director of Technology at Milliman, to chat about data platform modernization at insurance companies.

Speaker bio: Kshitij leads technology and data operations for the Life and Annuity Predictive Analytics practice in Chicago. His work includes data infrastructure management, data pipeline development, and security operations for Milliman's industry-leading experience studies for VA, FIA, RILA, and life products. His work with Milliman clients also includes leading data operations and actuarial modernization initiatives that have been outsourced to Milliman. He routinely consults with actuarial and data teams within the industry and advises on cloud migration, actuarial modernization, AI, and data governance topics. He also leads a multidisciplinary team focused on data engineering, DevOps, information security, software development, and testing. Prior to joining Milliman, Kshitij worked as a consultant focusing on machine learning and software development for a major analytics services company and as a data scientist for a major technology company.

► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu
Follow Us Here:
Website: https://www.posit.co
LinkedIn: https://www.linkedin.com/company/posit-software

The Hangout is a gathering place for the whole data science community to chat about data science leadership and the questions you're all facing. It happens every Thursday at 12 ET. To join future Data Science Hangouts, add them to your calendar here: https://pos.it/dsh. We'd love to have you join us in the conversation live! Thanks for hanging out with us!

Aug 21, 2024
53 min


Transcript

This transcript was generated automatically and may contain errors.

Hi everybody, welcome to the Data Science Hangout. If we haven't met yet, I'm Rachel, I lead customer marketing at Posit. Posit is an open source data science company building tools for the individual, team, and enterprise. I'm so happy to have you all hanging out with us today.

The Hangout is our open space to hear what's going on in the world of data across different industries, connect with others facing similar things as you. And we get together here every Thursday at the same time, same place. So if you're watching this as a recording in the future and want to join us live, there will be details to add it to your calendar below.

And I love getting to see all the conversation and connections being made in the chat. So I just want to remind people if you are interested in connecting with others, I want to encourage you to say hello in the chat and introduce yourself, your role, maybe some of the things that you work on or things you do for fun. We're all dedicated to keeping this the friendly and welcoming space that you all have made it. If you're hiring, please feel free to add any open roles in the chat as well.

It's also 100% okay if you just want to listen in here, although we love getting to hear from you live. So there's three ways that you can ask questions or provide your own perspective. So you can raise your hand on Zoom and I'll call on you to jump in. You can put questions in the Zoom chat and put a little star asterisk next to it if it's something you want me to read out loud instead. And then lastly, we have a Slido link where you can ask questions anonymously.

And so with all that, I know you all have heard that spiel a bunch of times so far. Thank you for joining us. I'm so excited to be joined by my co-host today, Kshitij Srivastava, Director of Technology at Milliman. And Kshitij, I would love to have you introduce yourself first here and share a little bit about your role, but also something you do for fun too.

Yeah, thanks, Rachel. Thanks for having me today. Great to meet everyone. My background is in data science. I've spent close to a decade at Milliman, but I started my career as a data scientist, initially at a big tech firm, and moved to insurance, or insurance consulting to be specific, very soon after I started in data science.

And when I joined, I was working mostly with actuaries. I'm not sure how many folks here know the actuarial workflows, what actuaries do, et cetera, so I'll give a fair amount of background about the actuarial profession first, because that's been very important to my own career. Milliman is traditionally an actuarial consulting firm, but we're now trying to expand beyond actuarial to serve other areas of the insurance business.

My own role started in data science, moved to data engineering, and now I'm leading a team of data scientists, data engineers, DevOps, and technology professionals building applications for actuaries. So I'm happy to talk about data science applications within the broader insurance space, but more specifically in the actuarial space.

Oh, yeah, I mean, my work over the last couple of years, I'm sorry to say, has been very hectic. I've been moving geographies quite a lot, and that has left very little time for me to engage in my hobbies. But when I have free time, I like to be outdoors. I like to hike, I like to get out. Very recently I actually met some folks at the Data and AI Summit in San Francisco. While I was there, I had one day with no meetings, so I took an 11-mile hike, and it felt really good.

But yeah, I feel like outside of work, you know, that's what I do. I have two young girls, so I spend a lot of time with them at home, sort of, you know, hanging around. And both are pre-K, so spending time with them, sort of preparing them for kindergarten.

Insurance data science overview

I don't know how many of our attendees are from the insurance space or have exposure to it, but I'll briefly go over the insurance space in general: what the various functions at an insurance company are, and how data management and data science can help with decision making in each of those functions.

And then I'll jump briefly to the recent trends that we've been seeing in this space and cover the areas where innovation is happening and where we see opportunities. So going back to the basics: in general, insurance is the pooling of risks, so there needs to be measurement of risk to be able to insure it. The actuarial function within the insurance company is typically responsible for collecting data and then defining and measuring risks.

And this happens through collecting a lot of data, including external data, and then defining actuarial assumptions, which are basically the risks that we're seeing with a particular product. Say it's a life insurance product: there are various types of risks. There's mortality risk, which is one of the primary risks with a life insurance product, but also lapse risk and premium persistency risk, the risk that people who have signed up to pay premiums will stop paying them or will lapse out of the policy. So those are the various types of risks that each insurance product has.

Studying and measuring each of these risks is a very data-intensive process, as you can imagine. Companies are lucky if they have prior data they can base their assumptions on, but in a lot of situations a company is interested in launching a new type of product, offering new types of guarantees that the industry has not seen before, where there are no prior examples. In those situations it's tough to have data to back your assumptions.

So there are typically a lot of data science use cases in that area. Part of the work that Milliman does with a lot of our clients is to help them design these behavioral assumptions. We use machine learning models and traditional statistical regression models, though I would say our usage leans more toward the traditional statistical side than machine learning, because in general actuaries want to understand the drivers of risk and put a formula to it rather than work through a black box.

So I'd say we typically help companies set these assumptions through the use of data science. That's primarily what I got started doing at Milliman, about 10 years ago now. That's the actuarial use case. Now there are multiple other things that happen. Once you have these assumptions of what mortality risk will look like, there are actuarial modeling systems that actuaries use to project those risks into the future: 30 years down the line, what are the results that I need? What is the present value of the premiums that I'll get?

So actuarial modeling is another very deep area in actuarial work, and my company typically helps with those types of models as well. Now, these are not machine learning models. These are cash flow projection models: you take some assumptions and the present state of the policies and project them through various economic scenarios, 10 years, 20 years into the future.

Traditionally that's also a very data-intensive process, but I'd say there's less data science there in the traditional sense of what data science implies. It's more simulations, more projecting things forward. Insurance companies that have their own in-house actuarial modeling systems are very big users of compute, of cloud compute, because each of these projections needs very large amounts of it.
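As a rough illustration of what a cash flow projection looks like in code, here is a minimal, hypothetical sketch, not any actual actuarial modeling system: it projects a toy block of policies through random economic scenarios and discounts the premiums back to a present value. Every number and assumption in it (lapse rate, mortality rate, rate model) is invented for illustration.

```python
import numpy as np

# Toy cash flow projection across random economic scenarios.
# All assumptions here are invented for illustration only.
rng = np.random.default_rng(42)

n_scenarios = 1000        # number of economic scenarios
n_years = 30              # projection horizon in years
premium = 1_000.0         # annual premium per in-force policy
lapse_rate = 0.05         # assumed annual lapse rate (made up)
mortality_rate = 0.01     # assumed annual mortality rate (made up)

# Simulated short rate per scenario and year, used for discounting.
rates = 0.03 + 0.01 * rng.standard_normal((n_scenarios, n_years))

in_force = np.ones(n_scenarios)      # fraction of the block still in force
discount = np.ones(n_scenarios)      # cumulative discount factor
pv_premiums = np.zeros(n_scenarios)  # present value of premiums per scenario

for year in range(n_years):
    discount /= 1.0 + rates[:, year]              # discount this year's cash flow
    pv_premiums += in_force * premium * discount  # collect and discount premiums
    in_force *= (1.0 - lapse_rate) * (1.0 - mortality_rate)  # lapses and deaths

print(f"Mean PV of premiums across scenarios: {pv_premiums.mean():,.0f}")
```

In practice the per-product liability logic is far more involved and the scenario counts far larger, which is where the heavy compute demand comes from.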

If you're aware of systems such as Integrate or Atlas or GGY AXIS, those are traditional systems that are used for cash flow projections. But for more modern setups, we started doing some of this work five or six years ago on the RStudio / Posit Connect platform, and I know that a number of our clients are thinking about using more modern data platforms such as Databricks and Snowflake for doing actuarial projections. So that's an interesting area; you can see the innovation happening there.

Now, if you move away a bit from actuarial, there's underwriting. Risks are also measured through underwriting: there are underwriters within the insurance company, and underwriting is a great example of how data science is operationalized and used within an insurance company. I work within the life and annuity space, and in that space underwriting is, as you can imagine, very data intensive. There's lots of usage of third-party data sets and unstructured data sets.

Companies might be relying on Rx data, prescriptions, medical history, financial history, those types of things that are typically obtained through third-party companies, and then using fairly advanced machine learning models to determine risks. I'd say the use of machine learning approaches, rather than traditional statistical models, is prevalent in the underwriting space. And if you move from the life and annuity space to the property and casualty space, that's where you see a lot of innovation happening in underwriting with the use of images, videos, geospatial data, and so on.

Shiny applications in production

So Shiny is something that has been quite transformative for our team at Milliman. In fact, some of the applications that we've developed on Shiny are actually in production, used by some of our clients to study their risks. Typically what we have done is put up the results of the models that we develop on policyholder behavior: things like how people lapse out of their insurance policies, or the timing of when people will start withdrawing from their annuity contracts.

These are models that we develop on a large amount of data that we collect from insurance companies, and then we allow these companies to visualize the model. So you can see, say, how withdrawal rates or lapse rates within your own company compare against the industry. That gives you a sense of the type of risks that you're insuring, and it informs some actuarial decisions to be made later.

For those types of use cases, where we're putting out the results of a business process, like actuarial modeling or policyholder behavior models, and letting actuaries play with that data through a dashboarding setup, Shiny has been quite helpful. We have tried other types of dashboarding platforms, but the flexibility that Shiny allows us is amazing.

One example: when companies want to look at their mortality data, they want to define their own custom buckets. Some companies want to define buckets of, say, 40 to 50 years old, while others want to study their assumptions in five-year buckets, like 40 to 45. Shiny allows us to set these things up programmatically in a way that's quite natural.

And in Shiny you can also generate UI components programmatically. We've found that functionality amazing compared to some of the other alternatives, such as Tableau and Power BI; I think there's more flexibility here.
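As a hedged sketch of that custom-bucket idea, here is a tiny app written with Shiny for Python for brevity (the production apps described above are R Shiny, but the same pattern applies): breakpoints typed by the user drive the age buckets used in the summary. The data and column names are invented.

```python
from shiny import App, render, ui
import pandas as pd
import numpy as np

# Hypothetical data: issue ages and a death indicator, made up for illustration.
rng = np.random.default_rng(1)
policies = pd.DataFrame({
    "age": rng.integers(20, 90, size=5_000),
    "death": rng.random(5_000) < 0.02,
})

app_ui = ui.page_fluid(
    ui.input_text("breaks", "Age bucket breakpoints (comma separated)",
                  value="40, 45, 50, 55, 60"),
    ui.output_table("mortality_by_bucket"),
)

def server(input, output, session):
    @render.table
    def mortality_by_bucket():
        # Parse whatever breakpoints the user typed into custom buckets.
        breaks = sorted(int(x) for x in input.breaks().split(",") if x.strip())
        buckets = pd.cut(policies["age"], bins=[0, *breaks, 120])
        out = policies.groupby(buckets, observed=True)["death"].mean().reset_index()
        return out.rename(columns={"age": "bucket", "death": "observed_mortality"})

app = App(app_ui, server)
```

The same effect in R Shiny would typically be achieved with `renderUI()` and dynamically generated inputs; the point is that the bucketing logic stays in code rather than being hard-wired into the dashboard.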

We have tried other types of dashboarding platforms, but the flexibility that Shiny allows us is amazing.

So yeah, to answer your question, Rachel, we've used Shiny a lot in displaying the results of our studies, but also in actuarial modeling. When an actuarial model runs, there are thousands of economic scenarios for what the macroeconomic situation might be, and for each of those scenarios there's a projection 30 years down the line. We've used Shiny to let actuaries visualize the results of those projections as well.

Statistics vs. data science in actuarial work

My question is actually kind of high level, it's a little bit broad, but your industry sits at a pretty good intersection, and given what you do, it's about the intersection of statistics and data science. As you know, for the last 20 years or so, anybody who's a data scientist probably came from another field. So are you finding that newly minted data scientists, the ones with data science degrees, have sufficiently strong backgrounds in statistics to keep up with actuaries? Or are you finding that the two fields are becoming distinct enough that it makes the search process for candidates a little bit different?

Yeah, that's a great question. Our team has both, actually. We have actuaries, folks whose professional training is in statistics, fitting distributions and statistical models. And our team also has data scientists who come from the other side of the lens: some of them studied the hard sciences and transitioned into data science, and some came through the university programs that, over the last 10 years, have built structured training around data science.

I would say those university programs, and I'm sure Posit has done a lot in terms of data science education, have really helped with the quality of candidates that we've been seeing. But going back to your question about the actual skill set, statistics, and how newer data scientists compare: I'd say actuaries traditionally spend their time with parametric models, things like traditional regression techniques.

One of the advancements that we saw in the policyholder behavior modeling space in the last year was when we started using penalized regressions. And you can imagine, penalized regression is as old as time; I don't know how long people have been using lasso regression and the like. But it is only recently that actuaries started using those types of techniques to study and quantify policyholder behavior risks. The actuarial profession, because of all the regulations and standards of practice surrounding it, is very slow moving.

I'd say the data scientists are more excited about, and bring more of, the machine learning side of things to the table, which has been quite useful for some of our work. For example, I talked about policyholder behavior models in the context of penalized regressions. A data scientist on our team took that and put it into a tree-based model, and that led to quite a few insights around which additional variables are important for predicting that outcome.
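To make that comparison concrete, here is a hedged, hypothetical Python sketch (the team's actual models and variables are not described in detail): a lasso-penalized logistic regression for interpretable lapse drivers alongside a gradient-boosted tree model whose feature importances can surface additional variables. The data and effect sizes are simulated.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical policyholder features, invented for illustration.
rng = np.random.default_rng(0)
n = 20_000
X = pd.DataFrame({
    "attained_age": rng.integers(25, 85, n),
    "policy_duration": rng.integers(1, 30, n),
    "surrender_charge_pct": rng.uniform(0, 0.08, n),
    "moneyness": rng.normal(1.0, 0.2, n),  # account value / guarantee (made up)
})
# Toy lapse behavior: lapses rise as surrender charges roll off.
logit = -2.0 + 15 * (0.04 - X["surrender_charge_pct"]) + 0.5 * (X["moneyness"] - 1.0)
y = rng.random(n) < 1 / (1 + np.exp(-logit))

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Penalized (lasso) logistic regression: interpretable, formula-like coefficients.
lasso = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l1", solver="liblinear", C=0.1),
).fit(X_train, y_train)

# Tree-based model: often surfaces additional important variables.
gbm = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

print("lasso test accuracy:", round(lasso.score(X_test, y_test), 3))
print("gbm feature importances:",
      dict(zip(X.columns, gbm.feature_importances_.round(3))))
```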

So I'd say data scientists are faster moving and keep up with the latest trends in statistical and machine learning models, and actuaries are playing the catch-up game now.

Third-party data and vetting

An initiative that our team was working on in the past year: all the data that we have on how policyholders use their policies is longitudinal data, so for one person you have an entire history. We wanted to condense that data set down to one row for each policyholder and then try to develop a propensity-to-buy model. For example, one of the questions we want to answer is which types of customers are prone to buy GLWBs, policies with guaranteed lifetime withdrawal benefits.
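A minimal sketch of that condensing step, assuming hypothetical column names, might look like the following pandas aggregation from one row per policyholder per year down to one row per policyholder:

```python
import pandas as pd

# Hypothetical longitudinal data: one row per policyholder per year.
history = pd.DataFrame({
    "policy_id":   [1, 1, 1, 2, 2],
    "year":        [2020, 2021, 2022, 2021, 2022],
    "withdrawal":  [0.0, 500.0, 750.0, 0.0, 0.0],
    "bought_glwb": [0, 0, 1, 0, 0],
})

# Condense to one row per policyholder with summary features and the target.
per_policyholder = history.groupby("policy_id").agg(
    n_years=("year", "nunique"),
    total_withdrawals=("withdrawal", "sum"),
    ever_bought_glwb=("bought_glwb", "max"),
).reset_index()

print(per_policyholder)
```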

We want to understand those customers better, and in terms of answering those questions, the data that exists within the insurance company is limited. So we have to rely on third-party data to understand their behaviors better. Something that we have found useful is partnering with traditional data aggregators. There are data aggregator companies that work with data that is open out in the world, which they scrape, or they have partnerships with various data sellers.

And something we have found useful is that when we work with one data aggregator, we get to work with the various types of data sets they have. Some of these data sets might have restrictions on how we can use them. There's a lot of regulation in this space, and it's been growing as we speak. For example, we cannot really make an individual-level decision on who to send a marketing communication to; I think that's restricted by the CCPA, the California Consumer Privacy Act. So we do it at a group level.

But, going back to the question, the vetting process is largely through match rates. Some of these companies offer fuzzy matching: we can send them some anonymized information and they send us back match rates. As you can imagine, a lot of the time the names don't match and the addresses don't match, so we don't know if we're getting the right data or not. So there are various types of fuzzy matching algorithms that these data aggregators have, which we use to figure out how much matching we can get, in other words, whether we're getting useful data.
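For intuition, here is a toy, hypothetical version of a fuzzy match-rate check using only the Python standard library; real aggregators use much more sophisticated, proprietary matching logic:

```python
from difflib import SequenceMatcher

# Toy fuzzy match-rate check between our records and a vendor's records.
# Names, addresses, and the threshold are all invented for illustration.
def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

our_records = [
    ("John A Smith", "12 Oak St, Springfield IL"),
    ("Maria Gonzales", "99 Elm Ave, Chicago IL"),
]
vendor_records = [
    ("Jon Smith", "12 Oak Street, Springfield IL"),
    ("M. Gonzalez", "99 Elm Avenue, Chicago, IL"),
]

threshold = 0.8
matches = 0
for (name_a, addr_a), (name_b, addr_b) in zip(our_records, vendor_records):
    # Weight name and address similarity equally; the threshold is arbitrary.
    score = 0.5 * similarity(name_a, name_b) + 0.5 * similarity(addr_a, addr_b)
    matches += score >= threshold

print(f"match rate: {matches / len(our_records):.0%}")
```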

Milliman has a practice called IntelliScript, which is, I think, one of the largest underwriting data providers in the U.S. We have connections with various health data providers and a very elaborate pipeline to obtain that data, clean it, and make a product out of it. So yeah, being a data aggregator is a hard business. We have worked with a lot of data aggregators in the past; it is not easy to form those data relationships and get those data sets. But those who have those relationships have an edge in terms of the value that they can provide to insurance companies.

Deployment and release cycles

Because of the heavy regulation and data protection requirements, our security posture has to adhere to our contracts with the companies who give us data for studying policyholder behavior. Some of these companies are very large, brick-and-mortar insurance companies with very elaborate security contracts. So what we have done, and what we found useful, is that we started getting a SOC 2 Type 2 audit three or four years back, and as part of that audit we need to have a formal release process.

Our release process keeps evolving and changing according to our needs, but I can describe it. There's a development environment, traditionally something like Posit Connect or RStudio; it's not a dedicated server, it's on a shared Windows desktop machine. That's where development happens, and developers push their updates to GitHub. Traditionally there are branches for develop, beta, and then release candidate and prod.

And that's traditionally how things go. Once there's a push to the develop branch, there are rules that create pull requests so that another developer reviews the code, not by looking at it on the server itself, but by looking at the code that was added, and approves it. The develop branch can keep accumulating updates without being pushed to a release. Once we're ready to do a release, it goes to the beta stage.

On the beta server there are developer reviews, but there's also a formal review from the testing team, which runs the automated test suite. If it's a new feature, say we added a separate section of the web app in Shiny, there are no automated tests for that yet, so we end up doing manual tests there to see whether any other feature breaks and how the load is.

So that's the beta stage. Once we're fine with beta, we push to the release candidate. The release candidate is mainly to check business functionality, and load testing also happens at the release candidate stage. That then leads to a production deployment. At each of these stages, for our SOC 2 audit, we need to capture approvals and produce them for audits, so our auditors make sure that every code push and every release is thoroughly vetted and tested.

So that's typically how the release cycle goes. Now, in some situations we may end up doing a hot fix: that's a branch off of production, and we push directly to production and then bring that change back into develop. There can also be rapid releases; there are situations where we need to put something up quite soon, say for a conference or a call or a demo, and that has a separate release cycle. But in general, most updates go through this release cycle.

Load testing and Shiny in production

In terms of load: every release is coordinated by our SDET team through Azure DevOps. Code is pushed to GitHub and then automated deployment happens through Azure DevOps. Before we obtain final approval on the production deployment, our testing team does regression tests, integration tests, and a load test as well. For load tests, I do remember a package that was quite useful in load testing Shiny; I think it was shinyloadtest.

So shinyloadtest is something that we've used in the past. But at this point, our testers are probably using things like Selenium to run multiple hits on the site and see the responsiveness of the graphs and charts. For example, through one of these load tests we ended up optimizing the Snowflake queries that are used to generate those graphs. So load tests were quite useful there.

But we're also lucky because our applications are not used 24/7, so the load is not that much. We tend to see usage during an actuarial assumption setting cycle, which happens at certain times of the year. But during those times, yes, we can have five people using the site at the same time. So it helps to use things like shinyloadtest and frameworks like Selenium to automate load tests.
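A minimal, hypothetical version of that kind of Selenium-driven check might look like the sketch below. The app URL is made up, and a real test would wait on specific dashboard elements rather than just timing the initial page load (shinyloadtest also replays full Shiny sessions, which this does not).

```python
from concurrent.futures import ThreadPoolExecutor
import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

APP_URL = "https://example.com/behavior-dashboard"  # hypothetical app URL

def one_session(user_id: int) -> float:
    """Open the app in a headless browser and time the initial page load."""
    opts = Options()
    opts.add_argument("--headless=new")
    driver = webdriver.Chrome(options=opts)
    try:
        start = time.perf_counter()
        driver.get(APP_URL)
        return time.perf_counter() - start
    finally:
        driver.quit()

if __name__ == "__main__":
    # Simulate five concurrent users, roughly the peak described above.
    with ThreadPoolExecutor(max_workers=5) as pool:
        timings = list(pool.map(one_session, range(5)))
    print("page load times (s):", [round(t, 2) for t in timings])
```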

MLOps and model development

In terms of model development: back when we started developing these Shiny applications, we started following the agile methodology. I'd say we don't stick to it all the time, but as a general North Star, that's what we try to do: push updates faster, have incremental updates, and push hot fixes when needed.

But sometimes that doesn't align with the actuarial profession at all, because everything needs to be tested and vetted before we put it out. So sometimes the agile methodology turns into a waterfall process for our software development.

In terms of models, I wouldn't say what we follow is agile. There's a process where we try to improve our behavior models, and within that process there are various new variables that we want to try, to see whether they're good predictors. That's a cyclical process, and it doesn't quite align with the agile way of developing and deploying.

So we've tried to educate ourselves about modern machine learning operations, MLOps methodologies, and stick to them. For example, we've used something like MLflow; I think there's an internal proof of concept going on where part of our team is trying to use MLflow to track a lot of these experiments, track how adding these variables would affect, say, lapses or withdrawals, and then have a nice dashboard to compare the results of all of that experimentation.

Only after there's a decision on what the actual parameters should look like do we go to the next stage, which is the fitting and testing stage. At that point we also bring in a holdout data set to test the models. There is a testing data set that is used right on the website, so it's not part of the training; we let our customers see the results of the model on that testing data set when it is deployed.
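A bare-bones sketch of that experiment-tracking idea with MLflow, using simulated data and an invented experiment name, might look like this; each run logs a candidate parameter and its holdout metric so runs can be compared side by side in the MLflow UI:

```python
import mlflow
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Simulated data standing in for policyholder behavior features.
X, y = make_classification(n_samples=10_000, n_features=8, random_state=0)
X_train, X_hold, y_train, y_hold = train_test_split(
    X, y, test_size=0.25, random_state=0
)

mlflow.set_experiment("lapse-model-experiments")  # hypothetical experiment name

for C in (0.01, 0.1, 1.0):
    with mlflow.start_run():
        model = LogisticRegression(C=C, max_iter=1000).fit(X_train, y_train)
        auc = roc_auc_score(y_hold, model.predict_proba(X_hold)[:, 1])
        # Log the candidate parameter and holdout metric for comparison later.
        mlflow.log_param("C", C)
        mlflow.log_metric("holdout_auc", auc)
```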

Innovation in the actuarial space

I'd reframe that statement and say there's a lot of potential scope for innovation in the actuarial space. I would still say it is a comparatively slower-moving space because of age-old practices, and also because of regulations; there are lots of standard operating procedures that actuaries need to follow. There are standards of practice for virtually every activity in the actuarial space, like reserving, valuations, and pricing, with very strict guidelines on what actuaries should and should not do. That restricts a lot of trying out of new methods.

But what we've been trying to do, in the space we're trying to get into, is use more modern methods and modern machine learning approaches for behavior modeling. We're opening up to using some of the modern data platforms, like Databricks and Snowflake, for actuarial modeling. I feel like the driver of innovation here is that this is a data-oriented space; there's a lot of data that actuaries work with, and as you can imagine, advancement in the data management and data science space is very rapid right now. So, as I said, actuaries are playing the catch-up game.

We're trying to modernize our methods and use more modern data platforms. I think the innovation starts from the data platform itself before we move to data science. A lot of these processes live in Excel files, which can become quite large and very hard to keep track of, to version control, et cetera. So in the life and annuity space, the innovation starts with moving away from those older Excel-file-based methods to more modern ones, leveraging the cloud, Jupyter notebooks, Python, and R.

I think the innovation starts from the data platform itself before we move to data science.

In the modeling space, if you move from data management to modeling, I feel like there's increasing innovation in using advanced methods, advanced deep learning methods, to study policyholder behavior using third-party data. And as you bring in more third-party data, there are various deep learning approaches; if you're using images or forms to inform your underwriting models, a lot of innovation is happening there.

So yeah, the other driver of innovation that I've been seeing is generative AI. At least in the traditional brick and mortar companies, a lot of companies want to leverage generative AI and they need to fix their data situation first to be able to use gen AI methods. So a lot of scope, a lot of innovation is happening there in terms of modernizing the data management platform.

Moving away from Excel

Excel is a great tool for quick-and-dirty ad hoc work, right? It's quick for proving concepts, and the flexibility it brings is kind of unparalleled. You don't have to do any setup, it's all self-contained, and it can handle a considerably large amount of data. It gives you the tools to do these things. But when you're dealing with very large amounts of data, or when you need governed processes, that's when we found there's a need to move away from Excel.

For example, a lot of companies use Excel for actuarial assumption setting. These are very large Excel files with hundreds of graphs and pivot tables in them. They're hard to keep track of, and sometimes it takes 15 minutes just to open one of these files. So there's a case to be made to move away from Excel, at least in situations where there's a large amount of data and there needs to be a governed process to deploy the results of whatever is being done in that Excel file.

I think that's been the driver of the movement away from Excel. But you'd be surprised at the very large number of companies still using Excel for some of these very large processes like assumption setting. As the data size increases, and as they're brought under scrutiny by regulators, that's the push to move away from Excel and use some of the more modern methods.

Career growth advice

Well, I think it was probably true 10 years ago, and it's probably true right now: there are lots of opportunities to make an impact within an organization if you're good with data. Something that I found useful for my own career mobility is to be flexible in terms of how to provide that business impact.