Resources

Data Science Hangout | Moody Hadi at S&P Global | Unlocking Business Value with Data Science

We want to help data science leaders become better. The Data Science Hangout is a weekly, free-to-join open conversation for current and aspiring data science leaders. An accomplished leader in the space joins us each week and answers whatever questions the audience may have. We were recently joined by Moody Hadi, Manager of New Product Development and Financial Engineering at S&P Global Market Intelligence.


Transcript

This transcript was generated automatically and may contain errors.

So S&P Global really has five business lines now. There's the ratings agency; there's the S&P 500 and Dow Jones index business; there's Platts, which is the energy-focused side of the company, with a lot of research on fracking and oil; and Market Intelligence, which is essentially a data and analytics vendor. And then the fifth one is relatively new, called Sustainable1. It's an organization focused entirely around ESG data and finance. In that area, I run new product development, so I'm technically under the product management group. I leverage a lot of the technology folks in data science, a lot of folks in what we call content, which are automation teams that do more RPA work, and of course my own team, which is generally more quantitative, but in a more user-oriented fashion, I should say.

So when I'm talking about new products for S&P, it's typically products that take about six months to a year to go to market. They're not feature enhancements or things like that; they're much more transformational. For example, we've launched a financial statement extraction system that takes a document sitting on your desktop, you upload it, and it becomes part of our platform. Our sentiment analytics for Chinese companies works in the native language, simplified Chinese. So things like that, which are not what you might typically associate with S&P's product offerings. And yeah, we have R in production, and we also use a lot of the NLP tools in Python. So we kind of cross all the usual buzzwords in the data science world. Hopefully that gives you an idea.

Excitement in data science

Yeah, sure. I've been on the quantitative side for about two decades now. The exciting part I see in the data science community is the ability for folks to get a lot closer to the domains they're working in. And I think a lot of the work RStudio has been doing, especially on making it easy to build applications in R, actually helps get that point across. In the quantitative world, you have a lot of folks who just sit behind the numbers; they don't know what those numbers mean or how their client will use them. Over at least the last five years, I've seen a lot more transformation there, where upcoming data scientists actually try to think beyond the numbers to how the client will use them.

The second thing is more on the technical side: the ability to visualize information very easily, with very few lines of code. A lot of what I do on the prototyping side, we use Connect, we use Dash. The net result is there are very few lines of code you have to write to get something that looks like a legacy JavaScript-based platform, but is actually more cutting edge, I would say. And that's important, especially when you're trying to get to the last mile, because in bigger corporations, building those types of applications typically goes through a Scrum agile release cycle that takes quite a few months just to show a client something interactive.

And the third thing that excites me is more on the machine learning side. Neural networks are obviously very big black boxes, but they can mine a lot of data and take advantage of it versus, say, structural models. The explainability side of it has become a lot easier to actually do. We use LIME a lot for model-agnostic explanations. Simple things like that go a long way toward answering, here's a nice time series, but why? That question gets answered nicely, and it ties back to the visualization, because we build things that try to simplify that answer without the customer having to reach out to support, or, worst case, having to jump on a call. You want things to be more on-demand and self-service.
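The core idea behind the model-agnostic explanations Moody mentions can be sketched in a few lines: perturb an input, weight the perturbed samples by proximity, and fit a local linear surrogate whose coefficients act as the explanation. This is a hand-rolled illustration of the LIME-style technique using only numpy, not the `lime` package's actual API; every name below is illustrative.

```python
import numpy as np

def explain_locally(predict_fn, x, n_samples=500, kernel_width=0.75, seed=0):
    """LIME-style sketch: fit a proximity-weighted linear surrogate around x.

    predict_fn maps an (n, d) array of inputs to an (n,) array of scores;
    the returned vector holds one coefficient per feature -- the local
    explanation of the black box's behavior near x.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    # Perturb the instance of interest with Gaussian noise
    X = x + rng.normal(scale=0.5, size=(n_samples, d))
    y = predict_fn(X)
    # Weight perturbed samples by their proximity to x (RBF kernel)
    w = np.exp(-np.sum((X - x) ** 2, axis=1) / kernel_width ** 2)
    # Weighted least squares with an intercept column
    A = np.hstack([X, np.ones((n_samples, 1))])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return coef[:-1]  # drop the intercept

# A "black box" that secretly depends almost entirely on feature 0
black_box = lambda X: 3.0 * X[:, 0] + 0.01 * np.sin(X[:, 1])
coefs = explain_locally(black_box, np.array([1.0, 2.0, 3.0]))
# coefs[0] lands close to 3.0 and coefs[2] close to 0: the surrogate
# attributes the prediction to the feature that actually drives it
```

In practice the `lime` package handles the sampling, kernels, and feature selection for tabular, text, and image inputs; the sketch just shows why a surrogate's weights are readable as the "why" behind a single prediction.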

Sharing applications and communicating with clients

So for me, it's two sides. One is internal, and by internal folks I mean our salespeople demoing something to a prospect. It's a new product, and we're still assessing the market appetite, the addressable market, basically. So we build it in house, and it's served over the VPN at a simple URL, on Connect, typically. They go out and talk to clients and showcase, not really the visualizations, but the workflow; these are really workflow-based tools. Then depending on the feedback, that's when we figure out the best channel to put it on, whether it's a feed or a desktop offering.

The other one, which is more interesting, at least to me, is the external piece. We have something called the incubator at marketintelligence.com. That's an external-facing, pre-production environment, if you will; we can sign up clients and prospects on it under some SLA-type deal. Whoever you sign up goes in and does their business using that application. That's serviced in some cases by Connect, depending on how many concurrent users we have, and in other cases by Dash and Plotly. But all of those have some lifetime to them. Eventually, within six months to a year, they go back into the platform offering, which is all in JavaScript, basically.

And as a follow-up to that question, I see there's a question on Slido. Do your clients set standards for which tools and approaches they need you to use? Or do they just care about end results, and you're free to choose?

It depends. They have standards, though not necessarily on the workflow; the workflow just needs to be, quote unquote, intuitive, which unfortunately takes a lot of iterations. But depending on the product, in credit analytics, for example, you can't just arbitrarily decide to build a neural network to do implied ratings. Although it's not an S&P ratings product, it's lowercase letters, it goes through a lot of audit, a lot of stability criteria. So you can't just arbitrarily decide to change the model. But there are NLP things where I don't think they care. We have the text in the transcripts and filings: we take an earnings call, convert it from voice into machine-readable text, and then run a regularized bag of words against it. That could also have been done with a neural network; the regularized bag of words was just easier to explain. The China sentiment analytics, on the other hand, is completely neural network based, with model explanation on top of it.
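The "regularized bag of words" approach is easier to explain precisely because each word ends up with its own signed weight. Here is a deliberately tiny sketch of the idea: a four-document toy corpus (illustrative, not S&P data) and a hand-rolled L2-regularized logistic regression in numpy.

```python
import numpy as np

# Tiny illustrative corpus: (text, label) with 1 = positive sentiment
docs = [("revenue growth beat expectations", 1),
        ("strong earnings beat guidance", 1),
        ("profit warning missed expectations", 0),
        ("earnings missed guidance badly", 0)]

# Bag of words: one column per vocabulary word, counts per document
vocab = sorted({w for text, _ in docs for w in text.split()})
idx = {w: i for i, w in enumerate(vocab)}
X = np.zeros((len(docs), len(vocab)))
for r, (text, _) in enumerate(docs):
    for w in text.split():
        X[r, idx[w]] += 1
y = np.array([label for _, label in docs], dtype=float)

# L2-regularized logistic regression via plain gradient descent
coef, lam, lr = np.zeros(len(vocab)), 0.1, 0.5
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-X @ coef))    # predicted probabilities
    grad = X.T @ (p - y) / len(y) + lam * coef
    coef -= lr * grad

# The explanation is just the fitted weights: one signed number per word
weights = dict(zip(vocab, coef))
```

The explanation falls straight out of the model: words seen in positive documents carry positive weights, negative ones negative, which is far harder to read off a neural network without a tool like LIME on top.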

Biggest challenges

Right now, honestly, I would say the biggest one is hiring. We need to hire more qualified data scientists. We obviously have a team, we have several teams, but I think that's one of my biggest items to address for 2022. On the more technical side, I think explainability. This is a problem, especially as we start dealing with this big data surge, this ability to mine big data, for lack of a better term. It's always nice to see a time series of some input, but the reality is the end user typically won't take that as gospel.

So, trying to explain why, especially with the subjectivity part of it. We have folks who have an opinion about a particular market and other folks who have a completely contrary opinion about the same market, yet you're trying to build something that works for both. Trying to be very transparent and unbiased is important, and that's where the model explanation comes into play. As long as both sides understand where you're coming from, and it's defensible and self-consistent, you don't run into that sort of problem. But that is still evolving. In finance and financial engineering, there's some form of absoluteness to the values you produce: a lot of structural models, a lot of implicit assumptions.

I think the question, there's no such thing as a better model. It's more like, what's the context, and is it self-consistent, right? Those are the two things that you want to keep in mind.

So there's, you know, somebody wins, somebody loses, right? In some of this weak-predictor area, there isn't that. The only thing you can tell is, I've looked at something, this is how it works, and this is why I put it at that level. I think that's the piece people forget about, especially with sentiment analysis. They think there's some absolute measure, but it really depends.

Hiring challenges and the R-Python split

You know, I think it depends on what you're looking for, on both sides, the applicant and the hiring company. Some applicants, I mean, S&P is a pretty big company, just don't want to work for every company, so you run into that problem. On the other side, the teams I deal with need to know both R and Python well enough to at least use them for exploratory data analysis, plus a little bit of distributed programming, and that's very rare. Generally, people coming out of grad school tend to gravitate to one or the other, and it's hard to switch that mentality. In my experience, those are the two major roadblocks behind the mismatch.

I totally agree with everything you just said, that division, that R-Python split. You know both, you can talk to both, because you'll need different tools for different projects. Ultimately, I'm just wondering who on your team is thriving. What makes a team? Because you're talking about all these different projects you're doing in different ways. It feels like your work isn't as streamlined as we'd probably all want it to be at this point; it's evolving, it's developing. Can you describe some of the folks on your team who are phenomenal and doing really well in that environment?

Yeah, I hear you. The ones that are thriving come in with a pretty strong technical background, both in math and programming. The ones that move up faster are the ones that understand what you're trying to do with it, because, as you can imagine, some days you're working on some sustainable development goal thing, another day on sentiment analysis, and another day on a structural model. The ones that actually get what you're trying to do, we put them in front of clients, and they thrive by becoming a lot more product-management oriented. They're able to explain it as well as a product manager, but they also, of course, have that technical background. Those are the ones that move up.

The second side of it is the management piece. Because my team is relatively small, we rely on some of the IT folks, and we rely on what we call content, the data acquisition teams, who are also technical folks. Being able to manage those dotted-line teams and still meet your deadlines without slippage, those two things move somebody up the scale.

Yeah, I typically look for grad school and above. I don't specifically look for PhDs, not in a bad way; it just depends what they want to do. The ability to learn quickly matters, because the languages change and the business changes. Being able to adapt is more important to me than pure technical depth. Obviously there's a bare minimum, but between that and just pure technical skill, I'd rather have someone who can adapt, to be honest.

Non-technical skills

Non-technical, yeah, so a lot of it is presentation skills: being articulate, succinct, plus some writing skills, since we do a lot of thought leadership and things like that. It used to be called techno-marketing: white papers, some form of white-paper-type things. So there's that level you want to have. Honestly, not all of it comes out of school; you're not going to have all that. So typically we bring the junior person on calls, on mute, just shadowing, to see how that articulation works. Lately I've been putting folks on internal calls with business-side colleagues, where if they make a mistake or say something kind of funny, there's no real problem. Eventually they graduate up to a client call.

Again, if you were working at other firms, I do expect some level there, but if you're coming straight out of grad school, I'm not super worried about it so long as you can adapt. A lot of it, actually, is listening.

You rarely jump on a client call where everybody's just glowingly happy, you know?

Job descriptions and the hiring process

Sure, yeah, and that's something I've encountered a lot. Having recently been promoted into a leadership role for clinical data science, and what we do in clinical data science is pretty specific, going across different job boards and seeing different postings, there's always some disconnect between the data science hiring manager, the recruiting partner, and how the job ad is written. One good example, brought up earlier today and in earlier sessions with Julia, and with John Thompson a couple of weeks ago, is the idea of Python and R together, interoperability. When that isn't stressed enough in the job ad, a candidate can get past the recruiter to the data science hiring manager when they really just want Python, but R was just as prominent in the job description. As the primarily-R programmer that I am, since that's primarily what we use on the clinical side, we've now wasted a recruiting interview and a hiring manager interview on something that could have been weeded out through the job description alone.

Yeah, I don't have a solution; we try to do the same, and we have the same problem. The recruiter has a sort of HR-type call, but by the time the resume hits, it's basically almost a buzzword match. Then they offload it to a first line of defense, someone on my team who does an interview, and we ask the kinds of questions you were asking in the beginning to try to fish out where the applicant stands. If they graduate from that, they go into the second and third rounds. Yeah, it's tough, right? What I try to do is reach out to the network, obviously folks at RStudio, folks around that I know, because they're more than likely to have applicants who at least understand the domain, which makes it a bit easier to weed out. But general posting is, unfortunately, brute force; you just go through it. And I don't really blame the applicant or the recruiter. I don't expect the recruiter to know it as well as you do, Ian, for example, right?

Selling the value of data science internally

Yeah, I had to do a lot of selling, both on the language and the value proposition. It was funny, because data science as a term, and the teams we created after that, also came from another acquisition we had. There was a bigger top-down initiative to scale up. Like I said, technically I'm not data science at S&P, I'm product management, but I happen to have data scientists report to me. So selling the value proposition there was a little tricky. And I think it's still tricky, because you end up trying multiple things and failing sometimes, and the business doesn't always like that. They want you to buy data, do something with it, and then make money off it; that's basically what we do. And buying data and then finding out it doesn't work usually reads as, you must have done something wrong, right?

What I did was, we actually presented to someone pretty high up on the food chain. I still remember our R Markdowns when they first came out. We were lucky that our CEO at the time, he's still there at S&P Global, Doug Peterson, was an analyst at Citi back in the day. He looked at what was almost a financial research report, and it was very interesting to him: oh, wow, you can do this now? You can actually pull the data, run the model, and have the analysis all written up in one simple document, all in a markdown. I'm not saying that sold everything, but it at least sold my team.

It's a little harder now, because the threshold has gone up a lot. But my suggestion for selling data science to the business is always: don't just throw ten different R&D efforts out there. Pick the one or two that actually succeed. And then for those, do what I do with the incubator: build a Shiny app, build it the way you think a client would use it, and show that. Because that sells it more than just saying, I can do, you know, a soup of technical jargon. That doesn't help anybody.

Most unique problem solved with data science

Well, I would say the China sentiment. I really knew nothing about China beyond the news, and it took about three or four years to get to the point we're at now. The funny thing is, none of my team members spoke Chinese. We had to build something covering the 20,000 or so companies there. The amazing thing is there's a lot of data collected, and because we're S&P, we can access that data; in some cases, a lot of it is public domain. But actually trying to understand that business and that context, and make it easy for the rest of the world to have some window into that area, especially when I really didn't know much of anything, that was very idiosyncratic. We hired labelers, and we had some folks who are in China and specialize in that market help us. So that was very different from what I'm used to doing.

If you generalize it, it's situations that are very idiosyncratic; that happens to be the most idiosyncratic one I've done in the last 10 years or so, where you actually have no idea going in. Then there are other things that are more like big data problems. I'm working on something with transactional credit card payment data, terabytes of information you've got to mine, and that presents its own difficulties.

But yeah, something complex. A lot of it was spent understanding what we were trying to do and for whom. From that point on, it became almost a linear set of instructions to execute, almost a decision tree, so you can exclude certain options. And I learned the hard way that the prepackaged NLP tools you see out there are mostly built for English and other Latin-script languages. We used fastText, for example, and it was really bad for finance in simplified Chinese, because those models are trained on a lot of Wikipedia data. When you start adding the language plus the context, they fall apart. If you're doing a generic, hello-world kind of NLP problem, it's fine. But once you start trying to use it for something more specific, it falls apart. So we had to do our own vectorization for it.
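One segmentation-free fallback when pretrained vectors fail out of domain is to build representations directly from your own corpus; for Chinese, character n-grams sidestep word segmentation entirely. This is a deliberately minimal standard-library sketch of that idea, not their actual vectorization, and the toy sentences are illustrative.

```python
from collections import Counter
import math

def char_ngrams(text, n=2):
    """Character n-gram counts: a segmentation-free representation,
    handy for Chinese text, where tokens are not space-delimited."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Toy snippets: two about profit growth, one unrelated (weather)
doc_a = "公司利润大幅增长"  # "the company's profit grew sharply"
doc_b = "利润增长超出预期"  # "profit growth beat expectations"
doc_c = "今天天气非常好"    # "the weather is very nice today"

va, vb, vc = (char_ngrams(d) for d in (doc_a, doc_b, doc_c))
# The two in-domain snippets share bigrams (利润 "profit", 增长 "growth"),
# so they score higher against each other than against the unrelated one
```

Here a representation built from the domain's own text captures the similarity that matters, which is exactly what a generic Wikipedia-trained model can miss once language and context are combined.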

Project ingestion and data readiness

The question is really about how you deal with project ingestion. As data scientists, we get multiple requests for projects, and all of them are urgent for the business. When we get into it, that's when we find out that the data needed to make the decisions is either completely missing, partially missing, or not enough to make the decisions. But that's sometimes several months in, where our data scientists are spinning their wheels trying to put things together: this thing in a Postgres database here, an API there, sometimes PDFs we have to scrape. And it gets frustrating, because a lot of data scientists really want to focus on the modeling aspect. We all recognize the 80-20 split, where you spend 80% of the time on the data. However, there has to be a balance; it can't be all we do, all the time. All our data scientists are very capable, but just because you can do the data cleanup and data munging doesn't mean that's your primary role, and folks do get burnout. So we're trying to develop a better ingestion process: how do you accept projects and evaluate data readiness before we engage the data scientists and launch it formally as a data science project?

Yeah, because I'm running a small team, my opportunity cost of someone failing, not because of them but because of the data, is actually high. I don't have a bulletproof way to prioritize projects. Generally speaking, we try to look at revenue potential. Is the data completely new? And if it's new, can it be linked to something we already have? Generally, if you get something that's independent from the rest of the database, your value proposition is going to be much harder. But if it's related to companies we cover today, which have other measures, then there's a higher chance of success. But yeah, it's happened to me where you work on something for six months plus, and you end up with, well, I guess it didn't work.

It's unfortunate, but it comes back to the business side. Whoever's proposing the project, you try to do your best to have them explain, in very simple terms, what they want out of it. I use that when things fall apart because they weren't very well defined: well, we really didn't know, so we tried, and it took a while because of these complexities, and at least we found out why we failed, so that's not really an opportunity anymore. That way, you can dance a bit around the issue, right?

As for burnout, at any point in time, any member of my team is working on at least a project and a half, maybe two. They take one end to end, and whatever residual capacity is left goes to supporting someone else on the team, so that they don't get burnt out too much.

Handling dashboard maintenance and handoffs

The other question I have is that we sometimes fall victim to our own success and capability, because the initial phases of a project are often very descriptive in nature. We deal with a lot of engineering data that the engineers themselves aren't looking at. We focus on forecasting models and getting those insights, but the very first part is all the descriptive analytics that has to happen. So we end up creating dashboards and showing them, hey, this is your data, these are the visual insights, before we even touch the modeling, which they love. But then we also end up having to maintain those long term and become the dashboard maintainers. I see the value in it, but making a one-time dashboard to show insights and then focusing on the modeling is very different from maintaining a production-grade dashboard, where if anything goes down or the upstream data changes slightly, you suddenly drop everything and fix it, because so many people are looking at it. Do you face similar issues? And do you have teams that take over dashboards, whose job is just to maintain these production widgets?

Yeah, that's funny. This has happened over time; I didn't really think about it up front and then structure it, I just fell into problems and then tried to address them. The client-facing part, the incubator, I own, so I'm tied to that; we write it ourselves, with some IT folks helping. Now, for the internal stuff, generally speaking, once there's some success and there has to be a path to production, I turn it over to our tech team, along with the segment's product manager, and they work together to get it done. Then I shut down the old app; it has a lifetime. So I'm not exactly on the hook to keep maintaining it.

You know, you do two or three releases of something in production, because we wrote some service endpoint that's supporting it, and it just happens that the software engineers don't necessarily have the same background, and I'm talking programming background, not math. That's where I'm focused now: getting that team up to speed so they can do feature enhancements without involving my team.

What I end up doing is allocating folks directly to them and saying, you own it now after this release, so you have to shadow, get your hands dirty, and then take it over eventually. Anything that has a path to production that's already defined, once I've gotten the feedback, I shut down, basically.

Impressive deliverables and the financial extraction tool

Oh, the financial extraction tool I was talking about in the beginning, where you take an unstructured PDF from a small or private business and upload it into our system. It's literally sitting there, like you just got it from an email or someone else. You click through a few points in the desktop, and in a few minutes, the data you care about, those financial line items across the balance sheet, income statement, and cash flow, are already in our platform, with assignments to the mnemonics, the metrics we create, our company hierarchy, our taxonomy. In a few minutes, it's there. That really resonates with the business, and it's an ongoing product that we're expanding further and further.

It's a very simple message: you've got something that came from someone else, and all of a sudden it looks like it's in your desktop. We do this in a systematic fashion for our public company fundamentals data; that's lots of teams validating, checking, NLP extraction, and all that, but it's a much more rigid, organized pipeline. Here, you're talking about something you just emailed me, and I can just upload it, as long as it's a financial statement. That message is simple to understand, and it resonates.

The prototype was in Shiny, on Connect, but now it's in our desktop offering. So you're getting a financial statement of a small business, and you're taking out total revenue, net income, EBITDA, all the measures you care about, but you don't actually have to highlight or do anything. You just upload the document, and it extracts the line items and assigns them to our taxonomy, which S&P has built over decades. So in a few clicks, you take something that never resided in our platform, and all of a sudden it looks like it's part of our platform.
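At its core, that workflow — pull labeled amounts out of an unstructured statement, then map each reported label onto a standard taxonomy — can be sketched with a regex and a lookup table. The mnemonics and label aliases below are hypothetical placeholders, not S&P's actual taxonomy, and a real system would also have to handle OCR, tables, and far messier labels.

```python
import re

# Hypothetical label-to-mnemonic taxonomy; the real one is far larger,
# and these mnemonic strings are made up for illustration
TAXONOMY = {
    "total revenue": "STD_TOTAL_REV",
    "revenue": "STD_TOTAL_REV",       # alias: different filings, same concept
    "net income": "STD_NET_INCOME",
    "ebitda": "STD_EBITDA",
}

# Matches lines shaped like "Some Label   1,234,567"
LINE_ITEM = re.compile(r"^\s*([A-Za-z ]+?)\s+\$?([\d,]+)\s*$", re.M)

def extract_line_items(text):
    """Pull 'label  amount' rows out of a plain-text statement and
    assign each recognized label to its standardized mnemonic."""
    out = {}
    for label, amount in LINE_ITEM.findall(text):
        mnemonic = TAXONOMY.get(label.strip().lower())
        if mnemonic:
            out[mnemonic] = int(amount.replace(",", ""))
    return out

statement = """
Total Revenue    1,250,000
Cost of Sales      700,000
Net Income         180,000
EBITDA             310,000
"""
items = extract_line_items(statement)
# Recognized labels land on mnemonics; "Cost of Sales" has no mapping
# here and is skipped, standing in for the interactive click-through step
```

The lookup table is what makes the "few clicks" possible: once a reported label resolves to a mnemonic, the line item slots into the platform's existing hierarchy without the user highlighting anything.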

Unlimited resources and the data repository

That's a good question, actually. I'd love to dig into our full data repository and identify what can be linked efficiently. We have so much data that we don't mine. The China sentiment data, for example: somebody else bought that a long time ago as part of something, and it was just sitting there in a folder. So I would love to do more projects like that, where I look through our full repository and see what we actually house today, without needing to acquire additional data sets, and then see if we can combine what we have into something more useful.

Yeah. So we have data lakes, a lot of data lakes, a lot of data warehouses. We have a structured way of keeping track of it all. The problem is it loses the business context: one segment buys the data, it gets linked, that segment uses it, and then we forget about it. It just sits there, with technology people and data people supporting it. So I would go on a fishing expedition, basically, go through it and see what can come out of it. Because I would hazard a guess we're sitting on a lot of things that cost us money to maintain, and we probably use a fraction of their value.

Advice for aspiring data science leaders

Yeah, we kind of hit on it earlier, but really understand the context you're trying to solve for. Obviously there's some level of technical capability you need in math, statistics, and programming, but after that, to move up, you really need to understand the business use case you're trying to solve for, in very simple terms. That, I think, is what separates a quant from a leader: understanding that context. And you should question your own numbers a lot. A lot of times it's, oh look, the correlations are like this. Great, but if you see something that doesn't look like the norm, question why, dig into it, and be ready to explain why it's not like the norm. I care more about those outliers than anything else. So really, question your numbers, and understand the context you're applying them to.