Resources

Dmitri Adler & Merav Yuravlivker | The shift to data: Industry trends in finance | Posit

Posit Finance Meetup

The shift to data: Industry trends from banks to hedge funds to federal agencies

How has the finance industry shifted towards data in the past 5 years? Will all analysts need to program in Python in order to have a job in the future? Join Data Society co-founders Merav Yuravlivker and Dmitri Adler as they discuss the trends that they're seeing in financial institutions, from banks to hedge funds to federal agencies. By the end of the session, you'll be able to speak about specific industry uses and walk away with concrete steps you can take to ensure you're riding the data wave.

Speaker Bios:

Dmitri Adler is the Chief Solution Architect and co-founder of Data Society. He has deep expertise in building predictive models and algorithms for forecasting macroeconomic conditions, healthcare outcomes, trade flows, and business performance for a variety of government and private sector clients. Prior to starting Data Society, he advised the U.S. Treasury Department on the structure of the mortgage market after the financial crisis while he was at J.P. Morgan, and developed expertise in applying machine learning to financial modeling and investing. Dmitri has worked with large financial institutions and agencies to build custom software, assess financial risk, and integrate machine learning applications into their operations.

Merav Yuravlivker is the Chief Executive Officer and co-founder of Data Society. She has deep expertise in developing effective professional development programs and assessments to maximize the capability of an organization and empower the workforce. Prior to starting Data Society, she built her career at educational institutions that include Teach for America, Kaplan, and the International Baccalaureate Organization.
Over the past seven years, Merav and her team have saved organizations millions of dollars by incorporating data analytics skills and best practices that educate, equip, and empower an organization's workforce to achieve its goals and expand its impact.

Data Society specializes in providing industry-tailored data science training and AI/ML solutions that enable Fortune 500 companies and government agencies to educate, equip, and empower their workforce. Since 2014, the company has trained thousands of professionals with the skills needed to solve complex challenges, realize new opportunities, and take their careers to the next level. Data Society was recognized as an Inc. 5000 2021 fastest-growing company and named a top EdTech company to watch by Forbes. For more information, visit www.DataSociety.com.

Link to slides: https://github.com/RStudioEnterpriseMeetup/Presentations/blob/166452d28d61ef33faf8980c8f0f43426e72926b/The%20shift%20in%20data.pdf

Feb 8, 2022
1h 2min

image: thumbnail.jpg

Transcript

This transcript was generated automatically and may contain errors.

Thank you so much, Rachel, and let me go ahead and get my screen share. And I saw we have somebody who is in the woods. I saw that was the first comment there, so hello to the woods, and I also see we have folks from all over the world, so we're really excited to be here with you today.

And this will be interactive, so I'm glad to see that there are so many comments already.

So today we're going to kick off and talk about the trends that we're seeing in the data space, specifically related to the finance industry. Before we get started, I always like to set a few expectations for these types of virtual webinars. What we recommend is to find a quiet place, maybe one where there are not so many kids, dogs, cats, you know, whatever you have around the house. We also ask, as Rachel mentioned, that you please stay on mute unless you have a question or would like to add a comment. Please also silence any alerts from your cell phones. And then last but not least, we always encourage participation.

We do have some questions throughout this presentation. As Rachel mentioned, she will be sharing out the link to Slido. So we'll have about 20 to 30 minutes of Dimitri and myself presenting. And then we really want to turn it over to you to guide the conversation and make sure that everybody leaves with something actionable today. So if you have questions, please feel free to ask them.

Before we dive in, I'd like to give an introduction for myself and my co-founder, Dmitri. I'm the CEO and co-founder of Data Society. Dmitri is the chief solution architect and co-founder as well. We started Data Society back in 2014, really with one mission, which is to help professionals use data better. We do that with our custom data science training programs that we deliver to large organizations, federal agencies, and Fortune 500 companies, with a specific focus in finance as well. And then we also have a solutions side of the house where we build custom software and predictive algorithms and also support digital transformation efforts.

So over the past eight years, we've seen a lot of how data is being implemented in the finance space, so we're bringing in a lot of our experience today to share that with you. What you'll walk away with today: examples of how data is used in finance, trends of data uses in the finance industry, as well as steps that you can take to start integrating data into your operations, because we get that question a lot. And it's really important for us to help other organizations become more data driven. So you'll definitely walk away with some ideas about how you can use data in your day to day.

Case study: non-traditional data in financial risk

With that, I'm going to turn it over to Dmitri to kick us off. Thanks, Merav. So let me kick off with a case study of how we've leveraged data for a financial services use case.

Broadly speaking, what we're seeing is that there's a trend towards using a lot of non-traditional data sources in order to make financial decisions. And that is impacting every aspect of finance from underwriting to bond trading. In this case, we had the Inter-American Development Bank who came to us with the following question. They fund a lot of infrastructure projects around the world, primarily in developing countries in Latin America. And they wanted to mine lending agreements and infrastructure proposals in order to better evaluate the riskiness of those projects and of those funding packages.

They said, all we have are PDF style documents that describe the terms of those loan agreements. Can you help us extract data from them in a standard way and then help us build a risk model such that when new organizations apply for funding grants or loans effectively for infrastructure projects, we have a better framework for determining whether or not they're high risk and therefore what the appropriate monitoring structures and levers need to be.

So we built a bespoke tool for them that did exactly that. It connected to a large repository of infrastructure proposals and associated loan documents, extracted key terms, and then looked at the historical outcomes of those projects in order to build a risk model and say: here are the key factors and elements inside those loan packages and inside, effectively, the project plans that help us forecast the likely outcome. Is a project going to be over time, over budget? What is the likelihood of success? And so on and so forth.

As a result, we developed an application that they could pull up on their desktops and rerun the workflow to load, extract additional information, loan information from documents, and then price subsequent loans. This type of use case of leveraging natural language processing plus machine learning in order to arrive at a risk model is something that we're seeing happen with increased frequency across the financial services space.

Key use cases across financial services

So broadly speaking, the types of use cases that we're seeing emerge across the board generally fall into a handful of buckets: risk analytics; identifying consumer or borrower behaviors and trends; the mitigation of operational risks, meaning the actual operational risks of a bank, of an insurance company, or any other type of specialty lender; fraud detection and identity validation; and finally, payment and transaction processing. Those are some of the biggest buckets.

Fraud has been a rapidly growing problem across the world, to the point where there are, in fact, actual fraud factories that have been developed in some countries like North Korea and Russia, that will literally build up the credit histories of a virtual identity. So they will steal some identity information, right? You've all probably heard of the hack of the Office of Personnel Management in the United States. So they'll steal information, or sometimes they'll buy it on the dark web, and they'll start to literally build up a credit history for somebody.

Over the course of years, they develop a large borrowing capacity. And then when they quote unquote, fund the account, they will borrow that sort of large target amount that they had. And then that is when the fraud actually occurs. And so that type of corporatized systemic fraud is a growing problem. And banks and financial regulators are continuously looking for solutions to combat threats that are not necessarily imminent, but equally are large and systemic.

In terms of identifying consumer behavior, there is a big push towards identifying non-traditional sources of information about population movement and population behavior. So what happened was, in the pandemic, people stopped showing up to places in person, right? This virtual presentation is a case in point. And so banks started to ask themselves the question of, do we need physical branches? And if so, how many and where? And so we've seen use cases where banks are pulling cell phone data to start to understand the new behaviors and patterns of physical movement, juxtaposing that with their general ledger to understand how the local bank locations are being utilized and whether or not there's enough transaction volume to justify actually having a branch open, whether it's from a deposit growth standpoint or a loan growth standpoint.

There's a variety of technology products that are increasingly used to transfer value, right? We've all heard of cryptocurrencies and the latest trend of non-fungible tokens, which are basically pieces of code that serve as a proof of record that somebody indeed created something and somebody else owns it. What's interesting about the utilization of technology as evidence of something occurring is that if you think about the nature of markets, it's all about supply and demand. And where there's a constricted supply and an excess demand, you start to have price growth. In fact, the best performing asset class over the past 30 years has actually been fine art paintings, effectively, right?

So it's a constrained supply. You're not going to get any more impressionist paintings. And you're looking at prices of works skyrocketing from $2 to $20 million over the past couple of decades. And so the ability of technology to serve as a proof of record that something indeed happened is opening up spaces for artists and other creative use cases to leverage technology, and then finance, to actually grow and create value. That's a fascinating trend that's poised to remake a lot of the financial space.

Venture capital and skills demand

And so what that's fueled is a huge influx of venture capital. In fact, last year was a record year in terms of venture capital invested generally, with much of it going into financial technology especially. So you're seeing a lot of investors who have identified this trend and are saying, how can we create entirely new markets that unlock sources of value that were previously closed? So if you're in the investment community, there's a lot of money being thrown at the sector, and if you're looking to start a company, you arguably have the best chance of being funded if you're in the financial technology space.

So this huge well of opportunity and this huge influx of money is creating a lot of openings and demand for skills. In fact, in 2021, the largest technical skill demand was for R and Python programmers in finance. And that demand right now is hard to fill. So you've had immediate price appreciation for labor of 10 to 20% for people who are in finance and have these types of programming skills. So the skill requirements kind of go up and down the chain.

So if you're thinking about what does it take to actually have a technology application function, there's the mechanic of collecting information, storing information, changing its shape, analyzing it and presenting it, right? And there's a discrete set of technical skills that are necessary to power it. So skill requirements include knowledge of databases, certainly cloud databases, knowledge of data structures, and an understanding of how to find patterns and detect statistical significance.

So in fact, when we started Data Society back in 2014, I myself came from the financial services side. I started my career as an investment banker, of all things, and then I worked at a quantitative hedge fund for a while. And what we did is build complicated forecasting models, what you would now call machine learning, around a variety of financial portfolios, whether it was portfolios of loans or energy portfolios or royalties of pharmaceutical products. And so we refined the quality of our analyses and placed a lot of bets using analytical methods that at the time were not standard at a lot of the funds in the financial services space. Today, there's been a broad recognition that some of those analyses are no longer appropriate to do in Excel, right? You want to use proper statistical software to do that, hence the huge uptick in R and Python.

And there's an explosion of both proprietary and open source tools that are available to you guys. So there's a link on this slide for a CRAN repository of over 160 different packages for the financial services industry. I've used a lot of them. They're really powerful. As long as you understand exactly what they're doing and why they're doing it, I highly encourage you guys to have a look. And what I've learned is because the power is in automation and just higher quality prediction, they can really help you move the needle professionally.
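
[Editor's note: to make the "out of Excel, into code" point concrete, here is a minimal sketch, in Python with made-up prices, of the kind of return and volatility calculation the speakers describe. The price series and the 252-trading-day convention are illustrative assumptions, not data from the talk.]

```python
import math
import statistics

# Hypothetical daily closing prices for one asset (illustrative only).
prices = [100.0, 101.5, 99.8, 102.2, 103.0, 101.7, 104.1]

# Daily log returns: r_t = ln(P_t / P_{t-1}).
log_returns = [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]

# Annualized volatility, assuming roughly 252 trading days per year.
daily_vol = statistics.stdev(log_returns)
annualized_vol = daily_vol * math.sqrt(252)

print(f"mean daily return: {statistics.mean(log_returns):.5f}")
print(f"annualized volatility: {annualized_vol:.3f}")
```

Doing this in code rather than a spreadsheet makes the calculation repeatable and auditable, which is the point the speakers make about statistical software.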

FDIC tech sprints and operational risk

So one of the emblematic displays of how data is changing the financial services space is what the Federal Deposit Insurance Corporation is doing, effectively the quasi-government organization that is tasked with insuring the deposits of savings and loan organizations across the United States. If you go to their website that talks about the tech sprints that they champion, it's here on the slide for you, you'll read about several tech sprints that they've deployed over the past few years. And you'll notice that they're all about leveraging technology to address pressing banking problems.

There are cases where they've asked the community to identify ways of helping the underbanked. So there are roughly 7 million people in the United States, towards the bottom of the income demographic, who are underbanked and need additional financial services that they don't have access to. So they've asked the community, how can we reach these people more effectively? Again, think back to identifying and validating identity in a digital space, and then using technology to facilitate payments, which is effectively what those folks need.

And finally, there's one where we participated, for which we developed a risk framework that is based on something called CAMELS. CAMELS is a standard risk measurement framework for banks that the FDIC uses when they evaluate banks for their soundness. The sprint for the FDIC tasked participants to think through how data can be used to alleviate operational risk. So what happens if somebody doesn't show up to work? What happens if the bank gets hacked? Do the funds of the clients, of the depositors, remain intact? Do loans evaporate?

And so we designed a framework that leverages operational metrics such as the bank's data systems, their ability to defend against a cyber attack, and their data redundancy that protects against key personnel departing. So we created a framework and a technological foundation to do that. Now, to enhance the framework, what we also showed is how to develop a quantitative model to leverage data that's not traditionally used by regulators, or by banks themselves, to look at their business. So I know this font is a little small, so we can probably distribute this presentation afterwards, but we actually built a quantitative framework for leveraging things like census data that provides monthly updates on local economic conditions, and local commodity prices.

So, for example, a lot of small to medium sized banks have, well, a regional coverage, right? And for a lot of them, they have heavy exposure to agriculture. So when there is a precipitous drop in a particular crop localized to a certain area, the entire portfolio of the bank may be at risk. And yet commodity prices are not a traditional metric that are used to evaluate the health of banks, certainly not by regulators. So we assembled a technological architecture using Google Cloud and showed how to input data sources from commodity markets, from U.S. Census, from social media, from digital media, all in order to create a more encompassing framework that helps evaluate a bank's risk, not just from a financial perspective, but also from an operational one.
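
[Editor's note: one simple way to combine non-traditional signals like the ones described above is a weighted composite score. The sketch below is purely illustrative; the signal names, values, and weights are invented for the example and are not from the FDIC sprint submission.]

```python
# Hypothetical signals for one regional bank, each scaled to [0, 1],
# where higher means more risk. Names and values are illustrative.
signals = {
    "commodity_price_drawdown": 0.7,   # e.g., a localized crop price drop
    "local_unemployment_change": 0.4,  # e.g., from census-style monthly data
    "cyber_incident_exposure": 0.2,
    "key_personnel_turnover": 0.5,
}

# Illustrative weights; in practice these would be fit or set by analysts.
weights = {
    "commodity_price_drawdown": 0.35,
    "local_unemployment_change": 0.25,
    "cyber_incident_exposure": 0.25,
    "key_personnel_turnover": 0.15,
}

# Weighted composite operational-risk score in [0, 1].
risk_score = sum(signals[k] * weights[k] for k in signals)
print(f"composite operational risk score: {risk_score:.3f}")
```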

So these are all some of the key things that are changing the mechanic by which the finance industry works, increasingly connecting it digitally and creating a much more automated way for managers and regulators to understand the landscape of money.

Audience discussion: data sources in use

So, building on all the information that Dmitri shared, the fact that we're starting to be able to combine different types of data sources that finance hasn't really leveraged before, one of the questions that we have for you, and feel free to answer this in the chat, is: what types of data are you using today? Because throughout this presentation, we're going to transition a little bit more broadly into how you can start to think about what it means to be a data driven organization and why that's important.

Crime data, transcripts, census, CDC. Health care. Yep. Somebody is actually using sentiment analysis on social networks. That's great. Real estate, student data.

So one really powerful use case is graph analysis, effectively, community detection. If you are able to get transaction data from any kind of network, think of, for example, SWIFT, the organization that facilitates international payments and international bank transfers, you can actually understand what are the key nodes, and therefore key nexus points, for a financial system, and so understand what are the key risks in a financial system.
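
[Editor's note: a minimal sketch of that idea, using a made-up payment network and pure Python. Ranking nodes by total transaction flow is a simple stand-in for the richer centrality and community-detection measures the speaker has in mind; the bank names and amounts are invented.]

```python
from collections import defaultdict

# Hypothetical interbank payment edges: (sender, receiver, amount).
payments = [
    ("BankA", "BankB", 120.0),
    ("BankA", "BankC", 75.0),
    ("BankB", "BankC", 200.0),
    ("BankD", "BankC", 50.0),
    ("BankC", "BankE", 300.0),
]

# Weighted degree: total value flowing through each node.
flow = defaultdict(float)
for sender, receiver, amount in payments:
    flow[sender] += amount
    flow[receiver] += amount

# Rank nodes by total flow; the top nodes are candidate key nexus points.
ranked = sorted(flow.items(), key=lambda kv: kv[1], reverse=True)
for bank, total in ranked:
    print(bank, total)
```

In this toy network, BankC sits on most of the flow, so it is the node whose failure would disrupt the most payment value.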

What's interesting is that there was an analysis done by ProPublica back in 2010, if memory serves, where they were doing a retrospective on the financial crisis back in 2007 and 2008, sort of trying to figure out, well, how did it come to be this way? And what they did is collect data from Bloomberg, data that was publicly available to everybody, by the way, during and leading up to the financial crisis back in 2008. They collected information on something called CDOs, or collateralized debt obligations. Those are basically pools of loans. A lot of them are mortgage loans, but there are all sorts of pools of loans: car loans, all sorts of different types of loans.

And they looked at what these different CDOs owned. And what was remarkable is they showed a web demonstrating that all these different CDOs, first of all, were issued by a handful of investment banks, like 20. And a lot of them owned pieces of each other. So one CDO owns pieces of another CDO and so on. And so you could actually see clear as day that there was, in fact, a network of these collateralized debt obligations such that it took very little for one CDO to default before it set off a cascade of defaults across the collateralized debt market. Something like a 1% change in the value of one CDO actually could impact trillions of dollars worth of debt. And it had a lot more to do with the structure of these securities and how they were priced and owned rather than anything else.
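
[Editor's note: the default cascade described above can be sketched as a graph traversal. The toy cross-ownership structure below, and the assumption that any holder of a defaulted CDO is itself impaired, are deliberate simplifications for illustration, not ProPublica's actual data or model.]

```python
# Hypothetical cross-ownership: each CDO maps to the CDOs it holds pieces of.
holdings = {
    "CDO1": ["CDO2", "CDO3"],
    "CDO2": ["CDO3", "CDO4"],
    "CDO3": ["CDO4"],
    "CDO4": [],
    "CDO5": [],  # not connected to the others
}

def cascade(defaulted_seed, holdings):
    """Return the set of CDOs impaired if the seed defaults, assuming
    any holder of a defaulted CDO is itself impaired."""
    # Invert the graph: who holds pieces of each CDO?
    holders = {cdo: [] for cdo in holdings}
    for owner, held in holdings.items():
        for cdo in held:
            holders[cdo].append(owner)

    # Breadth of the cascade via a simple graph traversal.
    impaired, frontier = {defaulted_seed}, [defaulted_seed]
    while frontier:
        current = frontier.pop()
        for owner in holders[current]:
            if owner not in impaired:
                impaired.add(owner)
                frontier.append(owner)
    return impaired

print(sorted(cascade("CDO4", holdings)))
```

Even in this tiny example, one default at the bottom of the chain impairs every CDO upstream of it, which is the choke-point structure the speaker describes.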

And so when I saw that sitting back at JP Morgan's desk at the time, it became very obvious that had somebody been doing this analysis on open data all along, they could actually identify choke points in the financial system, identify this type of systemic risk much, much earlier. And so I suspect the same types of opportunities are available today. So for any of you doing analysis in the financial markets, I would encourage you to have a think about how you can use a graph analytic framework in order to understand the financial system better.

What does it mean to be data driven?

So thank you so much for telling us all about the data that you're using, and hopefully that's sparked some ideas about additional data sources that you can use. You know, one of the big trends that we're seeing in finance, as I think Dmitri alluded to earlier, is the fact that more and more institutions are focused on becoming data driven. But a lot of times this term is used in a nebulous way. And so one of the pieces that we've found to be most helpful is to define a little bit better: what does it actually mean to be data driven? How do you know if you're data driven, and how can you identify the key pieces to work on within the organization?

So the way that we've identified that is across two different axes. We have our data literacy, which is the overall knowledge as well as the governance and oversight within an organization. And then also the data infrastructure, which, especially in finance, is crucial to ensure that data is accessible appropriately and that it's also stored securely.

So thinking about data infrastructure, we've identified these three pillars: data collection, data storage, and data access. In terms of data collection, you know, I saw a lot of evidence about the data that folks are using, whether it's customer transactions, CRM data, or student data. So presumably there's some collection that happens on a continuous basis. If not, that's definitely something to look into. But even more importantly, making sure that the data is collected in a way that is then easy to analyze and to store is also pretty crucial. If you have a bunch of data that's disorganized, maybe with a lot of missing values, that makes it a lot more difficult to pull insights from it.
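
[Editor's note: a quick missing-value audit is one concrete way to check whether collected data is analysis-ready, as described above. The CSV content and field names below are invented for the example.]

```python
import csv
import io

# Hypothetical CSV export with missing values, the kind of disorganized
# data that makes analysis harder.
raw = """customer_id,balance,credit_score
1001,2500.00,720
1002,,680
1003,1800.50,
1004,3100.00,705
"""

reader = csv.DictReader(io.StringIO(raw))
rows = list(reader)

# Count empty cells per column.
missing = {}
for field in reader.fieldnames:
    missing[field] = sum(1 for row in rows if not row[field])

print(missing)
```

Running a check like this before analysis tells you immediately which columns need cleaning or better collection upstream.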

So ensure that the data you have is not only collected in a timely manner, but then also stored well, and stored in a way that makes it easy to pull insights from. The second pillar is data storage. I think everyone on this call, especially if you're in the finance industry, understands that a lot of the information you're dealing with is personal information, whether that's credit scores, spending histories, or bank account information. Ensuring that your data is stored securely is paramount.

But at the same time, the third pillar, data access, is also very impactful, because if your teams cannot access the data that they need to make timely decisions, then that makes it difficult to use the actual data that you've collected. So, you know, you can start to ask yourself: the data sources that you're using now, are they easy for you to access? Are they easy for your colleagues to access?

And then on the other axis, we have data literacy. And with this, you know, one of the biggest challenges that we see is a communication gap between the data and the non-data professionals within an organization. So becoming a more data-driven organization really starts from the top, and that's in terms of data leadership. Do executives actually champion data utilization? I would say most do, but maybe they don't understand the resources that they need to allocate or the time they need to give people to better understand how to use data. So if you have a data champion in your organization, if you have one person who is helping set that data strategy, then that's a good sign that your leadership is taking this very seriously.

The second pillar that we have under data literacy is data governance. I think especially in finance, you probably have a lot of guidelines about where you can access the data, how you can access the data, and who can see the data. Do you have these guidelines written down somewhere? Does everybody have access to them? Are people using data in a uniform way? Making sure that across the organization, people are using data in the same way makes it easily accessible to all, and easily transferable and readable. Something straightforward: if you don't have a data dictionary in place, putting one in place can already help you ensure that all of your variables are defined in the same way, which makes the data a lot easier to clean and then to analyze later.
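
[Editor's note: a data dictionary can be as simple as a shared mapping from each variable to its type, definition, and allowed values, which can then be used to validate records. The field names, rules, and validation helper below are illustrative assumptions, not an existing tool from the talk.]

```python
# A minimal data dictionary: each variable gets a type, a definition,
# and optionally its allowed values. Field names are illustrative.
data_dictionary = {
    "customer_id": {"type": "str", "description": "Unique customer identifier"},
    "balance": {"type": "float", "description": "Account balance in USD"},
    "risk_tier": {"type": "str", "description": "Internal risk tier",
                  "allowed": {"low", "medium", "high"}},
}

def validate(record, dictionary):
    """Return a list of problems with a record, per the dictionary."""
    problems = []
    for field, rules in dictionary.items():
        if field not in record:
            problems.append(f"missing field: {field}")
            continue
        value = record[field]
        if "allowed" in rules and value not in rules["allowed"]:
            problems.append(f"{field}: {value!r} not in allowed values")
    return problems

record = {"customer_id": "1001", "balance": 2500.0, "risk_tier": "extreme"}
print(validate(record, data_dictionary))
```

Because every team validates against the same dictionary, variables mean the same thing everywhere, which is exactly the uniformity the speaker is describing.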

Last piece is data knowledge. Does your staff, do you, do your colleagues know how to ask the right questions about data? Do they understand how to interpret the insights about data? Without that, it makes it really difficult to start to become more data driven and to use data to inform your insights. So, you know, a lot of ways that we address this is specifically about developing programs within organizations to help them become more data driven and to drive that continuous culture of learning.

So as I'm going through this, maybe you have some questions, or maybe you've identified a key pain point. Like, maybe there's data that you're unable to access, or maybe you're seeing that a lot of your colleagues are having difficulty asking questions about data. You know, one source of data that we see a lot is a bunch of Excel spreadsheets that live on somebody's laptop. And then the minute that person leaves, those Excel spreadsheets disappear, right? That's something that we refer to as dark data. So this is something that can be ameliorated with data governance or data storage.

So if you are wondering, where is all of this data? Does all of my data live in Excel on my own laptop? Then maybe start thinking about how you can transfer it to a secure space, maybe a cloud environment, so that others can access it besides you, because that type of data can be increasingly important.

Steps to start with data analytics

So talking a little bit about this, it all starts with us, right? It all starts with everybody in the room. So how can you start with data analytics if you're not doing this already? Make sure you're asking questions about data. Ask for metrics, make your metrics specific, and make them measurable so that people understand what you're looking for. That can help drive the insights that you find and then the decisions that you make behind that.

For inventory, you know, we ask: what type of data do you have access to? I'm curious how many of your colleagues know what type of data you have, right? So find the information that you have access to. Maybe you'll discover that you have a whole other database you weren't even aware of. Maybe you find out that you're already collecting data about something you had questions about in the past. Especially in a lot of large organizations, we find that most people don't know what's available to them. And that's not just in terms of data; that's also in terms of tools. You know, maybe your organization already has RStudio Enterprise, right? And you weren't aware of that. So asking those types of questions can help you better understand that as well.

And then, you know, collaborate and talk to your colleagues. What we've seen is that when you get a bunch of people in the room to start talking about the challenges that they're facing with data, or the data they have access to, you'll see that a lot of you are working on the same issues. And by working together on that, instead of doubling the work because you're both solving it individually, solving it together tends to make the work go faster. You might also see that there are other aspects and other data sources that you didn't realize you had.

How can you support data literacy? Beyond actions that you're taking yourself, what can you do within an organization? Doing these types of lunch and learns internally, perhaps. Maybe you'll find this useful; hopefully you do. So: bringing in other experts to speak about these trends, about how data is being used, and about these different types of use cases to inspire others around you to start to incorporate data; going to data conferences; and setting up training opportunities based on skills gaps, so where do you want to improve your skills on the data spectrum? And even planning events such as data competitions can be a really nice way to identify top talent, and also encourage others to start to think about how they can apply data to their work.

Ethical use of data

So before we finish up the presentation, this is an area that we always like to emphasize, both in our training programs and also in our presentations, which is the ethical use of data. This is a case study that maybe some of you know. It was one that was done by Target. Essentially, about 20 years ago at this point, Target had a lot of information. They're a huge retail store in the United States, for those who might not live here. And they had a large customer base, with data about past purchases. And they wanted to start to predict which of their customers would be pregnant, because they know that when an individual is pregnant, they tend to get more set in their habits. And if they can catch somebody during that stage, they tend to be more loyal customers throughout their life.

So in order to do that, they had their data scientists go through multiple years of data to better understand, OK, which of their customers became pregnant, and what were the key factors there that they could use to then predict who will become pregnant? So they did this analysis. And at the end of this analysis, they found that some key indicators included buying ginger ale to help with nausea; a stop in wine purchases, which was one of the factors that had a big correlation; as well as buying prenatal vitamins and things like that. And based on their predictions of who would become pregnant, they started to send out targeted flyers. And you'd think this is a great use case of data analytics, and it is a demonstration of an effective use case. But what ended up happening is they sent one of these flyers to the parents of a 15-year-old who hadn't told them that she was pregnant.

So that became a big case that happened with a few other families as well. And it started to bring up the question of it's one thing to be able to mine customer data for insights. But how can you make sure you're using it in a way that doesn't cause harm, doesn't have ethical implications to it? So it's important, especially given that we're in the finance space, to think about how we're using customer data and how we need to be mindful of that fact.

Now, interestingly enough, Target didn't stop using that data for insights. What they ended up doing is starting to put things like lawnmowers next to baby cribs in their flyers. So they're just a little bit more subtle about how they advertise now.

Speaking of ethical considerations, another one that's popped up, especially in the past few years, is the biases that exist in the data that informs our models. So, again, it's our responsibility to ask these types of questions: how does the data reflect the society that we're in today? And if it does perpetuate any stereotypes or biases, how can we build models for future prediction, for example for granting loans, that mitigate some of those biases and risks so that we don't perpetuate them? This is a difficult question, and I'm only spending a few minutes on it when there's a much larger conversation to be had. But the important point to get across is: as you are building these models, or are involved in them, make sure to ask these questions. Account for biases before the data is put into the model, make sure the model accounts for them, and understand how to mitigate them, so that you can make the best decisions for your customers without amplifying biases that might already exist in the data.
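One concrete check along those lines is comparing a model's approval rates across groups, for example with the "four-fifths" (80%) rule commonly used in disparate-impact analysis. A minimal sketch, with invented decision data:

```python
# Invented data for illustration; 1 = approved, 0 = denied.
def approval_rate(decisions):
    return sum(decisions) / len(decisions)

def disparate_impact_ratio(group_a, group_b):
    """Ratio of the lower approval rate to the higher; below 0.8 flags possible bias."""
    ra, rb = approval_rate(group_a), approval_rate(group_b)
    return min(ra, rb) / max(ra, rb)

group_a = [1, 1, 1, 0, 1, 1, 0, 1]   # 75% approved
group_b = [1, 0, 0, 1, 0, 0, 1, 0]   # 37.5% approved

ratio = disparate_impact_ratio(group_a, group_b)
flagged = ratio < 0.8                # True here: worth investigating
```

A check like this is only a starting point; a flagged ratio is a prompt to investigate the data and model, not a full fairness audit.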

Building data literacy in financial institutions

But on the other side, there's that data literacy component, and this is an example of how a lot of financial institutions today are starting to realize it's not only the infrastructure they need to have in place, but also the appropriate staff with those skills. We've done a lot of work in the finance space on that. One of the use cases specifically is with Discover Financial Services, where they developed a pretty robust infrastructure. And what they realized is they had a lot of new hires coming in who maybe had some of the prerequisite skills and a good foundation, but not all of the skills they needed in order to be effective.

So we worked with them to build an onboarding program that trained them in R, as well as in other tools including SQL and Snowflake, so that by the time they left that onboarding process, they felt much more comfortable within the infrastructure they had and could be much more effective off the bat. We see a lot of this happening, and not just in onboarding. Because hiring data scientists is so difficult right now, they're a rare breed, and especially in the finance industry it can be really hard, a lot of financial institutions are now turning to training, whether they do it internally or bring somebody else in, to help develop that skill set across the organization.

So I'll finish up with this question here. I know that there's some other questions in the chat that are more specific, so I can pause here and just say, first of all, thank you for coming to this talk today. Rachel, thank you for organizing it. We love RStudio, especially the Hangouts, and we find that, you know, just the community has been really supportive and lovely, so we want to make sure this is valuable for you, and now is your time to ask us questions.

Q&A

Thank you so much, Merav and Dmitri. That was great. I see a lot of questions coming into the Zoom chat, and then we also have the Slido link if you want to ask questions anonymously there as well.

I think, Rachel, the first question was from you, in terms of what kinds of organizations provide employees with technical skill development versus expecting you to figure it out on your own. It's tough to paint with a broad brush. We work with a number of financial services organizations that invest heavily in new employee onboarding and continuous employee development. We work a lot with Discover, the Inter-American Development Bank, Capital One, and all of those organizations I know have robust training programs for new and existing employees. I know we had a very robust one back at JPMorgan. So it's tough for me to say this type of organization likes to do this versus that; it's tough to put them in buckets. But I know a lot of organizations do have great programs, and based on our experience, those who don't have something like that in place are increasingly looking to deploy those types of programs. We're seeing a lot of interest.

And I'll just add on to what Dmitri was saying. I think almost every organization that we speak to or work with has some sort of online training component. And that can be really helpful, especially for a lot of folks on this call who maybe already have a foundation in R or Python and just need to scale up in one or two things. But we found that a lot of people who are new to the space tend to get really overwhelmed with the amount of information that's out there; they're not quite sure where to start. And that's where organizations are starting to bring in something that's a little bit more tailored and structured, to help increase accountability and also develop that community of sharing and collaboration. Doing those in tandem with one another helps build a robust learning culture.

So thank you, guys. It was a brilliant presentation, thanks for that. My second question was around one of the challenges that we face: figuring out how best to access standardized schema documents like XBRL and then pull them into R for some sort of standardized financial modeling and analysis. Do you have any tips or experience you could share there?

Yeah. So in our experience, it's been less about the domain and more about the approach. So XBRL is a very useful tool to publish financial records data. But when we are then taking those financial records data and trying to come up with an analysis, it's much more about, for example, a data structure that lends itself to text analysis. So as I mentioned, Neo4j has its own structure for meta-tagging and relating data in non-star schema ways. If you're looking at graph analysis, it's a different approach. If you're looking at classification algorithms, they all have their own sort of data formats that you would need. So unfortunately, I haven't encountered anything specific that sort of is universal for financial modeling. It's more specific to are you looking for portfolio analysis? Are you trying to figure out the Sharpe ratio for your portfolio or are you looking for churn metrics or something else? So it tends to be more analysis style specific.
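On the portfolio-analysis side mentioned above, the Sharpe ratio calculation itself is small once the data is in a plain return series. A minimal sketch, with an invented monthly return series and an assumed risk-free rate of zero:

```python
import statistics

# Sketch only: the return series is invented, and the risk-free rate
# and monthly frequency are assumptions for the example.
def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=12):
    """Annualized Sharpe ratio from periodic returns (sample standard deviation)."""
    excess = [r - risk_free_rate / periods_per_year for r in returns]
    return (statistics.mean(excess) / statistics.stdev(excess)) * periods_per_year ** 0.5

monthly_returns = [0.02, -0.01, 0.015, 0.03, -0.005, 0.01]
s = sharpe_ratio(monthly_returns)
```

This is the kind of analysis-specific shape the answer above is pointing at: whatever the source format (XBRL or otherwise), the work is getting it into the flat series or matrix the chosen analysis expects.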

Neo4j, I've used here and there, but my curiosity is more of, is there some sort of rule of thumb when scaling up Neo4j? Because we all know sort of graph network type tools have a scalability problem. The more vertices, the more edges, everything explodes sort of exponentially. So I was just curious if there's a rule of thumb around number of CPUs, amount of RAM as your network scales for sort of making Neo4j run in a malleable way on a cluster.

Yeah. So I am unfortunately not the right person to ask. I'd need to have our director of engineering on the line for this question. But I know he figured this problem out last year. So I can connect you guys afterwards if you drop me an email.

And just letting everyone know that I did put both my email and Dmitri's email in the chat, in case your question doesn't get answered or you want to learn more. Or, I think it was Eugene's question, if you need a more specific answer, we're happy to connect you with the right person and continue that conversation as well.

Gregor, I see you asked a question around R and Python too, if you want to jump in. Yes. Thank you very much for the presentation, Merav and Dmitri. It was very interesting. I know Python is also more and more used in the finance industry, and I would have liked to hear your impressions of how R and Python are evolving and being adopted. And my second question: what do you think are the two or three other main languages that are entering data-driven analysis in finance, please?

So you're right in that the sort of Python user community has grown somewhat faster because it's grown on the backs of software engineers. For example, I started as an R user. And so that's still my first love as far as programming languages go. And I find that, for example, R's syntax tends to be simpler than that of Python. So if you have a software engineering background, picking up Python can be pretty trivial because you're already familiar with the notion of data structures. If you are starting from an Excel or sort of VBA background, then Python might look somewhat foreign and R is going to be a lot easier for you to adopt. Most people who are in finance actually started in some form of spreadsheet software, right? So to them, transitioning into R would actually be easier.

And in my experience, R is going to give you easier, faster options for prototyping, proving your point, doing the analysis, creating interactive visualizations, standing up a light application with something like R Shiny. So I'm a huge fan of R, and it's the main language I program in. I don't think that the huge uptake in Python precludes another language like R from being just as widely used and helpful. Ultimately, I've learned it's about your personal preference and what gets the job done faster. Those two, I think, still dominate. There are going to be some languages that come into play when you're talking about scaled systems, but I would attribute them less to finance; there's nothing about finance and the structure of a different language that makes it better adapted to finance, right? But when you're talking about some scaled applications, especially for transaction processing, Scala is obviously important. That has more to do with its software engineering and data-piping capabilities than with it being better suited to finance in some way, shape, or form, in my experience.

Based on your experience, how do you find working with teams where the business side is very well-versed in Excel and might find R a lot more palatable, but the technology groups then come in and say, no, we're not very comfortable with people using R; on the software side there's a much bigger preference for Python? When you're working with clients, or even within your own organization, and going through this maze of what should be encouraged or discouraged, can you even have both of them working? What kind of relationship should there be between R and Python, or between business and technology people? How do you navigate conversations like those?

Sure. So I would push on the word comfortable, and who is comfortable with what, and who is in the right lane in terms of being comfortable with what, right? Software engineering is going to be responsible for secure, scaled code. But the finance team is responsible for the analysis being correct, right? And so in my experience, the way that I see it is: if you're writing a piece of code, you better be confident that it's working correctly from a mathematical standpoint. And if the finance team is using R and is comfortable in R, then that has to be the code that actually runs the financial analysis. Now, there are lots of ways to call an R routine from Python and a Python routine from R. So they are interchangeable, and there are lots of hooks around it. So it shouldn't be that everybody must use one language or the other; there are lots of ways to make the two compatible.

And in fact, we do that regularly in our line of work, right? There's a lot of packages that are just better in R than in Python and frankly, vice versa. And you shouldn't limit yourself to one or the other. So then, you know, the conversation is, look, if you're writing most of the workflow in Python, why can't you call this R subroutine? Because our analysis works in there. And if the engineering team wants to reconfigure the ETL in Python, well, that's their prerogative, right? They're thinking about the compute time. But there shouldn't be sort of a debate about, you know, do you need to change the actual mathematical analysis from one language to another? That doesn't feel like a good use of time given the interoperability that's available.
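As one illustration of that interoperability: beyond dedicated bridges like rpy2 (calling R from Python) and reticulate (calling Python from R), the simplest hook is just shelling out to Rscript and exchanging data as JSON. The script name and payload below are hypothetical, and the actual call is commented out since it assumes R is installed on the machine:

```python
import json
import subprocess  # used by the commented-out call below

# Hypothetical R script and payload, purely for illustration.
def build_r_call(script_path, payload):
    """Build an Rscript command line; the R script would parse the JSON argument."""
    return ["Rscript", script_path, json.dumps(payload)]

cmd = build_r_call("analysis.R", {"rates": [0.02, 0.015]})
# result = subprocess.run(cmd, capture_output=True, text=True, check=True)
# parsed = json.loads(result.stdout)  # assumes the R side prints its result as JSON
```

The subprocess route keeps the two codebases fully separate; the in-process bridges (rpy2, reticulate) are the better fit when data needs to flow back and forth repeatedly without serialization overhead.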

Just scrolling through the questions, I see Davin, you had asked a question. Yeah, I'm just really curious. You talked a lot about the financial sector having trouble finding qualified staff to bring on full time. And what do you see for trends in that type of analysis being available for freelance workers or outside organizations, consultants, things like that? If you just kind of comment on that, I'd appreciate it.

Sure. I don't have any stats to speak to the trend in freelancing specifically. But generally, I try to think about things from first principles: when does a person or a tool make sense, and when not? So I'll answer this question kind of the same way I answered the R versus Python question. There are great use cases for a freelancer. You're usually paying them on a per-project or hourly basis, and usually they'll be more expensive than a W2 employee, but you can usually get them onboarded for that very narrow use case much faster than a W2. And so there is a reason to use a freelancer, or an outsourced firm, when you need to get something accomplished really quickly, but it's an in-and-out kind of task.

If you're talking about somebody who is a permanent hire, well, there's a legal question about whether you want them to be a 1099 or a W2; that's actually more of a legal and HR question than a substance question. And then finally, if you have a capability that's ultimately going to be core, you probably want that to live with your full-time staff, who kind of bet their careers on working with you for the long run. So in my experience, whether you're outsourcing or freelancing, it's great for getting an initiative going and making sure the capability comes online quickly, but then equally have a plan to bring on some full-time staff. Having said that, I don't have a command of the trends in freelance use, but I bet if you went to someone like Upwork, they might have some data analysis around that.

Emilio, I see you had asked one in the chat too. I just wanted to know if you have any pointers regarding ESG-related applications in credit risk. I know it's a very young field, still a work in progress, but I would appreciate it if there's any R library I can start learning, or any other kind of reference that might be useful.

More specifically, there's a big push to implement climate impact into loan portfolios right now. But it's limited to mortgage. I want to know if there's any other kind of type of loans that might be already being modeled or any effort around that.

Yeah, so I could talk on this topic for hours; I'll try to keep it short. There is a really big push across regulators, investors, and industry participants around how to measure climate exposure and climate risk. That's true for energy-producing assets, so infrastructure is a big area where this applies, and equally for loans to companies that have meaningful climate exposure: not just insurance companies, but businesses based in coastal areas. I know the SEC is doing a lot of work around the appropriate amount of financial disclosure that would be required from a regulatory perspective with respect to ESG or climate-change risk; we wrote a big white paper on that topic. So there's no standard framework that I'm aware of right now, but there's a lot of research, and the approaches are all generally sector dependent. You mentioned mortgages, certainly, again for coastal areas, and it's very much true for energy-generation assets. But I'm not aware of a framework that says, here's exactly how to think about it pervasively.

Just seeing a comment from Brian about the fact that most finance output ends up in Excel and PowerPoint, so using R Markdown simplifies that task. I couldn't agree with you more. We're very heavy users of R Markdown in a lot of what we do, from training to the actual custom solution projects that we build. I'm a huge fan of R for financial services applications. And I think there needs to be an intellectual acknowledgement that there is software development, with good tools for software development, and there is finance, where you need to get to an analytical result quickly and be able to communicate it easily with other decision makers. And I find that R is much better for that second use case. So from my standpoint, it's not about what's better, R or Python; they're both great tools, but they have very distinct use cases, and they have very underused libraries for calling subroutines from each other and creating interoperability. I think it's much more about that than about which is better.
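As a small illustration of that R Markdown workflow, here is a minimal document header (the title and file contents are hypothetical) that renders one analysis straight to the formats finance output usually lands in:

```yaml
---
title: "Portfolio Review"
output:
  powerpoint_presentation: default
  word_document: default
---
```

Rendering with `rmarkdown::render("report.Rmd", output_format = "all")` would then produce both the .pptx and the .docx from the same source, so the analysis lives in one place.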


Well, thank you so much, Merav and Dmitri, for an awesome presentation. And I'll give people a few more seconds if you want to jump in and raise your hand with other questions. But that was great, and it's awesome to get the community together.

But thank you all so much for joining today. I'll work on getting the recording together, and if you have the slides, you could send those over to me; that would be awesome, too. Yep, we'll get that to you. And thank you so much for putting together such an awesome group. Super interactive, lots of questions, which is exactly what we love. Hopefully everyone feels like they're taking away some new information today. That's really our goal: to empower people to use data better. And however we can help do that, we're happy to.

Yeah, I echo that. Thank you, everybody. Marwa, thank you very much for the kind words in the chat. We really appreciate all of you dialing in, and all the really good questions and the great discussion. Thank you very much. And of course, thank you to the RStudio team, to Rachel and Kevin, for organizing this, welcoming us, and putting this whole production together. Obviously, this wouldn't happen without them. So thank you so much.