
Data Science Hangout | Jonathan Regenstein, Truist | Relationships with IT and Non-Data Scientists
We want to help data science leaders become better. The Data Science Hangout is a weekly, free-to-join open conversation for current and aspiring data science leaders. An accomplished leader in the space joins us each week and answers whatever questions the audience may have. We were recently joined by Jonathan Regenstein, Head of Data and Quantamental Research at Truist Securities. Working with IT and building relationships was a focus of our conversation with Jonathan, and he included a few tips for building relationships with non-data-scientist colleagues: find a partner within the IT organization, talk to that person at least once a week, and let IT help you communicate your value proposition along the way.

"It sounds crazy to say this in the world of data science, but relationship building was critical to what we did, especially at a bank. Thousands of requests for new technology. There's no way to avoid going through all the security scans and check marks that we have to go through. We want to make sure we have a good partner who is going to help us do that."

0:48 - Start of session
10:52 - How should data science leaders work with IT?
46:20 - How far out data science leaders should be planning projects with IT
48:20 - How do you become a champion of data science within your organization?
1:02:11 - Your responsibility as a data science leader is to work cross-functionally
1:04:17 - Data science leaders: your business cares about the value, not how you got there.

► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu
Follow Us Here:
Website: https://www.rstudio.com
LinkedIn: https://www.linkedin.com/company/rstudio-pbc
Twitter: https://twitter.com/rstudio
image: thumbnail.jpg
Transcript
This transcript was generated automatically and may contain errors.
It's great seeing everyone. The goal is to give you all the opportunity to learn a little bit about data science leadership from an accomplished data science leader each week, and from each other. We have virtually no agenda, so please ask questions. We'll start with a few light topics just to get things going. Don't be shy. Turn on your video camera, introduce yourself in the chat, ask some questions. We have people monitoring the chat, so we'll get to your question if you ask.
All that said, I am so happy to introduce this week's expert. He's no stranger to RStudio, and there's no shortage of impressive things to say about him. Jonathan, why don't you introduce yourself and tell everyone what you're working on and your background.
Sure. Thank you. Thanks for having me, Rob, and RStudio. Thank you for having me back. My name is Jonathan Regenstein. I'm currently the head of data and quantamental research at Truist Securities within equity research. And before that, I worked at RStudio for a little over four years as our director of financial services. So really, really good to be back and to see some familiar faces again.
About the team at Truist Securities
And I can give a little background on what we actually do, which is a question we get asked a lot. I'm sure a lot of data science teams do. We're a pretty new team within Truist Securities. And if the name Truist doesn't sound familiar, we used to be called SunTrust and BB&T. And then those two banks merged together to form one big bank called Truist, which I don't think is a real word, but it is now.
So that's who we are. And yeah, so we're formally called the data science and engineering team. And our job really is to bring what we call, and this is a pretty savvy crowd, so maybe I don't need to use these buzzwords, but we say we're bringing modern data science tooling to the world of equity research. So equity research has existed at Truist for a long time, probably 30 years. And basically, it's market research at the sector level, at the individual ticker level. So people making recommendations on various stocks, making recommendations on sectors and market moves.
And so we have about 35 experts who have been doing this for a long time, but they've kind of traditionally lived in the world of Excel spreadsheets and Excel models and traditional market data. And the goal of our team is to try to bring modern data science tooling, so things like R and RStudio, but also tools like AWS and Snowflake, which haven't traditionally been used at the bank. So there's kind of a tooling aspect to what we do. There's a data aspect to what we do.
So the world of alternative data has become very hot in the financial world, and alternative data is really just data that's not traditional financial and market data. So you can think of NLP-based data or things like app downloads. Something we work on a lot is just taking our internal banking data, like credit card data, and mining it for insights about the markets. So that's kind of our role: to take modern data science tooling, take alternative sources of data, and bring all that stuff to the traditional research process, which has been rolling along really well for the last, I don't know, call it 30 years. We're just trying to bring new processes and new ways of doing things and kind of augment what's already been a really strong business for the bank.
One source of both excitement, but also sometimes difficulty, is that we're kind of the first data science team to arrive in our business line. So the good thing is, we get to be the tip of the spear on a ton of the cool stuff that we're working on. I mean, just AWS, right? No one was using AWS before we got here, let alone putting RStudio Server and Connect on AWS. So that was a far cry from how things had been done.
So we get to kind of pioneer all that stuff and bring these new practices, but at the same time, sometimes the pioneers don't survive that journey. So we're constantly fighting the pioneer's fight too. We're saying we want to do things differently, and it takes a lot of time. So we get to work on really cool stuff, but then we also have to overcome a lot of the inertia of the way things have always been done in the past. So it's pretty fun.
It's a really good challenge that we get to take on and yeah, it's been really exciting. I mean, another really cool thing about being kind of the first data science team to land here is that we really started with a green field in terms of our coding stack. So we didn't have to have debates about Python and Julia and R. We just kind of said, we're going to start with R first and RStudio first, honestly, and just go from there.
So that was another lucky thing that we had from being, again, the first data science team, but there's been no shortage of conversations we've had to have about, well, hey, what is R? Okay. We've had that conversation many times. What are these packages? What is CRAN? Where are these packages coming from? Okay. How do you manage your packages? Is there a package manager? And we say, yes, RStudio has a package manager, okay, let's go examine what that is. Why can't y'all just use Tableau? We've had that conversation a million times, of why we prefer to use Shiny.
So we've been just digging and digging and digging and really trying to bring workflows into the way that we want to be doing them. I mean, we've even started using, frankly, R Markdown for almost all of our reporting, and just getting the templates to look and feel the way people wanted them to look and feel. That in itself took quite a bit of time, because everything needs to be in the right format to get into our corporate template manager. So we've just been slowly taking on these battles. And then of course, trying to add value along the way, which is always the most important thing for us.
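As a rough illustration of the kind of R Markdown setup he's describing, the YAML header of a report can point at a corporate reference document so every rendered report inherits the firm's fonts, headers, and styles. The file and template names here are hypothetical, not Truist's actual configuration:

```yaml
# Hypothetical YAML header for a report.Rmd; names are illustrative.
title: "Weekly Sector Update"
output:
  word_document:
    reference_docx: corporate-template.docx  # firm-branded Word template
```

Rendering with `rmarkdown::render("report.Rmd")` then produces a Word document already in the corporate format, which is the "right format for the template manager" step he mentions.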
So those can be small things. When I say small things, that can mean providing data that's clean, which can be really valuable to a team that isn't used to cleaning data. It could be, you know, running models and trying to put together some forecasts for people, doing the cooler data-science-y type stuff. But, you know, we've definitely learned a win is a win. So if it's going to be a simple win from cleaning data, we go for it. If it's going to be a more complex win from, you know, trying to do some forecasting, we'll go for that too.
Team size and makeup
Could you maybe just give everyone a sense of the size of the team that you lead? And maybe just a quick note on the makeup of it. Yeah. So we're a pretty small team. There's four of us right now. So one member of the team is just, he's not officially called a data engineer. It's really funny. We don't have that title at the bank. It doesn't exist. So I think he's just called like a research associate, but functionally, he's really a data engineer. So he does a lot of like data harvesting, you know, web scraping, cleaning, getting stuff into databases, just the way we need to be able to access them.
And then there's kind of two, I guess you could just call them more traditional data scientists, in addition to myself, who are of varying degrees of seniority. One of them is just recently out of college. The other one has been out for quite a while. And within Truist, we have something called the Data Science Accelerator Program, which I didn't know existed until I got here. But this teammate went through our Data Science Accelerator Program, which is actually a pretty cool program. We can talk about it later. I had nothing to do with instituting it, so I can dispassionately say it's a really good program. It's like an 18-month, basically, rotational program where people, you know, apply as if it's going to be a full-time job at the bank, but instead of just choosing a job, they enter the Data Science Accelerator, which is really a chance to do three six-month rotations around different parts of the bank, and learn different skills, and just meet different people, see different business lines along the way. So that's where one of our teammates came from.
Working with IT and overcoming inertia
Yeah, I think, nice to meet you, Jonathan. Appreciate you sharing all that background. You perhaps triggered a lot of people when you were discussing the hesitancy of other groups that are kind of clinging to some of this legacy process or legacy technology. So, my question for you: would you mind sharing some of your strategies for working with that and maybe overcoming some of it?
Yeah, definitely. And it's definitely ongoing. So, I don't want to give the impression that we've got it all sorted out. So, I mean, for better or worse, I mean, step one was really finding, like, a really good partner within the IT org who really, you know, wanted to work with us and was really invested in what we were doing and in our success because that's kind of the person that communicates upstream.
So, yeah, that was definitely a really important step one. And we talk to that person at least once a week. Usually it ends up being two or three times a week. It's just, I don't know, it's been a really important, valuable relationship that, I mean, now it's been over a year that we've been working together. And, you know, he wasn't one of those teammates that I referenced, but he's a really important part of the team. So, that was huge, because he kind of helps us communicate the value proposition.
And then the second part of that, I guess, was really, like, trying to get small wins along the way and frequently. So, like I was saying, we'll take small victories. We don't hold out for when we have to, you know, change the world with a new piece of data science. If we can use AWS to, like, really quickly deliver data to people, you know, that's a big win for us. Especially, like, we'll put Shiny on top of it and we can make data accessible to 20 people. And people get that. That really resonates.
So, like, simple things that resonate with people have been really, really helpful for us, too. But, I mean, it sounds so crazy to say this in the world of data science, but relationship building was, like, critical. Just critical to what we did. Especially at a bank, you know, we're a really big company. Thousands of people. Thousands of requests for new technology. You know, there's no way to avoid going through all the security scans and check marks that we have to go through. So, you know, we just want to make sure we have a good partner who's going to help us do that.
And, like I said, I mean, it's still ongoing. Like, we're constantly having those conversations and refining things. But also, I mean, I'll take a step even further back than that. This is probably more relevant, you know, for people who are starting something new. But it was something I really asked about before I even came over here, which is to really make sure that senior leadership was bought into what we were trying to do. That was really, really important. Because when push has come to shove on a few things, we've had to go ask someone really senior to say, we just need this pushed through. And we always explain why. But sometimes we just need a push through.
So I guess that comes back to relationships, too, but it's a little different from the everyday weekly relationship we have with our IT, or I guess you could really call them functionally our DevOps team. So, yeah, you know, showing value up the chain has been really important for us, too. And I should also say, you know, our leadership was already kind of invested and supportive of this, which is why we're here in the first place. So I think that's important also, honestly. Like, you know, to the extent that anyone can control this, and you can't always, it's always good to be either at an organization or in a part of the organization where the senior leadership is really supportive of data science and what we're doing. So that's been, yeah, that's been really huge.
Upskilling vs. specialization
So I'm a lead data scientist at Nestle, so likewise a large corporation. And we've had some form of statistics function, statisticians, for almost as long as Nestle has been a company. But in the more recent past, I would say in the past two to three years, we're trying to modernize, to do, I guess, what you would call more standard data science work streams. So that includes hiring folks with new, different skills, myself included. And so my question, which you spoke to a little bit earlier: we tend to have many more people who are statistically minded. They can do programming, but they're less adept at the productionalization steps necessary to get something end to end. So I'm thinking about whether it makes more sense to upskill the folks we have, so that we have people who can take things end to end, or to have a smaller number of specialized people with more knowledge of just that final piece of getting something through to production.
Yeah, that's a really good question. It's something we have been thinking about a lot. We haven't honestly quite got to the point that we're quote unquote putting things into production, though we have had to go through a lot of the process for having our production environment set up. But we've kind of been down a similar journey just for a lot of the things we've been doing. And my opinion on this, I would love to hear what other people think, too, on the line. But I would say that the end state will be a person who has specialization in that field, honestly. I think upskilling is important, but more as a way so that people can communicate with each other.
We have found that people here are so busy that it's very difficult to do a thorough upskilling, even in terms of just coding and data science. What we've tried to do is upskill people to the point that they kind of understand how best to work with us, and we try to learn from them how best to work with them on what they want. But ultimately, I think if it's going to be something that technical, I would say you'll need someone with those specialized skills.
I'm Moudi Hadi. I run new product development at S&P Global Market Intelligence. I actually do have a bunch of apps in production in both R and Python, so this is a bit close to my heart. Because as data scientists, the initial release is always exciting. Everybody's keyed into it. I also run a team that's about four people. Once things go into production, we've learned the hard way that we do have to hand that over to some form of DevOps slash engineering team to keep it going. And we've had to train them on a lot of the skill set and the technology side of the equation, to at least understand how to control some of these processes, because features need to be enhanced and extended. And to be honest, I can't dedicate people to do minor enhancements, because I have to move on to the next thing.
Yeah, no, that sounds right. I mean, I think your experience is probably more instructive, since you already have apps in production. But I think we're getting at the same theme, which is that time ends up being a major limiting factor in the decision of whether people are going to be upskilled or specialists are going to be brought in.
Getting IT up and running
I see a related question about working with IT, which I think touches on this: how has IT gotten up and running? So, I mean, and Moody, again, your experience is probably instructive in this, but we started out a year ago, and really to this day, we still have this environment. Step one for us was getting the permission to just be our own DevOps team in the beginning, inside of a lab environment where we could do things quickly. Then, with that lab environment already set up so that we could start doing the work we needed to do, in parallel we formed that relationship with IT, and then we could slowly build up our dev and production accounts. But we had the lab environment that we were administering ourselves, which worked the way we wanted it to work, and we also had a model that we could show to IT along the way.
Yeah, I know exactly. I mean, you know, I used to have a full head of hair before, right? So, yeah, we started with a lab, right? The internal lab sort of environment. And then invariably, we basically Dockerized a lot of our work into sort of RESTful services, and by that nature, making it production meant that we had to transition some of those services to the software engineering team. We've been lucky: about four years back, there was a big drive to upgrade the skill set from Java and C and .NET, or C#, to basically Python, right? And we obviously had a strong audience in R as well. So a lot of the software engineers, the ones that do analytical work, at least have a background in Python, and they know a little bit about R, in order to be able to maintain it and extend features.
But yeah, I mean, it took four years or so to get to this point. Like, my first app was in 2014, and that was a batch operation that to this day I still maintain myself, so, you know... But I think Dockerizing things and making them much more like RESTful calls makes the whole process seamless. Of course, you've got to have some form of exception handling in place, but that's much easier for an analytical software engineer, or what used to be called a quant developer, to maintain in production and extend than necessarily the data scientists, who, to be honest, spend a lot of their job on exploratory data analysis and building and forming a model, right?
Like Shiny on Connect is kind of interesting, because once you build a very nice interface, it's very hard to hand it over to the JavaScript developers. So, I'm kind of in the boat where I'm learning what to do there. But yeah, I mean, it takes a bit, right? But the key is it does come top-down from the CTO, and, you know, giving them things like the RStudio conferences, I think Lander has some conferences and things, for awareness, right? So that they build up the skill set and they're ready to take over what I consider more business-as-usual work: version 1.0, you're the one involved; versions 1.1, 1.2, 1.5, it's the software engineers already doing the work, right?
Yeah, and I think, I mean, depending on what type of org everyone's in, Moody mentioned like a four-year process. We're about a year into our process, and I can envision it being about a four-year process to actually get there. So, you know, big financial orgs, things just take time to get to this point.
Language choice: R vs. Python
Yeah, thanks, Robert. I'm Ashish. I'm working as a senior data scientist at Madison Petroleum Corporation. So, Jonathan, you were talking about deciding whether to use Python, Julia, or R, and then you started with R. This is a similar story. We started our data science team about six years ago with two data scientists. One of them was using R, and one of them was using Python. And as we grew the team over the years, now we have about 15 data scientists, and the scale has tilted towards Python: we have more than 10 people using Python and about a couple of them using R.
And part of the reason I'm asking this question is we didn't want to, you know, constrain the creativity or the talent pool available to us by restricting the use of one language versus another, but it seems like there is just much more talent, or availability, of Python resources. And at the same time, as you and Moody were discussing, we are also in a multi-year process of setting up platforms and environments to, you know, operationalize. It seems like we are trying to set up Azure, and we have looked into some others as well, including DataRobot and everything. It seems like R is getting second-class-citizen treatment from most of them. So how do you manage that?
Yeah, so, I mean, again, my experience might be a little bit different. I mean, I knew from day one, really from day zero, that this was going to be an R-based team, because that's what I think is the best tool for what we do. We're a very research-intensive shop, and I think R is a superior tool chain just for research. But we do sit next to the quant development team, or I should say we will sit next to them when we're back in the office. Right now we just have a lot of Zoom calls with them. They're a team of pretty heavy Python users, and they haven't really found a great environment for what they do. So they are probably going to end up just cloning our R production environment and using it as a Python production environment, using the RStudio tool chain.
The talent pool has actually been a pretty interesting experience for us. So like I said, we have this internal data science accelerator, and I definitely would say that if there's 20 candidates per rotation, all of them will have Python on their resume. It's a very, very ubiquitous language. But then we will find a handful, maybe four or five, who are really passionate about R. And they're excited about it. They're fired up about it. So we've been able to find super high quality people, even though maybe there's fewer of them.
It's actually been kind of an interesting experience because it's almost gotten to the point now that we just tell the people who run the program, listen, if there's any really, really passionate R coders out there, just send them our way. So there's probably 15 teams competing for all the Python coders, and then there's like us and one other team competing for the R coders. So we kind of get the cream of the crop for R coders.
Well, let's see here. Would it impact us? Well, I don't think it would impact the end results that we could deliver, I guess, at the end of the day. I mean, I kind of feel like there are probably really skilled Python coders who can do what we do in R. I wouldn't be one of them, I can tell you that. So that would probably be the sign that I'd be headed back to the R world somewhere.
I mean, that being said, our end product is research reports and typically visualizations. So that's usually what we are building. And from that sense, I actually don't know if there's a huge population of Python coders who really want to dig into what we're trying to do and spend as much time as we do trying to build reproducible charts and things like that. And also, like I said, we use R Markdown for all of our reporting, because we just think it's the best templating tool out there. Obviously, you can use Python within R Markdown, so that would be a possibility.
I guess I don't know enough about the Python world to say whether there's some equivalent of R Markdown out there. I mean, every once in a while I'll see on Twitter, someone will say, hey, this is just like R Markdown but for Python. But then other people are like, nah, not really. It's not really like R Markdown for Python. But I'm not a Python person. So for what we do, R is the superior choice.
And I guess, you know, I wouldn't be here doing it if I didn't think that. And Shiny has actually also been really, really useful for us, because the blocking and tackling of what we do is that a lot of people really just want clean data. A lot of times we're just building Shiny front ends to SQL databases so that people can intuitively sift through data, see some interesting charts, and possibly download those charts or just the raw data behind them. So we've found Shiny to be really useful for that. Again, I'm sure there are Python coders out there who could build the same in Python. But Shiny works beautifully for us.
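A minimal sketch of the pattern he describes, a Shiny front end over a SQL database with a download button. This is an illustration, not the team's actual app; the SQLite file, table, and column names are hypothetical placeholders, and in practice you would swap in your own DBI driver and connection details:

```r
# Hypothetical Shiny front end to a SQL table: query with DBI, preview,
# and let users download the filtered raw data as CSV.
library(shiny)
library(DBI)

ui <- fluidPage(
  selectInput("sector", "Sector", choices = NULL),
  tableOutput("preview"),
  downloadButton("dl", "Download CSV")
)

server <- function(input, output, session) {
  # Placeholder connection; replace with your own driver/DSN.
  con <- dbConnect(RSQLite::SQLite(), "research.sqlite")
  onStop(function() dbDisconnect(con))

  # Populate the dropdown from the database itself.
  updateSelectInput(session, "sector",
    choices = dbGetQuery(con, "SELECT DISTINCT sector FROM metrics")$sector)

  filtered <- reactive({
    req(input$sector)
    # Parameterized query so user input never lands in the SQL string.
    dbGetQuery(con, "SELECT * FROM metrics WHERE sector = ?",
               params = list(input$sector))
  })

  output$preview <- renderTable(head(filtered(), 20))
  output$dl <- downloadHandler(
    filename = function() paste0(input$sector, "-metrics.csv"),
    content = function(file) write.csv(filtered(), file, row.names = FALSE)
  )
}

shinyApp(ui, server)
```

The parameterized query and the download handler are the two pieces that make this "clean data, accessible to 20 people" rather than a one-off script.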
Using reticulate vs. JupyterLab
Yeah, hi, I'm Travis. I'm a data scientist at Parsons Corporation. So I've been kind of the only data scientist on my team. Recently, there are some other team members who, well, I should first say we recently got RStudio Workbench. And then I've been tasked with also getting Python up and running. So I'm wondering how much sense it makes to really commit to using reticulate and Python within RStudio, or should I just, you know, use something like JupyterLab on that server to keep it separate.
Yeah, so we're actually exploring both paths right now, both reticulate and JupyterLab. Like I said, there's a quant team that sits next to us that's just a pure Python team. They're not really going to be using any R. From my experience with them and what they've gravitated towards, they're not huge fans of Jupyter; they kind of like R Markdown better. So as long as the package management is not too painful, though I think they're used to it being a little bit painful, which is just kind of part of the deal, I think they'll get a better experience. We've just tested things out with R Markdown and reticulate, honestly, but this is really just, you know, an n of one.
They're not huge fans of Jupyter, so I've found. Again, I'd love to hear what other people think about this, from really experienced, maybe hardcore, Python coders. I don't know that they dislike it, but they're not really that fired up about it. And the reporting capabilities in R Markdown, people usually just like those so much better. We have to do a lot of documenting and a lot of reporting, my team does and their team does too, because we're doing a lot of trading, and they want to have a nice record of what's going on. So the route we're planning to take with that team is the R Markdown and reticulate route.
But again, that's driven more by their preference, which isn't really that skewed towards Jupyter notebooks. In fact, I would say if they were already in love with Jupyter, then this would be kind of a nice-to-have, but they haven't found a great Python environment yet. So they're kind of intrigued that there's something new out there. But yeah, the package management is the thing; if that can be solved for the Python guys, I think they'll be psyched. But that's as far as we've gone with that. We've done a few demos of, you know, here's how RStudio Workbench works, here's the Jupyter hooks, here's our R Markdown and reticulate work, and here's how we can publish both up to Connect. And R Markdown via reticulate, I think it was just a newer, fancier tool. Also, you know, they've seen JupyterLab a million times, and have already kind of made the decision that it's not necessarily the tooling they want.
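A small sketch of the reticulate workflow being discussed, with illustrative names only (the virtualenv name and data are made up): R and Python share objects in one session, which is also the bridge that powers Python chunks in an R Markdown report.

```r
# Hypothetical reticulate session: pin the Python environment, then call
# Python libraries from R with automatic object conversion.
library(reticulate)

# Pinning the interpreter/virtualenv keeps package management reproducible.
# "quant-env" is an illustrative name, not a real environment.
use_virtualenv("quant-env", required = TRUE)

np <- import("numpy")

# R vectors are converted to numpy arrays on the way in,
# and results are converted back to R objects on the way out.
returns <- c(0.01, -0.02, 0.015)
mean_return <- np$mean(returns)

# In an R Markdown document, the same bridge lets a ```{python} chunk
# read R data as `r.returns`, and lets R read Python results as `py$...`.
```

The package-management point he raises is why the `use_virtualenv()` call matters: without pinning an environment, each analyst's Python setup drifts.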
Leveraging AWS and building a lab environment
I work in pharma, so we have a very large org, lots of different groups that are doing statistics and data science and working with IT quite a bit. And we are venturing into a somewhat new frontier, at least for my group: we're augmenting a lot of the Shiny apps and pipelines we're developing with AWS and the various services within it. So I'm kind of the pioneer in my immediate group on a lot of this. I'm trying to, first of all, learn the best ways to do it myself, but also be able to empower my teammates to get up and running fairly quickly. So I wanted to know, in your experience, how have you been making sure that the data scientists on your team are able to leverage these technologies well? I don't know if you have any advice about that.
This kind of goes back to the lab-versus-production environment. So we were able to get a lab environment set up for us where we had a lot of freedom in AWS. It's still behind our VPN; it's totally locked down. Like, you can only get to it from behind our firewalls, etc. But within that we got some freedom, and then we just started spinning up, you know, a couple of servers, and we built our RStudio Server, and we built a couple of databases in AWS ourselves. It was really helpful for us to learn exactly what we needed. And we've been doing that for a year now. And now that we're setting up a proper prod and dev environment, we can show it to our IT team and tell them to just basically clone it. So it has a double payoff: we got to learn the skills, and then they get to clone it. We really just learned about it by following the AWS docs and then doing some of their trainings.
The biggest roadblock we hit was honestly Git, and GitLab and GitHub; those were not blessed technologies here. So we had to use AWS's CodeCommit, which, I'm convinced we were the only team in the world using it, because I couldn't find any documentation on it, on Stack Overflow or anywhere. We eventually got GitLab. I shouldn't say we got it blessed; we found another team that had already brought it into the enterprise, and then we were able to hook into that. So now we're using GitLab for all of our versioning, but that was a big roadblock for us.
But I think the real key was that we needed an environment where we could learn and get things set up the way we wanted them. To get the freedom to have that lab, we really just told the truth, which was: if we have to wait for the dev environment, it's going to be six months before we can do anything. Which was true; that was the actual timing it would have been, and we would have just been sitting around for six months, so we needed the lab setup. They were able to get that done, and from there we could play around and learn how we wanted things set up. But getting the lab set up was critical.
Negotiating with IT and planning ahead
Yeah, I would say you're definitely not alone in that struggle. One thing we were able to negotiate, or just get squared away up front, was that once we got this lab environment set up, we'd be happy to migrate to an account fully managed by IT if they want it. But we said we're not migrating until we've confirmed that it works, because we can't just have everything break on us. So we've proceeded in phases: get the lab environment, get it producing value for the business, and then say, well, we can't just drop this until there's something else viable in place. So yeah, you just kind of ladder up.
I wouldn't say a year from now; that's way too far out. Hypothetically, yes, but if it's on the calendar for a year from now, that means realistically it's probably two years away. I would say three to six months, because three to six months is really just how long it takes to, I don't want to say get anything done, but to have a start-to-finish new implementation done and blessed and handed off. It's three to six months even for things that seem simple, because there are so many stakeholders involved, especially since we're part of a bank, so we always have a security layer that has to bless everything we're doing.
We generally don't want to go too far beyond three to six months. Now that I think about it, there probably are one or two things on the calendar like, in a year we need to have this stood up. That mostly relates to onboarding the Python team that sits next to us, because that can't start until our environment is fully built out, and since that's a different team, there's a different security layer that has to go into it. So yeah, three to six months is where we try to keep things.
Being a data science champion
I have a lot of thoughts about it; maybe the truncated version. The first thing we did was seek out other people who wanted to champion this stuff: like-minded people up and down the chain, very junior people and very senior people who really wanted to champion this. That was definitely step one, finding other people. And that was how we found this data science accelerator that we didn't know existed. So if anyone's at an organization that doesn't have a data science accelerator program: it's been super successful at Truist.
And most of what we're communicating is not really data science; it's the value proposition of what we're doing and why we're doing it. We always have one slide on our technology stack, encompassing RStudio, AWS, Snowflake, data vendors; that's all put on one slide of a 10-slide deck. Other than that, we're always just talking about the value we're bringing and how it's going to help non-data-scientists. That's a huge theme at the bank; it's always about lifting up teammates. So if data science can help enhance other people's work, that's something people are very, very supportive of. Upskilling too; a lot of people are interested in that, but it's a lot of work to help upskill people, if you can find people who are willing to take that on.
I think that's a big win also, but it's really a lot of work; it's very time consuming. In a big organization, that's been like a nine-month project just getting on the training calendar: okay, we want to start upskilling people on data science, teaching people about R, etc. Creating the content and doing it, that's the easy part; getting it on the docket has been the hard part. But I definitely think that trying to position data science as something that helps other people has been really important for us. Because again, there's some hesitancy about data science, some hesitancy about this new thing, this new technology that could kind of take over what other people are doing.
Yeah, positioning data science as something that helps other people, and finding other little pockets of data science champions, has been a good way for us to approach things and figure out who we need to talk to about different initiatives. We've done this, you know, this is our fifth time doing this now, and each time communication has come up as a core skill set, or a core activity.
Project management for data science
Hi everybody, I'm Alejandro from Italy. I work for a kind of data-driven company called Chattavet. We have this problem with management, in the sense that we sell products, and we make products that are deeply statistical: machine learning models, data science pipelines, ad hoc projects. I feel that we have this problem with project management: the senior managers handle things in the typical project management way, and on the other hand, I have some support from IT with a more software-development kind of management, and I feel that neither of the two worlds fits with data science kinds of projects. How do you handle this? From what I heard, you have more or less this approach of data science, not software development only.
Yeah, that's a really good question. Strangely enough, that was another thing I didn't really anticipate being a challenge here, but it was definitely something we had to overcome. Or I shouldn't say overcome; it was more that there was no infrastructure in place, and there wasn't even an informal way of doing things around that. We took a very data-sciencey approach and just leaned really heavily on GitLab, honestly. That was a big step forward for us; there really wasn't even version control when we got here. We had to bring that in, and once we were able to get version control blessed and have it be part of what we did, we really just ran all of our workflows around version control in GitLab. And we're a very small team; there's only four of us.
And we have autonomy in the way that we do things. So we just run things, honestly, through GitLab, and we have our own, I guess you could call it project management, but really it's just a project workflow that the four of us use. As we add more members to the team, we'll refine that workflow and everyone will use it from there. So I guess this is an area where we got maybe a little bit lucky: we got to start from ground zero, and we didn't really need to rewrite any rules.
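As an illustration of what "running the workflow through GitLab" can look like in practice, here is a minimal, hypothetical `.gitlab-ci.yml`; the job names, Docker image, and script paths are invented for the sketch, not this team's actual pipeline. Every push runs checks, and only the default branch deploys.

```yaml
# Hypothetical sketch -- image name and script paths are placeholders.
stages:
  - check
  - deploy

run-checks:                      # runs on every push / merge request
  stage: check
  image: rocker/tidyverse        # community R image with the tidyverse
  script:
    - Rscript -e 'source("tests/run_checks.R")'

deploy-app:                      # only after review, on the default branch
  stage: deploy
  image: rocker/tidyverse
  script:
    - Rscript deploy/publish_app.R
  rules:
    - if: '$CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH'
```

Even a two-stage pipeline like this gives a small team the "project management" side effects mentioned above: merge requests become the review step, and the pipeline history becomes the project log.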
On the project management side, I know what you mean; it's tricky, right, because at the end of the day you're trying to target a deliverable, and in some of the early pieces you're exploring, so you don't actually know what you'd end up developing to hit that deliverable in production. I struggle on the exploratory data side; you might have a bit of a different use case because you're selling that product immediately. So I tend to do small iterations and show value early, and then say, hey, you know what, in the next iteration we're going to have to try something completely different.
We buffer a lot early on, on the model design piece, because implementation after that is typically quite quick, and project managers typically understand that piece. But yeah, it's tough, right, because you don't have something prescriptive that you've got to implement, which is what most project managers are used to.
We have these different languages or objectives: managers want concrete things, and you say, okay, we've made lots of progress on the model fine-tuning, or the exploratory work, or whatever. And, okay, so is it ready? What percent ready? I don't know, we're making progress.
Yeah, it's tough. Like I said, I buffer, so I do small iterations, and after three or four I assume we basically have something close to prescriptive. The last one typically runs over a historical time period, to make sure that if we were to actually implement this in production at a wider scale it would run, but the first three or so are on a much smaller sample size. Look, it's not bulletproof, because the folks are used to a much more prescriptive style of implementation, right.
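The iterate-then-backtest pattern described here can be sketched in a few lines of Python. Everything below is a toy stand-in, not any real pipeline: the "model" is just a constant-mean predictor on a synthetic series. The shape is the point: early iterations fit and score on small samples for quick stakeholder feedback, and the final iteration runs over the full historical period as a pre-production check.

```python
import statistics

def fit(train):
    """Toy 'model': use the training mean as a constant predictor."""
    return statistics.mean(train)

def evaluate(model, data):
    """Mean absolute error of the constant predictor on some data."""
    return sum(abs(x - model) for x in data) / len(data)

# Synthetic stand-in for a historical series (in practice: real data).
history = [float(i % 7) for i in range(700)]

# Iterations 1-3: small samples, fast feedback for stakeholders.
for i, n in enumerate([50, 100, 200], start=1):
    model = fit(history[:n])
    print(f"iteration {i}: n={n:>3}, mae={evaluate(model, history[:n]):.2f}")

# Final iteration: the full historical period, before anything ships.
model = fit(history)
print(f"final:       n={len(history)}, mae={evaluate(model, history):.2f}")
```

The small-sample rounds are cheap enough to rerun when the approach changes completely, which is exactly the kind of pivot the early iterations are there to allow.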
One thing I ran into with scrum and agile is that people don't seem to remember what agile is supposed to mean. They take it as more of a "we have stand-ups and we have specific tasks." So having those conversations has helped somewhat: the purpose of agile is to allow iteration, and we don't always know where we're going, because exploratory data science is different from regular computer science and product development.
Looking ahead: cross-functional relationships
I guess I'm repeating myself a little bit, but for us at the bank, it's all about the cross-functionality that we're working on right now. In a year, we probably won't be able to hide behind the fact that we're a new team that needs to run lean and needs time to do stuff. So our enterprise data organization, our IT team, even things like our recruiters and the talent pipeline: all those are going to become little parts of what we do. I think for us, that's the next step over the next year: strengthening all that stuff.
You know, we're a year in, and we've reached the point where everyone gets what we're doing; they see the value in it. But for long-term success, I think the cross-functional relationships are going to be a big focus, which is a lot of time, a lot of meetings, a lot of time with hands off keyboards. It's just a different phase of what we're doing.
Yeah, I'm trying to think; it's a really good question. At least for us, there's no shortage of tasks that need to be done in that area, and it's very, very time consuming. Honestly, I find most data scientists don't want to do it. It's not really why anyone would say they want to be a data scientist, right? It's not the fun stuff of data science, and no one's really going to ask about what you're actually working on. So I guess I would just say: put your hand up, do it, and get involved in those conversations.
I spend as much time talking to our enterprise data organization as I do with the people who are technically on my team. And I spend as much time talking to IT, who again are technically a separate team, and I'm not talking about data science with them; I'm talking about the infrastructure we need and things like that. But you are Jonathan, right; if you're someone more junior, how would you go about that?
Yeah, and I think Jonathan and I are probably pretty similar in that sense. I think you actually said it earlier, Jonathan, and it resonates: talk about the value, not how you got there, right? To the business, we talk about a product being released, or some revenue potential, to the folks who are paying for it. We're not really talking about R or Python or even a model at that point; it's just the fact that there's some opportunity we're going to go after, and the fastest way to go after it is this, and this is roughly how we'll take the data, combine it, and push it out. Then you've committed to some timeframe, so the business will understand that. Right.
Now, on the other side, when you talk to the tech and DevOps folks, I think maybe I'm a bit lucky in a way, because they've kind of warmed to the whole R and Python environment. We also use AWS, and we can build against more rigorous source code management. And yeah, Jonathan, to your point: when you align with those folks, the conversation is not about data science. For us it's about the products, and then, you know, can we improve, for example, the indexing on some of our databases to speed up a data pull for a job, right, or looking at the data lake. We have the opposite problem, I think: we have a lot of data lakes.
Funny enough, you almost never really talk about the cool stuff you've done, the model piece, until you've documented it. For me, if you're on a sales call, selling to Jonathan for example, I'd talk about the model piece, but that's really it. You kind of have to speak different languages: one data, one software engineering, and one business domain. But to sell yourself, to get a project and move up, you're talking business 100% of the time, you know, business and cost.
People can find me on LinkedIn; I'm constantly posting little R snippets on there. So that's the best place to connect with me, and feel free to reach out anytime anyone wants to talk about R, finance, etc. That's what we do. And a shameless plug: I think you've written a book. Yeah, I wrote a book a while ago called Reproducible Finance with R; it's been three years now. It originated as a series of blog posts on R Views, so thank you, Joe Rickert, if you're on this call, and if you're not, maybe you'll see the recording. That was a long time ago, but we might crank out another book this year to update some code and talk more about the macro economy. Yeah, looking forward to connecting with everyone.
And while I do that: if this is your first time joining, we do these every Thursday at noon, and the Zoom link is always the same. Some exciting news: we're working on getting the recordings uploaded in full to our YouTube channel, so folks can go back and watch them; we're hoping to have that sometime within the next few weeks. But yeah, thank you so much, Jonathan, for joining us and dropping some knowledge. Sure, my pleasure. Thanks, everyone. And thank you, Moody, for the contributions. Hopefully see everyone next Thursday.
