
JD Long | Empathy in action: Building community of practice for analytics in a global corp | RStudio
Talk from rstudio::conf(2019) JD Long, Vice President of Risk Management & Data Philosophy at the global reinsurer Renaissance Re will share his experience with creating a "Community of Practice" for analytics inside of a global corporation. The theme of "empathy" will be recurring as he discusses how he worked to create a supportive learning environment focused on helping analysts "kick ass" regardless of their tool set. This means creating a community that's supportive of Excel, SQL, Python, and, of course, R. About JD Long: I build models. And according to George E. P. Box, my models are wrong. My skill is understanding when and where my models are useful. I'm an experienced risk and data scientist with a background in insurance, reinsurance, market risk, and stochastic modeling. I'm the guy who can build a Monte Carlo model, help parallelize the model to run on Amazon's cloud services and then stand in front of a general audience and put the work in context where everyone understands. My super power is thinking probabilistically, understanding risk, and communicating clearly. I have a history forming bridges between IT and business teams
Transcript
This transcript was generated automatically and may contain errors.
Hey y'all, I'm JD Long, and I thought I'd pick up our lecture from where we left it off last year at my session. Actually, it was a little bit uncomfortable, because I thought I was going to come in here and totally troll the crap out of you guys, because I'm going to talk about Excel and spreadsheets, and I got nothing left, so I just made up another presentation in the last five seconds of the keynote. So let's just kind of go together and see where this goes.
Now, I'm vice president of risk analytics at a global reinsurance company, and don't worry, nobody else knows what that means either, and it was very confusing for my daughter when she was growing up. So I'd go to parent-teacher conference and things, you've got to explain, like, oh, what do you do? And so I just had shirts printed up, because it made kind of everything easier to explain.
So I'm very empathetic to the role of spreadsheets. My whole life has been around spreadsheets and programming. I got to give a quick disclaimer. I'm going to mention that I work at Renaissance Re, and I usually never mention it, because the stock trades down every time I'm associated with a company. I'm not giving financial advice, and I'm not giving forward-looking statements. My junk is on GitHub, though. Feel free to get this presentation. Anything that went into it will be there.
Helping Excel analysts kick ass
So this was my great trolling comment I was going to lead with that I thought was going to be provocative, but kind of seems lame in comparison to what we just sat through. So to migrate Excel analysts to coding, help them be better at Excel first. This is kind of what I'm going to progress with today, and, you know, if you fail to migrate them to R or anything else, at least you'll have better Excel, and that's not nothing.
So I'm coming from an environment of a workplace that's largely based around Excel. Now I am suspicious the business analyst work environment is not that different than analysts in nonprofit or people who work in maybe finance or other places, other types of organizations. So I'm going to speak kind of my first-person experience, which is around business, but I think a lot of this is applicable.
Some of you all may know me. I run my mouth on Twitter way too much. Been involved in the R community there for years. Paul Teetor, a friend of mine, he and I are working on a second edition of this book for O'Reilly. We're in final editing, coming soon to fine booksellers near you. And no presentation at this conference would be complete without an Allison Horst diagram. Now unlike everyone else who is lifting things that Allison put on Twitter, this was a custom piece. Allison and I discussed what I was presenting, and she came up with this, and I think it's fantastic.
So corporations tend to be sad silos. We've got our business user, Clippy and Excel and Word, and they all go to the same parties together. And we've got, you know, GitHub is over here with the coder-type people. Jupyter's a little concerned because he's been reading all the crap on Twitter about how notebooks might not be a good idea, and he's a little nervous. And R, man, has beautiful plots but is all alone over in the corner. And I am here to tell you it doesn't have to be this way.
So we can have a totally different relationship with these tools, but it's not going to be the default. And I am convinced the reason we end up with everybody in their own little silos is largely we're tribal animals, right? We look and we say, oh, I am a coder, therefore I value using these tools this certain way, and the way you're using the tool looks illegitimate to me because you don't have unit tests and you don't have this, and you don't always use version control the way I would use version control, and therefore I'm more legitimate than you, and yes, totally what we just listened to earlier today.
I think we need to broaden our tent a little bit, and this is very much the theme we just heard in the keynote, that we're using some similar tools, we between programmers, analysts, business analysts, people who use Excel only, we're using some different tools, but there's some overlap. And I'm not going to be using Python or R or any of the coding tools in quite the same way that the professional developers who I work with use, but I'm still using them, and it's still production, and what's important for us to think about is nobody wants to learn coding. All we want to do is we want to kick ass.
And if you go into helping other people learn a new tool and your mindset is I'm going to use a pedagogical method to profoundly change the way they think, nobody cares. Like nobody cares. They want to be a superhero, and it's important to think about how do analysts think of themselves. They're solving problems, they're doing heavy lifting, and they're doing it all while it's on fire, and they are not going to take the time to necessarily learn a completely new philosophy about how to think about the world, because they're too busy fighting fires and wanting to kick ass.
So if you approach helping people get better at what they're doing, we need to think about how we help them, because we often don't exactly help them get there in a way that's meaningful to where they are.
The suck threshold and Kathy Sierra
Now, who in here has heard of Kathy Sierra? Anybody? Because I'm totally ripping off like the rest of my presentation is like Kathy Sierra rebranded. She was a developer advocate before developer advocacy was even a thing. She wrote a number of books on Java, comes from a Java background, but all the way down to the phrase kicking ass comes from Kathy Sierra here, and I'm just recirculating it for our use case, but she has this wonderful graph about learning, and there's this suck threshold, and what we're trying to get to is the passion threshold. So what gets people feeling passionate about any tool is they feel like they're kicking ass with it, and the suck threshold is when it is horrible, nothing makes sense, and we can't figure out what to do next.
And if we start trying to teach, especially to this firefighter who's dead lifting flaming weights and is worrying about the whole company coming down, and if we spend time trying to teach them here's five different types of objects you may encounter, you're just moving along the time axis, and you aren't moving them up the ability axis. So I would like to, my kind of key takeaway here is as you're trying to talk to people in your organization about upping their skill, whatever the skill is, get them doing things they perceive as useful, way more important than learning a body of knowledge.
Named tables in Excel as a bridge to coding
Now back to my whole, okay, help people migrate from Excel by helping them be better at Excel, let me give some specific examples, because that's a bit oblique. This is Excel on Mac with a dark theme, because I'm cool. Very hip. Tell my 11-year-old. Okay, so we've got this workbook called Iris workbook, because we've got to use Iris, right, even though it's Excel. I'm blending cultures here.
We've got this Iris workbook, and I've introduced a sheet called Iris sheet, and it has a table in it, a named table called Iris table. Now the concept here that I think is very important for users of Excel to begin to use is this idea of a named table. Who in here is familiar with this idea of naming a table in Excel? Okay, cool. So a big chunk of people. I have noticed this is used very inconsistently, and sometimes analysts who are experienced with Excel don't even know this exists, and it's important to help them understand the value of this, because you can do really neat things like reference columns in a function. That feels a little bit like using a programming language, right?
So this is my other provocative idea that's no longer provocative. We're learning programming in a reactive functional environment, and that's good. This isn't a problem. This is a good thing. And we're helping an analyst or a traditional Excel user begin to think about a data frame and named columns in it and doing operations on those. And this is a vectorized operation, and it's super. They can then use these things other places, like use that table name when they make a pivot table. This is a natural segue into things we do in R.
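To make the parallel concrete, here is a minimal sketch in R using the built-in iris data. The Excel formula in the comment assumes a table named Iris_table whose column headers match the R column names, and the Sepal.Area column is a hypothetical example, not something shown in the talk.

```r
# Excel structured reference inside the table (assumed names):
#   =Iris_table[@[Sepal.Length]] * Iris_table[@[Sepal.Width]]
# The same idea in R: a vectorized operation on named columns of a data frame.
iris_df <- iris
iris_df$Sepal.Area <- iris_df$Sepal.Length * iris_df$Sepal.Width

# A pivot-table-style summary: mean of the new column for each species
area_by_species <- aggregate(Sepal.Area ~ Species, data = iris_df, FUN = mean)
print(area_by_species)
```

In both tools the calculation refers to columns by name rather than by cell address, which is what makes the step from one to the other feel small.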
The 10% better approach
One of the other things that's very tempting to do when you know the tools that we all know is to say, oh, oh, I'm going to take your workflow, and I'm going to help you fix it. Right? And I'm going to help you. You know, we're going to just do it all over in R. The philosophy that I have adopted after failing miserably at that and basically being told to go sod off is help people take the hard parts, the high friction parts, and build some automation around that. Don't feel like we have to change someone's life completely. Make an incremental change.
We've started in my team calling this the 10% better approach rather than trying to completely fix a workflow. Every time we do it, we try to make it 10% better. We have made more progress through the 10% improvement approach than we have through any number of projects that were grandiose and ultimately failed because the scope was too big. So I can make anything I do 10% better.
You know, but that R environment, though. So one of the challenges for new R users is setting up the environment. So where I work, we've taken it out of the system. We use JupyterLab anyway. So we have a JupyterLab login, log someone in, gives them the R kernel, right? That's pretty familiar. And it authenticates the user, mounts their drive space, loads their scripts. But we've got an extra menu option in ours, because we just launch RStudio from here, you're authenticated, and a user can, like, play with R without having to put a request into IT, without having to work out those database connections. You know, it's just there. And the ability for low-friction play is important.
Now, I agree it's not enough to just, like, have it there and expect users to gravitate and start doing things with it. But if they don't have to ask permission, like, they don't have to get a service desk ticket in order to get a piece of software installed, the probability of them playing, experimenting, doing something goes way up. And especially if you've already got database connections defined or file paths or whatever makes sense in your organization.
Pulling data in and out of Excel with R
Now, Excel can be a bit of the Vise-Grip locking pliers tool, right? It's a fantastic tool. Does all kinds of things. Make dashboards with it. It's fantastic. We can put a user interface on top of it. It's fantastic. And we use Excel, right, in a lot of the same ways. But this is the assembly line, obviously, right? So this is a professional production environment. I can't tell from the picture, but I guarantee there's 25 pairs of Vise-Grip locking pliers on this assembly floor, right? The principle with the tools isn't, oh, a professional production tool versus not a professional production tool. Vise-Grip pliers are used in production, right? They just aren't holding the doors on with them, right? They're being used for very specific purposes.
Let's give an example of a workflow that I have found helpful in illustrating pulling data in and out of Excel, right? Because remember, the objective is not get someone out of Excel. It's how do we make their workflow easier? Well, a common piece is do an ETL process in R, plop the data in an Excel spreadsheet where you're refreshing existing data, right? And a lot of the tools kind of assume you're exporting to Excel from nothing. But I have a world where analysts have built lots of good something inside their Excel.
So hey, let's load openxlsx, and let's drop in our iris data. We can do that with the writeDataTable function. And the thing I would like to point out is the two lower parameters, tableStyle and tableName. This is where we get named tables in Excel by using this feature. So if they have built in their spreadsheet formulas already, referencing tables and columns as I have been trying to encourage them to do, we can replace their data. Now best practice, I don't illustrate it here, delete the first table out and then drop the new in. Because otherwise, if you drop fewer rows in than were underneath it, you got bad data peeking out the bottom. Not good. So delete the first out, and that's not shown in this workflow. But then write it in, give it a table name, and it is in their environment where they're used to working with it. And we have just made their process hopefully at least 10% less painful. This is the entry drug. The first hit's free.
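The refresh workflow described here can be sketched roughly as follows with the openxlsx package. This is an illustration under stated assumptions, not the code from the slides: the file, sheet, and table names are hypothetical, the first half only builds a stand-in for an analyst's existing workbook, and the removeTable step is the "delete the first table out" practice that the talk mentions but does not show. As I read the openxlsx documentation, removeTable deletes both the table object and its cell data.

```r
library(openxlsx)

# Hypothetical file, sheet, and table names, chosen for illustration only
path <- file.path(tempdir(), "iris_workbook.xlsx")

# Stand-in for the analyst's existing workbook: one sheet with one named table
wb <- createWorkbook()
addWorksheet(wb, "iris_sheet")
writeDataTable(wb, sheet = "iris_sheet", x = iris,
               tableStyle = "TableStyleMedium2", tableName = "iris_table")
saveWorkbook(wb, path, overwrite = TRUE)

# Refresh step: load the workbook and remove the old table first, so that
# stale rows cannot peek out below a shorter replacement
wb <- loadWorkbook(path)
removeTable(wb, sheet = "iris_sheet", table = "iris_table")

# Write the new data under the same table name the analyst's formulas use
new_data <- head(iris, 100)  # fewer rows than the original 150
writeDataTable(wb, sheet = "iris_sheet", x = new_data,
               tableStyle = "TableStyleMedium2", tableName = "iris_table")
saveWorkbook(wb, path, overwrite = TRUE)
```

Keeping the table name stable is the design choice that matters: formulas and pivot tables that reference the table by name keep pointing at the refreshed data.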
Questioning the automation ROI fallacy
I think right now Mike Smith is giving a presentation showing this exact same XKCD. So this is getting some legs here. We had a logical fallacy in my team for a long time, or at least I perceive it as a logical fallacy. We looked at this, and this is the whole, if it takes a long time to automate a task, but it doesn't save you a lot of time, don't do it, kind of seems obvious. Well, this illustration, like so many things, is a model. That's a model of reality that isn't actually reality. And it's wrong. Because all models are wrong by definition. They're actually fairly useful to the extent the assumptions hold. Let me tell you a few of the assumptions that are wrong with this image.
So one of the assumptions is all time is of equal value. So we can't take, you know, five hours to fix this because it doesn't save us that much time. It doesn't save us five hours over the course of some time period. The fallacy there in my world is unequal time value. So right around the end of the quarter, we have a flurry of reporting activity. And that time is incredibly precious. August, however, nobody's there. Who cares what we do in August, right? So we can take time in August. And if we can save a few minutes or a few hours during our very, very busy period, that's a great return. And it's because our time isn't of equal value.
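A small worked example may help show why the unequal value of time changes the answer. Every number below is made up for illustration, including the idea that a quarter-end hour is worth four quiet hours; none of it comes from the talk.

```r
# All numbers are hypothetical, chosen only to illustrate the argument
build_hours   <- 5    # time spent automating, done in a quiet month
saved_per_run <- 0.5  # hours saved each time the report runs
runs_per_year <- 4    # one run per quarter-end

# Relative value of an hour: quiet month = 1, quarter-end crunch = 4
value_quiet  <- 1
value_crunch <- 4

# Naive comparison: every hour counts the same
naive_net <- saved_per_run * runs_per_year - build_hours

# Weighted comparison: the cost is paid in quiet hours,
# the savings arrive in crunch hours
weighted_net <- saved_per_run * runs_per_year * value_crunch -
  build_hours * value_quiet

cat("Naive net hours in year one:   ", naive_net, "\n")
cat("Weighted net value in year one:", weighted_net, "\n")
```

With these invented figures the naive view shows a loss of 3 hours in the first year, while the weighted view comes out 3 quiet-hour equivalents ahead, which is the point of the argument.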
Another assumption is that the current frequency of reporting is what we would want to do in the future. And we have found this not to be the case at all. Because we have automated things, we discover we run them more often and we want to run them more often. We want to get the analysis that we thought we only needed once a quarter. Oh, if we can get that daily, well, that might change how we do things. So it ends up not being stationary.
Another assumption is that all time spent doing analytics is of equal utility. Some tasks are so awful and tedious that your analysts will quit and go work at Facebook instead of doing them, right? Don't make your analysts run away. Help them automate those manual processes. Not because they take a long time, but because they suck their joy. And the other assumption that's fallacious is that automated and manual workflows produce the same product. This is not the case. The product produced out of an automated process should be more accurate if you're doing it right. And if it's not, it's at least inaccurate at scale, which is fantastic.
The idea of our 10% better approach came from Dan Harris' book, 10% Happier, about meditation. And we used to call it actually our incremental improvement process, but then we had a different project called incremental and we can't have a naming clash and naming things is hard. So anyway, we ended up calling this process 10% happier. And we have found that it has truly, sorry, 10% improvement. We have found that it has made us 10% happier. And we spend more time in this dynamic environment where our tools are getting along and the people aren't in silos. And this is obviously not where we live every day, but it's the state that we're aspiring to be in more. So with that, I'll wrap up. I want to point out the art here is by the wonderful Allison Horst. Shout at her on Twitter. She's fantastic and wonderful and did these. We dialogued only online and she came up with this wonderful artwork. So I got to thank her.
Now I've got a minute for some questions. Let's see. We got mics. Oh, go for it. So I had this custom made and it was, I realized as I was coming up here, I'm like, oh, you know, I guess I should have done like an Etsy store or something because I kind of think of it. This may resonate. This may be other people's love language as well as mine. We got other questions. Oh, right back here.
The question was, do I have other tips or resources? I'm afraid that I don't and the ones I do have are super focused on the workflows that we do. But I think this is a neat idea and if someone would put together like an online resource of this and I'll give some thought to it, I think that would be fantastic because I'm sure there are design patterns that get repeated over and over. Like my example of popping data out and popping it in, I'm suspicious that's a common design pattern. But sorry, I don't have anything.
