
It's Abstractions All the Way Down... - posit::conf(2023)
Presented by JD Long

Abstractions rule everything around us. JD Long talks about abstractions from the board room to the silicon. Over 20 years ago Joel Spolsky famously wrote, "All non-trivial abstractions, to some degree, are leaky." Unsurprisingly, this has not changed. However, we have introduced more and more layers of abstraction into our workflows: virtual machines, AWS services, WASM, Docker, R, Python, data frames, and on and on. But then on top of the computational abstractions we have people abstractions: managers, colleagues, executives, stakeholders, etc. JD's presentation will be a wild romp through the mental models of abstractions and a discussion of how we, as technical analytical types, can gain skill in traversing abstractions and dealing with leaks.

Materials: https://github.com/CerebralMastication/Presentations/tree/master/2023_posit-conf

Presented at posit::conf(2023), September 19-20, 2023. Learn more at posit.co/conference.

Talk Track: It's abstractions all the way down...
Session Code: KEY-1161
Transcript
This transcript was generated automatically and may contain errors.
You know the drill now. I'm about to introduce the next keynote speaker, JD Long, with a poem.
Introducing JD Long, a legend in the field. Open source wizard, his knowledge unsealed. Since 2002, he pondered with might. Open source tools, can they take flight? Look at risk management, he knows it well. 13 years deep, with stories to tell. With R and Python, he crafts his art. JupyterLab, RStudio, all play their part. In Richmond, Virginia, he makes his home. With wife and daughter, they brightly roam. A recovering lawyer, a philosopher teen, in this vibrant family, dreams are seen.
You know, it's funny. I didn't realize Hadley was doing the poems before everyone. And this is like, I'm an artist who's been put out of business by ChatGPT, because, I don't know, 13 years ago or something, I spoke at the R/Finance conference here in Chicago, because that's run out of here in Chicago, and I did a lightning talk.
And my gag was, I did it in Seussian rhyme, and it was hilarious, right? Because nobody comes to a freaking finance conference and gives a whole lightning talk in Seussian rhyme, and now anyone can do it, and they hardly have to work at all. Ah, so frustrating.
Well, I just want to say hi to you all. I love being here. It's exciting to be back in Chicago. I helped start the Chicago R User Group many years ago, and it started as, well, we had some guys (they were all guys at the time), R/Finance guys and myself, and we were going to Jake's Tap and talking about R, right?
Fantastically good times, and we decided we should just invite other people, right? We're having good conversation, and what we'll do is we'll do what we would like to participate in, but we'll include other people. And it's so lovely to see that the Chicago R User Group is still, you know, alive and well and kind of following that ethos of, you know, share the things you're doing and include other people, and that's really kind of an R community ethos and also, you know, an RStudio slash Posit ethos, and it makes for a fantastic community.
So speaking of that, like four years ago, I showed up at the RStudio conference at the time and gave a presentation, and I wore, you know, a shirt with this pattern on it because I tried to explain to people what I do for a living, and this was the best way to explain it because I'm an agricultural economist who works for a global reinsurance company, so it's easier just to say spreadsheets and bullshit.
So, disclaimer: I do work for a financial services firm, RenaissanceRe, and I am not my employer, nor am I representing them. I'm not discussing reinsurance here, and all the ideas here are mine unless otherwise stated. This isn't business, this is pleasure.
I was trying to think how long I've known Hadley and JJ, and I knew them separately before they were working together, and I stumbled on this. Back in 2010, for Hadley Wickham's birthday, so almost 13 years ago last week, I sent Hadley a copy of Generalized Additive Models: An Introduction with R, which had been on his Amazon wish list, because he was a poor, starving faculty member at Rice University, and he was maintaining open source software, and not only that, he was answering my stupid questions on Stack Overflow.
And I wrote in the gift note, thanks for helping me kick ass with plyr (that's the precursor to dplyr) in R; I appreciate the tools and the help you've given me on Stack Overflow. And this is the kind of community we have, right? And if anybody is wondering how you become a keynote speaker at Posit, the price is $68.78 and 13 years.
Abstractions and leaky abstractions
I had no idea when I started forming this presentation that Jeremy was going to be one of the keynote speakers. But I really enjoyed listening to JJ and Jeremy do this thing they called a two-way AMA, which, I didn't know that's what it's called. I've been calling it having a conversation, and I'm just not really cool, right?
And one of the things that jumped out is they made the comment that if you think about how you help both new users ramp into things and make experienced users productive, you provide these abstractions, and there's a dial for how leaky you want the abstraction to be. Now, a bunch of us who've been around software, maybe worked in software engineering, or at least talked to software engineers, know this idea of abstractions and leaky abstractions.
In this community, a bunch of us come through clinical sciences or we come through other fields that aren't computation first, and I was thinking this term is really powerful, both abstractions and the idea of an abstraction leaking. These are really important concepts that I think we should more widely ingest.
So when Hadley contacted me and said, hey, you want a keynote? I'm like, yeah, what's themes or whatever? And he's basically like, I don't know, you use Python and R, maybe like some, I don't know, do whatever you're thinking about, right? Well, it just happened, I was thinking about this, and this has been one of the things I've been thinking about with my team and the people I work with, is how do we talk explicitly about abstractions, leaky abstractions, and how we deal with those leaks?
So let's talk a little bit about abstractions and leaky abstractions. The first thing I did was go back; I thought I knew where this came from, and I confirmed that while the term leaky abstraction was around in the zeitgeist, it really didn't get traction in the tech community until Joel Spolsky wrote this blog post over 20 years ago. He calls it the Law of Leaky Abstractions, and the law is: all non-trivial abstractions, to some degree, are leaky.
And what he means by that is abstractions fail; sometimes a little, sometimes a lot, there's leakage, things go wrong, and it happens all over the place when you have abstractions, right? So it means you have this thing you're relating to, it's abstracted away so you have an interface, an API, a call to a function, and you interact with it in a way and it doesn't do what you expect. That's a leak.
So the question becomes, what do you do? If you want to be a master of any abstraction, not just a user, but a master of the abstraction, you have to understand what's under the abstraction. That's the only way you can truly debug or truly understand an abstraction: understand at least one layer beyond.
So when we think about abstraction, I want to expand what I mean, because many of you are probably thinking something like this, and I just Googled computing abstraction, right? At the high level, you have a high-level language or an application, and then it goes through an assembly language program, and the assembler turns it into machine code, and then there's a bunch of hardware abstractions, and then an actual calculation gets done inside the hardware of the machine. That's how we often think about abstractions.
I want to expand that for the purpose of this conversation, because there's also organizational abstractions. So if you think of an abstraction as we've got some set of directives we're passing down, and we kind of don't care specifically how things get done in the next layer, we just want the next layer to do something, well that's not unlike organizational structures.
I come from corporate America, right? But this general concept applies to your nonprofit or your software company or even your civic organization. At some level, we have a board of directors or some committee, and they pass down directives and priorities to the executive management. The executive management makes a bunch of choices and then passes things down to department heads, who pass them to team leads, who pass them to team contributors, and if we continue to think about this, they then pass those on to computers in some way, right? They pass them in because they use applications or high-level coding languages or something, and then all that other stuff from the slide before happens underneath this. It's abstractions all the way down, right? There's the title of my talk.
So why though, like why do we need these abstractions all the way down? Wouldn't it be easier if we just had, like I grew up on a farm, and the great thing about farming is you have to do everything because you don't have staff. The worst thing about farming is you have to do everything because you don't have staff, right?
So you become a master of every level of abstraction until you get in over your head and you have to have John Deere repair some piece of machinery or something, right? All the way up and down the levels of abstraction, you become at least proficient. It's been hard for me to get used to organizations that didn't expect me to run up and down the stack, right? Because I want to run up and down all the abstractions.
Well, the reason we can't always do that was really articulated back in the 50s by Herbert Simon. He's an economist, I'm an economist, so I've got to get economists in here. He wrote Administrative Behavior: A Study of Decision-Making Processes in Administrative Organizations, and he coined the phrase bounded rationality, which I TLDR as head trunk only hold so much junk, which Gary Larson captured in this cartoon where it says, Mr. Osborne, may I be excused? My brain is full.
We can only handle so many levels of abstraction and so many pieces of the stack before our brain overflows, and we can't make sense of all the pieces. And so we build these interfaces, and even if we are the person traversing the levels of abstraction, we would like to interface with different pieces of them and not think about what happens below them, even if we wrote what's below them, because it means when we're problem-solving here, we don't have to think about how the read-write is happening on the database. That just magically happens behind an abstract interface, and we don't have to think about it, and it allows us to work at the problem-solving level that's appropriate for what we're trying to accomplish.
What abstractions are and are not for
So let's talk a little bit about what abstractions are for and what they are not for. So one thing I want to point out that they're not for is they're not for gatekeeping, and I see this being done a lot, right? You're not a real data scientist unless you, you know, PyTorch or deep learning or whatever. Those are all different abstractions, different tools, that are used in certain places to solve certain problems. Those may not be your problems. Then you don't need to know that abstraction. You don't need to be a master of that abstraction.
You don't need to be one layer below, and I watch a lot of early learners run around learning abstractions, learning tools, because they feel like if they don't know this tool, they're not a real whatever-it-is-they-think-they-want-to-be. That's really toxic, because you'll wear yourself out because this guy has already said you can't fit it all in your head in a really useful way. So don't let the learning of abstractions be like, oh, once I accumulate a big enough toolbox of these abstractions, then I'm a real whatever. That's just gatekeeping, and cut that out.
You don't even have to know all the abstractions you use deeply. But you do need to know your limits. So know which abstractions you really understand. Recognize when you're up against an abstraction that you don't grok, so you don't understand the abstraction. It's a breakpoint. You're like, I don't really understand what's going on beyond here. At that point, you have a choice. You can either learn that abstraction, learn what's really going on beyond it, so that you can deeply understand it, or you can partner with someone who's an expert there.
Partnership and pairing with someone else and working with someone else is always an option. It may be harder in some organizations than others, especially if you're the only person on the data science team. You may feel some pressure to learn those abstractions, and that may be the right choice. But if you're in a larger organization, and someone else in the organization is a master of that abstraction, you may not need to become the expert on database indexing.
Now, what I see happen a lot is people blame an abstraction for problems when they bump up against it. And often the problem is, they don't understand what the abstraction is doing. Now, that may be a leaky abstraction, but still, it's like, okay, my dashboard doesn't refresh fast enough. My database has a problem.
I had literally this one within the last year. I worked with a Power BI developer, and we discovered that when using DirectQuery in Power BI, which is how you access any database that isn't the one built into Power BI, it issues all the queries in series and will not issue them in parallel. So the dashboard had, you know, 13 queries that each took three seconds. They could have been run in parallel, and the whole thing would have refreshed in three seconds. Instead, it took 13 times 3, about 40 seconds, because it was running them in serial.
And the analyst had thought, oh, this database is crap. And I'm like, actually, Microsoft is shunting you into buying more cloud storage so you can shove your data into their platform instead of letting you use your own perfectly good database because they force your connection to issue queries in serial. And I'm like, that's broken, right? That's not a leaky abstraction by accident. That's a leaky abstraction by sales, right? And that should make anybody that runs into that one angry.
But that's an example of don't blame the abstraction. Understand the abstraction. Understand what's going on. And then decide what you want to do with that information.
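The serial-versus-parallel arithmetic he's describing is easy to see in a few lines of Python. This is a simulation of the dispatch pattern, not Power BI's actual connector: each simulated "query" just sleeps, so serial dispatch costs the sum of the query times while concurrent dispatch costs roughly the slowest one.

```python
# Simulation (not Power BI): why issuing identical queries in series hurts.
import time
from concurrent.futures import ThreadPoolExecutor

def run_query(_):
    time.sleep(0.05)  # stand-in for a 3-second dashboard query
    return "rows"

queries = range(5)

start = time.perf_counter()
serial_results = [run_query(q) for q in queries]  # one after another
serial_elapsed = time.perf_counter() - start

start = time.perf_counter()
with ThreadPoolExecutor() as pool:                # all at once
    parallel_results = list(pool.map(run_query, queries))
parallel_elapsed = time.perf_counter() - start

print(f"serial:   {serial_elapsed:.2f}s")   # roughly 5 x 0.05s
print(f"parallel: {parallel_elapsed:.2f}s")  # roughly one query's time
```

With 13 three-second queries, the same pattern is the difference between a three-second refresh and a forty-second one.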
If I do anything where I dabble with, like, computer science-y concepts and I don't quote Edsger Dijkstra, I would probably be remiss. He has this great quote: Programming, when stripped of all its circumstantial irrelevancies, boils down to no more and no less than very effective thinking so as to avoid unmastered complexity, to very vigorous separation of your many different concerns. Right? And the TLDR is: constrain complexity and separate concerns. That's what we're trying to do with abstractions.
Floating-point math: a leaky abstraction
So, let's do a fun example about thinking about abstractions. Electricity is a fundamental abstraction for computing, obviously, right? We know this is all powered by electricity. We know, like, we've got gigabit switches in our office, so that's a billion bits of information a second.
So, just think about this for a minute. How fast do the electrons flow in our wires? Yeah, no, it's 8 centimeters an hour at 1 watt. What? Right, okay, who was surprised by that number? Like, it kind of defied your intuition. The problem is we conflate electromagnetic fields with electron movement, so the electromagnetic field moves really fast. The actual electrons move quite slow.
But why do many of us not know this, right? Because we use electronics en masse every day to solve all our problems. Well, the reason we don't know this is it doesn't matter. It doesn't change anything. There's almost nothing you'll ever do where the movement of the electrons matters. That abstraction never leaks.
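His centimeters-per-hour figure passes a sanity check. Electron drift velocity is v = I / (n · A · q); plugging in assumed values (a copper wire of 2 mm² cross-section carrying 1 A — my assumptions, not the slide's) lands in the same ballpark:

```python
# Back-of-envelope electron drift velocity: v = I / (n * A * q).
# Assumed values (not from the talk): copper household wire at 1 A.
I = 1.0          # current, amperes
n = 8.5e28       # free electrons per cubic meter in copper
A = 2.0e-6       # cross-sectional area in m^2 (a 2 mm^2 wire)
q = 1.602e-19    # electron charge, coulombs

v = I / (n * A * q)               # meters per second
v_cm_per_hour = v * 100 * 3600    # convert to cm/hour

print(f"{v_cm_per_hour:.1f} cm/hour")  # ~13 cm/hour: same order as the slide
```

Different wire gauges and currents shift the number, but never out of the snail's-pace regime; it's the electromagnetic field, not the electrons, that moves near light speed.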
So, buckle up. Let's talk about floating-point math. This is my favorite leaky abstraction. And by the way, I'm only going to have two slides with code, and they're both Python because we've got a big community. We're a very open tent.
You know, if you took grade school math, and everyone in here did, at some point the teacher, if they were very creative, did a pretty slide like this one to explain the associative property of addition: it doesn't matter how you group the things you're adding, you get the same answer. We learn this very young. And this is how addition and multiplication work.
But then we come into Python, and we go, okay, well, we've got these numbers. They're real numbers. They're on the number line, right? We're going to take 1.11, add 2.22, and then we're going to add 3.33, and we're going to group them a little differently, and the results are not the same. And the reason they are not the same is that floating-point addition and multiplication are not associative.
And the reason why is illustrated when I format the print on these to show 17 decimal places. Because way out there in it-doesn't-matter-practically-to-you land, most of the time, out there in the 14th, 15th, 16th, 17th decimal place, you can see these numbers are just a little bit different. And that's because floating-point numbers are an abstraction. They're not literal points on the number line like we were taught in grade school. They're something different, to make numbers work in computers.
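The non-associativity he's showing can be reproduced in a couple of lines. I'm using the classic 0.1/0.2/0.3 values rather than the 1.11/2.22/3.33 from his slide, since the exact printed digits depend on the values chosen:

```python
# Floating-point addition is not associative: grouping changes the answer.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c
right = a + (b + c)

print(f"{left:.17f}")   # 0.60000000000000009
print(f"{right:.17f}")  # 0.59999999999999998
print(left == right)    # False
```

Both results are "0.6" for any practical purpose, and yet they are different doubles, which is exactly the kind of leak that shows up only when you compare or sort them.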
And I'll tell you a very quick story about how I spent two days with an engineer last week, and it turned out to be a floating-point math problem. But it didn't come at us as just value A not equaling value B. What we were doing was a bunch of math, and then we rank it. And after we rank it, we then say, okay, I want the row number, and we use a second sorting column to say, basically, if there are ties, use this other value. Use this deterministic number over here to break ties.
Because we want all of our systems to be deterministic, meaning there's no random number generation here; every time I run it, I want it to barf out exactly the same answer. No swapping values around, right? This is huge. It ingests like 5 billion records and barfs out 11 million after a bunch of aggregations. And we were finding we weren't getting deterministic answers. Now, out of 11 million, maybe we would have 30 that would change every run.
And we were like, that's not cool. What's going on? And as we dug into it, what we found is that some of the conditions that should have been ties were like the situation I was just showing you, where it's conceptually the same number, but they were flipping. And the reason it was happening is we were on Spark. Spark's a distributed system. You don't control which executors get the data or how the groupings are done, because it's MapReduce-style. So every time we ran it, maybe we ran it with a different cluster size, maybe one machine got the data first, little things changed, and so when we aggregated these up, we'd get slightly different floating-point rounding because of the way the values were grouped.
And it became material because of that sort-order problem. We thought we were handling ties with numbers, but we weren't. So, I mean, that was easy to solve, right? We just round those to some smaller number of digits, you know, or we cast them into something with less precision, so we get rid of that noise out in floating-point land. But we, well, more than two of us were involved, spent like four days trying to figure out why the hell these numbers kept changing, right? Why is there a ghost in my machine?
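A hypothetical miniature of that bug (not their actual Spark job) fits in a dozen lines: the same values summed under two groupings, as a distributed engine might partition them, give floats that differ only in the last bits, and a rank over those floats flips. Rounding away the meaningless precision before ranking, then tie-breaking on a deterministic key, restores stable output:

```python
# Toy version of the Spark determinism bug: grouping-dependent sums.
values = [0.1, 0.2, 0.3]

run1 = (values[0] + values[1]) + values[2]  # one partition grouping
run2 = values[0] + (values[1] + values[2])  # a different grouping

print(run1 == run2)  # False: a "tie" that isn't one

# Rank records by score: "b" sorts ahead of "a" purely because of
# floating-point noise, so the order depends on how each run grouped.
rows = [("a", run1), ("b", run2)]
print(sorted(rows, key=lambda r: r[1]))

# The fix from the talk: round away the trailing noise before ranking,
# then break the now-genuine tie with a deterministic secondary key.
fixed = [(name, round(score, 12)) for name, score in rows]
print(sorted(fixed, key=lambda r: (r[1], r[0])))  # stable across runs
```

In the real job the rounding happened before the rank step in Spark, but the principle is the same: never let bits 14 through 17 decide your sort order.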
Organizational silos as leaky abstractions
I'm an agricultural economist. Let's talk about silos. So, Cara mentioned silos in organizations, right? So, using our mindset of an abstraction, let's think about silos for a minute. If organizations are abstractions, what's the abstraction equivalent of an organizational silo? I would say it's an over-restrictive abstraction, and it's an abstraction that, when it leaks, you can't figure out why.
So, let's think about what that org abstraction leakage looks like. Where did this number in the DB, in the database, come from? If your organization is highly siloed, you may not be able to answer that question if your team didn't calculate it. That's an overly rigid abstraction. Who do I talk to about solving this problem, whatever that may be? If you can't answer that, you've got a siloed organization, right? That's an overly rigid abstraction.
So, I think of a bunch of this as really, in an organization, it's a communication problem, and communication is like beer. Communication is the source of and solution to all our problems.
So, you know, if you run into where did this number in the DB come from, there are ways we can improve communication in our organization to answer it, right? We can have the code for the ETL or whatever in version control, so Git, GitHub, whatever version control you use, and available to the people in the organization who consume the values, right? And that's really important, because a lot of organizations do not share the ETL process for how they do things with the people who are consuming those things (read-only access is fine). That's insanity!
Similarly, the question, who do I talk to about solving this? Well, if everything in the organization has clear ownership, that one gets resolved really fast, because you talk to the owner of that process or that system. But we often let ownership slip and things get orphaned, and then they just leak, like, all over the floor, because there's no person whose job it is to explain why it's leaking or help stop the leak or help you understand what the system is doing.
Alright, with that said, that's a little bit about organizational abstraction leakage and how we might address it. I want to do a little history, and it wouldn't be a data science conference with an old man at the front if we didn't put up Drew Conway's Venn diagram from 2010.
Now, funny story, I knew Drew when he was a PhD student at NYU, and I saw him, like, draw this out on a napkin, and it, like, completely blew my brain when five years later, like, the actuarial magazines on my actuarial department desk at work, like, had this diagram on the cover. I'm like, this is so weird. So, sometimes I feel like I'm the Forrest Gump of data science.
So, what's interesting about this, and the reason I want to bring it up, is that the thing that was so revolutionary about it in 2010, inside of organizations, is these are three different abstractions. And one person not only wasn't expected to span all of them; in many organizations, one person was not allowed to. I can remember many times in the early 2000s being told I couldn't have coding tools on my machine at work because I was not a software developer.
And what has happened since then, in only, you know, 13 years, is that it's now totally acceptable almost everywhere, except a few backwards government organizations, for people who are doing data science-y work in the business, doing analytical work, to have first-class programming tools on their machines. They have rights and permissions to use those, and often to install packages, even if that's from a curated internal repo.
I will assert that the single biggest business value derived from the data science movement in the last 13 years is making it legitimate to code outside of IT software development roles. Now, that's a pretty big assertion, right? A lot of value has come out of data science. But inside of these big calcified organizations, executives were seeing Drew Conway's diagram in magazines and saying, well, how do we get that? And I'd look over their shoulder and say, stop being stupid about the rules of what we put on our desktops, or give us access to coding tools through JupyterLab, or through RStudio that's centrally hosted.
So my point there is that these data science roles break previous organizational abstractions, right? I'll make a second assertion, and it's tied into what we talked about earlier. Abstractions will leak. Spolsky told us that. Therefore, abstractions must be permeable to allow debugging. So that's my thing of, put your ETL code where everyone who might be using the results can read it. That allows us to at least peek through the abstraction and see what we're getting.
The 80-16-4 framework
My assertion three: no single abstraction is right for everyone. I think we're going to need more abstractions. Now, we have found something in our organization, and this is the part where I'm going to share some things that aren't just my thoughts. My colleagues John Moore and Peter Ilston came up with what we call the 80-16-4. It really started as the 80-20 rule, and then we realized we needed to divide the 20 into two buckets, so we applied the 80-20 rule to the 20, and we got 80-16-4, because math is fun.
So the way we think about everything we do is that 80% of the users in our organization are normal business users. Now, I'm going to present all this from a for-profit corporate perspective, but it should project into your world, and even though I've got numbers up here, don't be obsessed with the exact values. This is conceptual. The vast majority of your folks are normal business users. They want to use applications, dashboards, basic Excel. That's their tool stack. Those are their abstractions.
And then 16% or so are super users. Super users are going to want to do SQL against the underlying data that's collected by your tools. They may want to do a custom dashboard. They're going to do some advanced Excel, like pivot tables, and really processing the data a bit more. And then 4% are guru users. Your guru users are like, oh, I want to do that, but I want to do it 10,000 times, so can I hit the API and just do that directly, pull a result back, make a change, put it in, run your thing, pull it back? You know, that's a very different use case. Or maybe they want a library, a Python library, an R library that interacts with your corporate tooling.
Very different abstractions needed by these three different groups. And they're cumulative, right? So the way I think about it is your super users, they will use the dashboards and the basic Excel, but they also want these other things. Your guru users are, of course, still using SQL, right? And they're still using your applications. They're using everything.
This has been incredibly helpful for us, because what I had observed in our organization and in others is a tendency to, like, scratch the 80% itch and then kind of stop, and make it hard to get the underlying data. There had been times in our culture when that wasn't the case, and then as we started specializing teams, some of this 16 and 4 was getting dropped sometimes. So, you know, we've changed the definition of done on the things we build, saying we've got to make sure people from all three of these groups agree that they have what they need.
I really like the current movement for data products. A data product meaning, like, it's defined, it's got a data dictionary, it's not necessarily tied into the whole corporate database; it's a standalone data product, all the fields are defined, and in our shop we have a link to a wiki page and a link to the source code that builds that data product, available for everyone. I love that.
Like, even though when I first heard data product, I thought it was, like, more marketing BS, and I just wasn't excited about it, when I saw what I got from it, I'm like, oh yeah, data product, I want that. Because a whole bunch of my 16 and 4's needs are addressed by having a good data product, right? Because it's a good abstraction that we can peek behind, because of the documentation and the links to the source code.
So, like, the point there is that multiple abstractions is a high-empathy move, right? I'm here with my practice radical empathy shirt on. That's high empathy for who's actually using it. Building APIs blindly because Gartner says that's a good idea is not a high-empathy move, right? That's just parroting what someone you believe to be a thought leader says. Don't do that. Think. Right? Think about what people actually need, and build what they actually need. And more importantly, don't build stuff they don't need.
Big idea recap
And that's hugely empathetic. So, my big idea recap: abstractions start way up with leaders and go all the way down to hardware. It's abstractions all the way up and down. To debug an abstraction, you have to see what's below it. We're building mental models of complex systems, and that's why we need abstractions: because we can't hold it all in our heads at once.
