Resources

Posit Conf 2025 Keynote Previews | Kieran Healy & Jonathan McPherson | Data Science Hangout

To join future data science hangouts, add the call to your calendar here: https://pos.it/dsh - All are welcome! We'd love to see you! Thursdays at 12 PM US Eastern.

We were recently joined by upcoming Posit Conf 2025 keynote speakers Kieran Healy, Professor of Sociology at Duke University, and Jonathan McPherson, Software Architect at Posit PBC, to chat about how and why open-source IDEs like RStudio and Positron get made, how to do data visualization for discovery and explanation, what their keynotes are going to be about, and what's next for Posit's IDE development, including AI integration.

In this Hangout, Kieran talked about trustworthy data visualization. He highlighted that while data visualization is a powerful way to condense and present information, often creating compelling and authoritative artifacts, phrases like "visual storytelling" can be problematic if they encourage presenting a predetermined narrative not fully supported by the data. He emphasized that the trustworthiness of visualizations comes not solely from the techniques or the software used, but from a "web of social processes and individual commitments" that cannot be easily automated.

Jonathan talked about the future of Positron and its relationship with RStudio, addressing whether Positron is intended to replace RStudio. He clarified that the long-term goal for Positron is to make it the best integrated development environment (IDE) for working with data in any language. He explained that Positron is built with an extensibility layer that lets anyone write plugins for new languages or capabilities, making it a robust and evolving data science workbench. It does not have all of RStudio's features and makes different design trade-offs. RStudio, having evolved over many years, is highly optimized for specific R-based workflows and remains the best at what it does for those use cases.

Resources mentioned in the video and zoom chat:

Posit Conference 2025 Registration → https://posit.co/conference/
Kieran Healy's website → https://kieranhealy.org
Kieran Healy's book "The Ordinal Society" → https://theordinalsociety.com/
Kieran Healy's book "Data Visualization: A Practical Introduction" → https://socviz.co/
Jonathan McPherson's LinkedIn → https://www.linkedin.com/in/jonathanmcpherson
Joe Cheng's AI talk on harnessing LLMs for data analysis → https://youtu.be/owDd1CJ17uQ?feature=shared
TidyTuesday GitHub → https://github.com/rfordatascience/tidytuesday
Positron IDE → https://positron.posit.co/
Will R Chase's talk on making clear plots → https://www.youtube.com/watch?v=h5cTacaWE6I

If you didn't join live, one great discussion you missed from the zoom chat was the ongoing debate, and practical tips, around moving from presenting tables of numbers to visualizations. Community members shared strategies including using color-mapped tables as an intermediate step, providing both tables and visuals, and ensuring accessibility and interpretability for diverse audiences. Are you team tables or team graphs?

► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu

Follow Us Here:
Website: https://www.posit.co
Hangout: https://pos.it/dsh
LinkedIn: https://www.linkedin.com/company/posit-software
Bluesky: https://bsky.app/profile/posit.co

Thanks for hanging out with us!

Jun 12, 2025
54 min

image: thumbnail.jpg

Transcript

This transcript was generated automatically and may contain errors.

Hey there, welcome to the Posit Data Science Hangout. I'm Libby Heron, and this is a recording of our weekly community call that happens every Thursday at 12 PM US Eastern Time. If you're not joining us live, you're missing out on the amazing chat that goes on. So find the link in the description where you can add our call to your calendar, and come hang out with the most supportive, friendly, and funny data community you'll ever experience.

I want to go ahead and introduce our featured leaders today, because we have two, and they are both keynote speakers at Posit Conf, which is really exciting. So we have Kieran Healy, Professor of Sociology at Duke University, and we also have Jonathan McPherson, Software Architect at Posit. Say hi to everybody. I am going to ask Kieran to go ahead and introduce himself first.

Hi everyone, and thanks for coming. There's a big crowd here. My name is Kieran Healy. As Libby said, I teach Sociology at Duke here in North Carolina. I'm originally from Ireland, and, as I like to say, I've been using R since it was a different letter, since it was S. Back as a graduate student, I was introduced to it by one of my teachers, and since then it's been the main way that I do data analysis, teach it, and draw pictures with it. And over the years, having just been a user of it, I've become somebody who also writes about it and tries to teach people how to use it, specifically with data visualization.

Anything for fun? Being Irish, I'm just naturally averse to those kinds of questions. I restore old computers. How about that? I build those things that you can see in the background. I rehabilitate vintage computers in my basement and then occasionally threaten my students with having them run their analyses on them whenever they complain about running out of memory or something like that.

Hi, I'm Jonathan. I'm coming at you live from the Seattle area of Washington, where it is still morning. I still need coffee. I'm a software engineer and architect at Posit. I'm not a data scientist, to be clear, but I do make tools for data scientists. So I was fortunate enough to be one of the early employees at RStudio, now Posit, and have worked for most of my time here on the RStudio IDE. If you have not used RStudio, it is one of the first and, in my opinion, still one of the best IDEs for the R language.

A couple of years ago, I switched gears and helped create a new IDE called Positron, which kind of takes a lot of the ideas from RStudio and develops them into a bigger, more ambitious multi-language IDE. So I'd love to take questions about RStudio, Positron, data science tools, software engineering and architecture, and so on.

For fun, I am a dad. I have three kids, so I spend a lot of time with them. I'm into music. I play the piano. I read a lot of books. I go through at least a few books a month. Fiction, nonfiction, everything in between. And I also ride my bicycle a lot, so that's me.

Keynote previews

My talk will be about trustworthy data visualization. I'm interested in data visualization because it's such a powerful way to condense and present information, often tremendous amounts of information, into sort of a compelling and seemingly authoritative, often beautiful artifact that travels really well, that you can show to people and convince people of things with. And now, in a way, all of statistics, all of the things that we produce with data science models and tables and all the rest of it is like that, too. But visualizations are especially like that, I think, because they strive to be accessible and seem immediately interpretable.

And as I say, I'm just a little bit suspicious of the whole language around data visualization, from old stuff like lying with graphs all the way through to more contemporary phrases like visual storytelling. We want our data visualizations to be compelling and we want them to be convincing. But we also want them to be trustworthy, or we should want that. And so we can ask: how can we do that? How can we make trustworthy data visualizations? Where does our trust in them come from? How is it sustained?

And I think what I want to argue in the talk is that the answer is not really in the techniques used in the visualizations themselves, or even in the software used to make them, even though that software has made tremendous strides over the last 20, or even five or six, years. Instead, the trustworthiness that we want depends on a kind of web of social processes and individual commitments that can't easily be automated and that we inevitably rely on. And if those processes break down, or those commitments are weak, then it won't matter how good-looking our graphs are or how compelling the stories we seem to tell with them can be.


So my talk is going to be a little bit of history and a little bit of today and then a little bit of tomorrow. So like I mentioned in my intro a few minutes ago, I worked at RStudio. I've been at Posit for quite some time, and so I'm going to reflect a little bit on a decade of work on RStudio, kind of the principles that have led it to become the standard data science environment for R. I'm going to talk a bit about how those same principles developed the basis for Positron. I'll talk about Positron for quite a bit, and then I'll probably spend a little bit of time kind of looking to the future and talking about some of the ideas we have for the future of data science tooling.

What a software architect does

So I have worked on Positron since really day one. And so what an architect does in my role is quite a lot of coding and also quite a lot of fitting pieces together. So a system as complicated as an IDE has a whole bunch of subsystems. So if you think about the pieces that compose a complicated data science IDE like Positron, you have a user interface that displays the console and the graphs and charts and data grids and your variables viewer and all that stuff. You have language execution engines like R and Python that are actually performing the computations and then emitting results.

And so an architect's job in many cases is simply to design the way all those processes work together to form a cohesive whole. So in the early days of Positron, I spent a lot of time slotting pieces in and out to figure out how to build a solid, reliable, robust system that does all the things we need it to do. And as the product has matured, my job has shifted to spending a lot more time actually writing middleware and glue code that keeps everything talking to each other, as well as building individual features on the front end and the back end.

The evolution of R and the Tidyverse

One of the things that changed is that you used to be able to use the underscore character as the assignment operator, which made for really terrible readability for code, and thankfully that went away. I think that it's in the character of R as a language to be constantly trying out new things within a sort of more or less structured overall community of the core developers and the wider world. And so in effect, what you have with R always is this kind of like these stratigraphic layers of previous efforts to develop aspects of the language, not all of which hang together very well.

And so that, I guess, is where the Tidyverse, and eventually the whole place that is Posit, can be traced back to. It was a real revolution in the usability of R for people coming to data science, broadly construed, or to statistical data analysis, for the first time. Not that there were, strictly speaking, that many things you literally couldn't do before the Tidyverse tools came along. But the harmonization it brought really was a big step forward and helped streamline things, with all of the usual trade-offs and caveats that come with that kind of development.
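To make the kind of harmonization being described here concrete, a minimal dplyr sketch on made-up data; the data frame and its columns are purely hypothetical:

```r
library(dplyr)

# Hypothetical data: one row per measurement at a site
measurements <- data.frame(
  site  = c("A", "A", "B", "B", "C"),
  value = c(2.1, 3.4, 1.8, 2.2, 4.0)
)

# Tidyverse verbs compose into one readable pipeline:
# keep the rows we want, group them, then summarize each group
measurements |>
  filter(value > 2) |>
  group_by(site) |>
  summarize(mean_value = mean(value), n = n())
```

Each step was possible in base R before these tools existed; the step forward was that the verbs share a consistent interface and chain together predictably.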

And then the other big development, going back to what Jonathan was talking about in his role, has in many ways been the tooling: IDEs specifically, but also the ease with which R can talk to other aspects of general computing, which has improved a great deal. And we'd like to see that continue, in terms of flexibility, while the language maintains its core strengths. It has always been a language built for data analysis at its core, with abstractions like the humble data frame built in, which a lot of other languages have to work much harder to integrate into how they work.

Data science and software engineering cross-pollination

So I'll answer in three parts here. I think there has been limited flow in the other direction, probably less than you might hope. But there are a variety of things that are valuable when doing data science that have leaked back into the software engineering world, especially for people who work on software that does data science. And I will say, having been part of both communities, that a lot of the backflow I'm seeing is, number one, around iteration.

When you develop software, it is a very iterative process. Almost nobody, even in the age of LLMs, can one-shot an application. And one thing you notice when you watch data scientists work is how iterative that work is, especially when they're doing exploratory data analysis: there's a loop you run as you iterate on and explore your data. The second is reproducibility. One of the things that matters a lot to data scientists is making sure the work they do is fully reproducible. This is one of the original goals of doing code-first data science: it generates an artifact that anybody can use to verify your analysis.
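As a minimal sketch of what a code-first, reproducible artifact can look like (the file name, seed, and simulated input are all illustrative):

```r
# analysis.R -- a self-contained, rerunnable record of the analysis.
# Anyone with this file can rerun it and verify the result.

set.seed(2025)                    # pin the randomness so the run repeats exactly

x <- rnorm(100, mean = 5, sd = 2) # simulated data standing in for a real input
estimate <- mean(x)

cat("Estimated mean:", round(estimate, 3), "\n")

sessionInfo()                     # record the R version and loaded packages
```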

And then the third one is simply that, as we make software for data science, that software is informed by the data science community and is built differently than software written for software engineers. These two communities have some similar needs, but also some that are quite different. So, yeah, not a whole lot of crossover, but a lot of co-evolution, and certainly some ideas have leaked over.

Moving from tables to visualizations

Habit makes a big difference to how organizations work, and established practice is very hard to shift, even for good reasons. And I will say that data visualization as a technique is great. In a world where we're giving talks with slides all the time, it's much more effective at conveying what you want to show than a slide full of numbers, for example, can be.

One thing is that if people have been doing it for a long time, it doesn't just mean that they're entrenched. It also means that their skills are oriented towards interpreting that particular artifact, the table. I remember very strongly, in my early training as a social scientist, encountering people, advisors and teachers, who could just look at a table of models or statistical summaries and see things in it immediately, because they'd been looking at them for years.

One of the things that I emphasize a lot, and something people don't grasp all that straightforwardly when they're learning data visualization, is that an awful lot of the visualization we do is, in effect, structured the way a table is structured. And one of the simplest things you can do, which is extremely underrated and very effective, is to try to turn your visualization into something more like a table that people can read. Often that's as simple as putting the continuous quantity on the X axis and the categorical quantity on the Y axis, so people don't have to turn their heads. And then suddenly it's just a table again, except that now there are dots or lengths, which are easy to grasp.

And so I think the similarities between tabular layouts and graphs are easy to underestimate, and one way to bring people who are used to looking at tables of numbers over to visualization is to help them realize that effective visualizations are often just tables of visual quantities rather than tables of numbers, and are easier to read for that reason.
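A minimal ggplot2 sketch of this advice, with hypothetical summary data: the continuous quantity goes on the X axis, the categories on the Y axis, ordered by value, so the plot reads like a sorted table of dots.

```r
library(ggplot2)

# Hypothetical summary data: one categorical column, one continuous column
df <- data.frame(
  region = c("North", "South", "East", "West"),
  value  = c(4.2, 3.1, 5.8, 2.6)
)

# Categories on the y axis, reordered by value, continuous quantity on x:
# the reader scans down it exactly as they would a sorted table
ggplot(df, aes(x = value, y = reorder(region, value))) +
  geom_point(size = 3) +
  labs(x = "Value", y = NULL)
```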

Positron's long-term goals

So our long-term goal for Positron is really to make it the best IDE for working with data in any language. Positron is built in such a way that the R and Python subsystems we added are just extensions. We built it so that anybody can write a plugin for a new language or new capabilities. So Positron at its core is going to be a first-class, hopefully best-in-the-world data science workbench that will be extensible to new languages and new capabilities over time and will continue to evolve as the industry does.

So for the second part of that question, is this eventually going to replace RStudio? The answer is probably not. Positron is not, and will never be, a superset of RStudio. It does not have all of RStudio's features, it won't ever have all of them, and it makes a different set of trade-offs.

RStudio is very, very fit. It's evolved almost like a river rock, just smoothed out by years of work, to be very, very good at specific R-based workflows. And honestly, even as someone who has worked on Positron for years now, I don't think Positron will ever be as good as RStudio for those specific use cases, simply because, again, you can't take something so generic and make it work as well as something that is fit for a more specific purpose.

So we think if you work with R and Python together, or just Python, you'll be a lot happier in Positron, because it's so much better at multi-language work, and it has much deeper IDE capabilities because it's built on VS Code. And also, we are continuing to invest in RStudio with a lot of the ideas we've germinated in Positron. If, for example, you use a really recent build of RStudio, you'll notice that it has much better treatment of errors, warnings, and messages in the console, which is derived from work we pioneered in Positron.

One of those tools, for example, is called Air. It's an incredibly fast code formatter and language server for the R language. It's native in Positron, and you can now also hook it up to RStudio in a recent release to have it format your R code beautifully every time you save. So we're seeing a lot of cross-pollination between the IDEs, which we think is great.

Starting with answers vs. letting data speak

I think the thing you identify is a real issue for data analysis generally. I think it's kind of a chronic issue in the sense that there's no sort of sin you can commit with graphs or with data visualizations that you can't also commit with tables or models if sinning is what you want to do.

The most general one is that when we're looking at data, we're looking for patterns, and humans are pattern-recognizing creatures. We're very good at finding patterns, and we're often motivated in data visualization to find them. We want them to be there, whether because we're an academic wanting to write a paper with an interesting finding, or we're in industry looking for something to pitch or explain. And half the reason statistical techniques exist is to slow your roll a little bit and give you criteria that are not just "I think this looks like it's real."

That's why I said at the beginning that I'm a little suspicious, that phrases like "storytelling with data" tend to make me a little queasy, bring me out in hives just a little bit. If you misunderstand the intent behind them, they can encourage you to say, well, I've got a yarn I want to tell, and I'm just going to look for the things that help me tell that yarn in a compelling way.

And the second thing I would say, again both in data visualization and in the whole world of modeling data statistically, is something we see again and again, out in the world, in classrooms with students, and wherever else: an understandable but real tendency for people to begin by asking, what's the most complicated thing I know how to do, and to start there. Very often the advice I end up giving people, which they tend to resist, is to look at the simplest thing first and then see where to go next. They say, but I want to do the fancy thing, the hard thing. Why are you telling me to just look at the simplest thing first?

AI integration and balancing power users with beginners

So I'll talk first about what we're most excited about, and I almost hate to say this because everyone's excited about it right now, but we are excited about AI integration, quite a lot. One of the things I've been working on more than anything in the past couple of months is integrating an AI assistant into Positron. We have not really shared or demoed much about that yet, but you'll be hearing quite a lot about it in the months to come. There are so many places in an IDE where an AI can provide really good feedback, as you'll know if you've ever used GitHub Copilot, Cursor, or Windsurf.

The really cool thing about using them in an IDE like Positron is that they have access not only to the state of your project in terms of code, but also to the state of your data as an in-memory object that can be manipulated, with the result that you can get some really quick, robust analysis going by describing what you want in natural language and working with the AI hand in hand to arrive at a result.


The second part of the question is about making an IDE that works well for both power users and beginners, and I'm going to say that this is actually very hard to do. At the end of the day, we basically compromise: you can't make an IDE that's perfect for both power users and beginners. So the best thing we can do is try to keep the complexity in a place where it doesn't show up unless you need it. In these systems, there's usually what I call a conservation of complexity, which is to say that the amount of complexity is constant. It's sort of like the amount of energy in the universe.

So if you look at the design of an IDE like Positron, you'll find that it lands somewhere in the middle. We've added a lot more affordances for things that beginners are accustomed to, while still preserving the full power of the command palette and a rich set of commands for advanced users, which don't necessarily have all the buttons that beginners are drawn to.

Starting with visualization to teach data analysis

Yeah, I do. I'm in the middle of writing a second edition of the book that's trying to do the same thing again. If you're learning how to do this stuff, data science broadly construed, then you have to learn how to do statistics. You have to learn the idea of representing things for public consumption in tables or graphs. You have to learn a programming language to do that in, and you have to learn some sort of IDE to do it with. That's just tremendously overwhelming for newcomers.

The flip side of what Libby was saying about how everything is hard when you're bad at it is that for experts, it's what psychologists call the curse of knowledge: once you know something, it's very hard to imagine not knowing it. Data visualization is really handy as a way in because it can help you get past the feeling that you have to know everything before you can do anything. And so I really do feel, in R specifically, that the combination of pipeline approaches to coding, first this and then that and then that, which leads naturally to thinking in terms of functional programming before you know that that's what it is, combined with doing a series of small analyses that emit a graph you can see, understand, and interpret immediately, is a really powerful way to get people up the steepest part of the learning curve.

I think for that reason, data visualization is a very handy way to introduce people not just to the business of data analysis, but to the whole ecosystem that's now required to do it: the whole enchilada of IDEs, code, software, the file system, and all the rest of it.
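As a small sketch of that "first this, then that" pipeline style ending in a graph, using a built-in dataset so it runs as-is (the grouping and labels are just for illustration):

```r
library(dplyr)
library(ggplot2)

# Each step reads as "first this, then that": take the built-in mtcars data,
# group it, summarize it, and end with a picture you can interpret at once
mtcars |>
  group_by(cyl) |>
  summarize(avg_mpg = mean(mpg)) |>
  ggplot(aes(x = factor(cyl), y = avg_mpg)) +
  geom_col() +
  labs(x = "Number of cylinders", y = "Average miles per gallon")
```

A learner can read this top to bottom as a sentence long before they know what a function, a grouped data frame, or a geom formally is.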

Lightning round: career advice

For career advice specifically, I think just being humble and curious is the best advice I ever got. It's very simple, but being humble and curious, and just nice to the people you work with, sounds so simple and is very powerful.

I would just offer an elaboration of the curse of knowledge thing I mentioned: once you know something, it's real hard to imagine not knowing it. So when you're trying to explain something to somebody, or learn it yourself, don't forget that people who don't know things just don't know them yet.

Thank you so much to Kieran and Jonathan. This was so much fun, having you both. I cannot wait to hear your keynote talks at Posit Conf. And everybody, I'm so excited to see you today. Thank you for being in the chat.