Resources

Posit Conf 2025 Keynote Previews | Kieran Healy & Jonathan McPherson | Data Science Hangout

To join future data science hangouts, add the call to your calendar here: https://pos.it/dsh - All are welcome! We'd love to see you! Thursdays at 12 PM US Eastern.

We were recently joined by upcoming Posit Conf 2025 keynote speakers Kieran Healy, Professor of Sociology at Duke University, and Jonathan McPherson, Software Architect at Posit PBC, to chat about how and why open-source IDEs like RStudio and Positron get made, how to do data visualization for discovery and explanation, what their keynotes are going to be about, and what's next for Posit's IDE development, including AI integration.

In this Hangout, Kieran talked about trustworthy data visualization. He highlighted that while data visualization is a powerful way to condense and present information, often creating compelling and authoritative artifacts, phrases like "visual storytelling" can be problematic if they encourage presenting a predetermined narrative not fully supported by the data. He emphasized that the trustworthiness of visualizations comes not solely from the techniques or the software used, but from a "web of social processes and individual commitments" that cannot be easily automated.

Jonathan talked about the future of Positron and its relationship with RStudio, addressing whether Positron is intended to replace RStudio. He clarified that the long-term goal for Positron is to make it the best integrated development environment (IDE) for working with data in any language. He explained that Positron is built with an extensibility layer that lets anyone write plugins for new languages or capabilities, making it a robust and evolving data science workbench. It does not have all of RStudio's features and makes different design trade-offs. RStudio, having evolved over many years, is highly optimized for specific R-based workflows and remains the best at what it does for those use cases.

Resources mentioned in the video and zoom chat:

Posit Conference 2025 Registration → https://posit.co/conference/
Kieran Healy's website → https://kieranhealy.org
Kieran Healy's book "The Ordinal Society" → https://theordinalsociety.com/
Kieran Healy's book "Data Visualization: A Practical Introduction" → https://socviz.co/
Jonathan McPherson's LinkedIn → https://www.linkedin.com/in/jonathanmcpherson
Joe Cheng's AI talk on harnessing LLMs for data analysis → https://youtu.be/owDd1CJ17uQ?feature=shared
TidyTuesday GitHub → https://github.com/rfordatascience/tidytuesday
Positron IDE → https://positron.posit.co/
Will R Chase's talk on making clear plots → https://www.youtube.com/watch?v=h5cTacaWE6I

If you didn't join live, one great discussion you missed from the zoom chat was the ongoing debate, and practical tips, around moving from presenting tables of numbers to visualizations. Community members shared strategies including using color-mapped tables as an intermediate step, providing both tables and visuals, and ensuring accessibility and interpretability for diverse audiences. Are you team tables or team graphs?

► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu

Follow Us Here:
Website: https://www.posit.co
Hangout: https://pos.it/dsh
LinkedIn: https://www.linkedin.com/company/posit-software
Bluesky: https://bsky.app/profile/posit.co

Thanks for hanging out with us!

Jun 12, 2025
54 min

image: thumbnail.jpg

Transcript

This transcript was generated automatically and may contain errors.

Hey there, welcome to the Posit Data Science Hangout. I'm Libby Heron, and this is a recording of our weekly community call that happens every Thursday at 12 PM US Eastern Time. If you're not joining us live, you're missing out on the amazing chat that goes on. So find the link in the description where you can add our call to your calendar, and come hang out with the most supportive, friendly, and funny data community you'll ever experience.

I want to go ahead and introduce our featured leaders today, because we have two, and they are both keynote speakers at Posit Conf, which is really exciting. So we have Kieran Healy, Professor of Sociology at Duke University, and we also have Jonathan McPherson, Software Architect at Posit. Say hi to everybody. I am going to ask Kieran to go ahead and introduce himself first.

Hi everyone, and thanks for coming. There's a big crowd here. My name is Kieran Healy. As Libby said, I teach Sociology at Duke here in North Carolina. I'm originally from Ireland, and, as I like to say, I've been using R since it was a different letter, since it was S. Back as a graduate student, I was introduced to it by one of my teachers, and since then it's been the main way that I do data analysis, teach it, and draw pictures with it. And over the years, having just been a user of it, I've become somebody who also writes about it and tries to teach people how to use it, specifically with data visualization.

Anything for fun? Being Irish, I'm just naturally averse to those kinds of questions. I restore old computers. How about that? I build those things that you can see in the background. I rehabilitate vintage computers in my basement and then occasionally threaten my students with having them run their analyses on them whenever they complain about running out of memory or something like that.

Hi, I'm Jonathan. I'm coming at you live from the Seattle area of Washington, where it is still morning. I still need coffee. I'm a software engineer and architect at Posit. I'm not a data scientist, to be clear, but I do make tools for data scientists. So I was fortunate enough to be one of the early employees at RStudio, now Posit, and have worked for most of my time here on the RStudio IDE. If you have not used RStudio, it is one of the first and, in my opinion, still one of the best IDEs for the R language.

A couple of years ago, I switched gears and helped create a new IDE called Positron, which kind of takes a lot of the ideas from RStudio and develops them into a bigger, more ambitious multi-language IDE. So I'd love to take questions about RStudio, Positron, data science tools, software engineering and architecture, and so on.

For fun, I am a dad. I have three kids, so I spend a lot of time with them. I'm into music. I play the piano. I read a lot of books. I go through at least a few books a month. Fiction, nonfiction, everything in between. And I also ride my bicycle a lot, so that's me.

Keynote previews

My talk will be about trustworthy data visualization. I'm interested in data visualization because it's such a powerful way to condense and present information, often tremendous amounts of information, into sort of a compelling and seemingly authoritative, often beautiful artifact that travels really well, that you can show to people and convince people of things with. And now, in a way, all of statistics, all of the things that we produce with data science models and tables and all the rest of it is like that, too. But visualizations are especially like that, I think, because they strive to be accessible and seem immediately interpretable.

And as I say, I'm just a little bit suspicious of the whole language around data visualization, from old stuff like lying with graphs all the way through to more contemporary phrases like visual storytelling. We want our data visualizations to be compelling and we want them to be convincing. But we also want them to be trustworthy, or we should want that. And so we can ask: how can we do that? How can we make trustworthy data visualizations? Where does our trust in them come from? How is it sustained?

And I think what I want to argue in the talk is that the answer is not really in the techniques used in the visualizations themselves, or even in the software used to make them, even though that software has made tremendous strides over the last 20, or even five or six, years. Instead, the trustworthiness that we want depends on a kind of web of social processes and individual commitments that can't easily be automated and that we inevitably rely on. And if those processes break down, or those commitments are weak, then it won't matter how good-looking our graphs are or how compelling the stories we seem to tell with them can be.


So my talk is going to be a little bit of history and a little bit of today and then a little bit of tomorrow. So like I mentioned in my intro a few minutes ago, I worked at RStudio. I've been at Posit for quite some time, and so I'm going to reflect a little bit on a decade of work on RStudio, kind of the principles that have led it to become the standard data science environment for R. I'm going to talk a bit about how those same principles developed the basis for Positron. I'll talk about Positron for quite a bit, and then I'll probably spend a little bit of time kind of looking to the future and talking about some of the ideas we have for the future of data science tooling.

What a software architect does

So I have worked on Positron since really day one. And so what an architect does in my role is quite a lot of coding and also quite a lot of fitting pieces together. So a system as complicated as an IDE has a whole bunch of subsystems. So if you think about the pieces that compose a complicated data science IDE like Positron, you have a user interface that displays the console and the graphs and charts and data grids and your variables viewer and all that stuff. You have language execution engines like R and Python that are actually performing the computations and then emitting results.

And so an architect's job in many cases is simply to design the way all those processes work together to form a cohesive whole. So in the early days of Positron, I spent a lot of time slotting pieces in and out to figure out how to build a solid, reliable, robust system that does all the things we need it to do. And as the product has matured, my job has shifted to spending a lot more time actually writing middleware and glue code that keeps everything talking to each other, as well as building individual features on the front end and the back end.

The evolution of R and the Tidyverse

One of the things that changed is that you used to be able to use the underscore character as the assignment operator, which made for really terrible readability for code, and thankfully that went away. I think that it's in the character of R as a language to be constantly trying out new things within a sort of more or less structured overall community of the core developers and the wider world. And so in effect, what you have with R always is this kind of like these stratigraphic layers of previous efforts to develop aspects of the language, not all of which hang together very well.

And so that, I guess, is where the Tidyverse, and eventually the whole place that is Posit, can be traced back to. It was a real revolution in the usability of R for people coming to data science, broadly construed, or to statistical data analysis, for the first time. Not that there were, strictly speaking, that many things you literally couldn't do before the Tidyverse tools came along. But the harmonization it brought really was a big step forward and helped streamline things, with all of the usual trade-offs and caveats that come with that kind of development.
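To make the kind of harmonization being described here concrete, a minimal dplyr sketch on made-up data; the data frame and its columns are purely hypothetical:

```r
library(dplyr)

# Hypothetical data: one row per measurement at a site
measurements <- data.frame(
  site  = c("A", "A", "B", "B", "C"),
  value = c(2.1, 3.4, 1.8, 2.2, 4.0)
)

# Tidyverse verbs compose into one readable pipeline:
# keep the rows we want, group them, then summarize each group
measurements |>
  filter(value > 2) |>
  group_by(site) |>
  summarize(mean_value = mean(value), n = n())
```

Each step was possible in base R before these tools existed; the step forward was that the verbs share a consistent interface and chain together predictably.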

And then the other big development, going back to what Jonathan was talking about in his role, has in many ways been the tooling: IDEs specifically, but also the ease with which R can talk to other aspects of general computing, which has improved a great deal. And we'd like to see that continue, in terms of flexibility, while the language maintains its core strengths. It has always been a language built for data analysis at its core, with abstractions like the humble data frame built in, which a lot of other languages have to work much harder to integrate into how they work.

Data science and software engineering cross-pollination

So I'll answer in three parts here. I think there has been limited flow in the other direction, probably less than you might hope. But there are a variety of things that are valuable when doing data science that have leaked back into the software engineering world, especially for people who work on software that does data science. And I will say, having been part of both communities, that a lot of the backflow I'm seeing is, number one, around iteration.

When you develop software, it is a very iterative process. Almost nobody, even in the age of LLMs, can one-shot an application. And one thing you notice when you watch data scientists work is how iterative that work is, especially when they're doing exploratory data analysis: there's a loop you run as you iterate on and explore your data. The second is reproducibility. One of the things that matters a lot to data scientists is making sure the work they do is fully reproducible. This is one of the original goals of doing code-first data science: it generates an artifact that anybody can use to verify your analysis.
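As a minimal sketch of what a code-first, reproducible artifact can look like (the file name, seed, and simulated input are all illustrative):

```r
# analysis.R -- a self-contained, rerunnable record of the analysis.
# Anyone with this file can rerun it and verify the result.

set.seed(2025)                    # pin the randomness so the run repeats exactly

x <- rnorm(100, mean = 5, sd = 2) # simulated data standing in for a real input
estimate <- mean(x)

cat("Estimated mean:", round(estimate, 3), "\n")

sessionInfo()                     # record the R version and loaded packages
```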

And then the third one is simply that, as we make software for data science, that software is informed by the data science community and is built differently than software written for software engineers. These two communities have some similar needs, but also some that are quite different. So, yeah, not a whole lot of crossover, but a lot of co-evolution, and certainly some ideas have leaked over.

Moving from tables to visualizations

Habit makes a big difference to how organizations work, and established practice is very hard to shift, even for good reasons. And I will say that data visualization as a technique is great. In a world where we're giving talks with slides all the time, it's much more effective at conveying what you want to show than a slide full of numbers, for example, can be.

One thing is that if people have been doing it for a long time, it doesn't just mean that they're entrenched. It also means that their skills are oriented towards interpreting that particular artifact, the table. I remember very strongly, in my early training as a social scientist, encountering people, advisors and teachers, who could just look at a table of models or statistical summaries and see things in it immediately, because they'd been looking at them for years.

One of the things that I emphasize a lot, and something people don't grasp all that straightforwardly when they're learning data visualization, is that an awful lot of the visualization we do is, in effect, structured the way a table is structured. And one of the simplest things you can do, which is extremely underrated and very effective, is to try to turn your visualization into something more like a table that people can read. Often that's as simple as putting the continuous quantity on the X axis and the categorical quantity on the Y axis, so people don't have to turn their heads. And then suddenly it's just a table again, except that now there are dots or lengths, which are easy to grasp.

And so I think the similarities between tabular layouts and graphs are easy to underestimate, and one way to bring people who are used to looking at tables of numbers over to visualization is to help them realize that effective visualizations are often just tables of visual quantities rather than tables of numbers, and are easier to read for that reason.
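A minimal ggplot2 sketch of this advice, with hypothetical summary data: the continuous quantity goes on the X axis, the categories on the Y axis, ordered by value, so the plot reads like a sorted table of dots.

```r
library(ggplot2)

# Hypothetical summary data: one categorical column, one continuous column
df <- data.frame(
  region = c("North", "South", "East", "West"),
  value  = c(4.2, 3.1, 5.8, 2.6)
)

# Categories on the y axis, reordered by value, continuous quantity on x:
# the reader scans down it exactly as they would a sorted table
ggplot(df, aes(x = value, y = reorder(region, value))) +
  geom_point(size = 3) +
  labs(x = "Value", y = NULL)
```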

Positron's long-term goals

So our long-term goal for Positron is really to make it the best IDE for working with data in any language. Positron is built in such a way that the R and Python subsystems we added are just extensions. We built it so that anybody can write a plugin for a new language or new capabilities. So Positron at its core is going to be a first-class, hopefully best-in-the-world data science workbench that will be extensible to new languages and new capabilities over time and will continue to evolve as the industry does.

So for the second part of that question, is this eventually going to replace RStudio? The answer is probably not. Positron is not, and will never be, a superset of RStudio. It does not have all of RStudio's features, it won't ever have all of them, and it makes a different set of trade-offs.

RStudio is very, very fit. It's evolved almost like a river rock, just smoothed out by years of work, to be very, very good at specific R-based workflows. And honestly, even as someone who has worked on Positron for years now, I don't think Positron will ever be as good as RStudio for those specific use cases, simply because, again, you can't take something so generic and make it work as well as something that is fit for a more specific purpose.

So we think if you work with R and Python together, or just Python, you'll be a lot happier in Positron, because it's so much better at multi-language work, and it has much deeper IDE capabilities because it's built on VS Code. And also, we are continuing to invest in RStudio with a lot of the ideas we've germinated in Positron. If, for example, you use a really recent build of RStudio, you'll notice that it has much better treatment of errors, warnings, and messages in the console, which is derived from work we pioneered in Positron.

One of those tools, for example, is called Air. It's an incredibly fast code formatter and language server for the R language. It's native in Positron, and you can now also hook it up to RStudio in a recent release to have it format your R code beautifully every time you save. So we're seeing a lot of cross-pollination between the IDEs, which we think is great.

Starting with answers vs. letting data speak

I think the thing you identify is a real issue for data analysis generally. I think it's kind of a chronic issue in the sense that there's no sort of sin you can commit with graphs or with data visualizations that you can't also commit with tables or models if sinning is what you want to do.

The most general one is that when we're looking at data, we're looking for patterns, and humans are pattern-recognizing creatures. We're very good at finding patterns, and we're often motivated in data visualization to find them. We want them to be there, whether because we're an academic wanting to write a paper with an interesting finding, or we're in industry looking for something to pitch or explain. And half the reason statistical techniques exist is to slow your roll a little bit and give you criteria that are not just "I think this looks like it's real."

That's why I said at the beginning that I'm a little suspicious, that phrases like "storytelling with data" tend to make me a little queasy, bring me out in hives just a little bit. If you misunderstand the intent behind them, they can encourage you to say, well, I've got a yarn I want to tell, and I'm just going to look for the things that help me tell that yarn in a compelling way.

And the second thing I would say, again both in data visualization and in the whole world of modeling data statistically, is something we see again and again, out in the world, in classrooms with students, and wherever else: an understandable but real tendency for people to begin by asking, what's the most complicated thing I know how to do, and to start there. Very often the advice I end up giving people, which they tend to resist, is to look at the simplest thing first and then see where to go next. They say, but I want to do the fancy thing, the hard thing. Why are you telling me to just look at the simplest thing first?

AI integration and balancing power users with beginners

So I'll talk first about what we're most excited about, and I almost hate to say this because everyone's excited about it right now, but we are excited about AI integration, quite a lot. One of the things I've been working on more than anything in the past couple of months is integrating an AI assistant into Positron. We have not really shared or demoed much about that yet, but you'll be hearing quite a lot about it in the months to come. There are so many places in an IDE where an AI can provide really good feedback, as you'll know if you've ever used GitHub Copilot, Cursor, or Windsurf.

The really cool thing about using them in an IDE like Positron is that they have access not only to the state of your project in terms of code, but also to the state of your data as an in-memory object that can be manipulated, with the result that you can get some really quick, robust analysis going by describing what you want in natural language and working with the AI hand in hand to arrive at a result.


The second part of the question is about making an IDE that works well for both power users and beginners, and I'm going to say that this is actually very hard to do. At the end of the day, we basically compromise: you can't make an IDE that's perfect for both power users and beginners. So the best thing we can do is try to keep the complexity in a place where it doesn't show up unless you need it. In these systems, there's usually what I call a conservation of complexity, which is to say that the amount of complexity is constant. It's sort of like the amount of energy in the universe.

So if you look at the design of an IDE like Positron, you'll find that it lands somewhere in the middle. We've added a lot more affordances for things that beginners are accustomed to, while still preserving the full power of the command palette and a rich set of commands for advanced users, which don't necessarily have all the buttons that beginners are drawn to.

Starting with visualization to teach data analysis

Yeah, I do. I'm in the middle of writing a second edition of the book that's trying to do the same thing again. If you're learning how to do this stuff, data science broadly construed, then you have to learn how to do statistics. You have to learn the idea of representing things for public consumption in tables or graphs. You have to learn a programming language to do that in, and you have to learn some sort of IDE to do it with. That's just tremendously overwhelming for newcomers.

The flip side of what Libby was saying about how everything is hard when you're bad at it is that for experts, it's what psychologists call the curse of knowledge: once you know something, it's very hard to imagine not knowing it. Data visualization is really handy as a way in because it can help you get past the feeling that you have to know everything before you can do anything. And so I really do feel, in R specifically, that the combination of pipeline approaches to coding, first this and then that and then that, which leads naturally to thinking in terms of functional programming before you know that that's what it is, combined with doing a series of small analyses that emit a graph you can see, understand, and interpret immediately, is a really powerful way to get people up the steepest part of the learning curve.

I think for that reason, data visualization is a very handy way to introduce people not just to the business of data analysis, but to the whole ecosystem that's now required to do it: the whole enchilada of IDEs, code, software, the file system, and all the rest of it.
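As a small sketch of that "first this, then that" pipeline style ending in a graph, using a built-in dataset so it runs as-is (the grouping and labels are just for illustration):

```r
library(dplyr)
library(ggplot2)

# Each step reads as "first this, then that": take the built-in mtcars data,
# group it, summarize it, and end with a picture you can interpret at once
mtcars |>
  group_by(cyl) |>
  summarize(avg_mpg = mean(mpg)) |>
  ggplot(aes(x = factor(cyl), y = avg_mpg)) +
  geom_col() +
  labs(x = "Number of cylinders", y = "Average miles per gallon")
```

A learner can read this top to bottom as a sentence long before they know what a function, a grouped data frame, or a geom formally is.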

Lightning round: career advice

For career advice specifically, I think just being humble and curious is the best advice I ever got. It's very simple, but being humble and curious, and just nice to the people you work with, sounds so simple and is very powerful.

I would just offer an elaboration of the curse of knowledge thing I mentioned: once you know something, it's real hard to imagine not knowing it. So when you're trying to explain something to somebody, or learn it yourself, don't forget that people who don't know things just don't know them yet.

Thank you so much to Kieran and Jonathan. This was so much fun, having you both. I cannot wait to hear your keynote talks at Posit Conf. And everybody, I'm so excited to see you today. Thank you for being in the chat.