Resources

Exploring Datasets in Positron (Wes McKinney, Posit) | posit::conf(2025)

Exploring Datasets in Positron
Speaker(s): Wes McKinney
Abstract: Inspecting raw data in data frames and tables can be a critical tool in the data preparation, tidying, and feature engineering process. In Positron, we made it a priority to design a modern Data Explorer component that works well for both large and small datasets. In this talk, I will discuss the design of the Data Explorer UI and its backends for Python, R, and DuckDB, and how we made it work smoothly with massive datasets having millions of rows or thousands of columns. Additionally, I will discuss the sorting, filtering, search, and statistical data visualization capabilities that we have added to help make users more productive.
posit::conf(2025)
Subscribe to posit::conf updates: https://posit.co/about/subscription-management/


Transcript

This transcript was generated automatically and may contain errors.

Okay, very good. Hi, I'm Wes McKinney. I'm a software architect here at Posit, and I'm here to do a bit of a deeper dive into the Data Explorer, which you've already seen featured in all of my colleagues' talks in this Positron session.

So I wanted to talk a little bit about how I got involved in the project, why I'm working on the Data Explorer for Positron, and how we've designed it for great performance and scalability, as well as to integrate into your development workflow as a natural tool that makes you more productive.

Most of you know me as the creator of the pandas project, and I've spent a lot of the last 20 years staring at datasets in the console and in other environments. I was really excited when the Jupyter notebook (it used to be called the IPython notebook) came out in 2011, and for many years I had a very console- and notebook-centric environment for working with datasets and developing open source libraries that process and manipulate them. My book, Python for Data Analysis, is about teaching people how to use these tools.

But one of the first things I did, after five years of working on pandas, was to start a visual analytics company called DataPad to help build visual environments for working with datasets. So this has been a long-time passion of mine. Later, we started the Ibis project to provide portability and adaptability between different backends, so that you can build data expressions once and then run them in many different places, and you just saw that featured prominently in Austin's talk.

Pain points in data viewing

But when we think about the data viewer components within IDEs, we often run into a number of pain points. One pain point is the tedium of jumping between writing code and inspecting the results: seeing the part of the dataset you're focused on. Maybe you're doing some data cleaning or data preparation and you want to see specifically the part of the dataset you're trying to fix. We want to make that easier.

We've all seen a data viewer break down on the scale limitations of large datasets. It works great when your dataset is small, but then suddenly you have a wild dataset with 50,000 columns and it completely grinds to a halt. Or the dataset has a billion rows; as long as you have a backend like DuckDB, it can handle a billion rows, no problem. So we should be able to interact with those types of datasets in our data viewer.

Having things become laggy or unresponsive is something we definitely don't want. Another problem, which is a little more subtle, is the low-information-density problem. This is definitely associated with those terminal-centric data inspection workflows, where you're just not seeing that much about the dataset in the terminal. The human mind is able to process a lot of information, and so in building the Data Explorer for Positron, we wanted to pack in a lot of visual information to leverage the power of human cognition: to recognize outliers and see issues in the dataset that you're trying to fix.

So we think that human cognition is underrated, and we want to help augment your ability to see issues with your data, or to find areas that you want to look at more closely, so that ultimately you can iterate faster in your data wrangling, whether you're writing the code yourself or working with Positron Assistant or another LLM to write it.

Design principles and inspirations

As we built all of this in Positron, we had inspirations from some other tools that we love. Of course, there's the data viewer, which has been a popular and long-standing tool in RStudio. There are some other really cool projects that I've admired from afar as well, like the Data Wrangler for VS Code. There are a ton of other projects we can learn from, and that fed into our design principles for the Data Explorer.

We wanted it to enable ephemeral exploration without disrupting your coding flow, and to be fast and responsive. It should work just as well whether you're in a Python session, an R session, or just clicking on a CSV or Parquet file. It should update in real time as you work: if your dataset changes, whether you manipulate the data frame or change the data on disk, it should update almost instantaneously. Easy to say, hard to implement in practice.

Pulling this off required a lot of custom work. In particular, we spent a lot of time deciding whether we could do this by cobbling together off-the-shelf components, and we ultimately came to the conclusion that we needed to build a custom virtual data grid, so that we could achieve that level of snappiness even with the most unwieldy datasets: really wide datasets, or wide and long datasets. We don't want any lagging or unresponsiveness.
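Virtual grids of this kind typically render only the rows visible in the viewport and fetch data on demand, so memory use stays flat no matter how large the table is. Here is a minimal sketch of that idea in Python; the class and method names are invented for illustration and are not Positron's actual API:

```python
import pandas as pd

class VirtualGridBackend:
    """Serves fixed-size windows of a large table, so the UI never
    materializes more rows than the viewport can display."""

    def __init__(self, df: pd.DataFrame):
        self.df = df

    def fetch_window(self, first_row: int, num_rows: int) -> pd.DataFrame:
        # Clamp the request to the table bounds and return only that slice.
        first_row = max(0, min(first_row, len(self.df)))
        return self.df.iloc[first_row:first_row + num_rows]

# A million-row frame: the grid only ever asks for ~50 rows at a time,
# which is why scrolling to the middle can feel instantaneous.
big = pd.DataFrame({"x": range(1_000_000)})
backend = VirtualGridBackend(big)
window = backend.fetch_window(500_000, 50)
```

The real implementation also has to cache windows and talk to Python, R, and DuckDB backends, but the windowed-fetch idea is the core of it.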

We've done a lot of optimization work to achieve good performance across the different environments where you're looking at data. We don't want the Data Explorer to weigh down your computer by holding a lot of unneeded memory, so it's very memory efficient, and we've done a lot of work to enable that live-update workflow, which I'll show you in the live demo portion.

Features of the Data Explorer

We want it to be something you can launch easily from wherever you are in Positron. You've already seen this in some of the other talks: if you have a data file in your workspace, you can click on it or visit it through the command palette and it will just open. If you have a data frame in the Variables pane, there's a little button you can click to open the Data Explorer. But if you're working in the console, we also want you to be able to open one from there.

Let's say you have an ad hoc expression. In Python, you can use the %view magic, and in R the capital-V View() function, which you can pipe into from a dplyr expression, for example, and open a Data Explorer just to look at the particular expression you're working on.
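The expression you hand to %view is just an ordinary data-frame expression. A small sketch in Python (the data frame and column names here are made up):

```python
import pandas as pd

df = pd.DataFrame({
    "species": ["cat", "dog", "cat"],
    "weight": [4.2, 9.1, 3.8],
})

# In the Positron console you would prefix the expression with the
# %view magic to open it in a Data Explorer tab:
#   %view df[df["species"] == "cat"]
# The expression itself is plain pandas and evaluates anywhere:
cats = df[df["species"] == "cat"]
```

In R, the equivalent would be piping a dplyr expression into View().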

So you've seen the Data Explorer; this is what it looks like. Let me go a little bit through its layers. First, the grid, a digital frontier (shout-out to my Tron: Legacy fans). Again, it's a custom-built, spreadsheet-like table UI. It provides instant scrolling to anywhere in the dataset, selection, and copy and paste, and soon, in the next major release of Positron, it will support column and row pinning, which I will show you. The summary pane gives you those at-a-glance statistical data summaries.

When you open the Data Explorer, you have histograms, or for categorical data you have value counts, so you can see the most frequently occurring values in each column. These can be expanded to show more detailed summary statistics. Soon there's a feature launching to sort and search within the columns: if you have a dataset with hundreds or thousands of columns and there are certain columns you want to narrow down to, there will soon be a filter bar that lets you find just those columns of interest in the dataset.
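The summary pane's statistics correspond to ordinary per-column aggregations. A rough sketch in pandas of the kinds of summaries described, using made-up data:

```python
import pandas as pd

df = pd.DataFrame({
    "amount": [1.0, 2.5, 2.5, 10.0],
    "city": ["NYC", "NYC", "LGA", "NYC"],
})

# Numeric columns: the at-a-glance statistics behind the histograms
# (count, mean, quartiles, min/max).
numeric_summary = df["amount"].describe()

# Categorical columns: the most frequently occurring values, as shown
# in the summary pane's value counts.
top_values = df["city"].value_counts()
```

Positron computes these in the session's backend (Python, R, or DuckDB), so the numbers reflect whatever engine holds the data.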

I'm keen to eventually have a click-and-drag filter capability within these sparklines. So stay tuned for that, and if you're interested, we can chat about how it should work on GitHub.

Filtering and sorting I'll show more of in the live demo, but we want to enable you to find the parts of the dataset you're after very quickly. Those filters show up in the filter bar, and you can sort columns in ascending or descending order by clicking on the column drop-down menu in the Data Explorer; you can sort by multiple columns. Just recently, Isabelle added a convert-to-code capability: if there's a particular view you're looking at, whether it's in Python, R, or from clicking on a file in the Explorer, you can get the exact code that would produce that view and copy and paste it into your code window, which is super useful.
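To give a feel for what convert-to-code hands back, a filtered and sorted view corresponds to code along these lines. This is only the general shape in pandas, with made-up data, not Positron's actual generated output:

```python
import pandas as pd

df = pd.DataFrame({
    "carrier": ["AA", "UA", "AA", "DL"],
    "delay": [12, None, 3, 25],
})

# A view with a "delay is not missing" filter, sorted descending on
# delay, corresponds to pandas code of roughly this shape:
view = df[df["delay"].notna()].sort_values("delay", ascending=False)
```

The point of the feature is that you can paste this back into your script, so an interactive exploration becomes reproducible code.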

Live demo

It's much more exciting to see a live demo, and I'm going to pray to the demo gods that there will not be too many problems. I've run into so many bugs in my career with live demos, so I will do my best.

So here we are in Positron. I'll make it a little bigger. First of all, in the file browser, if we have a CSV or Parquet file, we can just click on it. I click on a Parquet file, and it opens and populates. This uses DuckDB, which, amazingly, ships with Positron to provide this capability. When you sort from the column drop-down, it sorts and updates pretty much instantaneously, and you can jump to the middle of a sorted table and it's essentially instantaneous. Now to the summary pane.

We have tooltips on these histograms. For example, suppose you were interested in data that's missing in the departure time column. If I double-click here, I jump right to that column, and I'm going to add a filter: departure time is missing. Now we see just the rows where that value is missing, and all of the summary plots have updated to show statistics for just those rows. And suppose you wondered whether there's just one pesky origin; it turns out it's these New York airports where there happen to be missing departure times, so they're not reporting their data very accurately.
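The filter in this step is a simple missing-value predicate. A toy pandas stand-in for the flights table (column names assumed from the demo; the values are made up):

```python
import pandas as pd

# A tiny stand-in for the flights dataset from the demo.
flights = pd.DataFrame({
    "dep_time": [517.0, None, 542.0, None, None],
    "origin": ["EWR", "JFK", "LGA", "JFK", "EWR"],
})

# The "departure time is missing" filter from the demo:
missing = flights[flights["dep_time"].isna()]

# And the follow-up question: which origins account for the
# missing departure times?
origin_counts = missing["origin"].value_counts()
```

In the Data Explorer you express this by clicking rather than typing, and the summary plots recompute over the filtered rows automatically.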

A cool thing we can do is the column pinning functionality I just mentioned, which is a new feature. When you have a dataset with many columns, sometimes you want to see a column that's in the middle of the table and then jump somewhere else, so you can visually compare the data values in that column with a column in some other part of the dataset. The same is true for rows: if there's a wily row that you want to compare with other rows in the dataset, we can pin that row to the top of the Data Explorer and then jump elsewhere in the table. We hope you find that useful.

Very quickly, I wanted to show you a little of how the live-updating functionality works. Here I created a pandas data frame, which is populated in the Variables pane. I click to view the data table, which opens it there, and then I'll split right and collapse the Variables pane. It's a small dataset. I'll go back to the file where I'm editing the data frame, and I'll quickly add a new column of random values. I run that line, it adds the new column, and you can see that each time I run it, it's new random data and the view updates in real time. The same is true if you, say, clicked on a CSV file; let's open that CSV file as plain text.
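The column-adding step from the demo is an ordinary pandas assignment; each re-run generates fresh random values, and an open Data Explorer view of the frame refreshes to match. A minimal sketch, with made-up data frame contents:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"label": ["a", "b", "c"]})

# Re-running this line overwrites the column with new random values.
# With a Data Explorer tab open on `df`, the grid and summary pane
# update almost immediately after each run.
df["noise"] = np.random.default_rng().normal(size=len(df))
```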

I'll split right so we can see them side by side. If I change this value, dog, to bird, and save the file, it updates immediately in the table. I find it's very useful to have a Data Explorer open; you can even pop any of these tabs out into a new window. If you have a large monitor, like one of those ultra-wide monitors I have at home, it's nice to have the Data Explorer out as a separate window alongside my code editor and my console, so that I can be looking at the dataset while I'm writing my code and doing all of my data munging.

Closing remarks

So we put a lot of work into this. Of course, we built it partly for ourselves, but mostly for you, and we want to hear from you about how we can make this part of Positron even better than it is. We have tons of ideas about new capabilities to add to it. But in addition to using it, one of the best ways you can help us is by providing your feedback. We're on GitHub: start a GitHub discussion, or open a GitHub issue if you run into challenges. We just wanted to build the ideal tool that we always wanted, and we think we've made a lot of progress. We're excited to see where we can take it as an important part of Positron going forward. So thank you.

Q&A

Thank you so much. We have quite a few questions here. Does the Data Explorer work with Ibis?

There is an open pull request to add direct Data Explorer support for Ibis expressions. If you have an Ibis expression, it will show up in the Variables pane, but you cannot currently open it as a Data Explorer tab; you can execute it and then open the result. But as soon as we merge Isabelle's pull request, we will have direct Ibis support in the Data Explorer, which is great. Outstanding, thank you.

What are your thoughts about data exploration for other data types such as multidimensional arrays?

We've discussed adding support in the Data Explorer for non-tabular data. It would be tricky from a user interface standpoint, because if the data doesn't map neatly into the grid, we would need to develop some other type of UI layer that lets you drill down into it. Let's say you were looking at a large nested dictionary-type object, or a nested array, like a NumPy array with more than two dimensions, that type of thing. You can open R matrices in the Data Explorer, but for more dimensions than that, we haven't built support yet. If you have an idea of how you think that should work in the Data Explorer, we would be interested in hearing from you.

How well does the data viewer play with the dreaded list column?

I know that you can see list columns; that's one area we have not fully addressed. If you go on the Positron issue tracker, there are issues about improved UI for list types and for looking at complex values. For example, if you have a large value that does not fit in a data cell, you can hover over it and a tooltip will show you the whole value. For more complex objects, including list or array types (if you're using Polars, which has a built-in list type, it will format the list of values within the cell), we'd like to be able to drill into those cells and get a richer UI display, similar to how you can inspect a JSON object in the Chrome developer tools pane, if you've ever done that. We'd like to expose that type of visual display of complex values within the Data Explorer.
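As a stopgap for list columns today, you can flatten them yourself before viewing. A small pandas sketch with made-up data; explode() is standard pandas, not a Data Explorer feature:

```python
import pandas as pd

# A "dreaded" list column: each cell of `tags` holds a Python list.
df = pd.DataFrame({
    "id": [1, 2],
    "tags": [["red", "blue"], ["green"]],
})

# explode() turns each list element into its own row, producing an
# ordinary flat table that is easy to scan in any data viewer.
flat = df.explode("tags")
```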

Great, I think we have time for one more quick question. Are there capabilities to explore data from relational databases without having to load the tables into the environment?

We're planning to develop an integration between the Data Explorer and the Connections pane, so that if you're connected to a remote database, say a Postgres database or a DuckDB database, you will be able in the future to click on a table and get a Data Explorer pane. There are some subtleties: we want to give control to the user so that it does not execute too much unwanted computation, so you don't get a large bill from Snowflake or from your database provider because you clicked on the wrong table in the Connections pane and ran tons of queries. So probably some of the summary pane plots will be opt-in, or click-to-compute, so that we aren't spamming your Snowflake warehouse with too many potentially unwanted queries on a massive table.

I think this will go hand in hand with the SQL editing and development experience that we're planning to develop in Positron in the fullness of time: being able to develop, test, and run SQL queries directly within Positron, to see the outputs of those queries within the Data Explorer, and to see your data warehouse as a pane in the Connections pane, directly in Positron. Thank you so much.