
Wes McKinney & Hadley Wickham (on cross-language collaboration, Positron, career beginnings, & more)
We hosted a special event with Posit PBC featuring Wes McKinney (pandas & Apache Arrow) and Hadley Wickham (R & the tidyverse) to ask questions, share your thoughts, and exchange insights about cross-language collaboration with fellow data community members. Here's a preview of what came up in conversation:

1. Cross-language collaboration between R and Python
2. Positron, a new polyglot data science IDE
3. Open source development: how Wes and Hadley got involved in open source, and their experiences building and maintaining open-source projects such as pandas and the tidyverse
4. Documentation for R and Python, especially in the context of teams that use both languages (shoutout to Quarto!)
5. The use of LLMs in data science
6. The emergence of libraries like Polars and DuckDB
7. Challenges of switching between the two languages
8. Package development and maintenance for polyglot teams that have internal packages in both languages
9. The future of data science

The chat was on fire for this conversation, and we've gathered most of the links shared among the community below.

Documentation mentioned:
- Positron, next-generation data science IDE built by Posit: https://positron.posit.co/
- Quarto tabset documentation: https://quarto.org/docs/output-formats/html-basics.html#tabset-groups

Packages / extensions mentioned:
- Pins: https://pins.rstudio.com/
- Vetiver: https://vetiver.posit.co
- Orbital: https://orbital.tidymodels.org
- Elmer: https://elmer.tidyverse.org
- Tabby Extension: https://quarto.thecoatlessprofessor.com/tabby/

Blog posts:
- AI chat apps with Shiny for Python: https://shiny.posit.co/blog/posts/shiny-python-chatstream/
- Using an LLM to enhance a data dashboard written in Shiny: R Sidebot & Python Sidebot
- Marco Gorelli Data Science Hangout (Polars): https://youtu.be/lhAc51QtTHk?feature=shared
- Emily Riederer's blog post on Polars: https://www.emilyriederer.com/post/py-rgo-polars/
- Jeffrey Sumner's tabset example: https://rpy.ai/posts/visualizations%20with%20r%20and%20python/r_python_visualizations
- Emily Riederer's blog post on Python and R ergonomics: https://www.emilyriederer.com/post/py-rgo/
- Sam Tyner's blog post on lessons from "Tidy Data": https://medium.com/@sctyner90/10-lessons-from-tidy-data-on-its-10th-anniversary-dbe2195a82b7

Other:
- Hadley Wickham's cocktails website: https://cocktails.hadley.nz
- Posit subscription management to find out about new tools, events, etc.: https://posit.co/about/subscription-management/

New to Posit? Posit builds enterprise solutions and open source tools for people who do data science with R and Python. (We are also the company formerly called RStudio.)

We'd love to have you join us for future community events! Every Thursday from 12-1pm ET we host a Data Science Hangout with the community and invite you to join us! You can add that event to your calendar with this link: https://www.addevent.com/event/Qv9211919
Transcript
This transcript was generated automatically and may contain errors.
Hi all. Thanks so much for taking the time to join us today. If we haven't had the chance to meet before, I'm Rachel. I lead customer marketing at Posit and I also co-host our weekly Data Science Hangout. I'm joined by my other co-host here behind the scenes, Libby.
I know many of you are already Posit customers, but just in case Posit is new to you: Posit builds enterprise solutions and open source tools for people who do data science with R and Python. We are also the company formerly called RStudio, which is why my mug is the RStudio one, not the Posit one today.
But I'm so excited to have you here for this special event today. So I'm joined by Wes McKinney, entrepreneur and open source software developer focusing on data science tools and analytical computing. Wes is co-creator of the pandas, Apache Arrow, and Ibis projects and currently principal architect at Posit. And I'm also joined by Hadley Wickham, chief scientist at Posit. Hadley builds tools to make data science easier, faster and more fun. His work includes packages for data science like the tidyverse, which includes ggplot2 and dplyr.
So for today's session, this is going to be a casual chat for us all to ask questions and exchange insights with each other about cross language collaboration. But we at Posit are also really excited to learn from you and to better understand how your teams work with both our free and open source projects, but also our professional products like Posit Workbench and Posit Connect.
So I do encourage you all to connect with each other here too in the chat. If you've been on our Data Science Hangouts before, this is one of my favorite things, people getting to know each other through the chat. So if you want to briefly introduce yourself and say hi, maybe include your role or where you're based, or something you do for fun. Feel free to share your LinkedIn as well. The chat is yours to share resources with each other.
There were so many great questions submitted ahead of time, which I can definitely start with. But today you can also jump in and ask questions that come up live, or share your own experience. You can put questions in the chat; if there's anything you want me to read out loud instead, put a little asterisk next to it. We also have a Slido link where you can ask questions anonymously. And if I end up missing something, feel free to raise your hand on Zoom and we can call on you to jump in there too.
Real quickly, we did keep this to a smaller group today, but I will be sharing the recording more broadly; I just want to make sure I let everybody know that. I can also send the recording to you all later this week as well. But thank you again so much for being here today. And let's jump in. Wes and Hadley, I just briefly introduced you both, but if you want to say hello here first too, and maybe also let us know something you do for fun outside of work.
Introductions
OK, Hadley, go first. I'll let Hadley go first. OK. Hi, I'm Hadley. As Rachel just said, I'm chief scientist. I make lots of R packages. And outside of work lately, I've been doing crochet.
I was just thinking of buying one of those kits the other day. Yeah, I got onto it from Woobles, which makes it super easy to get up and going. And I highly recommend it for a Christmas present.
And the fun, cool Woobles connection is that the people who started Woobles were actually ex-data scientists from Google who used the tidyverse in their previous roles. So I thought that was pretty awesome. That is, I had no idea.
All right, Wes. Yeah, so I'm Wes. Most of the time I live in Nashville, Tennessee. Over the last year at Posit, I've been mostly, I would say, mostly working on Positron, which is a new polyglot data science IDE that's in public beta. Pretty excited about where that's going.
I joined Posit about a year ago, but we've been collaborating actively on Arrow and making polyglot data science projects work better, making Python and R play nicely together, since, gosh, it must go back to like 2015 or 2016. So coming right up on a decade. It's been a pretty long collaboration on the open source tools, and it's great to be able to work together on a day-to-day basis in the company. I guess we released Feather in like 2016.
And the fun thing outside of work. I'm a big, you know, big on cooking and cocktails. I know Hadley's also really into making cocktails, maybe more than me. But during the wintertime, I enjoy doing like kind of large braises, you know, putting something in the oven to cook for five or six hours, and that makes for a really fun dinner party.
Getting into open source
So, let me jump in with some of the questions people submitted ahead of time, and we could start early days here. So the first one is, how did you both get into open source? Wes, do you want to go first?
Yeah. For me, it was a bit stumbling into it because I started doing Python programming back in like 2007, and I was working inside a hedge fund where all the code was very secret and there was no open source. And even using open source was a little bit, you know, was a little bit dicey. You had to be really careful about, you know, what code you pull in and everything was really scrutinized.
But I learned about, you know, the scientific Python community and started looking into these different projects and how they became open source projects and what was an open source project. And at a certain point, this was like the middle of 2009, I decided that I really wanted to open source, you know, what was then very early version of Pandas as an open source project. And so that led me to, I finally got permission to do that. And then I went to my first PyCon in 2010. And that was like my first foray in the Python community.
A lot of the folks in the scientific Python community, like NumPy, SciPy, people who go to the SciPy conference, like they, you know, I met them and then they mentored me in like how to do open source, how to build open source communities, and just became a bit of a, you know, a bit of an addiction, I guess, after that. And, you know, really enjoy working in public and building tools that are freely available on the internet and have a lot of impact.
And Hadley, what about you? What, why data science for you? Yeah, so I, in like high school, I really enjoyed both like programming and statistics. And so I ended up doing a double major in computer science and statistics, which at the time, seemed like a weird combination to a lot of people. But it's now obviously what we call data science.
So I did that at the University of Auckland, which was the home of R. So as far as I can tell, I actually started using R in 2003. And I had to look up and that was R version 1.6, which is a little horrifying to me that I've been using R for like 21 years now.
So that, so R was open source and that, I mean, that just kind of felt natural to me to like try and, you know, develop R packages and release them in open source. And just like such a great way to have like an impact on the world. And, you know, my original career track was like more, more academia thinking I would be a professor somewhere one day. And it just seemed to me like open source was just such a great way to get your ideas in the world and not just provide like a text description of what you're working on, but to provide code that people could actually use to implement your work.
Pandas, Polars, and the data frame ecosystem
Okay, let me jump into some of the package type questions, but I see a few coming into the chat as well. I noticed a few of the pre-submitted questions focused on pandas and also mentioning polars. So I'm curious, Wes, how do you envision pandas evolving over the next year in response to performance-first single machine data frame libraries?
This is a pretty common question. You know, because pandas has millions of users, it's very hard to make large sweeping changes to the project. So some years ago, back when we were starting the Arrow project, which provides kind of a more efficient data management layer, so a place to store data efficiently in memory and then compute all the data frame operations really efficiently against that data, there was a discussion about whether we could make large sweeping changes to pandas to make it a lot faster and more efficient, with better scalability and memory use on a single node. But ultimately, we found that making significant changes would cause too much disruption to people that depend on pandas for all of their business applications.
And so over the last 10 years (I haven't been actively involved in day-to-day pandas development), the team has been really careful to introduce new components and to refine the internals to make them more efficient. But ultimately the project has to preserve its API and not make changes too quickly, because that risks breaking people's code and harming people that don't have tests for their production code.
One of the big changes has been the introduction of a more robust extension arrays system. So that enables people to use Arrow arrays in their pandas data frame, so they can get a lot more efficient string data types and get better analytics performance that way. There have been some other performance and memory use improvements around the project. But in terms of really high performance computing for data frames, I think that's mostly been happening in the DuckDB and Polars projects. You can feed a pandas data frame into DuckDB and use DuckDB to execute on it, and I think maybe similarly with Polars. But, you know, without the burden of supporting over a decade's worth of legacy code and an existing API, it's been nice for projects like Polars to be able to rethink the data frame API and build something new from the ground up that's focused on performance and scalability.
But also with a much smaller API. And so the way I describe Polars to people, it's like a much smaller, simpler API. There's no row labels. Like, there's a bunch of stuff that makes things more complicated in pandas that doesn't exist in Polars. So I think Polars is, you know, maybe more similar to an R data frame or, you know, if you're using dplyr, the dplyr API and the Polars API are very similar.
Quarto for polyglot documentation
Yeah, sure. It's buried somewhere in the chat. So, I work for the Washington State Department of Health. We have a pretty large base of R users, and we also have quite a few Python users as our technical capability continues to expand. And we've been struggling to create documentation that serves both of those communities. Posit Workbench is coming to us soon, and with things like Positron and these polyglot IDEs (some of us use VS Code too), Positron shouldn't be that big of a leap there.
But we've been trying to create documentation that can serve both communities better with Quarto, especially since Quarto works well in Python and R, but we've had challenges using them together. So, we want to use Quarto to share the same documentation in different contexts, quickly switching between R and Python for our users who need to see one language or the other. They usually don't know both; it's usually only R or only Python. And I was just curious, are there examples out there or recommendations that you might have for creating this type of documentation that we can switch between quickly?
There's one thing that I'm curious about in Quarto, and that's with, you can use tab sets. And I think if you give the tab sets all the same name, you can kind of switch them globally on a page. So, then you can have an R code and Python code, each in their own tab. And then when you pick one for Python, that changes everywhere. I can't find that on the Quarto docs, so maybe I imagined it, or maybe it's not documented. But I know that was something that came up in some of our internal discussions as we start to create more packages that work with both R and Python as well. So, that's definitely something to try with the tab sets, if you're not already.
Great. Yeah, and I'll take a look. I'm curious about the global switching, because that would be perfect. I'm just not sure how to implement that yet. Yeah, and if it doesn't work, just ask, because it should work.
I just found this. I was just Googling, and I found a Quarto extension called Tabby, T-A-B-B-Y, that just makes it easier to create code blocks with tab sets. So you could switch between the R version of a code block and the Python version. I'm not endorsing it; I haven't used the extension, but it came up in case that's interesting. And I just found in the Quarto docs that the global switching is supported: you have to add a group attribute to your tab sets. I dropped a link in the chat.
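For anyone who wants to try the synchronized tab sets discussed here, below is a minimal sketch of what a Quarto page might look like using the tabset-groups feature from the Quarto docs. The `group` attribute is what keeps tabs with matching titles in sync across the page; the code inside each tab is just illustrative:

````markdown
::: {.panel-tabset group="language"}

## R

```{r}
# R version of the example
summary(mtcars$mpg)
```

## Python

```{python}
# Python version of the example (assumes a comparable df is defined earlier)
df["mpg"].describe()
```

:::
````

With the same `group="language"` on every tabset in the document, a reader who clicks the Python tab once sees Python everywhere on the page.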
Positron and the polyglot IDE
Okay, I see, Manuel, you had asked a question when you registered about Positron, and I'm seeing some excitement about Positron in the chat, too. And feel free to jump in if you want to add any more context to it. The question was, in what ways does Posit see using the best of both for R and Python development occurring in Positron?
Yeah, and I guess, you know, I was going to add a different question slightly on this theme. I'm with Exxon Mobil, so I'm with a large enterprise, and we have many products, many solutions. One of them is Databricks. And I get the sense that the individuals very passionate in either community are accustomed to their IDE of choice, right? So it was RStudio or Spyder or whatever, and now maybe Positron. But I'm finding myself having to build solutions that are so large that I'm not certain if that's the right home anymore.
So maybe the question can be, you know, what's the continuing value proposition for something like Positron when the landscape continues to change and the scale of things that people are being asked to build could grow, right? So I get it if I'm modeling, if I'm deploying something to predict that IDE experience is very powerful, but for a lot of what I have to do for business intelligence or analytics, they just want to report.
Well, so I don't know how much folks in this call know about Positron, but the idea of Positron was to develop the kind of – you can think of RStudio pioneered what we can think of as the classic four-pane data science layout with the console, the code editor, the variables pane, and the plots pane. And so you can move between. And there's some other components like the connections pane for interacting with databases, the data viewer for being able to look at your data frames and your tabular data. But having each of those concepts as first-class citizens within a development environment is really powerful, and I think RStudio has shown that over the last 15 years with its enduring popularity.
And so the idea with Positron is that we wanted to create that same kind of experience and make it available to Python users, but in a polyglot-first IDE. But it's kind of a difficult problem to build from the ground up something that works equally well in R and Python and supports all of those different components. So we chose to build Positron on top of the open-source VS Code codebase. So we have basically an extensive set of customizations and components that we've built on top of open-source VS Code.
And we've created that kind of hybrid IDE where you can, within the IDE, you can have both an R session and a Python session running. And so you might have one tab that's an R program where you're doing something in R and then another tab that's Python. And whenever you switch contexts, it will change the session, the active session, and that will change the variables that are displayed, the plots, and the different components in that four-pane data science layout.
You can imagine this has been an awful lot of engineering, but I think it reflects kind of how we as a business are thinking about building tools going forward, building things polyglot from the ground up. Similar ideas with Quarto and Connect is also kind of being built as a polyglot application and document publishing platform. So it's going to take a long time before Positron—well, I should qualify, it will take some time before Positron is able to reach the level of polish that RStudio has after 15 years.
But it's an early-stage project. We're really interested in feedback and users. I think it will be available as a preview, kind of in alpha/beta form, in Posit Workbench in the near future; I'm not sure, maybe in the first quarter of next year. And then we're hoping for it to become generally available, production-ready, toward the end of next year or early 2026 in terms of timeline. But we're making a big investment there, supporting not only Python-first teams and people who are just using Positron for Python, who we want to have a great experience, but also teams that have a lot of R users and Python users. We want them to be able to work effectively within the development environment and switch between the R context and the Python context.
Integrating R and Python workflows
I think the goal of both Workbench and Connect is to put R and Python on equal footing. So regardless of whether you're using R or Python, you can develop your scripts in the same editing environment, and you can publish them to the same environment using the same tools. So I think that's really about helping both 100% R and 100% Python teams or even individuals work effectively. And then how do we make sure those teams, the R users and the Python users can collaborate as effectively as possible?
And to me, that's mostly happening on the open source side. That's making sure, on the R side, that we have the nanoparquet package so that you can read and write Parquet files in R, which are a great way of sharing data with your Python-using colleagues. There's the pins package, which we have for both R and Python, that allows you to save data sets and share them with your colleagues really easily. Tools like Vetiver for model monitoring work with R and Python. Great Tables works with both R and Python. So having these tools where you end up with a shared vocabulary, a shared tool set, and shared conventions around storing and using data hopefully eases some of those boundaries a little.
LLMs and AI in data science
I don't know if we're at the point where we have a roadmap, I'd say, but we have a lot of internal skunkworks projects on the go to try and figure out what LLMs mean for data scientists, both for doing data science, whether that's extracting structured data from unstructured text, or for writing the code that you use to do data science. So we're exploring a bunch. Personally, I've been working on the elmer package for R, which makes it easy to use any of a range of LLM providers from R programmatically. That's hopefully going to go to CRAN tomorrow, or maybe the day after, or maybe Friday, failing that.
And then I think we're really thinking about how LLMs and Positron should interact. Is there something special for data scientists? Or is this just kind of a generic programmer software engineering problem, where you can integrate with all the existing tools like Continue or Codeium or any of the 17 other LLM VS Code extensions whose names I forget? So no roadmap, but a lot of interest, a lot of excitement, and a lot of experiments.
Yeah, I mean, I think one of the things that we're looking closely at is how, because I assume that a lot of people on the call have used Copilot or one of the other LLM VS Code extensions like Continue. I personally just recently switched from using Copilot to Continue. You install the Continue extension, and it's basically like an open source Copilot: auto completion, code editing, and chat. And you plug in your own API key from your preferred model. I'm using the Anthropic API, which is very inexpensive to use, but you could also plug in an OpenAI API key. For code completion it uses Mistral: Mistral has a code auto completion model called Codestral, which I think is one of the best low-latency auto completer models out there.
So, I've been learning a lot about that, but I think one of the things that we're looking, taking a close look at is how we can take a general purpose LLM AI assistant that's for software engineering and augment the context with information from the data science environment. So, that's things like the column names in your data frames and kind of other information that we can cheaply glean about the active data science environment to make the prompts and the suggestions from the AI more contextually helpful based on what you're doing.
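To make the idea concrete, here is a hypothetical sketch of gleaning cheap context from a data frame and prepending it to a prompt. None of this is a real Positron API; `describe_frame` and the prompt template are invented for illustration:

```python
import pandas as pd

def describe_frame(name: str, df: pd.DataFrame) -> str:
    """Summarize a data frame's shape and schema as one line of prompt context."""
    cols = ", ".join(f"{c} ({dt})" for c, dt in zip(df.columns, df.dtypes.astype(str)))
    return f"Data frame `{name}` with {len(df)} rows; columns: {cols}"

# A toy frame standing in for whatever is live in the user's session
df = pd.DataFrame({"mpg": [21.0, 22.8], "cyl": [6, 4]})

# Prepend the environment context so the model sees the schema, not just the code
context = describe_frame("df", df)
prompt = f"{context}\n\nUser question: which column holds fuel efficiency?"
```

The point is that the schema summary costs almost nothing to compute but lets the assistant answer questions about columns it would otherwise never see.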
And so, if the LLM is just looking at the code in your code editor, it may lack the context that can be picked up from looking, you know, kind of looking deeply at the data. And so, it's, you know, you can imagine this is a, you know, kind of a pretty significant engineering problem. And, you know, we also, you know, I think we want to make sure that we don't bite off more than we can chew and that we're taking advantage of, you know, what's out there and not, you know, not building too much from scratch ourselves.
I wish I had something thoughtful to add. My head's along similar lines to what I added in the chat. At this point, most of the LLMs we're using are behind an API endpoint. Where I work, we've developed an internal FastAPI service to hook us into every LLM we need. So I'm able to develop with LLMs right in R; I'm not stuck using Python for everything. I'd say the one place I'm still kind of stuck being behind in R is vector databases, RAG, similarity searches, that kind of stuff. So that's still where I'm struggling a little bit to argue to our leadership that it's worth sticking with R when, well, if you want to do RAG, you have to develop it in Python, use Chroma or FAISS or whatever else.
I will say, this is very much on my to-do list for next year: a package for RAG in R. What I currently have to show for it is a name and a logo, but hopefully next year there will be more. I'll put the logo in the chat.
One neat thing, I don't know, I haven't built anything with it personally. I've just seen demos, but there is like a chat widget in Shiny. So if you have a Shiny dashboard or something that you're building and you, you know, there are components that could enable you to add a kind of an LLM chat sidebar to a dashboard or Shiny application. And so I think there's lots of, you know, with a little bit of, you know, legwork, you know, if you're willing to spend the energy to, you know, twiddle with generating prompts from the other metadata in your Shiny dashboard, that could be used to build some pretty interesting things. But I think it's definitely early days in terms of realizing the full implications of all of the different, you know, things we can do with these tools.
How Wes and Hadley use AI
So another question was: are you using AI? And if yes, which part of your way of working could benefit most from AI? Yeah, a lot of my hex logos now are created with AI. I use DALL·E to do a bunch of experimentation to get a sense of what I want, and then I work with one of our designers at Posit to finish it off. But I've found them really useful for brainstorming hex logos.
More usefully for work, I guess I find them super useful when I'm programming in areas that I'm not that familiar with, like doing like web development, some stuff there, just really useful. I can kind of read JavaScript if it's been written for me, but writing it is painfully slow. So having someone like at least get me 90% of the way there makes a huge improvement to my productivity.
Yeah, I think my use of AI is similar to Hadley's. I mentioned that I currently use the Continue VS Code extension, and that's relatively recent. I actually held out on using Copilot for a long time because I felt like, hey, I don't need an AI to help me write code. But then I found myself working on code bases, especially JavaScript and TypeScript code bases, where there were just a lot of concepts and things that I was not familiar with: a lot of library code that I didn't fully understand, as well as a new programming language.
And so I find that it really helps me with the blank slate problem of starting in on something new. In the old days, I would spend an hour Googling things and looking up how to do stuff on Stack Overflow, whereas now I can ask it to insert some template code to get me started. Maybe it's not right the first time, but then I can edit it to my satisfaction. I also find that it's really good at refactoring stuff. I used to manually refactor code, but the other day I started to refactor some code, and as soon as I started, the auto completer realized what I was doing and made all the right suggestions. So rather than having to type out the thing I was doing, I just pressed tab five or six times and it did exactly what I wanted, kind of like magic.
So, you know, I think there's a certain set of places where it really helps: boilerplate, stuff you could do but would have to do some Googling for, and things you're not familiar with. But I've found it's definitely made me a lot more productive. Yeah, I think the place where it's best is the sort of brainstorming stuff. You never really trust it a hundred percent, but it often gets you like 90% of the way there. The other framing I have is that it covers all of those things you used to Google and just kind of blindly trust whatever you get, and if it's a recipe, you've also got to scroll through 15 pages of the person's life story before you actually get to the recipe. It's almost uniformly better for those kinds of tasks, where actually getting a 100% correct answer is not that important and you're really just looking for a starting point.
R and Python ergonomics and ecosystem cohesion
Yeah, let's get back to a conversation we were having earlier around the R and Python relationship and interoperability. The question is mainly around what Emily Riederer calls the ergonomics of the language. I find myself working in R, really used to the tidyverse; it has a very natural kind of workflow for thinking about how you wrangle data. When I go into Python, my brain breaks. I can no longer think about how to manipulate data, and I feel so dumb, you know, when in R I know how smart I am. And I'm wondering, well, I'll back up: there are these community-led initiatives to make Python in some ways feel a bit more like that tidy workflow. I'm wondering if Posit is investing in ways to unify the language ergonomics, so that the switch between the languages is more natural, not just in the IDEs and the tools but in the language APIs themselves.
I mean, Wes may know another aspect, but I think we are doing this a little bit in terms of packages like gt and Great Tables, and plotnine, and I think the dplyr influence on all of those is strong. At the same time, I have to say I find Python packages really strange. And the strangest thing, the thing that I find the weirdest about Python packages, is they don't have logos. They don't have hex logos or hex stickers, and I think that's really weird and disturbing.
Yeah, that's funny. I never really thought about the logos thing until working more with the R community. And, yeah, I think it would be good to have more logos. On the API front, I started a project called Ibis, which is a portable data frame API, very much inspired by dplyr, getting on almost a decade ago. And there's kind of a growing community of people that have built out different back ends for it. It works really well with DuckDB out of the box, and Polars and pandas, and you can use it with BigQuery and Snowflake and Presto and all these things.
And so I think that's been one area that's flown a little bit under the radar, because everybody uses pandas and so they're not thinking as much about the value of a portable data frame API. There's also a project called narwhals, which is similar to Ibis but uses the Polars API as the interface. The way that dbplyr relates to dplyr, making dplyr expressions run on databases, narwhals is the same concept, but for the Polars API.
But I agree that stuff like the Python port of great_tables helps, at least for the data presentation problem, with reducing some of the cognitive dissonance of moving between languages. And there's plotnine, which is basically a Pythonic port of ggplot2: if you're used to ggplot2 and want something that maps semantically, it lets you express things the same way in Python.
But because the Python ecosystem is so big and so federated, it's a little bit more of the Wild West, whereas the R ecosystem is more curated. At least within the greater tidyverse cinematic universe there's more ideological consistency: we have to make all these libraries work well together. That's helped create a cultural cohesion that isn't as present in the Python world, because it's so federated, if that makes sense. Yeah, it's something we talk about at least, and something we're trying to make better, but it's difficult. It's a very big ship that we're trying to steer with a small rudder.
Do you think that R's cohesion is a function of R just being a smaller language in general, with fewer players? As you scale, you kind of inevitably get the volume and noise you're talking about.

I mean, Hadley would probably have a better view, but my outside impression is that it's not just about the language size or the community size; it's more that the developer community is more cohesive, with more dialogue and collaboration.
And so I think there's been more of an active effort, where the tidyverse is kind of this gravity well: it's created a desire within the community to create things that play nicely together and are cohesive, for the benefit of end-to-end productivity. It's like, oh, you want to build a new thing? Well, it needs to work well within this ecosystem of tools, to play nice with people that are using these other 15 packages in their daily workflow.
Yeah, I mean, I think the R community is smaller and feels more centralized. Another interesting difference is between CRAN and PyPI, where CRAN is such a pain if you're a package developer, because you are responsible for making sure your package isn't breaking other packages on CRAN, whereas with PyPI it's really easy to deploy a package, but then it's the user's responsibility to debug package incompatibilities. So the R community is kind of more centralized by nature, and there are lots of good things about that, and obviously lots of good things about being more decentralized and spread out like the Python community.
Building blended R and Python applications
So I see there are a lot of questions about using R and Python together, because I think most teams have both R and Python users now. One was: what is your recommended architecture or approach for creating a blended web-based application?
Blended in the sense of one that uses both R and Python?

Yeah. I'm not sure exactly who asked the question, but I'm imagining there's a team that has both R and Python users, and they're working on a project together where the output is some web app that someone's using to make decisions.
Okay. I guess you have to make a decision about where the front-end portion of the app is built: whether you build the application in R, say using Shiny, or in Python using Shiny, Streamlit, or Dash. They're all supported in Connect, for example, so you can publish with whichever app framework you like. Choose your fighter, basically. And then if you need to cross between the languages: from Python you can call R code within the application using rpy2, and from R you can call Python code with reticulate.
So it's a bit of a choose-your-own-adventure, based on where you want to maintain the application portion of things. In many ways, building the application layer in R is a lot nicer, especially in the publishing experience if you're working within RStudio. I think we're working to achieve the same level of one-click publishing, and to make application development work as smoothly for the Python app frameworks in Positron as it does if you're building an application in R and RStudio.
Maintaining parallel R and Python packages
Even within Posit, you support both R packages and equivalents on the Python side. I was curious, from an engineering perspective, how you're maintaining those and keeping them synchronized, in a way that makes sense to developers and is sustainable going forward. What are the challenges, and what advice do you have for teams that have internal packages in R that also need to exist in Python?
Not really, but I just commented on another thread asking the same question, saying that it might be worthwhile to have Edgar and Scritch come and talk to this meeting about that, since they both have experience developing R and Python packages simultaneously. I think they'd have some good ideas about how you can minimize the amount of copying and pasting you're doing. That's also another place where LLMs seem really useful, or are getting increasingly useful: translating code from one language to another. Certainly not going to be 100% correct, but will it radically speed up translating from one system to another? I suspect so.
Learning resources for R users moving to Python
And, Sam Tyner-Munro?

Yeah, I'm an R user, and I manage some people who use Python. And of course, with LLMs, everybody is using Python more. For those of you with experience going from R to Python, what are your favorite resources for package documentation, tutorials, and other learning materials generally? I saw somebody else in the chat say that Python documentation tends to be quite difficult to learn from and understand compared to R package documentation. So yeah, just wondering what some good resources might be.
I know that Quarto has been helping a lot for Python packages. There's a tool called quartodoc, and a number of Python projects have moved over to building their websites, this hybrid of documentation, blogs, and tutorials where they publish content for their packages, with Quarto. So if you have projects that are doing documentation for both R stuff and Python stuff, that definitely helps with having fewer tools to master for authoring and publishing content.
There are many different documentation frameworks; in Python there's not only quartodoc, but also Sphinx and MkDocs and a number of others. But I've been advising most people to move their code blogs and anything code-related over to Quarto, just because of the benefits of working within a polyglot framework.
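For concreteness, quartodoc is configured from the site's `_quarto.yml`; a minimal sketch might look roughly like this, where the package and function names are placeholders:

```yaml
project:
  type: website

quartodoc:
  package: mypkg          # the installed Python package to document
  sections:
    - title: Data wrangling
      desc: Functions for reshaping data.
      contents:
        - clean_names     # docstrings rendered to Quarto pages
        - summarize_by
```

Running `quartodoc build` then generates the API reference pages, which sit alongside any other Quarto content (blog posts, tutorials) in the same site.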
Yeah, I think the surprising thing to me is the other thing Python lacks that R has pretty deeply ingrained: the idea of vignettes. In Python there's generally good function and method documentation, but the idea of having these longer-form documents which explain things is just less baked into the packaging system than it has been in R. I think that's partly because R always had this very strong academic connection, where, at least historically, if you're writing an R package, you're also going to write a paper about it, so you want some way to include that long-form, detailed technical discussion.
Big data trends: DuckDB, Polars, and single-node computing
I mean, I think these projects will continue to get better and attract more add-on third-party library integrations, and just become more integrated and useful within the mainstream programming environment, both in Python and R. Posit has a relationship with DuckDB Labs and has worked with them on integrating DuckDB with dplyr, and we've made similar investments in making DuckDB work better with Python.
So, yeah, my hope is that Polars and DuckDB just become more and more popular and widely used, and that we don't end up with 10 different projects solving similar problems. But it is true that these tools have really opened things up, especially on big servers with lots of cores and lots of RAM. What you can do on a single machine now is pretty impressive, and there are fewer and fewer cases where you need to spin up a distributed cluster and use Spark.
So having the ability to work on a single node, or, if you're working in the cloud, to say "I'm working on a big problem, I'll spin up a large instance and do it with DuckDB," that's amazing. You don't have to go through the complexity of porting a workflow to Spark and running it on EMR or whatnot.
Yeah, I think the trend that's really interesting to me is the unbundling of data storage and compute. When you look at MySQL and Postgres, they each have a completely different way of storing data on disk. But now you can have a directory of Parquet files and use it with DuckDB, with Arrow, with Polars, with Athena. That idea that you can compute on your data with lots of different engines is really powerful. And tools like DuckDB and Polars are bringing that to your laptop, so you can work with gigabytes of data pretty easily. It gives you a much higher speed of iteration than working with a tool like Spark, where you have to create a job, it gets spread over a bunch of computers, and the results all get aggregated back together and sent to you. So I think it's a really, really interesting trend.
Data heroes and inspirations
I don't know about data heroes, but I'd say JJ is definitely one of my programming heroes. There are many things that are incredible about JJ, but the thing I find most incredible is his willingness to move from an area where he's an expert and knows it down pat to a new area where he knows nothing. That's incredibly brave, and an incredible skill: to be able to say, I've mastered this, and now I'm going to go over here and become a novice again. I have a huge amount of admiration for that.
Yeah, I've become a big fanboy of the DuckDB creators, Mark and Hannes. Rachel mentioned there's a Data Science Hangout on Thursday with Hannes, who's one of the creators of DuckDB. In addition to being just super nice people, occasionally a bit sarcastic, with very dry humor, they're really collaborative. I've been impressed with their willingness to take on any problem and say: we're unfazed, we're going to make DuckDB work on this, we're going to make DuckDB do this or do that.
It shows that small teams of people who are really, really focused and motivated can accomplish great things.
And in general, I've been really excited to see more collaboration with the traditionally pretty crusty, stodgy, uncollaborative database community. It's a pretty insular community with mostly commercial software projects. So to have not only an open-source database project, but one that collaborates with folks like Hadley and me with open arms, has been really exciting; it's opened up a lot and, I think, made the future a lot more interesting.
What's exciting for the year ahead
I will say, echoing what Wes said, another project I think is incredible is Typst, which is this recreation of LaTeX, of everything LaTeX could do in terms of creating high-quality PDF publications. That's something I just assumed I would never see, a replacement for LaTeX in my lifetime, because it was too big and too complex. And the Typst development team, which I think is only a couple of people, just dived into it and produced a system that's incredibly good and incredibly fast. I think that's just so cool.
Yeah, for me, I'm excited about all the stuff we're building in Positron and getting it into the hands of more users. On the AI front, I'm excited for some of the hype to settle down, so there aren't 10 new AI tools to look at every week. With a little more consolidation, folks like us who are just trying to make people more productive can focus on fewer things, and just have the goal be making people more productive. However many different LLMs are out there, more consolidation and fewer things for us to look at integrating with will, I think, be a good thing.
Thank you both so much; I really appreciate you taking the time to join us, Wes and Hadley. I know the Hangouts normally go by really quickly, but this hour has been the fastest of all time. Thank you all for the great questions and for taking the time to join us as well. As a reminder, if you had fun today and want to find out about other Posit events, I'm going to share a link where you can subscribe for updates. I'd also love to hear your thoughts on today's session, so I'm going to share a three-question survey in the chat, just in case it doesn't pop up at the end of the meeting for you. And if there were questions we didn't get to answer, I would love to hear from you.


