Emily Riederer: Column selectors, data quality, and learning in public
Emily Riederer writes Python with an R accent, and we're all comfortable with it. In this episode, Emily reflects on her journey through R, Python, and SQL — from lessons learned in averaging default values (oops, we're not all rich!) to discovering that column selectors are way cooler than they sound. She weighs in on the delicate art of learning in public, why frustration often makes the best teacher, and how to find your niche by solving the boring problems. Oh, and the crew casually drops that she's keynoting posit::conf(2026)! Emily's had a wild ride through modeling, data engineering, machine learning, and back again, and she knows a thing or three about the evolution of SQL tooling (from nightmare multi-page scripts to the dbt renaissance). She reveals how building internal packages became her gateway to making work enjoyable. Plus: the surprising Stata origins of column selectors, the eternal struggle of naming packages across R and Python, and why watching people code teaches you more than any tutorial ever could. The conversation gets real about imposter syndrome and the magic of tacit knowledge.

IN THIS EPISODE

• Why real-world data is chaos, not truth
• The path from modeling to data engineering (and back)
• What a data pipeline really is (extract, load, transform) and why organization matters
• How dbt changed the SQL game
• Learning by watching: tacit knowledge and coding over the shoulder
• Imposter syndrome and learning in public
• Building internal tools to escape busywork
• posit::conf(2026) keynote preview
Transcript
This transcript was generated automatically and may contain errors.
Welcome to the test set. Here we talk with some of the brightest thinkers and tinkerers in statistical analysis, scientific computing, and machine learning. Digging into what makes them tick, plus the insights, experiments, and OMG moments that shape the field.
On this episode, we're joined by Emily Riederer, who I think has the distinction of living in that sweet, sweet overlap between Python, SQL, and R harder than anyone else I've ever seen, and is a data science manager at Capital One, and apparently listens to the test set, but is not planning to listen to this episode. Welcome to the test set. I'm Michael Chow, and I'm joined by my co-hosts, Wes McKinney, who's a principal architect at Posit, and Hadley Wickham, who's a chief scientist at Posit. And I'm so excited to be here with Emily Riederer, who is a data science manager at Capital One, and I think a sort of like icon in the R, Python, SQL community for just putting out so much interesting work in this intersection of Python, SQL, and R.
So like, recently the article Python Rgonomics, and some talks around that, of how to have a workflow that R users love in Python with things like polars, and dbtplyr, which was a plugin for the SQL framework dbt, which was a really interesting cross-section of ideas. So Emily, thanks for coming on. So happy to have you.
Oh yeah, thank you so much for having me. I've loved the pod so much so far. I have not missed a single episode. Yo, I'm so glad. Honestly, the biggest downside of being on a podcast is I was thinking about it, and I'm like, oh, the next one that comes out, I'm not going to want to listen to. Yeah, are you going to listen to your own? I feel like, oh my goodness, you can't break the streak, you know? Never, never, never, never.
Emily's journey into SQL
So I think people will be really interested to hear sort of your journey through languages like R, Python, SQL, and the things you've put out. But one interesting question I thought might be good to open up with is maybe just on the role of SQL kind of over the years. Since I know you started doing a lot of R work, and folks here have done like a lot of work in SQL, whether it's like dbtplyr, which translates like dplyr and R to SQL, or ibis, which translates Python to SQL. Emily, I'm really curious to hear like, what was your journey into SQL like?
Very abrupt in some ways, because if I think about my educational background, I was in a relatively theoretical stats program, so I could prove asymptotic convergence of a lot of things, but had very little experience with real-world data, until I got one of my first internships. I'm asked to make a customer profile, I take the average of customer incomes, and I'm like, we're all rich. But we were not rich; I was just averaging in all the 9999-encoded default values. So that was my first introduction both to databases, to SQL, and just the vagaries of real-world data. And I think that was also kind of jarring, because I was under this whole illusion of, oh, data is this ground truth where we can figure things out about the world. And then I entered into this world where data is this kind of source of chaos that you need to control.
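The sentinel-value mishap Emily describes is easy to reproduce. Here's a minimal sketch; the income figures and the filtering logic are made up for illustration, and only the 9999 sentinel comes from her story:

```python
from statistics import mean

SENTINEL = 9999  # the encoded "missing" value from Emily's story

# Made-up income figures, with one record defaulted to the sentinel.
incomes = [3200, 4100, 2800, SENTINEL]

naive = mean(incomes)                                # sentinel averaged in
cleaned = mean(x for x in incomes if x != SENTINEL)  # sentinel filtered out

print(naive)              # 5024.75 -- "we're all rich"
print(round(cleaned, 2))  # 3366.67
```

The general fix is the same in any stack: decode documented sentinel values to a proper missing marker (NA/NaN/NULL) before any aggregation touches the column.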
But then, coming into industry, I think SQL was one of the main tools of the trade, the only way to access your data before you could get in there with something like R or Python. And pulling from just a single brush I'd had in college with database design and data modeling and normal forms, something about that, combined with the tidyverse, just clicked in my mind: there's actually a real art to how you set this thing up, how you get the data out, that can really set you up for success.
Yeah, it feels so real that like, moving into a business, like going to work, and hitting SQL, and realizing that there's a process to get the data out. I mean, I will say I've been in an active, like, I think calling it a war against SQL is maybe putting it a little strongly, but I've definitely engaged heavily, and I think as has Hadley, in building tools so that humans write a lot less SQL. And I've often felt that SQL as a language is clearly not going anywhere; it's the lingua franca of databases. But it's also a little bit like assembly code or Fortran code: it doesn't have a lot of the modern niceties of a real programming language, you know, like functions, like reusable code, like code that you can refactor and reuse.
And so, like early on in my career, I was a bit scarred by hundreds and thousands of lines of copy-and-pasted SQL queries with, you know, no tests, and so you're dealing with these highly brittle, very complex, many-pages-long, 10-page-long SQL queries. And, you know, the reality is SQL is really alluring in the sense that it's declarative; writing simple SQL queries is easy, but complex business logic, especially in a financial setting, ends up being rather subtle, with a lot of complexities. And so, I found myself making the same kinds of mistakes over and over again and seeing other people make mistakes. And so, I felt like if we could essentially abstract away the unpleasantness of SQL and make it easier for humans to author SQL indirectly and to avoid many of those common errors, then we would be doing humanity a great service.
I guess I actually started with databases, like I did SQL before R. So, my dad, a lot of his work involved databases. So, we had, you know, dinner table conversations about relational data and Codd's third normal form. And when I was in high school, I guess starting from age, I don't know, like 15, I made Access databases as my part-time job. I did some database documentation as a part-time job, which is kind of crazy looking back at it now. I mean, that's really how I learned to program, was in Visual Basic for Applications. That was my first real exposure to programming, real exposure to SQL.
And kind of interestingly, I've been working on dbplyr lately, which translates dplyr code into SQL. And dbplyr has an Access backend, and people file issues when it doesn't work. The fact that, even in 2025, people are writing R code to connect to a Microsoft Access database and work with that data, I don't know, that's just kind of mind-blowing to me.
Becoming a data scientist
I'm so curious, Emily, like your journey into data science and how you encountered a lot of these things. Do you mind explaining a little bit just how you became a data scientist? Like, what did that journey look like for you? I mean, in some ways I think I have a very boring or traditional data science background, but I think there's a different read where it's kind of funny, because at every step of the way, I knew what I wanted to do; I probably didn't know the right reason why I wanted to do it. So I kind of started out, like in college, or in high school, took my first stats class, had just been a math kid figuring out what can I do with math, and just had this idea of this being this amazing truth-seeking, applied way to do math in the real world, I think.
I mean, definitely going in, I didn't actually know probably so much of the things that were true about it: that data science is so much more of an art than a science, that it requires so much more engineering skill, and that you'll never once again in your life feel the certainty of math, which is probably the thing I liked about math in the first place.
In my time at Capital One, my current employer, I've really worn like three very different hats, which also like maybe kind of mirrors some of the different tools in the data space. Started out working a lot more on problems of like measurement, causal inference, understanding, you know, the values of different like levers you could pull and customer lifetime values. And that's like a lot more exploratory type work, you know, a lot of more like visualization and modeling and just like being a lot more intimate with kind of like both the data and the business.
Then I took probably a sharp right turn to move upstream and spent a number of years in more of the tools and data stack. So I'm thinking about building out data pipelines, or Python tooling for a broader community to use. And really just spent a lot more time thinking about how do good coding practices, how does automation, how does good engineering really enable better analytics, before moving back into the core traditional machine learning modeling type space.
Moving into data engineering
I feel like the switch from the modeling to the data engineering is so intriguing. Like, what was that like? What were the sort of tools you switched into and things you used? In some ways I fell into that one pretty organically, even though the actual roles seem on paper very different. I think in part for what we were talking about, of like, you join a company and suddenly you find out that in school the data sets you were working with were penguins, or, I'm old enough, I'll date myself and say iris. They just came embedded in some nice little toy data set. But, you know, so much of your work, even if you want to do that really exploratory, deep analytical work, is around getting the data.
And I think I both had kind of Wes's reaction of, I'm working on this huge, long, multi-page SQL script, and there are all these subqueries and nested tables, and I'm trying to draw it out on a board like the Always Sunny in Philadelphia map. But at the same time, there was something to me that was so interesting about it, and I think I was feeling all the pain points so closely, like, I can't focus on the thing I want until we get the data right. So I started to get obsessed with column names, data quality checks, how can you actually do testing and macros and all the things Wes called out that aren't native to SQL. So I was spending a lot of my time outside of work trying to understand and build out that part, as well as trying to truly understand the data.
We talk about, from a statistics perspective, understanding the data generating process, but there's a separate data generating process, the data pipeline process, that is the number one thing that predicts what the data errors will be, what the failure cases are. So I think I just kind of fell in love, in service of understanding the problem in the last mile, with thinking about the data.
The thing I was working on, without going into too much detail, I found myself kind of repeating for many different use cases, in a way that just felt so anathema to me, compared to how I saw people in the RStats Twitter world outside of work building packages, sharing code; like, it's easier to collaborate with someone in Australia than with someone in the same company sometimes. And I was like, that seems wrong, which is how I started getting into internal package development and tool building. So I kind of thought I was doing it all in service of the analytical space, but more and more I found my time and energy gravitated towards these upstream problems.
And if you had to break down for yourself, at that time, what a data pipeline is, and what are kind of the key pieces involved, how would you break that down for someone? I mean, the most classic go-to paradigm you can think of with that is something like extract, load, transform. And even probably before extract there's the step we don't talk about a lot, that's like logging or encoding: somehow, something that happens in the real world has to be turned into some digital signal. Then extract being someone has to go capture that signal and get all the different data sets into one sort of centralized place in some sort of format.
Depending on your field, maybe that's hitting up APIs, maybe that's working with vendors, maybe that's even being a field scientist and doing manual work in a notebook and then punching it into a computer. But then once that's loaded into a data lake or a data warehouse, you still need to impose some level of organization, which at a high level you can think about as being more like organizing the files in your file system. But there are actually different conventions around that, if you're using blob storage like S3, which is more like literally organizing files on your hard drive, or doing it in a database, where in some ways there are a lot more rules and constraints, but that helps you get a lot more stuff for free, like discoverability and some level of constraints and checks on internal integrity.
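The stages described above (encode, extract, load, transform) can be sketched as a toy pipeline. Everything here is illustrative: the function names, the JSON records, and the dict standing in for a warehouse:

```python
import json

def extract():
    # In reality: hit an API, pull vendor files, read field notebooks.
    # Here, two hard-coded raw JSON records stand in for that signal.
    return ['{"id": 1, "income": 3200}', '{"id": 2, "income": 9999}']

def load(raw_records, warehouse):
    # Land the raw signal somewhere central, unmodified.
    warehouse["raw.customers"] = raw_records

def transform(warehouse):
    # Impose organization: parse, clean, and write a curated table.
    rows = [json.loads(r) for r in warehouse["raw.customers"]]
    clean = [r for r in rows if r["income"] != 9999]  # drop sentinel values
    warehouse["analytics.customers"] = clean

warehouse = {}  # stand-in for a data lake / database
load(extract(), warehouse)
transform(warehouse)
print(warehouse["analytics.customers"])  # [{'id': 1, 'income': 3200}]
```

The point of keeping the raw landing zone separate from the curated table is the one made above: the raw layer preserves the signal as captured, while the transform layer is where organization and quality rules get imposed.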
Discovering dbt
Down the road, I think a long time, but I think dbt was still really in its infancy at the point I started doing a lot of this. But this is definitely the point where I started feeling very, very acutely, on a daily basis, the exact sort of pain points that dbt solves for. Like, I could build kind of my own workflows for, oh, how do I test some SQL code, where it was much harder to set up test cases and unit tests without a native way to do that. I had some crazy workflows, like pulling data down into an RMarkdown notebook from my database, and, you know, doing a lot more round trips than necessary.
A big part of data pipelines I had to figure out is orchestration, which I don't think you run into so much in analysis. But if you have a lot of long-running things, especially ones that depend on systems that aren't in your control, you have to think about how you make sure everything happens in the right order. Which is not a hard problem; there are a lot of great open source tools for doing that, but maybe not always ones analysts have at their disposal and fingertips to spin up the right infrastructure for.
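The "everything happens in the right order" problem is topological ordering over a dependency graph, which is the core of what the orchestration tools mentioned above automate. Python's standard library can sketch the idea; the task names and the DAG itself are invented for illustration:

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it depends on -- a toy DAG
# standing in for a real pipeline (all names are illustrative).
dag = {
    "extract": set(),
    "load": {"extract"},
    "clean": {"load"},
    "model": {"clean"},
    "report": {"model", "clean"},
}

# static_order() yields tasks so every dependency runs before its dependents.
order = list(TopologicalSorter(dag).static_order())
print(order)
```

Real orchestrators add scheduling, retries, and parallel execution of independent branches on top of this ordering, but the dependency-graph core is the same.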
So I think I just had my elevator pitch for the seven or ten things that were the biggest pain points of SQL, and then one day I was just at the gym listening to a data engineering podcast and heard one of the first, or I shouldn't say first, one of the first-to-me interviews about dbt, and I was just like, oh wow, these people happen to be interested in solving some of the exact same problems I've been thinking of. Which I think, for a certain time in the data space, was a weirdly frequent feeling, just that sense of serendipity where you'd be working on something or thinking about something, and then within a week somebody would put out a dbt package, put out an R package, or ask the exact same question on Stack Overflow. So it definitely just kind of serendipitously fell into place that I was having the same problem as a lot of other people.
SQL as a language and the spec problem
And I think that's the number one weirdest thing of SQL as a language: it's not one thing, but there's a standard that everyone adheres to, until they don't. And of course introducing things like column selectors, which I'm a very big fan of, largely thanks to dplyr. But at the same time, that is a tough trade-off too, because every time you deviate from the standard, it's like, yes, but it's also slightly less interoperable.
And I don't know, I find that kind of bizarre. I mean, luckily, I think recently Claude is filling a lot of that gap for me; Claude seems to have this knowledge of SQL that I could never find through googling, I could never find the right website, but it must be on the internet, because Claude seems to understand it now. But it's just really interesting to me, as someone who's been writing this translation layer from R to SQL, to try and figure out, like, what is SQL, what's the official way you're supposed to do this, and what do the databases actually support.
That is fascinating about the spec; I didn't realize it was so inaccessible. There are so many fascinating topics in open source governance, but I've never heard of that one: there are rules, but you can't see them. I mean, you can get them, but it costs like three thousand dollars or something. And maybe it's easier, maybe I just never had the right searches, but I found that mystifying, and kind of even just this idea that, to me, if I want people to do something, to follow something I've written, it's obvious you want to give that away for free.
Column selectors
I think just to circle back and flesh out something you brought up, Emily: you mentioned selectors, and I wonder if it would be useful to explain what a selector is, because it has been added recently to a lot of SQL implementations. Maybe you could talk us through, just, what is a selector?
Absolutely, and I think that tees me up for a question I've always wanted to ask Hadley, so appreciate that. So if you think about, I mean, I'll pander to the R crowd here to start out, but with selectors, you have a data frame, you have a bunch of columns. If you have in fact named your columns well, in a standardized way, or if you think about data types, there are a lot of different kinds of identifying information you could use to grab out and act on a set of variables. So if you want to do some sort of mass data wrangling process, say in SQL, maybe you want to take the average of every boolean variable you have in your table. If you're doing that in SQL, you're going to be there a while, because you're going to be typing like average variable one, average variable two, average variable three.
But in some of the more modern programming languages with more flexible APIs, you can have these really nice selectors, where you might say, for all boolean values, apply the same transformation, or, if you get a little clever with naming columns in a standardized way, for all of my variables representing an indicator related to this entity, do this operation. So for large-scale wrangling, it can just be a way to write a lot cleaner code and avoid a lot of typos or copy-paste errors by consolidating your business logic and just applying or mapping it over many different variables.
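Here's a toy sketch of the idea, using a plain dict of columns as a stand-in for a data frame. The column names and helper functions are invented for illustration; real selector implementations (dplyr, polars, ibis, DuckDB) are far more capable:

```python
# A dict-of-columns "table" standing in for a data frame.
table = {
    "is_active": [True, False, True],
    "is_delinq": [False, False, True],
    "balance": [120.0, 0.0, 310.5],
}

def where_bool(tbl):
    """Select every boolean column, in the spirit of dplyr's where()."""
    return {name: col for name, col in tbl.items()
            if all(isinstance(v, bool) for v in col)}

def starts_with(tbl, prefix):
    """Select columns by name prefix, in the spirit of starts_with()."""
    return {name: col for name, col in tbl.items()
            if name.startswith(prefix)}

# "Take the average of every boolean variable" without naming each column:
means = {name: sum(col) / len(col)
         for name, col in where_bool(table).items()}
print(means)  # {'is_active': 0.6666666666666666, 'is_delinq': 0.3333333333333333}
```

The payoff is exactly what's described above: one selection rule plus one mapped transformation replaces a hand-typed line per column, so adding a new indicator column doesn't require touching the aggregation code at all.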
I don't know, my recollection of this is vague. I kind of remember around this time learning that one of the problems people have is that they have a data frame with like 800 columns, and just selecting the correct columns is a pain. My vague recollection, and I did a little googling that kind of supports this, is that Stata had some tools for variable selection, and I think that's where I learned about it from, from Stata users. Looking at the documentation for Stata, it looks like you can say, here's a start of the range, here's an end of the range, or you can select all the variables with a prefix. So I think that's probably it; I have a very vague recollection that maybe SAS has something similar as well.
But yeah, I think it came from statistical software, which is kind of surprising, because it does feel like this is something you feel the need for in SQL all the time, because you're just typing columns out. I was just looking at pandas to see if it had it, and I don't think it does. I mean, I haven't been super active in pandas in a long time, but it doesn't look like it has selectors quite in the same way that dplyr does, or that, you know, DuckDB does now, for example. I think the ibis team implemented it, and I guess polars and ibis have selectors similar to dplyr now.
I knew I'd done something similar in polars, or in Python with pandas, of just grabbing out the columns, doing some list comprehensions, and throwing that back in, but I didn't want to say that was the best way to go. But I'm fascinated if it came from Stata, because my main recollection of Stata from college is that you can't have two data frames in memory at the same time, so I did not think of these tools as bastions of user experience.
So interestingly, I'm looking at the docs for this on the Stata web page, and one of the things it mentions is specifically my kind of favorite selector in dplyr, that literally no one uses, called num_range. You give it a prefix, you give it a starting number and an ending number, and it will generate the names, so you can say select x1 through x50 really easily. And the Stata docs specifically say that there's no way to do that easily in Stata. So, I don't know, I think that's kind of evidence of, like, okay, Stata can't do this, I think that's something useful, and I'm gonna do that.
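For reference, dplyr's num_range("x", 1, 50) selects x1 through x50. The name generation itself is simple to sketch; this Python helper is hypothetical, not any library's API:

```python
def num_range(prefix, start, stop):
    """Generate column names prefix<start> .. prefix<stop>, inclusive,
    mimicking the spirit of dplyr's num_range() selector."""
    return [f"{prefix}{i}" for i in range(start, stop + 1)]

cols = num_range("x", 1, 50)
print(cols[0], cols[-1], len(cols))  # x1 x50 50
```

In a real selector the generated names would then be intersected with the table's actual columns, so asking for x1 through x50 quietly skips any that don't exist.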
You mean that's like spite, like evidence of a spite response? Yeah, but it's also one of those functions where I'm like, I think this is cool, I would have thought people would use this, and basically no one does. Yeah, it's so fun. It's such a small behavior in a way, but it's so ubiquitous that it does make a big difference. For SQL users, when I've had to type SQL manually, this idea that you just type the same calculation over and over again and just change out column names is kind of mind-blowing.
So it is funny, selection does seem like a really simple topic, but it's kind of crazy how much it shapes quality of life. And I think even for me personally, I feel like if that existed in databases natively, people would think more, as they build their database, about names as something that can do a thing, and then think about those more carefully, like you would think about designing an API. You know, when you get into industry or production databases, you get into all these things where there are like 10 different ways to abbreviate account id, so you'll end up with 10 different versions of that depending on which table. But I think for me it was just a big aha moment of, oh, my column names can actually do something if I actually think of them as part of the software.
dbtplyr and cross-community pollination
And I think this is very much your Column Names as Contracts post, is that right? Yeah, yeah, indeed. Nice. And I feel like dbtplyr, your dbt plugin, kind of built on that concept a bit. Could you explain a little bit about that?
Yeah, exactly. So dbt plugins, packages, whatever you want to call them, are the answer to what was called out about SQL not really having a native function interface. Again, I think that's something that's wildly database-specific; some do and some don't, but even the ones that do, it's not really great to use them, because then you've just loaded code into a database. It's not really version controlled, you can't really see what it does, can't access it that easily. But dbt, I guess taking a step back, is essentially a collection of SQL scripts, macros, and other files organized in a very specific way so it can infer the dependency graph and execute.
Similar to how R packages are maybe just a lot of R files organized in a smart way so the computer knows what to do with them, they abstracted that a step forward, where you can have dbt packages, which are kind of like data-agnostic chunks of SQL code that you can then import and call: macros that can be at the function level, at the table level, they can exist at a lot of different levels of granularity. So dbtplyr was kind of an early-ish dbt package that I put out there that was essentially stealing a number of things I really liked about the tidyverse, specifically around column selectors, and trying to port that API into SQL to solve this exact problem of how do I grab out a set of columns, and how can I then apply transformations on those in bulk.
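The core pattern a package like dbtplyr automates (select columns by a rule, then map one SQL snippet template over all of them) can be sketched in Python. The template syntax, column names, and helper here are illustrative, not dbtplyr's actual macro API:

```python
def across(columns, template):
    """Expand one SQL snippet per column, like applying a macro in bulk.
    `template` uses {col} as the placeholder for each column name."""
    return ",\n  ".join(template.format(col=c) for c in columns)

# A naming convention does the selecting: ind_* marks indicator columns.
all_columns = ["ind_active", "ind_delinq", "amt_balance"]
ind_cols = [c for c in all_columns if c.startswith("ind_")]  # the "selector"

query = (
    "select\n  "
    + across(ind_cols, "avg({col}) as mean_{col}")
    + "\nfrom customers"
)
print(query)
```

This is also where the Column Names as Contracts idea pays off: the prefix convention is what makes the selection rule reliable, so the generated SQL stays correct as indicator columns are added or removed.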
What was the reception like? Because when I went to dbt's conference, Coalesce, in 2022 to give a talk, they were kind of surprised that a person from RStudio came through. They were excited, but the one thing I remember is they kept saying, Emily is our representative from the R community, and they pointed to dbtplyr a lot as an example of a really interesting cross-pollination of ideas. What was the reception like? What was it like going into that community?
I think it was interesting, because, going back to the point about it being a simple idea but one that has legs, there were people that got it, probably people that had seen it before. I know there was one dbt Labs engineer at the time that knew a lot about R, had been part of that community, and really latched on to it. I mean, I think people that had seen how it worked somewhere else really got it. I think the thing I realized in retrospect was that calling it something like dbt selectors would have probably been far more useful and informative for discoverability purposes, since dbtplyr is kind of a shibboleth, like pre-limiting yourself to only the R part of the community having any earthly idea of what you're talking about or what to expect from it.
Naming packages across languages
That makes me think of a question we've been struggling with that you might have some insight on, and that is, as we're creating more packages where we have an R version and a Python version, what do we name them? Like, I think with orbital and pins, we're like, okay, we're going to call the R package and the Python package exactly the same thing. Or do we do it like great tables, where we've got gt on one hand and great tables on the other? And I think there's another case, I don't know if you remember, Michael, where we're like, these are totally different names. Do you have any sense of what you think we should do when we're doing one package, the same idea, but implemented in two places, in two languages?
That's a really interesting question, because I'm sure you're also bound by different namespace availability in both languages. On one hand, like with orbital, I find it very satisfying that if I've heard about it in one language, then it's trivially easy for me to be like, oh yeah, I know they have that thing in the other language. But I do think, even with gt versus great tables, in a weird way there's a nice mental namespacing to the fact that I know these do not aspire to be at parity; the APIs within them aren't 100% the same, and I feel like it kind of fits my expectations, right, that these are aiming at the same goal but they may not 100% get there in the exact same way.
Yeah, that's interesting, because orbital is probably the case where the API is simple enough that it's basically the same for R and Python, but obviously the more complicated the package, the more it has to diverge. Like, you want a package that feels R-like and you want a package that feels Python-like; you don't want to be like, oh, I'm writing R code in Python, or I'm writing Python code in R. That's something I've so loved about Michael and Rich's work on both great tables and pointblank, and honestly even plotnine, I feel, manages to get that distinctly more Pythonic feel in the Python version.
I definitely write Python code with an R accent, but at the same time, it feels like you're going to get so much further and have a lot less pain if you lean into the conventions of the language you're in.
I will say, for great tables: when we started working on great tables, which is a port of the library gt in R to Python, the name wasn't available on PyPI, so that kicked off right away the need for a new name. But what I appreciate about Rich is he was open to it; few people knew that gt stood for grammar of tables, so I have to hand it to him for being willing to just retroactively rewrite history and pretend the acronym stood for great tables. I think actually few developers would be willing to retroactively change it, and I'd actually forgotten that; in my head I'd already translated it, like, yeah, gt, that stands for great tables. I'd forgotten it was grammar-of-tables inspired.
Learning in public and imposter syndrome
I'm curious, Emily, how do you choose the things you get into? Because I think digging into column names as contracts and addressing the dbt community is so interesting. How do you kind of choose which threads you want to pull? I think I've always first just really gravitated towards things that might otherwise frustrate me, but had so much curiosity of, couldn't they be better? Like, don't we maybe need more internal packages, or surely there must be a better way to write this SQL code. And I think kind of really being drawn to the little, more paper-cut, everyday problems has just left me with a lot of curiosity and energy to explore things that otherwise would just kind of drain you.
And then I think that's coupled with the fact that I love learning about new tools, new algorithms, just squirreling away information that it doesn't feel like I'll probably ever need, saving it like a nut for winter. And I think that was so easy, especially early in my career, with the renaissance of RStats Twitter, where you could be a fly on the wall for non-stop conversations between people so much smarter than you doing fascinating things. And then, very luckily, I was able to do a lot of that pattern matching of, oh, I have this problem, and I heard about this thing called dbt, or, I'd really love to never copy-paste this again, and I think there's a thing called R Markdown that exists.
So I think it's kind of: I'm curious about the things that frustrate me, and then I have all these tools I want to try, and sometimes I just get lucky pulling the right tool from the toolbox.
Yeah, it's so cool, and I do think it's so interesting how much of this really shines in your blog, where you're often stitching together multiple tools in really creative ways.
And I know something that came up when preparing for this is imposter syndrome, because with blogs, or when people put themselves out there like that, it can be a really big challenge. I'm really curious about your take on what that looks like for you, and maybe advice you'd have for people who are blogging or putting tools out there and might feel a little imposter syndrome.
Absolutely. I mean, even preparing to join this podcast was like, what have I done? I could just be having a nice afternoon, and instead I'm talking to three luminaries of this field. But I think in some ways it's something you really have to get comfortable with, because if you aren't in rooms with people you feel like you have a ton to learn from, that's kind of sad; you're probably in the wrong rooms, you know? Going back to the last question, the way I learned was just getting to absorb all this amazing content from other people out there, and if you weren't part of the conversation, you weren't going to get to learn and encounter those things.
So I've always somehow managed to get myself into a lot of places where I really had no business being, you know, even switching between the analyst, engineering, and modeling hats, and just getting comfortable jumping in and learning as you go. And the same with, obviously I'm not a PhD political scientist or economist, but the more I could peer over into those spaces and see the different ways they talk about causal inference, the more I learned. I mean, to a large extent, I think it's very good to feel like you aren't the smartest person in the room, because that tells you you're in the right place, where you're learning.
Sharing on the internet is always hard and scary, and I think it's so much to the great credit of the R community that it never felt that way. I've thought about what would have happened if I'd come into the field either five years sooner or five years later, either of which, rudely, was when the internet seemed like a more hostile place, and whether I'd have felt confident just putting stuff out there. But I think Emily Robinson maybe said it best at some point in a conference talk: write for the person who is struggling with the things you were struggling with 6 or 12 months ago, and try to help them know what they need to know. I think that's a good way to remember there are other people out there working on the same stuff, and that you genuinely have something to contribute.
Yeah, it's such a neat way to frame it. Thinking back like that is both a way to really recognize your own growth, and it also feels nice to be able to take that difference and write it down.
Yeah, and I think it's the best way to crystallize your own thoughts. Nothing I write will ever be, hey guys, this is exactly what I did at work today, because I'm pretty sure my employer would prefer I not go sharing IP all over the internet. But thinking about how I would help somebody else do this thing forces you to think at the right level of abstraction. To take the trivial example of column names, it's not just, here's what happened to work better for this one dataset when I did this; it almost forces you to generate a theory behind things that is in itself a little more reusable.
Advice to your past self and Claude Code
I'm curious, and this is maybe a tough exercise, but what would folks say to themselves from 6 to 12 months ago? I'm almost curious to hear what people would tell themselves.
I mean, I think I would tell myself to try Claude Code earlier, though maybe I was already trying to use it six months ago. But as you get further along in your career, your growth definitely slows. There's that period when you're first starting your job where there's this firehose of information, and everything is, oh my god, this is amazing, and you look back at code you wrote six months ago and you're like, this is a heap of shit, why would I ever write that?
And now, with coding agents, that whole chart needs to get completely redone, because of how little time it takes, especially when you're building something that's not very hard to build but is just for you and makes your life a little bit better: it saves you five minutes a day, ten minutes a day, maybe an hour now and then. I've built things in the last three months where I'm like, I should have done this six months or a year ago. And seeing those successes has given me a sense of more boldness, more of a willingness to dive into things, or to set an agent to work building something. Maybe it's only going to save me ten minutes twice a month, but whenever I get those ten minutes back, it's going to be super satisfying.
One of the phrases I found super useful was this idea of tacit knowledge, which I learned from Bill Behrman, who taught this fantastic, amazing data science course at Stanford that I got to help out with. It's all of these things that you know to do but would never think to write down, and when you watch someone working, you're like, oh my god, you can do that? It's fascinating. Even within the tidyverse team, where we've worked very closely with each other for a long time, still, when someone shares their screen and does something, you're like, what? You can right-click on the Positron icon in the dock and it lists your recent projects? Mind blown. You'd never think to write that down, because it's just something where you go, oh, everyone knows this.
Closing and posit::conf(2026)
Are we allowed to talk about posit::conf?
Yeah, sure, if Emily's okay with it.
A little bird told me you might be keynoting the next posit::conf. Is that right?
Yeah, I am. I'm both honored and, as I told Jenny when she emailed me, still speechless, which is probably not a good trait in a keynote speaker. But no, I'm so excited.
Yeah, we're super excited to have you.
Well, thank you. I'm honored, and, going back to imposter syndrome, a little bit horrified, but honored. I'm looking forward to the journey, because I always learn so much in the process of boiling my thoughts down into words. The process of writing talks is like that too; you learn so much more about whatever it is you thought you were enough of an expert to talk about.