Julia Silge: Part 2 — Glue work, licensing, and open source in the age of LLMs

In part two of our conversation with Julia Silge, we discuss how work actually ships: the boundaries, the glue, and the tools that turn noise into signal. From there, we go macro and wonder what the LLM era means for humanity’s contributions, plus how licensing is evolving to protect sustainability without abandoning openness. Both practical and philosophical, this conversation spans workplace energy, team connective tissue, and the big questions LLMs have us asking in a shifting data science landscape.

What’s inside

• Julia’s system for turning scattered community signals (GitHub, Stack Overflow, discourse) into product insight
• The power of “glue” work, and where to find the wins
• From Stack Overflow to LLMs: What changed when communal Q&A became model fuel, and what that means for finding answers
• Licenses in a new era: Threading the needle between MIT-style generosity and elastic-style sustainability for platformed software
• Try Positron: Where to download, read docs, and give feedback

Dec 3, 2025
30 min

Transcript

This transcript was generated automatically and may contain errors.

Welcome to The Test Set. Here we talk with some of the brightest thinkers and tinkerers in statistical analysis, scientific computing, and machine learning, digging into what makes them tick, plus the insights, experiments, and OMG moments that shape the field.

This episode is part two of a conversation with Julia Silge, data science leader and engineering manager at Posit.

Synthesizing community signals and working in the open

Yeah, it's so interesting to hear the edge of experience as being your focus, and even, inside of Posit and inside of companies, that being, from what I heard, a challenge across teams. Where does that thinking show up for you usually? Is that in a notebook or a document? Where do you like to refine your thoughts or shape them?

Yeah, yeah. I think I work in a combination of places. One is GitHub issues: if things can be broken down into, say, a fairly concrete piece of work, then I'm somewhere there. Another way of working that I and people on my team use is writing documents, whether that's feature specs that are very technical or more of a high-level description of the kind of behavior we expect. Another way this comes out is in synthesizing smaller bits of information. We have a public-facing product that has quite a number of users out there, right? So stuff comes in on GitHub, of course. We watch Stack Overflow. We watch various Discourse-type boards and whatnot, to be able to synthesize.

And often that involves a kind of bottom-up information organization: how successfully can we understand how these things are related? Because they 100% are, right? It's rare when something comes up and we're like, well, that's like nothing else we've seen before. Either these are thematically related or they're different ways of hitting the same kind of problem. So in a practical sense, I'm writing in different places depending on who the audience is, and then in terms of organizational thinking, I do a fair amount of those information organization kinds of activities.

Do you have like a preferred like tool or system that you use? Like are you like a mind mapper or like really into Obsidian or?

I feel like because of where our team stuff is, I end up in GitHub Projects a lot, which are pretty good for things that are captured as really concrete things. For things that are a little bit less concrete, I am mostly in documents, and mostly in documents that are shareable, because Obsidian is great for your internal stuff.

You're like just talking around the fact that you did everything in Google Docs and you don't want to admit it.

I don't want to admit it. I spent all day in Google Docs and I just don't want to admit it.

No, I mean, the thing is, Obsidian, I don't know. You don't have any shareable way of using Obsidian, do you? It has to be another. No. Yeah, yeah. And so much of what I do involves going to other people, either for their input or for them to agree, so I need to get feedback from people on it. It's really the collaboration.

So speaking of that, we do actually use Quarto as well for a good chunk of this, because we have really extensive internal documentation for process, like iterating on plans. That is less like a situation that might have been a GitHub issue, and also less like something that might have been a feature spec that's kind of throwaway: write it one time, do it, don't come back to it. But for stuff that we maintain over time about process and about how things work, we do actually use Quarto.

And then the collaboration looks like Git, you know, those kinds of modes.

Yeah, I'm really looking forward to the day where we figure out how to like have shareable commentable Quarto docs.

That's the dream. That's a dream.

Yeah, I've become a pretty heavy Obsidian user, mainly as a capture tool, because stuff will come up in a Slack channel that I'm a part of or over email. And I find that if I have too many unread tasks in different places, eventually I'll just forget things and drop balls. So capturing things in Obsidian is like my personal don't-forget-this list, a way to get things out of my inbox.

Glue work and the value of boundaries

I have to say, too, inside Posit we find you wrangling a lot of meetings that are, I want to say, edgy. Not edgy. OK, edgy is the wrong term: at-the-edge-of-experience types of things. If I could give one example, it's the Python open source meeting. You're really a facilitator of the Python open source meeting, which I've always found pretty inspiring. It seems to me like there was a gap, like, we need someone to fill this.

And there are a lot of people floating around, but you really kind of took it and ran.

So I think there's different ways to operate as a manager, and some people who are really successful as managers have a really strong sense of ownership: this is my team, I am going to make them successful, if this thing is successful, I am successful. And that is great. But that's not quite the way I operate.

I think as like a manager or a leader, I think there is so much space for a like slam dunk experience. There is so much space for massively leveling up how people perceive your tools when attention is spent at the boundaries of things.

And it is often somewhat thankless work, in the sense that you don't always get as much credit for stuff that ends up gluey, you know. It's also like pain: people can't acknowledge pain that they didn't experience. Yeah. Right.

Yeah. I think that's like you're making pain go away, but it's because people never experienced it. Like it's hard for them to appreciate.

Yeah, yeah. I think that sort of work is so incredibly important. And also, I feel like there's often a lot of space there. It's not like you're trying to eke out some tiny little bit of performance improvement. You can usually massively level up some sort of painful experience by paying attention at some boundary.

And so internally at Posit, I do end up doing things like maintaining the Quarto extension for a while, you know, stuff like that, because it's like, well, no one else is quite doing it and it's important to this team and this team. And I think it's because I have seen, I'm going to say, really outsize impact from attention at those spots, which I think have a big-picture kind of effect, I guess.

It is. It's sort of fascinating to me just how much team identity starts to play in. In general, I think it's a good thing for a team to have a strong sense of, this is what we do. But it can mean that there are things that are really important that fall between the gaps of every team. And one of the very valuable things you can do, and it surprises me even though I don't think it's really surprising, is convince people that this is their problem to solve. This is a real problem that exists. Maybe someone else should fix it, but they're not going to. You could have a really big impact here by going outside of your comfort zone and tackling this problem that crosses some boundary you don't normally cross. That can be so, so valuable.

Remote work and staying connected

Yeah, one question I have, and this is kind of inside baseball: I've noticed people at Posit have a lot of chats, like monthly chats, just catch-ups and stuff like that. I'm really curious how, say, Julia, you've approached keeping in touch with people, whether you have a chat with the Quarto team, if that's part of your strategy, and how you approach just catching up or water cooler type stuff.

So our company is fully remote, right? And I am very comfortable with remote work. I've been working remotely for a really long time, so I'm very comfortable with asynchronous communication and with a lot of habits around remote work. I don't feel disconnected. I don't feel isolated.

And I think part of that is because I do have habits around having a low bar to asking somebody, say, to pair, or a low bar to, hey, can I get on a call with so-and-so to talk about this thing that came up and that we might need to do? And there are tradeoffs there, of course, with time: how much heads-down, uninterrupted time do you need for your work? And like most managers, or I don't know, some proportion of managers at Posit, I do write code as well.

At some companies, you know, for people who manage, the expectation is you will not write code. You cannot do a good job at both. And there is some truth to that argument, that if you're going to be a really good manager, you can't also be trying to write hands-on code. But for better or worse, that's not, I would say, the norm for most managers at Posit. Most of us do have roles where we contribute, even in a concrete, I-wrote-this-code kind of way.

And so there's balancing that: what's it like to have heads-down focus time alongside this connection? There's definitely a tension there. I think at our company, there's often a lot of value placed on focus time. That's really why we are able to build excellent products: because we allow people to have this focus time. And I think that is totally true. At the same time, that often is in tension with people knowing what's going on in other teams and being connected. And internally at Posit, I probably am one of the people that biases a little more towards boundaries, connections, how are teams working together?

What excites Julia about data science

I feel like with wrangling like these kind of complex systems and doing hands on work and managing, one thing I'm curious about, too, is just to go back to kind of a more basic question. What would you say gets you excited about data science?

That's a great question. So, you said my background is academia. I did a physics undergrad and then an astronomy PhD. And so I went really narrow, and I had a great time in grad school, actually. I loved my project. I had a good relationship with my advisor. I had a really good experience.

And then when I went to be a postdoc, there were a variety of reasons why I was like, wait, is this for me? One of them was that I was back in a physics department and I was like, oh, that's right, physics departments can be kind of toxic. Overall, astronomy departments tend to be full of much more practically minded people: a lot more applied thinkers, people who make instruments, people who analyze real data. So part of it was being back in the physics department, but part of it was just progressing a bit in academia to find that, oh, right, I have to keep specializing forever. I have to keep getting more and more expertise on a narrower and narrower thing. That's what this path is. And I came to grips with the fact that actually I was not so interested in that. That did not bring me a lot of joy and personal fulfillment.

And I started to think through, actually, what do I like doing? And that has lasted through the whole rest of my adulthood and career. What I would now frame as what really motivates me, what I really get excited about, is that I love learning about and being involved in people's really applied work, and the processes and systems around how people are really doing their work.

And so I love working on data science tools because it is such real-world applied work. It comes up against the mess of people's data, and talking about systems and processes and tooling around that is really motivating to me. Do I think data science is the only field I could be happy working in? No. Honestly, part of it was random chance: I came out of astronomy at a time when transitioning into data science was a thing that people were doing. So it probably could have been something else. But the characteristic of it that makes me say, oh, yeah, this is a really good fit for me, is that it gives me an opportunity to work on tools that are about the design of applied systems that people use for their work. And I just love that. I love it.

Stack Overflow, LLMs, and the fate of communal knowledge

That's so cool. This is, I think, a related thing on the point of these systems that people use for their work and these real-world problems. One thing I was thinking about as I was preparing for this is that you were at Stack Overflow, and it made me think about the complexity of that system, thinking about how people get answers to questions about programming. I'd be curious if you could say a bit about your time at Stack Overflow, and whether you've thought about the role of a site like Stack Overflow and how people get answers today with AI.

Yeah. So I was a data scientist at Stack Overflow for about five years, and it's a very interesting place. This was before the rise of LLMs. Before the fall of Stack Overflow.

So it was a very interesting place to be a data scientist. Roughly half my time was spent on what you probably think of as public Stack Overflow: people who come to ask questions and answer questions, the voting system, comments. How do we deal with content? How is content curated? How do people find content? And roughly half my time was spent on the ways that Stack Overflow at the time made money, which would be a combination of ads, including content-based ads, and a private Stack Overflow kind of product that people would use internally, as an alternative to other internal knowledge bases.

And it was really interesting in a Web 2.0 way. Because of the age I am, that was a big part of how I experienced the Internet: oh, actually, the Internet is for you to come and post your stuff on. That was really empowering for me at the time. And Stack Overflow is part of that wave of what Internet technology is like.

So it was interesting in understanding, hey, how can such an organization be sustainable? What are ways that you could build a business on this? Because the founders of Stack Overflow were interested in building a business. They were not interested in a Wikipedia-style model. They wanted to build a value-generating business. So, OK, what are some attempts at that?

And now, after the advent of these LLM tools, which, to be clear, slurped up all of Stack Overflow in their training data, what does it look like? A lot of the back-and-forth question asking and answering has now moved away from a place where it could be viewed as a communal resource. It has all been slurped into places that are not accessible to us as a community anymore. The big model training organizations came and took data that I would say morally belongs to all of us. Right. Like morally, ethically, that's our data. That's our data as human beings. And they took it and they trained a model. And now they have not only made something they can make money off of and, you know, made useful tools, but it also means that the data that was used to make that possible is now not being generated at the quantity or in the way that it was before.

So I think it's super interesting to think about where we are today and what the steps are going to be that will keep, for example, our ability to get answers to our coding questions. What will that look like going forward? I'm not a big doom-and-gloom person. I'm not saying the world's ending or anything like that. But I think there are some real questions, because what got us to here is not what we are doing now. The ecosystem has substantively changed. The world has changed in terms of where that data is coming from, or where the questions and answers are, such that they can be useful to the community as a whole.

Licensing in the age of LLMs

Do you think it's changed how you think about this? Because I think Stack Overflow is all Creative Commons licensed. So legally, all the LLM providers are fine. Does it change how you think about that license? Because I have to say, for me, for the longest time that just seemed like absolutely the right thing to do. And now I'm like, I don't know, just giving all the stuff away for free is kind of, I don't know.

Well, there was an inflection point where I think the license for content posted on Stack Overflow changed. For content prior to that date, it became this massive IP contamination issue, where developers would copy and paste stuff from Stack Overflow into their company's proprietary code bases. And then, if you go to sell a software product, that code would turn up during IP legal due diligence, and they'd say, oh, you used code from Stack Overflow. Maybe it's just a small snippet, but either you have to figure out how to replace that code with your own IP, or you have to go and find the original author and ask them for permission to use that code. I think in the meantime there was a change to make it more lenient. But still, any content that was posted prior to that date has the old, more restrictive license. It's very interesting. The licensing around these kinds of issues is very, very interesting.

And I think there are questions around whether these licenses have gotten us what we thought they were going to get us. All these open source licenses came up and were hashed out in a time that was technologically different from what we have now in terms of the constraints. And people talk about, OK, can we iterate on these? Can we think about these differently? Something there's still discussion about is iterations on these licenses that make moral or ethical claims. Right. Like, I want to exclude certain kinds of uses. There are these licenses that are mostly open source-ish, but they exclude some uses, like maybe defense or something like that. So there's that category of iterating.

Then there is the category of license that's mostly open source-ish, but puts some restraints on, say, how you can platform it. In full disclosure, Positron has a license like that. Positron has a license that is not a true OSI-approved license. It's the Elastic License. And the reason why we did that was because our experience as a company working on open source software has shown us huge benefit. We're huge believers. We're committed to open source software. But for the pieces of software that you can make money from by platforming them, we ended up making the call that actually we don't want another giant, say, cloud company to be able to get revenue directly from just making available the thing that we made.

So I've been involved in open source for a pretty long time. I'm a huge believer in open source. I'm not religious, though, about these specific licenses, because I think they're just things we wrote. How are they turning out? How do we want to iterate on them? What do we think is best for us as a community? I mean, at our company, we're iterating on that. We're being really explicit: this kind of software, like a Python package or an R package, is MIT. This other kind of software, we're not going to do MIT anymore, because we think it is not aligned with our long-term goals around the sustainability of our company. Super interesting questions.

I think, yeah, to me, open source is kind of like a gift. This is a gift that I'm spending my time on and giving to the world. But it's not like, if you're going to abuse that gift, I have to keep giving it. There's some sort of sense of, I want to be giving it to people who are giving back to the community. I get that not everyone's in a place to do that, and I'm happy for a lot of this work just to be used by people. But it starts to feel exploitative when big companies that are tens or hundreds of thousands of times the size of Posit make money off our work. That just feels a little.

Gross. And yeah, I'm not so religious about this, like it's about freedom or all of these other big philosophical ideas. To me, it's more about community and, you know, trying to share with people who are trying to do good in the world in some way, while at the same time accepting that if you try to pin down exactly what that means, you get lost in the details, and you just have to accept that people are going to use it for things that you wouldn't personally like them to do. But I don't know, on the whole, I kind of hope that people are using open source software to make the world a better place, not just make money for themselves.

The chicken-and-egg problem for new open source projects

Yeah, it's helpful to hear how Posit is trying to thread the needle between things like MIT and the Elastic License and find things that work for everyone in the different circumstances.

It's for sure an experiment. It's like, okay, let's try this. How's this going to go? I mean, so many of these things are untested, and we don't quite know how they will play out. But it definitely is interesting.

The thing that's been on my mind a lot is how developers, how users, will be motivated to discover and learn new technologies that their favorite LLM doesn't know about. And how is that going to affect the development of new open source software projects? How do the LLMs get the content that they need to get trained on new projects? For folks like us that have been in the business of teaching people how to do data science and building data science projects, it presents this conundrum, or maybe chicken-and-egg problem. The nature of building new open source software for data science is, I feel like, going to be permanently altered, including how that affects adoption rates. And just how long does it take before a new project is important enough that the LLM providers go to the effort of creating a training corpus to teach the LLM how to use your new open source library that doesn't have that much content available on GitHub?

And we're on the lucky side of this. Most of the tools that we have created are now in the training sets. But I don't think we want to be locked in. Like, I don't want ggplot2 to be the visualization package used for the rest of humanity because it's impossible to create a new system because everyone uses LLMs. If it's not in an LLM, they don't use it. Like, that doesn't seem like a win.

Like, the death part of the open source life cycle, how does that change? It's interesting with Stack Overflow too, to these points. A lot of my early experience with ggplot2 was on Stack Overflow, where Hadley would answer questions. You would see the question, and then you would see a couple of different options, and one would be ggplot. So it's interesting to think about the answers that people might get now if they ask an LLM. You might not even see that Hadley's answering, and you might not see the range of tools, or you might only see certain tools, I guess, in the output, to Wes's point.

So, yeah, Julia, I really appreciate you coming on and just opening up the complexity of this data science workflow, how to reach your senator, you know. Are there any parting words for people at home, either ways to help you with Positron or things you'd encourage people to check out?

Yeah, yeah. So if what we've been talking about today has piqued your interest in Positron, you can go to positron.posit.co for installers and documentation. I am excited for more and more people to get exposed to it and to try it out. And it's been really delightful to talk with the three of you here today. Thank you so much for having me on and for asking such insightful questions. I honestly haven't thought about the pizza in quite a while. So it was a little bit delightful to get to revisit that.

No, it's been such a treat. Thanks. Thanks so much for coming on.

The Test Set is a production of Posit PBC, an open source and enterprise tooling data science software company. This episode was produced in collaboration with creative studio Adji. For more episodes, visit thetestset.co or find us on your favorite podcast platform.