Virtual Day Q & A Hangout with Virtual Speakers | posit::conf(2025)

Transcript#

This transcript was generated automatically and may contain errors.

Welcome back, everybody. Welcome to the Data Science Hangout-style Q & A. It's not a real Data Science Hangout. We are here speaking with our amazing virtual talk panel speakers, and I'm joined by my co-host today, Rachel Dempsey. Rachel, would you like to say hello?

Hey, everybody. I am actually in Atlanta this week, so I'm happy I get to be here for both virtual and in-person, so if anyone happens to be in-person too and is joining from their hotel room, come say hi to me later today at the reception too.

Awesome. Yeah, we all want to know how Rachel's Wi-Fi is doing at the hotel. Everyone's like, it's great. So far, so good, but maybe I shouldn't have said that.

Okay. We will say no more about that. I forgot to introduce myself. My name is Libby Heron. I'm a data community manager here at Posit. You probably know me from being the host of the Data Science Hangout. I wanted to give a quick intro just in case anybody didn't join for the talks, but they are joining now, and just in case you're going to be very confused, but we're hanging out with our speakers for the talks. I would love to go around in the order that our speakers gave their talks and just do a quick intro, your name, reminder of which talk you gave, and something you like to do for fun.

you can use it as a way to review the responses and start to get an idea of where to look, especially if you have a huge data set. If you look at those uncertain responses, you're going to start finding. It's going to help filter through to actually get good ones to review.

Q&A: ggplot2 interactivity and extensions

I see a question for Gina. I'm going to go repost into the Discord. But it was, is there any appetite for interactivity in ggplot2? And if there are any extensions that do this, I may just be unfamiliar. They said, I'm a huge fan of High Charter and Plotly for the interactive elements, and it'd be great to see options using ggplot.

Yeah. So, one of the presenters that we want to have is David Goyles, and we've been in touch with him. I think that that is going to be on our schedule soon, is the GG-IRAF. I don't really, some of the packages are hard to pronounce. I can put a link to his, this package, and that allows for, I think, some interactivity with not too much difficulty.

Yes, I say GG-IRAF as well. I know, I know. I think it's G-Giraffe. There's definitely a difference. Yes, it is. But I started saying GG-IRAF a long time ago, and I don't think I can break that habit.

Actually, on the topic of packages, I was curious, what is everybody's favorite package or hex sticker? Because this is my new favorite with these things. The Stacks. Stacks are always good. If anybody doesn't know the Stacks lore, there's a new Stacks every year, a new Stacks sticker with different fruit on it. Does anybody want to go around here with their favorite hex stickers? Gina, what's yours while we're talking to you?

Or a favorite. It doesn't have to be the favorite. It's ggplot, too. ggplot, too. Of course. That's cheating. Ruz, how about you? Do you have a favorite? Yeah, I'm trying. I love the ones that are holographic, and I think dplyr has one, and it also has the seal on it. So, I don't know, some holographic one, my favorite. I have some of the holographic Apsalon ones as well, and they are so dang pretty. All the Apsalon ones are gorgeous. Kennedy, how about you? Do you have a favorite hex? Nope. Not gonna lie. No. No favorite. John?

Yeah, I've got one. I've done a few myself for obscure packages no one's ever heard of, but the one from Packagedown, the box-like thingy is pretty nice. It's pretty nice. I agree. It's like a cardboard box, right?

Indeed. That's what I remember at least. I think you're right. I have it behind me. I really like it as well. And Dylan, how about you? So, I guess since it's relevant to my talk, I do really like Elmer, and I also like Simon's package vitals . They've been trying to create some art where they're together, where this little bear with a stethoscope is trying to find the bear, or the elephant. And I also really like the one package that I made called Red Quack, and it's a play on DuckDB and Redcap, and it has a little bill on set. I'll put it in the chat. So, shameless self-plug there.

Q&A: blue-green deployment and user sessions

Okay. Let's move on and have another question, because we have one for Richard, and that is, reconfiguring the vanity URL to switch from blue to green. Are there any challenges to user sessions that could impact the end user experience?

So, that's a good question. From my tests, users didn't seem to be disrupted. They might be connected to the previous version, but the thing that you do need to be careful about is if you reconfigure environment variables, because if you do it, then, for example, in the UI, you would get a warning that, hey, just be aware that if you change an environment variable, all processes will be terminated and started up again. So, yeah, that's the place where you should keep that in mind.

Q&A: Ambiorix mindset shift from Shiny

Okay. But at least you get a warning. Thank you. Thank you. And, John, I have a question for you. Someone asked, anonymously, for someone who is used to Shiny's reactivity, what kind of mindset shift do I have to use Amburex? What challenges might I encounter with my assumptions? It's rather different. It's a good question.

It's very different, but I would argue that Amburex is rather easier in that sense. So, at least, I build Shiny up almost for a living now, and one of the difficulties with the reactivity, though it comes with plenty of convenience, is that you have to hold in your mind constantly quite a few moving pieces and reactives. What happens in the back end, how it reacts on the front end, et cetera. And with Amburex, it's basically a bit like, how it works is a bit like Plumber . So, you just have to think of the user makes a request to your server, and you just send a response, and you don't have to think a bit. There are no moving pieces around and things to keep in mind, essentially. So, in that sense, it is much easier, I would argue, at least.

So, maybe less of a challenge and more of like, this might feel a little bit easier in that way, because reactive programming is so challenging in and of itself. It's hard to wrap your brain around, especially when your apps get really big. Thank you, John. I think so. Yes, I think so. I don't want to ramble real quick. It's the problem. I mean, reactivity is great for what generally people call SPH, single page applications. And we use plenty of them daily, I think. Things like Instagram or Google Maps, when you think of it, these are all single page applications, and they are with a reactive back end. It makes sense. But I think as soon as in China, you start having too many types, too many tabs, sorry, and too many, I don't know, too much business logic, too many different things moving on, and it becomes a bit too complicated. I think a traditional framework, a bit like Plumber or MVRX, is better suited. Yeah, you've got lots and lots of pages.

Q&A: LLM model selection for classification

All right, awesome. Thank you. I have got a question for Dylan. Dylan, are you ready? It is, you mentioned frontier models. What models do you find work best for image versus text classification? Which ones are not good?

Okay, so I have only really used the vision models for this kind of testing I did for this presentation, so I am not an expert. I think all of their vision models are going to be pretty basic. You could even see we quickly ran into the limitations with the vision model in this use case for the classification of the iris flowers, but I know they all... It is practical to you, right? Yes. Just making sure that we all know the context. I did try it with Anthropic's Claude model as well, ran into the same issue. I did try Gemini, but they all have vision support at this point, so I would play around with it. Text models, I think they are kind of all pretty good. I like OpenAI's GPT because it is cheaper and performs usually as good as Claude with the structured data tasks, but of course I like Claude more for coding and some of it is kind of like ethical foundations. You just kind of have to explore, and also you can classify with different models. If you do not have... In the presentation, I showed examples where we had the true classification to look at whether the LLM was giving us a correct response, and you can actually use just different LLMs too to see if they perform differently on the task and use that as a signal of why aren't these LLMs agreeing on my classification. Yeah.

Q&A: ggplot2 extension motivation

I see an anonymous question for Gina, and it was, you mentioned feeling like an outsider to the ggplot extender community. What drew you to ggplot in the first place? Did you use a lot of ggplot in your job at the time?

Yeah. In the first place, I went to a workshop that was on ggplot too, and I think that was really foundational to show me how the pieces fit together and that things are so, like, the orthogonality of the different components can be expressed that way. Yeah. I was using ggplot too in my job and just saw, like, I would come to a point where I would be frustrated with what was on offer in base ggplot too, and I saw other extensions. I think one of our first presenters was the author of ggbump, and I think that with that, I realized, oh, this is just a ggplot too user that's also at the edge of what they want to do easily with ggplot too, and they figured out extension. And so seeing that experience, I was like, well, maybe I can get into that too.

Some of the cases were really basic. My point of entry was on this means question was, like, I don't want to compute those means. Myself, I see ggplot too computing stuff all the time like in making histograms and stuff, so I know you can do it, and let's just get into it and figure that out.

Q&A: blue-green deployment for A/B testing

We have a question for Risharj. It is here. It is, have you ever used blue versus green deployment for A-B testing a new feature?

I have not. I would use feature flags for that, and one thing to keep in mind is if you want statistical significance, you might need a bigger sample size, and if you have such a large user group, then first of all, congratulations. And, but if you have a smaller user group, then what I, like, I wouldn't call it, like, a full A-B test, but sort of, like, doing user testing with two different versions to get a sense of what UI might be better, so I would get on a call with different users because the user group was small enough where I could, like, talk to most of them and present some with one version and others with another version, and that way sort of learn which option might be better, and their feature flags and breaking down the option of, hey, if this person belongs to this group or to this group, could be a way to do that.

Q&A: deploying Ambiorix

So, for either Kennedy or John, there are a few people asking about how do you deploy Amburix?

Yeah, so you have several options for how to deploy your Amburix application. For someone just getting into deployment, the easiest is probably using Shinyapps .io. We actually have documentation for all those methods that you can use. But obviously, the easiest Shinyapps.io route, it works well with Amburix. You just have to set an environment variable, and then that's it. Or rather, grab an environment variable. We have it well documented. But Shinyapps.io is the easiest. The second easiest would be Shiny server. Yeah, you can actually deploy. Just the same way you deploy Shiny applications is probably the same way you're going to deploy Amburix applications. And in fact, it works well if Posit connects to a workbench, and we're actually working on the documentations for those. Yeah, but for people who are more familiar with servers, you can go for anything from AWS, DigitalOcean, GCP, all those options.

Q&A: getting started with LLMs in R

What advice would you give someone who doesn't know where to start with LLMs in R and they just want to get their feet wet? Yeah, that's a great question. I think just start with Elmer. Like, just start up a chat. Start looking at how to programmatically chat. So, you know, first you'll see that it's just like sort of like basic, you know, prompt and response. And then you can start to see how then you could, you know, like paste in other text information or provide like data to it programmatically and start thinking about those sorts of scenarios, which then will get you into like, okay, well, I actually want that data to be structured. Actually, I have like, you know, want to make multiple requests. I think one example is like, you know, use the weather, like pull some weather data from a weather API and like have the Elmer chat provide like this nice, you know, short couple sentence description about what the weather's like at a certain location. So, just play with like simple examples like that and look at the Elmer documentation.

Q&A: starting with ggplot2 extension packages

Question for Gina. Somebody had asked, for someone who's mostly proficient in ggplot2 but new to creating packages, what's the first step you'd recommend to get started? Are there certain extension packages that you consider really good, like a good example to follow?

Yeah, so I think the place to start is like working on the extension itself and making sure that like that is sufficient motivation to to get into packaging. But I have some of my own packages that would get you to like a minimal package, too. So, I think a question that I ask myself this a lot is like, what can I do without? So, I can provide maybe a link. I'm going to put the recipes in the chat as well because I think that that's maybe a good place to start. Yeah, yeah. So, yeah, I think like kind of a good sequence of events is work on the extension mechanism itself and see if if, I don't know, come to the group maybe and and see if that doesn't already exist, you know, ask in the discussion, see if it's worthwhile to package.

Q&A: HTTP authentication with Posit Connect

All right. I have a question for Richard. It is, okay, it says, not directly related, but do you have any insights about handling HTTP authentication with apps hosted in PositConnect or authentication with PositConnect in general? For example, apps that require calls to internal only APIs.

Okay. So, in terms of what's available within PositConnect, so, in terms of just authentication, you get some things out of the box. That link I mentioned with the session object and getting the username in groups, that's part of that. In terms of calling other internal APIs, yeah, I'm not 100% sure here, but I think there was an update in PositConnect, because before you would get the connect API key for the publisher. But I think, and yeah, fact check me on that, please, is now there's an option where you could get the, like, an API key that would present itself as the user accessing your app. So, you could sort of forward it over to your internal IP that's also hosted on connect and that way access it. But yeah, that one might need a fact check.

Q&A: Ambiorix and JavaScript learning

Okay. Let's see. This one says, Kennedy, I've worked on huge and unwieldy Shiny apps as well with so many tabs. Are there any other challenges besides, oops, it just jumped. Besides needing API endpoints that Ambirox helped you solve, did using it help you learn more JavaScript?

That's a good one. Definitely, you can't use Ambirox and not have to learn JavaScript. So, yes, I had to learn more JavaScript. The other thing it taught me, I think for me, at that time, the multi pages and the APIs were the big things. That was like my immediate problem. So, at that time, that was my, that was all my focus. Yeah.

Q&A: Ambiorix plugins

I love it. And I wanted to pull John in while we are in the John and Kennedy zone. Because one of the questions in Slido actually matches something that was just asked in Discord. And it says, I think you mentioned plugins for Ambiorix. What plugins are there? What kinds of extra things do they let you do with Ambiorix? That's a good question. Yes, I can pull that up. We do have a series of plugins. So, they're essentially middleware. Most of them middle, what we call middleware, which is something that you can discover when you, if you look up a bit about Ambiorix and this type of framework. But essentially, you can customize the logger. We have things that will, for instance, minify the HTML that you send back as a response, so that you make it smaller, loads faster. You can use different engines if you want for the rendering of your HTML. We have, they're not known in the R ecosystem, but elsewhere, there are Jade and Pug, things like that. We have other things for security, CSRF tokens, these sort of things that you can, basically, they work like plugins because you install it, you just do library, and then use something, one function call, and it's just integrated into your app.

Q&A: feature flags as security

Connor had asked a great question for Richard. It is, can shiny feature flags be used as a security feature, or can users get to the obscured UI element by inspecting the web page in the browser or editing the JS, which I'm assuming is JavaScript?

It could be part of the solution. So, as I mentioned, the feature flag is just a fancy name for an if statement because feature flags doesn't provide you with, because I assume you're asking about user access and that the user can only access a specific part of the app. Then that needs to be handled, for example, by PositConnect. And in terms of if they will be able to find things in the UI, here you need to, it depends how you write your shiny code. Because if you, for example, in the server side, you can check if that user is the user to whom you want to show your content and insert the UI or within the render UI statement, you can check who they are and either render nothing or what they're supposed to see if they are a person that can access that. So, part of the, could be part of the solution. And you need to structure your code so it works the way you want it to.

Q&A: LLM performance on disease classification

Dylan, Noor asked a question and I will put it in the chat as well. It was for the disease classification that you had in your presentation. How unbalanced was the data between disease types? Influenza-esque viruses have greater representation versus malaria, etc. Secondly, do you think that impacted the performance? Yeah, this is a good question. This was just a test data set, so it's not necessarily representing like the frequency of diseases in the real world. So, I think all 24 diseases had something like 200 data points for each. So, it was balanced, just an example data set to test for machine learning models that are doing kind of natural language processing algorithms. So, kind of a perfect fit for the LLM to also take its stab at, which it did not do very good at. So, I would not use that, but it's good to know.

Q&A: ggplot2 extension challenges

Gina, I see a question, an insider question. Do you know if there'll be a ggplot3? I've only heard that mentioned once. I don't know anything.

But the question I wanted to ask you is, what are some of the recurring themes around challenges that ggplot extenders face? This person said, I've been thinking about making an extension package, but I'm nervous. I think that because the like user API is sometimes so nice, then we don't kind of know the building blocks beneath the surface. Like those aren't super exposed to us. So, I think just like getting some exposure to those through resources like the easy recipes and then the extension vignette and extension chapter is probably like a good starting point. So, you're just going like one level down.

Q&A: front-end frameworks for Ambiorix

Alrighty, we have a question from Umair. Hey, Umair, how's it going? It's for Kennedy. What front-end frameworks do you recommend for our users who are new to Ambiorix?

Okay, that's a good one. So, most of our users will be coming from Shiny. So, definitely for CSS frameworks, Bootstrap would be a really good starting point. You won't feel the difficulty, like it will just be like using Shiny only that you have to build the action buttons and the text inputs yourself. But it will feel more like a good starting point. So, I think that's a good starting point. So, another container would be Tailwind CSS, which is really good. As far as the CSS frameworks are concerned, Tailwind is really, really good. For JavaScript frameworks, I personally go for HTMX. It's not a recent framework. It's been there for a while. It's just that we've known it recently. But yeah, that's what I go for. It will feel a lot like Shiny in that it's like saying, when the page loads, I want you to make this request. So, same kind of way you use React, you use Reactives in Shiny, but now with HTMX. Yeah.

So, those are my recommendations. John? Yeah, sorry. It's because Shiny comes bundled with the front end, essentially. That's why it's so easy to get started. But then you're tied with Bootstrap. So, one of the advantages of MBRX is that you can choose whatever framework you want. But the problem with asking this question to Kennedy is that he loves Bootstrap. So, he uses Bootstrap anyways. But yeah, that's, I think, an advantage. And as Kennedy put it in his slide, also an inconvenience you can think of. There's no just text input that will work out of the box or action button that will work out of the box. There's a bit more work, but at least you can make the application a tad more unique, I would argue.

Q&A: prompt iteration for LLMs

Dylan, a question for you. Does it take a lot of iterating before you get a prompt that will correctly identify things? How much of the analysis do you think is the prompt figuring out part? I think it's a lot. I think iterating is really good, and you can see that creating this presentation, I did a lot of iterating to find out what the limits of the LLMs are. Like, you can see Simon doing that openly on his blog as well. I think what I'm learning, though, is you can reduce the amount of iteration by starting small, like using simple prompting that's really powerful, that's using, like, really trying to identify what you need. A lot of times, the LLM just kind of overrides it with its own prompting and desires. So, really, if you can focus in on, like, what are the edge cases that you need to handle, what are some of, maybe, the tricky parts of the task you're asking to help kind of handhold, I think, focus on that. So, look at where the limitations are, focus on the edge cases, prompt around those, but don't try to tell it, like, you are Bob the Builder building this website. Like, just get rid of that. Yeah.

Q&A: favorite ggplot2 extension

Okay. Gina, I have a question here that says, what is your personal favorite ggplot2 extension package, or just one that you like, or favorite creation story behind one? It says, do most people say they make an extension package because they couldn't get ggplot2 to do what they wanted? Like, since you've heard so many creation stories and motivation.

I mean, ggbump always comes to mind, just because I think the focus is so narrow. And, yeah, that extender, he doesn't have, like, a huge library of extension stuff, but it is, like, I always see people using it. It's, like, so useful for the world. So, it's, like, kind of an inspiring reference. I mean, I just think, yeah, all the stories are great. I feel like every meeting, I'm like, this is my favorite meeting so far.

All my children are my favorite. Yeah. Mike just put ggbump into the chat. And I will say, I co-hosted VisBuzz in June. And one of the plots that had to be recreated in VisBuzz was a bump chart. And everybody who was not using R had a hard time, because ggbump does a really unique thing. Like, bump charts are not a standard thing that you can make.

Q&A: Ambiorix and ExpressJS inspiration

John, there was a question for you that was, why ExpressJS? Could Amberix be adapted to use other frameworks? Curious what you liked about it.

I try and be as clear as I can, but that seems to be a misunderstanding that comes up very often. It's not built on ExpressJS. The syntax is inspired by it. And if you take the Hello World, which I think show it on the website. If you take the Hello World for Amberix, and the Hello World for ExpressJS, although it's a JavaScript framework made for Node.js, the syntax is virtually the same. You replace the dollar signs in R with dots, and you've got the same application almost. And so essentially, one day I was trying to write an application, I think I ended up on ExpressJS, which at the time was extremely popular. And I don't know, I don't use it today, but the syntax clicked. The way you compose your application with ExpressJS clicked with me, and I thought I'd try to do something like that.

Q&A: automated testing and edge cases

All right. I've got a question for Rishard, and it is, there we go. What types of automated tests or validation checks have you found most useful or most effective in catching those messier edge cases issues?

So if it's about messier edge cases that are hard to figure out, I would say I do remember a case where I was working on an app, and we knew what features are critical. So those need to work all the time, because otherwise the app makes no sense. And think of an e-commerce website, like the critical path is put something in the basket, go to checkout and buy it. So try to find that path for your app and set up an end-to-end test, for example, with ShinyTest2 . And this might uncover some edge cases you might not anticipate. Like I do remember a case where I updated one package that was a dependency of a dependency, so I didn't even use it directly, but the test caught it because it broke part of this critical path. And the other is, yeah, use the deployment strategies because that's where you'll find those unknown unknowns. Of course, like other strategies that traditional ones also work, like ask for code review, like your teammate might figure out there's an edge case. Or I also heard about cases where someone would take production traffic and would simulate it again on a development version, and that way that can also be used. Or I heard about people generating random inputs. And I think the term is monkey testing, so you can also apply that strategy to try to find those messier edge cases.

Q&A: LLM feature extraction for image classification

Awesome. Thank you. It's always hard. Dylan, there was a question for you that was, you said you used the LLM to extract features to tell it what features to use. Can you describe that? Do you give the LLM the images and ask it to determine which features are most characteristic of visual differences? Then do a whole new prompt cycle using those features in your prompts?

You can do that. I don't know if it helps or not, putting the features you extract as a new prompt. The reason I showed it, and I think it can be useful to look at, is because it gives you a glimpse into kind of like how the model is thinking about the task. Also, in this case, I wanted to know, can it pick up on characteristics of the flower that we would have in the iris data set? It can to some degree, but there's so many subtleties, like relative size, and there's really small unit differences in the size of the flowers or proportions. It really can't pick up on those. It's like, this is a purple flower and has short leaves and is kind of smallish. It'll focus on those things. Use the features as a glimpse into the LLM, kind of like you would just looking at your data set and what features it has that you're going to use to predict some data.

Q&A: most underrated ggplot2 extension

Gina, I have another one that is maybe a favorite question, but not really. It's, what's the most underrated ggplot extension? Also, part two of that question, what's the best extension you've seen for scientists?

Yeah, I preface my talk saying I wasn't going to talk about the universe of extension. I mean, I would like really to hear from this ggsignif, I think, package. I think that that seems like kind of a simple idea, but really, it like adds, it compares groups and says if there's significance between them. I think that that looks really well done and it seems like really relevant to scientists can try to find them.

I will say my vote for underrated gghighlight is so good. I love it. Go play with it if you haven't. If you want a really easy way to be like, I want to highlight this one thing in my ggplot, gghighlight is great.

Q&A: LLM safety and accuracy in day-to-day work

Dylan, do you think it's safe to use the LLM you demonstrated in our day-to-day work or would you recommend another way of integrating LLMs that ensures better accuracy?

I guess it depends what your day-to-day work is. I think as a programmer, it's super helpful in starting ideas or like completing code when I know what I want but I don't want to have to search like the dplyr docs or the ggplot docs to write code. I think that's something we can all relate to if we are writing code in R. And I think for like this topic of classification, I think it's one that I would tread very carefully. I think it's cool because like Elmer is extending these chat models to be able to, you know, do data manipulation on large data sets and to replicate things like that machine learning models that would be difficult to create can do. So, I think it opens up a door there and I just kind of wanted to show that it opens up that door and it can be good at it and there might be useful things we learn from it. For example, like when I compared it to another machine learning model that's being used in production and I realized, oh, there are some blind spots in this model. This model has some biases and the LLM actually highlighted some of those. So, I think in that situation, it's like actually could be a huge like benefit. So, it really depends on the use case and I think it's just something we all have to kind of explore and find out what works. And I just say be careful, look closely, you know, iterate and ask like the hard questions and look at your data.

Q&A: R Consortium ISC grant for Ambiorix

Okay, we have a question that is either for John or Kennedy depending on who knows about this topic. It's from Mike K. Smith and it was, was Ambiorex funded via an R Consortium ISC grant? If so, how was that process and would you recommend that to other people?

I think it was funded by an R Consortium grant. John or Kennedy, who knows more about this topic? I'll start and I'll let Kennedy close. Essentially, I created the package, I call it 2020-2021. There was no funding up until we've had a very small amount of funding for Kennedy. I can't remember, late last year or something like that. I'll let Kennedy answer that. Yeah, they gave us funding last year. Well, I'm supposed to work on it this year. We've been working on it. So the process, it wasn't that difficult. The only difficult bit is you have to explain why this package matters. And of course, it has to have like a wide effect on the R community. And the second thing was like stating how you are going to achieve what you said. Like, it's like saying, okay, so this is what I want to make or to improve in Ambiorex. And then you have now to say, how are you going to do that? And what are the timelines? So it wasn't really difficult per se.

Q&A: testing LLM categorizations without ground truth

Okay, we have a question for Dylan. Dylan, everybody really needs to know about the LLM stuff. One of them was, wait, we got to find it. Okay, there it is from Sivani. Do you have any advice for testing these LLM categorizations? I'm thinking of use cases where there are no good data sets with existing categorizations or when our own classifications may be inconsistent.

This, I think, gets to kind of a crux of using LLMs for classification. But what I go to is I go back to like the gold standard of like testing for classification, which is that you should be testing on new data to see the, you know, the correct, like how the proportion of correct classifications to the incorrect ones. And when that's not possible, I think LLMs do allow for some flexibility, right? So we can use like uncertainty scores, looking at the, asking for the reasoning or the features it's looking at to understand what it's thinking. But when we have a lot of data, that's hard. So you might like sort through and just look at really all the cases where the LLM is uncertain, you know, maybe run some stats on that to see the mean uncertainty, the deviation, to understand like how difficult the LLM thought this task was, run that against our own expectations, you know, review some of those difficult to classify responses. Another technique you can use is you can use multiple LLMs as raters or different classifiers and try it that way to see if there are any disagreements or divergences in the different LLM models. I'd also look to see if there are existing machine learning models that are intended for your specific data, so they're trained on data just like yours. Now those might not be perfect either, but now you have multiple tools in your toolkit that you can start to use to converge on what a true response might mean if you don't feel like you can go through and make that determination for every single data point. Yeah, that's rough. Sometimes things are not mutually exclusive either.

Another technique you can use is you can use multiple LLMs as raters or different classifiers and try it that way to see if there are any disagreements or divergences in the different LLM models.

All right, thank you so much for that question and thanks Dylan for answering. And now we are two minutes from the top of the hour, so I will say it's time to wrap up. I will go ahead and I did see, Martine, you had asked about the links in Discord as well. I will put them all back in Discord. No worries. Give me a moment to do that, but I wanted to say thank you to everybody. Thanks Rachel, thanks all of our speakers. You did such an amazing job today and I hope that you join us for virtual trivia, which happens in half an hour everybody.

Thanks everybody. Looking forward to chatting with everyone on the Discord too, and I'm

Virtual Day Q & A Hangout with Virtual Speakers | posit::conf(2025)

Transcript#

Speaker introductions

Q&A: deployment strategies

Q&A: Ambiorix front-end and back-end

Q&A: LLM certainty scores

Q&A: ggplot2 interactivity and extensions

Q&A: blue-green deployment and user sessions

Q&A: Ambiorix mindset shift from Shiny

Q&A: LLM model selection for classification

Q&A: ggplot2 extension motivation

Q&A: blue-green deployment for A/B testing

Q&A: deploying Ambiorix

Q&A: getting started with LLMs in R

Q&A: starting with ggplot2 extension packages

Q&A: HTTP authentication with Posit Connect

Q&A: Ambiorix and JavaScript learning

Q&A: Ambiorix plugins

Q&A: feature flags as security

Q&A: LLM performance on disease classification

Q&A: ggplot2 extension challenges

Q&A: front-end frameworks for Ambiorix

Q&A: prompt iteration for LLMs

Q&A: favorite ggplot2 extension

Q&A: Ambiorix and ExpressJS inspiration

Q&A: automated testing and edge cases

Q&A: LLM feature extraction for image classification

Q&A: most underrated ggplot2 extension

Q&A: LLM safety and accuracy in day-to-day work

Q&A: R Consortium ISC grant for Ambiorix

Q&A: testing LLM categorizations without ground truth