
Joe Cheng - Shiny x AI
These days, you can’t turn around without encountering a large language model—they’re embedded in everything from Google search results to the lower-right corner of every Windows desktop. But… in your Shiny app? In this talk, we’ll discuss some ways the Shiny team is combining the magical chaos of LLMs with the structure and control of Shiny. You’ll learn how to use modern chat models to add features to your Shiny apps that will feel like science fiction to your users while minimizing the risks of hallucination, irreproducibility, and data exposure. Talk by Joe Cheng. GitHub repo: https://github.com/jcheng5/py-sidebot
image: thumbnail.jpg
Transcript
This transcript was generated automatically and may contain errors.
So if you'll allow me to complain a little bit, if there's one question that I have hated answering, hated receiving, hated being asked over the last year, that question is this.
What is your AI strategy? And let me ask: how many of you have been asked this question, or some variation of it, by upper management, by your coworkers, maybe by your spouse? In my case, my father, 25 years retired, asks me this all the time.
What is your AI strategy? I hate the question because I have a pretty conflicted set of feelings about AI. I like to say I've gone through the five stages of AI grief. Starting with denial: artificial intelligence? No, these are just stochastic parrots. Then anger: enough with this hype, can we just talk about anything else but AI these days? Then bargaining: okay, fine, there's something interesting happening here and these tools do have something to them, but do we have to get involved right now? Can we just wait for the dust to settle a little? Then depression, feeling like, oh my gosh, these things are just going to take over my profession, my livelihood, my kids' futures, everything's going to be ruined. And finally, acceptance.
And this journey from depression to acceptance really came at the behest of our users: Shiny users who were starting to say, I'm trying to build things with these LLMs, and the lack of support from Shiny is starting to stand in my way.
So we have approached this problem with Shiny and AI from two different directions. First, what does Shiny have to offer AI researchers? So if you are someone who has an idea for an AI app, is Shiny a good framework for you to use? And secondly, what does AI have to offer Shiny app authors? So like a lot of us here at PositConf who already have Shiny apps or already know how to write Shiny apps, what interesting, useful, and responsible ways might there be to leverage LLMs to make our apps better?
What Shiny can do for AI builders
So let's start with this first one. What can Shiny do for people who are really into AI? And this whole talk is going to be very much like, these are not final answers. We are at the beginning of a very long exploration, but I wanted to share with you some interesting things we've already experienced.
So this is Tina Huang. She is a YouTuber who talks about data science, and lately everything she's talking about is AI; it's all her viewers want to hear. We've been working with Tina, and she had a conversation with Curtis in marketing at Posit. Tina said: hi, Curtis, I want to build a video chat assistant using GPT-4o. This was right when GPT-4o was announced and OpenAI had this big splashy demo where you could basically FaceTime with GPT-4o. So she said, I want to recreate that in Shiny. Can we do that with Shiny? And he said, oh yeah, totally, you can do that with Shiny. And she said, great. Can Winston Chang (Winston Chang right here, from my team) join my live stream in nine days and live code one of those for me? And he said, oh yeah, sure, no problem.
So, a couple of problems with that. Number one: they did that GPT-4o demo, but they did not actually publish the API. GPT-4o did not have a way to ingest video input at the time, and still does not, several months later. Secondly, Shiny did not have a way to accept video input directly from the user. And third, Winston had just gotten on a plane to leave for PyCon.
Fortunately, for this first problem, it turns out that within a few minutes we'd figured out a Rube Goldbergian chain of commands we could use to take a video and break it down into something that GPT-4o could understand. GPT-4o could give us text back, and we could turn that into speech. But that did not solve the problem of how you get video into this app, and how you get the audio to play back.
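That chain (sample frames from the video, send them to GPT-4o as images, turn the text reply into speech) can be sketched roughly like this. This is an illustrative sketch, not the actual livestream code: the helper names are made up, and it assumes the openai Python package with an API key configured.

```python
import base64

def sample_frame_indices(n_frames: int, max_frames: int = 8) -> list[int]:
    """Pick an evenly spaced subset of frames so the request stays small."""
    if n_frames <= max_frames:
        return list(range(n_frames))
    step = n_frames / max_frames
    return [int(i * step) for i in range(max_frames)]

def describe_frames(frames_png: list[bytes], prompt: str) -> str:
    """Send sampled frames to GPT-4o as base64 images; get text back."""
    from openai import OpenAI  # requires OPENAI_API_KEY in the environment
    client = OpenAI()
    content = [{"type": "text", "text": prompt}] + [
        {"type": "image_url",
         "image_url": {"url": "data:image/png;base64,"
                              + base64.b64encode(png).decode()}}
        for png in frames_png
    ]
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": content}],
    )
    # The text answer can then be handed to a TTS endpoint, e.g.
    # client.audio.speech.create(model="tts-1", voice="alloy", input=...)
    return resp.choices[0].message.content
```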
So with nine days, I got to work. And about a week later, I created this input video clip component that is available for both R and for Python, where you could record a short video clip of yourself. And secondly, an audio spinner component that could play audio and then pulse in time with the signal.
So the morning of the live stream, I had this working. I had this application. And if you'll just allow me a quick moment, I would love to demo it for you.
So I'm going to need your help for this. And I'm going to need the sound on. So I'm going to point the webcam at you all. So maybe you guys can wave. And I can ask the video assistant, what does it look like I'm doing right now?
So it's going to send that input to GPT-4o. And I'll get an answer: "You're speaking to a large audience and maybe taking a selfie video. The crowd behind you seems engaged, waving, and smiling. Seems like a fun event."
So this was the morning of the live stream, eight hours before we went live. And her response was: OK, that's great. What if it could query three different LLMs, let you compare which one did the best, and plot which model is in the lead? Which I had some feelings about, but I never want to back down from a Shiny challenge, so I said, sure, let's give it a try. And it turns out it was not even that hard. Three hours later, I had built this much more complicated version of the app, where you record a prompt in the same way and it simultaneously sends three queries. Then you can play each response in turn and vote for the one you like best. Less than three hours to do that.
And what I learned from that experience is that interactivity in the small, making these small, very polished interactive components, is hard. It requires a lot of web development expertise: it took me the better part of that week, plus many years of web development experience, to create really good versions of these components. But once those components exist, interactivity in the large, how we connect these components in a reactive way, is what Shiny has always been designed to do quickly and iteratively. So building different kinds of apps around these components is super fast and easy.
And the other thing we learned is that sometimes you need a Tina. Sometimes you need someone who is not aware of the limitations of the tools to make an audacious ask of you. And some of our best work happens when we have someone making those kinds of audacious asks.
What AI can do for Shiny apps
But the more interesting conversation, I think, is around this second question: what can AI do for Shiny? And again, in the spirit of humility, I'm just sharing what we have discovered so far; I'm sure a lot of new things are just around the corner. But this is what I'm really excited to share with you today.
Actually, the genesis for this demo was a conversation with Hadley. He said, have you thought at all about how you might combine Shiny plus LLMs to create dashboards that give the viewer some ability to do independent analysis?
So we took this dashboard, which is one of our most basic templates for Shiny for Python. And Carson Sievert here had just created a chat component that shipped with Shiny for Python 1.0 a couple of weeks ago. So of course, what did we do? We put a chatbot on it.
I'm going to do this live. Screw it. So can you guys see that OK in the back?
So we have replaced the sidebar with this chatbot. And by the way, I've made the text very big for the purposes of this demo; it's not all out of proportion like this when you use it for real, this is just for today. So I can ask it simple filtering and sorting things. I can say: show only Sunday dinners.
This is showing restaurant tipping data. So I ask it to show only Sunday dinners, and it says: great, I filtered the dashboard to show only Sunday dinners. And all the plots on the right have updated. But the key thing here is that it's showing me how it did that, with SQL.
And in fact, the way this dashboard works, the chatbot does not have access to the data. All it has access to is the data schema, and it knows that this is running DuckDB, so if it's going to write SQL, it should write it in the DuckDB dialect. So I don't have to worry about all of these visualizations being hallucinated or inaccurate. The only power this chatbot has over the dashboard is the SQL query that it writes.
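The schema-only design described here can be sketched as follows. The demo runs DuckDB; sqlite3 stands in below so the sketch is self-contained, and the table definition and prompt wording are illustrative, not the actual sidebot prompt.

```python
import sqlite3

def schema_prompt(con: sqlite3.Connection, table: str,
                  dialect: str = "DuckDB") -> str:
    """Build a system prompt from the table's structure only; no rows are sent."""
    cols = con.execute(f"PRAGMA table_info({table})").fetchall()
    # PRAGMA table_info rows are (cid, name, type, notnull, default, pk)
    col_lines = "\n".join(f"  {name} {ctype}" for _, name, ctype, *_ in cols)
    return (
        f"You can query the table `{table}` with columns:\n{col_lines}\n"
        f"Respond only with a single SELECT statement in the {dialect} SQL dialect."
    )

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE tips (total_bill REAL, tip REAL, day TEXT, time TEXT, smoker TEXT)"
)
prompt = schema_prompt(con, "tips")
```

The app then runs whatever SELECT comes back and applies it to every output, so the model steers the dashboard without ever seeing a row of data.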
So this is a pretty simple query. So I can read it and ensure that it did not hallucinate anything. Yep, that seems fine. So let me change that now and say, actually, invert that filter. So do the opposite. Instead of only Sunday dinners, it's everything but Sunday dinners.
Now, that first example, only show Sunday dinners, is a pretty simple thing to do with Shiny inputs, right? You check Sunday, you check Dinner, and they're ANDed together. But this one is subtle: asking for the inverse of something is actually much harder to do in a traditional dashboard, not just in Shiny but in any BI dashboard. It's a more complicated thing. And let's take it a step further by saying: also filter out smokers.
And now we've ended up with something that is like a more complicated Boolean expression. And there are very few Shiny apps, indeed, that bother to do Boolean combinations of query parameters.
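The progression of filters in this part of the demo boils down to WHERE clauses like the ones below, run here against a toy table with sqlite3 standing in for the demo's DuckDB (the SQL is the same for these queries):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tips (day TEXT, time TEXT, smoker TEXT)")
con.executemany("INSERT INTO tips VALUES (?, ?, ?)", [
    ("Sun", "Dinner", "No"), ("Sun", "Dinner", "Yes"),
    ("Sat", "Dinner", "No"), ("Thur", "Lunch", "Yes"),
])

# "Show only Sunday dinners": two conditions ANDed together
only_sun_dinner = "SELECT * FROM tips WHERE day = 'Sun' AND time = 'Dinner'"
# "Invert that filter": everything *but* Sunday dinners
inverted = "SELECT * FROM tips WHERE NOT (day = 'Sun' AND time = 'Dinner')"
# "...also filter out smokers": a Boolean combination few dashboards support
combined = ("SELECT * FROM tips "
            "WHERE NOT (day = 'Sun' AND time = 'Dinner') AND smoker = 'No'")

counts = [len(con.execute(q).fetchall())
          for q in (only_sun_dinner, inverted, combined)]
```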
So this particular app, let me go ahead and reset those filters. This particular app also has the ability to not only filter, but also to answer data questions. So I can say, this dashboard is showing the average tip and the average bill. But it does not show, for example, what is the average party size.
So again, it uses SQL; the only access it has is writing SQL. And for the purposes of answering questions, I do permit it to receive the results of that query. So it said the average party size is approximately 2.57. Let me ask a more interesting, maybe more controversial question: who tips more, male or female?
OK, to my surprise, it says males tip more: 3.09 versus 2.83. And there's the SQL query. Does anything strike anyone about this answer and the way it came up with it?
We're talking about tipping data. And to me, that means what percent did you tip? But this has answered in terms of absolute numbers. So if I ask it, like, wait, is that absolute or relative to total bill?
And it says, oh, that was absolute. Let's go ahead and do the other way. And it flips. So females actually tip more as a percentage. And I can take this further and say, like, break that down by day and time.
And now I get a whole table of different percentages.
Now, this is all stuff that I expected this to be able to do. And one interesting thing about working with LLMs is how often we find that the thing that we just built does things that we did not expect it to be able to do. So the first time I was demoing this for someone, I said, well, it's only SQL. So there's lots that it can't do. Like, for example, I can't say, like, filter out outliers on total bill.
It's going to take IQR times 1.5. And it's going to filter out the outliers, apparently. And it filtered out nine outliers. I guess SQL can do that.
But maybe that's not how I want it to find outliers. I can say: no, use standard deviation times 3 as the definition. And it's like, OK, sure, we'll do that instead. And not only can we filter out the outliers, we can say: flip that, show only the outliers.
And we can see that using this standard deviation times 3, these are the four data points that got excluded.
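As a sketch of how an outlier rule can live entirely in SQL, the query below implements the "standard deviation times 3" definition. SQLite stands in for DuckDB here (which has richer aggregates for the IQR variant as well); since SQLite has no stddev aggregate, variance is computed as E[x^2] - E[x]^2 and the comparison uses squared distances to avoid sqrt. The data is made up for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tips (total_bill REAL)")
# 20 ordinary bills and one extreme one
con.executemany("INSERT INTO tips VALUES (?)", [(10.0,)] * 20 + [(1000.0,)])

outliers_sql = """
WITH stats AS (
  SELECT AVG(total_bill) AS mu,
         -- population variance via E[x^2] - E[x]^2
         AVG(total_bill * total_bill) - AVG(total_bill) * AVG(total_bill) AS var
  FROM tips
)
SELECT t.total_bill FROM tips t, stats s
WHERE (t.total_bill - s.mu) * (t.total_bill - s.mu) > 9 * s.var  -- |x - mu| > 3*sd
"""
outliers = [row[0] for row in con.execute(outliers_sql)]
```

Flipping `>` to `<=` in the final clause gives the "outliers excluded" view, which is the same flip the chatbot performed in the demo.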
Now, one more feature I want to show you before moving on. So we are still showing this data with outliers excluded. One other feature we added is the ability to explain plots. These are both Plotly plots, but this also works with ggplot2 or base plots, or matplotlib if you're in Python. We added this button to explain what you're seeing. And the reason is that what seems like the right visualization to people like us at this conference might not be the most understandable thing to the decision makers these dashboards are intended for.
So to remove any possibility of people being stuck not knowing what they're looking at, we can add this button where it takes a higher resolution screenshot of that visualization and then asks the chatbot to explain what it sees, what observations it can make. In this case, it's noticing that there's a positive correlation, that there are different things that happen below $10 and above $30. And by the way, it also notices that this particular visualization is based on outliers already being excluded. So it is aware of the manipulations that we've been doing in the sidebar. And you can ask follow-up questions as well down below.
Lessons learned
So what did we learn from this experiment? The biggest takeaway for me was the power of domain-specific languages. Using SQL as the intermediary between the chatbot, the human, and the data app worked out fantastically, because SQL is pretty easy for LLMs to write these days, at least for the frontier models, and it's relatively easy for humans to read and reason about. I mean, I don't do a lot of SQL, and I write Python and R code every single day, but I would rather debug those SQL statements than debug a bunch of pandas code (well, dplyr is pretty good). In general, SQL constrains what's possible to express in a way that makes it easy to read and reason about.
And it's super powerful and flexible. Like, with just SQL, we were able to do all those things that you saw in this demo. And compared to executing straight Python or R that's being written by an LLM based on a prompt from a possibly malicious user and then executed on my server, yeah, I'd rather stick to SQL.
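Even with SQL, the model's output should be treated as untrusted. A naive guardrail might look like the sketch below; a real app is better served by a read-only database connection or a proper SQL parser, since keyword filters like this one are easy to evade.

```python
import re

# Statements we never want the model-written SQL to contain.
# Illustrative only: this list is not exhaustive and not bulletproof.
FORBIDDEN = re.compile(
    r"\b(INSERT|UPDATE|DELETE|DROP|ALTER|CREATE|ATTACH|PRAGMA|COPY)\b", re.I
)

def is_safe_select(sql: str) -> bool:
    """Accept only a single read-only SELECT (or WITH ... SELECT) statement."""
    stmt = sql.strip().rstrip(";")
    if ";" in stmt:  # reject multiple statements
        return False
    if not re.match(r"(?i)\s*(SELECT|WITH)\b", stmt):
        return False
    return not FORBIDDEN.search(stmt)
```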
And this experiment was using SQL. But I'm really excited to experiment further with maybe we can use Vega-Lite as a DSL for creating visualizations, or use CSS Grid as a DSL for doing different kinds of dashboard layouts. So you could ask questions in the future like, please draw me a scatterplot. And instead of getting a bunch of Python or R, you get, again, something that's easy for you to read and reason about.
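For instance, "please draw me a scatterplot" could come back as a Vega-Lite spec like the illustrative one below: a declarative JSON document the app can validate and render, with the model never touching the data itself.

```python
import json

# A hypothetical model response for "draw me a scatterplot of tip vs. bill".
# Field names match the tips data from the demo; the spec itself is made up.
spec = {
    "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
    "data": {"name": "tips"},  # the app injects the data, not the model
    "mark": "point",
    "encoding": {
        "x": {"field": "total_bill", "type": "quantitative"},
        "y": {"field": "tip", "type": "quantitative"},
    },
}
spec_json = json.dumps(spec, indent=2)  # what the app would hand to a renderer
```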
And more generally, I was really astounded at what I learned once I started coding against these tools. So I'm convinced that if your only experience with LLMs has been as a consumer of Copilot and ChatGPT, that you are actually not well equipped to have an informed opinion about what these things can do. So a lot of people that I've talked to have been very skeptical of these tools based on their experiences. And let me tell you that if that's you, I really suggest you start coding and see what these things can do when you call them with code.
Secondly, I think it is really time for those of us who have been skeptical in the past to revise our mental model of what these LLMs are and what they can do. The stochastic parrot mental model is factually true, it's an accurate description of them, and it's a very reassuring view of the world that these apparently smart things are just stochastic parrots. But it is an incredibly unhelpful mental model for thinking about what these things are and are not capable of. Whereas thinking of them as machines that reason is factually false, and it leads to a fairly terrifying view of the future, but it is a really helpful mental model when you're thinking about what the potential of these things is.
Finally, I really want to encourage everybody hearing this: stay skeptical. Critical thinking is one of our superpowers. Stay skeptical of all the hype, but do not let that skepticism get in the way of being curious about the potential of these things, not just in the obvious use cases, but in the non-obvious use cases that might be right under our noses and could really make life better for ourselves and for our users.
And if you can do those three things, if you can stop just being a consumer and start coding, if you can update your mental models of what these things might be capable of, and if you can let yourself be curious about what the potential is here, then a year from now, when I see you at PositConf 2025, when I ask you this question, you'll be ready to answer, what is your AI strategy?


