
Observability at scale (Barret Schloerke, Posit) | posit::conf(2025)
Observability at scale: Monitoring Shiny Applications with OpenTelemetry
Speaker: Barret Schloerke
Abstract: Understanding what happens behind the scenes in production Shiny applications has always been challenging. When users experience slow response times or unexpected behavior, developers are left guessing where bottlenecks occur. This talk introduces OpenTelemetry integration for Shiny for R, a new approach to profiling code and understanding application behavior. Through a chat-enabled weather application, we'll explore how complex user interactions trigger cascading events across multiple processes. You'll learn how OpenTelemetry's "high-quality, ubiquitous, and portable telemetry" can provide complete visibility into your Shiny application's performance with minimal setup... just a few environment variables! After walking through the chat app, you'll have a taste of how to implement comprehensive monitoring for your Shiny applications in production, enabling you to proactively identify and resolve unexpected performance issues.
Materials: https://github.com/schloerke/presentation-2025-09-17-posit-conf-otel
Transcript
This transcript was generated automatically and may contain errors.
Hi everybody. My name is Barret Schloerke. I like the wave. Thank you. There's a QR link to the slides that will also be at the very last slide if you miss it now.
So today I want to start off with a code demo. It's going to be a small chat app. Nothing new. Nothing exciting. But we'll see what happens.
So let's look at this app. It is a small chat app that has a single tool call available to it that can ask for the weather. Can I have a city to ask for the current weather? I'm going to go with that one. So: what is the weather in Chicago? And then it's going to think about it, ask the tool, run it, and then summarize that tool response. And for the purposes of the demo, it's going to shut down the session. That will come in later.
But nothing surprising. I mean, if we want, we can sneak a peek at what the tool output was. We can see that it asked the available tool to look up Chicago, and then wrote up a summary of that information.
Nothing too fancy. But let's actually look what happens underneath the hood. So we'll rerun that. And we'll say, what is the weather in Chicago?
Exploring the flame graph
Let's check out the chart on the left. We have a simple session start, and then we have some reactive updating. And this is the whole crux of the talk today: this little reactive update is going to spawn a call to Claude to do some communication. That is going to make a POST request, and then it's going to stream back some results if there are any.
This then prompted us to execute a tool call, which I just happened to do through mirai, on a background R process. That process said, hey, I'm going to go get the weather forecast, and it happened to make an httr2 request or two. And then finally, I came back to Claude with the POST to say, here's the response, and then we're streaming results back as Claude responds within the website.
This is awesome. I now have an 8 second demo and I can point fingers as to who took the most time. In this case, while it is about 7 seconds total, the first 3 seconds were Claude. Then the tool call took about 650 milliseconds, roughly half a second. And then finally Claude's response took another roughly 3 seconds. So my tool call is, in the big picture, not the big time sink. But there are little spots where we can improve what's happening.
What is OpenTelemetry?
This is all being done through OpenTelemetry. And OpenTelemetry, according to its website, is "high-quality, ubiquitous, and portable telemetry to enable effective observability." If we're just looking at our app the way we did the first time, we can see what's happening on the surface, but what's happening underneath, where the computations are being done, is the important part to developers and in production. OpenTelemetry provides a set of APIs, libraries, agents, and instrumentation to provide observability for your applications. In this case, OpenTelemetry is being integrated into the R ecosystem and is currently under development.
So let's look at this a little more generically. OpenTelemetry is really good at capturing a burst of activity from an action, such as a user clicking the website or Shiny's reactivity starting. After this action, different services are engaged, in this case the green, then the blue, then the yellow, which then calls a database, and then each winds down as it finishes processing. So we can see how long the blue takes, then how long the yellow takes, and finally how much of your computation time is spent accessing that database.
Integrating OpenTelemetry
So, seems like a good idea to have for production. How can we do it? What does it take to integrate? There are two steps. The first one is we need to install otel and otelsdk. Packages will depend on otel for you; you as an app author will need otelsdk. And I'd like to take a small pause here to thank Gabor, because this went from concept to implementation and on CRAN in a really short time, with lots of iterations, so I just have to give a big shout-out to Gabor.
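As a quick sketch of that install step (assuming the CRAN package names otel and otelsdk mentioned in the talk):

```r
# App authors install the SDK; packages that emit telemetry
# only need to depend on the lightweight otel API package.
install.packages(c("otel", "otelsdk"))
```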
The second thing you need to do is add roughly three environment variables; it depends on your endpoint. In the demo I happened to be using LogFire, a third-party website that can host the OpenTelemetry data. What the environment variables are doing is saying: please send the information over the HTTP protocol, the location you're sending to is my LogFire endpoint, and here is some authorization.
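These follow OpenTelemetry's standard OTLP exporter environment-variable conventions; a hypothetical setup (the endpoint URL and token below are placeholders, not LogFire's real values) might look like:

```shell
# Export telemetry over OTLP/HTTP (standard OpenTelemetry variables)
export OTEL_EXPORTER_OTLP_PROTOCOL="http/protobuf"
# Placeholder: wherever your collector or vendor ingests OTLP
export OTEL_EXPORTER_OTLP_ENDPOINT="https://otel.example.com"
# Placeholder write token for authorization
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=your-write-token"
```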
OpenTelemetry vs Profvis and ReactLog
So you're probably seeing these flame-graph-looking things and thinking: well, what about profvis? What about reactlog? This is how I debug my applications already. profvis allows for local R code profiling and also displays some memory usage. With reactlog you can replay all of the local reactivity that you have captured, and it always displays a full dependency graph for every reactive component within your application. So if you have a very large application, you'll have a very large connected graph. I highly recommend it if things aren't working the way you expect.
The downside of both of these packages is that they only support the main R process. So if you're using mirai, which does computing on a different R process, we don't have access to that with profvis or reactlog. But with OpenTelemetry we can pass through some extra headers, and mirai can do the proper reporting. Also, profvis and reactlog are not for production. By definition, both of those packages produce a memory leak: they're constantly recording, and even if we were to write out to disk, you're going to run out of space eventually. So both of these packages are great for local development, great for debugging, but I do not recommend them for production.
Analyzing span performance
So let's look at these flame graphs. In this case we have a long grey one, a green, a blue, and a gold. How can we improve their performance? There are two approaches. The first is to reduce the length of the long-running spans. In this picture and in the demos there wasn't something that ran for 10 seconds when I really wanted it to run for one; that would be a situation where you want to reduce it, but now you can point fingers as to who the culprit is. The other is to limit excessive span nesting. In this demo it's about five nesting layers. Not an issue. I'd say 10, not an issue. When you're expecting 10 and it's 100, probably an issue.
So let's look at a screenshot of a similar demo. In this case the tool call happened to take 1.5 seconds to run. It happened to make two HTTP requests inside that tool call to get the weather. I believe you use the city to get the lat/long, and then given the lat/long it comes back with the weather for that location. But of the 1.5 seconds, only half a second was used for the second request, and the first request was maybe 100 milliseconds. So of the 1.5 seconds total, those aren't really the culprits I should be focusing on. Instead, I really think we should focus on the second of preprocessing that happens before that POST request or the GET request. That would be something we could look at to try to reduce, because that tool call is all my code. It's not Claude. It's not the response time. It's our code.
In addition, if you're looking at this as well, we can see Claude Sonnet was streaming, and that took 2.5 seconds. You could argue, if you want less load on your server, because you're essentially playing proxy to these models, to use a model that responds faster and with fewer chunks. I mean, there's a Facebook model that just kind of goes blah, answer. It's great, but maybe it's not as good as what you want, so you'll have to play with that sliding scale to decide whether you're okay with streaming and a longer response, or an almost instantaneous but maybe not as accurate response.
The 2025 approach to Shiny in production
When looking at a lot of GitHub issues, I'm constantly referring back to Joe Cheng's 2019 keynote, Shiny in Production. That's where he described how you should iterate to make your apps perform better, but I would like to update that now to leverage OpenTelemetry, so let's walk through it. This is the 2025 version.
First, enable OpenTelemetry to see your span durations. Second, look at those long spans that run in R and use profvis to see where that code is slow. When the code is slow, try to optimize it. We can, one, move work outside of Shiny. In the standard case, if you have consistent computing being done, maybe lift it out into your global.R, or try to do some preprocessing and have that be loaded before your server function. Just relocating it sometimes makes a big, big difference.
Two, make your code faster. If you can't move it outside the function, maybe use DuckDB as a backend instead. Even if it's still being processed in memory, it'll possibly be faster than what you have already with dplyr. Use caching: if you're returning the same answer for the same input, maybe we can add a layer of caching on top, and then your spans will just disappear. And finally, use some non-blocking reactivity. In the demo, I happened to have the tool call use mirai to pull it off the main R process so that others could do work. I wasn't showing it in the use case, but it did pull it off the main R process, and that way it wasn't blocking others from streaming or doing any calculations.
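As a minimal sketch of the caching idea, Shiny's own bindCache() can memoize a render by its inputs (the weather_for() helper here is hypothetical, standing in for the slow network call):

```r
library(shiny)

# Hypothetical helper: imagine this makes the slow HTTP request.
weather_for <- function(city) {
  Sys.sleep(1)  # stand-in for real network latency
  paste("Weather for", city)
}

server <- function(input, output, session) {
  output$weather <- renderText({
    weather_for(input$city)
  }) |>
    # Same city -> cached answer; on repeat requests the slow
    # span effectively disappears from the flame graph.
    bindCache(input$city)
}
```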
And then repeat. When you make some of these changes, you may have to go back up to step two, look at profvis, and re-evaluate your app, because sometimes just by enabling caching, spans will disappear and your computation will be very different in shape compared to the first time.
Next steps and active development
And some next steps. This is all in active development. I was very excited to be able to do a live demo there, because I was on something like five dev branches at once. A little terrifying, but it worked great. When we get back from conf, we'll be trying to tighten all of those up and finish them. One thing I want to add is cleaner error detection and reporting. OpenTelemetry does a nice job of saying: here's an error, here's the error message, possibly even the file locations of where that occurred within the app. We're trying to walk the line of: are we giving away company secrets to your own OpenTelemetry? But it's not going to the user, it's going to you, so we'll see.
This would also involve setting the status of the span: saying, oh, we had an error, I knew this was bad, let's set it to an error, versus leaving it unset or okay. I'd also like to have native Posit Connect integration. There's been a lot of work on having information about which user is visiting your application, so I want to sneak some of that header information into the session start to say, oh, Barret started this Shiny session, even though the app is owned by Daniel.
And then finally, Shiny for Python. Similar to most things in the Python world, we won't have to reinvent the wheel ourselves. There's already wonderful tools out there, but doing a lot of learning here on the R side first.
Q&A
All right, we have a few questions. Does it, I'm assuming meaning OpenTelemetry, come with any performance penalties for the application? It's pretty similar. A lot of the functions that we've been integrating are in the microsecond world, like tens of microseconds, so most likely the app author's logic is going to be slower than that. I think we're okay.
How is OpenTelemetry different from writing my own logs? Is it just another logging framework? It is just a logging framework, absolutely. It should be treated like that: you have log, debug, trace, fatal, error, all the different levels, just like a logging framework. I think the interesting part here is that OTel is defined to have a collector, so you can have multiple Shiny applications all pointing to the same collector. The collector will batch and then send upstream to the third-party website, and that process is much better than logs. Logs are wonderful if you own the server, you only have one server, and you're writing to a single location; sure, just use logs. But when you start having multiple machines that may or may not exist in five minutes, having OpenTelemetry collect to a third-party site makes it that much easier.
Alright, cool. Last question: does it also expose cat, print, message? Oh, no, sorry, does it override the cat function, the message function, print, str, any of those? It does not, because unlike spans, logging at the different levels is something you must opt in to as a developer. So you'll have to change your code instead to say otel log info or otel log error, instead of saying message or cat.
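That opt-in would look something like swapping base messaging for the otel logging helpers; the function names below are assumed from how the speaker says them ("otel log info", "otel log error"), not verified against the package:

```r
# Before: visible only on the console, invisible to your backend
# message("fetching forecast")

# After (assumed names, per the talk): emitted as OTel log records
otel::log_info("fetching forecast")
otel::log_error("geocoding request failed")
```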
Alright, and I think in your last slide you were talking about it not being there for Shiny for Python yet, but is there a rough ETA for when that might be up? No. Alright, yep, and that's it. That's all our questions, thank you.

