Posit Meetup | Hlynur Hallgrímsson, City of Reykjavík | R in Public Sector

Transcript

This transcript was generated automatically and may contain errors.

Yeah, as Rachel said, I'm Hlynur Hallgrímsson. I'm a senior data scientist in the Office of Data Services, which is the centralized data science team here at the City of Reykjavík. I used to be an economist, and I'm still a chartered economist, but I've been working as a data scientist since 2016, and as a data analyst before that. I joined the City of Reykjavík data science team in March 2020, which was a short welcome, because in the second week of my employment we all got sent home due to COVID. Since then it's been back and forth between working remotely and working in the office.

The data science team is new. It was created at the end of 2019, and we're a small team: there are only six of us. Sometimes we build data products from start to finish; other times we're brought in to help with projects in other departments within the City of Reykjavík. Either way, our clients are the other departments within the city.

Swimming pools in Reykjavík

So I'm going to talk to you about swimming pools and a fun data science project that has to do with our swimming pools. Reykjavík accounts for a large share of Iceland's population, about 36%: the population of Iceland is 375,000 people, and 135,000 of them live in Reykjavík. If you count the whole capital region, including the suburbs, which are separate municipalities, it's about 64% of all Icelanders. And we're pretty crazy about our swimming pools.

This talk is about a project that tries to show people how crowded each swimming pool is. But it's also about what's hard in data science. It's not that training the most perfect machine learning model is hard; it's the project logistics and what we've come to call group intercommunication. That's difficult sometimes.

We have geothermal resources in the form of hot water: we pump the water out of the ground at 80 degrees Celsius. So hot water in Iceland is incredibly cheap, which is one of the reasons we have these pools. They're not some super fancy thing; everybody just goes to the swimming pool. Also, throughout the 20th century, on average one in five people in Iceland worked in fishing. So a big part of our coming-of-age story as a people is mandatory swimming classes from first through tenth grade. I noted that it's roughly 400 hours of mandatory swimming, which is, of course, the worst version of everything: the mandatory version. It's also cold and dark here, so we like to get what little sunlight we can in the hot tubs. And we also go there to gossip, talk politics and just socialize.

The project: showing pool crowdedness

The idea behind this project is to be able to tell people which swimming pools are crowded and which ones are not. Since the swimming pools are really popular, people have been contacting us and asking whether there's a website where they can see how crowded the pools are. Welfare workers have also expressed interest on behalf of clients who are susceptible to sensory overload, so that will also be a nice feature once we implement this. But a big push was that a neighboring town (it's all about competition between municipalities) now offers a website dashboard showing how crowded its two swimming pools are. In their case, as far as I'm aware, staff count the patrons manually. Then, during COVID, swimming pool capacity was cut to 50% of normal for certain periods. When that became clear, we realized this was something we had to do: it went from being a nice-to-have to something we really wanted to work on.

This request comes from the Department of Sports and Leisure within the City of Reykjavík. This is their pitch, paraphrased rather than quoted directly: "We've updated the gates, or turnstiles, at our swimming pools to modern electronic turnstiles. With these modern gates, we can now get information on how many people are at each swimming pool through an API, an application programming interface." And their question is: can you help us present this data, showing how crowded each pool is?

At that point, honestly, we figured it was a simple Shiny app and just a question of how to present the data, provided we could access the data telling us the current number of visitors at each pool. As I said, this project is spearheaded by the Department of Sports and Leisure, and they came to the centralized data science team for help. Getting the data, of course, is a bit more complex, because the Department of Sports and Leisure has to deal with the Icelandic partner of the gate manufacturer, and that partner is our point of contact with the actual vendor.

And this was my train of thought: still, it should be pretty straightforward once we get the data. The clown face on the slide is there for emphasis.

So at the start of the project, I envisioned this as a rather simple project. You read the data from the API using the httr package, do some data cleaning with the tidyverse, visualize with ggplot2, make that visualization interactive, probably with ggiraph, then turn it into a reactive Shiny app and deploy it to either shinyapps.io or RStudio Connect. This, of course, was a tragic miscalculation on my part, because I was assuming that the data might just take some time to arrive because of all this communication. It's hard to make another team, in a company twice removed from you, care about your product as much as you do. They are never going to have the internal fire for this project that we have on the data science team.
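
The first two steps of that plan (read from the API, then clean) can be sketched as follows. This is written in Python for illustration only; the team's pipeline used R with httr and the tidyverse. The endpoint URL, the JSON field names (`pool`, `timestamp`, `visitors`) and the function names are all hypothetical, since the vendor's API is not described in the talk.

```python
import json
import urllib.request

# Hypothetical endpoint: the vendor's real API is not described in the talk.
API_URL = "https://example.invalid/api/pools/current"


def fetch_payload(url: str = API_URL, timeout: float = 10.0) -> str:
    """Download the raw JSON text from the (hypothetical) gate API."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def latest_counts(payload: str) -> dict[str, int]:
    """Return the most recent visitor count for each pool.

    Assumes the payload is a JSON list of records shaped like
    {"pool": "...", "timestamp": "2021-05-01T10:05:00", "visitors": 55}.
    """
    latest: dict[str, tuple[str, int]] = {}
    for record in json.loads(payload):
        pool = record["pool"].strip()       # light cleaning of the pool name
        stamp = record["timestamp"]
        visitors = int(record["visitors"])  # tolerate numbers sent as strings
        # ISO-8601 strings with the same format and time zone sort
        # chronologically, so plain string comparison finds the newest record.
        if pool not in latest or stamp > latest[pool][0]:
            latest[pool] = (stamp, visitors)
    return {pool: count for pool, (_, count) in latest.items()}
```

In an app, something like `latest_counts(fetch_payload())` would run on a schedule, and its output would feed the plotting and Shiny layers, which are R-specific and not sketched here.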

Damn you kindness, you ruined my data.

So, the things we're trying. The first thing, of course, was to ask the vendor to change the API so that the in data is separated from the out data for Laugardalslaugar. We're still waiting on this. In the meantime, because we really want to get this thing out after working on it for so long, one idea was to use a secondary counter that is located by the actual pools in Laugardalslaugar, not by the entrance. This has proven unfruitful because it's just a traffic counter: it doesn't record which direction people are going, so we can't tell how many people are going into the pool and how many are coming out. And then, pardon my English, or rather, pardon my French for what I'm about to say, the next thing feels like something you'd read on the Twitter account Internet of Shit, which roasts so-called smart Internet of Things solutions. There are smart lockers in Laugardalslaugar, but the Internet of Shit part is that it's unclear whether their firmware can be updated to the most recent version to support live data. So for the time being we only have historical data for the smart lockers, which I believe gets read into a database every night when the pools are closed.
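
Why a non-directional counter doesn't help comes down to simple arithmetic: occupancy is cumulative entries minus cumulative exits, and a counter that only reports total passages cannot separate the two. A minimal Python sketch, assuming counting starts when the pool opens empty; the function names and numbers are illustrative, not from the talk.

```python
def occupancy(entries: int, exits: int) -> int:
    """People inside now: cumulative entries minus cumulative exits.

    Assumes counting starts at opening time, when the pool is empty.
    """
    if entries < 0 or exits < 0 or exits > entries:
        raise ValueError("need 0 <= exits <= entries")
    return entries - exits


def possible_occupancies(passages: int) -> list[int]:
    """Every occupancy consistent with an undirected passage count.

    A non-directional traffic counter only sees entries + exits, so any
    split with exits <= entries fits the same reading.
    """
    return [occupancy(passages - exits, exits) for exits in range(passages // 2 + 1)]


# Two hypothetical afternoons with the same traffic-counter reading (400):
print(occupancy(300, 100))  # prints 200
print(occupancy(200, 200))  # prints 0
```

The same counter reading is compatible with anything from an empty pool to a full one, which is why the request to the vendor was to separate in data from out data.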

So, essentially, we tried a couple of things that didn't work, and we're waiting on the vendor to make changes to the API. I'm sure it's going to end up as "let our lawyers talk to your lawyers about this fee". I'm jaded, terribly sorry. Also, before I end: this was a gross oversimplification, where we're only talking to the Icelandic partner of the vendor. There are also gyms that have access to the pools. You can go to a gym located next to a pool, and after you've done your workout you can walk through a tunnel into the pool area, and those gyms have separate counters into the pool. We have yet to figure out how to count those people, because we deal with the Department of Sports and Leisure, the Department of Sports and Leisure deals with the owners of the gym, the owners of the gym talk to another Icelandic partner of a vendor, and that partner talks to an international vendor. So it's a more complex pipeline of information.

So, still to do: we need to account for gym patrons who use the swimming pools at Laugardalslaugar and Breiðholtslaugar. And the final point of this presentation is, of course, that there's going to be something else, but we don't know what it is yet. I'd love to take questions. We have 20 minutes, and I'm sure there are some things I did not explain properly, so I'm all ears.

Thank you so much, Hlynur. That was awesome, and I'm just thinking how cool it would be to be able to go to a pool right now.

Q&A

There are a lot of great questions that came in on Slido. Just so everyone knows, you can ask questions there, and Josiah can help me share the link again. If you want to put your name in, I can also call on you to add additional context, but I'll start with some of the anonymous ones. Quite a few people commented that they really liked the first visualization you shared, and one asked: could you kindly share the R syntax of that first visualization, the timeline tracking your emails, with the day you requested the data and the day you received it?

Okay, yeah. This presentation is an R Markdown presentation, and the timeline is essentially three ggplot graphs. I can share this; I should have added my GitHub account. Oh, I'm terribly sorry, I'm not sharing the screen, I'm sharing the Chrome browser. I can see the presentation screen, okay.

Okay, I'm going to share the secondary screen. Where's the share screen button? Okay, hopefully you can see it. Oh, now I can see the code, okay. So this is just the data, and this is the first ggplot. It's just a mess; I didn't think I'd be sharing this code, but I'd be happy to, in the spirit of data realism. So, yeah, I'll put that on my GitHub, and Rachel, you can share that.

Perfect. Awesome. Thank you. Someone else asked: is there a live link to the swimming pool dashboard currently?

Not one that's accessible outside the City of Reykjavík team, because of our problems with Laugardalslaugar. Currently there are two apps. One app is for the managers of the swimming pools, and the other one is the in-progress public version. I make the plain-looking Shiny apps, but then we have Thor, a great data scientist at the City of Reykjavík, who has an artistic eye and makes all the pretty production Shiny apps. So that one will hopefully be released soon, but at the moment, no, it's not accessible.

Another one: did this project initiate any improvements in user tracking, or is everybody just content with the way you patched the problem, so things will stay the same? I think you maybe mentioned at the end that the API might change.

Yeah, but another thing is that there have been discussions between Sports and Leisure and the managers of the actual swimming pools about changing this. It's perfectly acceptable for 40 school children not to be counted out, but if there are two full-grown adults waiting in the queue, they're essentially saying: we'd appreciate it if you stopped being quite so kind and let these people be counted out. We haven't seen that in the data yet, but once we've fixed the problem, once we get the gym data, we can look at what happens when the staff are, let's just say, less kind, and actually count people out.
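
When exits go uncounted, as when a school group is let out without being counted, entries minus exits keeps those people "inside" for the rest of the day. One possible workaround, sketched below in Python, is to ignore exits entirely and estimate occupancy from entry times plus an assumed visit length. The fixed 60-minute stay and all names are hypothetical simplifications, not the team's actual model; a realistic version would more likely use a distribution of visit durations.

```python
from datetime import datetime, timedelta

# Hypothetical typical visit length; the talk gives no figure for this.
ASSUMED_VISIT = timedelta(minutes=60)


def estimate_occupancy(entry_times: list[datetime], at: datetime,
                       visit_length: timedelta = ASSUMED_VISIT) -> int:
    """Estimate how many people are inside at time `at` from entries alone.

    Every visitor is assumed to stay exactly `visit_length`, so a visitor
    counts as present if they entered within that window before `at`.
    Exit counts are never used, so uncounted exits cannot inflate the estimate.
    """
    window_start = at - visit_length
    return sum(1 for t in entry_times if window_start < t <= at)


# Hypothetical day: a class of 40 enters at 09:00 and later leaves uncounted;
# five more guests enter at 10:30.
day = datetime(2021, 5, 3)
entries = [day.replace(hour=9)] * 40 + [day.replace(hour=10, minute=30)] * 5
print(estimate_occupancy(entries, day.replace(hour=9, minute=30)))   # prints 40
print(estimate_occupancy(entries, day.replace(hour=10, minute=45)))  # prints 5
```

The trade-off is that the estimate is only as good as the assumed duration, but it no longer depends on staff counting people out.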

Another upvoted question: could you compare your use of Plotly and ggiraph? Did you use both, or do you prefer one over the other?

I started out using Plotly because I've been using it for a long time, but then Sol, the artistic data scientist I mentioned earlier, pointed me in the direction of ggiraph, and I much prefer that now. You just create a ggplot, wrap it in ggiraph's girafe() function, and it instantly becomes an interactive visualization, provided you change the geom to its interactive counterpart, such as geom_point_interactive(). I like the ggplot flow of things, and this is a perfect extension of that.

I see Gregory, or Gregor, you have a question.

Yes. Can you hear me well?

Yes, I can.

Okay. Thank you, Hlynur, for the great presentation. I'm an economist too, so it's nice to know I'm not the only one interested in data science. I wanted to know how interested the city, or the public entity, is in everything you can provide as a data analyst or economist. Are there many projects they're willing to put you on, or is it still growing, as you mentioned, because it's new to them?

Awesome question, thank you. It's actually skyrocketing. As I said, the data team is relatively new; it was set up in 2019. At the beginning we didn't have fixed protocols for how we did these projects, but now, with a little experience, we've been moved into the IT department's framework for how we accept projects and how we allocate time to them. Since then it's been made clear to the departments within the City of Reykjavík: here's a team of data scientists, and if you have an interesting use case or something urgent you want them to do, here's how to go about getting them in. People are super interested in doing that. It has to do with the fact that you do one cool thing for, say, the Department of Schools, and after that they're on board. They know the potential in principle, of course, but when you hand something off to them, they see that it's now available, that it's real. So, yeah, there's huge interest in all things data right now.

Jaiwan, I see you asked a question on Slido and put your name there. Can I pass the mic over to you to add some context?

Yeah, sure. I just wanted to know how teamwork and delegation work within your team. Do you face any limitations because you're in the public sector?

Yeah, great question. As I said, there are only six of us. We have our chief data officer, Inka, who heads the operation. We have a data engineer, and we have me, a data scientist; my skill set is mostly geared towards predictive modeling, but also analytics and reports, whatever needs to be done. Then we have Sorbet, who is a super machine-learning-engineer type of data scientist. He's very competent at that, but he's also our go-to guy for finalizing Shiny apps and doing the more complex data visualizations, map stuff and things like that. Then we have Siri, who is also a data scientist and is currently doing more of the reporting work as our junior data scientist. And then we have Grimur, who is also a data engineer. So essentially we have three data scientists, two data engineers and a chief data officer.

Saif, I see you have your hand raised as well. Can I pass it over to you?

Thank you, Rachel, for giving me the chance, and my apologies if I missed this right at the beginning. It's quite an interesting project, but what I'm curious about is: what problem are you trying to solve with this project, and what was the business case for it, if there is a strong business case?

Yeah. Did I show this slide? The first thing that sparked this idea was citizens trying to figure out which pools were crowded and calling the city offices: is there a way to tell if Laugardalslaugar is super crowded right now? That interest started the conversation. Once the idea was sparked, I also talked to welfare workers within the City of Reykjavík. If they have clients who are susceptible to sensory overload, they usually go to really remote swimming pools on the edge of the suburbs (not one of the pools in question here) just to make sure they're at a non-crowded pool. Then COVID came, and that's when this became an actual project. We don't have the 50% capacity limit now; it was only in place for two one-month periods. But during those periods there were actual lines outside the swimming pools, and the idea was that if we could get this up and running fast enough, we could show people: this pool is currently too crowded for you to go there, but here's a pool that is not crowded. Sadly, or not sadly, since it's great that we don't have these limitations anymore, we didn't manage to do it within that two-month window.
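
The COVID use case boils down to comparing each pool's current count with its allowed capacity and pointing people to pools that have room. A minimal Python sketch follows; the pool names, capacities and visitor counts are hypothetical, and `capacity_factor=0.5` simply mirrors the 50% capacity limit mentioned in the talk.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    name: str
    capacity: int   # normal maximum number of guests (hypothetical values)
    visitors: int   # current count from the gates


def load(pool: Pool, capacity_factor: float = 1.0) -> float:
    """Share of the currently allowed capacity that is in use."""
    allowed = pool.capacity * capacity_factor
    if allowed <= 0:
        raise ValueError(f"{pool.name}: allowed capacity must be positive")
    return pool.visitors / allowed


def suggest_pools(pools: list[Pool], capacity_factor: float = 1.0,
                  threshold: float = 1.0) -> list[str]:
    """Pools below `threshold` of allowed capacity, least crowded first."""
    with_room = [p for p in pools if load(p, capacity_factor) < threshold]
    with_room.sort(key=lambda p: load(p, capacity_factor))
    return [p.name for p in with_room]


pools = [Pool("Pool A", 800, 420), Pool("Pool B", 400, 100), Pool("Pool C", 300, 140)]
print(suggest_pools(pools, capacity_factor=0.5))  # prints ['Pool B', 'Pool C']
```

Under the halved limit, Pool A is over its allowed 400 guests and drops out; at normal capacity (`capacity_factor=1.0`) all three pools would be suggested.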

That explains a lot, actually. Thank you. I think I've seen something similar with people trying to check how crowded supermarkets are. Going forward, if you could link it with the booking system for the swimming pools, you might end up with a really nice business case for it. So, no, it's a great project. Thank you.

Thank you so much. That's more of a five-year plan, but...

Well, you already have everything, you know. You could possibly do it in a couple of months.

Yeah, yeah. The booking system is connected to the... If we were to team up with IT and join forces, then within this app you could not only see which pools are crowded and which are not, you could actually buy a ticket.

Oli has his hand raised as well, and then I'll go over to some of the other anonymous questions. Hey, Oli.

Hi, Hlynur, and hi again, everyone. I'd like to add that the City of Reykjavík is on a digital transformation journey, so the requirements are coming straight from the business. We have digital leaders scattered around the organization who know their stuff, and they come to the data team and the Department of Innovation and Services with projects they have prioritized based on their operational needs.

But I'm here for a question now, Hlynur, and we're really close friends. Since we're Icelandic, everybody knows everybody here. There are, like, five of us in Iceland.

I should add that Oli is our former Chief Data Officer, who created this team of inventors, as I saw someone call it in the chat.

Yes, and you certainly are. I want to have a discussion that I think is super interesting for the people joining this call. We're building this model for predicting how many people are visiting the pools. What level of accuracy is needed for that to be valuable for the business, the pool business? Could you enlighten us on that? We've had this discussion so often, but I feel it's super important for data science projects. Is it valuable to predict whether there are 50 visitors or 51? Is there a difference? Does it change anything for me? Or is the value in predicting whether there are 50 or 500? I'm actually amazed we're talking about prediction accuracy in terms of value. Take it away.

Thank you, Oli. We evaluated these models on training data, but we don't have test data, because we're not going to ask the staff to count people every five minutes to create a test data set for us. Instead, we left it up to the managers and staff of the swimming pools to look at the app; they were essentially the gauge of whether it was accurate enough for them. In one case they did daily spot checks: okay, it's 2 p.m., walk around the swimming pool and count how many people there are, and then compare that to the app the data science team has given us. What they found was that the app was well within their margin of reasonable accuracy because, as Oli alluded to, it doesn't matter whether there are 70 or 75 people at the swimming pool, but it does matter whether it's 75 or 200. In one example, the spot check counted 72 and the model showed 76, and they were completely fine with that.
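
The staff's acceptance test can be phrased as a tolerance check on manual spot counts. The 10% relative tolerance and the function names below are hypothetical choices for illustration, sketched in Python; the talk only reports that a spot count of 72 against an estimate of 76 was fine, while confusing 75 with 200 people would not be.

```python
def within_tolerance(spot_count: int, estimate: int, rel_tol: float = 0.10) -> bool:
    """True if the app's estimate is within `rel_tol` of a manual spot count.

    max(..., 1) avoids dividing by zero when the spot count is 0.
    """
    return abs(estimate - spot_count) / max(spot_count, 1) <= rel_tol


def acceptance_rate(checks: list[tuple[int, int]], rel_tol: float = 0.10) -> float:
    """Share of (spot_count, estimate) pairs that pass the tolerance check."""
    if not checks:
        raise ValueError("no spot checks given")
    passed = sum(within_tolerance(spot, est, rel_tol) for spot, est in checks)
    return passed / len(checks)


print(within_tolerance(72, 76))   # prints True (4 off, about 5.6%)
print(within_tolerance(75, 200))  # prints False (125 off)
```

A relative tolerance matches the argument in the Q&A: 50 versus 51 passes, 50 versus 500 does not. Where exactly to draw the line is a question for the stakeholders, in keeping with how the team evaluated the model.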

And that's what made it fun. As Oli knows, at my last place of employment I was always building increasingly complex machine learning algorithms just to get, you know, 0.07% accuracy increases, and that's not the fun stuff. The fun stuff is doing a complete project and saying: okay, here's our estimate, and we're not trying to squeeze blood out of a stone. I'm not sure if that's even an English expression; it is in Icelandic. So, yeah, it's fun, that's all I'm going to say: working on a project where it doesn't matter whether it's 72 or 76, because that's well within the margin.

It's all about bringing the correct level of accuracy for the value to the business. Like you mentioned in the talk, and I really like that, we evaluate our models through discussions with our stakeholders. I thought that was cool. And it was an awesome presentation, Hlynur. Thank you.
