Posit Meetup | Hlynur Hallgrímsson, City of Reykjavík | R in Public Sector
R in Public Sector Presentation by Hlynur Hallgrímsson # The data you were promised… and the data that you got ## A story about Reykjavík’s Thermal Pools Abstract: When a team within your organization comes to you with a way to launch a highly sought-after data product as a quick win, how should you respond? With open arms of course, but also, as it turns out, with a healthy dose of skepticism. Hlynur Hallgrímsson, a senior data scientist with the City of Reykjavík, Iceland, talks about putting predictive modeling in production when your data doesn’t tell you the whole story. Using the real-world example of Reykjavík’s public thermal pools, Hlynur goes through the process from idea through to implementation of a “simple” app that tells you which public pools are currently the most crowded. Speaker Bio: Hlynur Hallgrímsson is a senior data scientist within the City of Reykjavík’s Office of Data Services, the city’s centralized data science team. A few helpful links: The slides are online here: https://hlynurhallgrims.github.io/the_data_you_were_promised/#1 The github repo is at: https://github.com/hlynurhallgrims/the_data_you_were_promised Calendar of upcoming events: https://www.addevent.com/calendar/wT379734 Speaker submission form: https://forms.gle/gxHXgHcfZUKhiHZU8 Anonymous feedback form: https://forms.gle/bLRjnfqUYmkLaDc46 R for Data Science Online Learning Community Slack: r4ds.io/join (channel #chat-government)
image: thumbnail.jpg
Transcript
This transcript was generated automatically and may contain errors.
Yeah, like Rachel said, I'm Hlynur Hallgrímsson. I'm a senior data scientist within the Office of Data Services, which is the centralized data science team here in the City of Reykjavík. I used to be an economist, and I'm still a chartered economist, but I've been working as a data scientist since 2016 and as a data analyst before that. I joined the City of Reykjavík data science team in March of 2020, which was a short welcome because in the second week of my employment we all got sent home due to COVID. So it's been a back and forth between remote work and being in the office since then.
But the data science team is new. It was created in 2019, the end of 2019, and we're a small team. So there's only six of us and although sometimes we do like data products from start to finish, other times we are essentially brought in to help with projects in other departments within the City of Reykjavík. But essentially our clients are other departments within the city.
Swimming pools in Reykjavík
So I'm going to talk to you about swimming pools and a fun data science project that has to do with our swimming pools. So Reykjavík accounts for a lot of, a large share rather, of Iceland's population or about 36%. So the population of Iceland is 375,000 people, but in Reykjavík there are 135,000. And if you count the whole capital region, like the suburbs, which are other municipalities, they are actually like 64% of all Icelanders, like the capital region. And we're pretty crazy about our swimming pools.
So this is going to be about a project that is trying to show people how crowded each swimming pool is. But it's also just about how things are hard in data science. And it's not that, you know, training the most perfect machine learning model is hard. It's the project logistics and what we've come to call group intercommunication. That's, you know, that's difficult sometimes.
But so we have geothermal resources in the form of hot water. We pump the water from the ground at 80 degrees Celsius, so the hot water in Iceland is incredibly cheap, which is one of the reasons why we have these pools. They're actually, you know, they're not like this super fancy thing. It's just that everybody goes to a swimming pool. And also, throughout the 20th century, one in five people in Iceland was on average working in fishing. So a big part of, you know, just our coming-of-age story as people is going to mandatory swimming classes from first through 10th grade. And, yeah, I noted that it's roughly 400 hours of mandatory swimming, which, of course, is like the worst version of everything. It's the mandatory version. And also, it's just cold and dark. So we like to get what little sunlight we can in the hot tubs. And we also go to gossip and talk politics and just socialize.
The project: showing pool crowdedness
So the idea behind this project is being able to convey to people which swimming pools are crowded and which ones are not. Since the swimming pools are really popular, people have been contacting us and asking if there's any way to see, like, is there a website where I can see how crowded these are? And the welfare workers have also expressed that interest for clients that are susceptible to sensory overload. So that will also be a cool feature once we implement this. But a big thing was that a neighboring town, and it's all about municipality competition, now offers a website dashboard which shows you how crowded their two swimming pools are. In their case, they have staff manually count the patrons, as far as I'm aware. But also, during COVID, swimming pool availability was cut to 50% of normal capacity for certain periods. So when that became clear, we realized, OK, this is something that we have to do. It went from being a nice-to-have to something that we wanted to work on.
So this is something that comes from the Department of Sports and Leisure within the city of Reykjavik. And this is their pitch, but it's not a direct quote. So they say, OK, so we've updated the gates or the turnstiles at our swimming pools to modern electronic turnstiles. With these modern gates, we can now get information on how many people are at each swimming pool through an API, an application programming interface. And their question is, can you help us present this data showing how crowded each pool is?
So this is essentially, at that point, this is just like, honestly, just like we figured it's a simple Shiny app and just a question of database. How are we going to present this? If like if you can just access the data that tells us the number of current visitors for each pool. So like I said, this is a project that is spearheaded by the Department of Sports and Leisure. They come to the centralized data science team to ask for our help. But with regards to getting the data, this is, of course, a little bit more complex because the Department of Sports and Leisure has to deal with the Icelandic partner to the manufacturer of the gates. And they are our contact partner for the actual vendor, the gate manufacturer.
And this is my train of thought: but still, it should be pretty straightforward once we get the data, right? And the clown face there is put in for emphasis.
So at the start of the project, I envision this as a rather simple project. You read the data from the API using the httr package. Of course, we're going to be doing some data cleaning using the tidyverse and purrr. Then we'll visualize with ggplot, make that visualization interactive with ggiraph probably, and then make that into a reactive Shiny app and deploy it to either shinyapps.io or RStudio Connect. And this, of course, is a tragic miscalculation on my part, because I'm assuming that, OK, the data might take some time because of this communication. It's hard to make another team in a company twice removed from you care about your product as much as you do, essentially. So they are not going to have the internal fire that we have for this project at the data science team.
Project challenges and timeline
So with the five-item list, I'm assuming a lot of things about the data and the API. And just before we get into the nitty-gritty of it: these projects can be time-consuming because there are difficult things or complex things, but also the simple things aren't necessarily simple, because different groups have different priorities. We had other priorities while working on this. The biggest was the COVID dashboard that we needed to put up and maintain at the same time that we were starting work on this. And then there's group intercommunication. "Forced communication protocols" is maybe a bad choice of words, but essentially we need to talk to the Icelandic partner and they talk to the vendor, because the Icelandic partner is the company we buy the solution from. And then there's lawyer stuff. When lawyers between maybe the City of Reykjavík and the vendor start talking, you know, things just slow down.
So this is a timeline, and it's absolutely horrible if you look at it. The first idea, like, we should maybe do this thing, is in February of 2020. And there's this request for data sent out: hey, can we access this API? And then there are lots of emails back and forth. And here I show you the emails, so maybe just 20 emails from one thread of emails that I had up, but it's not accounting for meetings and stuff like that. But then you take into account that between March and June, we did not have our eyes on this particular ball, because we were focusing on putting up the COVID dashboard and maintaining that and keeping that correct. But during the summer, we start to push for more: what about this data? You said this, can you maybe look into that? And in September, we come to the conclusion, OK, you can get this data and here's how it would happen. But according to the vendor's lawyers, you'd have to pay a fee to access this API. And then the lawyers from Sports and Leisure say, no, no, no, according to our interpretation of this contract, we don't actually have to pay this fee you're talking about. So that gets in the way. But after Christmas, things start going and we get the data.
What the data actually showed
But then we actually get the data. This is Friday, last Friday. And if you look at this chart, you can see that for five of the six swimming pools in question, these are strictly increasing. These aren't showing us how many visitors are at each swimming pool. And it comes down to the fact that for five of these six pools, the hardware isn't able to count out. It only counts in. For the sixth swimming pool, which you see there in the bottom left corner, it also looks kind of weird, and we'll get into that in a bit, but you can see that it's not strictly increasing. So it turns out, like I said, the gate hardware for these five pools can only count visitors into the pool. So "current visitors" isn't current visitors. It's the cumulative sum of total visitors up to that point during the day, that is, at the point in time where we sent the GET request to the API.
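To make the cumulative-count point concrete, here is a minimal sketch of turning those readings into arrivals per polling interval. It is written in Python for illustration only (the pipeline in the talk is built in R), and the function name, the tuple layout, the example dates, and the handling of a counter reset are all assumptions rather than details from the talk.

```python
from datetime import datetime


def arrivals_per_interval(readings):
    """Convert cumulative entry counts into arrivals per polling interval.

    `readings` is a list of (timestamp, cumulative_count) tuples sorted by
    time. Returns one (timestamp, arrivals) tuple per reading after the first.
    """
    arrivals = []
    for (_, prev_count), (ts, count) in zip(readings, readings[1:]):
        # Assumption: a drop in the cumulative count means the counter was
        # reset (for example overnight), so the new count is used as-is.
        delta = count - prev_count if count >= prev_count else count
        arrivals.append((ts, delta))
    return arrivals


# Made-up readings taken five minutes apart for one pool.
readings = [
    (datetime(2021, 9, 3, 6, 30), 0),
    (datetime(2021, 9, 3, 6, 35), 20),
    (datetime(2021, 9, 3, 6, 40), 26),
    (datetime(2021, 9, 3, 6, 45), 26),
]

for ts, n in arrivals_per_interval(readings):
    print(ts.strftime("%H:%M"), n)
```

Differencing only recovers how many people entered in each interval; it says nothing about when anyone left, which is the gap the rest of the talk deals with.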
There's more. If you look at Laugardalslaugar, this looks rather normal. It passes the first eyeball test. Like, okay, this looks like actual data. But it's actually super weird if you look at the end of the day. It's not that there are no people leaving the swimming pool, but at the end of the day, when the pool closes, and half an hour after the pool closes, there are still around 175 people in the swimming pool according to the counter. And we'll get to that a bit later. But essentially, from this API, we don't know when people leave. We know when people come into the swimming pool, provided we are querying the API.
Modeling visitor duration
So we ask ourselves, what if we model the duration of each visitor's stay? Or rather, the stay of all the visitors in each five-minute interval, because we're running the GET request to the API every five minutes. That way, if we put five-minute intervals between the GET requests, we can say, okay, between 6:30 and 6:35, there were 20 people that entered our Árbæjarlaug pool. If we just know how long on average these people would stay in the pool before leaving, we could essentially put together a data set that says, okay, here's an approximation of how many people are at Árbæjarlaug at the moment. The big idea is that we request historical data through the Icelandic vendor. And Laugardalslaugar, like I said, but maybe it wasn't clear, so I'll reiterate, has in and out data. Well, it's not perfect. It's just that the other five swimming pools only have in data. And there are two assumptions: first, that people's stays are not inherently different in duration between Laugardalslaugar and the other pools, and second, that people's stays are not inherently different in duration between people who are counted out and people who are not counted out of Laugardalslaugar, which we figured out is essentially the problem. So the real kicker is that some people are counted out of Laugardalslaugar, but not all. But if these assumptions hold and we get the historical data, we figure we can train a model on historical Laugardalslaugar stay durations of the people that are actually counted out, and then predict durations onto the live API data for the other five pools.
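The idea of combining interval arrivals with a predicted stay duration can be sketched as follows. This is a Python illustration under stated assumptions, not the team's R code: `estimate_occupancy`, `flat_stay`, the 60-minute placeholder duration, and the example dates are all hypothetical, and the real duration model is the linear model described later in the talk.

```python
from datetime import datetime, timedelta


def estimate_occupancy(arrivals, predicted_stay_minutes, now):
    """Estimate how many visitors are still in the pool at time `now`.

    `arrivals` is a list of (entered_at, n_people) tuples, one per polling
    interval. `predicted_stay_minutes` is any callable that maps an entry
    time to a predicted stay in minutes (a stand-in for a fitted model).
    """
    total = 0
    for entered_at, n_people in arrivals:
        stay = timedelta(minutes=predicted_stay_minutes(entered_at))
        # A group counts as present from its entry time until its
        # predicted departure time.
        if entered_at <= now < entered_at + stay:
            total += n_people
    return total


def flat_stay(entered_at):
    # Hypothetical placeholder: everyone stays 60 minutes.
    return 60.0


# Made-up arrivals for one pool on one morning.
arrivals = [
    (datetime(2021, 9, 3, 6, 30), 20),
    (datetime(2021, 9, 3, 6, 35), 6),
    (datetime(2021, 9, 3, 7, 20), 10),
]

print(estimate_occupancy(arrivals, flat_stay, datetime(2021, 9, 3, 7, 0)))
```

Treating every visitor in an interval as leaving at the same predicted time is a simplification; it matches the talk's description of modeling the average stay per five-minute interval rather than per person.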
And that's what we did. And this is essentially the sketch of how we did that. Within RStudio Connect, we have an R Markdown document that is scheduled to run every five minutes. It gets the data from the live API, and it also gets data from a pin. We use the pins package. It reads in the in data that has essentially been read before. Okay. This is a terrible way to explain it. Sorry. Okay. So, essentially, every five minutes, we read the live data. But also, every five minutes, we read in the result of every earlier API call. That's not the historical data that we were hoping to get from the vendor; it's what we've saved to a pin on the RStudio Connect board. Then we join those together and write the result back over the pin. We then create a Shiny app, which reads that in data. Let's say for today, if the Shiny app were to look at this in data, it would have an account of all the point-in-time counts for every pool. The Shiny app then takes another pinned object, which is a linear model, and predicts when these people would leave. So you can present within the Shiny app an estimate of how many people are there.
And just to go into what we essentially did: we started just on a local machine. And once we got access to historical data from the vendor, which they save on Google Cloud, we read that data from BigQuery using the bigrquery package. But now we use the RStudio Professional Drivers and the DBI package. We fit a linear model for predicting the duration. And the predictors are: is it a weekend? How long since the summer solstice, or how long until the summer solstice? And then we have a natural spline term of hours. We use seven knots for the spline term. We toyed around with it, but essentially we based the number of knots on just the staff knowledge. So we're saying the duration is influenced by, of course, what time of day it is. And we've put the knots for the spline at these points in time because the staff say, okay, it's different between 6:30 and 9. So put a knot at 6:30 and another knot at 9. And then they say, but also during lunch hours, people stay a much shorter time, because these are people that have maybe 45 minutes or an hour for lunch, if they're very lucky. And they run to the swimming pool to, you know, get a swim in. And these people stay for a short time. So it's appropriate to put a knot around lunchtime, and so on. So this is not something where we had some fitting where it was, you know, decided by an algorithm. We just talked to people, to make sure it made sense. And then we evaluated different combinations of knots on the training set.
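The non-spline predictors named above can be sketched as a small feature function. This is a Python illustration only (the model in the talk is fit in R): the function name, the choice of June 21 of the same calendar year as the solstice date, and the use of absolute distance in days are assumptions, and the natural spline on the hour of day is deliberately left out.

```python
from datetime import date, datetime


def duration_features(entered_at):
    """Build illustrative predictors for a stay-duration model.

    Returns a dict with a weekend flag, the distance in days to the
    summer solstice, and the time of day as a decimal hour.
    """
    # Assumption: the solstice is approximated as June 21 of the same year.
    solstice = date(entered_at.year, 6, 21)
    return {
        "is_weekend": entered_at.weekday() >= 5,  # Saturday=5, Sunday=6
        "days_from_solstice": abs((entered_at.date() - solstice).days),
        "hour": entered_at.hour + entered_at.minute / 60,
    }


print(duration_features(datetime(2021, 9, 4, 12, 30)))
```

In a full model, the `hour` value would feed a spline basis with knots at the times the pool staff identified (for example 6:30, 9:00, and around lunchtime), which is the part that encodes the time-of-day effect described in the talk.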
The pipeline in RStudio Connect
So, like I said, the R Markdown document is scheduled to read data from the API, clean that data, and wrangle it into a table, also read data from the in dataset pin, and then append the cleaned API data table to the in dataset, and then pin that updated in dataset to the RStudio Connect board, which is what I was trying to show here.
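The read, append, and write-back cycle described here can be sketched as below. This is a Python illustration with a local JSON file standing in for the pin; the talk's version uses the pins package on RStudio Connect. The function name, the row layout, and the de-duplication by pool and timestamp are assumptions added for the example.

```python
import json
from pathlib import Path


def append_snapshot(store_path, new_rows):
    """Append freshly cleaned API rows to the stored in dataset.

    `store_path` is a JSON file standing in for the pinned dataset.
    `new_rows` is a list of dicts with "pool", "timestamp", and "count".
    Returns the full updated list of rows.
    """
    path = Path(store_path)
    rows = json.loads(path.read_text()) if path.exists() else []

    # Assumption: a (pool, timestamp) pair identifies a reading, so a
    # re-run of the scheduled job does not store the same reading twice.
    seen = {(r["pool"], r["timestamp"]) for r in rows}
    for row in new_rows:
        key = (row["pool"], row["timestamp"])
        if key not in seen:
            rows.append(row)
            seen.add(key)

    # Overwrite the stored dataset with the updated version.
    path.write_text(json.dumps(rows))
    return rows
```

A scheduled job would call `append_snapshot` once per run with the latest cleaned readings, and the app would only ever read the stored file, which mirrors the split between the scheduled document and the Shiny app in the talk.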
We have then a Shiny application that just reads the in dataset pin, reads the model pin, which we have deployed to RStudio Connect from our, we trained it on our local machine, but we deploy it to RStudio Connect, and then we run a predict function for that to create an estimate of current visitors. So, this is an example of that. This is data from September, and this is rather reasonable for the five pools that we are focusing on, but Laugardalslaugar is still not good, and we essentially, we tried to do some things, which I'm not going to go into here. We tried another linear model to correct Laugardalslaugar. It didn't work out, and the way we found out that it didn't work out was we essentially gave this app to the managers of the swimming pools to evaluate, and they said, these five pools, they're okay. They're actually, like, they were really happy with it, but Laugardalslaugar, it wasn't even close. So, that's something we have to, we still have to figure out.
But, like, why is this a cool thing? Because, in my opinion, it's super cool because it's all R and RStudio Connect. So, like, we're an R shop here at the data science team, and if it's all R, we can move at our own pace, and for this part of the project, there were no outside constraints or, you know, inefficiencies in how we communicate with other groups, and we also get to run into the problems ourselves, and then we know what to take into account if parts of the process are outsourced beyond the team.
And we've actually had this, like, a way to explain it is to think about, like, building a car, and I've told Rachel this before. Essentially, we're thinking about if we are trying to build a car, there are many components that need to be built, and essentially, we are saying we can put up a fast prototype and do everything ourselves, and that way, there's no chance of us running into a problem if we outsource to another team, let's say. You take this part of the project, let's say, in the example of a car, you create, like, the seats for the car, and then we get that back, and we realize, okay, this didn't fit, they need to talk to us, but if we just do this ourselves, we run into that problem ourselves, and when we've built, like, a rather, you know, crummy version of the car as, like, a minimum viable product, we can then say, okay, we need to outsource this thing, we need to follow protocols with regards to reading the API, and then we outsource that to our IT team, but then we can say, here's what we ran into, and here's what you need to take into account, and that way, we get results so much faster.
Current setup and production architecture
So, this is our current setup. We are not actually reading from the API using RStudio Connect anymore. Now, that's put into webMethods, which is an integration platform that our IT team uses. So, that's just protocol, and it's super convenient for us, because the in data that is created every five minutes gets written to our Azure Data Lake storage, and we can then read that in our R Markdown document every five minutes. Then we do a prediction based on the linear model, which is still just a pin on RStudio Connect, because there's no need to complicate that. And then we have a prediction table, which is what the Shiny app reads. When you fire up the Shiny app, you're just reading from one table, the prediction table, which is a pin on a board on RStudio Connect.
The Laugardalslaugar problem
But, just before we finish up, I want to talk about Laugardalslaugar, because that's a super interesting problem, and, like I say here, the data for Laugardalslaugar is absolutely whack, and, like you see, it's a technical term. And that has everything to do with how nice the staff at Laugardalslaugar are. So, like I said, it's the only pool that counts in and out. But when there are, say, school kids who go to their mandatory swimming, let's say there's a queue of 40 school kids trying to leave the swimming pool after the swimming classes, the staff at Laugardalslaugar is not going to require everybody to take their armband, or bracelet, rather, and put it into the machine, which then registers, okay, this individual came in, you know, 90 minutes earlier, they're leaving now. They just open the gates, and the kids throw these bracelets into a bucket by the entrance, essentially. And it also has to do with late at night, when there's a lot to do; they don't make people actually, you know, wait their time in the queue. So, it's all due to kindness, and you can read that phrase, damn you kindness, you ruined my data, in the voice of the British comedian Richard A. Castor, and you can do that to yourself.
But, so, the things we're trying, of course, the first thing we wanted to do was just ask the vendor to make changes to the API, to separate the in data from the out data for Laugardalslaugar, we're still waiting on this, but in the meantime, we have, you know, because we really want to do the, we really want to get this thing out, we've been working on it for a long time, so one idea was to use a secondary, like, another counter that is located by the actual pools in Laugardalslaugar, not by the entrance, but this has proven unfruitful because it's just a traffic counter, it doesn't say that people are going in this direction or in this direction, so it's just, we can't tell you how many people are going into the pool and how many people are going out of the pool. And then, this is like, pardon my English, or rather, pardon my French for what I'm about to say, but this, like, the next thing feels like something you read on the Twitter account, Internet of Shit, which is like an internet of things, like, where it roasts these smart solutions, but there are smart lockers in Laugardalslaugar, but, like, the internet of shit part is where it's unclear if the firmware can be updated to the most recent version to support live data, so for the time being, we only have historical data for the smart lockers, which I believe gets read into a database every night when the pools are closed.
So, essentially, we tried a couple of things that didn't work, we're waiting on the vendor to make changes to the API, and I'm sure it's going to be, you know, oh, let our lawyers talk to your lawyers about this fee, it'll end up some, I'm jaded, terribly sorry, but also, before I end, like, this here was a gross oversimplification where we're only talking to, like, the Icelandic partner to the vendor, there are also gyms that have access to the pools, so you can go to a gym that's, like, located next to a pool, and they have separate counters into the pool, so after you've done your workout at the gym, you can actually walk through some tunnel into the pool area, and we have yet to figure out how to count those, because that's, we deal with Department of Sports and Leisure, and Department of Sports and Leisure deals with the owners of the gym, and the owners of the gym talk to some Icelandic partner to a vendor, which talk to, you know, an international vendor, so it's, like, a more complex, like, pipeline, essentially, pipeline of information.
So, still to do: we need to account for gym patrons that use the swimming pools at Laugardalslaugar and Breiðholtslaugar. And the final point of this presentation is, of course, there's going to be something else, but we don't know it yet. So I'd love to take questions on this. I'm sure, yeah, we have 20 minutes, and I'm sure there are some things that I did not explain properly, so I'm all ears.
Thank you so much, Hlynur. That was awesome, and I'm just thinking it would be so cool to be able to just go to a pool right now.
Q&A
So, there's a lot of great questions that came in on Slido, and so, just so everyone knows, and Josiah can help me share the link again, you can ask questions there. If you want to put your name in, I could also call on you to, like, add additional context, too, but I'll start with some of the anonymous ones, and one was, and quite a few people commented that they really liked that first visualization you shared, and they said, could you kindly share the R syntax of the first visualization that showed your email tracking, the day we requested, the day we received?
Okay, yeah, so this is, this presentation is, like, an R markdown presentation, so the timeline, it's essentially three ggplot graphs, and I can share this, I should have added my GitHub account, but I'll share this, like, this is, this, oh, I'm terribly sorry, I'm not sharing the screen, I'm sharing the, I'm sharing the Chrome browser. I can see the presentation screen, okay.
Okay, I'm gonna, I'm gonna share the secondary screen, so I can, like, where's the share screen? Okay, hopefully, and I can share your, you can see my, oh, now I can see the code, okay. Okay, so this is, like, this is just the data, and it's a, like, this is the first ggplot, and it's not just a mess, but I didn't think I'd be sharing this code, but I'd be happy to, like, data realism, so, yeah, I'll put that on my GitHub, and Rachel, you can share that. Perfect. Awesome. Thank you. Someone else asked, is there a live link to the swimming pool dashboard currently?
Not that it's accessible outside, like, the City of Reykjavik team, because we haven't, because of our problems with Laugardalslaugar, currently, there are two apps. One app is for the managers of the swimming pools, and the other one is, like, the in-progress, because I hand, like, I make the plain-looking Shiny app, but then we have Thor, who's a data scientist, a great data scientist there at the City of Reykjavik, who has, like, this artistic eye, and he makes, like, all the pretty production Shiny apps, so that one is hopefully, you know, soon to be released, but at the moment, no, it's not accessible.
One is, did this project initiate any improvements in user tracking, or is everybody just content with the way you patched the problem and things will stay the same? I think you maybe mentioned at the end about the API possibly changing. Yeah, but another thing is that, like, there have been, like, discussions between Sports and Leisure and the managers of the actual swimming pools with regards to change, like, it's perfectly acceptable for 40 school children to not be counted out, but, like, for the later, like, if there's two full-grown adults waiting in queue, like, you can, they're essentially saying, you know, we'd appreciate if you stop the kindness just a little bit, and, you know, let these people be counted out. So, that's something that we haven't seen in the data yet, but provided that, it provided that we fixed the problem, that we fixed things with, like, once we get the gym data, and then we can say, like, can we look at this if they were to, like, be more, or rather less kind, let's just say less kind, now they count people out.
Another upvoted question was, could you compare your use of Plotly and ggiraph? Did you use both or prefer one over the other? I actually, I started out using Plotly because I've been using that for a long time, but then Sol, who's this, you know, artistic data scientist that I mentioned earlier, he pointed me in the direction of ggiraph, and I much prefer that now, because, like, you just create a ggplot, and then you wrap it in the ggiraph, and it instantly becomes, like, an interactive visualization, provided you change the, like, the geom part. So, I like the ggplot, like, flow to things, and this is, like, a perfect extension to that.
I see Gregory, or Gregor, you have a question. Yes. Can you hear me well? Yes, I can. Yeah. Okay. Thank you, Hlynur, for the great presentation. Actually, I'm an economist, too, so it's nice to know that I'm not the only one interested in data science. I wanted to know how much the city or the public entity is interested in all that you can provide as a data analyst or economist. I mean, are there many projects they are willing to put you in, or is it still growing, as you mentioned, because it's new to them?
Awesome question. Thank you. It's actually just skyrocketing. So, like I said, the data team is relatively new, like, the end of 2019. At the beginning, we didn't have these fixed protocols in place for how we did these projects, but now, just with a little experience, we've been moved into the IT department's framework for, like, this is the way we accept projects, and this is how we allocate time to them. And since then, it's been made clear to the departments within the City of Reykjavík: here's a team of data scientists, and if you have an interesting use case or something urgent which you want them to do, here's how you would go about getting them in. And people are super interested in doing that. It has to do with, like, you do one cool thing for, say, the Department of Schools, and after that, they're like, okay. They, of course, know the potential, but then, when you hand something off to them, they see, okay, this is something that is now available, it's real, essentially. So, yeah, super, super interested in all things data currently.
Jaiwan, I see you asked a question on Slido and put your name there. Can I pass the mic over to you to add some context there? Yeah, sure. Just wanted to know how the teamwork or delegation works within your team. Do you face any limitations because you're in the public space? Yeah, just wondering how the teamwork delegation works.
Yeah, okay, great question. Like, like I said, we're, there's only six of us, so we have our, like, chief data officer, Inka, and she's, like, the head of the operation, but we have a data engineer, and we have me, a data scientist, and my, my skill set is mostly geared towards predictive modeling, but also, you know, analytics, you know, reports something that needs to be done, and then we have Sorbet, who's, he's, like, a super machine learning engineer type of data scientist. He's, like, so competent at that, but he's also, like, this, he's our go-to guy to, like, finalize, you know, Shiny apps and do, like, these more complex data visualizations, map stuff, and things like that, and then we have Siri, who's also a data scientist, and she is, she is, like, doing more of the report things currently, as our, like, our junior scientist, and then we have Grimur, who's also a data engineer, so we have essentially three data scientists, two engineers, and a chief data officer.
Saif, I see you have your hand raised as well. Can I pass it over to you? Thank you, Rachel, for giving me the chance, and my apologies if I, I might have missed this right in the beginning. What I'm wondering is, it's quite an interesting project, but the thing that I'm curious about is that what was the, what's the problem that you're trying to solve with this project, and what was the business case for this, if there is a strong business case for it?
Yeah, so, I didn't show the, yeah, did I show this slide? So, essentially, the first thing that sparked this idea was just citizens trying to figure out which pools were crowded, and, you know, calling in to the city offices: is there a way to show us, you know, can I tell if Laugardalslaugar is super crowded right now? And it's just this interest which started the conversation. But then, once that idea was sparked, I've also talked to welfare workers within the City of Reykjavík. If they have clients that are susceptible to sensory overload, they usually go to really remote swimming pools, like, at the edge of the suburbs, which are not among the pools in question here, essentially just to make sure that they're at a non-crowded pool. And then COVID came, and that's when the project actually, you know, became an actual project. Although we don't have 50% capacity now; it was only for, like, two one-month periods. But that's when there were actually lines outside the swimming pools, and the idea was that if we could get this, you know, up and running fast enough, we could then show people, okay, this pool is currently too crowded for you to go there, but here's a pool that is actually not crowded. Sadly, or not sadly, it's great that we don't have these limitations anymore, but we didn't manage to do it within that, you know, two-month period.
So, going forward... that explains a lot, actually. Thank you. I think I've seen something similar in terms of people trying to check how crowded the supermarkets are. So, going forward, if you could link it with the booking system for the swimming pool, then you might end up with a really nice business backing for that. So, no, it's a great project. Thank you.
Thank you so much. And that's, like, a five-year plan, but it's...
Well, you already have everything, you know. So, you could possibly do it in a couple of months' time.
Yeah, yeah, yeah. But the booking system is connected to the... So, if we were to team up with IT, we could actually join forces, and within this app you could not only see which pools are crowded and not crowded, you could actually buy a ticket.
Oli has his hand raised as well, and then I'll go over to some of the other anonymous questions. Hey, Oli.
Hi, Hlynur. Hi again, guys. I'd like to add also that the City of Reykjavík is on this digital transformation journey, and so the requirements are also coming straight from the business. We have these digital leaders scattered around the business, and they know their stuff, and they come to the data team and the Department of Innovation and Services with projects that they have prioritized based on their operational needs.
But, you know, I'm here for a question now, Hlynur, and we're really close friends. Since we're Icelandic, everybody knows everybody here. There are, like, five of us here in Iceland.
I should add that Oli is our former Chief Data Officer, who created this team of inventors, as I saw someone call it in the chat.
Yes, and you certainly are. I want to have a discussion that I think is super interesting for the people who are joining this call. So, we're building this prediction model for predicting how many people are visiting the pools. What's the level of accuracy that is needed for that in terms of value for the business, for the pool business? Could you enlighten us on that? We have had this discussion so often, but I'm going to raise it again because I feel that it's super important for data science projects. Is it valuable to predict whether there are 50 visitors or 51? Is there a difference? Does this change something for me? Or isn't the value in predicting whether there are 50 or 500? And I'm actually amazed we're talking about prediction accuracy in terms of value. Take it away.
Thank you, Oli. So, the thing we did: we evaluated these models on training data, but we don't have test data, because we're not going to ask the staff there to count every five minutes and create a test data set for us. Rather, what we did was leave it up to the managers and the staff of the swimming pools to look at the app. Essentially, they would be the gauge of whether this is accurate enough for them. In one case, they were doing these daily spot checks: okay, it's 2 p.m., you walk around the swimming pool and tell me how many people there are, and we can then compare that to this app that the data science team has given us. And what they found out was that it was well within their margin of reasonable accuracy because, like Oli alluded to, it doesn't matter if there are 70 or 75 people at the swimming pool, but it does matter if it's 75 or 200 people. So one example was: we did the spot check, it was 72, but the model showed 76. We're completely fine with that.
And this is what made it fun because, as Oli knows, at my last place of employment I was always doing these increasingly complex machine learning algorithms just to get, you know, 0.07% accuracy increases. And that's not the fun stuff. The fun stuff is doing a complete project and saying, okay, here's our estimate, and we're not trying to, what is it, squeeze blood out of a stone. I'm not sure if that's even an English expression. It is in Icelandic. So, yeah, it's fun. That's all I'm going to say. Working on a project where it doesn't matter if it's 72 or 76. It's just well within the margin.
It's all about bringing the correct level of accuracy for the value for the business. Like you mentioned in the talk, and I really like that, we evaluate our models through discussions with our stakeholders. And I thought that was cool. And it was an awesome presentation, Hlynur. Thank you.
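The spot-check comparison described above can be sketched in a few lines of Python. This is only an illustration, not the team's actual evaluation code: the function name `within_tolerance` and the tolerance values (15% relative, 10 people absolute) are assumptions picked for the example, since in the talk the "margin of reasonable accuracy" was judged by the pool staff rather than fixed as a number.

```python
def within_tolerance(spot_check: int, predicted: int,
                     rel_tol: float = 0.15, abs_tol: int = 10) -> bool:
    """Return True when the prediction is close enough to a manual head count.

    rel_tol and abs_tol are hypothetical thresholds, not values from the talk.
    """
    # Allow whichever margin is larger: a fixed head count or a share of the crowd.
    allowed = max(abs_tol, rel_tol * spot_check)
    return abs(predicted - spot_check) <= allowed


# (spot check, model estimate) pairs. The first pair is the example from the talk;
# the second is a made-up miss on the scale that would matter to visitors.
checks = [(72, 76), (75, 200)]
results = [within_tolerance(spot, pred) for spot, pred in checks]

for (spot, pred), ok in zip(checks, results):
    print(f"spot check {spot}, model {pred}: {'fine' if ok else 'too far off'}")
```

The point of using a tolerance band instead of an exact-match metric is the one made in the discussion: a difference of 4 people changes nothing for a visitor, while a difference of 125 does.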