Data Science Hangout | Óli Páll Geirsson, City of Reykjavik | Data Science is More About People
We want to help data science leaders become better. The Data Science Hangout is a weekly, free-to-join open conversation for current and aspiring data science leaders. An accomplished leader in the space will join us each week and answer whatever questions the audience may have. We were recently joined by Óli Páll Geirsson, Chief Data Officer at the City of Reykjavik.

► Subscribe to Our Channel Here: https://bit.ly/2TzgcOu
► Add the Data Science Hangout to your calendar: https://www.addevent.com/event/Qv9211919

9:44 - Providing value to stakeholders in data science
12:35 - Why data science is more about the people
15:06 - Communication with stakeholders = Crucial
17:21 - The value of building up your data science team
18:08 - Why you need a diverse data science team
20:15 - More efficient data science teams by breaking down goals
37:46 - Active listening to better identify needs for your data science projects
43:55 - Prioritizing the projects your data science team works on
49:37 - The best way to approach key stakeholders
1:07:06 - The importance of visualizing data products

Follow Us Here:
Website: https://www.rstudio.com
LinkedIn: https://www.linkedin.com/company/rstudio
Twitter: https://twitter.com/rstudio
image: thumbnail.jpg
Transcript
This transcript was generated automatically and may contain errors.
Welcome, everyone, to the Data Science Hangout, and welcome back to all the familiar faces and to those joining for the first time. We really want to use this time to focus on questions that are most important to you all. So no agenda, everyone's welcome to join in live or put any questions you have in the chat. We do also have a Slido for any questions that you want to ask anonymously, too. And just a quick note that we will record this session, so we'll post it up to YouTube.
But for today I'm joined by my co-host, Óli, Chief Data Officer at the City of Reykjavik. Óli is focused on bringing value from data to residents of Reykjavik. Óli, I'd love to have you introduce yourself and maybe share a bit about your team and the work that you do.
Yeah, of course. I'm happy to be here. I think this is an important discussion framework for us to have. This is an interesting field, and I think many of these things are being done for the first or second time; they're relatively new. We need to learn from each other, at least in my case. So, as Ash mentioned, I'm the Chief Data Officer of the City of Reykjavik. Data has come to play a central role there recently, and that is relatively new.
So, just a few words on my story. My background is in mathematical statistics. I finished my PhD seven years back; it was on Bayesian hierarchical modeling. And all of my modeling was carried out, of course, in R, like any sane person. When I was doing my PhD research, I was also doing a lot of statistical consultancy, for start-up companies, for big companies, and also within academia. Once I graduated, I spent a year as a postdoc, but I felt that I needed to be somewhere else.
And so, after the PhD, I worked for a company called QuizUp. It was a computer gaming company here in Iceland. After that, I went to LS Retail, which is also an Icelandic company, specializing in POS software. After that, I was with Landsbankinn as a senior data scientist, where I laid the groundwork for the use of machine learning at the bank, which was extremely interesting because it wasn't just risk-related. It was data science in general, and what we can do with data science. And that was the point I was trying to get to earlier: using data science as a part of your operations as a whole is relatively new.
Bringing value to stakeholders
So, my journey at the city started two years ago, when the city council decided that, hey, we want to be a modern city. We are nothing without data. They realized that data plays a central role in the operations of organizations these days, and they were aware of big companies like Google and Netflix and Amazon, and that they're data-driven. And being data-driven is something that we need to breathe and feel.
But it needs to have some point. It needs to bring some value. And this is something that I would like to emphasize in this discussion today: being a part of an organization as a data science team or a data science leader, or a data leader, or whatever, isn't about algorithms or data science in that sense, or cool structures, or cool inference methods. It's about seeing where you can bring value to somebody in your business. And this is, of course, obvious when you discuss it, but you need to be okay with letting go of this amazing model you spent weeks training, and realize that if it isn't bringing value, then you're doing something that is pointless for the organization. It isn't pointless for you, but it's pointless for the team as a whole.
And I feel like if you don't see the value and you're not okay with having these discussions, then you won't make any progress within the organization.
So, we, the data science team at the City of Reykjavik, have this momentum from the City Council, but we need to bring the action to the other departments, in the sense that we need to show them how we bring value. And I feel like somebody, at least the team or the team leader, has to be responsible for this aspect of the data science operations.
And bringing value isn't only your responsibility. It's your and your clients' or your stakeholders' responsibility. And it can only happen in a dialogue and in an open discussion with your stakeholders. Let me give you an example. We have various stakeholders at the City. The residents of Reykjavik City are the largest stakeholder. But let's demystify this. Let's take an easy example. Some of our biggest stakeholders within the city are other departments, like School and Leisure. And they know their business. They know what they're about. And we can't do data science for them without working with them and having a dialogue with them, such that they understand the value we're bringing.
Like, how are we adding to your operation with data science? And so being in this role, you have to be okay with, hey, I need to explain things that are obvious to me, and realize that not everybody is as excited about data as you are. And be okay with that. You need to have a discussion with all stakeholders. Like, hey, this is great. You can do this. Did you realize that? And this is awesome. And they need to fit it into their own framework. And in my experience: be energetic about this. Be mindful of their needs. And if they're putting up a wall and being difficult, there might be a reason for that. That has nothing to do with you or data science. It's all about them.
And realize that people are different. And what I find amazing in this job is, like, data science is more about people than it is about AI. Isn't that weird? You can't just sit in some corner and train AI models and expect things to be amazing.
Building the team
So, a few words about my awesome team. Of course, you need to have data engineers. They are responsible for getting data into some sort of domain, like a data warehouse or data lake or whatever technology you're using. They're responsible for the underlying framework. And I often use this metaphor of a data pyramid. The first layer of the pyramid that you're trying to build is the data foundations: getting data, cleaning it, and maintaining it.
Once you have these strong data foundations, you need to bring in the data scientists, the talent set that can go into the data and find patterns and bring value. And I think it's incredibly important that the data scientists are okay with maintaining an open dialogue with the stakeholders and explaining things that are obvious to them. Like, hey, you know, this is the mean and this is the median. And just be okay with that. Put on your teacher's hat and enjoy the ride. And I think it's much more about that than training extremely complicated models, at least in some cases.
And the third layer is what I call, and I think in the literature is called, data ops. That is operationalizing some of your data products. Your data products might be APIs that you expose to your clients. It might be connections into Excel or Power BI or whatever. It might be Shiny dashboards. And we use a lot of Shiny dashboards. In our particular case, it's on RStudio Connect.
And so you're trying to build this pyramid to get to the top part of it, which is bringing value. You can have all these nice things, but they mean little if you aren't bringing value to somebody. And I'd like to emphasize that it's not necessarily only monetary value. It can be value in the form of better services or a better experience for the residents of your city, like in our case. You need to define it. And you need to be able to measure it. And then you need to live by it.
We brought value through this product in this sense. I feel like when everybody in the team realizes that, you get this positive feedback loop into your team from your clients, a sense that you and your team belong to something bigger than yourself, and bigger than your team. And then, at least in my case, and I think I can speak for the team as well, you're more, let's say, energetic about going to work the next day.
And the team consists of data engineers, data scientists, data ops, a product manager, and me. And there are some extra obvious things that I'd just like to say out loud. Everybody knows this, but I'm still going to say it out loud. You need to have a diversified team. So embrace diversity. It's so obvious, but it's so true. You need to be diverse in age, you need to be diverse in gender, and in all of these things that you can be diverse in. Because then you bring different mindsets into the mix. And I also feel like when you're not surrounded by people that are just like you, you automatically become more professional and try to bring your A game all the time.
And don't discriminate based on age. Somebody who's been 30 years in the game is not too old for data science. And in this particular field, which is this crazy mix of science and thoughts and disciplines and discussions and being human, I think it's more important than ever to diversify.
Team spirit and agile development
So it's so important to get the sense that you belong to something bigger. And there are various ways to do that. But I think you mainly do that through your stakeholder relationships and building up a good team spirit. But I have some tricks that I'd like to share for building up my fantastic team spirit and belonging to something bigger than yourself. First off, I'd like to emphasize the importance of being agile in your development. And I'm not saying that because Agile is cool. The main thing I take from an Agile approach is that you have dailies, and you have opportunities to take these big goals and break them down. Like you want to run a marathon, but you're going to break it into pieces. And when you do that, you realize that, hey, this overall big goal is not something that I need to think about all the time. I only think about individual small goals. And that is nice, because when you reach them, you can celebrate with your team.
And then you get this positive feedback that, hey, I did something important. I didn't complete this whole marathon, but we've reached our short-term goals. And at the daily or at planning, I share that with my team and get this positive feedback that, hey, I did well. If you don't break the goal into pieces, you're losing these opportunities to reignite the spirit or the sense of purpose in your team. And during COVID times, when you work from home, I think that's especially important, when you have team members sitting at home and they kind of think, who am I? Why am I here sitting in the same seat all day long? I need to belong to something.
And the second trick. These are just tricks; you can't just do these things and hope for the best. These are just extra flavor. But I'm really keen on allowing the team to have their own time on Fridays, to just work on something that they think is interesting. Not necessarily projects, but it has to be data related, data science related. In our particular case, one of these awesome team members decided to play around with the data on garbage collection calendars. Those weren't in a digital format here at the city. So he built this interactive garbage calendar, which is now a hit product at Reykjavik City. It was something totally unknown, but he just rolled the dice and we thought that was cool. And now this is our most popular product.
But these are also the times and opportunities for your team members to upskill themselves. And don't be too strict about it. Just like, hey, go play, but go play in data science, right? Because that's like having this one lottery ticket: you never know what comes out of it. Worst case scenario, your team members know something more. And I love it. I already have a product from that lottery operationalized. So yeah, it was totally worth it in my case.
The swimming pool project
I love that point you made that data science is more about the people. But Óli, I know I have the advantage of knowing a few more of the use cases from the team, from our past conversations. I think it'd be awesome to also share a few other examples of what the team is working on to provide value to residents.
So Icelanders really like swimming pools. And that has something to do with the fact that Iceland sits on a geothermal hot spot. For those of you who are not aware, we have an active volcano now. So this is a geothermally active island. We have these geothermal plants scattered around Reykjavik city, and they provide the city with an abundance of hot water and electricity. As a result, we have a lot of swimming pools, like crazy, crazy many swimming pools. And going to a swimming pool is, I think, the favorite pastime for Icelanders.
But since it's so popular, and it's also popular with tourists, we have been asked frequently, like, hey, can't you supply us with data on how many guests are in the swimming pools? I've got so many requests from various people. When are you going to bring us a dashboard on swimming pools? Like how many guests are currently in the swimming pools, as live data? We thought, hey, that's easy. There are these gates that count guests in at most of the swimming pools in Reykjavik. And one would assume that the gate counts you in and the gate counts you out. It's just pluses and minuses and everybody is happy.
It turns out that most of the gates that are now in operation don't count out. And I have no idea why. Probably they were just cheaper. But they are now in place. They count in. So we know how many guests have gone into the swimming pool, but we don't know how many have gone out of it. So we have this accumulation, kind of like an accumulated count that never comes back down. So how many people are in the swimming pool? A million. It's not a true answer.
So what to do about that? One of these swimming pools, Løvesløk, also counts out, because it has more advanced gates. So there we have data on how many guests have been counted in and how many have been counted out. That gives us a historical distribution of how long each guest stays at the swimming pool, and we can model that. It's actually a temporal model, a time series model: a guest arrived at this point in time, and the model, which is a linear spline model (we can go into the details if you want to), predicts from the time of year and the time of day how long that guest should stay. Say it predicts that guests who are coming now should stay on average 90 minutes. Then we count them out after 90 minutes.
And so we count in, and then we have a model that counts out. So we have one accumulated count with pluses and one accumulated count with minuses, and from those we can calculate an estimate of how many are in the swimming pool during every hour. But here we need to stop thinking about this project only as data scientists, zoom out a bit, and ask: okay, where's the value? The value is knowing approximately what capacity the swimming pool is at. I only need to know whether it's at 70 or 80% capacity or at 20%. I don't need to know as a guest whether there are 50 guests or 52 guests. You might be discouraged from going to the swimming pool if you see it's at full capacity, or you might not. It depends on your needs.
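The count-in, predicted count-out idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the city's implementation: the spline knots in `STAY_KNOTS`, the function names, and the 10-point capacity bands are all hypothetical, and the real model is described as also depending on the time of year, which is left out here.

```python
from bisect import bisect_right

# Hypothetical knots of a linear spline: (hour of day, expected stay in minutes).
# The values are illustrative only; they are not figures from the talk.
STAY_KNOTS = [(6.0, 60.0), (12.0, 90.0), (17.0, 105.0), (22.0, 75.0)]


def expected_stay_minutes(hour):
    """Evaluate the piecewise-linear spline at `hour`, clamping outside the knots."""
    xs = [x for x, _ in STAY_KNOTS]
    if hour <= xs[0]:
        return STAY_KNOTS[0][1]
    if hour >= xs[-1]:
        return STAY_KNOTS[-1][1]
    i = bisect_right(xs, hour) - 1  # index of the knot at or just below `hour`
    (x0, y0), (x1, y1) = STAY_KNOTS[i], STAY_KNOTS[i + 1]
    return y0 + (y1 - y0) * (hour - x0) / (x1 - x0)


def estimate_occupancy(arrivals, now):
    """Count guests who have arrived and whose predicted exit is still ahead.

    `arrivals` (gate count-in timestamps) and `now` are minutes since midnight.
    """
    inside = 0
    for t in arrivals:
        if t > now:
            continue  # this guest has not arrived yet
        predicted_exit = t + expected_stay_minutes(t / 60.0)
        if predicted_exit > now:
            inside += 1  # a plus without a matching minus yet
    return inside


def capacity_band(occupancy, capacity, step=10):
    """Report load as a coarse band: 70 means roughly 70 to 79 percent full."""
    pct = 100.0 * occupancy / capacity
    return min(int(pct // step) * step, 100)


# Guests counted in at 10:00, 12:00 and 13:00; it is now 13:20.
guests_inside = estimate_occupancy([600, 720, 780], now=800)
print(guests_inside, capacity_band(guests_inside, capacity=10))
```

Reporting a coarse band rather than the raw headcount matches the point made above: a guest deciding whether to go only needs to know roughly how full the pool is.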
So that's how we do it. We applied that model to the other swimming pools, and we are currently asking those swimming pool executives to evaluate whether this makes any sense. But there is a catch, and this is why it is one of our favourite projects. In Lerpersloe, the pool that does count out, sometime during the day the staff say, to hell with it, there are too many guests here, we're just going to open the gates and re-evaluate in the morning how many guests there were. So we need to do some estimation of how correct the data is.
So yeah, we have this model running in our RStudio environment. The data comes in, it comes onto our RStudio machine that's in the Azure cloud, so it's a virtual machine, and then it queries the model, like, hey, for how long should this guest stay? Oh, he's going to stay for 90 minutes. And then it does the aggregation of the pluses and minuses and gives the result back to the API. And I just think it's really cool. It's data science and some arithmetic.
Yeah, yeah. Not all of the projects are like that, but we'd rather do that than spend 10 million Icelandic krónur on new gates.
Would you say that's your favourite project?
Yeah, it's in the top 10, at least top five, because it was so laughable, it was so funny to see that, hey, this is extremely complicated, but can we actually solve that? And it turned out that we could. So that was fun.
Stakeholder collaboration and prioritisation
So, actually, two things come to mind, Óli. Really interesting example. I'm trying to visualize how your team worked on that over time. And I'm curious how many analysts slash scientists you had working on it, just roughly.
Yeah. So, I have a fairly accurate estimate of that. Not counting the summer vacation, I think it's been a four month project up to this date. The first hurdle was basically getting access to the data at that point in time. Being an official body or entity like Reykjavik City, you need to make sure when you're doing contracts or acquiring things, like in the case of these gates, that, hey, the data that these gates have generated belongs to us, and we will not pay a single krona for getting these data. And that particular contract was not phrased well enough. So, that took some discussions back and forth.
And once we realized that the gates behave in this way, I think it was a month where we tried to figure out whether we could actually solve this. But the data scientist who was working on this one was not doing that full time. They were doing maybe 25% on and off. And I think that is a good approach when you hit a problem that is new and unsolved, not business as usual: just let it sit with you and let your unconscious mind work on it. So, let's say 25% for a month. And now it's ready and in testing. But in actual time, it's been four months. And it was one data scientist and me as well, because I can choose projects. I get to pick the best stuff.
Cool. One other thing, and this should be a quickie, but I'm curious, after you built the thing and built the model and built out the dashboard, how did you know, how did you validate that it was working? Did you like go to the field and say like, oh, yeah, great results, say 60%, looks about 60% full?
Yes. So, almost like that. The swimming pools are being operated like individual companies, so they have executives. We asked them to compare: does this make sense with what you see and what you think is true in your swimming pool? And I think that is a good enough evaluation without actually going and asking guests, when are you going to leave the swimming pool? If you're thinking about capacity proportions, and you're within, like, 20% accuracy, I think you're doing fine in this particular case. Because it will not affect you as a resident or visitor whether there are 70 people or 80 people in the swimming pool, but it might affect you if there are 10 or 10,000.
Rob, I see you asked a question in the chat, too. Do you want to ask that live?
Yeah, well, you kind of answered it a little bit with Frank's answer. But you talked about collaboration with stakeholders. And, you know, you're in a sort of public service-esque position and citizens are your stakeholders. I was just wondering what that actually looks like.
Yes, this is a tricky one, because I don't think there's a single answer here. I think we always need to bear in mind that ultimately what we're doing at the city is serving the residents. And even though you are a data scientist, you just need to know this: you're a servant, you serve them. But I think the swimming pool example is an interesting one, because this is something that many residents were asking for, and we were getting frequent requests. They didn't come only to me, but also to the department that is responsible for the operations of the swimming pools.
But I have a particular project in mind. We are starting now to build up this portfolio of data products that is supposed to be available on the web. It's supposed to be some kind of data market or data store or dashboard store for the residents of Reykjavik. And I think that's a really cool product. And I think we should build that. But we can't do that without understanding the needs of the residents. Do the residents actually need to know how many bikes were going around here every hour? Do they need to know that? Is that something that helps them get through the day?
And we have done that with surveys and by having these open platforms for residents to send ideas in. Some of these things have worked quite well. Others haven't. But for external data and external products that are made especially for the citizens, you have to keep an ear open and listen. In the case of the trash calendar data product, that was a screaming need. Somebody had to do that. And we listened. So this happens through an active dialogue and also by active listening.
So the data team belongs to a bigger department or is a part of a bigger department that is called Department of Service and Innovation. So we are tasked with the digital transformation of the city of Reykjavik. Basically enhancing and building the digital infrastructure. Not only data and data science, but the digital world as a whole.
So the idea is our department is responsible for prioritising projects. And so we built up this, we just call it the matrix. The matrix has certain factors in it, columns or whatever. One of the questions that we ask that belongs to the matrix is: how is this bringing value, and to whom? And we have a certain rating range, a numerical range, that represents some value. The highest score would be: brings a lot of value to employees and residents. And we need to be able to articulate why it's bringing value. And that scores something. And then we have other factors that we score by: is it supporting the digital infrastructure, or is it a hindrance in some way, or is it alleviating some pain points, or whatever.
We also have more. We have these mandates and a vision that we work by: does this correspond to our green initiative? And does this correspond to equality, gender equality, and overall embracing diversity within the city? If I remember correctly, we score in seven categories, and the numerical values all have some meaning behind them. So when we do this rigorously like that, we get a list of projects that are scored efficiently. And then we are in a better position to argue for doing things this way.
And since we need to be responsible, we need to make the argument for our choices to the city council. When we do this in this data-driven fashion, it's easier to defend your position, if you can think about it that way. But it isn't built only for that. It's built for us to understand why we're prioritizing projects. Before we built up this procedure, we chose projects into the data team, well, the first year was basically just being cowboys and picking something that was interesting. But now we are more rigorous in thinking about, hey, we need to build up a data lake.
So there's more to the story. The other departments also need to prioritize their projects before they come to us. And then we prioritize them. And this is not only data, this is the digital transformation effort as a whole. The framework that we're building is that we are injecting these change agents into the other departments, and we call them digital leaders. The change agents, or digital leaders, work closely with the department heads, and they have a certain mandate to work. Their task is to get an insight into the operations of their departments and see opportunities for digital transformation. So they are tasked with getting their hands dirty and trying to find out, hey, here are pain points, we can do this, this, this. And then they bring that to us, that goes into the matrix, and then we go to work.
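To make the matrix concrete, here is a minimal Python sketch of scoring and ranking projects. Everything specific in it is an assumption: the talk mentions about seven categories but does not list them all, so the five criteria below are loosely named after the ones mentioned; the 0 to 5 scale, the unweighted sum, and the project names and scores are hypothetical.

```python
# Hypothetical criteria, loosely named after the factors mentioned in the talk.
# The real matrix reportedly has about seven categories; the full list is not given.
CRITERIA = [
    "value_to_residents_and_staff",
    "digital_infrastructure",
    "pain_point_relief",
    "green_initiative",
    "equality_and_diversity",
]


def total_score(scores):
    """Sum the per-criterion scores (assumed 0-5 each, equally weighted)."""
    missing = [c for c in CRITERIA if c not in scores]
    if missing:
        raise ValueError(f"unscored criteria: {missing}")
    return sum(scores[c] for c in CRITERIA)


def rank_projects(projects):
    """Return project names ordered from highest to lowest total score."""
    return sorted(projects, key=lambda name: total_score(projects[name]), reverse=True)


# Made-up projects and scores, for illustration only.
projects = {
    "pool_occupancy_dashboard": dict(zip(CRITERIA, [5, 3, 4, 1, 2])),
    "trash_calendar": dict(zip(CRITERIA, [5, 2, 5, 3, 2])),
    "internal_report_cleanup": dict(zip(CRITERIA, [2, 3, 2, 0, 1])),
}

for name in rank_projects(projects):
    print(name, total_score(projects[name]))
```

Requiring every criterion to be scored before a project can be ranked reflects the point that each score has to be articulated, so the ranking can be defended later.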
Data science is more about people
That's the trick, isn't it? Is to find this common ground with somebody who's an expert in their own business and their operations and knows nothing about data science. And I think the short answer is there is no one single way to do this. But the long answer is have many discussions and be aware of continuous development processes. So you can start out by doing something simple. And did you mean this? And have a dialogue on that. Then meet again and meet again and meet again and build something simple and build upon it. So I'm really fond of this concept of continuous development and building up some things like step by step.
And I think what happens in a process like that, if you have a good relation and open dialogue with your stakeholder, is they realize that, hey, what? That's awesome, right? And just to give you a super simple example, like one of these dashboards that we are building, and that has nothing to do with machine learning or anything like that, like we use a package in R called Plotly to build up graphs. And you can slice and dice on a dashboard and filter out the data. And then you can download the image. Like you can download it as is. Like how it appears. And that was just a game changer for somebody. And I had no idea that they didn't realize that. You know there's a download button, you can just press it. And this only happens because we talked frequently.
Yeah, I'm not sure whether there's a silver bullet here. But I think the approach is, dare I say, humility, and realizing that nobody thinks the same. And most people have no interest in data. If you're okay with that, it doesn't have to affect you as a person or your own interest. I'm just okay with explaining, hey, a linear relationship, that's something that looks like that. Hey, Simpson's paradox, did you hear about that? If you're okay with that and also curious about their business, then something can happen.
When you're meeting especially a new client or a stubborn client or a stakeholder that has a complicated set of procedures and a complicated framework, then this needs to happen gradually and with humility. Like I honestly think that's the only way, because you can't come in hard. You can't come in and say like, yes, neural network, now we're going to do that.
Do you think that people do that? Do they come in and they try to oversell? Do you see the data scientists do that? Like they come in and they try to oversell and push their idea?
I'm not sure. I know I have done it over the years, and I have learned this the hard way. So I have a story to share from when I was at the bank. We were trying to push the boundaries there. We were trying to incorporate data science into many aspects of the operations of the bank. And I remember quite distinctly this meeting I had with the corporate department. I was building a model that could help them quantify and categorize companies that were not doing business with the bank. So basically a clustering algorithm for companies, clustering with respect to their portfolio, their economic size and all of this.
Basically, what I did there, the approach I used to demystify these numbers, was that I normalized them and projected them onto a normalized space. Then I ran some clustering algorithms and projected the results back in order to understand that, hey, this company belongs to this category. And the approach that I was adhering to is nothing new in this world. It's been done for decades. And I came in hard, in the sense that I started to explain to them how the algorithm worked and how it was superior to the way they were working.
So, that was coming from a good place. I was trying my best to explain, but it probably came from a place of, hey, I need to convince them that this is correct. Whereas what they needed was: why does this help me? They don't need to be convinced that the algorithm is this way or that. So what I mean is, I just hit a wall there, and I didn't understand: why are they not super excited about this? This is awesome.
And the interesting thing there: after that meeting, my manager at the time had this discussion with me, like, you came in way too hard. And when you do that, you're not necessarily overselling, but you're doing it incorrectly. You didn't find the common ground that they needed. You were playing basketball; they wanted to play football or whatever. And so you just didn't understand the game. And this was just so distinct in my memory: okay, I need to be the teacher. I need to be on their side. And when we went back to this, we went about it differently and talked about, hey, this can help you sell products because we know what kind of company this is.
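The normalize, cluster, and map-back workflow described here can be sketched in plain Python. The talk does not say which clustering algorithm or which features were used, so this sketch swaps in a basic k-means on z-scored features as a stand-in; the function names, the naive initialisation, and the example numbers are all assumptions.

```python
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each column so that no single feature dominates the distances."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    z = [[(v - m) / s for v, m, s in zip(r, mus, sds)] for r in rows]
    return z, mus, sds


def kmeans(points, k, iters=20):
    """Basic k-means, seeded with the first k points (a deliberately naive choice)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Move each centroid to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [mean(c) for c in zip(*members)]
    return labels, centroids


def to_original_units(centroids, mus, sds):
    """Undo the z-scoring so cluster centres can be read in business terms."""
    return [[c * s + m for c, m, s in zip(cent, mus, sds)] for cent in centroids]


# Made-up companies: (number of employees, annual revenue in millions).
companies = [[1, 10], [2, 12], [100, 900], [110, 950]]
z, mus, sds = standardize(companies)
labels, centroids = kmeans(z, k=2)
print(labels, to_original_units(centroids, mus, sds))
```

Mapping the centroids back to original units is the step that lets a stakeholder read a cluster as a kind of company rather than as a point in a normalized space.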
Sounds like your origin story to data science is more about people than about data.
Yeah, that's the insane thing. I'm super excited about AI. I teach AI at the University of Iceland, but employing it is so human, like deploying it and operationalising it and getting people on board. This happens through discussions, because I also had dialogues where the people I was talking to thought that I was trying to take away their job. And what I was actually doing was helping them. But because, from their point of view, they thought this, we couldn't have a discussion until later, when we just started again.
So especially for stakeholders that don't see how it works, are stubborn, have been there for a long time. You need to respect their boundaries and respect their needs. And that is not easy. That happens through continuous dialogue and trying to inject ideas, being patient.
Data ethics and communication
Óli, kind of on a similar thread, there's an anonymous question from Slido. Do you sometimes meet with stakeholders, or communicate back and forth with them, where they're asking you for a specific product, but they don't realise or really accept that you don't have the data to create that?
Yes, yes. I've been in this position. I've been in this position, I'm going to say several times. In the case of the city, we have so much faulty data and data that needs to be cleaned. So we can't do the things that they're asking of us right now. But then I try to say that, but we will do it. It's bound to happen.
The dark side of data science is when your clients want a certain result, but the data is not supporting it. And I feel like if you're in that position in your life, in your professional life or academic life, never sacrifice your values for somebody else's needs.
And that is also true for the more benign version of this. If you don't have the data, just tell them and explain, like, I can't do that. For example, we need to do a certain analysis for the Department of Welfare. So this is data on financial aid, but the way the data is currently being served into our data lake, we accept it in an aggregated form. The reason behind that is the financial aid system was built this way three years ago. And also because at the time, they didn't have the GDPR regulations correctly in place to get more granular data. So the data on financial aid is aggregated on predefined groups.
But when you have aggregated data, you cannot answer all of the questions that you're being asked on certain groups. Let's say it's aggregated on gender, then you of course can't answer aggregation questions on age, or whether there is a difference in financial aid between neighborhoods or whatever. You need to have this granularity to do the correct aggregations. And I feel this is something where I'm constantly in this discussion. If I get aggregated data, then I can't answer questions on granular data, but going into the granular data, then you're entering a, let's say, more difficult discussion, because that has to do with GDPR regulations and personal rights.
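The point about granularity can be made concrete with a small sketch. The records below are made-up values, not real city data, and the field names and the helpers `total_by` and `can_answer` are hypothetical, chosen only for illustration. With granular rows any grouping can be computed; once only the gender totals are delivered, the other dimensions are simply gone.

```python
from collections import defaultdict

# Hypothetical granular records (invented values for illustration only).
records = [
    {"gender": "F", "age_group": "18-29", "neighborhood": "A", "amount": 100},
    {"gender": "F", "age_group": "30-49", "neighborhood": "B", "amount": 300},
    {"gender": "M", "age_group": "18-29", "neighborhood": "A", "amount": 200},
    {"gender": "M", "age_group": "30-49", "neighborhood": "B", "amount": 400},
]

def total_by(rows, key):
    """Sum the aid amount for each distinct value of `key`."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row["amount"]
    return dict(totals)

def can_answer(rows, key):
    """A grouping question is answerable only if every row carries that key."""
    return all(key in row for row in rows)

# Granular data supports any grouping.
by_gender = total_by(records, "gender")
by_neighborhood = total_by(records, "neighborhood")

# What arrives if the source system delivers data pre-aggregated on gender.
delivered = [{"gender": g, "amount": a} for g, a in by_gender.items()]

for key in ("gender", "age_group", "neighborhood"):
    print(key, "answerable from delivered data:", can_answer(delivered, key))
```

The trade-off described in the conversation is visible here: the `delivered` table exposes less about individuals, but questions on age or neighborhood cannot be reconstructed from it afterwards.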
Can I add a follow-up question to this, if I can bite in? Because the question of making the data say something that it doesn't sounds like a very clear ethical question, but it reminds me, I think just last week, there was a really nice article. And they had this report where, I mean, it's a phenomenon that we all know: if you change the metric, you can change the results. So there is a relationship if you consider it one way, but there isn't if you consider it another way. It depends how you define it. And it can be like, oh, I define it measured by hour, or I define it measured by minute. Like Simpson's paradox, right?
Absolutely. And I think this is difficult when you output a certain analysis, certain graphs and certain inference on the data, and that is out there. Then it goes to the city council or the media and it's misinterpreted. And, you know, that's dangerous. And I feel like going into the 21st century, one of the most important skills, something that people need to upskill in, is definitely data literacy. And just now during the COVID epidemic, there's so much misinformation out there. I was just reading an interesting article based on the data from Israel on vaccinations, which is a super example of Simpson's paradox, where you could assume that, hey, vaccinated people are getting more infected than unvaccinated. This is totally not true when you look at the data in a correct, granular form.
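A small worked example may help show how Simpson's paradox arises. The counts below are hypothetical, invented purely to illustrate the effect; they are not the Israeli figures referred to above. Within each age group the vaccinated rate is lower, yet the pooled rate is higher for the vaccinated, because in this made-up population the older, higher-risk group is mostly vaccinated.

```python
# Hypothetical (cases, population) counts, invented for illustration only.
groups = {
    "under 50": {"vaccinated": (20, 200_000), "unvaccinated": (160, 800_000)},
    "50 and over": {"vaccinated": (900, 900_000), "unvaccinated": (500, 100_000)},
}

def rate_per_100k(cases, population):
    """Cases per 100,000 people."""
    return cases / population * 100_000

def pooled_rate(status):
    """Rate for one vaccination status with the age groups lumped together."""
    cases = sum(g[status][0] for g in groups.values())
    population = sum(g[status][1] for g in groups.values())
    return rate_per_100k(cases, population)

# Granular view: compare vaccinated and unvaccinated within each age group.
for name, g in groups.items():
    vax = rate_per_100k(*g["vaccinated"])
    unvax = rate_per_100k(*g["unvaccinated"])
    print(f"{name}: vaccinated {vax:.1f} vs unvaccinated {unvax:.1f} per 100k")

# Aggregated view: the comparison flips once age is ignored.
print(f"pooled: vaccinated {pooled_rate('vaccinated'):.1f} "
      f"vs unvaccinated {pooled_rate('unvaccinated'):.1f} per 100k")
```

The pooled comparison is not wrong arithmetic; it answers a different question, one that mixes the effect of vaccination with the effect of age.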
Sorry. You hear this a lot, that people need to be more data literate, but I would argue against it actually, to be the devil's advocate, because it's impossible to get millions or billions of people to become data literate, right? Nobody says you need to be auto literate. You need to be iPhone literate. You just get the iPhone and you do it, right? Nobody reads the iPhone manual. So I feel like it's the designer's job to make a product that's intuitive and simple that people use. And the iPhone did that brilliantly. And it's the data scientist's job to produce a data science product that is not going to confuse.
I agree with this statement and I certainly agree with that. When you're in a position where you need to share important data and important analysis, it is your responsibility as a data scientist or a teacher or a knowledge sharer to do it correctly and in a way that it can be read and not misinterpreted easily. I absolutely agree on that. But I also think that, like here in Iceland, we have this open course that everybody can take, which is on the basics of statistics or AI. It's super nice. It's actually super cool. But absolutely, can I just please quote comics, like, with great power comes great responsibility.
Data visualisation and aesthetics
It's true. It's just totally like that. I've been thinking about that a lot when, I mean, especially with COVID, but when you scroll through Twitter or LinkedIn and you see all these like plots where I can look at that and have no idea what it's actually saying. But how do you, how do you train people to do that or to be able to communicate through visualization?
Yes. Um, that's a difficult question. I guess when I'm thinking about the products that we've made, they've gone through so many iterations, with minute changes for color schemes, and just color schemes and scale, something like that. That matters a lot, because unconsciously, like, blue is cold, red is hot. This is just something that we think about, and all of these things need to have their space and need to have the discussion. So we test this with our clients, but we also have access to awesome designers that are thinking about the design of digital products. And we have meetings with them when we're building a product, and they just comment on, hey, what are you trying to convey here? Like, these are not the correct colors.
So I would say in one sentence, pay attention to detail and pay attention to aesthetics, because great looking graphs catch the eye and they have their mental space. You know, these old school Excel graphs, no person wants to print them out and put them on the wall, but you want to build graphs that you could almost have in your living room. And interestingly, he came to a conference here in Iceland in 2010, when I was starting out my PhD, and I met him there and he told me, like, if you spend months doing research, spend at least a few days to make sure that your graphs look cool. And that's so true, like, make space for it.
Closing thoughts and resources
Yeah, I know a few people have to drop as we just get past the top of the hour here. So I want to be sure to ask you, Óli, if people have other questions or want to connect with you, what's the best way to do so?
Yes. So you can find me on LinkedIn. I'd be happy to talk there. I think that is the easiest way to be in touch with me. Yeah, it's right there. I think that's my more unofficial place, where I just share my professional thoughts, not as the chief data officer, but more what I think about data science as a whole, and please reach out. I think this forum that we have here is incredibly valuable. And so cool to see people from other countries and all the origins that will be thinking about similar things as I am.
One more quick thing I will ask you, someone asked me to keep making this a recurring question. Are there certain podcasts or resources that you recommend or things you listen to or read?
Yes. So lately I've been focusing more on the human side of things. So for a team leader, and I would argue that this holds true for everybody, not only the team leader or the executives or managers, there is a book called The Five Dysfunctions of a Team that I would argue everybody who is working within a team needs to read. And that has to do with realizing that you can't do confrontation, you can't go into difficult discussions on subject matters, unless you have trust. So if you're working with somebody that doesn't trust you on a certain instinctual level, you cannot have a discussion like that. And these discussions where we have different opinions, where opinions clash, those are extremely valuable. Everybody's voice needs to be heard, and that is along the lines of having a diversified team. But that doesn't bring value unless there's trust within the team.
Such that I'm okay with sharing my thoughts and I'm not going to get harassed or bullied if I have different opinions than the team. So if I were to suggest one thing, then it's The Five Dysfunctions of a Team. Um, with respect to podcasts, I'm really mainly doing Icelandic podcasts, so I'm not going to recommend them to you guys.
And there is a podcast on data science, it's called standard deviation something. I don't remember the name. I'm horrible with names. But I think that, yeah, that is pretty cool. And that being a good person one was reminding me of The Four Agreements. Yes, that's the one. It was The Four Agreements. Yeah. Yeah. Read The Four Agreements. It's short, but it's just so comforting to think about when you're having confrontations, and you will have confrontations at work when you're trying to move against the tide. It's like, hey, it's never about me. It has nothing to do with me. And then it just feels easier.
Sums up nicely your quote from the beginning that data science is more about people than about data. And I feel like if we were going to name this recording something, it would probably be that.
Yeah. Yeah. And thank you for having me. It's been a vivid and lively discussion and all the best to you guys. Thank you. Have a great rest of the day, everyone.