Colin Gillespie | How to win an AI Hackathon, without using AI

Transcript

This transcript was generated automatically and may contain errors.

hackathons, without using AI hacking. I will explain the basics of data, spatial data. Northumbrian Water has two sites in the UK: one by the border in Northumbria in England, and one down in the south.

We have spatial data on that. We had time series: data at 15-minute intervals. We had leakage estimates, so how much water was actually lost due to leakage. We had flow rates.

So in the UK, you can imagine how the water authorities, the water companies, have evolved over the last few decades. And we have something called a DMA. It's a collection of zones of postcodes; some are quite big, some are quite small. And the flow rate is just how much water is passing through, how much water is being used by people.

We have the reported leaks: it's basically an Excel spreadsheet with some dates, times, postcodes and sort of what the person said over the phone. We have pipe size, so that was measurements in metric, so centimetres, millimetres, Imperial UK, Imperial US, Imperial Roman, you name it. We have pipe location. That was more an indication rather than an exact measurement.

Again, pipes have been laid over the UK for decades, so they knew there was a pipe somewhere on this road and they just hoped for the best. We've got pipe type, so you've got plastic, you've got metal. Turns out there's also asbestos, which, to be honest, worried me somewhat, but asbestos, we were told, was fine as long as no one drills into it, so we're all safe.

We've got social data: that's basically saying who's living in the area. Is it students, is it families, is it business, what sort of mixture? We've got weather: rain, sun (not so often), but how much rain we've got. We've got business: what sort of businesses are nearby, what their usage is. And then we've got stuff. What I'm guessing is someone had a week to put this data together. The first four days, they got all the stuff on the right, and then they spent the last day just putting in other stuff, and I can't remember what the other stuff was, but it's just general stuff, so it might be interesting. So this is what we had.

So our solution is not to try and predict where leaks are going to happen, but to reduce engineering time. We can get that two hours down to about 10 or 15 minutes.

Building the Shiny app in half a day

We're now in day two of the hack, and in the second half of the day we have to do a presentation of what we've built. So we've got half a day to do an app, and there's myself and Seb. So what we need to do is decide what we're going to make and decide what to bluff, or lie, or pretend, or whatever you want to call it.

So for example, if we can demo a single DMA, we don't need to show two, three, four, five. We can do one, we don't need to do more. If we can display pipe location, we don't need to tell them about pipe type: that's obvious, we must be able to do it. So we've decided what to make and what to bluff.

So this is a screenshot of the winning app. This first screenshot contains lots of bluffing. So for example, in the top left-hand corner, there's a little person that indicates we've got authentication. We can do authentication; we do authentication for our customers. We did not do authentication in that half a day. We've got some high-priority DMAs, all bluffed. We can do that. We didn't need to do that in a day.

Here we've got a screenshot of what an engineer could possibly see. So this was a long column of different graphs and different data. The first couple of graphs were nice and busy and interactive. The last few graphs were a bit more static in nature. And if you look at that little blue slider, when that was clicked on, the graph went grey, and that indicated that it wasn't going to be part of an overall report the engineer would take with them. There was also an idea about taking a picture and then being able to upload it onto the app. Again, we didn't do this. It was just an indication. And this was all done using lots of dplyr, ggplot and Shiny, and all done in half a day.

So the app has now been used in production. I'm the second person on the right, the tall baldy chap. And I would love to show you screenshots, but because it's got national infrastructure data, it's got locations of pipes, we can't just show screenshots easily. So that was when we won it two years ago. We also won it this year as well. So sorry for bragging, but we won this year's hackathon as well. And we also won that one as well. So I'm not really sorry at all about bragging. I'm now just bragging completely.

Summary and lessons learned

It's obvious, but think about your problem. And if someone told me before the hack, I would say, of course you'll think about your problem. But I really didn't. I jumped in with both feet. I'm good at dplyr, I'm good at R, I'm good at building models, and I went straight for that. And that's not the best way. Talk to people, ask them what the problem really is.

Also, don't send me to a hackathon without others. So the one hackathon that we've lost is the only one that I've been to by myself. So that's somewhat embarrassing. So I'm now in this strange situation where if the team go on a hack, I'm not sure if I want them to win or lose, because, yeah.

Anyway, so thank you very much for listening. Sorry I couldn't join you in San Francisco. It's really painful watching Twitter and just seeing all this "isn't our conference absolutely wonderful". So it's painful over in Newcastle. And by all means, please come over and say hi, not to me, but to Seb and the others at the stall, and thank you very much for listening. Thanks.

Thank you so much, Colin. That was great. Do you have any advice on where to find some hackathons in North America?

Not so many. Not so much. Typically, one should know would be my answer. I think you've caught me with that question. I suspect user groups would know, so if you're part of an R user group or a data science user group, typically any hackathons would approach the user group first of all for a bit of advertising. So that's where I think you'd find them. And once you sort of stumble into that field, you then find them everywhere.

Awesome. Well, thank you so much for joining us from the UK. We really appreciate it. So this concludes this track. I believe lunch is going to be served somewhere out there around 1pm. So thank you so much.

Colin Gillespie | How to win an AI Hackathon, without using AI | RStudio (2020)

Featured software

rstudio

Shiny

tidyverse