Hezi Buba | Do It For Yourself: Creating a data input platform using R

Transcript

This transcript was generated automatically and may contain errors.

So, I want to share with you a story of evolution, and as you can see, this evolution unsurprisingly starts at sea. This is me: I had just started my master's degree in marine ecology, and I went out to the Red Sea to do some fish sampling. When I was there, I was completely overwhelmed by the beauty of the coral reef, with all its beautiful shapes and colors and really nice-looking fish.

But when I was done sampling and looking at all these crazy fish, and it was time for me to go in and fill in the data, I was greeted by these ugly, unintuitive, hard-to-navigate Microsoft Access forms. This is the starting point for our journey.

This journey is a walk-through of how I moved from Microsoft Access to Google Sheets and R to build our lab's own custom-made database.

And I hope that whether you're using Microsoft Access now, Google Sheets, or any of the other alternatives for making data forms, you will find something among the lessons I've learned that you could use in your own workflow.

Lastly, what I really like is Drive sharing. We use it to share each form by e-mail with a specific user, so that each user can only fill out and change their own form. That's really good, because they don't have to mess around with anything else in the database, and they can't accidentally erase or change someone else's data.

So this is what we get: a folder for each day, with the metadata up there and, as you can see, an individual spreadsheet for each observer.
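The talk doesn't show how the per-observer spreadsheets in a day's folder are brought back together for analysis. Here is a minimal sketch of one way to do it, under stated assumptions. Each sheet has already been read into a list of row dictionaries, and the day's metadata is a single dictionary. The function name `combine_day` and all the field names are hypothetical. In R, the reading and stacking steps might use googlesheets4's `read_sheet()` and dplyr's `bind_rows()`, but the speaker does not say which functions the lab actually uses.

```python
from typing import Dict, List

Row = Dict[str, object]


def combine_day(metadata: Dict[str, object],
                observer_sheets: Dict[str, List[Row]]) -> List[Row]:
    """Stack every observer's rows into one table for the day (hypothetical helper).

    Each output row carries the day's metadata fields, an 'observer' column
    naming the sheet it came from, and then the row's own fields.
    """
    combined: List[Row] = []
    # Sort by observer name so the output order is deterministic.
    for observer, rows in sorted(observer_sheets.items()):
        for row in rows:
            # Later keys win on a name clash, so a row's own field overrides metadata.
            combined.append({**metadata, "observer": observer, **row})
    return combined


if __name__ == "__main__":
    day_metadata = {"date": "2021-06-01", "site": "Site A"}  # hypothetical values
    sheets = {
        "observer_1": [{"species": "Species X", "count": 3}],
        "observer_2": [{"species": "Species Y", "count": 1},
                       {"species": "Species X", "count": 2}],
    }
    for r in combine_day(day_metadata, sheets):
        print(r)
```

Keeping one sheet per observer means each person edits only their own file. Because the combination step is scripted, that separation costs nothing at analysis time.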

User-oriented data forms

To do that, I had to be a bit creative. Let's zoom in on one of the spreadsheets. First, as you can see, the colors are much better. They're not so bright anymore.

I also wanted a strong connection between the environmental data and the fish data. We're doing ecology, which is basically the study of the relationship between species and their environment, so we wanted that relationship reflected in our database and in our forms. To do that, as you can see, columns A and B don't follow the same logic as columns C through H. Some people would say that's unorthodox, or even incorrect.

For those of you who can't see it: in columns A and B, each row does not represent an observation. The data there are in long format. But once you realize that you can go from long format to wide format fairly easily using the tidyverse, a whole world of possibilities opens up for you.
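To make the long-to-wide idea concrete, here is a minimal sketch under an assumed layout, since the talk doesn't show the sheet's exact structure. Each sheet row is a list of cells. Column A holds an environmental variable name and column B holds its value. Fish observations start at column C. Cells may be empty (`None`). The helpers `pivot_env_wide` and `attach_env` and every field name below are hypothetical. In R, the pivot step would typically be tidyr's `pivot_wider()`. This stdlib version just shows the logic: collapse the (variable, value) pairs into one wide record, then attach it to every fish row.

```python
from typing import Dict, List, Optional

Cell = Optional[str]


def cell(row: List[Cell], i: int) -> Cell:
    """Return the i-th cell of a row, or None if the row is shorter than that."""
    return row[i] if i < len(row) else None


def pivot_env_wide(rows: List[List[Cell]]) -> Dict[str, Cell]:
    """Collapse the long (variable, value) pairs in columns A and B into one wide record."""
    wide: Dict[str, Cell] = {}
    for row in rows:
        variable = cell(row, 0)
        if variable:  # skip rows where column A is empty
            wide[variable] = cell(row, 1)
    return wide


def attach_env(rows: List[List[Cell]],
               fish_header: List[str]) -> List[Dict[str, Cell]]:
    """Return one record per fish row (column C onward), each carrying the environmental data."""
    env = pivot_env_wide(rows)
    observations: List[Dict[str, Cell]] = []
    for row in rows:
        fish_cells = [cell(row, 2 + i) for i in range(len(fish_header))]
        if any(fish_cells):  # skip rows with no fish data at all
            observations.append({**env, **dict(zip(fish_header, fish_cells))})
    return observations


if __name__ == "__main__":
    sheet = [
        ["depth_m", "12", "Species X", "3"],
        ["temperature_c", "24.5", "Species Y", "1"],
        [None, None, "Species Z", "2"],
    ]
    for obs in attach_env(sheet, ["species", "count"]):
        print(obs)
```

The form stays convenient for the person filling it in, and the reshaping is done once, in code, on the analysis side.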

For example, on another project where we do quadrat sampling, I took the exact sheet of paper that surveyors take to the field and copied it almost verbatim into a digital form. That made the transition from the real-world survey data to the digital data much more intuitive for my users.
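A paper-style form usually takes the shape of a grid, and a grid can be turned into tidy records afterwards. Here is a minimal sketch of that step, assuming a hypothetical layout: the first row holds column labels after a blank corner cell, and each later row starts with a row label followed by one value per column. The talk doesn't describe the real quadrat sheet, so `grid_to_long`, the labels, and the values are all illustrative. In R, this reshaping corresponds to tidyr's `pivot_longer()`.

```python
from typing import Dict, List, Optional


def grid_to_long(grid: List[List[Optional[str]]]) -> List[Dict[str, str]]:
    """Convert a paper-style grid into one record per filled-in cell (hypothetical layout).

    grid[0] is the header row: a blank corner cell, then the column labels.
    Every later row is a row label followed by one value per column label.
    Empty cells (None or "") are dropped.
    """
    header = grid[0][1:]
    records: List[Dict[str, str]] = []
    for row in grid[1:]:
        row_label, values = row[0], row[1:]
        for column_label, value in zip(header, values):
            if value not in (None, ""):
                records.append({"row_label": row_label,
                                "column_label": column_label,
                                "value": value})
    return records


if __name__ == "__main__":
    form = [
        ["", "Q1", "Q2"],      # hypothetical quadrat labels
        ["Coral", "5", ""],
        ["Algae", "2", "7"],
    ]
    for record in grid_to_long(form):
        print(record)
```

Because the conversion to long format is mechanical, the digital form can mirror the paper sheet without compromising the data used for analysis.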

And I encourage you all to think about how user-oriented your data forms are at the moment. I know I've been taught, too, that you should make your data as computer-readable as possible for analysis, visualization, et cetera. But at the same time, all of us were taught how easy it is to reshape data. So we could all benefit from more user-oriented data forms. When forms are more intuitive to fill in, users make fewer errors, and our data end up much better.
