
Shelmith Kariuki | rKenyaCensus Package | RStudio

The rKenyaCensus package contains the results of the 2019 Kenya Population Census. The census exercise was carried out in August 2019, and the results were released in February 2020. Kenya leveraged technology to capture data during cartographic mapping, enumeration and data transmission, making the 2019 Census the first paperless census to be conducted in Kenya. The data was published in four PDF files (Volume 1 to Volume 4), which can be found on the Kenya National Bureau of Statistics website. In that form, the data was open and accessible but not usable, so it needed to be converted into a machine-readable format. This data can be used by the government, non-governmental organizations and any other entities for data-driven policy making and development. During the talk, I will explain the reasons behind the development of the package, take you through the steps I took during the process, and finally showcase an analysis of certain aspects of the data.

About Shelmith: Shelmith Kariuki is a Senior Data Analyst based in Nairobi, Kenya. She is an RStudio Certified Tidyverse trainer (https://education.rstudio.com/trainers/), currently working as a Data Analytics consultant with UN DESA. She previously worked as a Research Manager at Geopoll, and as a Data Analyst at Busara Center for Behavioral Economics. She also worked as an assistant lecturer in various Kenyan universities, teaching units in Statistics and Actuarial Science. She has extensive experience in data analysis using R. She co-organizes a community of R users in Nairobi (https://www.linkedin.com/feed/hashtag/nairobir/) and in Africa (https://twitter.com/AfricaRUsers). One of the missions of her community work is to increase the number of R adopters in Africa. She is very passionate about training and about using data analytics to drive development projects in Africa.

Transcript

This transcript was generated automatically and may contain errors.

It has been 14 years since the phrase "data is the new oil" was first used. This phrase was coined by Clive Humby, implying that just as oil needs to be refined into gas, chemicals, plastics, etc., data must be broken down and analysed for it to have value. But what if it is not? Of what value is the data if it is open, accessible, but not usable?

Hi, my name is Shelmith Kariuki and I'm here to talk about a small project I embarked on in March 2020. rKenyaCensus is a package that contains the 2019 Kenya Population and Housing Census results. The Population Census exercise was held in August 2019, and this was the first time that Kenya was using technology for both cartographic mapping and enumeration. The survey questionnaire covered areas such as population characteristics, agriculture, education, information and communication technology, henceforth referred to as ICT, etc. This data was published in four PDF files, which can be found on the Kenya National Bureau of Statistics website.

Now, is this data open? Yes, it is available on the website. Is it accessible? Yes, anyone can download it, but is it usable in its current format? But why do we even care? Why is usability important?

ICT and education during COVID-19

Let us look at one of the aspects covered in the survey, ICT. We are going to look at the influence that ICT has had on education in Kenya during the COVID pandemic. But first, a few statistics. According to a new report by UNICEF, two out of three school-age children do not have internet access at home. In fact, the number of unconnected pupils is higher in Africa and Asia as compared to the rest of the world.

The first COVID case was announced in Kenya on 12 March and two days after, the government announced closure of schools and institutions were encouraged to move their learning online. The Ministry of Education also made a directive for the curriculum teaching to be delivered via radio, TV, YouTube and other digital platforms. And so, online learning began. Kenyans adapted very quickly. One university trained their students and staff and also entered into a partnership with a mobile network operator to ensure that their staff received cheaper data bundles specifically for e-learning.

Several other institutions did their best to heed the government's directive. Entertainment programs were aired on TV during school hours and young kids quickly learned how to use tools such as Zoom and Google Hangouts. But then, online learning was not going well for everyone. Leaders from semi-arid areas came out claiming that online learning in their areas was nothing but a fallacy because their areas lacked basic infrastructure.

So, what basic infrastructure is required for smooth online learning? First, one will need a laptop or a mobile phone to access materials in electronic format. These devices have to be connected to the internet so that you can actually access this information, and electricity is needed to power both.

A lot of students took to social media to complain that online learning was not working for them. A lot claimed that they lacked the minimum basic requirements needed to partake in online learning. Electricity has also been a problem in Kenya. Even for us who have access to electricity, it's not reliable. I'm actually hoping that I'll be done with this presentation by the time lights go off. You can visit the Kenya Power Twitter handle to just see more of the complaints that we channel through that platform complaining about blackouts day in, day out.

Manzi and Charlie: two extremes

Many families do not have the devices and internet access that enable children to take part in remote schooling. Let's take an example of two extreme cases. Manzi comes from an average family. She has a bedroom of her own in their house. Manzi's father pays for internet at the end of each month and he recently purchased a new laptop for her. Manzi attends Shule Kubwa, which is a university based in the capital city of Kenya, Nairobi.

Charlie, on the other hand, comes from a humble background. His parents are not able to pay school fees for him and so he has to depend on the area's bursary fund. And after emerging top in the Kenya Secondary National Examination, Charlie got a scholarship to also attend Shule Kubwa. So Charlie and Manzi, together with some of the friends they grew up with, are in the same class. And while in school, they are subjected to the same conditions, the same lecture room, the same lecture period, the same exam, and so on.

But what happens now that they're at home? Well, Manzi is actually very happy. She gets to seclude herself in her room during class sessions. And while on her break, she's able to enjoy her favorite comedy on Netflix. But then for Charlie, he's actually depressed. He's even thinking of repeating the current class because he feels he has been left behind.

After six months, schools reopened for a small subset of students, those preparing for the national examinations. In Kenya, national examinations take place at the end of each year. Now, an effective education response to the COVID-19 pandemic needs to be built on evidence and data. For example, what do we know about our population? How many families own radios and televisions? And do children actually use and actually learn from these online platforms? And that's why the development of rKenyaCensus was important.

Building the rKenyaCensus package

Development of this package involved three steps. The first one was downloading the data from the Kenya National Bureau of Statistics website. The second one involved scraping, cleaning, and manipulating the data. I used the tabulizer package to scrape the data, and tidyverse and some base R functions for cleaning and manipulating it. The third step was the actual package development. And I'm very grateful to the authors of devtools, testthat, and usethis, who have made package development very easy. I'm also grateful to Hadley and Jenny for the R Packages book, which is written in such simple language that it is easy to understand the concepts covered.
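The scraping step can be sketched roughly as follows. This is a minimal illustration, not the code used to build the package: the file path, the page number, and the helper `matrix_to_df()` are hypothetical, and the toy matrix holds made-up values rather than census figures. It assumes that `tabulizer::extract_tables()` returns one character matrix per detected table, with the header in the first row, which will not hold for every PDF page.

```r
# Hypothetical helper: promote the first row of a scraped character
# matrix to column names and return the rest as a data frame.
matrix_to_df <- function(m) {
  df <- as.data.frame(m[-1, , drop = FALSE], stringsAsFactors = FALSE)
  names(df) <- m[1, ]
  df
}

# Stand-in for one scraped table (made-up values, not census figures),
# so the sketch runs without the PDF, Java, or tabulizer installed.
toy_matrix <- matrix(
  c("County", "Male",  "Female",
    "Alpha",  "1,200", "1,300",
    "Beta",   "800",   "950"),
  ncol = 3, byrow = TRUE
)
toy_table <- matrix_to_df(toy_matrix)

# Optional real extraction; only runs if tabulizer and the file are available.
pdf_path <- "census-volume-1.pdf"  # hypothetical local path to a downloaded volume
if (requireNamespace("tabulizer", quietly = TRUE) && file.exists(pdf_path)) {
  # pages = 10 is illustrative; pick the page that holds the table you need
  tables <- tabulizer::extract_tables(pdf_path, pages = 10, output = "matrix")
  first_table <- matrix_to_df(tables[[1]])
}
```

Everything comes out of the PDF as text, including the numbers, which is why a separate cleaning pass is needed before any analysis.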

For consistency, the data sets were named in a similar manner to the tables in the published documents. So V1_T2.1 is actually Table 2.1 in Volume 1. Data cleaning and manipulation skills are really important in this task. I had to deal with a lot of trailing and leading spaces, extra white spaces, blank spaces, and other weird characters. I also needed to restructure the data by generating new variables that distinguished counties from sub-counties. I'm grateful to the R4DS community, especially Scott and Stephen, who were of great help when I encountered errors while coding. I still battle with regex to date.
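A cleaning pass of the kind described above might look like the sketch below, written in base R. The toy rows, the column names, and the rule that county rows are written in upper case are all assumptions made for illustration; the talk does not say how counties were told apart from sub-counties in the real tables.

```r
# Toy rows imitating text scraped from a PDF table
# (made-up names and values, not census figures).
raw <- data.frame(
  name = c("  ALPHA  COUNTY ", "North Ward*", " O'Hill", "BETA", "  East  Side "),
  households = c("1,200", " 300", "450 ", "2,000", "750"),
  stringsAsFactors = FALSE
)

# Drop stray symbols, collapse repeated white space, trim both ends.
clean_text <- function(x) {
  x <- gsub("[^[:alnum:][:space:]'-]", "", x)
  x <- gsub("[[:space:]]+", " ", x)
  trimws(x)
}

clean <- data.frame(
  name = clean_text(raw$name),
  # Keep digits only, then convert the text to numbers
  households = as.numeric(gsub("[^0-9]", "", raw$households)),
  stringsAsFactors = FALSE
)

# Assumption: county rows are in upper case, and the sub-county rows
# that follow belong to the most recent county row.
is_county <- clean$name == toupper(clean$name)
clean$admin_level <- ifelse(is_county, "County", "SubCounty")
clean$county <- clean$name[is_county][cumsum(is_county)]
```

The `cumsum()` trick fills each county name down over its sub-county rows; it relies on the first row of the table being a county row.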

I also created a small Shiny app where anyone, especially those who do not use R, could download the data in either .csv or .xlsx format.

What the data reveals about online learning readiness

Now could some of this data have helped us to determine whether Kenya was ready for online learning? Let's have a look. This map shows the proportion of households where the main type of lighting is electricity. The data can be found in Volume 4, Table 2.19. So in essence, in the package, it's going to be V4_T2.19. The darker the region, the higher the proportion.

The darkest region is the capital city of Kenya, which is Nairobi. Most regions in the semi-arid areas, that's the light yellow regions, use either burnt wood, torches, or paraffin lamps as a source of light. Charlie comes from one of these areas. In case anyone is curious, these maps were created using sf and ggplot2.
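The calculation behind such a map can be sketched in a few lines of base R. This is not the code that produced the maps in the talk: the county names, the column names, and the four shading classes are hypothetical, and the values are made up. It only shows how a per-county proportion is derived and then binned so that a higher proportion gets a darker shade.

```r
# Toy county-level counts (made-up values, hypothetical column names).
lighting <- data.frame(
  county = c("Alpha", "Beta", "Gamma"),
  households = c(1000, 500, 800),
  electricity = c(900, 100, 400),  # households whose main lighting is electricity
  stringsAsFactors = FALSE
)

# Proportion of households whose main type of lighting is electricity
lighting$prop <- lighting$electricity / lighting$households

# Bin the proportions into shading classes: the higher the proportion,
# the darker the class. The break points are illustrative.
lighting$shade <- cut(
  lighting$prop,
  breaks = c(0, 0.25, 0.5, 0.75, 1),
  labels = c("lightest", "light", "dark", "darkest"),
  include.lowest = TRUE
)
```

To draw the map itself, a table like this would be joined to county boundaries and passed to ggplot2 with the proportion mapped to the fill colour.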

The pattern is the same when we look at access to computers or laptops and usage of the internet. Areas where much of the population is rural tend to be at a disadvantage. There's no way Charlie could have reaped the benefits of online learning as much as Manzi would have, even at a bare minimum. If this kind of analysis had been available beforehand, the insights obtained could have helped the government and stakeholders make better decisions regarding learning during the COVID pandemic.

The data has been used by the International Center for Humanitarian Affairs, which is part of the Red Cross, to assess COVID-19 high-risk areas in Kenya. It's my prayer that most entities will adopt this data to develop insights that will help drive development in Kenya.

Conclusion

And to conclude, we have only touched on the ICT part of this data. In essence, only two out of 74 tables. Imagine how much information we could get if we analysed all the census data, and imagine how much more information and insights we could get if we converted all the datasets that are lying in various websites into machine-readable formats that are easy to analyse. Insights obtained from this data will help the country prepare itself better for the next calamity.

R is an open source language, and there are so many learning resources online. It's up to us as R developers and R users to use the knowledge and skills we have to develop tools that will help us in solving problems that exist in our society. Thank you for listening to me. Questions are welcome.
