
Jesse Sadler | Learning and using the tidyverse for historical research | RStudio (2019)
My talk will discuss how R, the tidyverse, and the community around R helped me to learn to code and create my first R package. My positive experiences with the resources for learning R and the community itself led me to create a blog detailing my experiences with R as a way to pass along the knowledge that I gained. The next step was to develop my first package. The debkeepr package integrates non-decimal monetary systems of pounds, shillings, and pence into R, making it possible to accurately analyze and visualize historical account books. It is my hope that debkeepr can help bring to light crucial and interesting social interactions that are buried in economic manuscripts, making these stories accessible to a wider audience.
View materials: https://github.com/jessesadler/rstudioconf-2019-slides
About the author
Jesse Sadler
I am an early modern historian interested in the social and familial basis of politics, religion, and trade. I received a Ph.D. in European History from UCLA in 2015 and have taught courses on cultural and intellectual history of early modern Europe and the Atlantic. My research investigates the familial basis of early modern capitalism through archival research on two mercantile families from Antwerp at the end of the sixteenth and beginning of the seventeenth century. I am currently working on a manuscript that argues for the significance of sibling relationships and inheritance in the development of early modern trade. My manuscript places concepts such as patriarchy, emotion, exile, and friendship at the heart of the efficacy of long-distance trade networks and the growth of capitalism.
Transcript
This transcript was generated automatically and may contain errors.
Alright, thank you everyone for being here for the last talk. Thank you to everyone at RStudio for putting on this great conference and for inviting me to give this talk, which will be a bit of a change of pace, as I am much more of a historian than a programmer.
So my name is Jesse Sadler and I'm currently a lecturer at Loyola Marymount University in Los Angeles.
And today, instead of going into some technical details about tidy eval, though I have used tidy eval, I am going to talk about how the tidyverse helped me to learn to code, and then how it helped me to create something: a package that solved a problem that I had long had as a researcher in history.
Research background
As a historian, my research is based around studying sibling relationships and inheritance in early modern Europe and in the development of early modern capitalism. Specifically, I look at two merchant families from Antwerp and their social relationships, the sibling relationships and their inheritance.
Basically, I spend a lot of my time reading letters from the 16th century, reading letters, lawsuits that were made between family members especially, testaments and account books. And so I do this in archives in the Netherlands and Belgium. There's even an archive that I work on that's in a castle in Belgium. So that's fun.
Learning to code
So I had long been interested in learning to code or to do something with what we call in the humanities, digital humanities. And I had been interested in this, but I didn't know where to start. The power of coding from the outside was very alluring.
As I learned more about it, I was interested in the ability to build something that was set up exactly for what I wanted to do. I had used spreadsheets. I was very much interested in the separation of data entry, because I have to enter most of my own data from the archives. And separating that from data analysis, as Jenny Bryan has talked about a lot. And so I was interested in it, but I didn't know what to do.
And a lot of what people say when people come up and they say, oh, I want to learn to code. They say, OK, well, pick a project. Well, I didn't know what a project was. OK, I'll pick a project. What's a project? What programming language am I going to need? What do I even need? I don't know what I need.
So this really kind of kept me for a long time in trying to take what I knew from history, from the humanities, and learning a whole new skill. I didn't really know what to do.
And so I think that the whole TidyTuesday project is really interesting in this regard, because it demonstrates, OK, here's something that you can do. And it's manageable, and it's reasonable, and you can get it. But I didn't know that.
So after a long period of not doing anything, I said, OK, I'm going to actually put my foot down and learn this whole digital thing. So I said, OK, I think I have a project that would be a project. I have in the archives 6,000 letters, or over 6,000 letters, that were sent to a merchant, Daniel van der Meulen, over a period of 22 years, from 1578 to 1600. And I said, OK, I could probably map these.
And so I hemmed and hawed, and I thought about, OK, is there a GUI application that I could just put some data in and get something out? And I said, no, I'm going to learn how to do it by myself. I had some help from a friend, and I looked around, and I said, OK, why not this whole R thing?
How the tidyverse helped
Basically, well, how did I learn how to program in R, and how did the tidyverse help me? That's my talk.
So the short story is that I basically just read Garrett Grolemund and Hadley Wickham's R for Data Science. I sat there, and I had it pinned to my browser, and I would read a chapter, and I would take notes, and then I would read it again. And so that's pretty much how I learned.
But there's a lot of sophisticated arguments about nonstandard evaluation and whether the syntax in tidyverse is better than it is in base R and all those things, and I'm not going to talk about that, because as a beginner, it didn't really matter to me which way was better. It just mattered that there was a way.
And the tidyverse, I think, really helped me in that it was nice and bounded, and you could do things with it, but it was also limited. Yes, dplyr is a huge package, but you don't need to know, at least at the beginning, about most of dplyr. So I really enjoyed the bounded nature of it. You read in data with readr, you do something with it with dplyr, and then you visualize it with ggplot, and you're like, yes, I have something.
So that ability, it showed me that, okay, one, I can learn to do this, so that's good, but two, that it would be valuable for me to put in the effort to learn how to do this.
So what really scared me, I think, at the beginning of learning to code in R was one of the things that makes R great, which is all the packages. There's so many different ways to do all these different things. But I was not interested in that. And so at the beginning, what I really was doing was I took solace in library(tidyverse) and just typed that in, knowing that everything would be okay.
And so here I really have to thank the entire R community, RStudio community, tidyverse community for helping me learn, for all the great resources that there are, for keeping me excited about learning more. I started a blog where I went through sort of my learning process at jessesadler.com, and the response that I've gotten from that has been way larger than I would have ever expected.
In the first project that I was working on, the sf package came out, or was really getting going, right as I was doing that. So it was great. I really had fun delving into sf and into GIS. And so far, I have a map of letters up to 1591. It's not the code that's holding me back. It's going through and cataloging all those letters. When you have to actually input your own data, the coding seems easier. At least it takes less time.
The debkeepr package
So what I want to do for the rest of the talk is to discuss a new project that I've been working on, a package that I have built that deals specifically with a historic issue. And so we're really going down to narrow use case, but hopefully you'll bear with me here.
So the package that I have developed, I've called debkeepr, which is short for Double Entry Bookkeeping, or Double Entry Bookkeeper, which I know sounds amazing already. So this is on GitHub right now. There's more I want to do with the package, but it's basically ready. And it has a pkgdown website with three vignettes and descriptions of how to do things.
The historical problem: non-decimal currency
So before I get into what that is, let me discuss the historical issue that I was seeking to solve. So I'm interested in the history of accounting. And that might not sound that interesting. Or you may be an accountant. I don't know. But the reason I'm interested in the history of accounting is because accounting deals with social relationships. It's not the accounting part, it's the social relationships.
And why accounting is particularly interesting for me, in my context, is that in the early modern period, in most of the Low Countries where I do my work, inheritance was perfectly equal, meaning that all male and female heirs received the exact same amount of inheritance. And when you're dealing with merchant families, that means accounting has to go into figuring out how much the estate is worth and then how much each individual person gets. And it has to be exactly right.
So merchant families' inheritance involves accounting. Now one problem with that, other than it's accounting, is that the currency that they use, as you can see on the right, is essentially pounds, shillings, and pence. One pound, if you can't remember pre-decimalization, equals 20 shillings. One shilling is equal to 12 pence. So this creates a number of problems because it's not decimal. Firstly, arithmetic calculations are cumbersome. I never thought I would be doing so much arithmetic as a historian. So I have pages and pages of just doing handwritten arithmetic.
It's not hard, but more importantly, probably, is how to deal with tripartite non-decimal values in a database. And so this has the same issues if you're going to try and think about it in R or any computer. Computers are decimal. Firstly, you have three separate units to make up one value. The units have non-decimal bases, and to make things worse, the bases can be different. So 20 and 12 is the most usual, but there's also 20 and 16, and there's 60 and 12, and all sorts of different things.
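To make the problem concrete, here is a minimal base R sketch (not code from debkeepr) that collapses a three-part value into its smallest unit. The helper name to_smallest_unit and its arguments are hypothetical, chosen only for illustration.

```r
# Hypothetical helper: collapse a three-part value into its smallest unit.
# `bases` holds the number of middle units per top unit and the number of
# small units per middle unit (20 and 12 for pounds, shillings, and pence).
to_smallest_unit <- function(l, s, d, bases = c(20, 12)) {
  (l * bases[1] + s) * bases[2] + d
}

# The same three numbers stand for different amounts under different bases.
to_smallest_unit(1, 10, 6)                     # bases 20 and 12: 366
to_smallest_unit(1, 10, 6, bases = c(20, 16))  # bases 20 and 16: 486
```

A single decimal column cannot carry this information on its own, which is why the bases have to travel with the value.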
The LSD class
In the debkeepr package, I created a class that I call lsd, which comes from the Latin names for pounds, shillings, and pence: Libra, Solidus, and Denarius. So you're getting some Latin in here. So what does it do? Basically, it makes these values a numeric vector of length three. It also has a bases attribute that keeps track of what the bases for the shillings and pence are, and then they are stored as lists so that you can use them as a list column and you can have multiple in one object.
So this is basically what it looks like here from a vector, deb_as_lsd(), deb as in double-entry bookkeeping, and then you can also input these values as separate columns in a spreadsheet or something, a CSV, and then bring them together with a function called deb_lsd_gather().
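The structure described here can be sketched in a few lines of base R. This is an illustration of the idea only, not the debkeepr implementation: the constructor name new_lsd_sketch and the class name lsd_sketch are made up for the example.

```r
# Illustrative constructor, not the debkeepr source.
# Each value is a numeric vector of length three; the values are kept in a
# list, and the bases for shillings and pence are stored as an attribute.
new_lsd_sketch <- function(values, bases = c(20, 12)) {
  stopifnot(
    is.list(values),
    all(vapply(values, function(x) is.numeric(x) && length(x) == 3, logical(1))),
    is.numeric(bases),
    length(bases) == 2
  )
  structure(values, bases = bases, class = "lsd_sketch")
}

accounts <- new_lsd_sketch(list(c(10, 13, 4), c(5, 6, 8)))
attr(accounts, "bases")  # the bases travel with the values
length(accounts)         # two values held in one object

# Because the object is a list, it can sit in a data frame as a list column.
ledger <- data.frame(id = 1:2)
ledger$amount <- accounts
```

Keeping the bases as an attribute means any function that receives the object can check which system it is working in before doing arithmetic.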
So the coolest thing that I first started off with, and you might not think this is cool, but it was really cool to me, one might say magical, was that I could normalize the values. So I could add up pounds, add up shillings, add up pence, and then get what that value actually is. And when I got this to work, and it's just pretty much arithmetic behind the scenes, it really was nice. Or you can add these together within a function call.
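The arithmetic behind normalization can be sketched in base R as follows. This is an assumption about the general approach, not the package's code; normalize_lsd is a hypothetical name and the sketch only handles non-negative whole numbers.

```r
# Illustrative normalization (not the debkeepr source).
# Carry excess pence into shillings, then excess shillings into pounds.
normalize_lsd <- function(x, bases = c(20, 12)) {
  x <- unname(x)
  shillings <- x[2] + x[3] %/% bases[2]
  c(
    l = x[1] + shillings %/% bases[1],
    s = shillings %% bases[1],
    d = x[3] %% bases[2]
  )
}

# Add up each unit separately, then normalize the column totals.
entries <- rbind(c(10, 13, 4), c(5, 16, 9), c(0, 12, 11))
totals <- colSums(entries)  # 15 pounds, 41 shillings, 24 pence
normalize_lsd(totals)       # 17 pounds, 3 shillings, 0 pence
```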
So let's do some harder arithmetic. So we can compare the debkeepr method with the method from the third edition of the Encyclopedia Britannica from 1797. So this is how they actually talked about doing arithmetic. So this is in the arithmetic article. So here you can see on the left a fairly complex way in which they multiply 15 pounds, 3 shillings, and 8 pence by 32. They break it down: they multiply it by 8, and then multiply it by 4. And they're carrying over the units sort of behind the scenes. Division is even harder.
So there are two examples here, and I'm going to concentrate on the one on the right, where we get to different bases. So this is a weight. It's dividing 345 hundredweight, 1 quarter, and 8 pounds by 22. And if you try and figure out what they're doing here, it's really confusing. And they have little dots to figure out where they are, but it's not helpful. And so we have to change our bases. And that's what you see there at the bottom of the third line: the bases are 4 and 28, because there are 4 quarters in a hundredweight and 28 pounds in a quarter. I didn't know this either before I looked at this.
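The multiplication from the encyclopedia can be checked with a short base R sketch. The function multiply_lsd is a hypothetical helper written for this example, not a debkeepr function, and it assumes non-negative values and a whole-number multiplier.

```r
# Illustrative multiplication: convert to pence, multiply, then split the
# result back into pounds, shillings, and pence.
multiply_lsd <- function(x, by, bases = c(20, 12)) {
  pence_per_pound <- bases[1] * bases[2]
  total <- ((x[1] * bases[1] + x[2]) * bases[2] + x[3]) * by
  c(
    total %/% pence_per_pound,
    (total %% pence_per_pound) %/% bases[2],
    total %% bases[2]
  )
}

multiply_lsd(c(15, 3, 8), by = 32)  # 485 pounds, 17 shillings, 4 pence

# Multiplying by 8 and then by 4, as the encyclopedia does, gives the same value.
multiply_lsd(multiply_lsd(c(15, 3, 8), by = 8), by = 4)
```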
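The division can be sketched in base R in the spirit of long division: divide the largest unit, then carry the remainder down into the next unit using the bases. The name divide_units is hypothetical, and this is an illustration rather than the encyclopedia's exact procedure or the package's code.

```r
# Illustrative division with custom bases: divide the top unit, convert the
# remainder into the next unit, and repeat. Assumes non-negative whole numbers.
divide_units <- function(x, by, bases) {
  top <- x[1] %/% by
  carry <- (x[1] %% by) * bases[1] + x[2]
  middle <- carry %/% by
  carry <- (carry %% by) * bases[2] + x[3]
  # Any fraction that is left over stays in the smallest unit.
  c(top, middle, carry / by)
}

# 4 quarters in a hundredweight, 28 pounds in a quarter.
divide_units(c(345, 1, 8), by = 22, bases = c(4, 28))
# 15 hundredweight, 2 quarters, 22 pounds
```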
Visualizing historical account books
Okay, so it's nice that we get to not have to do all this, like fifth grade math. I mean, I never knew that fifth grade math would be so important. But what about larger account books?
So this is a picture of the person whose account book I've worked on the most, and the account book itself: Jan della Faille de Oude, or the Elder. So his account book, if you look at the inside, it looks like this. And it's about 300 pages, and then there's two of them. They cover the period from his death on November 8, 1582, until the end of December, 1594. Even then, the heirs hadn't received all their inheritance, and in fact, the heirs continued to argue over the inheritance until basically they died in the 1610s, and then their children also continued to argue over it.
So what I want to do, I don't have much time, but I just want to show some, instead of showing the code to do this, show some visualizations. So basically, this is things that I couldn't do because I wouldn't want to put all this data into a spreadsheet because I didn't know how to do anything with it, and now I can.
Here you have an overview of the beginning of the estate, when he passed away. The red circles show the accounts that are in debt to the estate, and the blue those in credit to the estate. And so the biggest blue circle shows basically the amount by which Jan de Oude's assets were greater than his liabilities: 41,112 pounds, 10 shillings, 5 pence. And remember, the pence mattered because they had to be equally divided.
Here you just have a little line chart showing how and when the nine heirs received their inheritance. This is my version of a financial chart, but it goes over many years. You see that before his death, he owed his children money for their maternal inheritance, which makes things even worse. Then there's a long gap because of the problems that they had, and then at the end, you still see that they didn't become even.
So just one last thing just to show what's possible. This is a network graph, again, showing the accounts and how they're related to each other. So this has obviously gotten me a long way from where I started, but basically I was able to get here because of the skills that I gained by looking into the tidyverse, and that's pretty much what I used in the package.
So thank you for listening, and thank you for having such a great conference.
