Resources

Motley Crews: Collaborating with Quarto - posit::conf(2023)

Presented by Susan McMillan, Wyl Schuth, and Michael Zenz.

Adoption of Quarto for document creation has transformed the collaborative workflow for our small higher-education analytics team. Historically, content experts wrote in Word documents and data analysts used R for statistics and graphics. Specialization in different software tools created challenges for producing collaborative analytic reports, but Quarto has solved this problem. We will describe how we use Quarto for writing and editing text, embedding statistical analysis and graphics, and producing reports with a standard style in multiple formats, including web pages.

Presented at Posit Conference, between Sept 19-20 2023. Learn more at posit.co/conference.

Talk Track: Elevating your reports. Session Code: TALK-1157

Transcript

This transcript was generated automatically and may contain errors.

Welcome to Motley Crews: Collaborating with Quarto. My name is Wyl Schuth. I'm going to be presenting with my colleagues Susan McMillan and Mike Zenz.

This is our Motley Crew. Pictured on the right is the director of our office and a drummer, Mike Fleger, who's not able to join us today. We are a team of analysts for the Liberal Arts College at the University of Wisconsin-Madison. We analyze student outcomes for administration and faculty. And an important thing to know about us is that none of us are data scientists or developers.

We work in a collaborative environment where our different types of analyses and professional backgrounds often contribute to a single analytic product. Our backgrounds are, I'm a historian, Susan's a political scientist, and Mike is a philosopher.

I conduct mainly narrative analyses of curriculum and university academic policy. I'm also our primary editor, and I've been writing in Markdown for years.

The fragmented workflow problem

As we began to work together and increase our collaboration, we quickly realized that we had a significant problem. And our problem was that we had fragmented workflows.

So every time that we attempted a collaborative analysis, we were dealing with an assemblage of data obtained using query tools like Hyperion, Toad Data Point, and SQL Developer, which we then analyzed using Excel or R as the situation called for. We plotted it. We merged it with narrative analyses that were written and drawn from flat files stored in Obsidian or other PKMs, which we then tailored together in Word and published as a PDF because, my goodness, do people like their PDFs. And then we distributed it to our audiences via email.

This workflow, we quickly realized, was unsustainable. We needed to find a better way for us to work together. Keep in mind also, this is happening after our university has gone remote during COVID, and so we're each working remotely. We don't have an opportunity to meet together to do these things. We have to unite our work in some way.

So what we're going to talk to you about is how we abandoned this and what we learned as we came to a new way of working together. I'm going to share with you why the form of the work matters. Susan's going to speak about how our new way of working together reduced barriers for the non-technical collaborators that we have. And Mike's going to share with you how this new way has enhanced the accuracy of the work that we do.

Why form matters

Design hurdles at the end of a long analytic project are mentally exhausting, and they can be very frustrating. As I shared, we were assembling draft and final analytic products by merging analyses that were drawn from different areas into a Word document. And we quickly realized that, like the James R. Thompson Center, which you see here pictured, this became a maintenance nightmare for us.

Every single time that we were asked to add a plot, when we realized that we needed to edit a caption, or when we thought, you know what, this section of this analysis really belongs somewhere else because it amplifies a message that we need our audiences to get, it was a maintenance nightmare. We revisited it every single time that we had one of those.

You may be familiar with the adage, form follows function. This comes from a principle of design that was advanced by Louis Sullivan, who's a prominent Chicago school architect. Sullivan went so far as to say that form following function was a law of human design. You can see his ethos in many buildings that he designed around Chicago.

This is the auditorium building, which you can visit in a short walk down Michigan Avenue. This photo was taken in 1890, about a year after the building was completed. And with respect to Sullivan, who was certainly at the forefront of his field when the 19th century turned to the 20th century, we realized that for folks who do the kind of work that we and many of you do, this simply doesn't go far enough.

So what we'd suggest instead is that the form of the work is its function. Like the John Hancock Center, the structure of the analysis that we create in our new way of working together is visible in the final analytic product. It's visible to us on the back end when we're looking at the source, when we can see how we structured the query, when we can see the R code that we use to analyze the data that we obtain from our query, when we can see the comments that'll explain to us seven months down the line when we've forgotten why we queried things the way that we did, what we were trying to do, and the other things that we tried. None of that's visible to our audiences, but it's there.

And because we can more easily accommodate other requests that our audiences have, that structure is something that we can fall back on to do our work better. So the form of our final analysis facilitates its function now. And the function of producing our analysis fashions its final form.

Embracing Conway's Law and a unified workflow

How did we get there? We realized we had to embrace Conway's Law. Melvin Conway is a computer scientist, and he suggested that any organization that attempts to design a product or a system is going to inevitably in some way replicate that organization's structure within that product.

The example is, if you have a four-person group working on something and you need to build a bypass, you're going to have a four-component bypass. That's the basic concept. And what we realized is that we have disparate backgrounds that draw on different analytic styles, and we needed to find a way to bring those together.

We needed to unify the production environment for our analyses, and we needed to unify our workflow. And so that's what we did. We adopted a unified workflow that allows us to leverage RStudio and Quarto as the single place where we now query, process, analyze, and visualize data for our audiences, and where we write our narrative analyses and our executive summaries that we can share with those audiences to help them make sense of the data that we've given them.

Because we're working oftentimes remotely and in different places from each other, and we're working on different components of the analysis, it's really important that we have version control. You may use GitHub for version control, we use GitLab for our version control, so that when I'm working on doing some copy editing in a nearly final draft, and Mike and Susan are tweaking plots or doing some other things, maybe accommodating a last-minute request from a stakeholder, we're not getting in each other's way, and we're not having to recompile a PDF document to incorporate those changes.

And once we've reached the point where our document is finished and we feel like the analysis is ready to share with our audiences, we can publish it online using the pages feature of GitLab, and our audiences can view the document there, and if we have other things that need to be updated, it can be updated.

Reducing barriers for non-technical collaborators

Now Susan's going to share with you how this new workflow has reduced technical barriers for our collaborators.

Thanks, Wyl. I'm Susan McMillan. I am the social scientist in our group, and despite many years of writing scripted code for multiple software packages, I embody the non-technical user. Computers still are mysterious to me. I don't know any Markdown. I never heard of Markdown, actually. I didn't know what a CSS file was.

I worked in a company before I came to this job where we did that painful copy-paste process for every document we produced, analytic document, and it could be a really big document. But I produced a table, or a graph, or text, and I gave it to the document production team, and the document production team put all the disparate pieces together, and then they shipped it out to the client.

We are the document production team now. We're also the analysis team. We're everything, so this was not tenable. So when I got here, Mike was kind of our RStudio missionary, and he said, no, we're going to do this in R Markdown. I was like, okay. So I started learning R Markdown, which was probably a little bit painful for Mike, because we were still producing PDF files, and so I also had to learn a little bit of LaTeX, which was completely mysterious, but we were trying.

And then, at the RStudio conference last year, we were introduced to Quarto, and we're like, well, there it is. So we went home and very rapidly turned our process into one that begins and ends with RStudio and Quarto, and this is what happened.

We are able to use the visual editor, so the point of this slide is only in the top corner, with the switch, the toggle between visual and source code, and the table is there, but the point here is that it's familiar enough to people who use Word, or some other word processor, but we'll just go with Word, that they can do things in this file.

So our supervisor, who's our expert at both policy and curriculum, needs to be able to write and edit in our documents, and he's able to do that here. He's able to see, oh, if I want a bold text, I can do it. If I want italicized text, I can do it. If I want to add a bullet list, I can do that.

So he doesn't want to learn R or R Markdown, and frankly, he's a little bit afraid of seeing the code. Like, I might put the table in like this. He doesn't want to see that, right? He wants to see just this. And it works for both of us, same file.

And as Wyl said, we can now publish our results to webpages. Mind you, the only point of this slide is that I published it as a webpage. I don't know HTML. I don't know anything about publications. I just pushed it to GitLab, and it was magically there.

So that's the whole point of this slide, that I was able to do it. And so Quarto massively reduced the technical barriers. There's not a chance I could have published a website on my own. Well, it would have taken a lot, but I mean, yeah.

We're now able to publish to webpages. And our education people love their PDF files, but we kind of cold turkeyed them. We just said, now, here, the reports are going to be on webpages now, and it worked.

One of the other benefits to us is that we can control access to our reports now. We used to just send out a PDF file, and there it goes, right? Every now and then, we go to a meeting, and people are clutching printed versions, but now we can update, right? I'm sure it's happened to you. You send out a PDF report, and you find some kind of critical error, and then you have to republish it and resend it, and now you have a version control problem. Now Quarto and publishing via GitLab solves that problem for us.

In addition, our users get improved functionality, right? We have, in our college, I don't even know how many majors, 60-something majors, and a lot of our tables are long, because it's by major, by this, by that. In a PDF, you're pages later. Now we can have our users sort by column, and this table doesn't particularly have searchability, but they can search. And so deans might have a different question, for example, from a faculty member in a department. They can sort the table or search the table for the particular piece of information they're looking for without having to rifle through pages of PDF output.

So despite their reluctance to embrace some non-PDF format, it's been a really good thing for us, because we've literally unified our workflow. We can all work in the same file, and our users get improved functionality, whether they wanted it or not.

Improving accuracy with packages

So I'm going to talk about accuracy. And so one thing about me is, six years ago, I was teaching ethics, had never written a line of R code. I've written quite a bit since then. But no formal development training, I've run Linux at home, but that's about it.

And so what I want to say here is, you can improve accuracy without being a developer, essentially. So to do that, I'm going to talk about a little example.

So let's imagine that Susan and I are each given a task to produce a report that involves, in some way, computer science majors. And we need to know how many there were in 2019 and 2020, right? These might be separate reports. Sometimes we do reports apart from each other, don't talk to each other. And what happens, God forbid, we come up with different numbers. Oh, no. This is embarrassing. It's bad.

So Susan comes up with 2,266. I come up with 2,389, right? This could affect funding for these departments. It could affect the analysis, right, because there's different students, it seems, in these. And it is, again, really embarrassing. So I'm going to go through this little example just to show you how you might solve some of these problems of accuracy.

So why is this happening? OK. So the reason in this case is because the term or semester data is coded in our system, right? I'm sure you have lots of codes in your work. In this case, it's a four-digit code. So for instance, 1204 is spring 2020. What a great code.

So this is how it works. So the century is the first one. So 0 is the 20th century, 1 is the 21st century. The second two digits are the last two digits of the year. But this is not the academic year or the calendar year. And this is actually going to come up as an issue, right? So I'm not sure what this is. It's a year. And then the last digit is fall, spring, summer, right? 2, 4, and 6, which, you know, I think they were thinking, what if there's a winter term someday? We want to have room for it. So there you go. 2, 4, 6. I love codes.
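
As a minimal sketch of the layout just described, the R function below splits a four-digit term code into its three parts. The function name `decode_term` and the field names it returns are hypothetical and are not taken from the team's packages. The year it returns is simply the year written into the code, which, as the speaker notes, is not necessarily the academic year or the calendar year.

```r
# Hypothetical helper: split a four-digit term code into its parts.
# Layout (from the talk): 1 century digit, 2 year digits, 1 term digit.
decode_term <- function(code) {
  code <- as.integer(code)
  stopifnot(!is.na(code), code >= 0L, code <= 9999L)

  century   <- code %/% 1000L           # 0 = 1900s, 1 = 2000s
  year_part <- (code %/% 10L) %% 100L   # middle two digits
  term_part <- code %% 10L              # 2 = fall, 4 = spring, 6 = summer

  seasons <- c("2" = "fall", "4" = "spring", "6" = "summer")
  season  <- unname(seasons[as.character(term_part)])
  if (anyNA(season)) stop("unknown term digit: ", term_part)

  list(coded_year = 1900L + 100L * century + year_part, season = season)
}

decode_term(1204)  # season "spring", coded_year 2020
```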

OK. So what's happening here? Ambiguity in academic year is the reason why we came up with different numbers. So an academic year, and this is, I think, a federal definition, and it's certainly the one for our reporting, runs from summer to spring. So the 2019-2020 academic year is from summer of 2019 to spring 2020, right?

I see I'm the wrong one, of course. I was thinking the codes were going to help me. The codes are never helpful, right? So I was looking, oh, look, there's 20, 2020. That's the 2019-2020 academic year. It's not, of course. And so I was unaware of the correct way to find academic year. And so that's why I came up with the incorrect answer.
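
The summer-to-spring definition can be written down once so that everyone applies it the same way. The sketch below is illustrative only: `academic_year` is a hypothetical name, not the team's function, and it takes a calendar year and a season rather than a raw term code, so it assumes nothing about how the code's year digits map to calendar years.

```r
# Hypothetical helper: label the academic year for a term, using the
# summer-to-spring definition from the talk.
academic_year <- function(calendar_year, season) {
  season <- match.arg(season, c("summer", "fall", "spring"))
  # Summer and fall open an academic year; spring closes it.
  start <- if (season == "spring") calendar_year - 1L else calendar_year
  paste0(start, "-", start + 1L)
}

academic_year(2019, "summer")  # "2019-2020"
academic_year(2020, "spring")  # "2019-2020"
academic_year(2020, "summer")  # "2020-2021"
```

Under this definition, summer 2020 belongs to 2020-2021, so counting every term with "20" in its code toward 2019-2020 would pull in the wrong students.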

So how do we fix this? Of course, packages. So for you, that's obvious, yeah. I don't need to talk too much about packages. But my point here is that I believe anyone can write them. So I believe, this is just a belief I have, that if you can write a function, and yes, you can write a function, I think anyone who can write R code can write a function, then you can write a package.
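
To make the step from function to package concrete, here is a minimal sketch using base R's `utils::package.skeleton()`, which generates a package directory from existing objects. The package name `demoutils` and the function `term_season` are hypothetical placeholders. The talk does not say which tooling the team used to build their packages, so this shows one possible route, not their method.

```r
# Hypothetical example: wrap one function in a package skeleton.
env <- new.env()
env$term_season <- function(code) {
  # Last digit of the term code: 2 = fall, 4 = spring, 6 = summer
  seasons <- c("2" = "fall", "4" = "spring", "6" = "summer")
  seasons[[as.character(as.integer(code) %% 10L)]]
}

# Build the skeleton in a temporary directory so nothing is left behind.
pkg_parent <- file.path(tempdir(), "pkg-demo")
dir.create(pkg_parent, showWarnings = FALSE)

utils::package.skeleton(
  name = "demoutils",
  list = "term_season",
  environment = env,
  path = pkg_parent,
  force = TRUE
)

# The generated directory includes DESCRIPTION, NAMESPACE, R/ and man/.
list.files(file.path(pkg_parent, "demoutils"))
```

From there, the remaining work is filling in the DESCRIPTION file and the generated help pages before building and installing the package.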

And furthermore, you can share them via GitHub or GitLab, and that allows for easy installation and updates. So if I update a package, now Susan or Wyl can download that new version, and all of our reports change automatically with a fix. There are sometimes fixes.

So we have one package that's called aimutils. It has a function called ACADYear. And if you just type it in, you get transparently the code that goes into it. This is ugly R code, but you can just see it. It's there. It's great for learning. So if we have a new analyst, they can now see how this process is done. You can get documentation for this.

And this is the ugly stuff you have in RStudio. I think it's beautiful. But you can also produce beautiful websites with documentation and organize them. You get indices that show you all the functions that are involved in here. And so ACADYear is a really simple function. It just converts a term to academic year. We have some complex functions that actually took a long time to figure out and are difficult when doing onboarding with new people, and so this allows us to do that.

So finally, you can have different types of packages. So we have two types. So one is public. So the AIM utils has innocuous stuff like ACADYear, things that don't sort of expose any data or any sort of proprietary information. We also have an AIM queries package that has queries in it that allow you to access our campus databases, and that one requires credentials, so it sits behind authentication in GitLab. So we're able to separate those into two different packages.

Recommendations

OK. So we want you to make your motley crew a success. Just, well, I hope. We're kind of a success. I don't know. So here's how we do it.

Adopt the unified workflow. So we're now going from query to publication in a single environment. It's really exciting for us. Draw in non-technical users with, you know, the RStudio Visual Editor and also the online publication features of GitHub and GitLab. And finally, please use packages. You know, it is a great feature of R. It is amazing. They allow for accurate, consistent, well-documented processes, and, you know, we should all be using them for so many things.

Q&A

I'm going to combine two of these questions here. The first part is, do people still print your HTML reports? Yes. A few. Yeah. They look fine, though. They do.

Follow-up question. Do you give them only the HTML or do you use Quarto to sort of produce an HTML and a PDF from the same thing? No. No. We just... Here's the link. And we repeat that answer. Here's the link. And here's how you can engage with the data, which is something that you can't do on this piece of paper.

Did you guys crash any of the modules of STAT 327, which is apparently your university's R class, at any point? Or did you first pick up R independently? On the job. On the job. Yeah. Totally on the job. We taught ourselves. I taught myself really badly. A lot of base R. But then I came to the 2019 conference, and it changed my entire perspective on things. So this is a great resource.

So this one here leans a little bit into something that I also have been curious about. So the question is very specific. Do you integrate Learn@UW/Canvas data into your analytics, as well as campus data?

So we primarily, if we're looking at student outcomes, we're analyzing data from our degree audit system, which is essentially final grades for students and their performance over the entirety of their academic careers. Things that are captured in the learning management system, and we use Canvas, are not things that we typically incorporate, because those belong to the instructors. If an instructor shares that with us, then yes, of course. Right? If they have a specific question about their class. So things students do.