Resources

From Journalist to Coder: Creating a Web Publication with Quarto - posit::conf(2023)

Presented by Brian Tarran.

This is the story of how a Royal Statistical Society writer discovered Quarto, learned how to code (a bit), and built realworlddatascience.net, an online publication for the data science community.

In March 2022, I was tasked by the Royal Statistical Society with creating a new online publication: a data science website for data science professionals. I've been a print journalist for 20 years and have worked on websites in that time, but my coding ability began and ended with wrapping href tags around text and images. That is, until I discovered Quarto. In this talk, I describe how I explored, learned, and fell in love with the Quarto publishing system, how I used it to build a website -- Real World Data Science (realworlddatascience.net) -- and how the open source community mindset helped shape my thinking about what a new publication could and should be!

Presented at Posit Conference, Sept 19-20, 2023. Learn more at posit.co/conference.

Talk Track: Quarto (1). Session Code: TALK-1071


Transcript

This transcript was generated automatically and may contain errors.

I am Brian Tarran. I'm the editor of a website called Real World Data Science. And my talk today is to tell you about how we built Real World Data Science. This is a screenshot of it here, how we built that using Quarto. And I guess some of my journey of discovery and learning as I moved from being a journalist who knew pretty much nothing about any of these tools to somebody who built a website, which I'm quite proud of. And I hope you'll go and explore it.

So I'm a journalist. That means I think a lot about words and the impact that they can have on people and how a few words, you know, strung together in a short phrase can, you know, have a real impact on your life, make you think differently about the world, change your perspective, whatever it might be. For example, "Will you marry me?" Right? Someone says that to you, you're going to feel pretty good about yourself for a while, right? A few years later, maybe things aren't going so well, and you hear the words "I want a divorce." You know, that will knock you for six, right?

Well, I had a kind of experience like that that knocked me back a little bit recently when in March 2022, my boss came to me and said, build us a website. Now, that isn't quite as dramatic as will you marry me or I want a divorce, but it kind of knocked me back a little bit because the background to this was I had applied for a job to build a new online publication for my employer. I'd worked as a print journalist most of my career, but I'd worked on websites, I'd done some design projects. I'd never built a website from the ground up, but I'd been involved in things and, you know, I was feeling cocky. I knew a bit of HTML. I thought I'd be fine. How difficult can it be to build a website?

And then I started to think about all the steps that I would need to go through in order to build the website. And I felt like I maybe had bitten off more than I could chew. But that all changed when I started speaking to some data scientists and someone said to me, "Do you know Quarto?" It's a four-word phrase, and it really changed my perspective on what's possible to do with digital publications now.

The challenge and context

So I'll back up a little bit and tell you a bit about the context of where I was coming from. So the challenge, I worked for the Royal Statistical Society. We're a membership organisation for statisticians. We've been going for about 200 years. And our president at the time felt that we needed to do more to support statisticians that are working as data scientists and also to contribute more to the data science conversation. Statisticians have a role to play in that and we didn't feel like we were doing as much as we could do. So we wanted to create a new publication for data science professionals, something that would focus on real world examples of data science practice and that would generally kind of support knowledge sharing within the community.

When I came into the project, I was thinking I wanted to make sure that the website that we built would be a suitable home for data scientists to contribute to because in my previous role, I was editing a magazine, a statistics magazine called Significance. And I spent about eight years really annoying contributors because they would present these lovingly crafted LaTeX articles and they'd send them to me and I'd say, excellent, thank you very much, could you redo that in Word? And often that was followed by a deep sigh. So I thought, right, if we're going to do this, we're going to build something new. We want to build something that uses tools that data scientists use and that, you know, is kind of speaking your language, essentially.

Discovering Quarto

So I started phoning around, speaking to people I know, and that's when I heard that phrase, do you know Quarto? And the first time I heard it, I just jotted it down in my notebook. I think I misspelt it, but it was the second time that this came up. Another person I was speaking to, somebody else said, do you know Quarto? And I thought, I probably should look into this. So I went across to the website. This is what it looks like sort of now. It was broadly the same back when I started looking at it. And this really intrigued me straight away because, you know, even with my sort of very rudimentary knowledge, I knew that on the left was like the core, the raw underlying code, if you like, and that was generating the outputs on the right. I liked the outputs on the right. I thought they looked nice and clean and crisp, and they had a kind of just a simplicity that I would like in a kind of online publication.

I also liked the fact that I could see that the figures were being generated from the code. I'd spent a lot of time, again, annoying contributors to Significance Magazine by telling them that they hadn't output their figures in the right resolution or the right size or whatever it might be. So this could kind of deal with all those sorts of problems. So I thought, great, I'm going to learn Quarto and also borrow that guy's shirt.

Learning Quarto, VS Code, and GitHub

So step one, went to the Quarto website. Easy, right? Download. I've downloaded lots of things in the past on my PC. Not a problem. This is where things started to get difficult for me. I know what a text editor is. I recognized RStudio and Jupyter. I think I once accessed a Jupyter notebook and accidentally collapsed all the text chunks and code cells, and I didn't know what I was doing and got out of there pretty quick. So clearly I needed to learn more than just Quarto. I needed to learn a tool as well. I picked Visual Studio Code because it was first on the list. So two things now, learn Quarto, learn Visual Studio Code. Okay, fine.

Well, anyway, fortunately, Quarto has made this easy because there's a whole load of tutorials on the site, obviously starting with helping me to get familiar with working with a Quarto document and then transitioning over to looking at how to build a website. So I think maybe it took me several days to a week, and I built my first website. A little demo. It's not the world's best website, but it was my first website, and I felt proud of it. You could see that, you know, I'd started to play around with how we might position, you know, different sort of content blocks and things like that.

But there was more work that needed to be done, not least the fact that this was on my local machine, and a website's only really good if it's actually on the internet so everybody can see it. So I went back to the Quarto guide, and then I found this thing about GitHub Pages, right? So I'd been using GitHub mostly as a file, like a kind of personal file storage. I didn't really understand it, but I was familiar with the word, so I figured GitHub Pages would be the way to go for this. GitHub Action, this sounded exciting. Automatically render your files. Didn't want to have to be doing that manually every time, so I thought I would try this. But I quickly realized looking at a few GitHub Actions that I had no idea what I was doing, and I had no idea how these things worked. So now I was learning Quarto, learning Visual Studio Code, and GitHub as well. I have made a huge mistake.

I really thought that that was it. But I then discovered the section on the Quarto website, which was the gallery, and then this little subsection here about websites. These three websites in particular, the NASA Openscapes site, Mine's Data Science in a Box, and the Quarto website itself, were really helpful to me because as well as looking at the websites, I could go onto their GitHub repos and look at how the sites were built. I could interrogate the actions and see broadly how these things were working and try and figure it out slowly in my head. I mean, the actions I'd built failed multiple times, but eventually I got somewhere with it.

GitHub as a collaboration platform

And as I was digging around in these repos, I realized something that I guess I didn't know about GitHub before, which is that it's not just a personal storage for files. It's a platform for collaboration and co-creation. I don't have a software or engineering background or anything like that, so I never would have had to use it for that reason. But the fact that I could see in each of these repos people working together, collaborating on projects, sharing ideas, solving problems, this really changed my whole perspective about how we could build our website.

So I was thinking about it in a kind of old media sense before, which is we'll build a website, people will come to us and submit articles, and we'll publish them. But I actually thought, well, maybe we could actually create this website together as a community, that it isn't just something that we build and maintain. It's something that everybody can feel involved in and can contribute to.

I wanted to make sure that if you wanted to share your expertise, but your expertise or your passion wasn't necessarily writing articles about data science projects you work on, maybe you could contribute to the maintenance of the site. Maybe you've seen a really cool thing in Quarto that you think we should add to our site, and you could help us do that. Maybe there's something, an extension or whatever, that we could build and share with the wider community. And when I went back to my priorities or the challenge that was set for me, I realized that this idea here about creating a support community around what we wanted to build was already in there. It took this process for it to be highlighted to me and for me to draw it out.

The site launches and the community grows

So if you go to the site now, you'll see that we are up and running. We have people who are contributing to the front end. They're sharing case studies. They are putting together tutorials and explainers about data science tools and methods. They're sharing the stories of their careers and how they got into data science, what they've learned, and giving advice to people who are looking to follow in their footsteps. And we've also created a space for people to talk about the perspectives and big themes and challenges in data science.

But the other thing that started to happen, and this has happened organically, is that people have started to think about, as I was saying, supporting the website beyond just creating content for it. So about a month ago now, a data science student in the Netherlands, Finn Ole Hoene, got in touch with us and said, you know, I'd like to build a template for Real World Data Science so that if people want to write for you, there's a kind of example Quarto document in there which shows all the different code features that you can use with the article. And when they render it, it will look the way it will look on your website. And I think that's really important. If you're thinking about contributing to a publication, thinking about how you're going to design and set out your content, being able to see what the finished product is going to look like is key.

And this really warmed my heart that Finn took this step to do this, because it's something I wanted to do, but I didn't have the kind of know-how to do it. I knew we needed a template of some sort, but I didn't know how to build it. So Finn stepped up and helped us to create it. So that's, I think, showing the strength of the community.

So these are just a sample of the people that have contributed to the site so far. These are only the contributors who have GitHub accounts. There are people that are kind of contributing through different avenues. But now we're starting to see not just people creating content, but people like Finn and Zoe, who I've highlighted, who are helping us sort of work through the site's bugs and all that sort of stuff.

I feel like I'm learning a lot from working with this community. And it's been really beneficial.

What the community taught me

One of the things that I pointed out when I first went to the Quarto site was that there's this live rendering of figures from the document. Well, we quickly figured out with some work that this was going to kill our website rendering speed, so we couldn't use that. So the community helped us to figure out a way around that, where we could still output the figures, but drop them into the article as image files instead. Code annotation. This is a cool new feature that's in Quarto. Clearly, I wasn't reading the updates diligently enough, because Finn had to point this out to me that this was a thing, right, that you could build. And it's a really interactive, tactile way for people to kind of understand what's going on in your code.
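The talk does not say how the figures were exported, so the following is only a rough sketch of the general idea: generate a figure once, outside the site render, save it as an image file next to the article, and reference that file from the article text instead of running the plotting code on every build. The function names `bar_chart_svg` and `save_figure`, the `images/` folder, and the sample data are all hypothetical, chosen for illustration; it uses only the Python standard library and writes a simple SVG rather than using any particular plotting package.

```python
from pathlib import Path
from xml.sax.saxutils import escape


def bar_chart_svg(values, labels, width=400, height=200):
    """Return a minimal SVG bar chart as a string (illustrative only)."""
    if not values or len(values) != len(labels):
        raise ValueError("values and labels must be non-empty and the same length")
    top = max(values)
    if top <= 0:
        raise ValueError("at least one value must be positive")
    slot = width / len(values)   # horizontal space given to each bar
    axis = height - 20           # leave 20px at the bottom for labels
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
    ]
    for i, (value, label) in enumerate(zip(values, labels)):
        bar_h = axis * value / top
        parts.append(
            f'<rect x="{i * slot + slot * 0.1:.1f}" y="{axis - bar_h:.1f}" '
            f'width="{slot * 0.8:.1f}" height="{bar_h:.1f}" fill="steelblue"/>'
        )
        parts.append(
            f'<text x="{i * slot + slot / 2:.1f}" y="{height - 5}" '
            f'text-anchor="middle" font-size="12">{escape(label)}</text>'
        )
    parts.append("</svg>")
    return "\n".join(parts)


def save_figure(svg, article_dir, name):
    """Write the SVG into <article_dir>/images and return a Markdown image reference."""
    images = Path(article_dir) / "images"   # hypothetical folder layout
    images.mkdir(parents=True, exist_ok=True)
    (images / f"{name}.svg").write_text(svg, encoding="utf-8")
    # The article then embeds the static file instead of executing code at render time.
    return f"![{name}](images/{name}.svg)"


if __name__ == "__main__":
    svg = bar_chart_svg([12, 30, 18], ["2021", "2022", "2023"])
    print(save_figure(svg, "my-article", "counts"))
```

The returned string is an ordinary Markdown image reference, so it can be pasted into the article source; the trade-off is that the figure no longer updates automatically when the underlying code changes.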

And I also learned that designing for HTML was a lot harder than it first appeared. We got a design company to kind of come up with the brand and the look of the website for us, but they literally just dropped a load of InDesign documents off and said, you go and implement that now. So I had to figure out CSS pretty quickly. And I did that by, again, going to look at other people's websites, seeing what they've done, and then trying to adapt it for our own purposes.

And because I feel like I've learned so much from the community, I'm also trying to sort of give back and share what I've learned, as basic as it might be. So I've been working for a while now with the American Statistical Association's Justice, Equity, Diversity, and Inclusion group to help support and build their website. And that's been a really good experience. I've been really pleased to be involved in that. And recently, another project came up at the Royal Statistical Society to develop some guidance on data visualization. We were thinking of it as like a kind of notes for contributors for our publications, but following the experience that we had with Real World Data Science and building that, we decided that actually we were going to take a different approach to it. So now, like Real World Data Science, it's a living document, essentially. It's built in Quarto. It has a GitHub repo behind it. All the source files are on there. And we're encouraging people to contribute and develop that. Because data visualization changes all the time. Standards and best practices evolve. And we want this document to be a document that evolves.

From journalist to coder

So this is where I am today. I rather, I guess, bullishly described my talk as from journalist to coder. I don't think I should say I'm a coder. I've done some coding. But I wish I'd learned coding at school. I think my career might have gone in a bit of a different direction. I really do enjoy the technical side of the job and spend far too much time on it. Fortunately, my boss isn't here to hear that. But for people like me, for journalists and others who don't have a coding background, the message I wanted to get across really is that if I can learn it through Quarto, experimenting with this platform, anyone can really.

And also, the discovery about open source collaborative publication has been a really good one for me. Because I, again, am a journalist, old media by training. You're kind of taught to, you know, be quite insular. You only work in small teams or individually. You keep things under wraps. You don't share it with people until you're ready to publish because you don't want anyone to gazump you and steal your story before you publish it. But actually, I think there's a lot of benefit and advantage in being public and open about what it is that you're doing and inviting people in to contribute to that.

And so this is where we are today as a website. So we went from this to this. And please do check out the site and see how it's changed. This was where it was last week. We've published 50 articles so far. We've got 19 GitHub contributors and many more to come, hopefully. 12,000 users and 40,000 views since we launched. So March 2022, "build us a website." That's what my boss said to me. And I can now say I built a website. And I'm very pleased about that. But I think it'd probably be more accurate to say that we, as in the community of people behind Real World Data Science, are building a website. And it continues to develop. And if I had to leave you with a short phrase of my own to think about, it would be, why not join us?

So thank you very much. Here are our details. Thanks for listening.

Q&A

We do have some questions on Slido. If you want to put your own there, please do.

So the first one is, Brian, what is your solution for hosting and deploying your Quarto website? Oh, so we use GitHub Pages, basically. That's how we host it. I know that there's a danger as the site gets bigger, because there start to be caps on the amount of traffic you can have going to the site. But we'll cross that bridge when we come to it.

Could the website template be used for offline deliverables as well, so that the brand matches between the two? Yeah, yeah, you can do that. So it can render out to, I guess, oh no, it actually won't. No, it won't. The styling won't carry across. It'll just be in HTML, the styling. But you can. You can use the template to output to Word docs, but it won't, the theme and stuff like that won't carry across. Well, it would be quite nice to do that, yeah. But I was inspired by Mine's talk about all the things that you can do in terms of journal templates and things like that. So yeah, I have to look into that and research it in more depth.

What is your next Quarto project going to be? My next Quarto project? Well, I am slowly building my own personal website, but that was really, I only kind of set that up as a place where I could break things without worrying about the actual website falling over. But at some point, I would like to finish that and maybe start like blogging and doing things, writing about my interests, like video games and movies and stuff like that.

Yeah, I think as I say, I've kind of become a bit of an advocate for it now. And I think any project that we're going to do in the Royal Statistical Society that involves an online publication at some point, I think we're going to make sure it's rooted in Quarto and in that kind of open source collaborative space.

How are you handling multiple contributors? Meaning do you have different rights levels or permission levels for your contributors? So at the moment, we don't have multiple people contributing directly to the GitHub repo. So how it tends to work is that people might build an article in their own repository and then they'll share the files with us. Over time, yeah, we would look to do that as people become more familiar with it. But actually most of the people we've worked with so far haven't kind of gone all in using GitHub in the way we would like them to, as in being the source of creating the files, submitting the article, editing, reviewing and all that sort of stuff. But that's kind of where we want to get to. That's our vision anyway.

How long did it take you to get the first version online? Oh, okay. It took quite a long time getting it up and running, I have to say. I don't know. I didn't keep count, but I think my email inbox, there's a folder somewhere of all the failed runs of the GitHub Action. And I was wondering, why is it not displaying yet? And then I went in there and it was like, I don't know, a hundred failed attempts. So yeah, it was pretty sad. And I still have problems with GitHub Actions now. We're trying to figure out on this data visualization guidance site, somewhere where there's a thing we want to do with it, but we just can't get the GitHub Actions to work. And there are people more knowledgeable than me who can't figure it out.

Thank you so much for the great presentation. Thanks for listening.