Yihui Xie | pagedown: Creating beautiful PDFs with R Markdown and CSS | RStudio (2019)

The traditional way to beautiful PDFs is often through LaTeX or Word, but have you ever thought of printing a web page to PDF? Web technologies (HTML/CSS/JavaScript) are becoming more and more amazing. It is entirely possible to create high-quality PDFs through Google Chrome or Chromium now. Web pages are usually single-page documents, but they can be paginated thanks to the JavaScript library Paged.js, so that you can have elements like headers, footers, and page margins for printing. In this talk, we introduce a new R package, pagedown (https://github.com/rstudio/pagedown), to create PDF documents based on R Markdown and Paged.js. Applications of pagedown include, but are not limited to, books, articles, posters, resumes, letters, and business cards. With the power of CSS and JavaScript, you can typeset your documents with amazing elegance (e.g., a single line of CSS, "tr:nth-child(even) { background: #eee; }", will give you a striped table, and "border-radius: 50%;" gives you a circular element) and power (e.g., HTML Widgets).

Materials: https://bit.ly/pagedown

Transcript

This transcript was generated automatically and may contain errors.

Good morning, everyone. If you have ever used R Markdown before, you can probably recognize me. Since about 2014, I have created and co-authored a series of packages related to R Markdown, like rmarkdown itself, and then bookdown in 2016 and blogdown in 2017. So I'm responsible for a lot of things "down", except the current government shutdown.

So today I'm going to introduce a new member of the R Markdown ecosystem named pagedown. This is joint work with an excellent collaborator, actually, I should say the best collaborator I have ever met in my life, named Romain Lesur. It's a French name, sorry, I can't really pronounce it precisely.

So just before I get started, a word of caution. This package is still very young, so I would treat it as experimental. Don't set your expectations too high. After this talk, you may feel, oh, this is great, we can finally throw away LaTeX and Word. Unfortunately, the answer is not yet. There are still many rough edges.

Why pagedown?

Well, why did I choose to work on this pagedown package? Basically, pagedown allows you to skip Word or LaTeX and create paged documents from web pages, that is, HTML pages. I personally have a strong belief in HTML; sometimes I just say, "In HTML and the web I trust." The reasons are these. First of all, HTML is very accessible because you only need a web browser, right? Pretty much everyone has a browser, and the output from pagedown is an HTML page, so pretty much everyone can access it without installing any special software.

And it's also very easy to embed interactive and rich media on web pages, and it's also much easier to parse HTML pages than PDF documents. For example, if you have a table on your HTML page, technically it's very easy to parse the data in the table. So the bottom line is that I think HTML and CSS will eventually catch up with LaTeX in typesetting, but it will be difficult for LaTeX or Word to catch up in other aspects of HTML, such as the interactivity.

Installation and usage

So the installation is pretty simple. Although I have released an initial version of pagedown to CRAN, I still strongly recommend that you test this package from GitHub, so just install it from GitHub. Note that this package requires a recent version of Pandoc, Pandoc version 2, which is currently bundled in the preview version of RStudio. And because the output is HTML, I would recommend that you use Google Chrome or Chromium to view or print the HTML pages generated from this package.
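The installation step described above could look roughly like the sketch below. It assumes the remotes package as the installer (the talk does not name one), and the helper names install_pagedown_dev() and has_pandoc2() are made up for illustration. Nothing is installed until you call the helper yourself.

```r
# Hypothetical helper: install the development version of pagedown from GitHub.
# Assumption: 'remotes' is used as the installer; the talk does not say which tool to use.
install_pagedown_dev <- function() {
  if (!requireNamespace("remotes", quietly = TRUE)) {
    install.packages("remotes")
  }
  remotes::install_github("rstudio/pagedown")
}

# Hypothetical helper: check whether rmarkdown can find Pandoc 2.0 or later.
has_pandoc2 <- function() {
  requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available("2.0")
}
```

Run install_pagedown_dev() once (it needs network access), then has_pandoc2() to see whether the Pandoc requirement is met.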

So the usage is pretty simple. If you're an RStudio user, you can just create a new R Markdown document with a pagedown output format through the menu File -> New File -> R Markdown -> From Template, and you can find quite a few pagedown templates in that list. For those who do not use RStudio, I'm not sure if you're aware of the name of this conference. But that's totally fine. You don't have to use RStudio. You can use any editor you like. All the example or template documents are in the source code of this package, so you can find them in the installation directory of the package or on GitHub.

To preview the HTML output from pagedown, for complicated reasons, as Hadley would say, the most reliable way to preview a pagedown document is through the weird RStudio add-in named Infinite Moon Reader, or equivalently you can call a function to preview it. And to generate a PDF from your HTML document, you can just open that HTML document in Chrome and print it to PDF there. There's also a function named chrome_print() in this package, but at the moment it doesn't work perfectly well. It will be improved in the future.
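As a rough sketch of the preview and print steps just described: the file name my-doc.Rmd is hypothetical, and the preview function is assumed to be xaringan::inf_mr(), which is the function behind the Infinite Moon Reader add-in (the talk does not name it). Given the caveat about chrome_print(), printing from Chrome by hand remains the fallback.

```r
rmd <- "my-doc.Rmd"  # hypothetical file name; replace with your own document

# Live preview (assumption: xaringan::inf_mr() is the function behind the add-in).
# Only attempted in an interactive session, because it starts a local web server.
if (interactive() && file.exists(rmd) &&
    requireNamespace("xaringan", quietly = TRUE)) {
  xaringan::inf_mr(rmd)
}

# Render to HTML, then try printing to PDF through Chrome.
# chrome_print() was described as not fully reliable at the time of the talk.
if (file.exists(rmd) && requireNamespace("pagedown", quietly = TRUE)) {
  html <- rmarkdown::render(rmd)
  pagedown::chrome_print(html, output = "my-doc.pdf")
}
```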

Output formats and examples

Next I'm going to show you some of the existing output formats and examples in this package. The first one I would like to show you is the paged HTML document. This is based on a JavaScript library named Paged.js, which is basically a library that implements the W3C specs for paged media in CSS. Those specs allow you to typeset your HTML pages through CSS, but the problem is that no web browser really supports them, so Paged.js did that job. It implemented the specs so that you can view a paged document in your web browser.

The name of the output format is pagedown::html_paged. Just specify that as your output format and click the Knit button or use the Infinite Moon Reader, and you will see a paged HTML document in your browser. This is just a screenshot created from the PDF generated in Chrome. So that is what a paged HTML document looks like. If you open the link here, pagedown.rbind.io, hopefully you can see in your web browser that you've got a paged HTML document. In this document you can have things like the title page of your book, a table of contents, several chapters, and math equations. If your eyes are really good, you can probably see the running header on the fourth page. So you can have headers, page numbers, footers, all kinds of elements that you would see in a PDF document.
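To make the output format concrete, here is a minimal sketch that writes a small R Markdown file using pagedown::html_paged and renders it if pagedown and Pandoc 2 are available. The title, author, and body text are placeholders, not taken from the talk's example.

```r
# Write a minimal R Markdown source to a temporary file.
rmd <- file.path(tempdir(), "paged-example.Rmd")
writeLines(c(
  "---",
  'title: "A Paged HTML Document"',   # placeholder title
  'author: "Jane Doe"',               # placeholder author
  "output: pagedown::html_paged",     # the output format named in the talk
  "---",
  "",
  "# Introduction",
  "",
  "Some text, and an equation: $E = mc^2$.",
  "",
  "# Methods",
  "",
  "More text."
), rmd)

# Render only when pagedown and a recent Pandoc are available.
if (requireNamespace("pagedown", quietly = TRUE) &&
    rmarkdown::pandoc_available("2.0")) {
  out <- rmarkdown::render(rmd, quiet = TRUE)
  message("HTML written to: ", out)
}
```

Opening the resulting HTML file in Chrome or Chromium should show the paginated view.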

So besides creating paged documents, you can also create other applications like a business card. So on the left-hand side, that is just a full R Markdown source. So you can see there's title, author, output format, business card, and then the body of your card. So the output would be something like the picture on the right-hand side. So this is basically an unsolicited business card I made for Mr. Shiny, also known as the president of RStudio.

You can also create a resume with pagedown. The output format is pagedown::html_resume, and the R Markdown source basically looks like this. You can specify a sidebar in the aside part, where you can have several subsections and bullet lists. In the main area, you can have an arbitrary number of blocks, like listing your education and working experience. Each block is a level 2 header, and inside each block you can list the details. So that is what a resume produced by pagedown looks like. If you click that link, you can also see a web page that looks almost identical. By the way, this is actually a real resume. Although this is just an example in the pagedown package, it is real. So if you are looking for a Ph.D. student with strong computing skills and knowledge in bioinformatics, you may consider this person.
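The structure described above (an aside, a main area, level 2 blocks) might be sketched as follows. This only writes a skeleton source file. The names, entries, and the exact layout of the details inside each block are assumptions, so compare it with the resume template shipped in the package before relying on it.

```r
# Write a skeleton resume source; all personal details are placeholders.
rmd <- file.path(tempdir(), "resume-example.Rmd")
writeLines(c(
  "---",
  'title: "Jane Doe\'s resume"',
  "author: Jane Doe",
  "output: pagedown::html_resume",
  "---",
  "",
  "Aside",                 # level 1 header: the sidebar
  "=====",
  "",
  "Contact Info",          # level 2 header: a subsection in the sidebar
  "------------",
  "",
  "- jane@example.com",
  "- github.com/janedoe",
  "",
  "Main",                  # level 1 header: the main area
  "====",
  "",
  "Education",             # level 2 header: one block in the main area
  "---------",
  "",
  "### Example University",  # assumed layout for one entry inside a block
  "",
  "B.S. in Statistics",
  "",
  "Example City",
  "",
  "2018"
), rmd)
```

Calling rmarkdown::render(rmd) with pagedown installed would then produce the HTML resume.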

And you can also create HTML posters with pagedown. Currently I have included two poster formats. One is called poster_relaxed. The other is called poster_jacobs. I will show you what they look like. So this is poster_relaxed. Actually this CSS style is borrowed from another package named ReLaXed. It's not an R package, but a Node.js package. I only ported their CSS into pagedown. So yeah, hopefully you can see our lovely Karl Broman in the middle, smiling at you. So that is what a ReLaXed poster looks like.

And then we've got poster_jacobs. I found this style on a LaTeX website, so I think it is probably very famous in the LaTeX community for creating posters. Basically I saw that appearance, and I wrote all the CSS from scratch. In one evening, after my little kids went to bed, it took me about two or three hours to write the CSS. The total number of lines of CSS is only a little over 100, so the CSS is really simple. And in case you are curious, the technique behind the CSS is called CSS Grid, which allows you to arrange elements on your poster on a grid.

You can also write letters with pagedown if you want. There's a format named html_letter. As an example, I wrote a letter of recommendation for Emi Tanaka, who is a really cool hacker, I believe, in our community. The R Markdown source also looks simple. It's from me, to the hiring manager in the School of Ninja at the Hacker's University, and the address is 404 Not Found Road in Undefined City in the NA state. And then, yeah, the letter would look like this. You can open that in your web browser and see the real letter.

Since this is just HTML, as I said, there are many advantages. For example, my slides are actually HTML, so I can embed the whole letter in my slides through an iframe, and you can see the letter live here. The one thing I'm proud of is that this is probably the first letter in history that contains a GIF. This GIF shows Emi's talent, and the last GIF is actually my description of Emi. She's just such a cool hacker.

Well, actually, despite all these advances in technology, I feel a little sad that, I mean, we receive so many letters and e-mails every day, and we are no longer excited like ten years ago, right? Ten years ago, you would be excited to receive an e-mail, and now you would be excited to receive a real letter.

Besides letters, you can also write books with pagedown. There's a format named book_crc, which is a format for the publisher Chapman & Hall/CRC. Just to give you a quick overview, this is what the book would look like. Actually, this is a real book: the bookdown book reproduced with CSS and HTML.

Journal format and the JSS

Last, I want to talk about a journal format. Before that, I want to mention the talk from Katharine Mullen at the useR! 2014 conference, where she talked about the history and future of the Journal of Statistical Software, JSS. JSS was originally founded by Jan de Leeuw. In Katharine's slides, she actually presented the original e-mail from Jan de Leeuw, where you can find his original proposal. That was in 1995. 1995. That was quite early. It was only a few years after I learned how to put on my own pants and stopped wetting my bed at night.

In the proposal, he mentioned four points. He would like to create a journal that is electronic and freely available, done in HTML, interactive, and peer reviewed. I want to highlight two points in that proposal. First, the journal should be done in HTML, and I would say that this is quite possible now because of Paged.js. Second, the papers should contain interactive content, and that is also possible now because we have interactive things like HTML widgets and Shiny apps. In fact, my excellent collaborator, Romain, has actually recreated the style of JSS through HTML and CSS. So I would like you to guess which one is HTML/CSS and which one is LaTeX. It's hard to tell, right? It's just amazing, I should say.

Reflections on typesetting and technology

Last, I want to digress a little bit in this technical talk to talk about something totally non-technical, because I feel it is very important. Last year I read this book by Sigmund Freud, Civilization and Its Discontents, and I should say it heavily influenced me, because I could relate it to many things in my real life. Like PDF and its discontents, Word and its discontents, typesetting and its discontents, journal publication and its discontents. There are just so many things I'm not happy with.

Basically, the main thing he talked about in this book was the friction between civilization and the individual. The friction comes from the individual's instinctive freedom and civilization's demand for conformity and repression of instincts. Civilization is built upon control, beauty, hygiene, and order. If you look at these words, you may often think of your journal editor. Control, beauty, hygiene, and order. And in the last paragraph of that book, Freud said that if you survey the aims of cultural endeavor and the means that it employs, you would come to the conclusion that the whole effort is not worth it, and that the individual will be unable to tolerate the outcome of civilization. So I'd like to repeat, maybe the whole effort of typesetting is just not worth it. It's just not worth the trouble.

Last year the quote that influenced me most was this: "We become what we behold. We shape our tools and our tools shape us." Sometimes I just feel really confused. When you feel confused, you can hear what little kids say and observe what they do. Last Saturday I told my older son that I would come to this conference and give a talk. Every day when I floss or brush his teeth, he watches a video, so last Saturday I had him watch my talk from last year's rstudio::conf, which was about blogdown. After he finished, his only comment was, "I want to watch Blippi." In case you don't know Blippi, this is Blippi and his garbage truck. So yeah. I thought my last year's talk was awesome. Of course not as important as Blippi.

So just think about some journal typesetting guidelines. Sometimes you are required to use code, samp, preformatted, keyboard, variable, environment, option, command, file, package, URL: so many LaTeX commands. Are they really important? I really don't know, and I doubt it. Sometimes, as adults, we just forget what our original Blippi was. So why do we write journal papers? Why do we write books? To share knowledge, right? Why do we write letters? To show our care for our friends or family members, right? So will Markdown and CSS save us? Maybe. Because Markdown is very limited. But on the other hand, I believe it's very good to have constraints. So who is causing the trouble, actually? Is it humans or technology? Will CSS save us? Probably not. In terms of work, I mean, the biggest tragedy is probably not that people are lazy and don't work. The real tragedy is that people are working extremely hard on trivialities. Just trivial things, like typesetting details.

I hope pagedown can save you some effort in typesetting, because I think HTML and CSS are just fantastic. Thank you.

Q&A

One thing that I struggle with, whoa, there we go. It's really difficult to get our CFO to read anything unless it is in the body of an email. So my question for pagedown is with the CSS that's in it, I know when you open it in Chrome, you can download external files, but in an email client, you can't. So could I render that into an email?

Render it into email? Yeah. So you can render it into HTML and then send an email with the HTML file as the body? Yes. There are packages. Actually, the next speaker, Rich, has written, what's the name of that package? gt. No, I mean the package to send emails. Oh, blastula. Blastula? blastula, for email.

One of the reasons for LaTeX's popularity is equations, mathematical typesetting. Where is pagedown or Markdown on that? For typesetting math and equations? Yeah. There's support from MathJax.