I’m pleased to announce that the first version of xml2 is now available on CRAN. Xml2 is a wrapper around the comprehensive libxml2 C library that makes it easier to work with XML and HTML in R:
- Read XML and HTML with read_xml() and read_html().
- Navigate the tree with xml_children(), xml_siblings() and xml_parent(). Alternatively, use xpath to jump directly to the nodes you’re interested in with xml_find_one() and xml_find_all(). Get the full path to a node with xml_path().
- Extract various components of a node with xml_text(), xml_attrs(), xml_attr(), and xml_name().
- Convert to list with as_list().
- Where appropriate, functions support namespaces with a global url -> prefix lookup table. See xml_ns() for more details.
- Convert relative urls to absolute with url_absolute(), and transform in the opposite direction with url_relative(). Escape and unescape special characters with url_escape() and url_unescape().
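As a rough illustration of the URL helpers in the last item, here is a minimal sketch. The base URL and the input strings are made up for the example; only the function names come from the list above.

```r
library(xml2)

# Hypothetical base URL, used only for illustration
base <- "http://example.com/a/"

# Resolve a relative URL against the base
abs_url <- url_absolute("b.html", base)
abs_url
#> [1] "http://example.com/a/b.html"

# Percent-encode special characters, then decode them again
escaped <- url_escape("a b")
escaped
#> [1] "a%20b"
unescaped <- url_unescape(escaped)
unescaped
#> [1] "a b"
```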
Support for modifying and creating XML documents is planned in a future version.
This package owes a debt of gratitude to Duncan Temple Lang, whose XML package has made it possible to use XML with R for almost 15 years!
Usage
You can install it by running:
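Since xml2 is on CRAN, the standard install.packages() call should be all that is needed (a sketch, assuming a default CRAN mirror is configured):

```r
# Install the released version of xml2 from CRAN
install.packages("xml2")
```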
(If you’re on a Mac, you might need to wait a couple of days: CRAN is busy rebuilding all the packages for R 3.2.0, so it’s running a bit behind.)
Here’s a small example working with an inline XML document:
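A minimal sketch of such a session might look like the following. The document contents and variable names are illustrative assumptions, not taken from the package documentation. Later xml2 releases deprecate xml_find_one() in favor of xml_find_first(), so the sketch sticks to xml_find_all().

```r
library(xml2)

# Illustrative inline document (contents are made up for this sketch)
x <- read_xml("<foo>
  <bar>text <baz id='a'/></bar>
  <bar>2</bar>
  <baz id='b'/>
</foo>")

# Name of the root node
root_name <- xml_name(x)
root_name
#> [1] "foo"

# Direct element children of the root: two <bar> nodes and one <baz>
kids <- xml_children(x)
length(kids)
#> [1] 3

# Use XPath to find every <baz> node anywhere below the root
baz <- xml_find_all(x, ".//baz")
ids <- xml_attr(baz, "id")
ids
#> [1] "a" "b"

# Full path to each matched node
xml_path(baz)
```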
Development
Xml2 is still under active development. If you notice any problems (including crashes), please try the development version, and if that doesn’t work, file an issue.

