We are pleased to announce that xml2 1.0.0 is now available on CRAN. xml2 is a wrapper around the comprehensive libxml2 C library, and makes it easy to work with XML and HTML files in R. Install the latest version with:
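A minimal sketch of the installation step, assuming a CRAN mirror is already configured. The `requireNamespace()` guard is our addition so the snippet is safe to re-run; call `install.packages("xml2")` directly to upgrade an existing installation.

```r
# Install xml2 from CRAN only if it is not already available.
if (!requireNamespace("xml2", quietly = TRUE)) {
  install.packages("xml2")
}

library(xml2)
```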
There are three major improvements in 1.0.0:
- You can now modify and create XML documents.
- xml_find_first() replaces xml_find_one(), and provides better semantics for missing nodes.
- Improved namespace handling when working with XPath.
There are many other small improvements and bug fixes: please see the release notes for a complete list.
Modification and creation
xml2 now supports modification and creation of XML nodes. This includes new functions xml_new_document(), xml_add_child(), xml_add_sibling(), xml_set_namespace(), xml_remove(), xml_replace(), xml_root(), and replacement methods for xml_name(), xml_attr(), xml_attrs() and xml_text().
The basic process of creating an XML document by hand looks something like this:
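A minimal sketch, assuming xml2 1.0.0 or later is installed. The element and attribute names (`root`, `a1`, `b`, `c`, `a2`, `a3`, `id`) are invented for illustration only.

```r
library(xml2)

# Start with an empty document and give it a root element.
doc <- xml_new_document()
root <- xml_add_child(doc, "root")

# xml_add_child() returns the node it just added, so the result can be
# used as the parent of the next call. Named arguments become attributes.
a1 <- xml_add_child(root, "a1", x = "1", y = "2")
b <- xml_add_child(a1, "b")
xml_add_child(b, "c")

# Add <a2> under the root, then <a3> as the sibling that follows it.
a2 <- xml_add_child(root, "a2")
xml_add_sibling(a2, "a3")

# Replacement methods modify an existing node in place.
xml_attr(a2, "id") <- "second"

# Print the resulting tree.
cat(as.character(root))
```

Because nodes are references into the underlying document, changes made through `a1` or `a2` are visible when the tree is printed from `root`.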
For a complete description of creation and mutation, please see vignette("modification", package = "xml2").
xml_find_first()
xml_find_one() has been deprecated in favor of xml_find_first(). xml_find_first() now always returns a single node: if there are multiple matches, it returns the first (without a warning), and if there are no matches, it returns a new xml_missing object.
This makes it much easier to work with ragged/inconsistent hierarchies:
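As a small sketch, consider a made-up document in which only some `<b>` elements contain a `<c>` child. The document and the variable names are assumptions chosen for illustration.

```r
library(xml2)

# A ragged hierarchy: the first <b> has no <c>, the last one has two.
x1 <- read_xml("<a>
  <b></b>
  <b><c>See</c></b>
  <b><c>Sea</c><c/></b>
</a>")

b <- xml_find_all(x1, ".//b")

# One result per <b>: the first <c> if there is one,
# otherwise an xml_missing placeholder.
first_c <- xml_find_first(b, ".//c")

length(b)
length(first_c)
```

The result lines up one-to-one with the input nodes, so it can be combined with other per-node vectors without extra bookkeeping.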
Missing nodes are replaced by missing values in functions that return vectors:
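A short sketch of that behavior, using an invented document where the first `<b>` has no `<c>` child:

```r
library(xml2)

x1 <- read_xml("<a><b></b><b><c>See</c></b><b><c>Sea</c><c/></b></a>")
first_c <- xml_find_first(xml_find_all(x1, ".//b"), ".//c")

# Both calls return a character vector of length 3;
# the entry for the <b> without a <c> is NA.
node_names <- xml_name(first_c)
node_text <- xml_text(first_c)

node_names
node_text
```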
XPath and namespaces
XPath is challenging to use if your document contains any namespaces:
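To sketch the problem, here is an invented document with two default namespaces. The URIs and element names are placeholders.

```r
library(xml2)

x <- read_xml('<root>
  <doc1 xmlns="http://foo.com"><baz/></doc1>
  <doc2 xmlns="http://bar.com"><baz/></doc2>
</root>')

# Each <baz> inherits its parent's default namespace, and an unprefixed
# name in XPath only matches elements in no namespace, so nothing is found.
unqualified <- xml_find_all(x, ".//baz")
length(unqualified)
```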
To make life slightly easier, the default xml_ns() object is automatically passed to xml_find_*():
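A sketch of what this looks like in practice, with an invented document. xml2 names default namespaces `d1`, `d2`, and so on, and those prefixes can be used directly in the XPath expression.

```r
library(xml2)

x <- read_xml('<root>
  <doc1 xmlns="http://foo.com"><baz/></doc1>
  <doc2 xmlns="http://bar.com"><baz/></doc2>
</root>')

# Inspect the prefixes xml2 assigned to the default namespaces.
ns <- xml_ns(x)
ns

# No explicit ns argument is needed: xml_ns(x) is the default.
baz1 <- xml_find_all(x, ".//d1:baz")
baz2 <- xml_find_all(x, ".//d2:baz")
```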
If you just want to avoid the hassle of namespaces altogether, we have a new nuclear option: xml_ns_strip():
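A sketch with an invented document. Note that `xml_ns_strip()` modifies the document in place, so it should only be used when the namespace information is no longer needed.

```r
library(xml2)

x <- read_xml('<root>
  <doc1 xmlns="http://foo.com"><baz/></doc1>
  <doc2 xmlns="http://bar.com"><baz/></doc2>
</root>')

# Remove the default namespaces from the document.
xml_ns_strip(x)

# Unprefixed names now match both <baz> elements.
all_baz <- xml_find_all(x, ".//baz")
length(all_baz)
```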

