Skip to content

Latest commit

 

History

History
364 lines (254 loc) · 9.58 KB

xml.md

File metadata and controls

364 lines (254 loc) · 9.58 KB

XML

Perfect provides XML & HTML parsing support via the Perfect-XML module.

Getting Started

In addition to the PerfectLib, you will need the Perfect-XML dependency in the Package.swift file:

.Package(url: "https://github.com/PerfectlySoft/Perfect-XML.git", majorVersion: 3)

In each Swift source file that references the XML classes, include the import directive:

import PerfectXML

macOS Build Notes

If you receive a compile error that says the following, you need to install and link libxml2.

note: you may be able to install libxml-2.0 using your system-packager:

    brew install libxml2

Compile Swift Module 'PerfectXML' (2 sources)
<module-includes>:1:9: note: in file included from <module-includes>:1:
#import "libxml2.h"

To install and link libxml2 with Homebrew, use the following two commands:

brew install libxml2
brew link --force libxml2

Linux Setup note

Ensure that you have installed libxml2-dev and pkg-config.

sudo apt-get install libxml2-dev pkg-config

Parsing an XML string

Instantiate an XDocument object with your XML string:

let xDoc = XDocument(fromSource: <XMLSource>)

Now you can get the root node of the XML structure by using the documentElement property.

Convert the node tree to String

Converting the XML node tree to a string, with optional pretty-print formatting:

let xDoc = XDocument(fromSource: rssXML)
let prettyString = xDoc?.string(pretty: true)
let plainString = xDoc?.string(pretty: false)

Working with Nodes

XML is a structured document standard consisting of nodes in the format <A>B</A>. Each node has a tag name and either a value, or sub nodes we call "children". Each node has several important properties:

  • nodeValue
  • nodeName
  • parentNode
  • childNodes

Each node also has a getElementsByTagName method that recursively searches through it and its children to return an array of all nodes that have that name. This method makes it easy to find a single value in the XML file.

To demonstrate, this recursive function iterates through all elements in the node:

func printChildrenName(_ xNode: XNode) {

	// try treating the current node as a text node
	guard let text = xNode as? XText else {

			// print out the node info
			print("Name:\t\(xNode.nodeName)\tType:\(xNode.nodeType)\t(children)\n")

			// find out children of the node
			for n in xNode.childNodes {

				// call the function recursively and take the same produces
				printChildrenName(n)
			}
			return
	}

	// it is a text node and print it out
	print("Name:\t\(xNode.nodeName)\tType:\(xNode.nodeType)\tValue:\t\(text.nodeValue!)\n")
}

printChildrenName( xDoc!)

Access Node by Tag Name

The snippet below shows the method of getElementsByTagName in a general manner:

func testTag(tagName: String) {

	// use .getElementsByTagName to get this node.
	// check if there is such a tag in the xml document
		guard let node = xDoc?.documentElement?.getElementsByTagName(tagName) else {
			print("There is no such a tag: `\(tagName)`\n")
			return
		}

		// if is, get the first matched
		guard let value = node.first?.nodeValue else {
			print("Tag `\(tagName)` has no value\n")
			return
		}

		// show the node value
		print("Tag '\(tagName)' has a value of '\(value)'\n")
}

testTag(tagName: "link")
testTag(tagName: "description")

Access Node by ID

Alternatively querying a node by its id using .getElementById():

func testID(id: String) {

		// Access node by its id, if available
		guard let node = xDoc?.getElementById(id) else {
			print("There is no such a id: `\(id)`\n")
			return
		}

		guard let value = node.nodeValue else {
			print("id `\(id)` has no value\n")
			return
		}

		print("id '\(id)' has a value of '\(value)'\n")
}

testID(id: "rssID")
testID(id: "xmlID")

XElement

The method .getElementsByTagName() returns an array of nodes of type XElement.

The following code demonstrates how to iterate all element in this array:

func showItems() {
		// get all items with tag name of "item"
		let feedItems = xDoc?.documentElement?.getElementsByTagName("item")

		// checkout how many items indeed.
		let itemsCount = feedItems?.count
		print("There are totally \(itemsCount!) Items Found\n")

		// iterate all items in the result set
		for item in feedItems!
		{
				let title = item.getElementsByTagName("title").first?.nodeValue
				let link = item.getElementsByTagName("link").first?.nodeValue
				let description = item.getElementsByTagName("description").first?.nodeValue
				print("Title: \(title!)\tLink: \(link!)\tDescription: \(description!)\n")
		}
}

showItems()

Relationships of a Node Family

PerfectXML provides a convenient way to access all relationships of a specific XML node: Parent, Siblings & Children.

Parent of a Node

Parent node can be accessed using parentNode:

func showParent(tag: String) {

		guard let node = xDoc?.documentElement?.getElementsByTagName(tag).first else {
			print("There is no such a tag '\(tag)'.\n")
			return
		}

		// Accessing the parent node
		guard let parent = node.parentNode else {
			print("Tag '\(tag)' is the root node.\n")
			return
		}
		let name = parent.nodeName
		print("Parent of '\(tag)' is '\(name)'\n")
}

showParent(tag: "link")

Node Siblings

Each XML node can have two siblings: previousSibling and nextSibling.

func showSiblings (tag: String) {

		let node = xDoc?.documentElement?.getElementsByTagName(tag).first

		// Check the previous sibling of current node
		let previousNode = node?.previousSibling
		var name = previousNode?.nodeName
		var value = previousNode?.nodeValue
		print("Previous Sibling of \(tag) is \(name!)\t\(value!)\n")

		// Check the next sibling of current node
		let nextNode = node?.nextSibling
		name = nextNode?.nodeName
		value = nextNode?.nodeValue
		print("Next Sibling of \(tag) is \(name!)\t\(value!)\n")
}

showSiblings(tag: "link")
showSiblings(tag: "description")

First & Last Child

If an XML node has a child, .firstChild and .lastChild can be used:

func firstLast() {
		let node = xDoc?.documentElement?.getElementsByTagName("channel").first

		/// retrieve the first child
		let firstChild = node?.firstChild
		var name = firstChild?.nodeName
		var value = firstChild?.nodeValue
		print("First Child of Channel is \(name!)\t\(value!)\n")

		/// retrieve the last child
		let lastChild = node?.lastChild
		name = lastChild?.nodeName
		value = lastChild?.nodeValue
		print("Last Child of Channel is \(name!)\t\(value!)\n")
}

firstLast()

Node Attributes

Any XML node or element can have attributes:

<node attribute1="value of attribute1" attribute2="value of attribute2">
</node>

Node method .getAttribute(name: String) provides the functionality of accessing attributes:

func showAttributes() {
		let node = xDoc?.documentElement?.getElementsByTagName("title").first

		/// get some attributes of a node
		let att1 = node?.getAttribute(name: "attribute1")
		print("attribute1 of title is \(att1)\n")

		let att2 = node?.getAttributeNode(name: "attribute2")
		print("attribute2 of title is \(att2?.value)\n")
}

showAttributes()

Namespaces

XML namespaces are used for providing uniquely named elements and attributes in an XML document. An XML instance may contain element or attribute names from more than one XML vocabulary. If each vocabulary is given a namespace, the ambiguity between identically named elements or attributes can be resolved.

Both .getElementsByTagName() and .getAttributeNode() have namespace versions: .getElementsByTagNameNS() and .getAttributeNodeNS(). In these cases namespaceURI and localName are required.

The following code demonstrates the usage of .getElementsByTagNameNS() and .getNamedItemNS():

func showNamespaces(){
	let deeper = xDoc?.documentElement?.getElementsByTagName("deeper").first
	let atts = deeper?.firstChild?.attributes;
	let item = atts?.getNamedItemNS(namespaceURI: "foo:bar", localName: "atr2")
	print("Namespace of deeper has an attribute of \(item?.nodeValue)\n")

	let foos = xDoc?.documentElement?.getElementsByTagNameNS(namespaceURI: "foo:bar", localName: "fool")
	var count = foos?.count
	let node = foos?.first
	let name = node?.nodeName
	let localName = node?.localName
	let prefix = node?.prefix
	let nuri = node?.namespaceURI

	print("Namespace of 'foo:bar' has \(count!) element:\n")
	print("Node Name: \(name!)\n")
	print("Local Name: \(localName!)\n")
	print("Prefix: \(prefix!)\n")
	print("Namespace URI: \(nuri!)\n")

	let children = node?.childNodes

	count = children?.count

	let a = node?.firstChild
	let b = node?.lastChild

	let na = a?.nodeName
	let nb = b?.nodeName
	let va = a?.nodeValue
	let vb = b?.nodeValue

	print("This node also has \(count!) children.\n")
	print("The first one is called '\(na!)' with value of '\(va!)'.\n")
	print("And the last one is called '\(nb!)' with value of '\(vb!)'\n")
}

showNamespaces()

XPath

XPath (XML Path Language) is a query language for selecting nodes from an XML document. In addition, XPath may be used to compute values (e.g. strings, numbers, or Boolean values) from the content of an XML document.

The following code demonstrates extracting a specific path:

func showXPath(xpath: String) {
    /// Use .extract() method to deal with the XPath request.
		let pathResource = xDoc?.extract(path: xpath)
		print("XPath '\(xpath)':\n\(pathResource!)\n")
}

showXPath(xpath: "/rss/channel/item")
showXPath(xpath: "/rss/channel/title/@attribute1")
showXPath(xpath: "/rss/channel/link/text()")
showXPath(xpath: "/rss/channel/item[2]/deeper/foo:bar")