Home
This page describes the Dataprep directives for working with complex XML documents.
The PARSE-XML-TO-DOCUMENT directive converts an XML string or XML byte array into an XML Document.
parse-xml-to-document :<column>
The PARSE-XML-TO-DOCUMENT directive transforms an XML string into a Document object. This is equivalent to the following Java code:
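```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
...
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
builderFactory.setNamespaceAware(true);
DocumentBuilder builder = builderFactory.newDocumentBuilder();
// parse(String) treats its argument as a URI, so wrap the XML text in an InputSource
Document xmlDocument = builder.parse(new InputSource(new StringReader(<column value>)));
...
```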
This directive replaces the string value of :<column> with an XML Document. If the XML cannot be parsed, an error is thrown and processing terminates.
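The following is a self-contained sketch of the same parsing steps, runnable on a plain JDK. The class and method names are hypothetical and this is not the Dataprep source. It assumes that byte-array input is handed to the parser as a stream (so the XML declaration or byte-order mark decides the encoding), and it reports malformed XML as an `IllegalArgumentException` to mirror the directive stopping processing.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper illustrating PARSE-XML-TO-DOCUMENT semantics; not the Dataprep source.
class ParseXmlToDocumentSketch {

    // XML held as text: wrap it in a Reader, since DocumentBuilder.parse(String) expects a URI.
    public static Document parse(String xml) {
        return parseSource(new InputSource(new StringReader(xml)));
    }

    // XML held as bytes: pass a stream so the parser detects the encoding itself.
    public static Document parse(byte[] xml) {
        return parseSource(new InputSource(new ByteArrayInputStream(xml)));
    }

    private static Document parseSource(InputSource source) {
        try {
            DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
            builderFactory.setNamespaceAware(true);
            DocumentBuilder builder = builderFactory.newDocumentBuilder();
            return builder.parse(source);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Assumption: report bad input as an unchecked error, mirroring the
            // directive terminating processing when XML cannot be parsed.
            throw new IllegalArgumentException("Invalid XML: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<Employees><Employee emplid=\"1111\"/></Employees>");
        System.out.println(doc.getDocumentElement().getTagName()); // prints "Employees"
    }
}
```

Note that the JDK's default error handler also prints a `[Fatal Error]` line to standard error before the exception is thrown.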
The EXTRACT-XPATH directive extracts an XML node using an XPath expression.
extract-xpath '<xpath>' :<source column> :<destination column>
This directive extracts a node value using an XPath expression. The XPath is applied to :<source column>, and the result is stored in :<destination column>. The directive can be applied either directly to XML stored as a string or to an XML Document generated by the PARSE-XML-TO-DOCUMENT directive.
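To make the EXTRACT-XPATH behavior concrete, here is a sketch built on the JDK's `javax.xml.xpath` API. It accepts either raw XML text or an already-parsed Document, matching the two input types described above. It assumes the destination value is the string value of the first matching node (which is what `XPath.evaluate(String, Object)` returns); the actual directive may store results differently, and the class name is hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical sketch of EXTRACT-XPATH semantics; not the Dataprep implementation.
class ExtractXPathSketch {

    // Accepts either raw XML text or an already-parsed Document.
    public static String extract(Object source, String xpathExpression) {
        Document doc;
        if (source instanceof Document) {
            doc = (Document) source;
        } else if (source instanceof String) {
            doc = parse((String) source);
        } else {
            throw new IllegalArgumentException("Unsupported source type: " + source);
        }
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Returns the string value of the first matching node, or "" if nothing matches.
            return xpath.evaluate(xpathExpression, doc);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Invalid XPath: " + xpathExpression, e);
        }
    }

    private static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Invalid XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Employees><Employee emplid=\"1111\"><firstname>John</firstname></Employee></Employees>";
        System.out.println(extract(xml, "/Employees/Employee/firstname")); // John
        System.out.println(extract(xml, "//Employee/@emplid"));             // 1111
    }
}
```

Evaluating with `XPathConstants.NODESET` as a third argument would instead return every match as a `NodeList`, which is the natural choice if all matching nodes are needed rather than only the first.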
Below is an example XML document, followed by valid XPath expressions and descriptions of the different ways its nodes can be extracted.
```xml
<?xml version="1.0"?>
<Employees>
  <Employee emplid="1111" type="admin">
    <firstname>John</firstname>
    <lastname>Watson</lastname>
    <age>30</age>
    <email>johnwatson@sh.com</email>
  </Employee>
  <Employee emplid="2222" type="admin">
    <firstname>Sherlock</firstname>
    <lastname>Holmes</lastname>
    <age>32</age>
    <email>sherlock@sh.com</email>
  </Employee>
  <Employee emplid="3333" type="user">
    <firstname>Jim</firstname>
    <lastname>Moriarty</lastname>
    <age>52</age>
    <email>jim@sh.com</email>
  </Employee>
  <Employee emplid="4444" type="user">
    <firstname>Mycroft</firstname>
    <lastname>Holmes</lastname>
    <age>41</age>
    <email>mycroft@sh.com</email>
  </Employee>
</Employees>
```
| Expression | Description |
|---|---|
| nodename | Selects all nodes with the name “nodename” |
| / | Selects from the root node |
| // | Selects nodes in the document from the current node that match the selection, no matter where they are |
| . | Selects the current node |
| .. | Selects the parent of the current node |
| @ | Selects attributes |
| Employee | Selects all nodes with the name “Employee” |
| Employees/Employee | Selects all Employee elements that are children of Employees |
| //Employee | Selects all Employee elements, no matter where they are in the document |
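The location paths in the table can be tried against the sample document with the JDK's XPath engine. The sketch below uses a trimmed copy of the sample (only the emplid and type attributes and the firstname element are kept), and the class and helper names are hypothetical. XPath names are case-sensitive, so the expressions use Employee and Employees exactly as written in the sample. Relative paths such as Employee also depend on the context node: evaluated against the Employees element they match all four employees, while evaluated against the document node they match nothing.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class: runs the location paths from the table with the JDK's XPath engine.
class XPathPathsDemo {

    // Trimmed copy of the sample document: attributes and firstname only.
    static final String SAMPLE = "<Employees>"
        + "<Employee emplid=\"1111\" type=\"admin\"><firstname>John</firstname></Employee>"
        + "<Employee emplid=\"2222\" type=\"admin\"><firstname>Sherlock</firstname></Employee>"
        + "<Employee emplid=\"3333\" type=\"user\"><firstname>Jim</firstname></Employee>"
        + "<Employee emplid=\"4444\" type=\"user\"><firstname>Mycroft</firstname></Employee>"
        + "</Employees>";

    static final XPath XPATH = XPathFactory.newInstance().newXPath();

    public static Document load() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(SAMPLE)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Sample XML failed to parse", e);
        }
    }

    // Number of nodes selected by the expression, evaluated relative to the context node.
    public static int count(Object context, String expression) {
        try {
            NodeList nodes = (NodeList) XPATH.evaluate(expression, context, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }

    // String value of the first node selected by the expression ("" if none).
    public static String value(Object context, String expression) {
        try {
            return XPATH.evaluate(expression, context);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        Document doc = load();
        System.out.println(count(doc, "//Employee"));                      // 4
        System.out.println(count(doc, "Employees/Employee"));              // 4
        System.out.println(count(doc.getDocumentElement(), "Employee"));   // 4 (context is <Employees>)
        System.out.println(count(doc, "Employee"));                        // 0 (context is the document node)
        System.out.println(count(doc, "//Employee/@emplid"));              // 4
        System.out.println(value(doc, "//firstname[.='Jim']/../@emplid")); // 3333
    }
}
```

The last expression combines `.`, `..` and `@`: it finds the firstname node whose value is Jim, steps up to its parent Employee, and reads that element's emplid attribute.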
The expressions below use predicates. Predicates are written in square brackets [ … ] and are used to find a specific node or a node that contains a specific value.
| Path Expression | Result |
|---|---|
| /Employees/Employee[1] | Selects the first Employee element that is a child of the Employees element |
| /Employees/Employee[last()] | Selects the last Employee element that is a child of the Employees element |
| /Employees/Employee[last()-1] | Selects the second-to-last Employee element that is a child of the Employees element |
| //Employee[@type='admin'] | Selects all Employee elements that have an attribute named type with the value 'admin' |
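The following sketch evaluates the predicate examples against a trimmed copy of the sample document (attributes and firstname only) with the JDK's XPath engine and returns the firstname of each selected Employee in document order. The class and helper names are hypothetical; the expressions use the same capitalization as the sample, since XPath matching is case-sensitive.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class: evaluates the predicate examples from the table.
class XPathPredicatesDemo {

    // Trimmed copy of the sample document: attributes and firstname only.
    static final String SAMPLE = "<Employees>"
        + "<Employee emplid=\"1111\" type=\"admin\"><firstname>John</firstname></Employee>"
        + "<Employee emplid=\"2222\" type=\"admin\"><firstname>Sherlock</firstname></Employee>"
        + "<Employee emplid=\"3333\" type=\"user\"><firstname>Jim</firstname></Employee>"
        + "<Employee emplid=\"4444\" type=\"user\"><firstname>Mycroft</firstname></Employee>"
        + "</Employees>";

    static final XPath XPATH = XPathFactory.newInstance().newXPath();
    static final Document DOC = parse(SAMPLE);

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Sample XML failed to parse", e);
        }
    }

    // Selects Employee elements with the expression and returns each one's firstname.
    public static List<String> firstNames(String expression) {
        try {
            NodeList employees = (NodeList) XPATH.evaluate(expression, DOC, XPathConstants.NODESET);
            List<String> names = new ArrayList<>();
            for (int i = 0; i < employees.getLength(); i++) {
                // "firstname" is a relative path, resolved against each selected Employee.
                names.add(XPATH.evaluate("firstname", employees.item(i)));
            }
            return names;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstNames("/Employees/Employee[1]"));        // [John]
        System.out.println(firstNames("/Employees/Employee[last()]"));   // [Mycroft]
        System.out.println(firstNames("/Employees/Employee[last()-1]")); // [Jim]
        System.out.println(firstNames("//Employee[@type='admin']"));     // [John, Sherlock]
    }
}
```

XPath positions are 1-based, so `[1]` is the first child and `[last()-1]` the one before the last.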
Copyright © 2017 Cask Data, Inc.