
DataTreeShell

Hika van den Hoven edited this page May 5, 2017 · 16 revisions

The DataTreeShell wrapper class

This is a wrapper class around DATAtree/HTMLtree/JSONtree. It offers some pre- and post-processing functionality and auto-selection between JSON and HTML. Once initialized, the DATAtree can be accessed through the .searchtree attribute. While the DATAtree classes focus on the data to be read into the tree and what to extract from it, DataTreeShell focuses on the data_def: it helps retrieve the data to be used, further unifies the data extraction, and orders the data after extraction.
The following functions in DATAtree are also available here. See there for a description:

DataTreeGrab.DataTreeShell() takes the following options:

The Functions

.init_data_def([data_def = None, init_start_node = True])

This is called on class initialization. To apply another data_def to the same data, call this function again to reinitialize; if no data_def is supplied, the current one is reinitialized. It sets the timezone and current_date and reads in the "empty-values" keyword, which is later used by the .link_values() function. If the .searchtree attribute was already initialized (through the .init_data() function), the .searchtree.check_data_def() function is called and, if init_start_node = True, the .searchtree.find_start_node() function as well.

.get_url([url_data = None])

This function returns a 5-part tuple containing:

  1. A URL extracted from the "url" keyword in data_def. If it contains a string, this is returned literally. If it is a list, every item is evaluated according to the URL extension of the data_def language and the results are concatenated into a single string.
  2. An encoding string (e.g. "utf-8" or "iso-8859-1") extracted from the "encoding" keyword in data_def.
  1. A dict with post data items and values extracted from the "url-data" keyword in data_def.
    Each value in the dict is evaluated according to the URL extension of the data_def language.
  2. True/False on whether this URL will return JSON. It looks in "url-date-format" in data_def and if it finds 'json' in there it returns True, else False.
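As a rough illustration of the URL and JSON-flag rules described above, here is a minimal, self-contained sketch. It is not DataTreeGrab's implementation: the function name `sketch_get_url_parts` is hypothetical, list items are assumed to be plain strings already (the real class first evaluates each item through the URL extension), and only three of the tuple's parts are modelled.

```python
def sketch_get_url_parts(data_def):
    """Simplified stand-in for part of .get_url(); an assumption-laden sketch."""
    url = data_def.get("url", "")
    if isinstance(url, list):
        # Assumption: every item is already a plain string. The real class
        # evaluates each item through the URL extension before joining.
        url = "".join(str(part) for part in url)
    # A plain string is returned literally, as the description states.
    encoding = data_def.get("encoding")
    # True when 'json' occurs in the "url-date-format" value, else False.
    is_json = "json" in data_def.get("url-date-format", "")
    return url, encoding, is_json


# Hypothetical data_def fragment, used only for this illustration.
example_def = {
    "url": ["https://example.org/", "guide", "?day=0"],
    "encoding": "utf-8",
    "url-date-format": "json",
}
parts = sketch_get_url_parts(example_def)
```

With the hypothetical `example_def`, the list items are joined into one URL string and the JSON flag is set because "json" occurs in the "url-date-format" value.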

.init_data(data[, init_start_node = True])

If you call the class with a value for data, this gets called on class initialization. Otherwise, call it once you have retrieved your data, or when you want to use the same data_def on another data set.
It initializes a new DATAtree object and places it in the .searchtree attribute.

  • If the data is a list or dict, it calls JSONtree(). On a string starting with "[" or "{" it first tries to convert it to JSON and then calls JSONtree(). In both cases it first checks for the presence of a "sort" keyword and processes it.
  • On finding a string starting with "<" it calls HTMLtree(). In that case you can use the "autoclose-tags" keyword in your data_def to supply a list of tags to autoclose. If your HTML data consists of more than one HTML tree, you need to encapsulate it in a single root tag: set the "enclose-with-html-tag" keyword value in data_def to True to add a leading "<html>" and a trailing "</html>" to the data string. Also, if a "text_replace" and/or "unquote_html" keyword with one or more regexes is found, those are processed before the data is imported into the HTMLtree().
  • If neither HTML nor JSON data is recognized, nothing is done.

Next it copies its own values for the debug settings of .show_result and .print_searchtree to the new DATAtree object and runs the .searchtree.check_data_def(data_def) function. Finally, if init_start_node is set to True, it runs .searchtree.find_start_node().
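The auto-selection rules above can be sketched as a small stand-alone function. This is only an illustration of the decision logic, not the library's code: the name `sketch_detect_tree` is hypothetical, treating a failed JSON conversion as "not recognized" is an assumption, and the "sort", "autoclose-tags", "text_replace" and "unquote_html" steps are left out.

```python
import json


def sketch_detect_tree(data, data_def=None):
    """Return ("json", parsed), ("html", text) or (None, None)."""
    data_def = data_def or {}
    # Lists and dicts are treated as already-parsed JSON.
    if isinstance(data, (list, dict)):
        return "json", data
    if isinstance(data, str):
        if data.startswith(("[", "{")):
            try:
                return "json", json.loads(data)
            except ValueError:
                # Assumption: an unparsable string counts as unrecognized.
                return None, None
        if data.startswith("<"):
            # Wrap multiple HTML trees in a single root tag when requested.
            if data_def.get("enclose-with-html-tag", False):
                data = "<html>" + data + "</html>"
            return "html", data
    # Neither JSON nor HTML: nothing is done.
    return None, None
```

In the real class the first element of the result corresponds to choosing JSONtree() or HTMLtree(); the sketch returns a label instead so it can run without the library.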

.extract_datalist([init_start_node = False])

This in essence calls the corresponding extract_datalist() function in the DATAtree object loaded into the .searchtree attribute. But if init_start_node is set to True it first runs .searchtree.find_start_node().
Afterwards, if a "values" keyword is present in data_def with a valid set of value_defs, the DataTreeShell.link_values(linkdata) function described next is called for every retrieved key-node and corresponding data-list. The resulting list of dicts is placed in the .result attribute. If no valid "values" keyword is found or no data was extracted from the DATAtree, a warning is issued.
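The post-processing flow described above might look roughly like the following sketch. Everything in it is an assumption made for illustration: the names `sketch_link_results` and `fake_link`, the shape of the extracted data-lists, the shape of the "values" entry, and returning an empty list after the warning. The real value_defs are defined by the Link extension to the data_def language.

```python
import warnings


def sketch_link_results(data_def, extracted, link_values):
    """Apply link_values to every extracted data-list, or warn and return []."""
    if not data_def.get("values") or not extracted:
        # Mirrors the documented warning; the wording here is made up.
        warnings.warn("no valid 'values' keyword or no data extracted")
        return []
    # One dict per extracted data-list, collected into a list of dicts.
    return [link_values(linkdata) for linkdata in extracted]


def fake_link(linkdata):
    # Hypothetical stand-in for .link_values(): maps list positions to names.
    return {"title": linkdata[0], "start": linkdata[1]}


result = sketch_link_results(
    {"values": {"title": {}, "start": {}}},  # placeholder, not real value_defs
    [["News", "20:00"], ["Film", "20:30"]],
    fake_link,
)
```

Here `result` plays the role of the .result attribute: a list with one dict per extracted data-list.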

.link_values(linkdata)

Looks in data_def for a valid set of value_defs under the "values" keyword and processes them on linkdata. See the Link extension to the data_def language on how to create those value_defs.
