examples
Linking to a subdata-list in JSON
Using index to go to the next item.
Linking two tags as a name/value pair.
Using the "member-off" keyword.
Using the "previous" and "last" keywords.
A URL definition.
Some Link data_defs.
My add_on_link_functions
Example of linking to a subdata-list in JSON
[{"key":"abstract_key", "link":1},
	{"path":"root"},
	{"key":"library"},
	{"path":"all"},
	{"key":"abstracts"},
	{"path":"all", "childkeys":{"abstract_key":{"link":1}}},
	{"key":"name","default":""}],
[{"key":"season_key", "link":2},
	{"path":"root"},
	{"key":"library"},
	{"path":"all"},
	{"key":"seasons"},
	{"path":"all", "childkeys":{"season_key":{"link":2}}},
	{"key":"season_number","default":0, "type":"int"}],
The dataset being queried:
{
    "schedule": [
        {"abstract_key":"132237",
            "season_key":"351987",
            "episode_key":"352024",
            "station":"RTL4",
            "rerun":false,
            "unixtime":1454800860},
...
    ],
    "library": [{
        "abstracts": [
            {"abstract_key":"132237",
                "name":"RTL Nieuws"},
...
        ],
        "seasons": [
            {"season_key":"351987",
                "season_number":"84",
                "name":"Seizoen 84"},
...
        ]
    }]
}
Because the first node_def adds no further data definition, the link stores the default, which is the value found under the key. The search then walks into the abstracts and seasons lists and selects the dict that holds this same value under abstract_key or season_key.
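The lookup described above can be sketched in plain Python. This is only an illustration of the matching logic, not of how DataTreeGrab works internally; the helper name `lookup` is hypothetical and the data is a trimmed copy of the dataset shown above.

```python
# Trimmed copy of the dataset from the example above.
DATA = {
    "schedule": [
        {"abstract_key": "132237", "season_key": "351987",
         "episode_key": "352024", "station": "RTL4",
         "rerun": False, "unixtime": 1454800860},
    ],
    "library": [{
        "abstracts": [{"abstract_key": "132237", "name": "RTL Nieuws"}],
        "seasons": [{"season_key": "351987", "season_number": "84",
                     "name": "Seizoen 84"}],
    }],
}


def lookup(data, list_name, key_name, link_value, wanted, default):
    """Hypothetical helper: find the library entry whose key equals the link."""
    for library in data["library"]:                # "library", then "all"
        for entry in library.get(list_name, []):   # "abstracts", then "all"
            if entry.get(key_name) == link_value:  # the childkeys test
                return entry.get(wanted, default)
    return default


item = DATA["schedule"][0]
# The "link" is simply the key value taken from the current schedule item.
name = lookup(DATA, "abstracts", "abstract_key", item["abstract_key"], "name", "")
season = int(lookup(DATA, "seasons", "season_key", item["season_key"],
                    "season_number", 0))
```

Here `name` ends up as the abstract's name and `season` as the season number converted to an int, mirroring the "default" and "type" settings in the two path definitions.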
Retrieve the same value from both the current JSON list item and the next one. Do not use this on JSON dicts, because Python does not guarantee a stable order for the items in a dict (the order is arbitrary in Python 2). See the dataset in the previous example. Here the index of the current item is stored in the link and then incremented by one to reach the next item in the list.
[{"key":"unixtime", "type":"timestamp"}],
[{"select":"index", "link":1},
	{"path":"parent"}, 
	{"index":{"link":1,"calc":["plus",1]}},
	{"key":"unixtime", "type":"timestamp"}], 
And similarly on HTML code:
[{"path":"parent","select":"index", "link":1}, 
	{"path":"parent"}, 
	{"tag":"div", "attrs":{"class":"program"}, "index":{"link":1,"calc":["plus",1]}},
	{"tag":"div", "attrs":{"class":"time"}, "type":"time"}],
With the following dataset:
<div class="program">
	<div class="time">09:00</div>
	<div class="title">
		<a href="http://www.nieuwsblad.be/tv-gids/een/gisteren/zomerbeelden">Zomerbeelden</a>
	</div>
</div>
<div class="program">
	<div class="time">10:00</div>
	<div class="title">
		<a href="http://www.nieuwsblad.be/tv-gids/een/gisteren/radio-2-op-een">Radio 2 op één</a>
	</div>
</div>
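As a rough plain-Python equivalent of the HTML variant, the sketch below pairs each program's time with the time of the program that follows it. It uses the standard library's `xml.etree.ElementTree` instead of DataTreeGrab, wraps the snippet in a hypothetical `<root>` element so it parses as XML, and the function name `start_and_next_start` is made up for this illustration.

```python
import xml.etree.ElementTree as ET

# The dataset from above, wrapped in a <root> element (an assumption of
# this sketch) so that ElementTree can parse it as one document.
HTML = """<root>
<div class="program">
    <div class="time">09:00</div>
    <div class="title">
        <a href="http://www.nieuwsblad.be/tv-gids/een/gisteren/zomerbeelden">Zomerbeelden</a>
    </div>
</div>
<div class="program">
    <div class="time">10:00</div>
    <div class="title">
        <a href="http://www.nieuwsblad.be/tv-gids/een/gisteren/radio-2-op-een">Radio 2 op één</a>
    </div>
</div>
</root>"""


def start_and_next_start(root):
    """Return (own time, time of the next program) for every program div."""
    programs = root.findall("div[@class='program']")
    result = []
    for index, program in enumerate(programs):
        start = program.find("div[@class='time']").text
        # "index plus 1": step to the sibling that follows the current one.
        if index + 1 < len(programs):
            stop = programs[index + 1].find("div[@class='time']").text
        else:
            stop = None  # the last program has no successor
        result.append((start, stop))
    return result


pairs = start_and_next_start(ET.fromstring(HTML))
```

The last item has no successor, so the sketch returns `None` for it; what DataTreeGrab returns in that case depends on the defaults in the data_def.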
Linking two tags as a name/value pair. The text content of the label tag is stored in lower case with the trailing ":" removed. The name definition may seem to be in the wrong place, as you would expect it in the {"tag":"label"} node_def. On closer thought you will realize that at that moment you are still in the parent node, so the label's text can only be read in the next node_def.
[{"tag":"div", "attrs":{"class":null}},
	{"tag":"label"},
	{"path":"parent", "name":{"select":"text", "rstrip":":", "lower":""}},
	{"tag":"span", "select":"text"}]
On the following dataset:
    <div class='details'>
        <div class='program-details-title'>
            <label>Programmanaam:</label>
            <span> Vranckx</span>
        </div>
        <div class='program-details-time'>
            <label>Datum en tijd:</label>
            <span> 29-11-2015, 19u30 - 20u00</span>
        </div>
        <div class='program-details-genre'>
            <label>Genre:</label>
            <span> reportage</span>
        </div>
        <div class='program-details-genre'>
            <label>Zender:</label>
            <span> Canvas</span>
        </div>
        <div class='program-details-presentation'>
            <label>Presentatie:</label>
            <span> Rudi Vranckx</span>
        </div>
    </div>
</div>
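The intended result can be sketched in plain Python with the standard library's `xml.etree.ElementTree`. This is an illustration of the name/value pairing only, on a well-formed subset of the dataset above; the function name `name_value_pairs` is hypothetical, and stripping the whitespace around the span text is a choice made in this sketch.

```python
import xml.etree.ElementTree as ET

# A well-formed subset of the dataset shown above.
HTML = """<div class='details'>
    <div class='program-details-title'>
        <label>Programmanaam:</label>
        <span> Vranckx</span>
    </div>
    <div class='program-details-genre'>
        <label>Genre:</label>
        <span> reportage</span>
    </div>
</div>"""


def name_value_pairs(details):
    """Pair each label (lower case, trailing ':' removed) with its span text."""
    pairs = {}
    for div in details.findall("div"):
        label = div.find("label")
        span = div.find("span")
        if label is None or span is None:
            continue
        # Mirrors "rstrip":":" and "lower":"" from the name definition.
        name = label.text.rstrip(":").lower()
        pairs[name] = span.text.strip()
    return pairs


details = name_value_pairs(ET.fromstring(HTML))
```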
An example of using the "member-off" keyword. Any value resulting from this selection that is not in the self.value_filters["channelid"] list causes the containing keynode to be dropped. Because this value is retrieved from a node far up the tree, covering multiple desired keynodes, it cannot be used as a keynode itself. You can of course filter the resulting data afterwards, but filtering this early reduces the amount of data retrieved and probably the memory and time needed.
"key-path":[
	{"tag":"div", "attrs":{"id":"tvprograms"}},
	{"tag":"div", "attrs":{"id":"program-channel-programs"}},
	{"tag":"div", "attrs":{"id":"programs-content"}},
	{"tag":"div", "attrs":{"class":"slider"}},
	{"tag":"div", "attrs":{"class":{"not":["time-elapsed"]}}},
	{"tag":"div"},
	{"tag":"div"},
	{"tag":"h3", "attr":"id"}
	],
"values":[
	[{"path":"parent"},
		{"path":"parent"},
		{"path":"parent","attr":"class","split":[["\\s",0]],"member-off":"channelid"}],
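The effect of the filter can be sketched in plain Python. The sketch assumes that "split":[["\\s",0]] means "split on whitespace and keep item 0"; the channel ids, the class strings and the function name `passes_member_filter` are hypothetical.

```python
import re

# Hypothetical filter list; in the real code this lives in
# self.value_filters["channelid"].
value_filters = {"channelid": ["een", "canvas"]}


def passes_member_filter(class_attr, filter_name, filters):
    """Keep a keynode only if the first class token is in the filter list."""
    # Assumed meaning of "split":[["\\s",0]]: split on whitespace, take item 0.
    first_token = re.split(r"\s", class_attr)[0]
    return first_token in filters.get(filter_name, [])


keep = passes_member_filter("een program-row", "channelid", value_filters)
drop = passes_member_filter("vtm program-row", "channelid", value_filters)
```

Keynodes for which the test fails are skipped before any of their other values are collected, which is where the saving in memory and time comes from.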
In case you have a list of keynodes divided into groups by intermediate nodes, you can select the last occurrence of that intermediate node before the keynode with the "previous" and "last" keywords. Consider this HTML code:
<TABLE cellpadding="0" cellspacing="0" border="0">
	<TR>
		<TD valign="top" class="pgMenuKopSub" style="font-size:0.7em">
			<I><A name="A">Landelijke publieke zenders:</A></I>
		</TD>
  ...
	</TR>
	<TR>
		<TD valign=top class="pgMenuInhoud" style="border-right-color: #BBDDFF">
  ...
			<P class="pnZender" style="margin-right: 30px; width:124px">
				<IMG src="/img/logo/npo-radio-1.png" width=20 height=20/>
				NPO Radio 1
			</P>
		</TD>
  ...
	</TR>
  ...
	<TR>
		<TD valign="top" class="pgMenuKopSub" style="font-size:0.7em">
			<I><A name="B">Landelijke commerciële zenders:</A></I>
		</TD>
  ...
	</TR>
	<TR>
		<TD valign=top class="pgMenuInhoud" style="border-right-color: #BBDDFF">
 ...
			<P class="pnZender" style="margin-right: 30px; width:124px">
				<IMG src="/img/logo/100procentnl.png" width=20 height=20/>
				100%NL
			</P>
		</TD>
  ...
	</TR>
  ...
</TABLE>
And the following data_def:
{"key-path":[
	{"tag":"tr"},
	{"tag":"td", "attrs":{"class":"pgMenuInhoud"}},
	{"tag":"p", "attrs":{"class":"pnZender"}},
	{"tag":"img", "select":"tail"}
	],
"values":[
	[{"path":"parent"},
		{"path":"parent"},
		{"tag":"a", "attr":"href", "first":"","split":[["/",2]],"type":"lower-ascii"}],
	[{"path":"parent"},
		{"path":"parent"},
		{"path":"parent", "select":"index", "link":1},
		{"path":"parent"},
		{"tag":"tr", "index":{"link":1,"previous":""}},
		{"tag":"td", "attrs":{"class":"pgMenuKopSub"}},
		{"tag":"i"},
		{"tag":"a", "attr":"name", "replace":{"a":11,"b":11,"c":17,"d":17}, "last":""}]
]}
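The idea behind "previous" and "last" can be sketched in plain Python: for every channel row, walk back through the preceding rows and take the nearest group header. The flattened `rows` list and the function name `group_for_each_channel` are assumptions made for this illustration; they only mimic the table layout above.

```python
# Flattened stand-in for the table rows above: header rows carry the
# anchor name, channel rows carry the channel name.
rows = [
    ("header", "A"),
    ("channel", "NPO Radio 1"),
    ("header", "B"),
    ("channel", "100%NL"),
]


def group_for_each_channel(rows):
    """Attach to each channel the last header row that precedes it."""
    result = []
    for index, (kind, value) in enumerate(rows):
        if kind != "channel":
            continue
        group = None
        # "previous": only rows before the current index are considered;
        # "last": of those, the one closest to the current row wins.
        for prev_kind, prev_value in reversed(rows[:index]):
            if prev_kind == "header":
                group = prev_value
                break
        result.append((value, group))
    return result


grouped = group_for_each_channel(rows)
```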
A URL definition
{
    "url":["http://services.vrt.be/epg/schedules/", 11],
    "data-format":"application/json",
    "encoding":"utf-8",
    "accept-header":"application/vnd.epg.vrt.be.schedule_3.1+json",
    "url-data":{
        "channel_code":["channel"],
        "type":"day"},
    "url-date-format":"%Y%m%d",
    "data":{
        "total-item-count":[{"key":"totalResults", "default": 0}],
        "page-item-count":[{"key":"entryCount", "default": 0}],
        ...
    }
}
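To give a feel for what some of these fields amount to, here is a small sketch using only the Python standard library. It assumes that "url-data" ends up as query parameters, that "accept-header" is sent as the HTTP Accept header, and that "url-date-format" is a strftime pattern. How the date is placed in the URL is decided by url function 11, which is not shown here, so the sketch only formats the date. The channel code "O8" and the function name `build_request` are hypothetical, and no request is actually sent.

```python
import datetime
from urllib.parse import urlencode
from urllib.request import Request

URL_DEF = {
    "url": "http://services.vrt.be/epg/schedules/",
    "accept-header": "application/vnd.epg.vrt.be.schedule_3.1+json",
    # "O8" is a made-up channel code standing in for the "channel" variable.
    "url-data": {"channel_code": "O8", "type": "day"},
    "url-date-format": "%Y%m%d",
}


def build_request(url_def, day):
    """Build (but do not send) a request from the assumed field meanings."""
    date_string = day.strftime(url_def["url-date-format"])
    query = urlencode(url_def["url-data"])
    request = Request(url_def["url"] + "?" + query,
                      headers={"Accept": url_def["accept-header"]})
    return request, date_string


request, date_string = build_request(URL_DEF, datetime.date(2016, 2, 7))
```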
Some Link data_defs.
{
    "data":{
        ...
    },
    "values":{
        "start-time":{"funcid":4, "data":[{"varid": 2},{"varid": 3},0]},
        "episode title":{"funcid":11, "data":[{"varid": 6}, "aflevering"],"regex":"\\d*/?\\d* ?(.*)"},
        "episode":{"funcid":11, "data":[{"varid": 6}, "aflevering"],"regex":"(\\d*)/.*", "type":"int", "default":null},
        "episodecount":{"funcid":11, "data":[{"varid": 6}, "aflevering"],"regex":"\\d*/?(\\d*).*", "type":"int", "default":null},
        "rerun":{"funcid":5, "data":[{"funcid":11, "data":[{"varid": 6},"bijzonderheden"]},"herhaling"]},
        "subgenre":{"funcid":11, "data":[{"varid": 5}, "genre"], "max length":25},
        "star-rating":{"funcid":11, "data":[{"varid":1},"rating"], "calc":{"multiplier":2.5}}
    }
}
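The three "regex" entries that read the "aflevering" value can be tried out in plain Python. The sample string and the helper `apply_regex` are hypothetical, and the sketch assumes the pattern is matched from the start of the value and that the first group is what gets stored; the patterns themselves are copied from the definitions above.

```python
import re


def apply_regex(value, pattern, as_int=False, default=None):
    """Return group 1 of the match, optionally as int, else the default."""
    match = re.match(pattern, value)
    if match is None or match.group(1) == "":
        return default
    return int(match.group(1)) if as_int else match.group(1)


sample = "3/10 De finale"  # made-up "aflevering" value

episode_title = apply_regex(sample, r"\d*/?\d* ?(.*)")
episode = apply_regex(sample, r"(\d*)/.*", as_int=True)
episodecount = apply_regex(sample, r"\d*/?(\d*).*", as_int=True)
# Without an "n/m" prefix the episode pattern does not match.
no_episode = apply_regex("De finale", r"(\d*)/.*", as_int=True)
```

For the sample value this yields the title without the numeric prefix, the episode number and the episode count, while a value without a slash falls back to the default for "episode".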
The add_on_link_functions used in tvgrabpyAPI.
import re
import traceback
from DataTreeGrab import DataTreeShell, is_data_value, data_value
class DataTree(DataTreeShell):
    def __init__(self, source, data_def, warnaction = "default"):
        self.source = source
        self.config = self.source.config
        self.fetch_string_parts = re.compile(r"(.*?[.?!:]+ |.*?\Z)")
        DataTreeShell.__init__(self, data_def, warnaction = warnaction, warngoal = self.config.logging.log_queue)
        self.print_tags = source.print_tags
        self.print_searchtree = source.print_searchtree
        self.show_result = source.show_parsing
        self.fle = source.test_output
    def get_string_parts(self, sstring, header_items = None):
        if not isinstance(header_items, (list, tuple)):
            header_items = []
        test_items = []
        for hi in header_items:
            if isinstance(hi, (str, unicode)):
                test_items.append((hi.lower(), hi))
            elif isinstance(hi, (list, tuple)):
                if len(hi) > 0 and isinstance(hi[0], (str, unicode)):
                    hi0 = hi[0].lower()
                    if len(hi) > 1 and isinstance(hi[1], (str, unicode)):
                        hi1 = hi[1]
                    else:
                        hi1 = hi[0]
                    test_items.append((hi0, hi1))
        string_parts = self.fetch_string_parts.findall(sstring)
        string_items = {}
        act_item = 'start'
        string_items[act_item] = []
        for dp in string_parts:
            if dp.strip() == '':
                continue
            if dp.strip()[-1] == ':':
                act_item = dp.strip()[0:-1].lower()
                string_items[act_item] = []
            else:
                for ti in test_items:
                    if dp.strip().lower()[0:len(ti[0])] == ti[0]:
                        act_item = ti[1]
                        string_items[act_item] = []
                        string_items[act_item].append(dp[len(ti[0]):].strip())
                        break
                else:
                    string_items[act_item].append(dp.strip())
        return string_items
    def add_on_link_functions(self, fid, data = None, default = None):
        def split_kommastring(dstring):
            return re.sub(r'\) ([A-Z])', r'), \g<1>', \
                re.sub(self.config.language_texts['and'], ', ', \
                re.sub(self.config.language_texts['and others'], '', dstring))).split(',')
        def add_person(prole, pname, palias = None):
            if pname in ('', None):
                return
            if pname[-1] in '\.,:;-':
                pname = pname[:-1].strip()
            if not prole in credits:
                credits[prole] = []
            if prole in ('actor', 'guest'):
                p = {'name': pname, 'role': palias}
                credits[prole].append(p)
            else:
                credits[prole].append(pname)
        try:
            # split logo name and logo provider
            if fid == 101:
                if is_data_value(0, data, str):
                    d = data[0].split('?')[0]
                    for k, v in self.config.xml_output.logo_provider.items():
                        if d[0:len(v)] == v:
                            return (d[len(v):], k)
                return ('',-1)
            # Extract roles from a set of lists or named dicts
            if fid == 102:
                credits = {}
                if len(data) == 0:
                    return default
                if len(data) == 1 and isinstance(data[0], (list,tuple)):
                    for item in data[0]:
                        if not isinstance(item, dict):
                            continue
                        for k, v in item.items():
                            if k.lower() in self.config.roletrans.keys():
                                role = self.config.roletrans[k.lower()]
                                for pp in v:
                                    pp = pp.split(',')
                                    for p in pp:
                                        cn = p.split('(')
                                        if len(cn) > 1:
                                            add_person(role, cn[0].strip(), cn[1].split(')')[0].strip())
                                        else:
                                            add_person(role, cn[0].strip())
                    return credits
                if len(data) < 2:
                    return default
                if isinstance(data[1], (list,tuple)):
                    for item in range(len(data[0])):
                        if item >= len(data[1]):
                            continue
                        if data[1][item].lower() in self.config.roletrans.keys():
                            role = self.config.roletrans[data[1][item].lower()]
                            if isinstance(data[0][item], (str, unicode)):
                                cast = split_kommastring(data[0][item])
                            else:
                                cast = data[0][item]
                            if isinstance(cast, (list, tuple)):
                                for person in cast:
                                    if len(data) > 2 and isinstance(data[2],(list, tuple)) and len(data[2]) > item:
                                        add_person(role, person.strip(), data[2][item])
                                    else:
                                        add_person(role, person.strip())
                elif isinstance(data[1], (str,unicode)) and data[1].lower() in self.config.roletrans.keys():
                    role = self.config.roletrans[data[1].lower()]
                    if isinstance(data[0], (str, unicode)):
                        cast = split_kommastring(data[0])
                    else:
                        cast = data[0]
                    if isinstance(cast, (list, tuple)):
                        for item in range(len(cast)):
                            if len(data) > 2 and isinstance(data[2],(list, tuple)) and len(data[2]) > item:
                                add_person(role, cast[item].strip(), data[2][item])
                            else:
                                add_person(role, cast[item].strip())
                return credits
            # Extract roles from a string
            if fid == 103:
                if len(data) == 0 or data[0] == None:
                    return {}
                if isinstance(data[0], (str, unicode)) and len(data[0]) > 0:
                    tstr = unicode(data[0])
                elif isinstance(data[0], list) and len(data[0]) > 0:
                    tstr = unicode(data[0][0])
                    for index in range(1, len(data[0])):
                        tstr = u'%s %s' % (tstr, unicode(data[0][index]))
                else:
                    return {}
                if len(data) == 1:
                    cast_items = self.get_string_parts(tstr)
                else:
                    cast_items = self.get_string_parts(tstr, data[1])
                credits = {}
                for crole, cast in cast_items.items():
                    if len(cast) == 0:
                        continue
                    elif crole.lower() in self.config.roletrans.keys():
                        role = self.config.roletrans[crole.lower()]
                        cast = split_kommastring(cast[0])
                        for cn in cast:
                            cn = cn.split('(')
                            if len(cn) > 1:
                                add_person(role, cn[0].strip(), cn[1].split(')')[0].strip())
                            else:
                                add_person(role, cn[0].strip())
                return credits
            # Process a rating item
            if fid == 104:
                rlist = []
                if is_data_value(0, data, str):
                    # We treat a string as a list of items with a maximaum length
                    if data_value(1, data, str) == 'as_list':
                        item_length = data_value(2, data, int, 1)
                        unique_added = False
                        for index in range(len(data[0])):
                            code = None
                            for cl in range(item_length):
                                if index + cl >= len(data[0]):
                                    continue
                                tval = data[0][index: index + cl + 1]
                                if tval in self.source.rating.keys():
                                    code = self.source.rating[tval]
                                    break
                            if code != None:
                                if code in self.config.rating["unique_codes"].keys():
                                    if unique_added:
                                        continue
                                    rlist.append(code)
                                    unique_added = True
                                elif code in self.config.rating["addon_codes"].keys():
                                    rlist.append(code)
                            elif self.config.write_info_files:
                                self.config.infofiles.addto_detail_list(u'new %s rating => %s' % (self.source.source, code))
                    else:
                        if data[0].lower() in self.source.rating.keys():
                            v = self.source.rating[data[0].lower()]
                            if v in self.config.rating["unique_codes"].keys():
                                rlist.append(v)
                            elif v in self.config.rating["addon_codes"].keys():
                                rlist.append(v)
                        elif self.config.write_info_files:
                            self.config.infofiles.addto_detail_list(u'new %s rating => %s' % (self.source.source, data[0]))
                elif is_data_value(0, data, list):
                    unique_added = False
                    for item in data[0]:
                        if item.lower() in self.source.rating.keys():
                            v = self.source.rating[item.lower()]
                            if v in self.config.rating["unique_codes"].keys():
                                if unique_added:
                                    continue
                                rlist.append(v)
                                unique_added = True
                            elif v in self.config.rating["addon_codes"].keys():
                                rlist.append(v)
                        elif self.config.write_info_files:
                            self.config.infofiles.addto_detail_list(u'new %s rating => %s' % (self.source.source, item))
                return rlist
            # Check the text in data[1] for the presence of keywords to determine genre
            if fid == 105:
                if len(data) < 2 or not isinstance(data[0], dict):
                    return default
                for k, v in data[0].items():
                    for i in range(1, len(data)):
                        if isinstance(data[i], (str, unicode)) and k in data[i]:
                            return v
            # split a genre code in a generic part of known length and a specific part
            if fid == 106:
                if len(data) == 0 or not isinstance(data[0],(str, unicode, list)):
                    return []
                if len(data) == 1:
                    if isinstance(data[0], list):
                        return data[0]
                    else:
                        return [data[0]]
                if isinstance(data[0], list):
                    if len(data[0]) == 0:
                        return []
                    data[0] = data[0][0]
                if not isinstance(data[1], int) or len(data[0]) <= data[1]:
                    return [data[0]]
                return [data[0][:data[1]], data[0][data[1]:]]
            # Return unlisted values to infofiles in a fid 14 dict
            if fid == 201:
                if len(data) < 2 or not isinstance(data[0], (list, tuple)):
                    return default
                if not isinstance(data[1], (list,tuple)):
                    data[1] = [data[1]]
                for index in range(len(data[1])):
                    data[1][index] = data[1][index].lower().strip()
                for sitem in data[0]:
                    for k, v in sitem.items():
                        if k.lower().strip() in data[1]:
                            continue
                        if k.lower().strip() in self.config.roletrans.keys():
                            continue
                        if self.config.write_info_files:
                            self.config.infofiles.addto_detail_list(u'new %s dataitem %s => %s' % (self.source.source, k, v))
        except:
            self.config.log([self.config.text('fetch', 69, ('link', fid, self.source.source)), traceback.format_exc()], 1)
            return default
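The "name (alias)" splitting that functions 102 and 103 apply to each cast string can be isolated in a few lines. This is a standalone Python 3 sketch of that one step, not a replacement for the method above; the function name `split_cast` and the sample string are made up.

```python
def split_cast(cast_string):
    """Split 'Name (alias), Name' into dicts like add_person builds for actors."""
    people = []
    for entry in cast_string.split(","):
        parts = entry.split("(")
        name = parts[0].strip()
        if not name:
            continue
        # Text between the first "(" and the following ")" is the alias.
        alias = parts[1].split(")")[0].strip() if len(parts) > 1 else None
        people.append({"name": name, "role": alias})
    return people


cast = split_cast("Rudi Vranckx (presentator), Jan Jansen")
```

In the real method the role key (actor, guest and so on) comes from self.config.roletrans, and only actors and guests keep the alias; other roles store the bare name.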