Evaluate and improve how the different ways an entity can overlap existing XML tags affect the detection and re-integration (additional tagging) process. For example, in
<a> "Kuala" <b> "Lumpur" </b> "Airport" </a>
if entities are detected, are they also written back into the output XML file? Is whitespace significant, e.g. when an entity is detected inside a longer string?
For instance, <a>Kuala <b> Lumpur</b></a> is written as <name n="ner" type="LOC">Lumpur</name></b><name n="ner" type="LOC">Kuala</name> in the final output.
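
To make the overlap cases concrete, here is a minimal sketch (not the project's actual code) that flattens an XML fragment into plain text, records which text node each character range came from, and then checks whether a detected entity span sits inside one text node or crosses a tag boundary. The helper names text_segments and classify_span are hypothetical, and the entity offsets are supplied by hand rather than by a real NER tool.

```python
import xml.etree.ElementTree as ET


def text_segments(root):
    """Flatten `root` to plain text and record where each text node lands.

    Returns (flat_text, segments). Each segment is a tuple
    (start, end, owner_tag, slot), where slot is "text" for text directly
    after an opening tag and "tail" for text following a child's closing tag.
    """
    parts, segments = [], []
    pos = 0

    def add(chunk, owner_tag, slot):
        nonlocal pos
        segments.append((pos, pos + len(chunk), owner_tag, slot))
        parts.append(chunk)
        pos += len(chunk)

    def visit(elem):
        if elem.text:
            add(elem.text, elem.tag, "text")
        for child in elem:
            visit(child)
            if child.tail:
                # Tail text belongs to the parent's content, not the child's.
                add(child.tail, elem.tag, "tail")

    visit(root)
    return "".join(parts), segments


def classify_span(segments, start, end):
    """Report whether [start, end) lies in one text node or spans several."""
    touched = [s for s in segments if s[0] < end and start < s[1]]
    return "contained" if len(touched) == 1 else "crosses-tags"


# Hand-written demo input, modelled on the example in this issue.
root = ET.fromstring("<a>Kuala <b> Lumpur</b> Airport</a>")
flat, segs = text_segments(root)

print(repr(flat))                    # 'Kuala  Lumpur Airport' (two spaces)
print(classify_span(segs, 7, 13))    # "Lumpur" alone: contained
print(classify_span(segs, 0, 13))    # "Kuala  Lumpur": crosses-tags
```

Note that the flattened text contains two spaces between "Kuala" and "Lumpur" (one from each text node), so a search for "Kuala Lumpur" with a single space fails. This is one way whitespace can matter when entities are detected on the extracted text and then mapped back to the XML.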