Fixes issue #60 by ensuring that SpacyQuickUMLS cannot add entity spans that overlap on a token. Also adds some documentation to the class and the README.
README.md (1 addition, 1 deletion)
@@ -54,7 +54,7 @@ If the matcher throws a warning during initialization, read [this page](https://
## spaCy pipeline component
- QuickUMLS can be used for standalone processing but it can also be use as a component in a modular spaCy pipeline. This follows traditional spaCy handling of concepts to be entity objects added to the Document object. These entity objects contain the CUI, similarity score and Semantic Types in the spacy "underscore" object.
+ QuickUMLS can be used for standalone processing but it can also be use as a component in a modular spaCy pipeline. This follows traditional spaCy handling of concepts to be entity objects added to the Document object. These entity objects contain the CUI, similarity score and Semantic Types in the spacy "underscore" object. Note that this implementation follows a [known spacy convention](https://github.com/explosion/spaCy/issues/3608) that entity Spans cannot overlap on a single token. To prevent token overlap, matches are ranked according to the `overlapping_criteria` supplied so that overlap of any tokens will be prioritized by this order.
Adding QuickUMLS as a component in a pipeline can be done as follows:
This creates a QuickUMLS spaCy component which can be used in modular pipelines.
This module adds entity Spans to the document where the entity label is the UMLS CUI and the Span's "underscore" object is extended to contains "similarity" and "semtypes" for matched concepts.
+ Note that this implementation follows and enforces a known spacy convention that entity Spans cannot overlap on a single token.
Args:
nlp: Existing spaCy pipeline. This is needed to update the vocabulary with UMLS CUI values
quickumls_fp (str): Path to QuickUMLS data
best_match (bool, optional): Whether to return only the top match or all overlapping candidates. Defaults to True.
- ignore_syntax (bool, optional): Wether to use the heuristcs introduced in the paper (Soldaini and Goharian, 2016). TODO: clarify,. Defaults to False
+ ignore_syntax (bool, optional): Whether to use the heuristcs introduced in the paper (Soldaini and Goharian, 2016). TODO: clarify,. Defaults to False
**kwargs: QuickUMLS keyword arguments (see QuickUMLS in core.py)
"""
@@ -43,6 +44,15 @@ def __call__(self, doc):
# pass in the document which has been parsed to this point in the pipeline for ngrams and matches
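
The README change above says matches are ranked and that token overlap is then resolved in that order. A minimal sketch of that idea follows, in plain Python without spaCy so it runs on its own. It is not the code added by this PR: the `Match` tuple, the `drop_token_overlaps` helper, the placeholder CUIs, and the similarity-then-length sort key (a stand-in for whatever `overlapping_criteria` selects) are all assumptions made for illustration.

```python
from typing import List, NamedTuple


class Match(NamedTuple):
    """Hypothetical stand-in for one QuickUMLS match, in token offsets."""
    start: int         # index of the first token (inclusive)
    end: int           # index one past the last token (exclusive)
    cui: str           # placeholder CUI, not a real UMLS concept
    similarity: float


def drop_token_overlaps(ranked: List[Match]) -> List[Match]:
    """Walk matches in ranked order; keep a match only if none of its
    tokens has already been claimed by a higher-ranked match."""
    claimed = set()
    kept = []
    for match in ranked:
        tokens = range(match.start, match.end)
        if any(token in claimed for token in tokens):
            continue  # shares at least one token with a better match
        claimed.update(tokens)
        kept.append(match)
    return kept


matches = [
    Match(0, 2, "C0000001", 1.0),
    Match(1, 3, "C0000002", 0.9),  # shares token 1 with the first match
    Match(3, 4, "C0000003", 0.8),
]

# Assumed ranking: higher similarity first, longer span as tie-breaker.
ranked = sorted(
    matches,
    key=lambda m: (m.similarity, m.end - m.start),
    reverse=True,
)
kept = drop_token_overlaps(ranked)
print([m.cui for m in kept])
```

Only the first and third matches survive, because the second shares a token with a higher-ranked match. For comparison, spaCy ships `spacy.util.filter_spans`, which resolves overlaps by preferring the longest span rather than a caller-supplied ranking.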