@@ -23,44 +23,63 @@ Then, in your favorite virtual environment, simply do:
```
pip install flair
```
- Furthermore, we recommend to install [SciSpaCy](https://allenai.github.io/scispacy/) for improved pre-processing
- and tokenization of scientific / biomedical texts:
- ```
- pip install scispacy==0.2.5
- pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.2.5/en_core_sci_sm-0.2.5.tar.gz
- ```

- #### Example Usage
+ #### Example 1: Biomedical NER
Let's run named entity recognition (NER) over an example sentence. All you need to do is
make a Sentence, load a pre-trained model and use it to predict tags for the sentence:
```python
from flair.data import Sentence
- from flair.models import MultiTagger
- from flair.tokenization import SciSpacyTokenizer
+ from flair.nn import Classifier

- # make a sentence and tokenize with SciSpaCy
- sentence = Sentence("Behavioral abnormalities in the Fmr1 KO2 Mouse Model of Fragile X Syndrome",
-                     use_tokenizer=SciSpacyTokenizer())
+ # make a sentence
+ sentence = Sentence("Behavioral abnormalities in the Fmr1 KO2 Mouse Model of Fragile X Syndrome")

# load biomedical tagger
- tagger = MultiTagger.load("hunflair")
+ tagger = Classifier.load("hunflair")

# tag sentence
tagger.predict(sentence)
```
Done! The Sentence now has entity annotations. Let's print the entities found by the tagger:
```python
- for annotation_layer in sentence.annotation_layers.keys():
-     for entity in sentence.get_spans(annotation_layer):
-         print(entity)
+ for entity in sentence.get_labels():
+     print(entity)
```
This should print:
- ~~~
+ ```console
Span[0:2]: "Behavioral abnormalities" → Disease (0.6736)
Span[9:12]: "Fragile X Syndrome" → Disease (0.99)
Span[4:5]: "Fmr1" → Gene (0.838)
Span[6:7]: "Mouse" → Species (0.9979)
- ~~~
+ ```
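
Each printed line follows the pattern `Span[start:end]: "text" → Label (score)`, where `start` and `end` are token positions with the end exclusive. As a minimal sketch of how to read that format, the snippet below parses the four lines shown above with a regular expression and checks the offsets against the sentence. It does not call Flair: `Entity`, `LINE_PATTERN` and `parse_entity_line` are hypothetical helpers written for this illustration, and the whitespace split is an assumption that happens to reproduce the token positions for this particular sentence.

```python
import re
from typing import NamedTuple


class Entity(NamedTuple):
    """One parsed output line (hypothetical helper type, not part of Flair)."""
    start: int
    end: int
    text: str
    label: str
    score: float


# Matches lines such as: Span[0:2]: "Behavioral abnormalities" → Disease (0.6736)
LINE_PATTERN = re.compile(r'Span\[(\d+):(\d+)\]: "(.+)" → (\w+) \(([\d.]+)\)')


def parse_entity_line(line: str) -> Entity:
    """Turn one printed line into an Entity, or raise if the format differs."""
    match = LINE_PATTERN.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"unrecognized entity line: {line!r}")
    start, end, text, label, score = match.groups()
    return Entity(int(start), int(end), text, label, float(score))


# The four lines from the expected output above, copied verbatim.
printed = [
    'Span[0:2]: "Behavioral abnormalities" → Disease (0.6736)',
    'Span[9:12]: "Fragile X Syndrome" → Disease (0.99)',
    'Span[4:5]: "Fmr1" → Gene (0.838)',
    'Span[6:7]: "Mouse" → Species (0.9979)',
]

# Assumption: splitting on whitespace gives the same tokens for this sentence.
tokens = "Behavioral abnormalities in the Fmr1 KO2 Mouse Model of Fragile X Syndrome".split()

entities = [parse_entity_line(line) for line in printed]
for entity in entities:
    # The offsets index tokens, so slicing the token list recovers the entity text.
    assert " ".join(tokens[entity.start:entity.end]) == entity.text
    print(f"{entity.label:8s} tokens {entity.start}-{entity.end - 1}: {entity.text}")
```

Because the offsets count tokens rather than characters, they depend on how the sentence was tokenized, which is why the choice of tokenizer matters for scientific text.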
+
+
+ #### Example 2: Biomedical NER with Better Tokenization
+
+ Scientific texts are difficult to tokenize. For this reason, we recommend installing [SciSpaCy](https://allenai.github.io/scispacy/) for improved pre-processing and tokenization of scientific / biomedical texts:
+ ```
+ pip install scispacy==0.2.5
+ pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.2.5/en_core_sci_sm-0.2.5.tar.gz
+ ```
+
+ Use this code to apply scientific tokenization:
+
+ ```python
+ from flair.data import Sentence
+ from flair.nn import Classifier
+ from flair.tokenization import SciSpacyTokenizer
+
+ # make a sentence and tokenize with SciSpaCy
+ sentence = Sentence("Behavioral abnormalities in the Fmr1 KO2 Mouse Model of Fragile X Syndrome",
+                     use_tokenizer=SciSpacyTokenizer())
+
+ # load biomedical tagger
+ tagger = Classifier.load("hunflair")
+
+ # tag sentence
+ tagger.predict(sentence)
+ ```
+
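
To illustrate why scientific text is difficult to tokenize, here is a toy sketch that compares two naive tokenizers on a made-up sentence. Neither function reflects SciSpaCy's actual rules: `whitespace_tokenize` and `symbol_aware_tokenize` are hypothetical helpers, and the sentence is invented for the illustration.

```python
import re


def whitespace_tokenize(text: str) -> list[str]:
    """Split on whitespace only."""
    return text.split()


def symbol_aware_tokenize(text: str) -> list[str]:
    """Toy rule (hypothetical): keep runs of word characters, split off every other symbol."""
    return re.findall(r"\w+|[^\w\s]", text)


# Made-up sentence in the style of a biomedical abstract.
text = "IL-2 activates STAT5 (p<0.05)."

whitespace_tokens = whitespace_tokenize(text)
symbol_aware_tokens = symbol_aware_tokenize(text)

print(len(whitespace_tokens), whitespace_tokens)
print(len(symbol_aware_tokens), symbol_aware_tokens)
```

Whitespace splitting leaves the parentheses and the final period glued to the statistic, while the symbol rule breaks the name `IL-2` into three tokens. A tokenizer built for the domain aims to avoid both failure modes, and because entity spans are expressed in token positions, the tokenizer choice directly changes what the tagger can mark.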

## Comparison to other biomedical NER tools
Tools for biomedical NER are typically trained and evaluated on rather small gold standard data sets.