
Commit 8b3568e

Merge pull request #3137 from flairNLP/bioner-tutorial
Update HunFlair tutorial to Flair 0.12
2 parents 0b94c7f + 49b6488 commit 8b3568e

2 files changed: +44 -26 lines changed

resources/docs/HUNFLAIR.md

+37-18
@@ -23,44 +23,63 @@ Then, in your favorite virtual environment, simply do:
 ```
 pip install flair
 ```
-Furthermore, we recommend to install [SciSpaCy](https://allenai.github.io/scispacy/) for improved pre-processing
-and tokenization of scientific / biomedical texts:
-```
-pip install scispacy==0.2.5
-pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.2.5/en_core_sci_sm-0.2.5.tar.gz
-```
 
-#### Example Usage
+#### Example 1: Biomedical NER
 Let's run named entity recognition (NER) over an example sentence. All you need to do is
 make a Sentence, load a pre-trained model and use it to predict tags for the sentence:
 ```python
 from flair.data import Sentence
-from flair.models import MultiTagger
-from flair.tokenization import SciSpacyTokenizer
+from flair.nn import Classifier
 
-# make a sentence and tokenize with SciSpaCy
-sentence = Sentence("Behavioral abnormalities in the Fmr1 KO2 Mouse Model of Fragile X Syndrome",
-                    use_tokenizer=SciSpacyTokenizer())
+# make a sentence
+sentence = Sentence("Behavioral abnormalities in the Fmr1 KO2 Mouse Model of Fragile X Syndrome")
 
 # load biomedical tagger
-tagger = MultiTagger.load("hunflair")
+tagger = Classifier.load("hunflair")
 
 # tag sentence
 tagger.predict(sentence)
 ```
 Done! The Sentence now has entity annotations. Let's print the entities found by the tagger:
 ```python
-for annotation_layer in sentence.annotation_layers.keys():
-    for entity in sentence.get_spans(annotation_layer):
-        print(entity)
+for entity in sentence.get_labels():
+    print(entity)
 ```
 This should print:
-~~~
+```console
 Span[0:2]: "Behavioral abnormalities" → Disease (0.6736)
 Span[9:12]: "Fragile X Syndrome" → Disease (0.99)
 Span[4:5]: "Fmr1" → Gene (0.838)
 Span[6:7]: "Mouse" → Species (0.9979)
-~~~
+```
+
+
+#### Example 2: Biomedical NER with Better Tokenization
+
+Scientific texts are difficult to tokenize. For this reason, we recommend to install [SciSpaCy](https://allenai.github.io/scispacy/) for improved pre-processing and tokenization of scientific / biomedical texts:
+```
+pip install scispacy==0.2.5
+pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.2.5/en_core_sci_sm-0.2.5.tar.gz
+```
+
+Use this code to apply scientific tokenization:
+
+```python
+from flair.data import Sentence
+from flair.nn import Classifier
+from flair.tokenization import SciSpacyTokenizer
+
+# make a sentence and tokenize with SciSpaCy
+sentence = Sentence("Behavioral abnormalities in the Fmr1 KO2 Mouse Model of Fragile X Syndrome",
+                    use_tokenizer=SciSpacyTokenizer())
+
+# load biomedical tagger
+tagger = Classifier.load("hunflair")
+
+# tag sentence
+tagger.predict(sentence)
+```
+
 
 ## Comparison to other biomedical NER tools
 Tools for biomedical NER are typically trained and evaluated on rather small gold standard data sets.
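
The updated HUNFLAIR.md example iterates over `sentence.get_labels()` and prints each label. As a minimal sketch of what one might do with those labels next, the snippet below groups entity mentions by predicted type. It assumes each label exposes `.value`, `.score` and `.data_point.text`, as Flair's `Label` objects are understood to do in 0.12; the helper name `group_entities_by_type` and the stand-in objects are hypothetical and only there so the sketch runs without downloading the model.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_entities_by_type(labels):
    """Group (text, score) pairs by predicted entity type."""
    grouped = defaultdict(list)
    for label in labels:
        # Assumption: `.value` is the entity type, `.score` the confidence,
        # and `.data_point.text` the text of the labelled span.
        grouped[label.value].append((label.data_point.text, label.score))
    return dict(grouped)


def fake_label(text, value, score):
    """Hypothetical stand-in for a Flair label, used for illustration only."""
    return SimpleNamespace(value=value, score=score,
                           data_point=SimpleNamespace(text=text))


# Stand-ins mirroring the example output shown in the tutorial.
# With Flair installed, pass sentence.get_labels() instead.
demo_labels = [
    fake_label("Behavioral abnormalities", "Disease", 0.6736),
    fake_label("Fragile X Syndrome", "Disease", 0.99),
    fake_label("Fmr1", "Gene", 0.838),
    fake_label("Mouse", "Species", 0.9979),
]

for entity_type, mentions in group_entities_by_type(demo_labels).items():
    print(entity_type, mentions)
```

Because the helper only reads attributes, it should work on the output of `get_labels()` unchanged, provided the attribute names above match the installed Flair version.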

resources/docs/HUNFLAIR_TUTORIAL_1_TAGGING.md

+7-8
@@ -7,9 +7,9 @@ Let's use the pre-trained *HunFlair* model for biomedical named entity recogniti
 This model was trained over 24 biomedical NER data sets and can recognize 5 different entity types,
 i.e. cell lines, chemicals, disease, gene / proteins and species.
 ```python
-from flair.models import MultiTagger
+from flair.nn import Classifier
 
-tagger = MultiTagger.load("hunflair")
+tagger = Classifier.load("hunflair")
 ```
 All you need to do is use the predict() method of the tagger on a sentence.
 This will add predicted tags to the tokens in the sentence.
@@ -23,7 +23,7 @@ sentence = Sentence("Behavioral abnormalities in the Fmr1 KO2 Mouse Model of Fra
 tagger.predict(sentence)
 
 # print sentence with predicted tags
-print(sentence.to_tagged_string())
+print(sentence)
 ```
 This should print:
 ~~~
@@ -40,7 +40,7 @@ Often named entities consist of multiple words spanning a certain text span in t
 "_Behavioral Abnormalities_" or "_Fragile X Syndrome_" in our example sentence.
 You can directly get such spans in a tagged sentence like this:
 ```python
-for disease in sentence.get_spans("hunflair-disease"):
+for disease in sentence.get_labels("hunflair-disease"):
     print(disease)
 ```
 This should print:
@@ -71,9 +71,8 @@ You can retrieve all annotated entities of the other entity types in analogous w
 for cell lines, `hunflair-chemical` for chemicals, `hunflair-gene` for genes and proteins, and `hunflair-species`
 for species. To get all entities in one you can run:
 ```python
-for annotation_layer in sentence.annotation_layers.keys():
-    for entity in sentence.get_spans(annotation_layer):
-        print(entity)
+for entity in sentence.get_labels():
+    print(entity)
 ```
 This should print:
 ~~~
@@ -117,7 +116,7 @@ abstract = "Fragile X syndrome (FXS) is a developmental disorder caused by a mut
 To work with complete abstracts or full-text, we first have to split them into separate sentences.
 Again we can apply the integration of the [SciSpaCy](https://allenai.github.io/scispacy/) library:
 ```python
-from flair.tokenization import SciSpacySentenceSplitter
+from flair.splitter import SciSpacySentenceSplitter
 
 # initialize the sentence splitter
 splitter = SciSpacySentenceSplitter()
