Merge pull request #3137 from flairNLP/bioner-tutorial

alanakbik · web-flow · commit 8b3568e06168 · 2023-03-10T14:17:23.000+01:00
Update HunFlair tutorial to Flair 0.12
diff --git a/resources/docs/HUNFLAIR.md b/resources/docs/HUNFLAIR.md
@@ -23,44 +23,63 @@ Then, in your favorite virtual environment, simply do:
 ```
 pip install flair
 ```
-Furthermore, we recommend to install [SciSpaCy](https://allenai.github.io/scispacy/) for improved pre-processing
-and tokenization of scientific / biomedical texts:
- ```
-pip install scispacy==0.2.5
-pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.2.5/en_core_sci_sm-0.2.5.tar.gz
-```
 
-#### Example Usage
+#### Example 1: Biomedical NER
 Let's run named entity recognition (NER) over an example sentence. All you need to do is
 make a Sentence, load a pre-trained model and use it to predict tags for the sentence:
 ```python
 from flair.data import Sentence
-from flair.models import MultiTagger
-from flair.tokenization import SciSpacyTokenizer
+from flair.nn import Classifier
 
-# make a sentence and tokenize with SciSpaCy
-sentence = Sentence("Behavioral abnormalities in the Fmr1 KO2 Mouse Model of Fragile X Syndrome",
-                    use_tokenizer=SciSpacyTokenizer())
+# make a sentence 
+sentence = Sentence("Behavioral abnormalities in the Fmr1 KO2 Mouse Model of Fragile X Syndrome")
 
 # load biomedical tagger
-tagger = MultiTagger.load("hunflair")
+tagger = Classifier.load("hunflair")
 
 # tag sentence
 tagger.predict(sentence)
 ```
 Done! The Sentence now has entity annotations. Let's print the entities found by the tagger:
 ```python
-for annotation_layer in sentence.annotation_layers.keys():
-    for entity in sentence.get_spans(annotation_layer):
-        print(entity)
+for entity in sentence.get_labels():
+    print(entity)
 ```
 This should print:
-~~~
+```console
 Span[0:2]: "Behavioral abnormalities" → Disease (0.6736)
 Span[9:12]: "Fragile X Syndrome" → Disease (0.99)
 Span[4:5]: "Fmr1" → Gene (0.838)
 Span[6:7]: "Mouse" → Species (0.9979)
-~~~
+```
+
+
+#### Example 2: Biomedical NER with Better Tokenization
+
+Scientific texts are difficult to tokenize. For this reason, we recommend installing [SciSpaCy](https://allenai.github.io/scispacy/) for improved pre-processing and tokenization of scientific / biomedical texts:
+```
+pip install scispacy==0.2.5
+pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.2.5/en_core_sci_sm-0.2.5.tar.gz
+```
+
+To apply scientific tokenization, pass a `SciSpacyTokenizer` when creating the sentence:
+
+```python
+from flair.data import Sentence
+from flair.nn import Classifier
+from flair.tokenization import SciSpacyTokenizer
+
+# make a sentence and tokenize with SciSpaCy
+sentence = Sentence("Behavioral abnormalities in the Fmr1 KO2 Mouse Model of Fragile X Syndrome",
+                    use_tokenizer=SciSpacyTokenizer())
+
+# load biomedical tagger
+tagger = Classifier.load("hunflair")
+
+# tag sentence
+tagger.predict(sentence)
+```
+
 
 ## Comparison to other biomedical NER tools
 Tools for biomedical NER are typically trained and evaluated on rather small gold standard data sets.
diff --git a/resources/docs/HUNFLAIR_TUTORIAL_1_TAGGING.md b/resources/docs/HUNFLAIR_TUTORIAL_1_TAGGING.md
@@ -7,9 +7,9 @@ Let's use the pre-trained *HunFlair* model for biomedical named entity recogniti
 This model was trained over 24 biomedical NER data sets and can recognize 5 different entity types,
 i.e. cell lines, chemicals, disease, gene / proteins and species.
 ```python
-from flair.models import MultiTagger
+from flair.nn import Classifier
 
-tagger = MultiTagger.load("hunflair")
+tagger = Classifier.load("hunflair")
 ```
 All you need to do is use the predict() method of the tagger on a sentence.
 This will add predicted tags to the tokens in the sentence.
@@ -23,7 +23,7 @@ sentence = Sentence("Behavioral abnormalities in the Fmr1 KO2 Mouse Model of Fra
 tagger.predict(sentence)
 
 # print sentence with predicted tags
-print(sentence.to_tagged_string())
+print(sentence)
 ```
 This should print:
 ~~~
@@ -40,7 +40,7 @@ Often named entities consist of multiple words spanning a certain text span in t
 "_Behavioral Abnormalities_" or "_Fragile X Syndrome_" in our example sentence.
 You can directly get such spans in a tagged sentence like this:
 ```python
-for disease in sentence.get_spans("hunflair-disease"):
+for disease in sentence.get_labels("hunflair-disease"):
     print(disease)
 ```
 This should print:
@@ -71,9 +71,8 @@ You can retrieve all annotated entities of the other entity types in analogous w
 for cell lines,  `hunflair-chemical` for chemicals, `hunflair-gene` for genes and proteins, and `hunflair-species`
 for species. To get all entities in one you can run:
 ```python
-for annotation_layer in sentence.annotation_layers.keys():
-    for entity in sentence.get_spans(annotation_layer):
-        print(entity)
+for entity in sentence.get_labels():
+    print(entity)
 ```
 This should print:
 ~~~
@@ -117,7 +116,7 @@ abstract = "Fragile X syndrome (FXS) is a developmental disorder caused by a mut
 To work with complete abstracts or full-text, we first have to split them into separate sentences.
 Again we can apply the integration of the [SciSpaCy](https://allenai.github.io/scispacy/) library:
 ```python
-from flair.tokenization import SciSpacySentenceSplitter
+from flair.splitter import SciSpacySentenceSplitter
 
 # initialize the sentence splitter
 splitter = SciSpacySentenceSplitter()