JohnSnowLabs · josejuanmartinez · Oct 2, 2022 · Sep 19, 2022 · Sep 20, 2022 · Sep 27, 2022
diff --git a/docs/_posts/josejuanmartinez/2022-09-19-legner_bert_indemnifications_en.md b/docs/_posts/josejuanmartinez/2022-09-19-legner_bert_indemnifications_en.md
@@ -0,0 +1,144 @@
+---
+layout: model
+title: Legal Indemnification NER
+author: John Snow Labs
+name: legner_bert_indemnifications
+date: 2022-09-19
+tags: [en, legal, ner, indemnification, licensed]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP for Legal 1.0.0
+spark_version: 3.2
+supported: true
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+This is a Legal Named Entity Recognition model that identifies the Subject (who), Action (verb), Object (the indemnification) and Indirect Object (to whom) in indemnification clauses.
+
+## Predicted Entities
+
+`INDEMNIFICATION`, `INDEMNIFICATION_SUBJECT`, `INDEMNIFICATION_ACTION`, `INDEMNIFICATION_INDIRECT_OBJECT`
+
+{:.btn-box}
+[Live Demo](https://demo.johnsnowlabs.com/legal/LEGALRE_INDEMNIFICATION/){:.button.button-orange}
+<button class="button button-orange" disabled>Open in Colab</button>
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/legal/models/legner_bert_indemnifications_en_1.0.0_3.2_1663605909112.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+
+## How to use
+
+
+
+<div class="tabs-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler()\
+        .setInputCol("text")\
+        .setOutputCol("document")
+
+sentencizer = SentenceDetectorDLModel\
+        .pretrained("sentence_detector_dl", "en") \
+        .setInputCols(["document"])\
+        .setOutputCol("sentence")
+
+tokenizer = Tokenizer()\
+        .setInputCols(["sentence"])\
+        .setOutputCol("token")
+
+tokenClassifier = legal.BertForTokenClassification.pretrained("legner_bert_indemnifications", "en", "legal/models")\
+  .setInputCols("token", "sentence")\
+  .setOutputCol("label")\
+  .setCaseSensitive(True)
+
+ner_converter = NerConverter()\
+    .setInputCols(["sentence","token","label"])\
+    .setOutputCol("ner_chunk")
+
+nlpPipeline = Pipeline(stages=[
+        documentAssembler,
+        sentencizer,
+        tokenizer,
+        tokenClassifier,
+        ner_converter
+        ])
+
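+# Every stage is pretrained, so fitting on an empty DataFrame just assembles the pipeline.
+empty_data = spark.createDataFrame([[""]]).toDF("text")
+model = nlpPipeline.fit(empty_data)
+
+text = '''The Company shall protect and indemnify the Supplier against any damages, losses or costs whatsoever'''
+
+# LightPipeline runs the fitted pipeline on plain strings, without a DataFrame.
+lmodel = LightPipeline(model)
+# annotate() returns a dict keyed by output column, e.g. res["token"] and res["label"].
+res = lmodel.annotate(text)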
+```
+
+</div>
+
+## Results
+
+```bash
++----------+---------------------------------+
+|     token|                        ner_label|
++----------+---------------------------------+
+|       The|                                O|
+|   Company|                                O|
+|     shall|         B-INDEMNIFICATION_ACTION|
+|   protect|         I-INDEMNIFICATION_ACTION|
+|       and|                                O|
+| indemnify|         B-INDEMNIFICATION_ACTION|
+|       the|                                O|
+|  Supplier|B-INDEMNIFICATION_INDIRECT_OBJECT|
+|   against|                                O|
+|       any|                                O|
+|   damages|                B-INDEMNIFICATION|
+|         ,|                                O|
+|    losses|                B-INDEMNIFICATION|
+|        or|                                O|
+|     costs|                B-INDEMNIFICATION|
+|whatsoever|                                O|
++----------+---------------------------------+
+```
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|legner_bert_indemnifications|
+|Compatibility:|Spark NLP for Legal 1.0.0+|
+|License:|Licensed|
+|Edition:|Official|
+|Input Labels:|[sentence, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|412.2 MB|
+|Case sensitive:|true|
+|Max sentence length:|128|
+
+## References
+
+In-house annotated examples from the CUAD legal dataset
+
+## Benchmarking
+
+```bash
+                                   precision    recall  f1-score   support
+
+                B-INDEMNIFICATION       0.91      0.89      0.90        36
+         B-INDEMNIFICATION_ACTION       0.92      0.71      0.80        17
+B-INDEMNIFICATION_INDIRECT_OBJECT       0.88      0.88      0.88        40
+        B-INDEMNIFICATION_SUBJECT       0.71      0.56      0.63         9
+                I-INDEMNIFICATION       0.88      0.78      0.82         9
+         I-INDEMNIFICATION_ACTION       0.81      0.87      0.84        15
+I-INDEMNIFICATION_INDIRECT_OBJECT       1.00      0.53      0.69        17
+                                O       0.97      0.91      0.94       510
+
+                         accuracy                           0.88       654
+                        macro avg       0.71      0.61      0.81       654
+                     weighted avg       0.95      0.88      0.91       654
+```
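The Results block of `legner_bert_indemnifications` lists one B-/I- label per token, and the `NerConverter` stage merges those into chunks. The grouping can be sketched in plain Python on the tokens and labels from that table. This is illustration only: `group_bio` is a hypothetical helper, not a Spark NLP API, and the real `NerConverter` may handle edge cases (such as a stray `I-` tag) differently.

```python
from typing import List, Optional, Tuple


def group_bio(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Merge B-/I- tagged tokens into (entity_type, chunk_text) pairs."""
    chunks: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_tokens: List[str] = []
    for token, label in zip(tokens, labels):
        prefix, _, entity = label.partition("-")
        # An I- tag of the same type extends the open chunk.
        if prefix == "I" and entity == current_type:
            current_tokens.append(token)
            continue
        # Any other label closes the open chunk, if there is one.
        if current_type is not None:
            chunks.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
        # A B- tag opens a new chunk; "O" and stray I- tags are dropped.
        if prefix == "B":
            current_type, current_tokens = entity, [token]
    if current_type is not None:
        chunks.append((current_type, " ".join(current_tokens)))
    return chunks


# Tokens and labels copied from the Results table of the model card.
TOKENS = ["The", "Company", "shall", "protect", "and", "indemnify", "the",
          "Supplier", "against", "any", "damages", ",", "losses", "or",
          "costs", "whatsoever"]
LABELS = ["O", "O", "B-INDEMNIFICATION_ACTION", "I-INDEMNIFICATION_ACTION",
          "O", "B-INDEMNIFICATION_ACTION", "O",
          "B-INDEMNIFICATION_INDIRECT_OBJECT", "O", "O", "B-INDEMNIFICATION",
          "O", "B-INDEMNIFICATION", "O", "B-INDEMNIFICATION", "O"]

for entity_type, chunk in group_bio(TOKENS, LABELS):
    print(f"{entity_type}: {chunk}")
```

On this sentence the sketch yields six chunks, with "shall protect" kept as a single `INDEMNIFICATION_ACTION` chunk because its second token carries an `I-` tag.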
diff --git a/docs/_posts/josejuanmartinez/2022-09-19-legre_indemnifications_en.md b/docs/_posts/josejuanmartinez/2022-09-19-legre_indemnifications_en.md
@@ -0,0 +1,152 @@
+---
+layout: model
+title: Legal Indemnification Relation Extraction
+author: John Snow Labs
+name: legre_indemnifications
+date: 2022-09-19
+tags: [en, legal, re, indemnification, licensed]
+task: Relation Extraction
+language: en
+edition: Spark NLP for Legal 1.0.0
+spark_version: 3.2
+supported: true
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+This is a Relation Extraction model that links the entities extracted by the Indemnification NER model (see `legner_bert_indemnifications` in Models Hub) into subject, object and indirect-object relations.
+
+## Predicted Entities
+
+`is_indemnification_subject`, `is_indemnification_object`, `is_indemnification_indobject`
+
+{:.btn-box}
+[Live Demo](https://demo.johnsnowlabs.com/legal/LEGALRE_INDEMNIFICATION/){:.button.button-orange}
+<button class="button button-orange" disabled>Open in Colab</button>
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/legal/models/legre_indemnifications_en_1.0.0_3.2_1663605721178.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+
+## How to use
+
+
+
+<div class="tabs-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler()\
+        .setInputCol("text")\
+        .setOutputCol("document")
+
+sentencizer = SentenceDetectorDLModel\
+        .pretrained("sentence_detector_dl", "en") \
+        .setInputCols(["document"])\
+        .setOutputCol("sentence")
+
+tokenizer = Tokenizer()\
+        .setInputCols(["sentence"])\
+        .setOutputCol("token")
+
+tokenClassifier = legal.BertForTokenClassification.pretrained("legner_bert_indemnifications", "en", "legal/models")\
+  .setInputCols("token", "sentence")\
+  .setOutputCol("label")\
+  .setCaseSensitive(True)
+
+ner_converter = NerConverter()\
+    .setInputCols(["sentence","token","label"])\
+    .setOutputCol("ner_chunk")
+
+# Only needed if you want to filter relation candidates by entity pair or syntactic distance
+# =================
+pos_tagger = PerceptronModel()\
+    .pretrained() \
+    .setInputCols(["sentence", "token"])\
+    .setOutputCol("pos_tags")
+
+dependency_parser = DependencyParserModel() \
+    .pretrained("dependency_conllu", "en") \
+    .setInputCols(["sentence", "pos_tags", "token"]) \
+    .setOutputCol("dependencies")
+
+# Restrict relation candidates to these entity-type pairs, within the maximum syntactic distance
+re_filter = RENerChunksFilter()\
+    .setInputCols(["ner_chunk", "dependencies"])\
+    .setOutputCol("re_ner_chunks")\
+    .setMaxSyntacticDistance(20)\
+    .setRelationPairs(['INDEMNIFICATION_SUBJECT-INDEMNIFICATION_ACTION', 'INDEMNIFICATION_SUBJECT-INDEMNIFICATION_INDIRECT_OBJECT', 'INDEMNIFICATION_ACTION-INDEMNIFICATION', 'INDEMNIFICATION_ACTION-INDEMNIFICATION_INDIRECT_OBJECT'])
+# =================
+
+reDL = legal.RelationExtractionDLModel()\
+    .pretrained("legre_indemnifications", "en", "legal/models")\
+    .setPredictionThreshold(0.5)\
+    .setInputCols(["re_ner_chunks", "sentence"])\
+    .setOutputCol("relations")
+
+nlpPipeline = Pipeline(stages=[
+        documentAssembler,
+        sentencizer,
+        tokenizer,
+        tokenClassifier,
+        ner_converter,
+        pos_tagger,
+        dependency_parser,
+        re_filter,
+        reDL])
+
+# Every stage is pretrained, so fitting on an empty DataFrame just assembles the pipeline.
+empty_data = spark.createDataFrame([[""]]).toDF("text")
+model = nlpPipeline.fit(empty_data)
+
+text = '''The Company shall indemnify and hold harmless HOC against any losses, claims, damages or liabilities to which it may become subject under the 1933 Act or otherwise, insofar as such losses, claims, damages or liabilities (or actions in respect thereof) arise out of or are based upon '''
+
+# LightPipeline runs the fitted pipeline on plain strings, without a DataFrame.
+lmodel = LightPipeline(model)
+# annotate() returns a dict keyed by output column, e.g. res["relations"].
+res = lmodel.annotate(text)
+```
+
+</div>
+
+## Results
+
+```bash
+|    | relation                     | entity1                 | entity1_begin | entity1_end | chunk1          | entity2                         | entity2_begin | entity2_end | chunk2        | confidence |
+| 1  | is_indemnification_subject   | INDEMNIFICATION_SUBJECT | 4             | 10          | Company         | INDEMNIFICATION_ACTION          | 32            | 44          | hold harmless | 0.8847967  |
+| 2  | is_indemnification_indobject | INDEMNIFICATION_SUBJECT | 4             | 10          | Company         | INDEMNIFICATION_INDIRECT_OBJECT | 46            | 48          | HOC           | 0.96191925 |
+| 3  | is_indemnification_indobject | INDEMNIFICATION_ACTION  | 12            | 26          | shall indemnify | INDEMNIFICATION_INDIRECT_OBJECT | 46            | 48          | HOC           | 0.7332646  |
+| 10 | is_indemnification_object    | INDEMNIFICATION_ACTION  | 32            | 44          | hold harmless   | INDEMNIFICATION                 | 70            | 75          | claims        | 0.9728908  |
+| 11 | is_indemnification_object    | INDEMNIFICATION_ACTION  | 32            | 44          | hold harmless   | INDEMNIFICATION                 | 78            | 84          | damages       | 0.9727499  |
+| 12 | is_indemnification_object    | INDEMNIFICATION_ACTION  | 32            | 44          | hold harmless   | INDEMNIFICATION                 | 89            | 99          | liabilities   | 0.964168   |
+```
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|legre_indemnifications|
+|Compatibility:|Spark NLP for Legal 1.0.0+|
+|License:|Licensed|
+|Edition:|Official|
+|Language:|en|
+|Size:|405.9 MB|
+
+## References
+
+In-house annotated examples from the CUAD legal dataset
+
+## Benchmarking
+
+```bash
+Relation                        Recall  Precision      F1  Support
+
+is_indemnification_indobject     0.966      1.000   0.982       29
+is_indemnification_object        0.929      0.929   0.929       42
+is_indemnification_subject       0.931      0.931   0.931       29
+no_rel                           0.950      0.941   0.945      100
+
+Avg.                             0.944      0.950   0.947
+
+Weighted Avg.                    0.945      0.945   0.945
+```
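The Results block of `legre_indemnifications` is a flat list of entity pairs, each with a relation label and a confidence. One way to make that easier to consume downstream is to group the pairs by relation and optionally drop low-confidence ones. The sketch below uses the six rows from that table. `group_relations` and its `min_confidence` parameter are hypothetical names introduced for illustration; they are not part of Spark NLP, and this filter is separate from the `setPredictionThreshold(0.5)` call in the pipeline.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (relation, chunk1, chunk2, confidence), copied from the Results table.
ROWS: List[Tuple[str, str, str, float]] = [
    ("is_indemnification_subject", "Company", "hold harmless", 0.8847967),
    ("is_indemnification_indobject", "Company", "HOC", 0.96191925),
    ("is_indemnification_indobject", "shall indemnify", "HOC", 0.7332646),
    ("is_indemnification_object", "hold harmless", "claims", 0.9728908),
    ("is_indemnification_object", "hold harmless", "damages", 0.9727499),
    ("is_indemnification_object", "hold harmless", "liabilities", 0.964168),
]


def group_relations(
    rows: List[Tuple[str, str, str, float]],
    min_confidence: float = 0.0,
) -> Dict[str, List[Tuple[str, str]]]:
    """Group (chunk1, chunk2) pairs by relation, skipping low-confidence rows."""
    grouped: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for relation, chunk1, chunk2, confidence in rows:
        if confidence < min_confidence:
            continue
        pair = (chunk1, chunk2)
        # Avoid listing the same pair twice under one relation.
        if pair not in grouped[relation]:
            grouped[relation].append(pair)
    return dict(grouped)


for relation, pairs in group_relations(ROWS, min_confidence=0.9).items():
    print(relation, pairs)
```

With `min_confidence=0.9` the subject relation (0.88) and the "shall indemnify" to "HOC" pair (0.73) are dropped; with the default of `0.0` all six rows are kept.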