Models hub legal (#12835)

josejuanmartinez · jsl-models · web-flow · commit 9bd64aeaa62c · 2022-09-27T12:28:41.000+02:00
* 2022-09-19-legre_indemnifications_en (#12758)
* Add model 2022-09-19-legre_indemnifications_en
* Add model 2022-09-19-legner_bert_indemnifications_en
Co-authored-by: josejuanmartinez <jjmcarrascosa@gmail.com>
* 2022-09-20-legclf_cuad_confidentiality_clause_en (#12770)
* Add model 2022-09-20-legclf_cuad_confidentiality_clause_en
* Add model 2022-09-20-legclf_cuad_indemnifications_clause_en
* Add model 2022-09-20-legclf_cuad_licenses_clause_en
* Add model 2022-09-20-legclf_cuad_obligations_clause_en
* Add model 2022-09-20-legclf_cuad_whereas_clause_en
Co-authored-by: josejuanmartinez <jjmcarrascosa@gmail.com>
* Add model 2022-09-27-legclf_cuad_licenses_clause_en (#12827)
Co-authored-by: josejuanmartinez <jjmcarrascosa@gmail.com>
* Add model 2022-09-27-legclf_cuad_indemnifications_clause_en (#12828)
Co-authored-by: josejuanmartinez <jjmcarrascosa@gmail.com>
* Add model 2022-09-27-legner_bert_indemnifications_en (#12831)
Co-authored-by: josejuanmartinez <jjmcarrascosa@gmail.com>
* Add model 2022-09-27-legassertion_time_en (#12832)
Co-authored-by: josejuanmartinez <jjmcarrascosa@gmail.com>
Co-authored-by: jsl-models <74001263+jsl-models@users.noreply.github.com>
diff --git a/docs/_posts/josejuanmartinez/2022-09-27-legassertion_time_en.md b/docs/_posts/josejuanmartinez/2022-09-27-legassertion_time_en.md
@@ -0,0 +1,114 @@
+---
+layout: model
+title: Temporality / Certainty Assertion Status
+author: John Snow Labs
+name: legassertion_time
+date: 2022-09-27
+tags: [en, licensed]
+task: Assertion Status
+language: en
+edition: Spark NLP for Legal 1.0.0
+spark_version: 3.0
+supported: true
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+This is an Assertion Status model that detects temporality (`PRESENT`, `PAST`, `FUTURE`) or certainty (`POSSIBLE`) in your legal documents.
+
+## Predicted Entities
+
+`PRESENT`, `PAST`, `FUTURE`, `POSSIBLE`
+
+{:.btn-box}
+[Live Demo](https://demo.johnsnowlabs.com/legal/LEGASSERTION_TEMPORALITY){:.button.button-orange}
+<button class="button button-orange" disabled>Open in Colab</button>
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/legal/models/legassertion_time_en_1.0.0_3.0_1664274039847.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+
+## How to use
+
+
+
+<div class="tabs-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
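+# YOUR NER HERE: define documentAssembler, tokenizer and ner before this point.
+# ner must write to the "entity" column that chunk_converter reads below.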
+embeddings = BertEmbeddings.pretrained("bert_embeddings_sec_bert_base","en") \
+    .setInputCols(["sentence", "token"]) \
+    .setOutputCol("embeddings")
+
+chunk_converter = ChunkConverter() \
+    .setInputCols(["entity"]) \
+    .setOutputCol("ner_chunk")
+
+assertion = legal.AssertionDLModel.pretrained("legassertion_time", "en", "legal/models")\
+    .setInputCols(["sentence", "ner_chunk", "embeddings"]) \
+    .setOutputCol("assertion")
+    
+nlpPipeline = Pipeline(stages=[
+    documentAssembler, 
+    tokenizer,
+    embeddings,
+    ner,
+    chunk_converter,
+    assertion
+    ])
+
+empty_data = spark.createDataFrame([[""]]).toDF("text")
+
+model = nlpPipeline.fit(empty_data)
+
+lp = LightPipeline(model)
+
+texts = ["The subsidiaries of Atlantic Inc will participate in a merging operation",
+    "The Conditions and Warranties of this agreement might be modified"]
+
+lp.annotate(texts)
+```
+
+</div>
+
+## Results
+
+```bash
+chunk,begin,end,entity_type,assertion
+Atlantic Inc,20,31,ORG,FUTURE
+
+chunk,begin,end,entity_type,assertion
+Conditions and Warranties,4,28,DOC,POSSIBLE
+```
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|legassertion_time|
+|Compatibility:|Spark NLP for Legal 1.0.0+|
+|License:|Licensed|
+|Edition:|Official|
+|Input Labels:|[document, doc_chunk, embeddings]|
+|Output Labels:|[assertion]|
+|Language:|en|
+|Size:|2.2 MB|
+
+## References
+
+In-house annotations on financial and legal corpora
+
+## Benchmarking
+
+```bash
+label	 tp	 fp	 fn	 prec	 rec	 f1
+PRESENT	 201	 11	 16	 0.9481132	 0.92626727	 0.937063
+POSSIBLE	 171	 3	 6	 0.98275864	 0.9661017	 0.9743589
+FUTURE	 119	 6	 4	 0.952	 0.96747965	 0.95967746
+PAST	 270	 16	 10	 0.9440559	 0.96428573	 0.9540636
+tp: 761 fp: 36 fn: 36 labels: 4
+Macro-average	 prec: 0.9567319, rec: 0.9560336, f1: 0.95638263
+Micro-average	 prec: 0.9548306, rec: 0.9548306, f1: 0.9548306
+```
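Review note on the `legassertion_time` benchmark above: the per-label and averaged scores can be re-derived from the `tp`/`fp`/`fn` columns alone. The sketch below is plain Python, not part of the model card or of Spark NLP; the helper name `prf` is made up for illustration. It assumes the reported macro F1 is the harmonic mean of macro precision and macro recall, which is the reading that reproduces the published 0.95638263 (averaging the four per-label F1 scores gives a slightly different value).

```python
# Confusion counts copied from the benchmark table: label -> (tp, fp, fn)
counts = {
    "PRESENT": (201, 11, 16),
    "POSSIBLE": (171, 3, 6),
    "FUTURE": (119, 6, 4),
    "PAST": (270, 16, 10),
}


def prf(tp, fp, fn):
    """Return (precision, recall, f1) for one set of confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Per-label scores
per_label = {label: prf(*c) for label, c in counts.items()}

# Macro average: unweighted mean of per-label precision and recall;
# F1 is then taken as the harmonic mean of those two means (assumption).
macro_p = sum(p for p, _, _ in per_label.values()) / len(per_label)
macro_r = sum(r for _, r, _ in per_label.values()) / len(per_label)
macro_f1 = 2 * macro_p * macro_r / (macro_p + macro_r)

# Micro average: pool the counts over all labels, then score once
total_tp, total_fp, total_fn = (sum(col) for col in zip(*counts.values()))
micro_p, micro_r, micro_f1 = prf(total_tp, total_fp, total_fn)

for label, (p, r, f1) in per_label.items():
    print(f"{label}\t{p:.4f}\t{r:.4f}\t{f1:.4f}")
print(f"macro\t{macro_p:.4f}\t{macro_r:.4f}\t{macro_f1:.4f}")
print(f"micro\t{micro_p:.4f}\t{micro_r:.4f}\t{micro_f1:.4f}")
```

Because pooled `fp` and `fn` are both 36, micro precision, recall and F1 coincide, which matches the three identical micro-average figures in the table.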
diff --git a/docs/_posts/josejuanmartinez/2022-09-27-legner_bert_indemnifications_en.md b/docs/_posts/josejuanmartinez/2022-09-27-legner_bert_indemnifications_en.md
@@ -0,0 +1,144 @@
+---
+layout: model
+title: Legal Indemnification NER (Bert, base)
+author: John Snow Labs
+name: legner_bert_indemnifications
+date: 2022-09-27
+tags: [indemnifications, en, licensed]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP for Legal 1.0.0
+spark_version: 3.0
+supported: true
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+This is a Legal Named Entity Recognition model that identifies the Subject (who), Action (verb), Object (the indemnification) and Indirect Object (to whom) in indemnification clauses.
+
+## Predicted Entities
+
+`INDEMNIFICATION`, `INDEMNIFICATION_SUBJECT`, `INDEMNIFICATION_ACTION`, `INDEMNIFICATION_INDIRECT_OBJECT`
+
+{:.btn-box}
+[Live Demo](https://demo.johnsnowlabs.com/legal/LEGALRE_INDEMNIFICATION/){:.button.button-orange}
+<button class="button button-orange" disabled>Open in Colab</button>
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/legal/models/legner_bert_indemnifications_en_1.0.0_3.0_1664273651991.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+
+## How to use
+
+
+
+<div class="tabs-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler()\
+        .setInputCol("text")\
+        .setOutputCol("document")
+
+sentencizer = SentenceDetectorDLModel\
+        .pretrained("sentence_detector_dl", "en") \
+        .setInputCols(["document"])\
+        .setOutputCol("sentence")
+                      
+tokenizer = Tokenizer()\
+        .setInputCols(["sentence"])\
+        .setOutputCol("token")
+
+tokenClassifier = legal.BertForTokenClassification.pretrained("legner_bert_indemnifications", "en", "legal/models")\
+  .setInputCols("token", "sentence")\
+  .setOutputCol("label")\
+  .setCaseSensitive(True)
+
+ner_converter = NerConverter()\
+    .setInputCols(["sentence","token","label"])\
+    .setOutputCol("ner_chunk")
+    
+nlpPipeline = Pipeline(stages=[
+        documentAssembler,
+        sentencizer,
+        tokenizer,
+        tokenClassifier,
+        ner_converter
+        ])
+
+# No stage learns from the data here, so fitting on an empty DataFrame is enough
+empty_data = spark.createDataFrame([[""]]).toDF("text")
+
+model = nlpPipeline.fit(empty_data)
+
+text='''The Company shall protect and indemnify the Supplier against any damages, losses or costs whatsoever'''
+
+# LightPipeline annotates plain strings directly, so no second fit on the text is needed
+lmodel = LightPipeline(model)
+res = lmodel.annotate(text)
+```
+
+</div>
+
+## Results
+
+```bash
++----------+---------------------------------+
+|     token|                        ner_label|
++----------+---------------------------------+
+|       The|                                O|
+|   Company|                                O|
+|     shall|         B-INDEMNIFICATION_ACTION|
+|   protect|         I-INDEMNIFICATION_ACTION|
+|       and|                                O|
+| indemnify|         B-INDEMNIFICATION_ACTION|
+|       the|                                O|
+|  Supplier|B-INDEMNIFICATION_INDIRECT_OBJECT|
+|   against|                                O|
+|       any|                                O|
+|   damages|                B-INDEMNIFICATION|
+|         ,|                                O|
+|    losses|                B-INDEMNIFICATION|
+|        or|                                O|
+|     costs|                B-INDEMNIFICATION|
+|whatsoever|                                O|
++----------+---------------------------------+
+```
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|legner_bert_indemnifications|
+|Compatibility:|Spark NLP for Legal 1.0.0+|
+|License:|Licensed|
+|Edition:|Official|
+|Input Labels:|[sentence, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|412.2 MB|
+|Case sensitive:|true|
+|Max sentence length:|128|
+
+## References
+
+In-house annotated examples from CUAD legal dataset
+
+## Benchmarking
+
+```bash
+                                   precision    recall  f1-score   support
+
+                B-INDEMNIFICATION       0.91      0.89      0.90        36
+         B-INDEMNIFICATION_ACTION       0.92      0.71      0.80        17
+B-INDEMNIFICATION_INDIRECT_OBJECT       0.88      0.88      0.88        40
+        B-INDEMNIFICATION_SUBJECT       0.71      0.56      0.63         9
+                I-INDEMNIFICATION       0.88      0.78      0.82         9
+         I-INDEMNIFICATION_ACTION       0.81      0.87      0.84        15
+I-INDEMNIFICATION_INDIRECT_OBJECT       1.00      0.53      0.69        17
+                                O       0.97      0.91      0.94       510
+
+                         accuracy                           0.88       654
+                        macro avg       0.71      0.61      0.81       654
+                     weighted avg       0.95      0.88      0.91       654
+```
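Review note on the `legner_bert_indemnifications` Results table above: the token-level `B-`/`I-`/`O` labels are what a converter stage merges into chunks. The sketch below shows that grouping in plain Python on the rows copied from the table. It is an illustration only, not the `NerConverter` implementation; the function name `group_bio` and the rule that a stray `I-` tag opens a new chunk are assumptions made for this example.

```python
# (token, label) rows copied from the Results table
rows = [
    ("The", "O"), ("Company", "O"),
    ("shall", "B-INDEMNIFICATION_ACTION"),
    ("protect", "I-INDEMNIFICATION_ACTION"),
    ("and", "O"),
    ("indemnify", "B-INDEMNIFICATION_ACTION"),
    ("the", "O"),
    ("Supplier", "B-INDEMNIFICATION_INDIRECT_OBJECT"),
    ("against", "O"), ("any", "O"),
    ("damages", "B-INDEMNIFICATION"),
    (",", "O"),
    ("losses", "B-INDEMNIFICATION"),
    ("or", "O"),
    ("costs", "B-INDEMNIFICATION"),
    ("whatsoever", "O"),
]


def group_bio(rows):
    """Merge BIO-tagged tokens into (chunk_text, entity_type) pairs."""
    chunks = []
    tokens, entity_type = [], None
    for token, label in rows:
        prefix, _, entity = label.partition("-")
        # An I- tag of the same type extends the chunk that is currently open
        if prefix == "I" and entity == entity_type:
            tokens.append(token)
            continue
        # Anything else closes the open chunk, if there is one
        if tokens:
            chunks.append((" ".join(tokens), entity_type))
        # B- (or a stray I-) opens a new chunk; O leaves nothing open
        if prefix in ("B", "I"):
            tokens, entity_type = [token], entity
        else:
            tokens, entity_type = [], None
    if tokens:
        chunks.append((" ".join(tokens), entity_type))
    return chunks


chunks = group_bio(rows)
for text, entity in chunks:
    print(f"{entity}: {text}")
```

On the example sentence, "shall protect" comes out as a single `INDEMNIFICATION_ACTION` chunk, while "damages", "losses" and "costs" stay separate `INDEMNIFICATION` chunks because each carries its own `B-` tag.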