Models hub legal #12877

Merged: 12 commits, Oct 2, 2022
@@ -0,0 +1,144 @@
---
layout: model
title: Legal Indemnification NER
author: John Snow Labs
name: legner_bert_indemnifications
date: 2022-09-19
tags: [en, legal, ner, indemnification, licensed]
task: Named Entity Recognition
language: en
edition: Spark NLP for Legal 1.0.0
spark_version: 3.2
supported: true
article_header:
type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

This is a Legal Named Entity Recognition model that identifies the Subject (who), Action (verb), Object (the indemnification) and Indirect Object (to whom) in indemnification clauses.

## Predicted Entities

`INDEMNIFICATION`, `INDEMNIFICATION_SUBJECT`, `INDEMNIFICATION_ACTION`, `INDEMNIFICATION_INDIRECT_OBJECT`

{:.btn-box}
[Live Demo](https://demo.johnsnowlabs.com/legal/LEGALRE_INDEMNIFICATION/){:.button.button-orange}
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/legal/models/legner_bert_indemnifications_en_1.0.0_3.2_1663605909112.zip){:.button.button-orange.button-orange-trans.arr.button-icon}

## How to use



<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
# Assumes the licensed johnsnowlabs / Spark NLP for Legal libraries are installed
# and that `spark` is an active (licensed) SparkSession.
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler, LightPipeline
from sparknlp.annotator import SentenceDetectorDLModel, Tokenizer, NerConverter
from johnsnowlabs import legal

documentAssembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentencizer = SentenceDetectorDLModel\
    .pretrained("sentence_detector_dl", "en")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

# Emits one IOB label per token in the "label" column
tokenClassifier = legal.BertForTokenClassification.pretrained("legner_bert_indemnifications", "en", "legal/models")\
    .setInputCols("token", "sentence")\
    .setOutputCol("label")\
    .setCaseSensitive(True)

# Merges the IOB token labels into entity chunks
ner_converter = NerConverter()\
    .setInputCols(["sentence", "token", "label"])\
    .setOutputCol("ner_chunk")

nlpPipeline = Pipeline(stages=[
    documentAssembler,
    sentencizer,
    tokenizer,
    tokenClassifier,
    ner_converter
])

# Every stage is pretrained, so fitting on an empty DataFrame is enough
empty_data = spark.createDataFrame([[""]]).toDF("text")
model = nlpPipeline.fit(empty_data)

text = '''The Company shall protect and indemnify the Supplier against any damages, losses or costs whatsoever'''

# LightPipeline runs the fitted pipeline directly on Python strings
lmodel = LightPipeline(model)
res = lmodel.annotate(text)
```

</div>

## Results

```bash
+----------+---------------------------------+
|     token|                        ner_label|
+----------+---------------------------------+
|       The|                                O|
|   Company|                                O|
|     shall|         B-INDEMNIFICATION_ACTION|
|   protect|         I-INDEMNIFICATION_ACTION|
|       and|                                O|
| indemnify|         B-INDEMNIFICATION_ACTION|
|       the|                                O|
|  Supplier|B-INDEMNIFICATION_INDIRECT_OBJECT|
|   against|                                O|
|       any|                                O|
|   damages|                B-INDEMNIFICATION|
|         ,|                                O|
|    losses|                B-INDEMNIFICATION|
|        or|                                O|
|     costs|                B-INDEMNIFICATION|
|whatsoever|                                O|
+----------+---------------------------------+
```
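
`NerConverter` is the stage that turns per-token IOB labels like the ones above into entity chunks. The following is a minimal pure-Python sketch of that merging step, for illustration only: it is not Spark NLP's implementation, it drops the character offsets that real annotations carry, and the helper name `merge_iob` is hypothetical. It assumes a `B-` tag opens a chunk and an `I-` tag of the same type continues it.

```python
from typing import List, Optional, Tuple


def merge_iob(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Group IOB-tagged tokens into (chunk_text, entity_type) pairs (hypothetical helper)."""
    chunks: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    def flush() -> None:
        # Emit the chunk being built, if any, and reset the state
        nonlocal current_tokens, current_type
        if current_tokens:
            chunks.append((" ".join(current_tokens), current_type))
        current_tokens, current_type = [], None

    for token, label in zip(tokens, labels):
        if label.startswith("B-"):
            flush()
            current_tokens, current_type = [token], label[2:]
        elif label.startswith("I-") and current_type == label[2:]:
            current_tokens.append(token)
        elif label.startswith("I-"):
            # Tolerate an I- tag with no matching B- by starting a new chunk
            flush()
            current_tokens, current_type = [token], label[2:]
        else:
            # "O" closes any open chunk
            flush()
    flush()
    return chunks


# Tokens and labels copied from the Results table above
tokens = ["The", "Company", "shall", "protect", "and", "indemnify", "the",
          "Supplier", "against", "any", "damages", ",", "losses", "or",
          "costs", "whatsoever"]
labels = ["O", "O", "B-INDEMNIFICATION_ACTION", "I-INDEMNIFICATION_ACTION",
          "O", "B-INDEMNIFICATION_ACTION", "O",
          "B-INDEMNIFICATION_INDIRECT_OBJECT", "O", "O", "B-INDEMNIFICATION",
          "O", "B-INDEMNIFICATION", "O", "B-INDEMNIFICATION", "O"]

for chunk, entity in merge_iob(tokens, labels):
    print(f"{entity:35} {chunk}")
```

On this example, "shall protect" becomes a single `INDEMNIFICATION_ACTION` chunk, while "damages", "losses" and "costs" stay separate `INDEMNIFICATION` chunks because each one starts with its own `B-` tag.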

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|legner_bert_indemnifications|
|Compatibility:|Spark NLP for Legal 1.0.0+|
|License:|Licensed|
|Edition:|Official|
|Input Labels:|[sentence, token]|
|Output Labels:|[ner]|
|Language:|en|
|Size:|412.2 MB|
|Case sensitive:|true|
|Max sentence length:|128|

## References

In-house annotated examples from the CUAD legal dataset

## Benchmarking

```bash
                                   precision    recall  f1-score   support

                B-INDEMNIFICATION       0.91      0.89      0.90        36
         B-INDEMNIFICATION_ACTION       0.92      0.71      0.80        17
B-INDEMNIFICATION_INDIRECT_OBJECT       0.88      0.88      0.88        40
        B-INDEMNIFICATION_SUBJECT       0.71      0.56      0.63         9
                I-INDEMNIFICATION       0.88      0.78      0.82         9
         I-INDEMNIFICATION_ACTION       0.81      0.87      0.84        15
I-INDEMNIFICATION_INDIRECT_OBJECT       1.00      0.53      0.69        17
                                O       0.97      0.91      0.94       510

                         accuracy                           0.88       654
                        macro avg       0.71      0.61      0.81       654
                     weighted avg       0.95      0.88      0.91       654
```
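
The `macro avg` row averages each metric over labels with equal weight, while `weighted avg` weights each label by its `support`, which is why the large `O` class pulls the weighted figures up. A minimal sketch of the two aggregations, using made-up per-label scores (not taken from the table above) so the arithmetic is easy to check; the `averages` helper is hypothetical.

```python
from typing import Dict, Tuple

# label -> (precision, recall, f1, support); values below are illustrative only
Scores = Dict[str, Tuple[float, float, float, int]]


def averages(scores: Scores) -> Tuple[Tuple[float, ...], Tuple[float, ...]]:
    """Return (macro, weighted) averages of precision, recall and F1."""
    n = len(scores)
    total_support = sum(s[3] for s in scores.values())
    # Macro: plain mean over labels, every label counts the same
    macro = tuple(sum(s[i] for s in scores.values()) / n for i in range(3))
    # Weighted: each label contributes in proportion to its support
    weighted = tuple(
        sum(s[i] * s[3] for s in scores.values()) / total_support for i in range(3)
    )
    return macro, weighted


example: Scores = {
    "B-RARE": (0.50, 0.40, 0.44, 10),  # made-up rare label
    "O": (0.90, 0.80, 0.85, 90),       # made-up frequent label
}
macro, weighted = averages(example)
print("macro   ", [round(x, 3) for x in macro])
print("weighted", [round(x, 3) for x in weighted])
```

With one frequent, high-scoring label and one rare, weaker label, the weighted precision lands near the frequent label's score while the macro precision sits halfway between the two.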
152 changes: 152 additions & 0 deletions docs/_posts/josejuanmartinez/2022-09-19-legre_indemnifications_en.md
@@ -0,0 +1,152 @@
---
layout: model
title: Legal Indemnification Relation Extraction
author: John Snow Labs
name: legre_indemnifications
date: 2022-09-19
tags: [en, legal, re, indemnification, licensed]
task: Relation Extraction
language: en
edition: Spark NLP for Legal 1.0.0
spark_version: 3.2
supported: true
article_header:
type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

This is a Relation Extraction model that links the entities extracted by the Indemnification NER model (see `legner_bert_indemnifications` in Models Hub) into subject, object and indirect-object relations.

## Predicted Entities

`is_indemnification_subject`, `is_indemnification_object`, `is_indemnification_indobject`

{:.btn-box}
[Live Demo](https://demo.johnsnowlabs.com/legal/LEGALRE_INDEMNIFICATION/){:.button.button-orange}
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/legal/models/legre_indemnifications_en_1.0.0_3.2_1663605721178.zip){:.button.button-orange.button-orange-trans.arr.button-icon}

## How to use



<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
# Assumes the licensed johnsnowlabs / Spark NLP for Legal libraries are installed
# and that `spark` is an active (licensed) SparkSession.
from pyspark.ml import Pipeline
from sparknlp.base import DocumentAssembler, LightPipeline
from sparknlp.annotator import (SentenceDetectorDLModel, Tokenizer, NerConverter,
                                PerceptronModel, DependencyParserModel)
from sparknlp_jsl.annotator import RENerChunksFilter  # licensed annotator
from johnsnowlabs import legal

documentAssembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentencizer = SentenceDetectorDLModel\
    .pretrained("sentence_detector_dl", "en")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

tokenClassifier = legal.BertForTokenClassification.pretrained("legner_bert_indemnifications", "en", "legal/models")\
    .setInputCols("token", "sentence")\
    .setOutputCol("label")\
    .setCaseSensitive(True)

ner_converter = NerConverter()\
    .setInputCols(["sentence", "token", "label"])\
    .setOutputCol("ner_chunk")

# Only needed if you want to filter relation candidates by entity pair or syntactic distance
# =================
pos_tagger = PerceptronModel.pretrained()\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("pos_tags")

dependency_parser = DependencyParserModel.pretrained("dependency_conllu", "en")\
    .setInputCols(["sentence", "pos_tags", "token"])\
    .setOutputCol("dependencies")

# Keep only these entity pairs, within a maximum syntactic distance of 20, as relation candidates
re_filter = RENerChunksFilter()\
    .setInputCols(["ner_chunk", "dependencies"])\
    .setOutputCol("re_ner_chunks")\
    .setMaxSyntacticDistance(20)\
    .setRelationPairs([
        "INDEMNIFICATION_SUBJECT-INDEMNIFICATION_ACTION",
        "INDEMNIFICATION_SUBJECT-INDEMNIFICATION_INDIRECT_OBJECT",
        "INDEMNIFICATION_ACTION-INDEMNIFICATION",
        "INDEMNIFICATION_ACTION-INDEMNIFICATION_INDIRECT_OBJECT",
    ])
# =================

reDL = legal.RelationExtractionDLModel.pretrained("legre_indemnifications", "en", "legal/models")\
    .setPredictionThreshold(0.5)\
    .setInputCols(["re_ner_chunks", "sentence"])\
    .setOutputCol("relations")

nlpPipeline = Pipeline(stages=[
    documentAssembler,
    sentencizer,
    tokenizer,
    tokenClassifier,
    ner_converter,
    pos_tagger,
    dependency_parser,
    re_filter,
    reDL])

# Every stage is pretrained, so fitting on an empty DataFrame is enough
empty_data = spark.createDataFrame([[""]]).toDF("text")
model = nlpPipeline.fit(empty_data)

text = '''The Company shall indemnify and hold harmless HOC against any losses, claims, damages or liabilities to which it may become subject under the 1933 Act or otherwise, insofar as such losses, claims, damages or liabilities (or actions in respect thereof) arise out of or are based upon '''

# LightPipeline runs the fitted pipeline directly on Python strings
lmodel = LightPipeline(model)
res = lmodel.annotate(text)
```

</div>
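
`RENerChunksFilter` decides which chunk pairs reach the relation model as candidates. The sketch below illustrates only the relation-pair part of that idea in plain Python: it is not the annotator's implementation, it ignores the dependency-based `setMaxSyntacticDistance` check, and it assumes pairs are formed in text order (first chunk, then second chunk). The `Chunk` type and the `candidate_pairs` function are hypothetical.

```python
from itertools import combinations
from typing import List, NamedTuple, Set, Tuple


class Chunk(NamedTuple):
    text: str
    label: str
    begin: int
    end: int


def candidate_pairs(chunks: List[Chunk], allowed: Set[str]) -> List[Tuple[Chunk, Chunk]]:
    """Return chunk pairs, in text order, whose 'LABEL1-LABEL2' string is allowed."""
    ordered = sorted(chunks, key=lambda c: c.begin)
    return [(a, b) for a, b in combinations(ordered, 2)
            if f"{a.label}-{b.label}" in allowed]


# Same relation pairs as passed to setRelationPairs in the pipeline above
allowed = {
    "INDEMNIFICATION_SUBJECT-INDEMNIFICATION_ACTION",
    "INDEMNIFICATION_SUBJECT-INDEMNIFICATION_INDIRECT_OBJECT",
    "INDEMNIFICATION_ACTION-INDEMNIFICATION",
    "INDEMNIFICATION_ACTION-INDEMNIFICATION_INDIRECT_OBJECT",
}

# Some chunks of the example text, with the offsets reported in the Results section
chunks = [
    Chunk("Company", "INDEMNIFICATION_SUBJECT", 4, 10),
    Chunk("shall indemnify", "INDEMNIFICATION_ACTION", 12, 26),
    Chunk("hold harmless", "INDEMNIFICATION_ACTION", 32, 44),
    Chunk("HOC", "INDEMNIFICATION_INDIRECT_OBJECT", 46, 48),
    Chunk("claims", "INDEMNIFICATION", 70, 75),
]

for a, b in candidate_pairs(chunks, allowed):
    print(f"{a.text!r} -> {b.text!r}")
```

Of the ten possible pairs among these five chunks, seven survive; pairs such as action-action ("shall indemnify", "hold harmless") or subject-object ("Company", "claims") are never scored by the relation model.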

## Results

```bash
    relation                      entity1                  entity1_begin  entity1_end  chunk1           entity2                          entity2_begin  entity2_end  chunk2         confidence
 1  is_indemnification_subject    INDEMNIFICATION_SUBJECT              4           10  Company          INDEMNIFICATION_ACTION                      32           44  hold harmless   0.8847967
 2  is_indemnification_indobject  INDEMNIFICATION_SUBJECT              4           10  Company          INDEMNIFICATION_INDIRECT_OBJECT             46           48  HOC            0.96191925
 3  is_indemnification_indobject  INDEMNIFICATION_ACTION              12           26  shall indemnify  INDEMNIFICATION_INDIRECT_OBJECT             46           48  HOC             0.7332646
10  is_indemnification_object     INDEMNIFICATION_ACTION              32           44  hold harmless    INDEMNIFICATION                             70           75  claims          0.9728908
11  is_indemnification_object     INDEMNIFICATION_ACTION              32           44  hold harmless    INDEMNIFICATION                             78           84  damages         0.9727499
12  is_indemnification_object     INDEMNIFICATION_ACTION              32           44  hold harmless    INDEMNIFICATION                             89           99  liabilities      0.964168
```
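
Each row above links two chunks. One way to get back to a subject / action / object / indirect-object view of the clause is to group the predicted relations around each action chunk. The sketch below does that in plain Python on the rows shown above; the tuple layout, the `group_by_action` name and the `>=` comparison against a 0.5 cut-off (chosen to mirror `setPredictionThreshold(0.5)`) are illustrative assumptions, not part of the library.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (relation, chunk1, entity1, chunk2, entity2, confidence), copied from the results above
Row = Tuple[str, str, str, str, str, float]

rows: List[Row] = [
    ("is_indemnification_subject", "Company", "INDEMNIFICATION_SUBJECT", "hold harmless", "INDEMNIFICATION_ACTION", 0.8847967),
    ("is_indemnification_indobject", "Company", "INDEMNIFICATION_SUBJECT", "HOC", "INDEMNIFICATION_INDIRECT_OBJECT", 0.96191925),
    ("is_indemnification_indobject", "shall indemnify", "INDEMNIFICATION_ACTION", "HOC", "INDEMNIFICATION_INDIRECT_OBJECT", 0.7332646),
    ("is_indemnification_object", "hold harmless", "INDEMNIFICATION_ACTION", "claims", "INDEMNIFICATION", 0.9728908),
    ("is_indemnification_object", "hold harmless", "INDEMNIFICATION_ACTION", "damages", "INDEMNIFICATION", 0.9727499),
    ("is_indemnification_object", "hold harmless", "INDEMNIFICATION_ACTION", "liabilities", "INDEMNIFICATION", 0.964168),
]


def group_by_action(rows: List[Row], threshold: float = 0.5) -> Dict[str, Dict[str, List[str]]]:
    """Build one {subject, object, indirect_object} frame per action chunk (hypothetical helper)."""
    frames: Dict[str, Dict[str, List[str]]] = defaultdict(
        lambda: {"subject": [], "object": [], "indirect_object": []})
    subject_to_indirect: List[Tuple[str, str]] = []
    kept = [r for r in rows if r[5] >= threshold]

    # Pass 1: relations that have an action chunk on one side
    for rel, c1, e1, c2, e2, _ in kept:
        if rel == "is_indemnification_subject" and e2 == "INDEMNIFICATION_ACTION":
            frames[c2]["subject"].append(c1)
        elif rel == "is_indemnification_object" and e1 == "INDEMNIFICATION_ACTION":
            frames[c1]["object"].append(c2)
        elif rel == "is_indemnification_indobject" and e1 == "INDEMNIFICATION_ACTION":
            frames[c1]["indirect_object"].append(c2)
        elif rel == "is_indemnification_indobject" and e1 == "INDEMNIFICATION_SUBJECT":
            subject_to_indirect.append((c1, c2))

    # Pass 2: attach subject -> indirect object links to every action of that subject
    for subject, indirect in subject_to_indirect:
        for frame in frames.values():
            if subject in frame["subject"] and indirect not in frame["indirect_object"]:
                frame["indirect_object"].append(indirect)
    return dict(frames)


for action, frame in group_by_action(rows).items():
    print(action, frame)
```

On these rows the sketch yields a "hold harmless" frame with subject "Company", objects "claims", "damages" and "liabilities" and indirect object "HOC", plus a "shall indemnify" frame that only has "HOC" attached. Raising the threshold drops the weaker links first.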

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|legre_indemnifications|
|Compatibility:|Spark NLP for Legal 1.0.0+|
|License:|Licensed|
|Edition:|Official|
|Language:|en|
|Size:|405.9 MB|

## References

In-house annotated examples from the CUAD legal dataset

## Benchmarking

```bash
Relation                      Recall  Precision     F1  Support

is_indemnification_indobject   0.966      1.000  0.982       29
is_indemnification_object      0.929      0.929  0.929       42
is_indemnification_subject     0.931      0.931  0.931       29
no_rel                         0.950      0.941  0.945      100

Avg.                           0.944      0.950  0.947

Weighted Avg.                  0.945      0.945  0.945
```