Commit 9bd64ae

Models hub legal (#12835)
* 2022-09-19-legre_indemnifications_en (#12758)
* Add model 2022-09-19-legre_indemnifications_en
* Add model 2022-09-19-legner_bert_indemnifications_en
Co-authored-by: josejuanmartinez <jjmcarrascosa@gmail.com>
* 2022-09-20-legclf_cuad_confidentiality_clause_en (#12770)
* Add model 2022-09-20-legclf_cuad_confidentiality_clause_en
* Add model 2022-09-20-legclf_cuad_indemnifications_clause_en
* Add model 2022-09-20-legclf_cuad_licenses_clause_en
* Add model 2022-09-20-legclf_cuad_obligations_clause_en
* Add model 2022-09-20-legclf_cuad_whereas_clause_en
Co-authored-by: josejuanmartinez <jjmcarrascosa@gmail.com>
* Add model 2022-09-27-legclf_cuad_licenses_clause_en (#12827)
Co-authored-by: josejuanmartinez <jjmcarrascosa@gmail.com>
* Add model 2022-09-27-legclf_cuad_indemnifications_clause_en (#12828)
Co-authored-by: josejuanmartinez <jjmcarrascosa@gmail.com>
* Add model 2022-09-27-legner_bert_indemnifications_en (#12831)
Co-authored-by: josejuanmartinez <jjmcarrascosa@gmail.com>
* Add model 2022-09-27-legassertion_time_en (#12832)
Co-authored-by: josejuanmartinez <jjmcarrascosa@gmail.com>
Co-authored-by: jsl-models <74001263+jsl-models@users.noreply.github.com>
1 parent 5d3024a commit 9bd64ae

2 files changed: +258 -0 lines changed
@@ -0,0 +1,114 @@
---
layout: model
title: Temporality / Certainty Assertion Status
author: John Snow Labs
name: legassertion_time
date: 2022-09-27
tags: [en, licensed]
task: Assertion Status
language: en
edition: Spark NLP for Legal 1.0.0
spark_version: 3.0
supported: true
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

This is an Assertion Status model that detects temporality (`PRESENT`, `PAST`, `FUTURE`) or certainty (`POSSIBLE`) in legal documents.

## Predicted Entities

`PRESENT`, `PAST`, `FUTURE`, `POSSIBLE`

{:.btn-box}
[Live Demo](https://demo.johnsnowlabs.com/legal/LEGASSERTION_TEMPORALITY){:.button.button-orange}
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/legal/models/legassertion_time_en_1.0.0_3.0_1664274039847.zip){:.button.button-orange.button-orange-trans.arr.button-icon}

## How to use

<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
# YOUR NER HERE
# The stages `documentAssembler`, `tokenizer` and `ner` are not defined in this
# snippet. Build them first so that the pipeline provides the "sentence",
# "token" and "entity" columns consumed below.
# ...
embeddings = BertEmbeddings.pretrained("bert_embeddings_sec_bert_base", "en") \
    .setInputCols(["sentence", "token"]) \
    .setOutputCol("embeddings")

# Convert the upstream "entity" annotations into the "ner_chunk" column
chunk_converter = ChunkConverter() \
    .setInputCols(["entity"]) \
    .setOutputCol("ner_chunk")

# Assign an assertion status to every chunk
assertion = leg.AssertionDLModel.pretrained("legassertion_time", "en", "legal/models") \
    .setInputCols(["sentence", "ner_chunk", "embeddings"]) \
    .setOutputCol("assertion")

nlpPipeline = Pipeline(stages=[
    documentAssembler,
    tokenizer,
    embeddings,
    ner,
    chunk_converter,
    assertion
])

# Fit on an empty DataFrame: fitting here only assembles the pipeline
empty_data = spark.createDataFrame([[""]]).toDF("text")

model = nlpPipeline.fit(empty_data)

# LightPipeline runs the fitted pipeline on plain Python strings
lp = LightPipeline(model)

texts = ["The subsidiaries of Atlantic Inc will participate in a merging operation",
         "The Conditions and Warranties of this agreement might be modified"]

lp.annotate(texts)
```

</div>

## Results

```bash
chunk,begin,end,entity_type,assertion
Atlantic Inc,20,31,ORG,FUTURE

chunk,begin,end,entity_type,assertion
Conditions and Warranties,4,28,DOC,POSSIBLE
```
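
The `begin` and `end` columns above read as character offsets with an inclusive end. A minimal sketch in plain Python, assuming that interpretation (the helper name `chunk_span` is hypothetical and not part of the library), recovers each chunk from its sentence:

```python
# Rows copied from the results above: (sentence, chunk, begin, end).
rows = [
    ("The subsidiaries of Atlantic Inc will participate in a merging operation",
     "Atlantic Inc", 20, 31),
    ("The Conditions and Warranties of this agreement might be modified",
     "Conditions and Warranties", 4, 28),
]


def chunk_span(text, begin, end):
    """Return the substring for an inclusive [begin, end] character span."""
    # Python slices exclude the upper bound, hence end + 1.
    return text[begin:end + 1]


# Each reported chunk should match the text found at its offsets.
matches = [chunk_span(text, begin, end) == chunk
           for text, chunk, begin, end in rows]
```

Both rows match under this reading, so slicing with `end + 1` is the convention to use when mapping these annotations back onto the source text.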

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|legassertion_time|
|Compatibility:|Spark NLP for Legal 1.0.0+|
|License:|Licensed|
|Edition:|Official|
|Input Labels:|[document, doc_chunk, embeddings]|
|Output Labels:|[assertion]|
|Language:|en|
|Size:|2.2 MB|

## References

In-house annotations on financial and legal corpora

## Benchmarking

```bash
label      tp   fp  fn  prec        rec         f1
PRESENT    201  11  16  0.9481132   0.92626727  0.937063
POSSIBLE   171  3   6   0.98275864  0.9661017   0.9743589
FUTURE     119  6   4   0.952       0.96747965  0.95967746
PAST       270  16  10  0.9440559   0.96428573  0.9540636
tp: 761 fp: 36 fn: 36 labels: 4
Macro-average prec: 0.9567319, rec: 0.9560336, f1: 0.95638263
Micro-average prec: 0.9548306, rec: 0.9548306, f1: 0.9548306
```
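
The averages in this table can be recomputed from the per-label counts. The sketch below is plain Python and is not the evaluation code used for the model. It assumes the usual definitions of precision, recall and F1, and it assumes that the macro F1 is the harmonic mean of macro precision and macro recall, which is the reading that reproduces the reported 0.95638263.

```python
# (tp, fp, fn) per label, copied from the benchmark table above.
counts = {
    "PRESENT": (201, 11, 16),
    "POSSIBLE": (171, 3, 6),
    "FUTURE": (119, 6, 4),
    "PAST": (270, 16, 10),
}


def harmonic_mean(a, b):
    return 2 * a * b / (a + b)


def prf(tp, fp, fn):
    """Precision, recall and F1 from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall, harmonic_mean(precision, recall)


per_label = {label: prf(*c) for label, c in counts.items()}

# Macro average: unweighted mean of per-label precision and recall.
macro_prec = sum(p for p, _, _ in per_label.values()) / len(per_label)
macro_rec = sum(r for _, r, _ in per_label.values()) / len(per_label)
macro_f1 = harmonic_mean(macro_prec, macro_rec)  # assumption, see lead-in

# Micro average: pool the counts over all labels first.
total_tp, total_fp, total_fn = (sum(c[i] for c in counts.values()) for i in range(3))
micro_prec, micro_rec, micro_f1 = prf(total_tp, total_fp, total_fn)
```

Because the pooled false positives and false negatives are both 36, micro precision, recall and F1 coincide, as the last row of the table shows.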
@@ -0,0 +1,144 @@
---
layout: model
title: Legal Indemnification NER (Bert, base)
author: John Snow Labs
name: legner_bert_indemnifications
date: 2022-09-27
tags: [indemnifications, en, licensed]
task: Named Entity Recognition
language: en
edition: Spark NLP for Legal 1.0.0
spark_version: 3.0
supported: true
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

This is a Legal Named Entity Recognition model that identifies the Subject (who), Action (verb), Object (the indemnification) and Indirect Object (to whom) in indemnification clauses.

## Predicted Entities

`INDEMNIFICATION`, `INDEMNIFICATION_SUBJECT`, `INDEMNIFICATION_ACTION`, `INDEMNIFICATION_INDIRECT_OBJECT`

{:.btn-box}
[Live Demo](https://demo.johnsnowlabs.com/legal/LEGALRE_INDEMNIFICATION/){:.button.button-orange}
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/legal/models/legner_bert_indemnifications_en_1.0.0_3.0_1664273651991.zip){:.button.button-orange.button-orange-trans.arr.button-icon}

## How to use

<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
documentAssembler = DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

sentencizer = SentenceDetectorDLModel\
    .pretrained("sentence_detector_dl", "en")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

tokenizer = Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")

# Token-level classifier that emits the BIO indemnification labels
tokenClassifier = legal.BertForTokenClassification.pretrained("legner_bert_indemnifications", "en", "legal/models")\
    .setInputCols("token", "sentence")\
    .setOutputCol("label")\
    .setCaseSensitive(True)

# Merge the BIO-tagged tokens into entity chunks
ner_converter = NerConverter()\
    .setInputCols(["sentence", "token", "label"])\
    .setOutputCol("ner_chunk")

nlpPipeline = Pipeline(stages=[
    documentAssembler,
    sentencizer,
    tokenizer,
    tokenClassifier,
    ner_converter
])

text = '''The Company shall protect and indemnify the Supplier against any damages, losses or costs whatsoever'''

# Fit the pipeline once, then wrap it in a LightPipeline for in-memory inference
data = spark.createDataFrame([[text]]).toDF("text")
model = nlpPipeline.fit(data)
lmodel = LightPipeline(model)
res = lmodel.annotate(text)
```

</div>

## Results

```bash
+----------+---------------------------------+
|     token|                        ner_label|
+----------+---------------------------------+
|       The|                                O|
|   Company|                                O|
|     shall|         B-INDEMNIFICATION_ACTION|
|   protect|         I-INDEMNIFICATION_ACTION|
|       and|                                O|
| indemnify|         B-INDEMNIFICATION_ACTION|
|       the|                                O|
|  Supplier|B-INDEMNIFICATION_INDIRECT_OBJECT|
|   against|                                O|
|       any|                                O|
|   damages|                B-INDEMNIFICATION|
|         ,|                                O|
|    losses|                B-INDEMNIFICATION|
|        or|                                O|
|     costs|                B-INDEMNIFICATION|
|whatsoever|                                O|
+----------+---------------------------------+
```
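
The token-level labels above follow the BIO scheme: `B-` opens an entity, `I-` continues it, and `O` marks tokens outside any entity. As a rough illustration of how such tags collapse into chunks, here is a minimal sketch in plain Python. It is a simplified stand-in, not the implementation of `NerConverter`, and the function name `bio_to_chunks` is hypothetical.

```python
# (token, tag) pairs copied from the results table above.
tagged = [
    ("The", "O"), ("Company", "O"),
    ("shall", "B-INDEMNIFICATION_ACTION"), ("protect", "I-INDEMNIFICATION_ACTION"),
    ("and", "O"), ("indemnify", "B-INDEMNIFICATION_ACTION"), ("the", "O"),
    ("Supplier", "B-INDEMNIFICATION_INDIRECT_OBJECT"),
    ("against", "O"), ("any", "O"), ("damages", "B-INDEMNIFICATION"), (",", "O"),
    ("losses", "B-INDEMNIFICATION"), ("or", "O"), ("costs", "B-INDEMNIFICATION"),
    ("whatsoever", "O"),
]


def bio_to_chunks(tagged_tokens):
    """Group BIO-tagged tokens into (chunk_text, label) pairs."""
    chunks = []
    current_tokens, current_label = [], None
    for token, tag in tagged_tokens:
        prefix, _, label = tag.partition("-")
        if prefix == "I" and label == current_label:
            # Continuation of the chunk that is currently open.
            current_tokens.append(token)
            continue
        if current_tokens:
            # Any other tag closes the open chunk.
            chunks.append((" ".join(current_tokens), current_label))
        if prefix in ("B", "I"):
            # "B-" (or a stray "I-") starts a new chunk.
            current_tokens, current_label = [token], label
        else:
            current_tokens, current_label = [], None
    if current_tokens:
        chunks.append((" ".join(current_tokens), current_label))
    return chunks


chunks = bio_to_chunks(tagged)
```

On the example sentence this yields six chunks, with "shall protect" merged into a single `INDEMNIFICATION_ACTION` chunk and the remaining entities kept as single-token chunks.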

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|legner_bert_indemnifications|
|Compatibility:|Spark NLP for Legal 1.0.0+|
|License:|Licensed|
|Edition:|Official|
|Input Labels:|[sentence, token]|
|Output Labels:|[ner]|
|Language:|en|
|Size:|412.2 MB|
|Case sensitive:|true|
|Max sentence length:|128|

## References

In-house annotated examples from CUAD legal dataset

## Benchmarking

```bash
                                   precision    recall  f1-score   support

                B-INDEMNIFICATION       0.91      0.89      0.90        36
         B-INDEMNIFICATION_ACTION       0.92      0.71      0.80        17
B-INDEMNIFICATION_INDIRECT_OBJECT       0.88      0.88      0.88        40
        B-INDEMNIFICATION_SUBJECT       0.71      0.56      0.63         9
                I-INDEMNIFICATION       0.88      0.78      0.82         9
         I-INDEMNIFICATION_ACTION       0.81      0.87      0.84        15
I-INDEMNIFICATION_INDIRECT_OBJECT       1.00      0.53      0.69        17
                                O       0.97      0.91      0.94       510

                         accuracy                           0.88       654
                        macro avg       0.71      0.61      0.81       654
                     weighted avg       0.95      0.88      0.91       654
```
