Commit 842f19a

Add model 2022-10-22-finclf_bert_sentiment_analysis_lt (#12975)
Co-authored-by: josejuanmartinez <jjmcarrascosa@gmail.com>
1 parent ef9ee69 commit 842f19a

1 file changed: 110 additions & 0 deletions
@@ -0,0 +1,110 @@
---
layout: model
title: Financial Sentiment Analysis (Lithuanian)
author: John Snow Labs
name: finclf_bert_sentiment_analysis
date: 2022-10-22
tags: [lt, legal, classification, sentiment, analysis, licensed]
task: Text Classification
language: lt
edition: Spark NLP for Finance 1.0.0
spark_version: 3.0
supported: true
article_header:
  type: cover
use_language_switcher: "Python-Scala-Java"
---

## Description

This is a Lithuanian sentiment analysis text classifier for financial text. It predicts whether a text expresses a positive or a negative sentiment.

## Predicted Entities

`POS`, `NEG`

{:.btn-box}
<button class="button button-orange" disabled>Live Demo</button>
<button class="button button-orange" disabled>Open in Colab</button>
[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/finance/models/finclf_bert_sentiment_analysis_lt_1.0.0_3.0_1666475378253.zip){:.button.button-orange.button-orange-trans.arr.button-icon}

## How to use

<div class="tabs-box" markdown="1">
{% include programmingLanguageSelectScalaPythonNLU.html %}
```python
from johnsnowlabs import nlp, finance
from pyspark.ml import Pipeline

# Assumes `spark` is an active, licensed Spark NLP for Finance session
# (for example, one returned by nlp.start()).

# Turn the raw text column into document annotations
document_assembler = nlp.DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

# Split each document into tokens
tokenizer = nlp.Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

# Load the pretrained classifier
sequence_classifier = finance.BertForSequenceClassification.pretrained("finclf_bert_sentiment_analysis", "lt", "finance/models") \
    .setInputCols(["document", "token"]) \
    .setOutputCol("class")

pipeline = Pipeline(stages=[
    document_assembler,
    tokenizer,
    sequence_classifier
])

# Build a one-row example DataFrame
example = spark.createDataFrame([["Pagalbos parašiuto laukiantis verslas priemones vertina teigiamai tik yra keli „jeigu“"]]).toDF("text")

result = pipeline.fit(example).transform(example)

# Show the input text next to the predicted label
result.select("text", "class.result").show(truncate=False)
```

</div>

## Results

```bash
+--------------------------------------------------------------------------------------+------+
|text                                                                                  |result|
+--------------------------------------------------------------------------------------+------+
|Pagalbos parašiuto laukiantis verslas priemones vertina teigiamai tik yra keli „jeigu“|[POS] |
+--------------------------------------------------------------------------------------+------+
```
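
The `class.result` column shown above holds a list of labels per row rather than a plain string. The following is a minimal sketch of flattening such values once they have been collected to Python. The helper `first_label`, the `LABEL_NAMES` mapping, and the sample `collected` list are hypothetical and stand in for values collected from the `result` DataFrame.

```python
from typing import Optional, Sequence

# Readable names for the two labels reported in the benchmark (assumed mapping).
LABEL_NAMES = {"POS": "positive", "NEG": "negative"}

def first_label(result: Sequence[str]) -> Optional[str]:
    """Return the readable name of the first label, or None if empty or unknown."""
    if not result:
        return None
    return LABEL_NAMES.get(result[0])

# Hypothetical stand-in for the `class.result` values of three collected rows.
collected = [["POS"], ["NEG"], []]
readable = [first_label(r) for r in collected]
print(readable)
```

Returning `None` for an empty or unrecognised label keeps the caller in charge of how to treat rows without a usable prediction.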

{:.model-param}
## Model Information

{:.table-model}
|---|---|
|Model Name:|finclf_bert_sentiment_analysis|
|Compatibility:|Spark NLP for Finance 1.0.0+|
|License:|Licensed|
|Edition:|Official|
|Input Labels:|[document, token]|
|Output Labels:|[class]|
|Language:|lt|
|Size:|406.6 MB|
|Case sensitive:|true|
|Max sentence length:|128|

## References

An in-house augmented version of [this dataset](https://www.kaggle.com/datasets/rokastrimaitis/lithuanian-financial-news-dataset-and-bigrams?select=dataset%28original%29.csv), with the NEU label removed.

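Dropping the NEU label turns a three-class dataset into a binary one. A minimal sketch of that filtering step is given here, assuming a CSV with `text` and `label` columns. The column names, the sample rows, and the helper `drop_neutral` are all made up for illustration and may not match the real dataset. The in-house augmentation is not described in this card and is not covered.

```python
import csv
import io

# Made-up sample rows; the real dataset's columns and contents may differ.
raw_csv = """text,label
Pelnas augo,POS
Rinka nepakito,NEU
Nuostoliai didėjo,NEG
"""

def drop_neutral(rows, label_key="label", neutral="NEU"):
    """Keep only the rows whose label is not the neutral one."""
    return [row for row in rows if row[label_key] != neutral]

rows = list(csv.DictReader(io.StringIO(raw_csv)))
binary_rows = drop_neutral(rows)
print([row["label"] for row in binary_rows])
```
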
## Benchmarking

```bash
label         precision  recall  f1-score  support

NEG                0.80    0.76      0.78      509
POS                0.90    0.92      0.91     1167

accuracy                             0.87     1676
macro avg          0.85    0.84      0.84     1676
weighted avg       0.87    0.87      0.87     1676
```
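
The summary rows of the benchmark can be cross-checked against the per-label rows. The sketch below recomputes F1 as the harmonic mean of precision and recall, the macro average as an unweighted mean, and the weighted average using support as the weight. It starts from the rounded figures in the table, so it only checks that the table is self-consistent and does not reproduce the original evaluation. The `report` dictionary and the helper `f1` are names chosen for this example.

```python
# Per-label figures copied from the benchmark table (already rounded).
report = {
    "NEG": {"precision": 0.80, "recall": 0.76, "support": 509},
    "POS": {"precision": 0.90, "recall": 0.92, "support": 1167},
}

def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

total = sum(m["support"] for m in report.values())
f1_scores = {label: f1(m["precision"], m["recall"]) for label, m in report.items()}

# Macro average: every label counts equally.
macro_precision = sum(m["precision"] for m in report.values()) / len(report)

# Weighted average: every label counts in proportion to its support.
weighted_recall = sum(m["recall"] * m["support"] for m in report.values()) / total

print({label: round(score, 2) for label, score in f1_scores.items()})
print(round(macro_precision, 2), round(weighted_recall, 2))
```

For a single-label classifier, the support-weighted recall equals the accuracy, which is why both appear as 0.87 in the table.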
