|
| 1 | +--- |
| 2 | +layout: model |
| 3 | +title: Earning Calls Financial NER (Generic, sm) |
| 4 | +author: John Snow Labs |
| 5 | +name: finner_earning_calls_generic_sm |
| 6 | +date: 2022-11-30 |
| 7 | +tags: [en, financial, ner, earning, calls, licensed] |
| 8 | +task: Named Entity Recognition |
| 9 | +language: en |
| 10 | +edition: Finance NLP 1.0.0 |
| 11 | +spark_version: 3.0 |
| 12 | +supported: true |
| 13 | +article_header: |
| 14 | + type: cover |
| 15 | +use_language_switcher: "Python-Scala-Java" |
| 16 | +--- |
| 17 | + |
| 18 | +## Description |
| 19 | + |
| 20 | +This is the `sm` (small) version of a financial NER model trained on Earning Calls transcripts to detect financial entities. |
| 21 | +It is called `Generic` because it has fewer labels than the `Specific` version. |
| 22 | + |
| 23 | +Note that this model requires the tokenizer to be configured so that currency symbols are split into separate tokens (see the Python snippet below). |
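
As a rough illustration of why the tokenizer configuration matters, the sketch below mimics the effect of listing `$` and `€` among the context characters: they are peeled off the edges of a word, so `$1.21` yields a `$` token that can be tagged as `CURRENCY`. This is a simplified pure-Python approximation, not Spark NLP's tokenizer implementation; `split_context_chars` and `tokenize` are hypothetical helper names.

```python
# Simplified approximation of edge-splitting on context characters.
# Assumption: context characters are only detached from the start and
# end of a whitespace-delimited word, never from its interior.
CONTEXT_CHARS = {'.', ',', ';', ':', '!', '?', '*', '-', '(', ')', '”', '’', '$', '€'}


def split_context_chars(word, context_chars=CONTEXT_CHARS):
    """Detach leading and trailing context characters as separate tokens."""
    prefix = []
    while word and word[0] in context_chars:
        prefix.append(word[0])
        word = word[1:]
    suffix = []
    while word and word[-1] in context_chars:
        suffix.insert(0, word[-1])
        word = word[:-1]
    return prefix + ([word] if word else []) + suffix


def tokenize(text):
    """Split on whitespace, then detach context characters from each word."""
    tokens = []
    for word in text.split():
        tokens.extend(split_context_chars(word))
    return tokens


print(tokenize("at $1.21, despite a $1.5 billion payment."))
```

The decimal point inside `1.21` stays in place because only the edges of each word are inspected, while the currency symbol and the trailing comma become tokens of their own.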
| 24 | + |
| 25 | +The currently available entities are: |
| 26 | +- AMOUNT: Numeric amounts, not percentages |
| 27 | +- ASSET: Current or Fixed Asset |
| 28 | +- ASSET_DECREASE: Decrease in the asset possession/exposure |
| 29 | +- ASSET_INCREASE: Increase in the asset possession/exposure |
| 30 | +- CF: Total cash flow |
| 31 | +- CF_DECREASE: Relative decrease in cash flow |
| 32 | +- CF_INCREASE: Relative increase in cash flow |
| 33 | +- COUNT: Number of items (not monetary, not percentages). |
| 34 | +- CURRENCY: The currency of the amount |
| 35 | +- DATE: Generic dates that either are not a fiscal year or cannot be identified as one from the context |
| 36 | +- EXPENSE: An expense or loss |
| 37 | +- EXPENSE_DECREASE: A statement that an expense decreased in that fiscal year |
| 38 | +- EXPENSE_INCREASE: A statement that an expense increased in that fiscal year |
| 39 | +- FCF: Free Cash Flow |
| 40 | +- FISCAL_YEAR: A date stating the month in which the fiscal year closed for a specific year |
| 41 | +- KPI: Key Performance Indicator, a quantifiable measure of performance over time for a specific objective |
| 42 | +- KPI_DECREASE: Relative decrease in a KPI |
| 43 | +- KPI_INCREASE: Relative increase in a KPI |
| 44 | +- LIABILITY: Current or Long-Term Liability (not from stockholders) |
| 45 | +- LIABILITY_DECREASE: Relative decrease in liability |
| 46 | +- LIABILITY_INCREASE: Relative increase in liability |
| 47 | +- ORG: Mention of a company/organization name |
| 48 | +- PERCENTAGE: Numeric amounts which are percentages |
| 49 | +- PROFIT: Profit or revenue |
| 50 | +- PROFIT_DECLINE: A statement that profit or revenue decreased in that fiscal year |
| 51 | +- PROFIT_INCREASE: A statement that profit or revenue increased in that fiscal year |
| 52 | +- TICKER: Trading symbol of the company |
| 53 | + |
| 54 | +See also the Relation Extraction model, which connects these entities together. |
| 55 | + |
| 56 | +## Predicted Entities |
| 57 | + |
| 58 | +`AMOUNT`, `ASSET`, `ASSET_DECREASE`, `ASSET_INCREASE`, `CF`, `CF_DECREASE`, `CF_INCREASE`, `COUNT`, `CURRENCY`, `DATE`, `EXPENSE`, `EXPENSE_DECREASE`, `EXPENSE_INCREASE`, `FCF`, `FISCAL_YEAR`, `KPI`, `KPI_DECREASE`, `KPI_INCREASE`, `LIABILITY`, `LIABILITY_DECREASE`, `LIABILITY_INCREASE`, `ORG`, `PERCENTAGE`, `PROFIT`, `PROFIT_DECLINE`, `PROFIT_INCREASE`, `TICKER` |
| 59 | + |
| 60 | + |
| 61 | +{:.btn-box} |
| 62 | +<button class="button button-orange" disabled>Live Demo</button> |
| 63 | +<button class="button button-orange" disabled>Open in Colab</button> |
| 64 | +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/finance/models/finner_earning_calls_generic_sm_en_1.0.0_3.0_1669839690938.zip){:.button.button-orange.button-orange-trans.arr.button-icon} |
| 65 | + |
| 66 | +## How to use |
| 67 | + |
| 68 | + |
| 69 | + |
| 70 | +<div class="tabs-box" markdown="1"> |
| 71 | +{% include programmingLanguageSelectScalaPythonNLU.html %} |
```python
from johnsnowlabs import nlp, finance
import pyspark.sql.functions as F

# Assumes an active Spark session with Finance NLP loaded, e.g. spark = nlp.start()

# Convert raw text into Spark NLP documents
document_assembler = nlp.DocumentAssembler()\
    .setInputCol("text")\
    .setOutputCol("document")

# Split documents into sentences
sentence_detector = nlp.SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\
    .setInputCols(["document"])\
    .setOutputCol("sentence")

# Tokenize; '$' and '€' are listed as context characters so that
# currency symbols become separate tokens
tokenizer = nlp.Tokenizer()\
    .setInputCols(["sentence"])\
    .setOutputCol("token")\
    .setContextChars(['.', ',', ';', ':', '!', '?', '*', '-', '(', ')', '”', '’', '$', '€'])

# Token embeddings expected by the NER model
embeddings = nlp.BertEmbeddings.pretrained("bert_embeddings_sec_bert_base", "en")\
    .setInputCols(["sentence", "token"])\
    .setOutputCol("embeddings")\
    .setMaxSentenceLength(512)

# Token-level NER tags
ner_model = finance.NerModel.pretrained("finner_earning_calls_generic_sm", "en", "finance/models")\
    .setInputCols(["sentence", "token", "embeddings"])\
    .setOutputCol("ner")

# Merge token-level tags into entity chunks
ner_converter = nlp.NerConverter()\
    .setInputCols(["sentence", "token", "ner"])\
    .setOutputCol("ner_chunk")

pipeline = nlp.Pipeline(stages=[
    document_assembler,
    sentence_detector,
    tokenizer,
    embeddings,
    ner_model,
    ner_converter
])

data = spark.createDataFrame([["""Adjusted EPS was ahead of our expectations at $ 1.21 , and free cash flow is also ahead of our expectations despite a $ 1.5 billion additional tax payment we made related to the R&D amortization."""]]).toDF("text")

model = pipeline.fit(data)

result = model.transform(data)

# One row per extracted chunk, with its text and entity label
result.select(F.explode(F.arrays_zip('ner_chunk.result', 'ner_chunk.metadata')).alias("cols")) \
    .select(F.expr("cols['0']").alias("text"),
            F.expr("cols['1']['entity']").alias("label")).show(200, truncate=False)
```
| 118 | + |
| 119 | +</div> |
| 120 | + |
| 121 | +## Results |
| 122 | + |
| 123 | +```bash |
| 124 | ++------------+----------+----------+ |
| 125 | +| token| ner_label|confidence| |
| 126 | ++------------+----------+----------+ |
| 127 | +| Adjusted| B-PROFIT| 0.9691| |
| 128 | +| EPS| I-PROFIT| 0.9954| |
| 129 | +| was| O| 1.0| |
| 130 | +| ahead| O| 1.0| |
| 131 | +| of| O| 1.0| |
| 132 | +| our| O| 1.0| |
| 133 | +|expectations| O| 1.0| |
| 134 | +| at| O| 1.0| |
| 135 | +| $|B-CURRENCY| 1.0| |
| 136 | +| 1.21| B-AMOUNT| 1.0| |
| 137 | +| ,| O| 0.9998| |
| 138 | +| and| O| 1.0| |
| 139 | +| free| B-FCF| 0.9981| |
| 140 | +| cash| I-FCF| 0.9998| |
| 141 | +| flow| I-FCF| 0.9998| |
| 142 | +| is| O| 1.0| |
| 143 | +| also| O| 1.0| |
| 144 | +| ahead| O| 1.0| |
| 145 | +| of| O| 1.0| |
| 146 | +| our| O| 1.0| |
| 147 | +|expectations| O| 1.0| |
| 148 | +| despite| O| 1.0| |
| 149 | +| a| O| 1.0| |
| 150 | +| $|B-CURRENCY| 1.0| |
| 151 | +| 1.5| B-AMOUNT| 1.0| |
| 152 | +| billion| I-AMOUNT| 0.9999| |
| 153 | +| additional| O| 0.998| |
| 154 | +| tax| O| 0.9532| |
| 155 | +| payment| O| 0.945| |
| 156 | +| we| O| 0.9999| |
| 157 | +| made| O| 1.0| |
| 158 | +| related| O| 1.0| |
| 159 | +| to| O| 1.0| |
| 160 | +| the| O| 1.0| |
| 161 | +| R&D| O| 0.9981| |
| 162 | +|amortization| O| 0.9973| |
| 163 | +| .| O| 1.0| |
| 164 | ++------------+----------+----------+ |
| 165 | +``` |
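
The table above lists token-level tags in the `B-`/`I-`/`O` scheme, whereas the `ner_chunk` column selected in the snippet holds merged entity spans. The following is a minimal pure-Python sketch of that grouping, assuming standard BIO semantics. `bio_to_chunks` is a hypothetical helper written for illustration, not the `NerConverter` implementation, and the input is an abridged copy of the rows shown above.

```python
from typing import List, Optional, Tuple


def bio_to_chunks(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Group (token, BIO tag) pairs into (chunk text, entity label) pairs."""
    chunks: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_label: Optional[str] = None

    def close_chunk() -> None:
        # Emit the open chunk, if any, and reset the state.
        nonlocal current_tokens, current_label
        if current_tokens:
            chunks.append((" ".join(current_tokens), current_label))
        current_tokens, current_label = [], None

    for token, tag in tagged:
        if tag.startswith("B-"):
            # A B- tag always starts a new chunk.
            close_chunk()
            current_tokens, current_label = [token], tag[2:]
        elif tag.startswith("I-") and current_label == tag[2:]:
            # An I- tag with the same label continues the open chunk.
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not match the open chunk, ends it.
            close_chunk()
    close_chunk()
    return chunks


# Abridged token/tag pairs taken from the results table.
tagged = [
    ("Adjusted", "B-PROFIT"), ("EPS", "I-PROFIT"), ("was", "O"), ("at", "O"),
    ("$", "B-CURRENCY"), ("1.21", "B-AMOUNT"), (",", "O"), ("and", "O"),
    ("free", "B-FCF"), ("cash", "I-FCF"), ("flow", "I-FCF"), ("despite", "O"),
    ("a", "O"), ("$", "B-CURRENCY"), ("1.5", "B-AMOUNT"), ("billion", "I-AMOUNT"),
    ("additional", "O"),
]

chunks = bio_to_chunks(tagged)
for text, label in chunks:
    print(f"{label}: {text}")
```

Under these assumptions the sentence yields six chunks: `Adjusted EPS` (PROFIT), `$` (CURRENCY), `1.21` (AMOUNT), `free cash flow` (FCF), `$` (CURRENCY) and `1.5 billion` (AMOUNT).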
| 166 | + |
| 167 | +{:.model-param} |
| 168 | +## Model Information |
| 169 | + |
| 170 | +{:.table-model} |
| 171 | +|---|---| |
| 172 | +|Model Name:|finner_earning_calls_generic_sm| |
| 173 | +|Compatibility:|Finance NLP 1.0.0+| |
| 174 | +|License:|Licensed| |
| 175 | +|Edition:|Official| |
| 176 | +|Input Labels:|[sentence, token, embeddings]| |
| 177 | +|Output Labels:|[ner]| |
| 178 | +|Language:|en| |
| 179 | +|Size:|16.2 MB| |
| 180 | + |
| 181 | +## References |
| 182 | + |
| 183 | +In-house annotations on Earning Calls. |
| 184 | + |
| 185 | +## Benchmarking |
| 186 | + |
| 187 | +```bash |
| 188 | +label tp fp fn prec rec f1 |
| 189 | +I-AMOUNT 383 1 3 0.9973958 0.992228 0.9948052 |
| 190 | +B-COUNT 13 5 2 0.7222222 0.8666667 0.78787875 |
| 191 | +B-AMOUNT 453 0 6 1.0 0.9869281 0.9934211 |
| 192 | +I-ORG 16 0 0 1.0 1.0 1.0 |
| 193 | +B-DATE 117 11 5 0.9140625 0.9590164 0.93600005 |
| 194 | +B-LIABILITY_DECREASE 1 1 0 0.5 1.0 0.6666667 |
| 195 | +I-LIABILITY 8 6 3 0.5714286 0.72727275 0.64000005 |
| 196 | +I-EXPENSE 75 13 52 0.85227275 0.5905512 0.69767445 |
| 197 | +I-KPI_INCREASE 6 3 8 0.6666667 0.42857143 0.5217392 |
| 198 | +B-LIABILITY 9 4 5 0.6923077 0.64285713 0.6666667 |
| 199 | +I-CF 18 1 18 0.94736844 0.5 0.6545455 |
| 200 | +I-COUNT 12 2 1 0.85714287 0.9230769 0.8888889 |
| 201 | +B-FCF 13 5 0 0.7222222 1.0 0.83870965 |
| 202 | +B-PROFIT_INCREASE 79 22 31 0.7821782 0.7181818 0.7488152 |
| 203 | +B-KPI_INCREASE 3 4 11 0.42857143 0.21428572 0.2857143 |
| 204 | +B-EXPENSE 41 19 38 0.68333334 0.51898736 0.5899281 |
| 205 | +I-PROFIT_DECLINE 5 7 22 0.41666666 0.18518518 0.25641027 |
| 206 | +I-LIABILITY_DECREASE 1 1 0 0.5 1.0 0.6666667 |
| 207 | +I-PROFIT 188 47 50 0.8 0.789916 0.79492605 |
| 208 | +B-CURRENCY 440 0 1 1.0 0.9977324 0.9988649 |
| 209 | +I-PROFIT_INCREASE 77 23 45 0.77 0.63114756 0.69369364 |
| 210 | +I-CURRENCY 6 0 0 1.0 1.0 1.0 |
| 211 | +B-CF 9 1 8 0.9 0.5294118 0.6666667 |
| 212 | +B-PROFIT 147 51 40 0.74242425 0.7860963 0.7636363 |
| 213 | +B-PERCENTAGE 417 2 4 0.99522674 0.99049884 0.99285716 |
| 214 | +B-TICKER 13 0 0 1.0 1.0 1.0 |
| 215 | +I-FISCAL_YEAR 3 0 0 1.0 1.0 1.0 |
| 216 | +B-ORG 14 0 0 1.0 1.0 1.0 |
| 217 | +B-EXPENSE_INCREASE 6 0 4 1.0 0.6 0.75 |
| 218 | +B-EXPENSE_DECREASE 1 0 1 1.0 0.5 0.6666667 |
| 219 | +B-ASSET 9 2 16 0.8181818 0.36 0.5 |
| 220 | +B-FISCAL_YEAR 1 0 0 1.0 1.0 1.0 |
| 221 | +I-EXPENSE_DECREASE 3 2 2 0.6 0.6 0.6 |
| 222 | +I-FCF 26 15 0 0.63414633 1.0 0.7761194 |
| 223 | +I-EXPENSE_INCREASE 8 0 3 1.0 0.72727275 0.84210527 |
| 224 | +Macro-average 2637 255 465 0.7494908 0.64362085 0.70253296 |
| 225 | +Micro-average 2637 255 465 0.9118257 0.8500967 0.8798799 |
| 226 | +``` |
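
The `prec`, `rec` and `f1` columns follow from the `tp`, `fp` and `fn` counts in each row. The sketch below recomputes them for a few rows copied from the table, assuming the usual definitions (precision = tp / (tp + fp), recall = tp / (tp + fn), F1 as their harmonic mean). `prf` is a hypothetical helper name. The micro-average row is consistent with applying the same formulas to the pooled counts; the macro-average row is not recomputed here.

```python
def prf(tp: int, fp: int, fn: int):
    """Return (precision, recall, f1) from raw counts, guarding against 0/0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# (tp, fp, fn) counts copied from the benchmarking table.
rows = {
    "B-COUNT": (13, 5, 2),
    "B-CURRENCY": (440, 0, 1),
    "I-EXPENSE": (75, 13, 52),
    "Micro-average": (2637, 255, 465),
}

for label, counts in rows.items():
    precision, recall, f1 = prf(*counts)
    print(f"{label:<15} prec={precision:.4f} rec={recall:.4f} f1={f1:.4f}")
```

For example, `B-COUNT` gives 13 / 18 ≈ 0.7222 precision and 13 / 15 ≈ 0.8667 recall, matching the reported row.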