Previous updates
+
+- Updated to version 0.3.6
+ - Support custom baseline files [#74](https://github.com/Tiiiger/bert_score/pull/74)
+ - The option `--rescale-with-baseline` is changed to `--rescale_with_baseline` so that it is consistent with other options.
+- Updated to version 0.3.5
+ - Being compatible with Huggingface's transformers >=v3.0.0 and minor fixes ([#58](https://github.com/Tiiiger/bert_score/pull/58), [#66](https://github.com/Tiiiger/bert_score/pull/66), [#68](https://github.com/Tiiiger/bert_score/pull/68))
+ - Several improvements related to efficiency ([#67](https://github.com/Tiiiger/bert_score/pull/67), [#69](https://github.com/Tiiiger/bert_score/pull/69))
+- Updated to version 0.3.4
+ - Now compatible with transformers v2.11.0 ([#58](https://github.com/Tiiiger/bert_score/pull/58))
+- Updated to version 0.3.3
+ - Fixing the bug with empty strings ([issue #47](https://github.com/Tiiiger/bert_score/issues/47)).
+ - Supporting 6 [ELECTRA](https://github.com/google-research/electra) models and 24 smaller [BERT](https://github.com/google-research/bert) models.
+ - A new [Google sheet](https://docs.google.com/spreadsheets/d/1RKOVpselB98Nnh_EOC4A2BYn8_201tmPODpNWu4w7xI/edit?usp=sharing) for tracking the performance (i.e., Pearson correlation with human judgment) of different models on WMT16 to-English.
+ - Including the script for tuning the best number of layers of an English pre-trained model on WMT16 to-English data (See the [details](tune_layers)).
+- Updated to version 0.3.2
+ - **Bug fix**: fixed the bug in v0.3.1 that occurred with multiple reference sentences.
+ - Supporting multiple reference sentences with our command line tool.
+- Updated to version 0.3.1
+ - A new `BERTScorer` object that caches the model to avoid re-loading it multiple times. Please see our [jupyter notebook example](./example/Demo.ipynb) for the usage.
+ - Supporting multiple reference sentences for each example. The `score` function can now take a list of lists of strings as the references and return the score between the candidate sentence and its closest reference sentence.
+
+
+
+Please see [release logs](https://github.com/Tiiiger/bert_score/releases) for older updates.
+
+#### Authors:
+* [Tianyi Zhang](https://scholar.google.com/citations?user=OI0HSa0AAAAJ&hl=en)*
+* [Varsha Kishore](https://scholar.google.com/citations?user=B8UeYcEAAAAJ&authuser=2)*
+* [Felix Wu](https://sites.google.com/view/felixwu/home)*
+* [Kilian Q. Weinberger](http://kilian.cs.cornell.edu/index.html)
+* [Yoav Artzi](https://yoavartzi.com/)
+
+*: Equal Contribution
+
+### Overview
+BERTScore leverages the pre-trained contextual embeddings from BERT and matches
+words in candidate and reference sentences by cosine similarity.
+It has been shown to correlate with human judgment on sentence-level and
+system-level evaluation.
+Moreover, BERTScore computes precision, recall, and F1 measure, which can be
+useful for evaluating different language generation tasks.
+
+As an illustration, BERTScore recall can be computed as
+![](./bert_score.png "BERTScore")
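+
+Concretely, here is a minimal sketch (not the library's implementation) of greedy-matching recall with cosine similarity, assuming the token embeddings have already been computed and L2-normalized:
+
+```python
+import torch
+
+def greedy_recall(ref_emb: torch.Tensor, cand_emb: torch.Tensor) -> float:
+    """ref_emb: (num_ref_tokens, dim), cand_emb: (num_cand_tokens, dim), both L2-normalized."""
+    sim = ref_emb @ cand_emb.T  # pairwise cosine similarities
+    # Each reference token is greedily matched to its most similar candidate token;
+    # recall is the average of these best-match similarities.
+    return sim.max(dim=1).values.mean().item()
+```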
+
+If you find this repo useful, please cite:
+```
+@inproceedings{bert-score,
+ title={BERTScore: Evaluating Text Generation with BERT},
+ author={Tianyi Zhang* and Varsha Kishore* and Felix Wu* and Kilian Q. Weinberger and Yoav Artzi},
+ booktitle={International Conference on Learning Representations},
+ year={2020},
+ url={https://openreview.net/forum?id=SkeHuCVFDr}
+}
+```
+
+### Installation
+* Python version >= 3.6
+* PyTorch version >= 1.0.0
+
+Install from PyPI with pip:
+
+```sh
+pip install bert-score
+```
+Install the latest unstable version from the master branch on GitHub:
+```sh
+pip install git+https://github.com/Tiiiger/bert_score
+```
+
+Install from source:
+```sh
+git clone https://github.com/Tiiiger/bert_score
+cd bert_score
+pip install .
+```
+You can test your installation with:
+```sh
+python -m unittest discover
+```
+
+### Usage
+
+
+#### Python Function
+
+At a high level, we provide a Python function `bert_score.score` and a Python object `bert_score.BERTScorer`.
+The function provides all the supported features, while the scorer object caches the BERT model to facilitate multiple evaluations.
+Check our [demo](./example/Demo.ipynb) to see how to use these two interfaces.
+Please refer to [`bert_score/score.py`](./bert_score/score.py) for implementation details.
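+
+As a quick sketch of both interfaces (see the demo notebook for the authoritative usage; the sentences below are made up for illustration):
+
+```python
+from bert_score import score, BERTScorer
+
+cands = ["On the table are two apples."]
+refs = ["There are two bananas on the table."]
+
+# One-off scoring with the functional interface.
+P, R, F1 = score(cands, refs, lang="en")
+
+# Reusable scorer that keeps the model loaded across multiple evaluations.
+scorer = BERTScorer(lang="en", rescale_with_baseline=True)
+P, R, F1 = scorer.score(cands, refs)
+print(f"P={P.mean().item():.4f} R={R.mean().item():.4f} F1={F1.mean().item():.4f}")
+```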
+
+Running BERTScore can be computationally intensive (because it uses BERT :p).
+Therefore, a GPU is usually necessary. If you don't have access to a GPU, you
+can try our [demo on Google Colab](https://colab.research.google.com/drive/1kpL8Y_AnUUiCxFjhxSrxCsc6-sDMNb_Q).
+
+#### Command Line Interface (CLI)
+We provide a command-line interface (CLI) for BERTScore as well as a Python module.
+The CLI can be used as follows:
+1. To evaluate English text files:
+
+We provide example inputs under `./example`.
+
+```sh
+bert-score -r example/refs.txt -c example/hyps.txt --lang en
+```
+You will get the following output at the end:
+
+```
+roberta-large_L17_no-idf_version=0.3.0(hug_trans=2.3.0) P: 0.957378 R: 0.961325 F1: 0.959333
+```
+
+where "roberta-large_L17_no-idf_version=0.3.0(hug_trans=2.3.0)" is the hash code.
+
+Starting from version 0.3.0, we support rescaling the scores with baseline scores:
+
+```sh
+bert-score -r example/refs.txt -c example/hyps.txt --lang en --rescale_with_baseline
+```
+You will get:
+
+```
+roberta-large_L17_no-idf_version=0.3.0(hug_trans=2.3.0)-rescaled P: 0.747044 R: 0.770484 F1: 0.759045
+```
+
+This makes the range of the scores larger and more human-readable. Please see this [post](./journal/rescale_baseline.md) for details.
+
+When there are multiple reference sentences, please use:
+```sh
+bert-score -r example/refs.txt example/refs2.txt -c example/hyps.txt --lang en
+```
+where the `-r` argument supports an arbitrary number of reference files. Each reference file should have the same number of lines as your candidate/hypothesis file. The i-th line in each reference file corresponds to the i-th line in the candidate file.
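+
+The analogous Python call (a sketch mirroring the demo) passes a list of lists as the references; the example sentences are made up:
+
+```python
+from bert_score import score
+
+cands = ["I like lemons."]
+# Each candidate gets a list of acceptable references; the score against the
+# closest reference is returned.
+refs = [["I am fond of lemons.", "I like lemons."]]
+P, R, F1 = score(cands, refs, lang="en")
+```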
+
+
+2. To evaluate text files in other languages:
+
+We currently support the 104 languages in multilingual BERT ([full list](https://github.com/google-research/bert/blob/master/multilingual.md#list-of-languages)).
+
+Please specify the two-letter abbreviation of the language. For instance, use `--lang zh` for Chinese text.
+
+See more options with `bert-score -h`.
+
+
+3. To load your own custom model:
+Please specify the path to the model and the number of layers to use with `--model` and `--num_layers`.
+```sh
+bert-score -r example/refs.txt -c example/hyps.txt --model path_to_my_bert --num_layers 9
+```
+
+
+4. To visualize matching scores:
+```sh
+bert-score-show --lang en -r "There are two bananas on the table." -c "On the table are two apples." -f out.png
+```
+The figure will be saved to `out.png`.
+
+
+#### Practical Tips
+
+* Report the hash code (e.g., `roberta-large_L17_no-idf_version=0.3.0(hug_trans=2.3.0)-rescaled`) in your paper so that people know what setting you use. This is inspired by [sacreBLEU](https://github.com/mjpost/sacreBLEU). Changes in huggingface's transformers version may also affect the score (See [issue #46](https://github.com/Tiiiger/bert_score/issues/46)).
+* Unlike BERT, RoBERTa uses a GPT-2-style tokenizer, which creates additional " " tokens when multiple spaces appear together. It is recommended to remove additional spaces with `sent = re.sub(r' +', ' ', sent)` or `sent = re.sub(r'\s+', ' ', sent)`.
+* Using inverse document frequency (idf) on the reference
+ sentences to weigh word importance may correlate better with human judgment.
+ However, when the set of reference sentences becomes too small, the idf score
+ may become inaccurate/invalid.
+ We now make it optional. To use idf,
+ please set `--idf` when using the CLI tool or
+ `idf=True` when calling the `bert_score.score` function (see the sketch after this list).
+* When you are low on GPU memory, consider setting `batch_size` when calling
+ the `bert_score.score` function.
+* To use a particular model, please set `-m MODEL_TYPE` when using the CLI tool
+ or `model_type=MODEL_TYPE` when calling the `bert_score.score` function.
+* We tune the layer to use based on the WMT16 metric evaluation dataset. You may use a
+ different layer by setting `-l LAYER` or `num_layers=LAYER`. To tune the best layer for your custom model, please follow the instructions in the [tune_layers](tune_layers) folder.
+* __Limitation__: Because BERT, RoBERTa, and XLM with learned positional embeddings are pre-trained on sentences with a max length of 512, BERTScore is undefined between sentences longer than 510 (512 after adding \[CLS\] and \[SEP\] tokens). Sentences longer than this will be truncated. Please consider using XLNet, which can support much longer inputs.
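+
+Putting several of these tips together, a hedged sketch of a customized call (the sentences are made up; parameter names follow the options documented above):
+
+```python
+import re
+from bert_score import score
+
+def clean(sent: str) -> str:
+    # Collapse repeated whitespace, which GPT-2-style tokenizers are sensitive to.
+    return re.sub(r"\s+", " ", sent).strip()
+
+cands = [clean("The  quick brown  fox jumps over the lazy dog."),
+         clean("Hello   world.")]
+refs = [clean("A quick brown fox jumped over a lazy dog."),
+        clean("Hello there, world.")]
+
+P, R, F1 = score(
+    cands,
+    refs,
+    model_type="bert-base-uncased",  # -m MODEL_TYPE on the CLI
+    num_layers=9,                    # -l LAYER on the CLI
+    idf=True,                        # --idf on the CLI; most useful with a larger reference set
+    batch_size=16,                   # lower this if GPU memory is tight
+)
+print(F1.mean().item())
+```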
+
+### Default Behavior
+
+#### Default Model
+| Language | Model |
+|:---------:|:--------------------------------:|
+| en | roberta-large |
+| en-sci | allenai/scibert_scivocab_uncased |
+| zh | bert-base-chinese |
+| tr | dbmdz/bert-base-turkish-cased |
+| others | bert-base-multilingual-cased |
+
+#### Default Layers
+Please see this [Google sheet](https://docs.google.com/spreadsheets/d/1RKOVpselB98Nnh_EOC4A2BYn8_201tmPODpNWu4w7xI/edit?usp=sharing) for the supported models and their performance.
+
+### Acknowledgement
+This repo wouldn't be possible without the awesome
+[bert](https://github.com/google-research/bert), [fairseq](https://github.com/pytorch/fairseq), and [transformers](https://github.com/huggingface/transformers).
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score.png b/mitigating_bias/train/BERTScore/bert_score/bert_score.png
new file mode 100644
index 0000000..deb6b28
Binary files /dev/null and b/mitigating_bias/train/BERTScore/bert_score/bert_score.png differ
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/__init__.py b/mitigating_bias/train/BERTScore/bert_score/bert_score/__init__.py
new file mode 100644
index 0000000..b4479d2
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/__init__.py
@@ -0,0 +1,3 @@
+__version__ = "0.3.11"
+from .score import *
+from .scorer import *
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/cs/bert-base-multilingual-cased.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/cs/bert-base-multilingual-cased.tsv
new file mode 100644
index 0000000..1182dec
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/cs/bert-base-multilingual-cased.tsv
@@ -0,0 +1,14 @@
+LAYER,P,R,F
+0,0.28803304,0.28811806,0.28382972
+1,0.36045152,0.3605044,0.35791346
+2,0.35763955,0.3577387,0.35552806
+3,0.4382742,0.43832803,0.4371357
+4,0.49264902,0.4926875,0.49187797
+5,0.5753039,0.5753327,0.57483304
+6,0.63127446,0.6313224,0.6309864
+7,0.5324934,0.532565,0.53202814
+8,0.5102161,0.5103038,0.5096529
+9,0.6044539,0.6045382,0.604006
+10,0.6814313,0.68149376,0.6810876
+11,0.7187933,0.7188438,0.71841186
+12,0.386078,0.38613266,0.38548917
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/cs/xlm-mlm-100-1280.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/cs/xlm-mlm-100-1280.tsv
new file mode 100644
index 0000000..e18490e
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/cs/xlm-mlm-100-1280.tsv
@@ -0,0 +1,18 @@
+LAYER,P,R,F
+0,0.24679352,0.24680473,0.24270211
+1,0.29235435,0.29231834,0.28975013
+2,0.3138872,0.31386852,0.31213808
+3,0.3285111,0.3284912,0.32616478
+4,0.34355187,0.34352767,0.3409594
+5,0.40920743,0.4091819,0.40708998
+6,0.5143928,0.5143628,0.51312447
+7,0.5684746,0.56843746,0.5675548
+8,0.55277854,0.55274475,0.55174726
+9,0.4946325,0.49455652,0.49314302
+10,0.425077,0.42500603,0.42305094
+11,0.37143245,0.37136525,0.3687799
+12,0.38431773,0.38426274,0.38162753
+13,0.40205154,0.40199956,0.3993145
+14,0.41208863,0.412054,0.40980735
+15,0.4243431,0.42427495,0.4220649
+16,0.32602695,0.3260445,0.32438898
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/cs/xlm-roberta-base.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/cs/xlm-roberta-base.tsv
new file mode 100644
index 0000000..2bce6f5
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/cs/xlm-roberta-base.tsv
@@ -0,0 +1,14 @@
+LAYER,P,R,F
+0,0.28832704,0.28834337,0.28409466
+1,0.42489076,0.42484972,0.42346135
+2,0.6489359,0.64890593,0.6484903
+3,0.7212477,0.7212302,0.7210182
+4,0.70944715,0.7094549,0.70922697
+5,0.7286318,0.72864425,0.7284186
+6,0.71929383,0.71930563,0.71912307
+7,0.75613487,0.756147,0.7559896
+8,0.7593519,0.759376,0.75920963
+9,0.801281,0.80129445,0.8010951
+10,0.8243164,0.82432646,0.8241175
+11,0.86058,0.86058563,0.8604526
+12,0.97968304,0.9796832,0.9796791
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/cs/xlm-roberta-large.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/cs/xlm-roberta-large.tsv
new file mode 100644
index 0000000..3345ec3
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/cs/xlm-roberta-large.tsv
@@ -0,0 +1,26 @@
+LAYER,P,R,F
+0,0.36036962,0.3603732,0.35725367
+1,0.6612272,0.661179,0.66074145
+2,0.722742,0.72273415,0.72256047
+3,0.73125947,0.73123205,0.7310358
+4,0.7825561,0.7825642,0.78245354
+5,0.78133506,0.7813208,0.7811937
+6,0.8079803,0.8079664,0.8078874
+7,0.8139315,0.8139195,0.8138673
+8,0.82575524,0.82575536,0.8256901
+9,0.8267652,0.8267674,0.8267081
+10,0.826633,0.826636,0.82654697
+11,0.8310137,0.8310095,0.83087397
+12,0.8320955,0.83211106,0.83181846
+13,0.82811135,0.8281364,0.827703
+14,0.8271892,0.8272189,0.8265785
+15,0.8306057,0.8306258,0.82997155
+16,0.81801736,0.81803435,0.8175852
+17,0.8253589,0.825372,0.8250096
+18,0.82938665,0.82940817,0.8290164
+19,0.82824516,0.8282779,0.827922
+20,0.8445639,0.84459394,0.84429437
+21,0.86360985,0.8636378,0.86333483
+22,0.8661244,0.8661579,0.86584014
+23,0.8638866,0.86392677,0.8635829
+24,0.97858095,0.9785705,0.9785698
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/de/bert-base-multilingual-cased.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/de/bert-base-multilingual-cased.tsv
new file mode 100644
index 0000000..825803e
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/de/bert-base-multilingual-cased.tsv
@@ -0,0 +1,14 @@
+LAYER,P,R,F
+0,0.29239914,0.29233938,0.28799337
+1,0.37400073,0.37395933,0.37138724
+2,0.36879358,0.36874846,0.3663888
+3,0.4502482,0.4501956,0.44887444
+4,0.4982386,0.49817833,0.49722672
+5,0.5760319,0.5759751,0.5754043
+6,0.62940514,0.62935334,0.6289917
+7,0.5357095,0.53565013,0.53505087
+8,0.5146575,0.51462156,0.5138855
+9,0.61532813,0.61528224,0.6147353
+10,0.68632543,0.6862504,0.6858456
+11,0.7214881,0.7214059,0.72098553
+12,0.36572546,0.36572027,0.36501065
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/de/xlm-mlm-100-1280.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/de/xlm-mlm-100-1280.tsv
new file mode 100644
index 0000000..6f4d8ff
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/de/xlm-mlm-100-1280.tsv
@@ -0,0 +1,18 @@
+LAYER,P,R,F
+0,0.25753325,0.25744554,0.25318053
+1,0.2981514,0.2980718,0.2952621
+2,0.3208413,0.32078207,0.3187413
+3,0.33565432,0.33562624,0.33315146
+4,0.34684345,0.34679237,0.3443796
+5,0.4133209,0.41324788,0.41142154
+6,0.514071,0.51400465,0.51292115
+7,0.5642201,0.56416416,0.56339765
+8,0.54623514,0.5461879,0.54531705
+9,0.49143773,0.4913597,0.4903938
+10,0.42275012,0.42266262,0.42136824
+11,0.36494458,0.36484274,0.36310795
+12,0.37404448,0.37393928,0.37217715
+13,0.38868552,0.3885813,0.38668826
+14,0.39440155,0.39433125,0.39241815
+15,0.4055417,0.40547967,0.4035052
+16,0.30379978,0.30370486,0.30213118
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/de/xlm-roberta-base.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/de/xlm-roberta-base.tsv
new file mode 100644
index 0000000..5aabf1c
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/de/xlm-roberta-base.tsv
@@ -0,0 +1,14 @@
+LAYER,P,R,F
+0,0.30777588,0.30777684,0.3031559
+1,0.44505233,0.44509125,0.4434747
+2,0.66170436,0.66174895,0.6612669
+3,0.73550326,0.7355261,0.735256
+4,0.7208496,0.72085893,0.720586
+5,0.73704386,0.73705214,0.7367808
+6,0.73208153,0.7320707,0.7318679
+7,0.7680251,0.76800215,0.76783967
+8,0.77268696,0.77266395,0.7724989
+9,0.8099519,0.80989397,0.809723
+10,0.8310105,0.83095115,0.83080244
+11,0.86770487,0.86765665,0.86756754
+12,0.9819623,0.9819598,0.9819579
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/de/xlm-roberta-large.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/de/xlm-roberta-large.tsv
new file mode 100644
index 0000000..f9f7028
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/de/xlm-roberta-large.tsv
@@ -0,0 +1,26 @@
+LAYER,P,R,F
+0,0.37937975,0.379358,0.37601414
+1,0.6650208,0.66500705,0.6645409
+2,0.72824335,0.7282381,0.72805566
+3,0.74166065,0.7416417,0.741436
+4,0.7924967,0.7925062,0.7923915
+5,0.7885143,0.7884954,0.7883624
+6,0.8117979,0.8117669,0.81168765
+7,0.8173677,0.8173395,0.81728506
+8,0.82804793,0.828012,0.82794595
+9,0.83066076,0.8306335,0.83057094
+10,0.82999426,0.8299607,0.82988906
+11,0.83342683,0.83340013,0.8332831
+12,0.83806795,0.83803594,0.83778083
+13,0.83596325,0.83591455,0.83558387
+14,0.8378458,0.8377797,0.83741814
+15,0.8420356,0.84196484,0.84161186
+16,0.83186066,0.8318187,0.8314605
+17,0.83927697,0.83923465,0.83889884
+18,0.84405965,0.84401745,0.8436563
+19,0.8409746,0.8409399,0.84059715
+20,0.8542512,0.85422283,0.8539368
+21,0.8734287,0.8733914,0.87314016
+22,0.8774618,0.87741566,0.87717056
+23,0.87821764,0.8781659,0.8779116
+24,0.9817083,0.98170334,0.9817008
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en-sci/allenai/scibert_scivocab_uncased.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en-sci/allenai/scibert_scivocab_uncased.tsv
new file mode 100644
index 0000000..e050cbb
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en-sci/allenai/scibert_scivocab_uncased.tsv
@@ -0,0 +1,14 @@
+LAYER,P,R,F
+0,0.3247314,0.32477322,0.32055983
+1,0.34701017,0.34706187,0.344079
+2,0.41985375,0.41988486,0.4179418
+3,0.4668236,0.46684003,0.4656058
+4,0.45860615,0.4586492,0.4573681
+5,0.41228917,0.4123522,0.41066456
+6,0.4395095,0.43956795,0.43794444
+7,0.48392966,0.4839865,0.48246792
+8,0.5335945,0.5336341,0.5322364
+9,0.60744065,0.6074917,0.60612226
+10,0.66027635,0.66033924,0.65897125
+11,0.6890247,0.6891011,0.6878515
+12,0.54997945,0.55007255,0.54844016
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/albert-base-v1.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/albert-base-v1.tsv
new file mode 100644
index 0000000..5c2e20f
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/albert-base-v1.tsv
@@ -0,0 +1,14 @@
+LAYER,P,R,F
+0,0.42279568,0.42285842,0.4198645
+1,0.38239375,0.3824535,0.3795375
+2,0.35127786,0.35131463,0.34854048
+3,0.3402314,0.34027407,0.33761653
+4,0.34001094,0.3400646,0.33745667
+5,0.34310105,0.34314916,0.34054983
+6,0.3478834,0.34792796,0.34530792
+7,0.3523316,0.35237584,0.34973368
+8,0.35546654,0.35550496,0.35283387
+9,0.35682797,0.35686156,0.3541417
+10,0.3572713,0.35730729,0.35451323
+11,0.35916516,0.35920846,0.35632935
+12,0.3620535,0.3621047,0.35911387
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/albert-base-v2.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/albert-base-v2.tsv
new file mode 100644
index 0000000..769e8ce
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/albert-base-v2.tsv
@@ -0,0 +1,14 @@
+LAYER,P,R,F
+0,0.43284354,0.4329465,0.42670736
+1,0.4085349,0.40857056,0.4041539
+2,0.42302486,0.42304876,0.41986418
+3,0.43835327,0.43837532,0.43578437
+4,0.46398157,0.4640153,0.46179092
+5,0.487097,0.48714137,0.48507443
+6,0.50701046,0.5070602,0.50516284
+7,0.5251579,0.5252073,0.52346826
+8,0.5432063,0.5432638,0.5416856
+9,0.56169736,0.56174135,0.56031275
+10,0.58207834,0.58211654,0.58080167
+11,0.5087994,0.5088567,0.50630754
+12,0.4822224,0.48224902,0.4795803
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/albert-large-v1.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/albert-large-v1.tsv
new file mode 100644
index 0000000..d7356d8
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/albert-large-v1.tsv
@@ -0,0 +1,26 @@
+LAYER,P,R,F
+0,0.48447838,0.48450485,0.4821886
+1,0.5124409,0.51243365,0.5109167
+2,0.49396634,0.49394318,0.49285302
+3,0.48355308,0.48351732,0.48258644
+4,0.48206407,0.48202685,0.4811013
+5,0.48171225,0.48167655,0.48073986
+6,0.48402956,0.48400134,0.48304388
+7,0.48760605,0.48758495,0.4866279
+8,0.49034056,0.4903293,0.4893756
+9,0.4919946,0.49199188,0.4910255
+10,0.49351045,0.4935107,0.49251547
+11,0.4953505,0.49535286,0.4943231
+12,0.49792922,0.4979353,0.49686712
+13,0.50119936,0.5012099,0.5001017
+14,0.50464475,0.50465906,0.5035164
+15,0.5072171,0.50723296,0.5060587
+16,0.50804037,0.50805837,0.506836
+17,0.50674427,0.5067624,0.5054734
+18,0.5028615,0.5028785,0.50150096
+19,0.4957624,0.49577576,0.49427336
+20,0.48470628,0.48471764,0.48304176
+21,0.46942177,0.4694329,0.46755382
+22,0.45182654,0.45184082,0.44979697
+23,0.4372368,0.43725976,0.43516964
+24,0.43032366,0.4303518,0.42831102
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/albert-large-v2.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/albert-large-v2.tsv
new file mode 100644
index 0000000..bdc49f1
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/albert-large-v2.tsv
@@ -0,0 +1,26 @@
+LAYER,P,R,F
+0,0.43137488,0.4314412,0.4271023
+1,0.47189355,0.47192886,0.46977237
+2,0.4965904,0.49659666,0.49521467
+3,0.4952368,0.4952206,0.49390256
+4,0.49991024,0.4998804,0.49865857
+5,0.5061125,0.5060827,0.50490576
+6,0.52520007,0.5251885,0.5241151
+7,0.5463337,0.54633546,0.54536676
+8,0.56268036,0.56267744,0.5618048
+9,0.5788636,0.5788671,0.5780607
+10,0.59798187,0.5979915,0.5972454
+11,0.6093569,0.6093737,0.60867995
+12,0.61832786,0.6183305,0.6176837
+13,0.6298888,0.62988657,0.6292773
+14,0.63760334,0.6376027,0.6370052
+15,0.6402277,0.6402217,0.63963217
+16,0.6457506,0.6457368,0.64517874
+17,0.6488497,0.6488231,0.6482803
+18,0.6473536,0.6473276,0.6467711
+19,0.65181977,0.6517948,0.6512418
+20,0.65941834,0.6593918,0.65884435
+21,0.65883756,0.65882397,0.65822756
+22,0.6599824,0.6599794,0.6593097
+23,0.6140344,0.6140205,0.6131047
+24,0.54314095,0.54311645,0.5419062
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/albert-xlarge-v1.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/albert-xlarge-v1.tsv
new file mode 100644
index 0000000..764576b
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/albert-xlarge-v1.tsv
@@ -0,0 +1,26 @@
+LAYER,P,R,F
+0,0.37603918,0.37612942,0.37049496
+1,0.31145602,0.3114958,0.3073803
+2,0.25227228,0.2522994,0.24795091
+3,0.22015819,0.22017719,0.21600199
+4,0.21572605,0.21576598,0.21187688
+5,0.21390381,0.21393314,0.21024637
+6,0.21366087,0.21368802,0.21022928
+7,0.2149553,0.21497151,0.2116843
+8,0.21902423,0.21904334,0.215865
+9,0.22598784,0.22601976,0.22294162
+10,0.23651579,0.23656204,0.2335378
+11,0.2508,0.25083283,0.24782418
+12,0.26735264,0.26740175,0.2642045
+13,0.2851571,0.2852036,0.28140694
+14,0.30159834,0.3016559,0.2969648
+15,0.31582344,0.31589058,0.31032172
+16,0.33028397,0.3303347,0.32389277
+17,0.34479943,0.34483773,0.33757344
+18,0.3576801,0.35770583,0.34980485
+19,0.36997133,0.36996147,0.3615338
+20,0.3813416,0.38132015,0.37257645
+21,0.3904368,0.39041746,0.38146585
+22,0.4026223,0.40261322,0.39356884
+23,0.41755676,0.41755086,0.4090774
+24,0.40913486,0.40914643,0.40243107
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/albert-xlarge-v2.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/albert-xlarge-v2.tsv
new file mode 100644
index 0000000..87ac788
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/albert-xlarge-v2.tsv
@@ -0,0 +1,26 @@
+LAYER,P,R,F
+0,0.379094,0.37919718,0.37330297
+1,0.27352002,0.27357075,0.26852632
+2,0.24191533,0.24194317,0.23669504
+3,0.2238661,0.22388461,0.21928357
+4,0.22812894,0.22815062,0.22410771
+5,0.22398795,0.22402358,0.22023973
+6,0.22606015,0.22609216,0.22241953
+7,0.22955626,0.22957715,0.2261971
+8,0.23346025,0.23349406,0.230283
+9,0.23933677,0.23937275,0.23639005
+10,0.24947925,0.2495169,0.24674372
+11,0.25879192,0.25879982,0.25623834
+12,0.26840612,0.2684224,0.2659429
+13,0.28223696,0.2822432,0.27990422
+14,0.3007411,0.30081397,0.298456
+15,0.32065493,0.32073346,0.31820792
+16,0.3489667,0.34909493,0.34612358
+17,0.37499505,0.37513632,0.37153322
+18,0.39365283,0.3937659,0.3894278
+19,0.3985198,0.39858896,0.39375183
+20,0.40377426,0.4038127,0.3987301
+21,0.4162669,0.41631454,0.41127917
+22,0.4385093,0.43853307,0.43359485
+23,0.50211877,0.5021498,0.49820283
+24,0.6450441,0.6450727,0.64176905
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/albert-xxlarge-v1.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/albert-xxlarge-v1.tsv
new file mode 100644
index 0000000..324494f
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/albert-xxlarge-v1.tsv
@@ -0,0 +1,14 @@
+LAYER,P,R,F
+0,0.44518736,0.44525033,0.44190475
+1,0.26892486,0.26893654,0.26619813
+2,0.25225964,0.25227055,0.2495048
+3,0.23626596,0.23626427,0.23414151
+4,0.24108262,0.24108647,0.23914734
+5,0.2402725,0.24029303,0.23852193
+6,0.24204335,0.24206877,0.24038398
+7,0.24432875,0.24436904,0.2427339
+8,0.24470611,0.24472676,0.24312295
+9,0.24761276,0.24763304,0.2458257
+10,0.26654655,0.26657295,0.26450548
+11,0.30993807,0.309992,0.3073111
+12,0.46560258,0.46563277,0.463768
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/albert-xxlarge-v2.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/albert-xxlarge-v2.tsv
new file mode 100644
index 0000000..f682d2c
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/albert-xxlarge-v2.tsv
@@ -0,0 +1,14 @@
+LAYER,P,R,F
+0,0.4414845,0.4415628,0.4378333
+1,0.26729813,0.26729846,0.26443842
+2,0.25006709,0.25006858,0.2470538
+3,0.22912578,0.22914563,0.22677879
+4,0.23676835,0.23678702,0.23474906
+5,0.23712093,0.23712862,0.23520498
+6,0.2357785,0.23579709,0.2339876
+7,0.2375271,0.2375658,0.2357691
+8,0.23694733,0.2369875,0.23519956
+9,0.24043696,0.24048997,0.23847668
+10,0.25991938,0.25997588,0.257621
+11,0.3076668,0.30775174,0.30460533
+12,0.5213576,0.52133,0.5192018
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/bert-base-cased-finetuned-mrpc.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/bert-base-cased-finetuned-mrpc.tsv
new file mode 100644
index 0000000..d001fc1
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/bert-base-cased-finetuned-mrpc.tsv
@@ -0,0 +1,14 @@
+LAYER,P,R,F
+0,0.32524315,0.32527947,0.32047534
+1,0.3697738,0.3697855,0.36682808
+2,0.3912412,0.39124438,0.38884974
+3,0.38678017,0.3867508,0.3849363
+4,0.4306143,0.43059555,0.4291982
+5,0.47680253,0.47676748,0.4757307
+6,0.4937383,0.4937078,0.49275663
+7,0.47395828,0.47392154,0.47275484
+8,0.48822877,0.48818707,0.48712534
+9,0.55345184,0.55342007,0.5525519
+10,0.6535154,0.6534775,0.6529064
+11,0.76415604,0.7641147,0.76378924
+12,0.72067815,0.7206308,0.72023565
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/bert-base-multilingual-cased.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/bert-base-multilingual-cased.tsv
new file mode 100644
index 0000000..59dc19e
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/bert-base-multilingual-cased.tsv
@@ -0,0 +1,14 @@
+LAYER,P,R,F
+0,0.31651747,0.3166142,0.31180394
+1,0.38737702,0.38744056,0.38455048
+2,0.37912813,0.37916443,0.37648088
+3,0.46451283,0.46451145,0.46312103
+4,0.5066057,0.50659287,0.5054953
+5,0.5804824,0.5804496,0.5797646
+6,0.63067275,0.630636,0.63018715
+7,0.54218787,0.5421653,0.5414328
+8,0.5240471,0.5240057,0.5232123
+9,0.6320527,0.6320019,0.63146895
+10,0.69633687,0.6962761,0.6958725
+11,0.7193143,0.7192363,0.7188216
+12,0.3473233,0.34732684,0.34655094
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/bert-base-uncased.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/bert-base-uncased.tsv
new file mode 100644
index 0000000..4cd085d
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/bert-base-uncased.tsv
@@ -0,0 +1,14 @@
+LAYER,P,R,F
+0,0.3231512,0.32322776,0.31853873
+1,0.32517454,0.32522815,0.32197207
+2,0.3708038,0.37080705,0.36834884
+3,0.36287847,0.36286885,0.36059204
+4,0.3786389,0.37860426,0.3767926
+5,0.4018232,0.401791,0.40032896
+6,0.38439456,0.38434005,0.38282546
+7,0.37114623,0.3710986,0.36949417
+8,0.37231025,0.37226102,0.37049443
+9,0.35375935,0.3537393,0.35219112
+10,0.38161838,0.3816211,0.37991408
+11,0.4421448,0.4421776,0.44040316
+12,0.40192786,0.40191513,0.40038353
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/bert-large-uncased.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/bert-large-uncased.tsv
new file mode 100644
index 0000000..39118de
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/bert-large-uncased.tsv
@@ -0,0 +1,26 @@
+LAYER,P,R,F
+0,0.33945993,0.33952734,0.3353803
+1,0.46529758,0.46534532,0.4629573
+2,0.5190359,0.51904607,0.5170987
+3,0.55551875,0.5555247,0.5540426
+4,0.47806495,0.4780755,0.47663376
+5,0.39333034,0.3933407,0.391598
+6,0.30678865,0.30683848,0.30446944
+7,0.40164435,0.40167126,0.39997557
+8,0.44429466,0.4443099,0.44277325
+9,0.5114804,0.5114661,0.5102474
+10,0.53322667,0.5332073,0.5323144
+11,0.56793964,0.56791747,0.56725395
+12,0.56360143,0.5635814,0.5629889
+13,0.5358492,0.5358346,0.53522795
+14,0.42079058,0.42078197,0.41975206
+15,0.3509417,0.3509411,0.34957188
+16,0.4534342,0.45341223,0.45231807
+17,0.46370843,0.46370083,0.46265444
+18,0.4278576,0.42786714,0.42646673
+19,0.38974905,0.3897353,0.3877319
+20,0.3966205,0.3966191,0.3942883
+21,0.4981153,0.49813268,0.4955151
+22,0.5868029,0.58685154,0.584482
+23,0.7136535,0.7137033,0.7118858
+24,0.5152624,0.5152391,0.5146088
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/distilbert-base-multilingual-cased.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/distilbert-base-multilingual-cased.tsv
new file mode 100644
index 0000000..d28c58d
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/distilbert-base-multilingual-cased.tsv
@@ -0,0 +1,8 @@
+LAYER,P,R,F
+0,0.27245584,0.27247205,0.26611173
+1,0.45394143,0.453942,0.45178676
+2,0.5374658,0.5374726,0.53619426
+3,0.61241305,0.61244136,0.6116679
+4,0.63282156,0.632836,0.63219804
+5,0.8164157,0.81645757,0.81623197
+6,0.4648941,0.4649093,0.4638737
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/distilbert-base-uncased-distilled-squad.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/distilbert-base-uncased-distilled-squad.tsv
new file mode 100644
index 0000000..66e8168
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/distilbert-base-uncased-distilled-squad.tsv
@@ -0,0 +1,8 @@
+LAYER,P,R,F
+0,0.28725642,0.2872663,0.28207442
+1,0.37234208,0.37233955,0.37046063
+2,0.403689,0.4037149,0.4020736
+3,0.5399291,0.53997463,0.53930676
+4,0.6591859,0.65919137,0.65882134
+5,0.65313077,0.6531313,0.65279835
+6,0.74920315,0.7491901,0.7487158
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/distilbert-base-uncased.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/distilbert-base-uncased.tsv
new file mode 100644
index 0000000..b6f66db
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/distilbert-base-uncased.tsv
@@ -0,0 +1,8 @@
+LAYER,P,R,F
+0,0.2884445,0.2884457,0.28333962
+1,0.39316687,0.3931663,0.39123002
+2,0.42905498,0.4290923,0.42735597
+3,0.5222444,0.52227175,0.52129734
+4,0.6019937,0.6019904,0.6014007
+5,0.6666034,0.66660464,0.66620487
+6,0.51401854,0.51404256,0.5131456
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/distilroberta-base.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/distilroberta-base.tsv
new file mode 100644
index 0000000..4213b57
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/distilroberta-base.tsv
@@ -0,0 +1,8 @@
+LAYER,P,R,F
+0,0.42608285,0.4272089,0.42462298
+1,0.7367886,0.7370362,0.736573
+2,0.79922664,0.799593,0.7991632
+3,0.8329021,0.8333321,0.83291864
+4,0.8442,0.84462386,0.84425896
+5,0.84732,0.84759504,0.8473319
+6,0.89334005,0.8935088,0.8933471
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/microsoft/deberta-base-mnli.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/microsoft/deberta-base-mnli.tsv
new file mode 100644
index 0000000..3e1d6e3
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/microsoft/deberta-base-mnli.tsv
@@ -0,0 +1,14 @@
+LAYER,P,R,F
+0,0.24991891,0.25001466,0.24200068
+1,0.29392833,0.29395026,0.28912014
+2,0.36113718,0.36123025,0.3575888
+3,0.41445282,0.41459718,0.41148487
+4,0.4386812,0.43877414,0.4361292
+5,0.45521808,0.4552972,0.45306677
+6,0.4797258,0.4797979,0.47779492
+7,0.48204568,0.48210686,0.480253
+8,0.50440174,0.5044583,0.5025705
+9,0.53045946,0.5304829,0.52866036
+10,0.53781724,0.5377958,0.53583634
+11,0.5402823,0.5402229,0.53816986
+12,0.57382584,0.57370174,0.57160807
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/microsoft/deberta-base.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/microsoft/deberta-base.tsv
new file mode 100644
index 0000000..70ab5c4
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/microsoft/deberta-base.tsv
@@ -0,0 +1,14 @@
+LAYER,P,R,F
+0,0.2517391,0.2518335,0.24388544
+1,0.36171424,0.36175922,0.35748047
+2,0.4423475,0.442458,0.44021225
+3,0.50618786,0.5063445,0.5045984
+4,0.5250692,0.525192,0.5236118
+5,0.55415064,0.5542385,0.5528668
+6,0.5684745,0.5685567,0.5672051
+7,0.5721026,0.5721756,0.5708452
+8,0.60626274,0.6063245,0.6049902
+9,0.6282066,0.62825406,0.6269483
+10,0.6643668,0.66438687,0.66297233
+11,0.65951246,0.6595324,0.6584084
+12,0.70749044,0.70750576,0.7064498
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/microsoft/deberta-large-mnli.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/microsoft/deberta-large-mnli.tsv
new file mode 100644
index 0000000..d1df4db
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/microsoft/deberta-large-mnli.tsv
@@ -0,0 +1,26 @@
+LAYER,P,R,F
+0,0.24490805,0.24501415,0.23715581
+1,0.29400384,0.2940948,0.28872925
+2,0.30570883,0.3057956,0.30113816
+3,0.2957167,0.29578057,0.2915654
+4,0.2884288,0.28847086,0.2843156
+5,0.30902475,0.3090854,0.3057
+6,0.3267471,0.32683545,0.32377866
+7,0.32664096,0.32672828,0.3239887
+8,0.33238792,0.3324875,0.32986364
+9,0.35454232,0.3546663,0.35220724
+10,0.37474304,0.37486178,0.3723941
+11,0.38948673,0.38959926,0.38713577
+12,0.40499082,0.4051212,0.4027381
+13,0.40869987,0.40882573,0.40650842
+14,0.41533,0.41543606,0.41318002
+15,0.42891178,0.4289993,0.42687863
+16,0.43574512,0.43581918,0.43376175
+17,0.44409868,0.44415665,0.4421444
+18,0.45358238,0.45362508,0.45173016
+19,0.4614291,0.46146432,0.45968512
+20,0.4612395,0.46127385,0.45946208
+21,0.47897574,0.47901914,0.4772938
+22,0.49526486,0.49531218,0.49363694
+23,0.48103315,0.4810869,0.4794539
+24,0.5131625,0.51319313,0.51193404
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/microsoft/deberta-large.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/microsoft/deberta-large.tsv
new file mode 100644
index 0000000..729ce3b
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/microsoft/deberta-large.tsv
@@ -0,0 +1,26 @@
+LAYER,P,R,F
+0,0.24543904,0.24554524,0.23771858
+1,0.32400694,0.3240889,0.31910792
+2,0.35317397,0.35325843,0.34890434
+3,0.34494445,0.34502625,0.34082443
+4,0.34670925,0.34677663,0.3425456
+5,0.36661133,0.3667012,0.36314553
+6,0.38046056,0.38056228,0.37710926
+7,0.38267714,0.3827855,0.37945607
+8,0.3922755,0.3924098,0.38914645
+9,0.41027483,0.41045374,0.4072962
+10,0.43634042,0.4365225,0.43331632
+11,0.4587171,0.45889324,0.45575032
+12,0.47399956,0.47417867,0.47109136
+13,0.48888516,0.48905894,0.4862424
+14,0.4966528,0.49680543,0.49413764
+15,0.5117451,0.51189446,0.50938886
+16,0.5341927,0.53433174,0.53205305
+17,0.55080074,0.5509329,0.5488182
+18,0.5715738,0.571711,0.5698007
+19,0.58424556,0.5843769,0.5826535
+20,0.59171396,0.5918352,0.5901539
+21,0.60953987,0.60965025,0.60810995
+22,0.620468,0.6205763,0.6191674
+23,0.57499653,0.575068,0.573669
+24,0.5698042,0.5698687,0.5686779
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/microsoft/deberta-xlarge-mnli.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/microsoft/deberta-xlarge-mnli.tsv
new file mode 100644
index 0000000..f52abba
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/microsoft/deberta-xlarge-mnli.tsv
@@ -0,0 +1,50 @@
+LAYER,P,R,F
+0,0.2493579,0.24956034,0.24190253
+1,0.3013932,0.30158633,0.2964718
+2,0.317363,0.31756195,0.31315005
+3,0.3117229,0.3118849,0.30764845
+4,0.3074649,0.3076071,0.30345994
+5,0.3140126,0.31414607,0.31065413
+6,0.32410583,0.3242222,0.32100978
+7,0.32173893,0.3218549,0.3187024
+8,0.32544047,0.32556787,0.3224075
+9,0.344368,0.3445152,0.34142512
+10,0.3655007,0.36567506,0.3623955
+11,0.38081372,0.38100296,0.37764993
+12,0.38874978,0.38893828,0.38563213
+13,0.38537422,0.38555342,0.38225004
+14,0.39434314,0.39452493,0.3914539
+15,0.40501443,0.40519157,0.40221062
+16,0.41383415,0.414013,0.41118416
+17,0.43424043,0.4344097,0.4318083
+18,0.4456768,0.44583458,0.4435271
+19,0.4616012,0.46173084,0.45967415
+20,0.46671286,0.46683112,0.4647799
+21,0.49091575,0.49103191,0.4892095
+22,0.5345532,0.53466916,0.53317034
+23,0.52739257,0.5275056,0.52598923
+24,0.4812145,0.48132038,0.47937903
+25,0.47786388,0.47797868,0.4758911
+26,0.4767261,0.476854,0.4747504
+27,0.45120457,0.45133275,0.44898003
+28,0.43487516,0.43499732,0.43227148
+29,0.4418857,0.44200745,0.439456
+30,0.45188263,0.4520089,0.44948938
+31,0.44309646,0.443208,0.44067165
+32,0.44934252,0.44945362,0.44696212
+33,0.47058168,0.470693,0.46848273
+34,0.48300824,0.4831242,0.480923
+35,0.49022266,0.49034286,0.48815507
+36,0.49732342,0.49744752,0.49531126
+37,0.49466616,0.494789,0.49265566
+38,0.4995418,0.4996657,0.49754837
+39,0.5116362,0.5117548,0.50974
+40,0.5169066,0.5170288,0.5150192
+41,0.53604615,0.5361662,0.534255
+42,0.5560917,0.5562141,0.55443686
+43,0.5699871,0.5701181,0.56848437
+44,0.5755175,0.5756376,0.5740404
+45,0.5944691,0.59459156,0.59314805
+46,0.61108196,0.6111957,0.60986704
+47,0.5935245,0.59361994,0.5924131
+48,0.6343621,0.6344516,0.63365686
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/microsoft/deberta-xlarge.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/microsoft/deberta-xlarge.tsv
new file mode 100644
index 0000000..053847a
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/microsoft/deberta-xlarge.tsv
@@ -0,0 +1,50 @@
+LAYER,P,R,F
+0,0.24957183,0.24977477,0.24214219
+1,0.30639872,0.3065908,0.30158743
+2,0.3331396,0.33333635,0.3290473
+3,0.32949522,0.32968733,0.3253696
+4,0.31661382,0.316769,0.31258678
+5,0.32896715,0.32910535,0.32568523
+6,0.33770096,0.33782086,0.3345446
+7,0.3326147,0.33272395,0.32957816
+8,0.3367821,0.33687654,0.33380622
+9,0.3546219,0.3547327,0.35179362
+10,0.38037142,0.38049275,0.37740862
+11,0.40171945,0.40185076,0.39869636
+12,0.4163913,0.41652367,0.4133557
+13,0.43222922,0.43235204,0.42938623
+14,0.4416328,0.44175574,0.43894717
+15,0.45403007,0.45415205,0.45151842
+16,0.47758847,0.47770745,0.47528616
+17,0.49413732,0.49424222,0.49203014
+18,0.5177917,0.5178813,0.51596016
+19,0.54055035,0.54061955,0.53895485
+20,0.5554671,0.55553156,0.553943
+21,0.5871218,0.5871978,0.585844
+22,0.6379372,0.6380021,0.6369301
+23,0.62672323,0.6267863,0.6256873
+24,0.5497838,0.5498381,0.5483379
+25,0.543943,0.5440018,0.54246646
+26,0.55943567,0.55949783,0.5578509
+27,0.5522361,0.5523346,0.55041844
+28,0.5384432,0.53856134,0.53645724
+29,0.541011,0.5411351,0.53916043
+30,0.53560615,0.5357274,0.5337449
+31,0.5211553,0.5212751,0.51924247
+32,0.52553123,0.52564174,0.5235451
+33,0.53930295,0.5394204,0.5372786
+34,0.5591909,0.55931133,0.5570341
+35,0.5712996,0.5714208,0.5691194
+36,0.57959074,0.57972014,0.5774709
+37,0.58818644,0.5883232,0.58616716
+38,0.5925551,0.5926871,0.5905953
+39,0.6026835,0.60282564,0.6008043
+40,0.6189861,0.6191251,0.6172279
+41,0.62964463,0.62977934,0.62799156
+42,0.6451681,0.64530563,0.6436409
+43,0.6539978,0.65413773,0.65264153
+44,0.65711796,0.65726453,0.65580934
+45,0.66835105,0.66850114,0.6671609
+46,0.67004806,0.6701847,0.6689483
+47,0.611536,0.61166185,0.6104823
+48,0.6487418,0.64883584,0.6481099
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/roberta-base.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/roberta-base.tsv
new file mode 100644
index 0000000..801f4a1
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/roberta-base.tsv
@@ -0,0 +1,14 @@
+LAYER,P,R,F
+0,0.4043224,0.40432808,0.40218553
+1,0.6423126,0.6422804,0.6414617
+2,0.768273,0.7682535,0.76791227
+3,0.7803166,0.78030443,0.7800415
+4,0.7839782,0.78397924,0.7836174
+5,0.7959116,0.7959033,0.79557085
+6,0.80936664,0.80936354,0.80908644
+7,0.81720984,0.81721514,0.816965
+8,0.80465585,0.80464727,0.8043641
+9,0.7911581,0.79115206,0.7908595
+10,0.8146725,0.8146619,0.814463
+11,0.8243949,0.8244051,0.82420003
+12,0.8557132,0.85571885,0.8555707
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/roberta-large-mnli.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/roberta-large-mnli.tsv
new file mode 100644
index 0000000..3c6d5ae
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/roberta-large-mnli.tsv
@@ -0,0 +1,26 @@
+LAYER,P,R,F
+0,0.36816803,0.36820343,0.3650997
+1,0.6424572,0.64243424,0.6408211
+2,0.62199366,0.6219771,0.62105906
+3,0.65479594,0.65479946,0.6542115
+4,0.66220766,0.66219413,0.66147035
+5,0.6841878,0.6841976,0.6835943
+6,0.6993157,0.6993184,0.698729
+7,0.7363659,0.7363538,0.73597246
+8,0.76699406,0.76697797,0.7666572
+9,0.76385623,0.76387703,0.76359564
+10,0.7751121,0.7751162,0.7748585
+11,0.7607176,0.7607192,0.7604293
+12,0.75846714,0.75850517,0.7582122
+13,0.7660639,0.766093,0.7658386
+14,0.76723933,0.7672636,0.76692307
+15,0.76183504,0.7618548,0.7615043
+16,0.77503896,0.7750635,0.77476084
+17,0.7572284,0.75724494,0.7568846
+18,0.72981,0.72983533,0.7294623
+19,0.6901594,0.69018,0.6896288
+20,0.6456024,0.6456534,0.6447707
+21,0.6733705,0.6734108,0.672755
+22,0.7964235,0.79642963,0.7961781
+23,0.83942956,0.839427,0.8393037
+24,0.87867236,0.8787309,0.8781039
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/roberta-large.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/roberta-large.tsv
new file mode 100644
index 0000000..86bc2e8
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/roberta-large.tsv
@@ -0,0 +1,26 @@
+LAYER,P,R,F
+0,0.3712891,0.37132213,0.36826715
+1,0.67176163,0.6717439,0.6703483
+2,0.70031923,0.7003052,0.69969934
+3,0.7080897,0.7081011,0.707698
+4,0.6976306,0.69762677,0.69710517
+5,0.7187199,0.71873325,0.71828526
+6,0.74678195,0.74678224,0.74642223
+7,0.7772428,0.7772184,0.77691925
+8,0.8021733,0.8021747,0.8019093
+9,0.8067641,0.80678225,0.8065291
+10,0.8366976,0.8367098,0.8364913
+11,0.8163513,0.816369,0.8161064
+12,0.8175406,0.8175611,0.81728977
+13,0.82106245,0.8210674,0.82080233
+14,0.81487834,0.8148861,0.8145652
+15,0.8243552,0.8243522,0.8240494
+16,0.8341641,0.8341684,0.833912
+17,0.83150584,0.8314941,0.83122575
+18,0.8314624,0.83146274,0.8311686
+19,0.82761073,0.8276117,0.8273196
+20,0.799873,0.79988,0.79956234
+21,0.8082163,0.80819315,0.8079286
+22,0.83196104,0.83195347,0.83174026
+23,0.8408042,0.8408027,0.8405716
+24,0.96022236,0.96021587,0.960168
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/xlm-mlm-100-1280.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/xlm-mlm-100-1280.tsv
new file mode 100644
index 0000000..f5e7601
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/xlm-mlm-100-1280.tsv
@@ -0,0 +1,18 @@
+LAYER,P,R,F
+0,0.2929519,0.29297927,0.28788087
+1,0.32307193,0.32305866,0.31955993
+2,0.33333376,0.33329934,0.3307059
+3,0.34018472,0.34019333,0.3369147
+4,0.35193846,0.35196185,0.34877294
+5,0.41633913,0.41635182,0.41389906
+6,0.52230054,0.5223191,0.5208747
+7,0.57117224,0.5711975,0.57016635
+8,0.55626523,0.55628437,0.55513597
+9,0.5035621,0.5035617,0.5023768
+10,0.43660313,0.4366135,0.43496045
+11,0.37350416,0.37354943,0.3712711
+12,0.3694557,0.36947483,0.36708415
+13,0.38296118,0.38296735,0.38057274
+14,0.3801941,0.38019708,0.37771493
+15,0.39073846,0.39073724,0.38804337
+16,0.27941948,0.2793937,0.27774334
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/xlm-mlm-en-2048.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/xlm-mlm-en-2048.tsv
new file mode 100644
index 0000000..d09397d
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/xlm-mlm-en-2048.tsv
@@ -0,0 +1,14 @@
+LAYER,P,R,F
+0,0.48034036,0.48027167,0.4755281
+1,0.68549955,0.68547165,0.68418026
+2,0.7502881,0.7502652,0.7497456
+3,0.7662417,0.7662214,0.7659151
+4,0.7910623,0.7910466,0.79085386
+5,0.8090659,0.8090618,0.80895317
+6,0.82148397,0.8214852,0.821408
+7,0.8091143,0.8091184,0.8090199
+8,0.77966934,0.7796406,0.77937865
+9,0.75278246,0.7527972,0.7524639
+10,0.72071564,0.7207407,0.7202978
+11,0.7175687,0.7176211,0.7170889
+12,0.22130837,0.22130068,0.21938775
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/xlm-roberta-base.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/xlm-roberta-base.tsv
new file mode 100644
index 0000000..a39b6ba
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/xlm-roberta-base.tsv
@@ -0,0 +1,14 @@
+LAYER,P,R,F
+0,0.31767526,0.31771243,0.31208947
+1,0.45930108,0.45930612,0.4573549
+2,0.6739723,0.6739605,0.67332643
+3,0.7428563,0.7428622,0.74252146
+4,0.7270618,0.7270706,0.7267292
+5,0.7459538,0.7459533,0.74563044
+6,0.7416182,0.74162334,0.74136156
+7,0.7766629,0.7766664,0.7764565
+8,0.7827196,0.78271383,0.78251594
+9,0.81658614,0.8165717,0.81639785
+10,0.83839214,0.83837646,0.8382293
+11,0.8711623,0.8711581,0.87106025
+12,0.9843661,0.98436636,0.9843645
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/xlm-roberta-large.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/xlm-roberta-large.tsv
new file mode 100644
index 0000000..66ab214
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/xlm-roberta-large.tsv
@@ -0,0 +1,26 @@
+LAYER,P,R,F
+0,0.38918123,0.38920417,0.3852401
+1,0.66835684,0.6683084,0.6677018
+2,0.7323929,0.7323684,0.7321559
+3,0.7391762,0.7391537,0.73889536
+4,0.7922834,0.79227173,0.7921484
+5,0.79589903,0.795871,0.7957138
+6,0.8166894,0.816673,0.8165898
+7,0.8223533,0.8223572,0.82228154
+8,0.834576,0.8345772,0.8344947
+9,0.8377803,0.83777326,0.8376894
+10,0.8380223,0.8380033,0.83791
+11,0.8415803,0.84157884,0.8414282
+12,0.84659237,0.8466055,0.84632146
+13,0.8437288,0.84372836,0.84340864
+14,0.846515,0.84650415,0.8461781
+15,0.8514585,0.8514379,0.85112184
+16,0.84461045,0.8446081,0.8442589
+17,0.85291016,0.8529066,0.8525485
+18,0.8582745,0.8582787,0.85787606
+19,0.85327464,0.8532746,0.85287833
+20,0.86624545,0.86624,0.86592185
+21,0.8854349,0.88543147,0.88515806
+22,0.8891757,0.8891605,0.88892245
+23,0.88805044,0.88803035,0.88777393
+24,0.9840399,0.98404247,0.984038
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/xlnet-base-cased.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/xlnet-base-cased.tsv
new file mode 100644
index 0000000..0e70f56
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/xlnet-base-cased.tsv
@@ -0,0 +1,14 @@
+LAYER,P,R,F
+0,0.29910204,0.29919305,0.29052314
+1,0.29633516,0.29640594,0.2915415
+2,0.28782755,0.28787795,0.28492415
+3,0.29966587,0.2996727,0.29745364
+4,0.32897076,0.32897395,0.3263186
+5,0.34247187,0.3424195,0.34024557
+6,0.61728173,0.61718243,0.6160013
+7,0.6704566,0.6703779,0.66936857
+8,0.8596307,0.8595696,0.859391
+9,0.8611796,0.8611522,0.8610164
+10,0.89382625,0.8938215,0.8937337
+11,0.97762144,0.9776183,0.97761476
+12,0.93146294,0.93134,0.93100053
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/xlnet-large-cased.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/xlnet-large-cased.tsv
new file mode 100644
index 0000000..7cb41d3
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/en/xlnet-large-cased.tsv
@@ -0,0 +1,26 @@
+LAYER,P,R,F
+0,0.41637358,0.41643414,0.41258112
+1,0.32545134,0.32545993,0.3204785
+2,0.29599807,0.29601985,0.29176536
+3,0.21799843,0.2180424,0.21441601
+4,0.2619272,0.261958,0.25913864
+5,0.30362618,0.30360785,0.30147976
+6,0.31371272,0.3136575,0.31170228
+7,0.3085695,0.30850938,0.30676135
+8,0.3251663,0.32509723,0.32402074
+9,0.34611195,0.34610417,0.3449464
+10,0.33172518,0.3316963,0.32996267
+11,0.32673666,0.32671896,0.3252777
+12,0.3015574,0.30154356,0.29979268
+13,0.33127543,0.33126998,0.33017284
+14,0.33191463,0.33192313,0.3307891
+15,0.3753324,0.3753503,0.374231
+16,0.37750244,0.37751338,0.37648135
+17,0.3678608,0.3678761,0.36674905
+18,0.305072,0.3050984,0.3042137
+19,0.42524177,0.4253285,0.42387673
+20,0.59149736,0.59153783,0.5901478
+21,0.6070587,0.607099,0.6057612
+22,0.80884385,0.80882186,0.8085461
+23,0.9555436,0.9555404,0.95551467
+24,0.96873486,0.9687297,0.9685215
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/es/bert-base-multilingual-cased.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/es/bert-base-multilingual-cased.tsv
new file mode 100644
index 0000000..29f24be
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/es/bert-base-multilingual-cased.tsv
@@ -0,0 +1,14 @@
+LAYER,P,R,F
+0,0.32142487,0.32125905,0.31729683
+1,0.39584324,0.395717,0.39326182
+2,0.3895418,0.38945207,0.38716727
+3,0.47731403,0.47727716,0.47604948
+4,0.5232235,0.5231792,0.52232313
+5,0.5989939,0.59892774,0.59843445
+6,0.6496523,0.6496062,0.649302
+7,0.5524209,0.5523591,0.55184853
+8,0.52988493,0.5298184,0.52922106
+9,0.63474494,0.6346978,0.6342529
+10,0.70397323,0.7039352,0.703585
+11,0.7417414,0.74173224,0.74136305
+12,0.39257455,0.39254928,0.39194846
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/es/xlm-mlm-100-1280.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/es/xlm-mlm-100-1280.tsv
new file mode 100644
index 0000000..137c8b4
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/es/xlm-mlm-100-1280.tsv
@@ -0,0 +1,18 @@
+LAYER,P,R,F
+0,0.29741356,0.29718205,0.29312935
+1,0.33412832,0.33395684,0.3312904
+2,0.35136887,0.35124466,0.3492788
+3,0.36096326,0.3608026,0.35864976
+4,0.36783966,0.36770988,0.36555293
+5,0.4318944,0.4317502,0.4300937
+6,0.54022354,0.54010266,0.5391772
+7,0.5873484,0.5872481,0.58660454
+8,0.56757474,0.5674764,0.566725
+9,0.50883144,0.5087277,0.5079181
+10,0.43789023,0.43777642,0.4366415
+11,0.37517586,0.37504935,0.3734603
+12,0.37935427,0.37921786,0.37755096
+13,0.39596176,0.39583504,0.39407465
+14,0.40488854,0.4047234,0.40284988
+15,0.41720447,0.41700417,0.41506332
+16,0.321014,0.32089671,0.31943843
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/es/xlm-roberta-base.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/es/xlm-roberta-base.tsv
new file mode 100644
index 0000000..8ed0d15
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/es/xlm-roberta-base.tsv
@@ -0,0 +1,14 @@
+LAYER,P,R,F
+0,0.3246688,0.32442638,0.31979698
+1,0.4669744,0.46682546,0.46536762
+2,0.682952,0.68287796,0.6824639
+3,0.75232756,0.7522827,0.7520721
+4,0.73857796,0.73851913,0.73830944
+5,0.7549688,0.7549195,0.75471216
+6,0.7463499,0.74629426,0.7461334
+7,0.7811989,0.78114533,0.78101724
+8,0.78642476,0.7863655,0.7862384
+9,0.8234212,0.823385,0.8232284
+10,0.8446837,0.8446493,0.8445056
+11,0.87540615,0.8753815,0.8752877
+12,0.9844347,0.9844323,0.9844318
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/es/xlm-roberta-large.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/es/xlm-roberta-large.tsv
new file mode 100644
index 0000000..9318273
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/es/xlm-roberta-large.tsv
@@ -0,0 +1,26 @@
+LAYER,P,R,F
+0,0.39393866,0.39371517,0.3905179
+1,0.6807716,0.6807661,0.680328
+2,0.7418765,0.74186456,0.74167407
+3,0.74935234,0.7493611,0.7491301
+4,0.79821396,0.79822713,0.7980995
+5,0.7988987,0.7989139,0.7987521
+6,0.8229017,0.8228938,0.8228024
+7,0.8280001,0.8279914,0.8279237
+8,0.8397697,0.8397626,0.8396876
+9,0.8410181,0.8410066,0.84094054
+10,0.8409921,0.8409992,0.8409067
+11,0.8431543,0.8431424,0.84302104
+12,0.8459719,0.84595364,0.84571356
+13,0.8396326,0.839628,0.83931595
+14,0.84028465,0.84028375,0.83993286
+15,0.8447372,0.84472674,0.8444034
+16,0.8363781,0.8363222,0.8360513
+17,0.84482056,0.8447689,0.8445116
+18,0.85074264,0.85068643,0.8504014
+19,0.84944814,0.8493866,0.8491228
+20,0.86171687,0.86166567,0.8614421
+21,0.87797874,0.8779322,0.8777276
+22,0.87815136,0.87810516,0.87791014
+23,0.87712365,0.87708676,0.8768752
+24,0.9812538,0.9812441,0.9812441
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/et/bert-base-multilingual-cased.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/et/bert-base-multilingual-cased.tsv
new file mode 100644
index 0000000..b9e6443
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/et/bert-base-multilingual-cased.tsv
@@ -0,0 +1,14 @@
+LAYER,P,R,F
+0,0.27440318,0.27450797,0.2698814
+1,0.36711293,0.3672066,0.3646441
+2,0.36751607,0.36758184,0.36546195
+3,0.44396114,0.4440282,0.44275236
+4,0.49434176,0.49438694,0.49351478
+5,0.5781191,0.57814497,0.57762396
+6,0.6325188,0.63253754,0.63219965
+7,0.5371272,0.5371553,0.53662723
+8,0.51365125,0.5136854,0.5130298
+9,0.61113626,0.6111767,0.6106605
+10,0.68986833,0.6898959,0.6895253
+11,0.72481495,0.7248488,0.72443366
+12,0.41427994,0.414279,0.41360843
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/et/xlm-mlm-100-1280.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/et/xlm-mlm-100-1280.tsv
new file mode 100644
index 0000000..48b8cc5
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/et/xlm-mlm-100-1280.tsv
@@ -0,0 +1,18 @@
+LAYER,P,R,F
+0,0.22709163,0.2271243,0.22315507
+1,0.28697282,0.28699732,0.284561
+2,0.31591207,0.31594896,0.3142282
+3,0.3272068,0.3271873,0.3251662
+4,0.33797315,0.33791435,0.3357934
+5,0.39506105,0.39499047,0.39325175
+6,0.49566302,0.4955908,0.49454838
+7,0.55213124,0.5520629,0.55135715
+8,0.5356107,0.53553146,0.53473157
+9,0.48094663,0.4808736,0.4799986
+10,0.41156343,0.41149083,0.410293
+11,0.36135536,0.36126482,0.3597544
+12,0.3840661,0.38395354,0.3824061
+13,0.3990762,0.39895925,0.39723954
+14,0.40530387,0.40517297,0.40322375
+15,0.41519368,0.41508782,0.4130928
+16,0.3316055,0.33152688,0.32986996
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/et/xlm-roberta-base.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/et/xlm-roberta-base.tsv
new file mode 100644
index 0000000..d5874c5
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/et/xlm-roberta-base.tsv
@@ -0,0 +1,14 @@
+LAYER,P,R,F
+0,0.27967498,0.27960515,0.27565888
+1,0.4284523,0.4284252,0.42707014
+2,0.6428217,0.64281446,0.6423762
+3,0.7207139,0.72071636,0.72050244
+4,0.7149051,0.7149116,0.7146955
+5,0.7364546,0.7364631,0.73624396
+6,0.72894406,0.7289576,0.7287705
+7,0.76335233,0.76335174,0.7631944
+8,0.7660467,0.7660525,0.76588887
+9,0.80481553,0.8047997,0.8046123
+10,0.824247,0.8242213,0.8240452
+11,0.8616431,0.8616192,0.8615053
+12,0.9794287,0.9794275,0.9794225
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/et/xlm-roberta-large.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/et/xlm-roberta-large.tsv
new file mode 100644
index 0000000..722e83c
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/et/xlm-roberta-large.tsv
@@ -0,0 +1,26 @@
+LAYER,P,R,F
+0,0.34819788,0.3482508,0.34528664
+1,0.66133285,0.6613388,0.66087705
+2,0.7167448,0.71674955,0.71656334
+3,0.7235102,0.7235182,0.7232822
+4,0.7763208,0.77634054,0.7762066
+5,0.7787467,0.77879304,0.77860975
+6,0.8065161,0.8065541,0.8064417
+7,0.8130386,0.8130481,0.81297535
+8,0.8232221,0.8232352,0.82315505
+9,0.8259885,0.8259966,0.82592285
+10,0.8261345,0.8261378,0.82604396
+11,0.8302732,0.83030087,0.8301369
+12,0.832208,0.8322509,0.83195096
+13,0.8284099,0.82843494,0.8280566
+14,0.8308385,0.830874,0.83043313
+15,0.83593214,0.83598274,0.8355737
+16,0.8225831,0.8226431,0.8222514
+17,0.83149856,0.83155996,0.83119583
+18,0.8360739,0.83612305,0.83573186
+19,0.8338515,0.8339162,0.8335273
+20,0.85045713,0.8505154,0.8501959
+21,0.866938,0.8670008,0.86669517
+22,0.86754334,0.86759776,0.86728114
+23,0.86495036,0.86501676,0.86465526
+24,0.97565717,0.97566575,0.97565126
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/fi/bert-base-multilingual-cased.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/fi/bert-base-multilingual-cased.tsv
new file mode 100644
index 0000000..db6baa6
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/fi/bert-base-multilingual-cased.tsv
@@ -0,0 +1,14 @@
+LAYER,P,R,F
+0,0.27203143,0.2718125,0.2674527
+1,0.3669008,0.36672932,0.36434102
+2,0.36613643,0.36596906,0.36401412
+3,0.4369806,0.43683773,0.43563512
+4,0.4888657,0.48875853,0.48795715
+5,0.5726952,0.5726454,0.572158
+6,0.62713367,0.62711185,0.6267881
+7,0.5336007,0.53355575,0.53305566
+8,0.51138526,0.51132864,0.51072747
+9,0.6112424,0.6111909,0.6107369
+10,0.6913106,0.6912809,0.6909531
+11,0.7289148,0.7289066,0.7285409
+12,0.40449622,0.4044448,0.4038471
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/fi/xlm-mlm-100-1280.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/fi/xlm-mlm-100-1280.tsv
new file mode 100644
index 0000000..cf5544e
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/fi/xlm-mlm-100-1280.tsv
@@ -0,0 +1,18 @@
+LAYER,P,R,F
+0,0.23438127,0.23428068,0.23040625
+1,0.2891474,0.2890501,0.28663626
+2,0.31794775,0.3178401,0.3161117
+3,0.3314175,0.3313274,0.32932603
+4,0.342742,0.34266472,0.34063184
+5,0.40328184,0.40322024,0.40158102
+6,0.5053177,0.5052804,0.50429296
+7,0.55995744,0.5599387,0.55925107
+8,0.5432386,0.5432242,0.5424414
+9,0.48718062,0.4871476,0.48624423
+10,0.41743338,0.41739362,0.4161943
+11,0.36450592,0.36447832,0.3629536
+12,0.38068864,0.38065174,0.37914556
+13,0.40042648,0.40037584,0.39876702
+14,0.40577888,0.405764,0.40404075
+15,0.4122403,0.412242,0.4104758
+16,0.32324278,0.3231793,0.3216624
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/fi/xlm-roberta-base.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/fi/xlm-roberta-base.tsv
new file mode 100644
index 0000000..1386329
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/fi/xlm-roberta-base.tsv
@@ -0,0 +1,14 @@
+LAYER,P,R,F
+0,0.28559503,0.2854355,0.28159738
+1,0.4194148,0.41935146,0.417895
+2,0.6369165,0.63687444,0.63647
+3,0.7129336,0.71288896,0.71269244
+4,0.705694,0.7056649,0.7054607
+5,0.7278231,0.72779924,0.7275826
+6,0.7264064,0.72638345,0.72620934
+7,0.76126385,0.7612437,0.7610952
+8,0.76516724,0.76513124,0.76499206
+9,0.8022079,0.8021703,0.8020057
+10,0.8249256,0.8248923,0.824736
+11,0.86274844,0.8627164,0.862623
+12,0.98083913,0.9808375,0.9808362
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/fi/xlm-roberta-large.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/fi/xlm-roberta-large.tsv
new file mode 100644
index 0000000..2c05f81
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/fi/xlm-roberta-large.tsv
@@ -0,0 +1,26 @@
+LAYER,P,R,F
+0,0.36092442,0.36074498,0.3580085
+1,0.6607407,0.66068447,0.66028255
+2,0.7176594,0.7175985,0.71745765
+3,0.7285679,0.72852075,0.72832286
+4,0.78595924,0.7859272,0.7858384
+5,0.7865282,0.7864904,0.7863824
+6,0.8087826,0.80874693,0.80867803
+7,0.81360877,0.8135691,0.8135222
+8,0.8234922,0.82345206,0.82339203
+9,0.82659143,0.8265619,0.8265032
+10,0.82844096,0.8284168,0.8283329
+11,0.833159,0.83313984,0.83300036
+12,0.83688194,0.8368595,0.8365941
+13,0.83482826,0.8348066,0.83445275
+14,0.8371448,0.8371316,0.8367094
+15,0.8411402,0.8411226,0.8407152
+16,0.8285362,0.828508,0.8281295
+17,0.8365054,0.8364763,0.8361323
+18,0.84074885,0.84073424,0.8403675
+19,0.8374997,0.8374846,0.83713585
+20,0.85316974,0.8531484,0.85286206
+21,0.8724993,0.87247324,0.87221074
+22,0.87472016,0.87468535,0.87441224
+23,0.8715076,0.8714718,0.8711863
+24,0.97793525,0.97793806,0.9779296
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/fr/bert-base-multilingual-cased.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/fr/bert-base-multilingual-cased.tsv
new file mode 100644
index 0000000..ef71e3e
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/fr/bert-base-multilingual-cased.tsv
@@ -0,0 +1,14 @@
+LAYER,P,R,F
+0,0.3169629,0.31686378,0.31290925
+1,0.3973607,0.39730093,0.3948981
+2,0.3917096,0.39167702,0.38944873
+3,0.471558,0.4715446,0.4703017
+4,0.51729333,0.5172892,0.5164051
+5,0.5921461,0.59214556,0.59160614
+6,0.64118487,0.6411703,0.64082944
+7,0.54434645,0.5443365,0.5437896
+8,0.52369165,0.5237088,0.5230594
+9,0.62573117,0.62573653,0.6252499
+10,0.69342446,0.6934141,0.6930288
+11,0.72644377,0.72643,0.7260432
+12,0.37622055,0.3762342,0.37555423
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/fr/xlm-mlm-100-1280.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/fr/xlm-mlm-100-1280.tsv
new file mode 100644
index 0000000..b79195a
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/fr/xlm-mlm-100-1280.tsv
@@ -0,0 +1,18 @@
+LAYER,P,R,F
+0,0.29251722,0.29243135,0.28824046
+1,0.3270653,0.32702345,0.32414123
+2,0.34298986,0.34297037,0.34081614
+3,0.35257423,0.35255608,0.3502542
+4,0.36079553,0.36077785,0.35852012
+5,0.4250942,0.425059,0.42333168
+6,0.5288226,0.5288088,0.5278067
+7,0.57518166,0.5751787,0.5744667
+8,0.5556386,0.5556409,0.554816
+9,0.50031036,0.50027037,0.49935982
+10,0.431764,0.43173033,0.43051916
+11,0.3727856,0.37272698,0.37108865
+12,0.3785679,0.37849474,0.37677515
+13,0.3937992,0.39371702,0.39193982
+14,0.3963082,0.39625022,0.3943681
+15,0.40861925,0.408575,0.40663382
+16,0.3136189,0.3135873,0.3120113
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/fr/xlm-roberta-base.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/fr/xlm-roberta-base.tsv
new file mode 100644
index 0000000..dedf67a
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/fr/xlm-roberta-base.tsv
@@ -0,0 +1,14 @@
+LAYER,P,R,F
+0,0.32206962,0.32194686,0.31745258
+1,0.4602701,0.46020856,0.45864564
+2,0.6719199,0.6718807,0.67143625
+3,0.74045163,0.74043214,0.74017817
+4,0.72625005,0.7262382,0.72596425
+5,0.74321467,0.7431929,0.7429401
+6,0.73884493,0.7388181,0.7386279
+7,0.77495724,0.77493846,0.77478385
+8,0.78073204,0.7807058,0.7805547
+9,0.8198895,0.81987315,0.81971085
+10,0.84097534,0.84096044,0.8408151
+11,0.8744024,0.87438446,0.8742934
+12,0.98294896,0.98294634,0.9829455
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/fr/xlm-roberta-large.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/fr/xlm-roberta-large.tsv
new file mode 100644
index 0000000..43d2c1d
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/fr/xlm-roberta-large.tsv
@@ -0,0 +1,26 @@
+LAYER,P,R,F
+0,0.38867676,0.38860926,0.38541874
+1,0.6748494,0.6748672,0.6744121
+2,0.7362502,0.7362605,0.73607147
+3,0.74364084,0.7436084,0.7434166
+4,0.79383326,0.79382324,0.7937171
+5,0.79398805,0.7939688,0.79384273
+6,0.8189012,0.818897,0.81879824
+7,0.8252554,0.8252701,0.82518923
+8,0.8372976,0.8373069,0.837221
+9,0.83934426,0.8393484,0.8392743
+10,0.8396223,0.8396263,0.8395396
+11,0.8413963,0.8414122,0.84128094
+12,0.8425236,0.84252983,0.8422956
+13,0.836232,0.8362653,0.8359306
+14,0.8365411,0.8365994,0.83620155
+15,0.84075475,0.84081256,0.84043586
+16,0.8336484,0.83366156,0.83334255
+17,0.8420401,0.84204596,0.84175515
+18,0.84736043,0.847369,0.8470594
+19,0.8457147,0.84572095,0.84543604
+20,0.8610545,0.86105347,0.8608192
+21,0.8796009,0.87960935,0.87939256
+22,0.87826204,0.8782994,0.8780729
+23,0.8757684,0.8757959,0.8755639
+24,0.9783308,0.9783405,0.9783287
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/it/bert-base-multilingual-cased.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/it/bert-base-multilingual-cased.tsv
new file mode 100644
index 0000000..a7a016d
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/it/bert-base-multilingual-cased.tsv
@@ -0,0 +1,14 @@
+LAYER,P,R,F
+0,0.2844219,0.28444692,0.28044853
+1,0.37012622,0.37011567,0.3678241
+2,0.37155172,0.3715547,0.36946896
+3,0.4603244,0.46031958,0.45919034
+4,0.50872415,0.5087325,0.50791526
+5,0.5868436,0.5868716,0.5863534
+6,0.6397911,0.63983333,0.63949335
+7,0.5409238,0.54094136,0.54040617
+8,0.5172371,0.51725966,0.5166258
+9,0.62051994,0.6205607,0.62006223
+10,0.6916372,0.6916744,0.69127834
+11,0.7267179,0.72675204,0.7263427
+12,0.38121554,0.38125327,0.38057736
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/it/xlm-mlm-100-1280.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/it/xlm-mlm-100-1280.tsv
new file mode 100644
index 0000000..8d185f4
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/it/xlm-mlm-100-1280.tsv
@@ -0,0 +1,18 @@
+LAYER,P,R,F
+0,0.26145306,0.26147375,0.25741524
+1,0.3025566,0.30248818,0.2998916
+2,0.32179558,0.32175842,0.31985885
+3,0.33394024,0.33392504,0.33166507
+4,0.34498307,0.3450014,0.34262
+5,0.41246715,0.4124828,0.41061035
+6,0.51987356,0.51987606,0.51879877
+7,0.56968486,0.5696755,0.56891495
+8,0.5526059,0.55259466,0.55172443
+9,0.49650237,0.49646214,0.49543175
+10,0.42862728,0.42857316,0.42718053
+11,0.36833626,0.36827257,0.3663598
+12,0.37506276,0.3750053,0.37302044
+13,0.38766044,0.38759127,0.3855202
+14,0.39820743,0.39815167,0.39614087
+15,0.4081781,0.40812454,0.4060677
+16,0.31196377,0.31188112,0.31034505
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/it/xlm-roberta-base.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/it/xlm-roberta-base.tsv
new file mode 100644
index 0000000..a680823
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/it/xlm-roberta-base.tsv
@@ -0,0 +1,14 @@
+LAYER,P,R,F
+0,0.29116568,0.29114744,0.2868295
+1,0.44197133,0.4419524,0.4405804
+2,0.6624113,0.662375,0.661958
+3,0.73566717,0.7356161,0.73541015
+4,0.72424763,0.72419375,0.7239804
+5,0.74316144,0.743101,0.7429021
+6,0.7358866,0.7358457,0.7356837
+7,0.7717992,0.77175343,0.7716246
+8,0.77671385,0.77666664,0.7765362
+9,0.8156109,0.8155815,0.81542325
+10,0.8353943,0.8353729,0.83522326
+11,0.8693978,0.86938006,0.86928266
+12,0.98234653,0.98234344,0.9823433
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/it/xlm-roberta-large.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/it/xlm-roberta-large.tsv
new file mode 100644
index 0000000..bced5e2
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/it/xlm-roberta-large.tsv
@@ -0,0 +1,26 @@
+LAYER,P,R,F
+0,0.3594449,0.35943142,0.35630783
+1,0.662669,0.662666,0.66222733
+2,0.7278628,0.72787654,0.727691
+3,0.7365745,0.73659885,0.7363675
+4,0.7886599,0.7886995,0.7885636
+5,0.7880371,0.7880777,0.7879012
+6,0.8136995,0.813729,0.8136207
+7,0.8208896,0.8209284,0.82083935
+8,0.83259714,0.8326411,0.8325414
+9,0.83512545,0.83517814,0.83508295
+10,0.8349007,0.8349598,0.83484215
+11,0.8370682,0.83713394,0.83697623
+12,0.83735925,0.8374388,0.83716834
+13,0.8307876,0.8308769,0.8305337
+14,0.8304336,0.83052474,0.83013064
+15,0.8350196,0.83511585,0.8347356
+16,0.8262828,0.8263541,0.82602215
+17,0.8352246,0.83529,0.8349814
+18,0.8413706,0.8414452,0.84111327
+19,0.84041846,0.84048223,0.840178
+20,0.85462093,0.8546788,0.854426
+21,0.8733275,0.87337714,0.8731498
+22,0.87235314,0.8724075,0.87218434
+23,0.86924857,0.86931163,0.8690715
+24,0.97641337,0.9764174,0.97640705
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/lv/bert-base-multilingual-cased.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/lv/bert-base-multilingual-cased.tsv
new file mode 100644
index 0000000..fe4c935
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/lv/bert-base-multilingual-cased.tsv
@@ -0,0 +1,14 @@
+LAYER,P,R,F
+0,0.31880563,0.31885436,0.314444
+1,0.39939553,0.399461,0.39680958
+2,0.39936826,0.39942977,0.39705652
+3,0.4639698,0.46403775,0.462585
+4,0.51133174,0.511391,0.5104017
+5,0.58995867,0.59001076,0.5894416
+6,0.64041185,0.6404576,0.640104
+7,0.5489481,0.5489947,0.5485002
+8,0.5241059,0.5241476,0.5235563
+9,0.61489826,0.6149375,0.614489
+10,0.69464105,0.6946774,0.69437844
+11,0.73005176,0.7301036,0.72975814
+12,0.42655912,0.42657772,0.42596325
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/lv/xlm-mlm-100-1280.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/lv/xlm-mlm-100-1280.tsv
new file mode 100644
index 0000000..a79d579
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/lv/xlm-mlm-100-1280.tsv
@@ -0,0 +1,18 @@
+LAYER,P,R,F
+0,0.25295407,0.2529038,0.24914542
+1,0.30763087,0.30758235,0.3050838
+2,0.3358753,0.33583683,0.33403713
+3,0.35062265,0.35062423,0.34857833
+4,0.36368594,0.36370537,0.36166307
+5,0.4208051,0.42084384,0.41921085
+6,0.52163017,0.5216751,0.5207288
+7,0.5748712,0.57491165,0.57426846
+8,0.5565561,0.55660844,0.55587536
+9,0.50083,0.50086135,0.5000541
+10,0.4287173,0.42873642,0.4276349
+11,0.37965864,0.37967306,0.37825358
+12,0.407949,0.40795445,0.40652746
+13,0.43800756,0.437995,0.43645307
+14,0.45024598,0.4502631,0.4485536
+15,0.45746338,0.45749146,0.4557482
+16,0.38742596,0.38743827,0.3859296
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/lv/xlm-roberta-base.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/lv/xlm-roberta-base.tsv
new file mode 100644
index 0000000..2358e52
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/lv/xlm-roberta-base.tsv
@@ -0,0 +1,14 @@
+LAYER,P,R,F
+0,0.3158023,0.31595236,0.31170404
+1,0.44148916,0.44161677,0.43999764
+2,0.65698195,0.65707916,0.65660477
+3,0.7291459,0.72921735,0.72896385
+4,0.72035086,0.720424,0.72016907
+5,0.7387083,0.73877054,0.73851764
+6,0.7331035,0.7331564,0.73294306
+7,0.7675076,0.767555,0.7673717
+8,0.7721929,0.7722387,0.77205515
+9,0.8134348,0.81347644,0.81327385
+10,0.8337028,0.8337392,0.8335502
+11,0.86931133,0.869342,0.86921614
+12,0.98048294,0.9804859,0.980481
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/lv/xlm-roberta-large.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/lv/xlm-roberta-large.tsv
new file mode 100644
index 0000000..933e92b
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/lv/xlm-roberta-large.tsv
@@ -0,0 +1,26 @@
+LAYER,P,R,F
+0,0.38303047,0.38316694,0.38004494
+1,0.6715437,0.67160815,0.671146
+2,0.72935355,0.7293971,0.7292059
+3,0.7367963,0.7368109,0.73659825
+4,0.78665257,0.78668237,0.7865436
+5,0.7876893,0.7877102,0.7875555
+6,0.8118788,0.811887,0.8117912
+7,0.81852704,0.8185447,0.81847227
+8,0.82763994,0.8276652,0.8275769
+9,0.829965,0.8299915,0.8299079
+10,0.8325733,0.8325909,0.832492
+11,0.83627987,0.8362996,0.8361541
+12,0.83918196,0.8392273,0.8389481
+13,0.83649516,0.8365421,0.83620304
+14,0.8397933,0.8398624,0.8394616
+15,0.84358793,0.8436498,0.8432716
+16,0.83227086,0.8323361,0.8319669
+17,0.83968145,0.83974457,0.8394039
+18,0.8447246,0.84477955,0.8444194
+19,0.8420644,0.8421077,0.84175897
+20,0.85892874,0.85897714,0.85869175
+21,0.8784681,0.87850714,0.8782475
+22,0.8812008,0.8812461,0.8809788
+23,0.8809998,0.88104856,0.8807619
+24,0.9765009,0.9765086,0.9764968
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/pt/bert-base-multilingual-cased.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/pt/bert-base-multilingual-cased.tsv
new file mode 100644
index 0000000..d0e59e2
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/pt/bert-base-multilingual-cased.tsv
@@ -0,0 +1,14 @@
+LAYER,P,R,F
+0,0.29287484,0.29276612,0.28896153
+1,0.37047997,0.3703816,0.36809015
+2,0.36899555,0.36891612,0.36687937
+3,0.46113333,0.46108902,0.46000266
+4,0.5105716,0.5105592,0.5097696
+5,0.5895024,0.5895065,0.5890118
+6,0.6431079,0.6431254,0.64279854
+7,0.5462664,0.54628617,0.5457523
+8,0.5256067,0.5256172,0.5249814
+9,0.6314677,0.6314837,0.63099706
+10,0.70045394,0.7004913,0.7001096
+11,0.73772144,0.7377826,0.73737544
+12,0.376468,0.37648788,0.37585765
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/pt/xlm-mlm-100-1280.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/pt/xlm-mlm-100-1280.tsv
new file mode 100644
index 0000000..110b4aa
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/pt/xlm-mlm-100-1280.tsv
@@ -0,0 +1,18 @@
+LAYER,P,R,F
+0,0.27077773,0.27081063,0.26688486
+1,0.31458914,0.31458747,0.3120791
+2,0.33407974,0.3341051,0.33230388
+3,0.3424346,0.34244755,0.3402508
+4,0.35230166,0.35230178,0.35003006
+5,0.4220341,0.42202395,0.4202843
+6,0.53116757,0.53115034,0.530122
+7,0.5820105,0.5819889,0.5812692
+8,0.5657533,0.56574196,0.56491727
+9,0.5071269,0.5071374,0.5062459
+10,0.43731558,0.43734074,0.43611565
+11,0.37808847,0.37813658,0.37643093
+12,0.38327742,0.3833187,0.38146868
+13,0.39855412,0.39860153,0.3966159
+14,0.40502536,0.4050431,0.4029639
+15,0.4187931,0.4188173,0.41667226
+16,0.3218223,0.32182986,0.32025683
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/pt/xlm-roberta-base.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/pt/xlm-roberta-base.tsv
new file mode 100644
index 0000000..f30f813
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/pt/xlm-roberta-base.tsv
@@ -0,0 +1,14 @@
+LAYER,P,R,F
+0,0.302053,0.30184427,0.2976399
+1,0.45684627,0.45677197,0.45546424
+2,0.6735796,0.6735233,0.6731233
+3,0.74322075,0.74317265,0.74297166
+4,0.72940576,0.72935677,0.72914743
+5,0.74814695,0.748095,0.74789345
+6,0.7392128,0.739171,0.7390097
+7,0.7750178,0.774979,0.7748446
+8,0.7798485,0.77981144,0.779676
+9,0.81777656,0.8177529,0.8175885
+10,0.83894795,0.8389314,0.838778
+11,0.870458,0.87044656,0.8703426
+12,0.9830619,0.98306423,0.98306143
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/pt/xlm-roberta-large.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/pt/xlm-roberta-large.tsv
new file mode 100644
index 0000000..f2a8901
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/pt/xlm-roberta-large.tsv
@@ -0,0 +1,26 @@
+LAYER,P,R,F
+0,0.3740441,0.37399486,0.3710023
+1,0.67301875,0.6729773,0.6725724
+2,0.7341082,0.73409325,0.733912
+3,0.74055076,0.7405203,0.7403056
+4,0.7904661,0.79042983,0.79032314
+5,0.7880771,0.78803754,0.78790236
+6,0.81661665,0.8166169,0.81652534
+7,0.8221869,0.82219744,0.82212555
+8,0.8350775,0.83508027,0.8350043
+9,0.8372719,0.83726805,0.8372026
+10,0.8372136,0.8371918,0.8371133
+11,0.8399054,0.8398653,0.83975667
+12,0.84060127,0.8405483,0.84033316
+13,0.8341999,0.8341561,0.83385843
+14,0.83416283,0.83414257,0.8337824
+15,0.8384014,0.83838236,0.8380531
+16,0.8296981,0.82966036,0.8293861
+17,0.83966845,0.8396195,0.8393744
+18,0.84589136,0.8458346,0.845562
+19,0.84492606,0.8448792,0.8446221
+20,0.8584489,0.8584082,0.858195
+21,0.87398726,0.8739412,0.87374836
+22,0.8719638,0.8719428,0.8717388
+23,0.87165064,0.87161326,0.87140054
+24,0.97964114,0.9796477,0.979639
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/zh/bert-base-chinese.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/zh/bert-base-chinese.tsv
new file mode 100644
index 0000000..93a76ed
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/zh/bert-base-chinese.tsv
@@ -0,0 +1,14 @@
+LAYER,P,R,F
+0,0.2786982,0.2785878,0.27494037
+1,0.33671036,0.33662596,0.33471334
+2,0.42845756,0.4284101,0.4273608
+3,0.45149758,0.45147166,0.45057997
+4,0.5184017,0.5184023,0.51783705
+5,0.573508,0.5734958,0.57311326
+6,0.6330495,0.6330315,0.63276017
+7,0.59864044,0.5986131,0.59829366
+8,0.54804957,0.5480091,0.54755783
+9,0.51617336,0.516132,0.5156478
+10,0.5561151,0.55609417,0.55573994
+11,0.5984755,0.5984512,0.5981564
+12,0.56038475,0.5603337,0.5599188
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/zh/bert-base-multilingual-cased.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/zh/bert-base-multilingual-cased.tsv
new file mode 100644
index 0000000..9f0efef
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/zh/bert-base-multilingual-cased.tsv
@@ -0,0 +1,14 @@
+LAYER,P,R,F
+0,0.3118959,0.31177446,0.3086432
+1,0.3425565,0.34244964,0.3399823
+2,0.35352883,0.35343447,0.35129714
+3,0.43610418,0.43604368,0.43494177
+4,0.489178,0.4891102,0.48830378
+5,0.5690116,0.5689432,0.5684761
+6,0.6265541,0.6264865,0.6262059
+7,0.54113525,0.5410629,0.54064935
+8,0.5284168,0.52834535,0.5279011
+9,0.62840384,0.62833464,0.62803453
+10,0.69999313,0.69992936,0.6997184
+11,0.732485,0.73242646,0.73219264
+12,0.37793094,0.37789607,0.3773833
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/zh/xlm-mlm-100-1280.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/zh/xlm-mlm-100-1280.tsv
new file mode 100644
index 0000000..0b0065a
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/zh/xlm-mlm-100-1280.tsv
@@ -0,0 +1,18 @@
+LAYER,P,R,F
+0,0.32776257,0.3276813,0.32470536
+1,0.3356181,0.3355479,0.332959
+2,0.35034394,0.35026166,0.3482373
+3,0.36442822,0.36435747,0.36224666
+4,0.3771403,0.37707978,0.37516475
+5,0.43258497,0.4325344,0.43104237
+6,0.5181599,0.5181224,0.5172326
+7,0.5792645,0.57922333,0.57866186
+8,0.5692134,0.5691731,0.56858486
+9,0.5324812,0.5324232,0.53178775
+10,0.47810394,0.47805268,0.47723517
+11,0.4319199,0.43188363,0.43088776
+12,0.44747546,0.447443,0.44653583
+13,0.45633683,0.4563076,0.45531917
+14,0.45723236,0.457195,0.45610127
+15,0.46675017,0.46670267,0.4656479
+16,0.40051928,0.40046176,0.39960644
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/zh/xlm-roberta-base.tsv b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/zh/xlm-roberta-base.tsv
new file mode 100644
index 0000000..31c46db
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/rescale_baseline/zh/xlm-roberta-base.tsv
@@ -0,0 +1,14 @@
+LAYER,P,R,F
+0,0.36188287,0.36180493,0.35862362
+1,0.4372344,0.43716717,0.43550655
+2,0.64521,0.64515334,0.6446227
+3,0.734053,0.7340016,0.7337482
+4,0.730163,0.73011726,0.72988415
+5,0.7542184,0.7541747,0.7539484
+6,0.7611062,0.7610684,0.76089287
+7,0.79163146,0.7915949,0.79145956
+8,0.79859376,0.79856044,0.7984367
+9,0.82988167,0.8298588,0.82975745
+10,0.8522986,0.8522761,0.8521975
+11,0.8852355,0.88521546,0.88517046
+12,0.98287344,0.98286974,0.9828698
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/score.py b/mitigating_bias/train/BERTScore/bert_score/bert_score/score.py
new file mode 100644
index 0000000..df60feb
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/score.py
@@ -0,0 +1,305 @@
+import os
+import sys
+import time
+import pathlib
+import torch
+import matplotlib.pyplot as plt
+from mpl_toolkits.axes_grid1 import make_axes_locatable
+import numpy as np
+import pandas as pd
+
+from collections import defaultdict
+from transformers import AutoTokenizer
+
+from .utils import (
+ get_model,
+ get_tokenizer,
+ get_idf_dict,
+ bert_cos_score_idf,
+ get_bert_embedding,
+ lang2model,
+ model2layers,
+ get_hash,
+ cache_scibert,
+ sent_encode,
+)
+
+
+__all__ = ["score", "plot_example"]
+
+
+def score(
+ cands,
+ refs,
+ model_type=None,
+ num_layers=None,
+ verbose=False,
+ idf=False,
+ device=None,
+ batch_size=64,
+ nthreads=4,
+ all_layers=False,
+ lang=None,
+ return_hash=False,
+ rescale_with_baseline=False,
+ baseline_path=None,
+ use_fast_tokenizer=False
+):
+ """
+ BERTScore metric.
+
+ Args:
+ - :param: `cands` (list of str): candidate sentences
+ - :param: `refs` (list of str or list of list of str): reference sentences
+        - :param: `model_type` (str): bert specification, defaulting to the suggested
+                  model for the target language; at least one of
+                  `model_type` or `lang` has to be specified
+        - :param: `num_layers` (int): the layer of representation to use,
+                  defaulting to the number of layers tuned on WMT16 correlation data
+ - :param: `verbose` (bool): turn on intermediate status update
+ - :param: `idf` (bool or dict): use idf weighting, can also be a precomputed idf_dict
+        - :param: `device` (str): the device on which the contextual embedding model will be allocated.
+                  If this argument is None, the model lives on cuda:0 when cuda is available.
+ - :param: `nthreads` (int): number of threads
+ - :param: `batch_size` (int): bert score processing batch size
+        - :param: `lang` (str): language of the sentences; at least one of
+                  `model_type` or `lang` has to be specified. `lang` needs to be
+                  specified when `rescale_with_baseline` is True.
+ - :param: `return_hash` (bool): return hash code of the setting
+ - :param: `rescale_with_baseline` (bool): rescale bertscore with pre-computed baseline
+ - :param: `baseline_path` (str): customized baseline file
+ - :param: `use_fast_tokenizer` (bool): `use_fast` parameter passed to HF tokenizer
+
+ Return:
+        - :param: `(P, R, F)`: each is of shape (N); N = number of input
+                  candidate-reference pairs. If returning the hash code, the
+                  output will be ((P, R, F), hashcode). If a candidate has
+                  multiple references, the returned score of this candidate is
+                  the *best* score among all references.
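+
+    Example (a minimal usage sketch; the sentences below are illustrative and
+    `lang="en"` selects the default English model):
+
+        >>> cands = ["the cat sat on the mat"]
+        >>> refs = ["there is a cat on the mat"]
+        >>> P, R, F1 = score(cands, refs, lang="en")
+        >>> F1.shape  # one score per candidate
+        torch.Size([1])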
+ """
+ assert len(cands) == len(refs), "Different number of candidates and references"
+
+ assert lang is not None or model_type is not None, "Either lang or model_type should be specified"
+
+ ref_group_boundaries = None
+ if not isinstance(refs[0], str):
+ ref_group_boundaries = []
+ ori_cands, ori_refs = cands, refs
+ cands, refs = [], []
+ count = 0
+ for cand, ref_group in zip(ori_cands, ori_refs):
+ cands += [cand] * len(ref_group)
+ refs += ref_group
+ ref_group_boundaries.append((count, count + len(ref_group)))
+ count += len(ref_group)
+
+ if rescale_with_baseline:
+ assert lang is not None, "Need to specify Language when rescaling with baseline"
+
+ if model_type is None:
+ lang = lang.lower()
+ model_type = lang2model[lang]
+ if num_layers is None:
+ num_layers = model2layers[model_type]
+
+ tokenizer = get_tokenizer(model_type, use_fast_tokenizer)
+ model = get_model(model_type, num_layers, all_layers)
+ if device is None:
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+ model.to(device)
+
+ if not idf:
+ idf_dict = defaultdict(lambda: 1.0)
+ # set idf for [SEP] and [CLS] to 0
+ idf_dict[tokenizer.sep_token_id] = 0
+ idf_dict[tokenizer.cls_token_id] = 0
+ elif isinstance(idf, dict):
+ if verbose:
+ print("using predefined IDF dict...")
+ idf_dict = idf
+ else:
+ if verbose:
+ print("preparing IDF dict...")
+ start = time.perf_counter()
+ idf_dict = get_idf_dict(refs, tokenizer, nthreads=nthreads)
+ if verbose:
+ print("done in {:.2f} seconds".format(time.perf_counter() - start))
+
+ if verbose:
+ print("calculating scores...")
+ start = time.perf_counter()
+ all_preds = bert_cos_score_idf(
+ model,
+ refs,
+ cands,
+ tokenizer,
+ idf_dict,
+ verbose=verbose,
+ device=device,
+ batch_size=batch_size,
+ all_layers=all_layers,
+ ).cpu()
+
+ if ref_group_boundaries is not None:
+ max_preds = []
+ for beg, end in ref_group_boundaries:
+ max_preds.append(all_preds[beg:end].max(dim=0)[0])
+ all_preds = torch.stack(max_preds, dim=0)
+
+ use_custom_baseline = baseline_path is not None
+ if rescale_with_baseline:
+ if baseline_path is None:
+ baseline_path = os.path.join(os.path.dirname(__file__), f"rescale_baseline/{lang}/{model_type}.tsv")
+ if os.path.isfile(baseline_path):
+ if not all_layers:
+ baselines = torch.from_numpy(pd.read_csv(baseline_path).iloc[num_layers].to_numpy())[1:].float()
+ else:
+ baselines = torch.from_numpy(pd.read_csv(baseline_path).to_numpy())[:, 1:].unsqueeze(1).float()
+
+ all_preds = (all_preds - baselines) / (1 - baselines)
+ else:
+ print(
+ f"Warning: Baseline not Found for {model_type} on {lang} at {baseline_path}", file=sys.stderr,
+ )
+
+ out = all_preds[..., 0], all_preds[..., 1], all_preds[..., 2] # P, R, F
+
+ if verbose:
+ time_diff = time.perf_counter() - start
+ print(f"done in {time_diff:.2f} seconds, {len(refs) / time_diff:.2f} sentences/sec")
+
+ if return_hash:
+ return tuple(
+ [
+ out,
+ get_hash(model_type, num_layers, idf, rescale_with_baseline,
+ use_custom_baseline=use_custom_baseline,
+ use_fast_tokenizer=use_fast_tokenizer),
+ ]
+ )
+
+ return out
+
+
+def plot_example(
+ candidate,
+ reference,
+ model_type=None,
+ num_layers=None,
+ lang=None,
+ rescale_with_baseline=False,
+ baseline_path=None,
+ use_fast_tokenizer=False,
+ fname="",
+):
+ """
+    Plot the pairwise cosine similarity matrix between a candidate and a reference sentence.
+
+ Args:
+ - :param: `candidate` (str): a candidate sentence
+ - :param: `reference` (str): a reference sentence
+        - :param: `model_type` (str): bert specification, defaulting to the suggested
+                  model for the target language; at least one of
+                  `model_type` or `lang` has to be specified
+        - :param: `num_layers` (int): the layer of representation to use
+        - :param: `lang` (str): language of the sentences; at least one of
+                  `model_type` or `lang` has to be specified. `lang` needs to be
+                  specified when `rescale_with_baseline` is True.
+        - :param: `rescale_with_baseline` (bool): rescale bertscore with pre-computed baseline
+        - :param: `baseline_path` (str): customized baseline file
+ - :param: `use_fast_tokenizer` (bool): `use_fast` parameter passed to HF tokenizer
+ - :param: `fname` (str): path to save the output plot
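+
+    Example (a minimal usage sketch; the output file name is illustrative):
+
+        >>> plot_example(
+        ...     "the cat sat on the mat",
+        ...     "there is a cat on the mat",
+        ...     lang="en",
+        ...     fname="similarity.png",
+        ... )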
+ """
+ assert isinstance(candidate, str)
+ assert isinstance(reference, str)
+
+ assert lang is not None or model_type is not None, "Either lang or model_type should be specified"
+
+ if rescale_with_baseline:
+ assert lang is not None, "Need to specify Language when rescaling with baseline"
+
+ if model_type is None:
+ lang = lang.lower()
+ model_type = lang2model[lang]
+ if num_layers is None:
+ num_layers = model2layers[model_type]
+
+ tokenizer = get_tokenizer(model_type, use_fast_tokenizer)
+ model = get_model(model_type, num_layers)
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+ model.to(device)
+
+ idf_dict = defaultdict(lambda: 1.0)
+ # set idf for [SEP] and [CLS] to 0
+ idf_dict[tokenizer.sep_token_id] = 0
+ idf_dict[tokenizer.cls_token_id] = 0
+
+ hyp_embedding, masks, padded_idf = get_bert_embedding(
+ [candidate], model, tokenizer, idf_dict, device=device, all_layers=False
+ )
+ ref_embedding, masks, padded_idf = get_bert_embedding(
+ [reference], model, tokenizer, idf_dict, device=device, all_layers=False
+ )
+ ref_embedding.div_(torch.norm(ref_embedding, dim=-1).unsqueeze(-1))
+ hyp_embedding.div_(torch.norm(hyp_embedding, dim=-1).unsqueeze(-1))
+ sim = torch.bmm(hyp_embedding, ref_embedding.transpose(1, 2))
+ sim = sim.squeeze(0).cpu()
+
+ # remove [CLS] and [SEP] tokens
+ r_tokens = [tokenizer.decode([i]) for i in sent_encode(tokenizer, reference)][1:-1]
+ h_tokens = [tokenizer.decode([i]) for i in sent_encode(tokenizer, candidate)][1:-1]
+ sim = sim[1:-1, 1:-1]
+
+ if rescale_with_baseline:
+ if baseline_path is None:
+ baseline_path = os.path.join(os.path.dirname(__file__), f"rescale_baseline/{lang}/{model_type}.tsv")
+ if os.path.isfile(baseline_path):
+ baselines = torch.from_numpy(pd.read_csv(baseline_path).iloc[num_layers].to_numpy())[1:].float()
+ sim = (sim - baselines[2].item()) / (1 - baselines[2].item())
+ else:
+ print(
+ f"Warning: Baseline not Found for {model_type} on {lang} at {baseline_path}", file=sys.stderr,
+ )
+
+ fig, ax = plt.subplots(figsize=(len(r_tokens), len(h_tokens)))
+ im = ax.imshow(sim, cmap="Blues", vmin=0, vmax=1)
+
+ # We want to show all ticks...
+ ax.set_xticks(np.arange(len(r_tokens)))
+ ax.set_yticks(np.arange(len(h_tokens)))
+ # ... and label them with the respective list entries
+ ax.set_xticklabels(r_tokens, fontsize=10)
+ ax.set_yticklabels(h_tokens, fontsize=10)
+ ax.grid(False)
+ plt.xlabel("Reference (tokenized)", fontsize=14)
+ plt.ylabel("Candidate (tokenized)", fontsize=14)
+ title = "Similarity Matrix"
+ if rescale_with_baseline:
+ title += " (after Rescaling)"
+ plt.title(title, fontsize=14)
+
+ divider = make_axes_locatable(ax)
+ cax = divider.append_axes("right", size="2%", pad=0.2)
+ fig.colorbar(im, cax=cax)
+
+ # Rotate the tick labels and set their alignment.
+ plt.setp(ax.get_xticklabels(), rotation=45, ha="right", rotation_mode="anchor")
+
+ # Loop over data dimensions and create text annotations.
+ for i in range(len(h_tokens)):
+ for j in range(len(r_tokens)):
+ text = ax.text(
+ j,
+ i,
+ "{:.3f}".format(sim[i, j].item()),
+ ha="center",
+ va="center",
+ color="k" if sim[i, j].item() < 0.5 else "w",
+ )
+
+ fig.tight_layout()
+ if fname != "":
+ plt.savefig(fname, dpi=100)
+ print("Saved figure to file: ", fname)
+ plt.show()
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/scorer.py b/mitigating_bias/train/BERTScore/bert_score/bert_score/scorer.py
new file mode 100644
index 0000000..3bafb3e
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/scorer.py
@@ -0,0 +1,324 @@
+import os
+import sys
+import time
+import pathlib
+import torch
+import matplotlib.pyplot as plt
+from mpl_toolkits.axes_grid1 import make_axes_locatable
+import numpy as np
+import pandas as pd
+import warnings
+
+from collections import defaultdict
+from transformers import AutoTokenizer
+
+from .utils import (
+ get_model,
+ get_tokenizer,
+ get_idf_dict,
+ bert_cos_score_idf,
+ get_bert_embedding,
+ lang2model,
+ model2layers,
+ get_hash,
+ cache_scibert,
+ sent_encode,
+)
+
+
+class BERTScorer:
+ """
+ BERTScore Scorer Object.
+ """
+
+ def __init__(
+ self,
+ model_type=None,
+ num_layers=None,
+ batch_size=64,
+ nthreads=4,
+ all_layers=False,
+ idf=False,
+ idf_sents=None,
+ device=None,
+ lang=None,
+ rescale_with_baseline=False,
+ baseline_path=None,
+ use_fast_tokenizer=False
+ ):
+ """
+ Args:
+            - :param: `model_type` (str): contextual embedding model specification, defaulting to the suggested
+                      model for the target language; at least one of
+                      `model_type` or `lang` has to be specified
+            - :param: `num_layers` (int): the layer of representation to use,
+                      defaulting to the number of layers tuned on WMT16 correlation data
+            - :param: `idf` (bool): a boolean specifying whether to use idf weighting or not (this should be True even if `idf_sents` is given)
+ - :param: `idf_sents` (List of str): list of sentences used to compute the idf weights
+            - :param: `device` (str): the device on which the contextual embedding model will be allocated.
+                      If this argument is None, the model lives on cuda:0 when cuda is available.
+ - :param: `batch_size` (int): bert score processing batch size
+ - :param: `nthreads` (int): number of threads
+            - :param: `lang` (str): language of the sentences; at least one of
+                      `model_type` or `lang` has to be specified. `lang` needs to be
+                      specified when `rescale_with_baseline` is True.
+ - :param: `rescale_with_baseline` (bool): rescale bertscore with pre-computed baseline
+ - :param: `baseline_path` (str): customized baseline file
+ - :param: `use_fast_tokenizer` (bool): `use_fast` parameter passed to HF tokenizer
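+
+        Example (a minimal usage sketch; the sentences are illustrative, and the
+        model loaded here is reused by later `score` calls on the same object):
+
+            >>> scorer = BERTScorer(lang="en", rescale_with_baseline=True)
+            >>> P, R, F1 = scorer.score(["the cat sat on the mat"],
+            ...                         ["there is a cat on the mat"])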
+ """
+
+ assert lang is not None or model_type is not None, "Either lang or model_type should be specified"
+
+ if rescale_with_baseline:
+ assert lang is not None, "Need to specify Language when rescaling with baseline"
+
+ if device is None:
+ self.device = "cuda" if torch.cuda.is_available() else "cpu"
+ else:
+ self.device = device
+
+ self._lang = lang
+ self._rescale_with_baseline = rescale_with_baseline
+ self._idf = idf
+ self.batch_size = batch_size
+ self.nthreads = nthreads
+ self.all_layers = all_layers
+
+ if model_type is None:
+ lang = lang.lower()
+ self._model_type = lang2model[lang]
+ else:
+ self._model_type = model_type
+
+ if num_layers is None:
+ self._num_layers = model2layers[self.model_type]
+ else:
+ self._num_layers = num_layers
+
+ # Building model and tokenizer
+ self._use_fast_tokenizer = use_fast_tokenizer
+ self._tokenizer = get_tokenizer(self.model_type, self._use_fast_tokenizer)
+ self._model = get_model(self.model_type, self.num_layers, self.all_layers)
+ self._model.to(self.device)
+
+ self._idf_dict = None
+ if idf_sents is not None:
+ self.compute_idf(idf_sents)
+
+ self._baseline_vals = None
+ self.baseline_path = baseline_path
+ self.use_custom_baseline = self.baseline_path is not None
+ if self.baseline_path is None:
+ self.baseline_path = os.path.join(
+ os.path.dirname(__file__), f"rescale_baseline/{self.lang}/{self.model_type}.tsv"
+ )
+
+ @property
+ def lang(self):
+ return self._lang
+
+ @property
+ def idf(self):
+ return self._idf
+
+ @property
+ def model_type(self):
+ return self._model_type
+
+ @property
+ def num_layers(self):
+ return self._num_layers
+
+ @property
+ def rescale_with_baseline(self):
+ return self._rescale_with_baseline
+
+ @property
+ def baseline_vals(self):
+ if self._baseline_vals is None:
+ if os.path.isfile(self.baseline_path):
+ if not self.all_layers:
+ self._baseline_vals = torch.from_numpy(
+ pd.read_csv(self.baseline_path).iloc[self.num_layers].to_numpy()
+ )[1:].float()
+ else:
+ self._baseline_vals = (
+ torch.from_numpy(pd.read_csv(self.baseline_path).to_numpy())[:, 1:].unsqueeze(1).float()
+ )
+ else:
+ raise ValueError(f"Baseline not Found for {self.model_type} on {self.lang} at {self.baseline_path}")
+
+ return self._baseline_vals
+
+ @property
+ def use_fast_tokenizer(self):
+ return self._use_fast_tokenizer
+
+ @property
+ def hash(self):
+ return get_hash(
+ self.model_type, self.num_layers, self.idf, self.rescale_with_baseline, self.use_custom_baseline, self.use_fast_tokenizer
+ )
+
+ def compute_idf(self, sents):
+ """
+ Args:
+            - :param: `sents` (list of str): sentences used to compute the idf weights
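+
+        Example (a sketch; `refs` stands in for whatever corpus the idf weights
+        should be estimated from, typically the reference sentences):
+
+            >>> scorer.compute_idf(refs)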
+ """
+ if self._idf_dict is not None:
+ warnings.warn("Overwriting the previous importance weights.")
+
+ self._idf_dict = get_idf_dict(sents, self._tokenizer, nthreads=self.nthreads)
+
+ def score(self, cands, refs, verbose=False, batch_size=64, return_hash=False):
+ """
+ Args:
+ - :param: `cands` (list of str): candidate sentences
+ - :param: `refs` (list of str or list of list of str): reference sentences
+
+ Return:
+        - :param: `(P, R, F)`: each is of shape (N); N = number of input
+                  candidate-reference pairs. If returning the hash code, the
+                  output will be ((P, R, F), hashcode). If a candidate has
+                  multiple references, the returned score of this candidate is
+                  the *best* score among all references.
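+
+        Example (a minimal sketch of multi-reference scoring; each candidate is
+        paired with a list of references and the best score per candidate is kept):
+
+            >>> cands = ["the cat sat on the mat"]
+            >>> refs = [["there is a cat on the mat", "a cat lies on the mat"]]
+            >>> P, R, F1 = scorer.score(cands, refs)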
+ """
+
+ ref_group_boundaries = None
+ if not isinstance(refs[0], str):
+ ref_group_boundaries = []
+ ori_cands, ori_refs = cands, refs
+ cands, refs = [], []
+ count = 0
+ for cand, ref_group in zip(ori_cands, ori_refs):
+ cands += [cand] * len(ref_group)
+ refs += ref_group
+ ref_group_boundaries.append((count, count + len(ref_group)))
+ count += len(ref_group)
+
+ if verbose:
+ print("calculating scores...")
+ start = time.perf_counter()
+
+ if self.idf:
+ assert self._idf_dict, "IDF weights are not computed"
+ idf_dict = self._idf_dict
+ else:
+ idf_dict = defaultdict(lambda: 1.0)
+ idf_dict[self._tokenizer.sep_token_id] = 0
+ idf_dict[self._tokenizer.cls_token_id] = 0
+
+ all_preds = bert_cos_score_idf(
+ self._model,
+ refs,
+ cands,
+ self._tokenizer,
+ idf_dict,
+ verbose=verbose,
+ device=self.device,
+ batch_size=batch_size,
+ all_layers=self.all_layers,
+ ).cpu()
+
+ if ref_group_boundaries is not None:
+ max_preds = []
+ for start, end in ref_group_boundaries:
+ max_preds.append(all_preds[start:end].max(dim=0)[0])
+ all_preds = torch.stack(max_preds, dim=0)
+
+ if self.rescale_with_baseline:
+ all_preds = (all_preds - self.baseline_vals) / (1 - self.baseline_vals)
+
+ out = all_preds[..., 0], all_preds[..., 1], all_preds[..., 2] # P, R, F
+
+ if verbose:
+ time_diff = time.perf_counter() - start
+ print(f"done in {time_diff:.2f} seconds, {len(refs) / time_diff:.2f} sentences/sec")
+
+ if return_hash:
+ out = tuple([out, self.hash])
+
+ return out
+
+ def plot_example(self, candidate, reference, fname=""):
+ """
+ Args:
+ - :param: `candidate` (str): a candidate sentence
+ - :param: `reference` (str): a reference sentence
+ - :param: `fname` (str): path to save the output plot
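+
+        Example (a minimal sketch; the output file name is illustrative):
+
+            >>> scorer.plot_example("the cat sat on the mat",
+            ...                     "there is a cat on the mat",
+            ...                     fname="similarity.png")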
+ """
+
+ assert isinstance(candidate, str)
+ assert isinstance(reference, str)
+
+ idf_dict = defaultdict(lambda: 1.0)
+ idf_dict[self._tokenizer.sep_token_id] = 0
+ idf_dict[self._tokenizer.cls_token_id] = 0
+
+ hyp_embedding, masks, padded_idf = get_bert_embedding(
+ [candidate], self._model, self._tokenizer, idf_dict, device=self.device, all_layers=False,
+ )
+ ref_embedding, masks, padded_idf = get_bert_embedding(
+ [reference], self._model, self._tokenizer, idf_dict, device=self.device, all_layers=False,
+ )
+ ref_embedding.div_(torch.norm(ref_embedding, dim=-1).unsqueeze(-1))
+ hyp_embedding.div_(torch.norm(hyp_embedding, dim=-1).unsqueeze(-1))
+ sim = torch.bmm(hyp_embedding, ref_embedding.transpose(1, 2))
+ sim = sim.squeeze(0).cpu()
+
+ r_tokens = [self._tokenizer.decode([i]) for i in sent_encode(self._tokenizer, reference)][1:-1]
+ h_tokens = [self._tokenizer.decode([i]) for i in sent_encode(self._tokenizer, candidate)][1:-1]
+ sim = sim[1:-1, 1:-1]
+
+ if self.rescale_with_baseline:
+ sim = (sim - self.baseline_vals[2].item()) / (1 - self.baseline_vals[2].item())
+
+ fig, ax = plt.subplots(figsize=(len(r_tokens), len(h_tokens)))
+ im = ax.imshow(sim, cmap="Blues", vmin=0, vmax=1)
+
+ # We want to show all ticks...
+ ax.set_xticks(np.arange(len(r_tokens)))
+ ax.set_yticks(np.arange(len(h_tokens)))
+ # ... and label them with the respective list entries
+ ax.set_xticklabels(r_tokens, fontsize=10)
+ ax.set_yticklabels(h_tokens, fontsize=10)
+ ax.grid(False)
+ plt.xlabel("Reference (tokenized)", fontsize=14)
+ plt.ylabel("Candidate (tokenized)", fontsize=14)
+ title = "Similarity Matrix"
+ if self.rescale_with_baseline:
+ title += " (after Rescaling)"
+ plt.title(title, fontsize=14)
+
+ divider = make_axes_locatable(ax)
+ cax = divider.append_axes("right", size="2%", pad=0.2)
+ fig.colorbar(im, cax=cax)
+
+ # Rotate the tick labels and set their alignment.
+ plt.setp(ax.get_xticklabels(), rotation=45, ha="right", rotation_mode="anchor")
+
+ # Loop over data dimensions and create text annotations.
+ for i in range(len(h_tokens)):
+ for j in range(len(r_tokens)):
+ text = ax.text(
+ j,
+ i,
+ "{:.3f}".format(sim[i, j].item()),
+ ha="center",
+ va="center",
+ color="k" if sim[i, j].item() < 0.5 else "w",
+ )
+
+ fig.tight_layout()
+ if fname != "":
+ plt.savefig(fname, dpi=100)
+ print("Saved figure to file: ", fname)
+ plt.show()
+
+ def __repr__(self):
+ return f"{self.__class__.__name__}(hash={self.hash}, batch_size={self.batch_size}, nthreads={self.nthreads})"
+
+ def __str__(self):
+ return self.__repr__()
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score/utils.py b/mitigating_bias/train/BERTScore/bert_score/bert_score/utils.py
new file mode 100644
index 0000000..62e51a1
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score/utils.py
@@ -0,0 +1,631 @@
+import sys
+import os
+import torch
+from math import log
+from itertools import chain
+from collections import defaultdict, Counter
+from multiprocessing import Pool
+from functools import partial
+from tqdm.auto import tqdm
+from torch.nn.utils.rnn import pad_sequence
+from distutils.version import LooseVersion
+
+from transformers import BertConfig, XLNetConfig, XLMConfig, RobertaConfig
+from transformers import AutoModel, GPT2Tokenizer, AutoTokenizer
+
+from . import __version__
+from transformers import __version__ as trans_version
+
+__all__ = []
+
+SCIBERT_URL_DICT = {
+ "scibert-scivocab-uncased": "https://s3-us-west-2.amazonaws.com/ai2-s2-research/scibert/pytorch_models/scibert_scivocab_uncased.tar", # recommend by the SciBERT authors
+ "scibert-scivocab-cased": "https://s3-us-west-2.amazonaws.com/ai2-s2-research/scibert/pytorch_models/scibert_scivocab_cased.tar",
+ "scibert-basevocab-uncased": "https://s3-us-west-2.amazonaws.com/ai2-s2-research/scibert/pytorch_models/scibert_basevocab_uncased.tar",
+ "scibert-basevocab-cased": "https://s3-us-west-2.amazonaws.com/ai2-s2-research/scibert/pytorch_models/scibert_basevocab_cased.tar",
+}
+
+
+lang2model = defaultdict(lambda: "bert-base-multilingual-cased")
+lang2model.update(
+ {
+ "en": "roberta-large",
+ "zh": "bert-base-chinese",
+ "tr": "dbmdz/bert-base-turkish-cased",
+ "en-sci": "allenai/scibert_scivocab_uncased",
+ }
+)
+
+
+model2layers = {
+ "bert-base-uncased": 9, # 0.6925188074454226
+ "bert-large-uncased": 18, # 0.7210358126642836
+ "bert-base-cased-finetuned-mrpc": 9, # 0.6721947475618048
+ "bert-base-multilingual-cased": 9, # 0.6680687802637132
+ "bert-base-chinese": 8,
+ "roberta-base": 10, # 0.706288719158983
+ "roberta-large": 17, # 0.7385974720781534
+ "roberta-large-mnli": 19, # 0.7535618640417984
+ "roberta-base-openai-detector": 7, # 0.7048158349432633
+ "roberta-large-openai-detector": 15, # 0.7462770207355116
+ "xlnet-base-cased": 5, # 0.6630103662114238
+ "xlnet-large-cased": 7, # 0.6598800720297179
+ "xlm-mlm-en-2048": 6, # 0.651262570131464
+ "xlm-mlm-100-1280": 10, # 0.6475166424401905
+ # "scibert-scivocab-uncased": 8, # 0.6590354319927313
+ # "scibert-scivocab-cased": 9, # 0.6536375053937445
+ # "scibert-basevocab-uncased": 9, # 0.6748944832703548
+ # "scibert-basevocab-cased": 9, # 0.6524624150542374
+ 'allenai/scibert_scivocab_uncased': 8, # 0.6590354393124127
+ 'allenai/scibert_scivocab_cased': 9, # 0.6536374902465466
+ 'nfliu/scibert_basevocab_uncased': 9, # 0.6748945076082333
+ "distilroberta-base": 5, # 0.6797558139322964
+ "distilbert-base-uncased": 5, # 0.6756659152782033
+ "distilbert-base-uncased-distilled-squad": 4, # 0.6718318036382493
+ "distilbert-base-multilingual-cased": 5, # 0.6178131050889238
+ "albert-base-v1": 10, # 0.654237567249745
+ "albert-large-v1": 17, # 0.6755890754323239
+ "albert-xlarge-v1": 16, # 0.7031844211905911
+ "albert-xxlarge-v1": 8, # 0.7508642218461096
+ "albert-base-v2": 9, # 0.6682455591837927
+ "albert-large-v2": 14, # 0.7008537594374035
+ "albert-xlarge-v2": 13, # 0.7317228357869254
+ "albert-xxlarge-v2": 8, # 0.7505160257184014
+ "xlm-roberta-base": 9, # 0.6506799445871697
+ "xlm-roberta-large": 17, # 0.6941551437476826
+ "google/electra-small-generator": 9, # 0.6659421842117754
+ "google/electra-small-discriminator": 11, # 0.6534639151385759
+ "google/electra-base-generator": 10, # 0.6730033453857188
+ "google/electra-base-discriminator": 9, # 0.7032089590812965
+ "google/electra-large-generator": 18, # 0.6813370013104459
+ "google/electra-large-discriminator": 14, # 0.6896675824733477
+ "google/bert_uncased_L-2_H-128_A-2": 1, # 0.5887998733228855
+ "google/bert_uncased_L-2_H-256_A-4": 1, # 0.6114863547661203
+ "google/bert_uncased_L-2_H-512_A-8": 1, # 0.6177345529192847
+ "google/bert_uncased_L-2_H-768_A-12": 2, # 0.6191261237956839
+ "google/bert_uncased_L-4_H-128_A-2": 3, # 0.6076202863798991
+ "google/bert_uncased_L-4_H-256_A-4": 3, # 0.6205239036810148
+ "google/bert_uncased_L-4_H-512_A-8": 3, # 0.6375351621856903
+ "google/bert_uncased_L-4_H-768_A-12": 3, # 0.6561849979644787
+ "google/bert_uncased_L-6_H-128_A-2": 5, # 0.6200458425360283
+ "google/bert_uncased_L-6_H-256_A-4": 5, # 0.6277501629539081
+ "google/bert_uncased_L-6_H-512_A-8": 5, # 0.641952305130849
+ "google/bert_uncased_L-6_H-768_A-12": 5, # 0.6762186226247106
+ "google/bert_uncased_L-8_H-128_A-2": 7, # 0.6186876506711779
+ "google/bert_uncased_L-8_H-256_A-4": 7, # 0.6447993208267708
+ "google/bert_uncased_L-8_H-512_A-8": 6, # 0.6489729408169956
+ "google/bert_uncased_L-8_H-768_A-12": 7, # 0.6705203359541737
+ "google/bert_uncased_L-10_H-128_A-2": 8, # 0.6126762064125278
+ "google/bert_uncased_L-10_H-256_A-4": 8, # 0.6376350032576573
+ "google/bert_uncased_L-10_H-512_A-8": 9, # 0.6579006292799915
+ "google/bert_uncased_L-10_H-768_A-12": 8, # 0.6861146692220176
+ "google/bert_uncased_L-12_H-128_A-2": 10, # 0.6184105693383591
+ "google/bert_uncased_L-12_H-256_A-4": 11, # 0.6374004994430261
+ "google/bert_uncased_L-12_H-512_A-8": 10, # 0.65880012149526
+ "google/bert_uncased_L-12_H-768_A-12": 9, # 0.675911357700092
+ "amazon/bort": 0, # 0.41927911053036643
+ "facebook/bart-base": 6, # 0.7122259132414092
+ "facebook/bart-large": 10, # 0.7448671872459683
+ "facebook/bart-large-cnn": 10, # 0.7393148105835096
+ "facebook/bart-large-mnli": 11, # 0.7531665445691358
+ "facebook/bart-large-xsum": 9, # 0.7496408866539556
+ "t5-small": 6, # 0.6813843919496912
+ "t5-base": 11, # 0.7096044814981418
+ "t5-large": 23, # 0.7244153820191929
+ "vinai/bertweet-base": 9, # 0.6529471006118857
+ "microsoft/deberta-base": 9, # 0.7088459455930344
+ "microsoft/deberta-base-mnli": 9, # 0.7395257063907247
+ "microsoft/deberta-large": 16, # 0.7511806792052013
+ "microsoft/deberta-large-mnli": 18, # 0.7736263649679905
+ "microsoft/deberta-xlarge": 18, # 0.7568670944373346
+ "microsoft/deberta-xlarge-mnli": 40, # 0.7780600929333213
+ "YituTech/conv-bert-base": 10, # 0.7058253551080789
+ "YituTech/conv-bert-small": 10, # 0.6544473011107349
+ "YituTech/conv-bert-medium-small": 9, # 0.6590097075123257
+ "microsoft/mpnet-base": 8, # 0.724976539498804
+ "squeezebert/squeezebert-uncased": 9, # 0.6543868703018726
+ "squeezebert/squeezebert-mnli": 9, # 0.6654799051284791
+ "squeezebert/squeezebert-mnli-headless": 9, # 0.6654799051284791
+ "tuner007/pegasus_paraphrase": 15, # 0.7188349436772694
+ "google/pegasus-large": 8, # 0.63960462272448
+ "google/pegasus-xsum": 11, # 0.6836878575233349
+ "sshleifer/tiny-mbart": 2, # 0.028246072231946733
+ "facebook/mbart-large-cc25": 12, # 0.6582922975802958
+ "facebook/mbart-large-50": 12, # 0.6464972230103133
+ "facebook/mbart-large-en-ro": 12, # 0.6791285137459857
+ "facebook/mbart-large-50-many-to-many-mmt": 12, # 0.6904136529270892
+ "facebook/mbart-large-50-one-to-many-mmt": 12, # 0.6847906439540236
+ "allenai/led-base-16384": 6, # 0.7122259170564179
+ "facebook/blenderbot_small-90M": 7, # 0.6489176335400088
+ "facebook/blenderbot-400M-distill": 2, # 0.5874774070540008
+ "microsoft/prophetnet-large-uncased": 4, # 0.586496184234925
+ "microsoft/prophetnet-large-uncased-cnndm": 7, # 0.6478379437729287
+ "SpanBERT/spanbert-base-cased": 8, # 0.6824006863686848
+ "SpanBERT/spanbert-large-cased": 17, # 0.705352690855603
+ "microsoft/xprophetnet-large-wiki100-cased": 7, # 0.5852499775879524
+ "ProsusAI/finbert": 10, # 0.6923213940752796
+ "Vamsi/T5_Paraphrase_Paws": 12, # 0.6941611753807352
+ "ramsrigouthamg/t5_paraphraser": 11, # 0.7200917597031539
+ "microsoft/deberta-v2-xlarge": 10, # 0.7393675784473045
+ "microsoft/deberta-v2-xlarge-mnli": 17, # 0.7620620803716714
+ "microsoft/deberta-v2-xxlarge": 21, # 0.7520547670281869
+ "microsoft/deberta-v2-xxlarge-mnli": 22, # 0.7742603457742682
+ "allenai/longformer-base-4096": 7, # 0.7089559593129316
+ "allenai/longformer-large-4096": 14, # 0.732408493548181
+ "allenai/longformer-large-4096-finetuned-triviaqa": 14, # 0.7365882744744722
+ "zhiheng-huang/bert-base-uncased-embedding-relative-key": 4, # 0.5995636595368777
+ "zhiheng-huang/bert-base-uncased-embedding-relative-key-query": 7, # 0.6303599452145718
+ "zhiheng-huang/bert-large-uncased-whole-word-masking-embedding-relative-key-query": 19, # 0.6896878492850327
+ 'google/mt5-small': 8, # 0.6401166527273479
+ 'google/mt5-base': 11, # 0.5663956536597241
+ 'google/mt5-large': 19, # 0.6430931371732798
+ 'google/mt5-xl': 24, # 0.6707200963021145
+ 'google/bigbird-roberta-base': 10, # 0.6695606423502717
+ 'google/bigbird-roberta-large': 14, # 0.6755874042374509
+ 'google/bigbird-base-trivia-itc': 8, # 0.6930725491629892
+ 'princeton-nlp/unsup-simcse-bert-base-uncased': 10, # 0.6703066531921142
+ 'princeton-nlp/unsup-simcse-bert-large-uncased': 18, # 0.6958302800755326
+ 'princeton-nlp/unsup-simcse-roberta-base': 8, # 0.6436615893535319
+ 'princeton-nlp/unsup-simcse-roberta-large': 13, # 0.6812864385585965
+ 'princeton-nlp/sup-simcse-bert-base-uncased': 10, # 0.7068074935240984
+ 'princeton-nlp/sup-simcse-bert-large-uncased': 18, # 0.7111049471332378
+ 'princeton-nlp/sup-simcse-roberta-base': 10, # 0.7253123806661946
+ 'princeton-nlp/sup-simcse-roberta-large': 16, # 0.7497820277237173
+ 'dbmdz/bert-base-turkish-cased': 10, # WMT18 seg en-tr 0.5522827687776142
+ 'dbmdz/distilbert-base-turkish-cased': 4, # WMT18 seg en-tr 0.4742268041237113
+ 'google/byt5-small': 1, # 0.5100025975052146
+ 'google/byt5-base': 17, # 0.5810347173565313
+ 'google/byt5-large': 30, # 0.6151895697554877
+ 'microsoft/deberta-v3-xsmall': 10, # 0.6941803815412021
+ 'microsoft/deberta-v3-small': 4, # 0.6651551203179679
+ 'microsoft/deberta-v3-base': 9, # 0.7261586651018335
+ 'microsoft/mdeberta-v3-base': 10, # 0.6778713684091584
+ 'microsoft/deberta-v3-large': 12, # 0.6927693082293821
+ 'khalidalt/DeBERTa-v3-large-mnli': 18, # 0.7428756686018376
+}
+
+
+def sent_encode(tokenizer, sent):
+ "Encode a sentence with the given tokenizer, adding special tokens."
+ sent = sent.strip()
+ if sent == "":
+ return tokenizer.build_inputs_with_special_tokens([])
+ elif isinstance(tokenizer, GPT2Tokenizer):
+ # for RoBERTa and GPT-2
+ if LooseVersion(trans_version) >= LooseVersion("4.0.0"):
+ return tokenizer.encode(
+ sent,
+ add_special_tokens=True,
+ add_prefix_space=True,
+ max_length=tokenizer.model_max_length,
+ truncation=True,
+ )
+ elif LooseVersion(trans_version) >= LooseVersion("3.0.0"):
+ return tokenizer.encode(
+ sent, add_special_tokens=True, add_prefix_space=True, max_length=tokenizer.max_len, truncation=True,
+ )
+ elif LooseVersion(trans_version) >= LooseVersion("2.0.0"):
+ return tokenizer.encode(sent, add_special_tokens=True, add_prefix_space=True, max_length=tokenizer.max_len)
+ else:
+ raise NotImplementedError(f"transformers version {trans_version} is not supported")
+ else:
+ if LooseVersion(trans_version) >= LooseVersion("4.0.0"):
+ return tokenizer.encode(
+ sent, add_special_tokens=True, max_length=tokenizer.model_max_length, truncation=True,
+ )
+ elif LooseVersion(trans_version) >= LooseVersion("3.0.0"):
+ return tokenizer.encode(sent, add_special_tokens=True, max_length=tokenizer.max_len, truncation=True)
+ elif LooseVersion(trans_version) >= LooseVersion("2.0.0"):
+ return tokenizer.encode(sent, add_special_tokens=True, max_length=tokenizer.max_len)
+ else:
+ raise NotImplementedError(f"transformers version {trans_version} is not supported")
+
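+# Illustrative sketch (comments only, not part of the library): `sent_encode` returns the id
+# sequence wrapped in the tokenizer's special tokens, e.g. with a BERT tokenizer
+#
+#   tok = get_tokenizer("bert-base-uncased")
+#   ids = sent_encode(tok, "hello world")
+#   # roughly [101, 7592, 2088, 102], i.e. [CLS] hello world [SEP]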
+
+def get_model(model_type, num_layers, all_layers=None):
+ if model_type.startswith("scibert"):
+ model = AutoModel.from_pretrained(cache_scibert(model_type))
+ elif "t5" in model_type:
+ from transformers import T5EncoderModel
+
+ model = T5EncoderModel.from_pretrained(model_type)
+ else:
+ model = AutoModel.from_pretrained(model_type)
+ model.eval()
+
+ if hasattr(model, "decoder") and hasattr(model, "encoder"):
+ model = model.encoder
+
+ # drop unused layers
+ if not all_layers:
+ if hasattr(model, "n_layers"): # xlm
+ assert (
+ 0 <= num_layers <= model.n_layers
+ ), f"Invalid num_layers: num_layers should be between 0 and {model.n_layers} for {model_type}"
+ model.n_layers = num_layers
+ elif hasattr(model, "layer"): # xlnet
+ assert (
+ 0 <= num_layers <= len(model.layer)
+ ), f"Invalid num_layers: num_layers should be between 0 and {len(model.layer)} for {model_type}"
+ model.layer = torch.nn.ModuleList([layer for layer in model.layer[:num_layers]])
+ elif hasattr(model, "encoder"): # albert
+ if hasattr(model.encoder, "albert_layer_groups"):
+ assert (
+ 0 <= num_layers <= model.encoder.config.num_hidden_layers
+ ), f"Invalid num_layers: num_layers should be between 0 and {model.encoder.config.num_hidden_layers} for {model_type}"
+ model.encoder.config.num_hidden_layers = num_layers
+ elif hasattr(model.encoder, "block"): # t5
+ assert (
+ 0 <= num_layers <= len(model.encoder.block)
+ ), f"Invalid num_layers: num_layers should be between 0 and {len(model.encoder.block)} for {model_type}"
+ model.encoder.block = torch.nn.ModuleList([layer for layer in model.encoder.block[:num_layers]])
+ else: # bert, roberta
+ assert (
+ 0 <= num_layers <= len(model.encoder.layer)
+ ), f"Invalid num_layers: num_layers should be between 0 and {len(model.encoder.layer)} for {model_type}"
+ model.encoder.layer = torch.nn.ModuleList([layer for layer in model.encoder.layer[:num_layers]])
+ elif hasattr(model, "transformer"): # bert, roberta
+ assert (
+ 0 <= num_layers <= len(model.transformer.layer)
+ ), f"Invalid num_layers: num_layers should be between 0 and {len(model.transformer.layer)} for {model_type}"
+ model.transformer.layer = torch.nn.ModuleList([layer for layer in model.transformer.layer[:num_layers]])
+ elif hasattr(model, "layers"): # bart
+ assert (
+ 0 <= num_layers <= len(model.layers)
+ ), f"Invalid num_layers: num_layers should be between 0 and {len(model.layers)} for {model_type}"
+ model.layers = torch.nn.ModuleList([layer for layer in model.layers[:num_layers]])
+ else:
+ raise ValueError("Not supported")
+ else:
+ if hasattr(model, "output_hidden_states"):
+ model.output_hidden_states = True
+ elif hasattr(model, "encoder"):
+ model.encoder.output_hidden_states = True
+ elif hasattr(model, "transformer"):
+ model.transformer.output_hidden_states = True
+ # else:
+ # raise ValueError(f"Not supported model architecture: {model_type}")
+
+ return model
+
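+# Illustrative note (not part of the library): `get_model` truncates the encoder so that the
+# returned model's final hidden state comes from layer `num_layers`, e.g.
+#
+#   model = get_model("roberta-large", num_layers=17)
+#   # model.encoder.layer now holds only the first 17 transformer blocks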
+
+def get_tokenizer(model_type, use_fast=False):
+ if model_type.startswith("scibert"):
+ model_type = cache_scibert(model_type)
+
+ if LooseVersion(trans_version) >= LooseVersion("4.0.0"):
+ tokenizer = AutoTokenizer.from_pretrained(model_type, use_fast=use_fast)
+ else:
+ assert not use_fast, "Fast tokenizer is not available for version < 4.0.0"
+ tokenizer = AutoTokenizer.from_pretrained(model_type)
+
+ return tokenizer
+
+
+def padding(arr, pad_token, dtype=torch.long):
+ lens = torch.LongTensor([len(a) for a in arr])
+ max_len = lens.max().item()
+ padded = torch.ones(len(arr), max_len, dtype=dtype) * pad_token
+ mask = torch.zeros(len(arr), max_len, dtype=torch.long)
+ for i, a in enumerate(arr):
+ padded[i, : lens[i]] = torch.tensor(a, dtype=dtype)
+ mask[i, : lens[i]] = 1
+ return padded, lens, mask
+
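+# Shape sketch (illustrative): padding three id sequences of lengths 3, 1, 2 with pad id 0
+#
+#   padded, lens, mask = padding([[5, 6, 7], [8], [9, 10]], pad_token=0)
+#   # padded: [[5, 6, 7], [8, 0, 0], [9, 10, 0]]
+#   # lens:   [3, 1, 2]
+#   # mask:   [[1, 1, 1], [1, 0, 0], [1, 1, 0]]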
+
+def bert_encode(model, x, attention_mask, all_layers=False):
+ model.eval()
+ with torch.no_grad():
+ out = model(x, attention_mask=attention_mask, output_hidden_states=all_layers)
+ if all_layers:
+ emb = torch.stack(out[-1], dim=2)
+ else:
+ emb = out[0]
+ return emb
+
+
+def process(a, tokenizer=None):
+ if tokenizer is not None:
+ a = sent_encode(tokenizer, a)
+ return set(a)
+
+
+def get_idf_dict(arr, tokenizer, nthreads=4):
+ """
+ Returns mapping from word piece index to its inverse document frequency.
+
+ Args:
+ - :param: `arr` (list of str) : sentences to process.
+ - :param: `tokenizer` : a BERT tokenizer corresponding to `model`.
+ - :param: `nthreads` (int) : number of CPU threads to use.
+ """
+ idf_count = Counter()
+ num_docs = len(arr)
+
+ process_partial = partial(process, tokenizer=tokenizer)
+
+ with Pool(nthreads) as p:
+ idf_count.update(chain.from_iterable(p.map(process_partial, arr)))
+
+ idf_dict = defaultdict(lambda: log((num_docs + 1) / (1)))
+ idf_dict.update({idx: log((num_docs + 1) / (c + 1)) for (idx, c) in idf_count.items()})
+ return idf_dict
+
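+# Worked example (illustrative): with `num_docs` sentences, a word piece that appears in c of
+# them gets weight log((num_docs + 1) / (c + 1)); unseen pieces default to log(num_docs + 1).
+# For instance, with ["the cat", "the dog"], the piece "the" occurs in both documents, so its
+# weight is log(3 / 3) = 0, i.e. pieces shared by every sentence are effectively ignored.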
+
+def collate_idf(arr, tokenizer, idf_dict, device="cuda:0"):
+ """
+ Helper function that pads a list of sentences to have the same length and
+ loads the idf score for each word piece in the sentences.
+
+ Args:
+ - :param: `arr` (list of str): sentences to process.
+ - :param: `tokenizer` : a BERT tokenizer corresponding to the model in use.
+ - :param: `idf_dict` (dict): mapping a word piece index to its
+ inverse document frequency.
+ - :param: `device` (str): device to use, e.g. 'cpu' or 'cuda'.
+ """
+ arr = [sent_encode(tokenizer, a) for a in arr]
+
+ idf_weights = [[idf_dict[i] for i in a] for a in arr]
+
+ pad_token = tokenizer.pad_token_id
+
+ padded, lens, mask = padding(arr, pad_token, dtype=torch.long)
+ padded_idf, _, _ = padding(idf_weights, 0, dtype=torch.float)
+
+ padded = padded.to(device=device)
+ mask = mask.to(device=device)
+ lens = lens.to(device=device)
+ return padded, padded_idf, lens, mask
+
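+# Shape sketch (illustrative): for B sentences whose longest encoding has K word pieces,
+# `collate_idf` returns ids of shape (B, K), idf weights (B, K), lengths (B,), and a 0/1 mask (B, K).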
+
+def get_bert_embedding(all_sens, model, tokenizer, idf_dict, batch_size=-1, device="cuda:0", all_layers=False):
+ """
+ Compute BERT embedding in batches.
+
+ Args:
+ - :param: `all_sens` (list of str) : sentences to encode.
+ - :param: `model` : a BERT model from `transformers`.
+ - :param: `tokenizer` : a BERT tokenizer corresponding to `model`.
+ - :param: `idf_dict` (dict) : mapping a word piece index to its
+ inverse document frequency.
+ - :param: `batch_size` (int) : number of sentences to encode at once; -1 encodes everything in one batch.
+ - :param: `device` (str): device to use, e.g. 'cpu' or 'cuda'.
+ - :param: `all_layers` (bool): return the hidden states of every layer instead of only the last one.
+ """
+
+ padded_sens, padded_idf, lens, mask = collate_idf(all_sens, tokenizer, idf_dict, device=device)
+
+ if batch_size == -1:
+ batch_size = len(all_sens)
+
+ embeddings = []
+ with torch.no_grad():
+ for i in range(0, len(all_sens), batch_size):
+ batch_embedding = bert_encode(
+ model, padded_sens[i : i + batch_size], attention_mask=mask[i : i + batch_size], all_layers=all_layers,
+ )
+ embeddings.append(batch_embedding)
+ del batch_embedding
+
+ total_embedding = torch.cat(embeddings, dim=0)
+
+ return total_embedding, mask, padded_idf
+
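+# Shape sketch (illustrative): for B sentences whose longest encoding has K word pieces, the
+# returned embedding is (B, K, hidden_size), or (B, K, num_layers + 1, hidden_size) when
+# `all_layers=True` (hidden states of every layer, including the embedding layer).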
+
+def greedy_cos_idf(ref_embedding, ref_masks, ref_idf, hyp_embedding, hyp_masks, hyp_idf, all_layers=False):
+ """
+ Compute greedy matching based on cosine similarity.
+
+ Args:
+ - :param: `ref_embedding` (torch.Tensor):
+ embeddings of reference sentences, BxKxd,
+ B: batch size, K: longest length, d: BERT dimension
+ - :param: `ref_masks` (torch.Tensor): BxK, padding mask for
+ reference sentences.
+ - :param: `ref_idf` (torch.Tensor): BxK, idf score of each word
+ piece in the reference sentence
+ - :param: `hyp_embedding` (torch.Tensor):
+ embeddings of candidate sentences, BxKxd,
+ B: batch size, K: longest length, d: BERT dimension
+ - :param: `hyp_masks` (torch.Tensor): BxK, padding mask for
+ candidate sentences.
+ - :param: `hyp_idf` (torch.Tensor): BxK, idf score of each word
+ piece in the candidate sentence
+ - :param: `all_layers` (bool): compute scores for every layer when True.
+ """
+ ref_embedding.div_(torch.norm(ref_embedding, dim=-1).unsqueeze(-1))
+ hyp_embedding.div_(torch.norm(hyp_embedding, dim=-1).unsqueeze(-1))
+
+ if all_layers:
+ B, _, L, D = hyp_embedding.size()
+ hyp_embedding = hyp_embedding.transpose(1, 2).transpose(0, 1).contiguous().view(L * B, hyp_embedding.size(1), D)
+ ref_embedding = ref_embedding.transpose(1, 2).transpose(0, 1).contiguous().view(L * B, ref_embedding.size(1), D)
+ batch_size = ref_embedding.size(0)
+ sim = torch.bmm(hyp_embedding, ref_embedding.transpose(1, 2))
+ masks = torch.bmm(hyp_masks.unsqueeze(2).float(), ref_masks.unsqueeze(1).float())
+ if all_layers:
+ masks = masks.unsqueeze(0).expand(L, -1, -1, -1).contiguous().view_as(sim)
+ else:
+ masks = masks.expand(batch_size, -1, -1).contiguous().view_as(sim)
+
+ masks = masks.float().to(sim.device)
+ sim = sim * masks
+
+ word_precision = sim.max(dim=2)[0]
+ word_recall = sim.max(dim=1)[0]
+
+ hyp_idf.div_(hyp_idf.sum(dim=1, keepdim=True))
+ ref_idf.div_(ref_idf.sum(dim=1, keepdim=True))
+ precision_scale = hyp_idf.to(word_precision.device)
+ recall_scale = ref_idf.to(word_recall.device)
+ if all_layers:
+ precision_scale = precision_scale.unsqueeze(0).expand(L, B, -1).contiguous().view_as(word_precision)
+ recall_scale = recall_scale.unsqueeze(0).expand(L, B, -1).contiguous().view_as(word_recall)
+ P = (word_precision * precision_scale).sum(dim=1)
+ R = (word_recall * recall_scale).sum(dim=1)
+ F = 2 * P * R / (P + R)
+
+ hyp_zero_mask = hyp_masks.sum(dim=1).eq(2)
+ ref_zero_mask = ref_masks.sum(dim=1).eq(2)
+
+ if all_layers:
+ P = P.view(L, B)
+ R = R.view(L, B)
+ F = F.view(L, B)
+
+ if torch.any(hyp_zero_mask):
+ print(
+ "Warning: Empty candidate sentence detected; setting raw BERTScores to 0.", file=sys.stderr,
+ )
+ P = P.masked_fill(hyp_zero_mask, 0.0)
+ R = R.masked_fill(hyp_zero_mask, 0.0)
+
+ if torch.any(ref_zero_mask):
+ print("Warning: Empty reference sentence detected; setting raw BERTScores to 0.", file=sys.stderr)
+ P = P.masked_fill(ref_zero_mask, 0.0)
+ R = R.masked_fill(ref_zero_mask, 0.0)
+
+ F = F.masked_fill(torch.isnan(F), 0.0)
+
+ return P, R, F
+
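+# Formula sketch (illustrative): with idf weights w normalized per sentence,
+#   P = sum_j w_j * max_i cos(hyp_j, ref_i)   (each candidate piece matched to its closest reference piece)
+#   R = sum_i w_i * max_j cos(hyp_j, ref_i)   (each reference piece matched to its closest candidate piece)
+#   F = 2 * P * R / (P + R)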
+
+def bert_cos_score_idf(
+ model, refs, hyps, tokenizer, idf_dict, verbose=False, batch_size=64, device="cuda:0", all_layers=False,
+):
+ """
+ Compute BERTScore.
+
+ Args:
+ - :param: `model` : a BERT model from `transformers`
+ - :param: `refs` (list of str): reference sentences
+ - :param: `hyps` (list of str): candidate sentences
+ - :param: `tokenizer` : a BERT tokenizer corresponding to `model`
+ - :param: `idf_dict` : a dictionary mapping a word piece index to its
+ inverse document frequency
+ - :param: `verbose` (bool): turn on intermediate status updates
+ - :param: `batch_size` (int): bert score processing batch size
+ - :param: `device` (str): device to use, e.g. 'cpu' or 'cuda'
+ - :param: `all_layers` (bool): compute scores for every layer when True
+ """
+ preds = []
+
+ def dedup_and_sort(l):
+ return sorted(list(set(l)), key=lambda x: len(x.split(" ")), reverse=True)
+
+ sentences = dedup_and_sort(refs + hyps)
+ embs = []
+ iter_range = range(0, len(sentences), batch_size)
+ if verbose:
+ print("computing bert embedding.")
+ iter_range = tqdm(iter_range)
+ stats_dict = dict()
+ for batch_start in iter_range:
+ sen_batch = sentences[batch_start : batch_start + batch_size]
+ embs, masks, padded_idf = get_bert_embedding(
+ sen_batch, model, tokenizer, idf_dict, device=device, all_layers=all_layers
+ )
+ embs = embs.cpu()
+ masks = masks.cpu()
+ padded_idf = padded_idf.cpu()
+ for i, sen in enumerate(sen_batch):
+ sequence_len = masks[i].sum().item()
+ emb = embs[i, :sequence_len]
+ idf = padded_idf[i, :sequence_len]
+ stats_dict[sen] = (emb, idf)
+
+ def pad_batch_stats(sen_batch, stats_dict, device):
+ stats = [stats_dict[s] for s in sen_batch]
+ emb, idf = zip(*stats)
+ emb = [e.to(device) for e in emb]
+ idf = [i.to(device) for i in idf]
+ lens = [e.size(0) for e in emb]
+ emb_pad = pad_sequence(emb, batch_first=True, padding_value=2.0)
+ idf_pad = pad_sequence(idf, batch_first=True)
+
+ def length_to_mask(lens):
+ lens = torch.tensor(lens, dtype=torch.long)
+ max_len = max(lens)
+ base = torch.arange(max_len, dtype=torch.long).expand(len(lens), max_len)
+ return base < lens.unsqueeze(1)
+
+ pad_mask = length_to_mask(lens).to(device)
+ return emb_pad, pad_mask, idf_pad
+
+ device = next(model.parameters()).device
+ iter_range = range(0, len(refs), batch_size)
+ if verbose:
+ print("computing greedy matching.")
+ iter_range = tqdm(iter_range)
+
+ with torch.no_grad():
+ for batch_start in iter_range:
+ batch_refs = refs[batch_start : batch_start + batch_size]
+ batch_hyps = hyps[batch_start : batch_start + batch_size]
+ ref_stats = pad_batch_stats(batch_refs, stats_dict, device)
+ hyp_stats = pad_batch_stats(batch_hyps, stats_dict, device)
+
+ P, R, F1 = greedy_cos_idf(*ref_stats, *hyp_stats, all_layers)
+ preds.append(torch.stack((P, R, F1), dim=-1).cpu())
+ preds = torch.cat(preds, dim=1 if all_layers else 0)
+ return preds
+
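+# End-to-end sketch (illustrative; roughly what the higher-level `bert_score.score` wrapper does):
+#
+#   # refs, hyps: equally long lists of reference / candidate strings
+#   model = get_model("roberta-large", num_layers=17)
+#   tok = get_tokenizer("roberta-large")
+#   idf_dict = defaultdict(lambda: 1.0)   # uniform weights when idf scaling is disabled
+#   idf_dict[tok.sep_token_id] = 0
+#   idf_dict[tok.cls_token_id] = 0
+#   preds = bert_cos_score_idf(model, refs, hyps, tok, idf_dict, device="cpu")
+#   P, R, F1 = preds[..., 0], preds[..., 1], preds[..., 2]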
+
+def get_hash(model, num_layers, idf, rescale_with_baseline, use_custom_baseline, use_fast_tokenizer):
+ msg = "{}_L{}{}_version={}(hug_trans={})".format(
+ model, num_layers, "_idf" if idf else "_no-idf", __version__, trans_version
+ )
+ if rescale_with_baseline:
+ if use_custom_baseline:
+ msg += "-custom-rescaled"
+ else:
+ msg += "-rescaled"
+ if use_fast_tokenizer:
+ msg += "_fast-tokenizer"
+ return msg
+
+
+def cache_scibert(model_type, cache_folder="~/.cache/torch/transformers"):
+ if not model_type.startswith("scibert"):
+ return model_type
+
+ underscore_model_type = model_type.replace("-", "_")
+ cache_folder = os.path.abspath(os.path.expanduser(cache_folder))
+ filename = os.path.join(cache_folder, underscore_model_type)
+
+ # download SciBERT models
+ if not os.path.exists(filename):
+ cmd = f"mkdir -p {cache_folder}; cd {cache_folder};"
+ cmd += f"wget {SCIBERT_URL_DICT[model_type]}; tar -xvf {underscore_model_type}.tar;"
+ cmd += (
+ f"rm -f {underscore_model_type}.tar ; cd {underscore_model_type}; tar -zxvf weights.tar.gz; mv weights/* .;"
+ )
+ cmd += f"rm -f weights.tar.gz; rmdir weights; mv bert_config.json config.json;"
+ print(cmd)
+ print(f"downloading {model_type} model")
+ os.system(cmd)
+
+ # fix the missing files in scibert
+ json_file = os.path.join(filename, "special_tokens_map.json")
+ if not os.path.exists(json_file):
+ with open(json_file, "w") as f:
+ print(
+ '{"unk_token": "[UNK]", "sep_token": "[SEP]", "pad_token": "[PAD]", "cls_token": "[CLS]", "mask_token": "[MASK]"}',
+ file=f,
+ )
+
+ json_file = os.path.join(filename, "added_tokens.json")
+ if not os.path.exists(json_file):
+ with open(json_file, "w") as f:
+ print("{}", file=f)
+
+ if "uncased" in model_type:
+ json_file = os.path.join(filename, "tokenizer_config.json")
+ if not os.path.exists(json_file):
+ with open(json_file, "w") as f:
+ print('{"do_lower_case": true, "max_len": 512, "init_inputs": []}', file=f)
+
+ return filename
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score_cli/__init__.py b/mitigating_bias/train/BERTScore/bert_score/bert_score_cli/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score_cli/score.py b/mitigating_bias/train/BERTScore/bert_score/bert_score_cli/score.py
new file mode 100644
index 0000000..10e4abe
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score_cli/score.py
@@ -0,0 +1,86 @@
+#!/usr/bin/env python
+import os
+import argparse
+import torch
+
+import bert_score
+
+
+def main():
+ torch.multiprocessing.set_sharing_strategy("file_system")
+
+ parser = argparse.ArgumentParser("Calculate BERTScore")
+ parser.add_argument(
+ "--lang",
+ type=str,
+ default=None,
+ help='two-letter abbreviation of the language (e.g., en) or "en-sci" for scientific text',
+ )
+ parser.add_argument(
+ "-m", "--model", default=None, help="BERT model name (default: bert-base-uncased) or path to a pretrained model",
+ )
+ parser.add_argument("-l", "--num_layers", type=int, default=None, help="use the first N layers of BERT (default: 8)")
+ parser.add_argument("-b", "--batch_size", type=int, default=64, help="batch size (default: 64)")
+ parser.add_argument("--nthreads", type=int, default=4, help="number of cpu workers (default: 4)")
+ parser.add_argument("--idf", action="store_true", help="BERT Score with IDF scaling")
+ parser.add_argument(
+ "--rescale_with_baseline", action="store_true", help="Rescaling the numerical score with precomputed baselines",
+ )
+ parser.add_argument("--baseline_path", default=None, type=str, help="path of custom baseline csv file")
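+ # Note: with action="store_false" this option defaults to True; passing --use_fast_tokenizer turns the fast tokenizer off.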
+ parser.add_argument("--use_fast_tokenizer", action="store_false", help="whether to use HF fast tokenizer")
+ parser.add_argument("-s", "--seg_level", action="store_true", help="show individual score of each pair")
+ parser.add_argument("-v", "--verbose", action="store_true", help="increase output verbosity")
+ parser.add_argument("-r", "--ref", type=str, nargs="+", required=True, help="reference file path(s) or a string")
+ parser.add_argument(
+ "-c", "--cand", type=str, required=True, help="candidate (system outputs) file path or a string",
+ )
+
+ args = parser.parse_args()
+
+ if os.path.isfile(args.cand):
+ with open(args.cand) as f:
+ cands = [line.strip() for line in f]
+
+ refs = []
+ for ref_file in args.ref:
+ assert os.path.exists(ref_file), f"reference file {ref_file} doesn't exist"
+ with open(ref_file) as f:
+ curr_ref = [line.strip() for line in f]
+ assert len(curr_ref) == len(cands), f"# of sentences in {ref_file} doesn't match the # of candidates"
+ refs.append(curr_ref)
+ refs = list(zip(*refs))
+ elif os.path.isfile(args.ref[0]):
+ assert os.path.exists(args.cand), f"candidate file {args.cand} doesn't exist"
+ else:
+ cands = [args.cand]
+ refs = [args.ref]
+ assert not args.idf, "do not support idf mode for a single pair of sentences"
+
+ all_preds, hash_code = bert_score.score(
+ cands,
+ refs,
+ model_type=args.model,
+ num_layers=args.num_layers,
+ verbose=args.verbose,
+ idf=args.idf,
+ batch_size=args.batch_size,
+ lang=args.lang,
+ return_hash=True,
+ rescale_with_baseline=args.rescale_with_baseline,
+ baseline_path=args.baseline_path,
+ use_fast_tokenizer=args.use_fast_tokenizer,
+ )
+ avg_scores = [s.mean(dim=0) for s in all_preds]
+ P = avg_scores[0].cpu().item()
+ R = avg_scores[1].cpu().item()
+ F1 = avg_scores[2].cpu().item()
+ msg = hash_code + f" P: {P:.6f} R: {R:.6f} F1: {F1:.6f}"
+ print(msg)
+ if args.seg_level:
+ ps, rs, fs = all_preds
+ for p, r, f in zip(ps, rs, fs):
+ print("{:.6f}\t{:.6f}\t{:.6f}".format(p, r, f))
+
+
+if __name__ == "__main__":
+ main()
diff --git a/mitigating_bias/train/BERTScore/bert_score/bert_score_cli/visualize.py b/mitigating_bias/train/BERTScore/bert_score/bert_score_cli/visualize.py
new file mode 100644
index 0000000..dffa3c0
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/bert_score_cli/visualize.py
@@ -0,0 +1,44 @@
+#!/usr/bin/env python
+import os
+import time
+import argparse
+import torch
+from collections import defaultdict
+
+import bert_score
+
+
+def main():
+ torch.multiprocessing.set_sharing_strategy("file_system")
+
+ parser = argparse.ArgumentParser("Visualize BERTScore")
+ parser.add_argument("--lang", type=str, default="en", help="two-letter abbreviation of the language (e.g., en)")
+ parser.add_argument("-m", "--model", default=None, help="BERT model name (default: bert-base-uncased)")
+ parser.add_argument("-l", "--num_layers", type=int, default=None, help="use the first N layers of BERT (default: 8)")
+ parser.add_argument("-v", "--verbose", action="store_true", help="increase output verbosity")
+ parser.add_argument("-r", "--ref", type=str, required=True, help="reference sentence")
+ parser.add_argument("-c", "--cand", type=str, required=True, help="candidate sentence")
+ parser.add_argument(
+ "-f", "--file", type=str, default="visualize.png", help="name of file to save output matrix in",
+ )
+ parser.add_argument(
+ "--rescale_with_baseline", action="store_true", help="Rescaling the numerical score with precomputed baselines",
+ )
+ parser.add_argument("--baseline_path", default=None, type=str, help="path of custom baseline csv file")
+
+ args = parser.parse_args()
+
+ bert_score.plot_example(
+ args.cand,
+ args.ref,
+ model_type=args.model,
+ lang=args.lang,
+ num_layers=args.num_layers,
+ fname=args.file,
+ rescale_with_baseline=args.rescale_with_baseline,
+ baseline_path=args.baseline_path,
+ )
+
+
+if __name__ == "__main__":
+ main()
diff --git a/mitigating_bias/train/BERTScore/bert_score/example/Demo.ipynb b/mitigating_bias/train/BERTScore/bert_score/example/Demo.ipynb
new file mode 100644
index 0000000..9f7b652
--- /dev/null
+++ b/mitigating_bias/train/BERTScore/bert_score/example/Demo.ipynb
@@ -0,0 +1,612 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## BERTScore Tutorial"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Installation\n",
+ "If you have not installed `bert_score` yet, installation is easy:\n",
+ "simply uncomment the line below to install it through pip."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#!pip install bert_score"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "'0.3.1'"
+ ]
+ },
+ "execution_count": 2,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# check your installation\n",
+ "import bert_score\n",
+ "bert_score.__version__"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Preparation"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# hide the loading messages\n",
+ "import logging\n",
+ "import transformers\n",
+ "transformers.tokenization_utils.logger.setLevel(logging.ERROR)\n",
+ "transformers.configuration_utils.logger.setLevel(logging.ERROR)\n",
+ "transformers.modeling_utils.logger.setLevel(logging.ERROR)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%matplotlib inline\n",
+ "import matplotlib.pyplot as plt\n",
+ "from matplotlib import rcParams\n",
+ "\n",
+ "rcParams[\"xtick.major.size\"] = 0\n",
+ "rcParams[\"xtick.minor.size\"] = 0\n",
+ "rcParams[\"ytick.major.size\"] = 0\n",
+ "rcParams[\"ytick.minor.size\"] = 0\n",
+ "\n",
+ "rcParams[\"axes.labelsize\"] = \"large\"\n",
+ "rcParams[\"axes.axisbelow\"] = True\n",
+ "rcParams[\"axes.grid\"] = True"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Function API"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "We will first demonstrate how to use the `score` function in `bert_score`, which is what you need to evaluate a set of machine-generated outputs."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from bert_score import score"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Inputs to `score` are a list of candidate sentences and a list of reference sentences. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "with open(\"hyps.txt\") as f:\n",
+ " cands = [line.strip() for line in f]\n",
+ "\n",
+ "with open(\"refs.txt\") as f:\n",
+ " refs = [line.strip() for line in f]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Let's have a look."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "'28-year-old chef found dead in San Francisco mall'"
+ ]
+ },
+ "execution_count": 7,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "cands[0]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "We are now ready to call the score function. Besides candidates and references, we need to specify the BERT model we are using. Since we are dealing with English sentences, we will use the *bert-base-uncased* model."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "calculating scores...\n",
+ "computing bert embedding.\n"
+ ]
+ },
+ {
+ "data": {
+ "application/vnd.jupyter.widget-view+json": {
+ "model_id": "b4e553330320447684f2ad1c02a674dc",
+ "version_major": 2,
+ "version_minor": 0
+ },
+ "text/plain": [
+ "HBox(children=(FloatProgress(value=0.0, max=1.0), HTML(value='')))"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "\n",
+ "computing greedy matching.\n"
+ ]
+ },
+ {
+ "data": {
+ "application/vnd.jupyter.widget-view+json": {
+ "model_id": "ee7f80b6121b42baad9bee7e717ea3c4",
+ "version_major": 2,
+ "version_minor": 0
+ },
+ "text/plain": [
+ "HBox(children=(FloatProgress(value=0.0, max=1.0), HTML(value='')))"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "\n",
+ "done in 0.13 seconds, 74.39 sentences/sec\n"
+ ]
+ }
+ ],
+ "source": [
+ "P, R, F1 = score(cands, refs, lang='en', verbose=True)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The outputs of the `score` function are Tensors of precision, recall, and F1, respectively. Each tensor has the same number of items as the candidate and reference lists, and each item is a scalar representing the score for the corresponding candidate-reference pair."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "tensor([0.9834, 0.9782, 0.9162, 0.9589, 0.9675, 0.9680, 0.9602, 0.9663, 0.9438,\n",
+ " 0.9508])"
+ ]
+ },
+ "execution_count": 9,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "F1"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "We can take the average over all candidate-reference pairs as the system-level score."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "System level F1 score: 0.959\n"
+ ]
+ }
+ ],
+ "source": [
+ "print(f\"System level F1 score: {F1.mean():.3f}\")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "It might also be very interesting to see the distribution of BERTScore."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import matplotlib.pyplot as plt"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "<base64-encoded PNG output omitted (plot of the BERTScore distribution)>",
+ "text/plain": [
+ "