Poor performance and poor results

I'm trying to fine-tune BERT on the STS-B dataset.

I used the following [notebook](https://colab.research.google.com/drive/1162FvpuCpmkudylOC3m8Llc2CGdjL8Rl) to fine-tune it with BERT-keras.
(As described in the paper, I just added a classification layer on top of the CLS token of BERT's output.)
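For readers who want to see what that head amounts to, here is a minimal NumPy sketch. It is not the notebook's code and does not use the BERT-keras API: the function name `cls_regression_head`, the random stand-in for BERT's output, and the hidden size of 768 are all assumptions for illustration. Since STS-B labels are similarity scores from 0 to 5, the sketch uses a single linear output.

```python
import numpy as np

def cls_regression_head(hidden_states, weights, bias):
    """Apply a linear layer to the first ([CLS]) token of each sequence.

    hidden_states: (batch, seq_len, hidden) encoder output
    weights:       (hidden, 1) head weights
    bias:          (1,) head bias
    """
    cls_vectors = hidden_states[:, 0, :]   # keep only position 0, the [CLS] token
    return cls_vectors @ weights + bias    # (batch, 1) predicted scores

# Random stand-in for BERT output: 4 sequences, 16 tokens, hidden size 768 (assumed).
rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(4, 16, 768))
weights = rng.normal(scale=0.02, size=(768, 1))
bias = np.zeros(1)

scores = cls_regression_head(hidden_states, weights, bias)
print(scores.shape)
```

With a head like this, the loss would typically be MSE against the gold scores rather than a cross-entropy loss over classes.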

However, there are large differences in both results and training time between this notebook and the script used in the official version for fine-tuning:

|  | BERT-keras | Official BERT |
| --- | --- | --- |
| Pearson | 0.0254 | 0.8956 |
| Spearman | 0.0289 | 0.7942 |
| MSE | 2.2691 | 0.5456 |
| Training time | 9h | 10min |

_Note: Pearson and Spearman are the correlation metrics used to evaluate accuracy on the STS-B dataset._
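To make the three metrics in the table concrete, here is a small pure-Python sketch of how they can be computed from gold and predicted scores. It is not the evaluation code from either implementation: the helper names and the toy `gold` / `pred` values are made up for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation; returns NaN if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if std_x == 0 or std_y == 0:
        return float("nan")
    return cov / (std_x * std_y)

def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def mse(x, y):
    """Mean squared error between two equally long sequences."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

# Toy values, not taken from STS-B.
gold = [0.0, 1.0, 2.5, 4.0, 5.0]
pred = [0.2, 1.1, 2.0, 4.5, 4.8]
print(pearson(gold, pred), spearman(gold, pred), mse(gold, pred))
```

Correlations near zero, as in the BERT-keras column, are what these functions return when the predictions barely vary with the gold scores, so checking the spread of the predicted values may be a useful first diagnostic.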

---

**Why is there such a difference between the two approaches?**
