What Metric exactly is reported/computed for NER?

Hi,

I am reading through the code and implementation and was wondering what exactly R/P/F1 relate to.
In particular:
Are the metrics at the token level or the entity level? Are you accounting only for exact matches or do you give some credit for partial matches as well?
Indeed, the paper is not very clear about what exactly those metrics refer to. I'm not very familiar with tensorflow but it seems to me that the code is computing token level metrics?  Or is the text "detokenized" and grouped by entity before evaluation is run? Are predictions only run for the first token in a word like in the CONLL 2003 setup of BERT?

In particular, I noticed that your implementation is based on: https://github.com/kyzhouhzau/BERT-NER which mentions that the scores differ from normal evaluation (e.g: using conlleval.pl). Is this also the case for the numbers reported in the paper?

Thanks in advance for your answer!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

What Metric exactly is reported/computed for NER? #3

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

What Metric exactly is reported/computed for NER? #3

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions