
What metric exactly is reported/computed for NER? #3

Closed

Description

@Iwontbecreative

Hi,

I am reading through the code and the implementation and was wondering what exactly the reported P/R/F1 scores refer to.
In particular:
Are the metrics at the token level or the entity level? Do you count only exact matches, or do you give some credit for partial matches as well?
Indeed, the paper is not very clear about what exactly those metrics refer to. I'm not very familiar with TensorFlow, but it looks to me like the code computes token-level metrics. Or is the text "detokenized" and grouped by entity before evaluation is run? Are predictions only made for the first token of each word, as in BERT's CoNLL-2003 setup?
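
To make the distinction concrete, here is a minimal sketch of the two evaluation styles I have in mind (plain Python on toy BIO tags; the helper functions are mine, not from this repo):

```python
# Minimal sketch contrasting token-level vs. entity-level (exact-match)
# P/R/F1 on BIO tags. Toy example only -- not taken from this repo.

def spans(tags):
    """Extract (type, start, end) entity spans from a BIO tag sequence."""
    out, etype, start = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel flushes a trailing entity
        inside = tag.startswith("I-") and etype == tag[2:]
        if not inside:                       # current entity (if any) ends here
            if etype is not None:
                out.append((etype, start, i))
            etype, start = (tag[2:], i) if tag.startswith("B-") else (None, None)
    return set(out)

def prf(correct, n_pred, n_gold):
    p = correct / n_pred if n_pred else 0.0
    r = correct / n_gold if n_gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return round(p, 3), round(r, 3), round(f, 3)

gold = ["B-PER", "I-PER", "O", "B-LOC"]
pred = ["B-PER", "O",     "O", "B-LOC"]  # PER is only partially recovered

# Token-level: each non-O token is scored on its own, so the partial
# PER match still earns credit for its first token.
correct_tokens = sum(g == p and g != "O" for g, p in zip(gold, pred))
print("token-level :", prf(correct_tokens,
                           sum(t != "O" for t in pred),
                           sum(t != "O" for t in gold)))

# Entity-level exact match (what conlleval.pl computes): the truncated
# PER span counts as both a false positive and a false negative.
gs, ps = spans(gold), spans(pred)
print("entity-level:", prf(len(gs & ps), len(ps), len(gs)))
```

On this toy example the token-level F1 (0.8) comes out noticeably higher than the entity-level F1 (0.5), because the truncated PER entity still earns per-token credit. That is exactly the gap I am asking about.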

I also noticed that your implementation is based on https://github.com/kyzhouhzau/BERT-NER, which mentions that its scores differ from the standard evaluation (e.g., using conlleval.pl). Is this also the case for the numbers reported in the paper?
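
For what it's worth, one quick way to check would be to rerun the predictions through a conlleval-compatible scorer such as seqeval (assuming the predictions can be exported as BIO tag sequences per sentence):

```python
# Assumes `pip install seqeval`; seqeval's default mode is designed to
# reproduce conlleval-style entity-level P/R/F1 on BIO-tagged data.
from seqeval.metrics import classification_report, f1_score

# Toy gold/predicted tag sequences, one inner list per sentence.
y_true = [["B-PER", "I-PER", "O", "B-LOC"]]
y_pred = [["B-PER", "O", "O", "B-LOC"]]

print("entity-level F1:", f1_score(y_true, y_pred))
print(classification_report(y_true, y_pred))
```

If the numbers from a scorer like this diverge from the ones in the paper, that would suggest the paper's metrics are token-level.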

Thanks in advance for your answer!
