Task 2 : Types Evaluation #7

lara-hdr · 2016-03-08T01:43:10Z

For the second task, "the system answer will be counted correct as long as at least one of the possibility is returned".
I would like to know how the types will be evaluated if the system returns more than one type. Is there a penalty if one of the returned types does not match any of the gold standard types?

Thank you,
Lara

anuzzolese · 2016-03-08T11:37:45Z

An entity can have multiple types in the gold standard. A system should provide at least one correct type for an entity. However, all wrong types returned by a system for an entity will count as errors.

rtroncy · 2016-03-08T11:58:51Z

Let's take a concrete example: imagine that an entity has been annotated with three types in the gold standard. Will the scorer give a different result if system A provides just 1 correct type (and no incorrect type) versus system B providing the 3 correct types (+1 incorrect type)?

MichaelRoeder · 2016-03-08T19:44:20Z

Hi all,

Task 2 can be divided into 2 subtasks. The subtasks are evaluated independently, and the F1-score of the complete task is the average of the F1-scores of the two subtasks.

1. Find the string that describes the entity type
The example in the task description shows that it is not always possible to clearly judge whether a word (especially an adjective) is part of the entity type or not. Thus, there are different possibilities for which string could be marked. It is sufficient to mark one of these possibilities to get a true positive. However, no approach should mark more than one possible entity type per document.

2. Map the type that has been found to the subset of DOLCE+DnS Ultra Lite classes
For evaluating the entity typing subtask, the hierarchical F-measure is used. It is important that all types mentioned in the gold standard are present. Missing types as well as additional types are counted as errors.
Regarding the concrete example, it is not easy to say which result the two answers will have, because the hierarchical F-measure depends on the type hierarchy as well as on the positions of the individual types inside the hierarchy.
However, since in most cases the types will be leaf nodes of the hierarchy, we can assume this for this example. In this case the hierarchical F-measure will behave like the normal F-measure. Thus, the first annotator would have 1 true positive and 2 false negatives, while the second annotator would have 3 true positives and 1 false positive.
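
To make those counts concrete, here is a minimal sketch of the plain (non-hierarchical) F-measure for the two systems from the example. It is not the scorer's actual code: the function name `flat_f1` and the type labels `T1`..`T3` and `X` are placeholders invented for illustration, and it relies on the leaf-node assumption stated above.

```python
def flat_f1(gold, predicted):
    """Return (tp, fp, fn, precision, recall, f1) for two sets of type labels.

    Plain set-based F-measure; the hierarchy is ignored (leaf-node assumption).
    """
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)   # correct types returned
    fp = len(predicted - gold)   # additional (wrong) types returned
    fn = len(gold - predicted)   # gold types that are missing
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return tp, fp, fn, precision, recall, f1


# Hypothetical labels: three gold types, "X" stands for an incorrect type.
gold = {"T1", "T2", "T3"}
system_a = {"T1"}                    # 1 correct type, nothing else
system_b = {"T1", "T2", "T3", "X"}   # 3 correct types + 1 incorrect type

for name, predicted in (("A", system_a), ("B", system_b)):
    tp, fp, fn, p, r, f1 = flat_f1(gold, predicted)
    print(f"system {name}: tp={tp} fp={fp} fn={fn} P={p:.3f} R={r:.3f} F1={f1:.3f}")
```

Under these assumptions, system A ends up with precision 1.0 and recall 1/3 (F1 = 0.5), while system B has precision 0.75 and recall 1.0 (F1 ≈ 0.857). With a real type hierarchy the hierarchical F-measure may give different numbers.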

rtroncy · 2016-03-08T20:34:50Z

Thanks for the clarifications.

lara-hdr · 2016-03-08T21:15:20Z

Thanks!
