Task 2 : Types Evaluation #7

Open
lara-hdr opened this issue Mar 8, 2016 · 5 comments

@lara-hdr

lara-hdr commented Mar 8, 2016

For the second task, "the system answer will be counted correct as long as at least one of the possibility is returned".
I would like to know how the types will be evaluated if the system returns more than one type. Is there a penalty if one of the returned types does not match any of the gold standard types?

Thank you,
Lara

@anuzzolese
Owner

An entity can have multiple types in the gold standard. A system should provide at least one correct type for an entity. However, all wrong types returned by a system for an entity will count as errors.

@rtroncy

rtroncy commented Mar 8, 2016

Let's take a concrete example: imagine that an entity has been annotated with three types in the gold standard. Will the scorer give a different result if system A provides just 1 correct type (and no incorrect type) versus system B providing the 3 correct types (+1 incorrect type)?

@MichaelRoeder
Contributor

Hi all,

Task 2 can be divided into 2 subtasks. The subtasks are evaluated independently, and the F1-score of the complete task is the average of the F1-scores of the two subtasks.

1. Find the string that describes the entity type
The example in the task description shows that it is not always possible to judge clearly whether a word (especially an adjective) is part of the entity type or not. Thus, there are several possibilities for which string could be marked. It is sufficient to mark one of these possibilities to get a true positive. However, no approach should mark more than one possible entity type per document.

2. Map the type that has been found to the subset of DOLCE+DnS Ultra Lite classes
For evaluating the entity typing subtask, the hierarchical F-measure is used. It is important that all types mentioned in the gold standard are present. Missing types as well as additional types are counted as errors.
Regarding the concrete example, it is not easy to say which scores the two answers would get, because the hierarchical F-measure depends on the type hierarchy as well as on the positions of the individual types inside the hierarchy.
However, since in most cases the types will be leaf nodes of the hierarchy, we can assume this for the example. In this case, the hierarchical F-measure behaves like the normal F-measure. Thus, the first annotator would have 1 true positive and 2 false negatives, while the second annotator would have 3 true positives and 1 false positive.
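
To make the counts above concrete, here is a minimal sketch of the plain (non-hierarchical) F-measure applied to the two hypothetical systems. It is an illustration under the leaf-node assumption stated above, not the scorer's actual code; the type names `T1` to `T4` and the helper `precision_recall_f1` are placeholders invented for this example.

```python
def precision_recall_f1(gold, predicted):
    """Flat precision, recall and F1 over two sets of type labels."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)   # types returned and present in the gold standard
    fp = len(predicted - gold)   # additional (wrong) types
    fn = len(gold - predicted)   # gold standard types that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Placeholder labels: three gold types, all assumed to be leaf nodes.
gold = {"T1", "T2", "T3"}
system_a = {"T1"}                      # 1 correct type, nothing else
system_b = {"T1", "T2", "T3", "T4"}    # 3 correct types + 1 incorrect type

result_a = precision_recall_f1(gold, system_a)
result_b = precision_recall_f1(gold, system_b)
print("A: P=%.3f R=%.3f F1=%.3f" % result_a)
print("B: P=%.3f R=%.3f F1=%.3f" % result_b)
```

Under these assumptions, system A reaches an F1 of 0.5 (perfect precision, recall of 1/3), while system B reaches roughly 0.857 (precision of 0.75, perfect recall), so the two answers are scored differently. With a real hierarchy, where types are not all leaf nodes, the hierarchical F-measure could give different numbers.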

@rtroncy

rtroncy commented Mar 8, 2016

Thanks for the clarifications.

@lara-hdr
Author

lara-hdr commented Mar 8, 2016

Thanks!
