
GH-2717: Add option to ignore labels in dataset #2718


Merged (2 commits) on Apr 10, 2022


alanakbik (Collaborator)

This PR adds the option to ignore selected labels in any dataset that inherits from `ColumnCorpus` (i.e., nearly all datasets in Flair).

To do this, pass a `label_name_map` and map every label you don't want to learn to 'O'. The following snippet shows how to rename and ignore WNUT 17 labels so that they resemble the NER classes of CoNLL-03:

```python
from flair.datasets import WNUT_17

# load WNUT 17 corpus with all regular labels
corpus = WNUT_17(in_memory=False)
print(corpus.make_label_dictionary('ner'))

# load WNUT 17 but rename 'person' to 'PER', 'location' to 'LOC', and both 'group' and 'corporation' to 'ORG'
corpus = WNUT_17(in_memory=False, label_name_map={
    'person': 'PER',
    'location': 'LOC',
    'group': 'ORG',
    'corporation': 'ORG',
})
print(corpus.make_label_dictionary('ner'))

# load WNUT 17 and rename as above, but also ignore all 'creative-work' and 'product' entities
corpus = WNUT_17(in_memory=False, label_name_map={
    'person': 'PER',
    'location': 'LOC',
    'group': 'ORG',
    'corporation': 'ORG',
    'product': 'O',
    'creative-work': 'O',  # renaming a label to 'O' makes it ignored
})
print(corpus.make_label_dictionary('ner'))
```

The final `print` (after ignoring 'product' and 'creative-work') should output:

Dictionary with 4 tags: <unk>, PER, LOC, ORG
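To make the effect of mapping a label to 'O' more concrete, here is a small pure-Python sketch. It is a hypothetical illustration, not Flair's implementation: `remap_tag` and `resulting_label_types` are made-up helper names, and the sketch assumes BIO-encoded tags of the form `B-<type>` / `I-<type>`. It renames the entity type inside individual tags and shows why the WNUT 17 mapping above leaves exactly 4 dictionary entries (`<unk>` plus PER, LOC and ORG).

```python
# Hypothetical illustration only: NOT Flair's internal code.
# It mimics the effect of a label_name_map on BIO-encoded column tags.

# The six WNUT 17 entity types.
WNUT_17_TYPES = ["person", "location", "corporation", "product", "creative-work", "group"]

LABEL_NAME_MAP = {
    "person": "PER",
    "location": "LOC",
    "group": "ORG",
    "corporation": "ORG",
    "product": "O",
    "creative-work": "O",
}


def remap_tag(tag: str, label_name_map: dict) -> str:
    """Rename the entity type of a BIO tag; a type mapped to 'O' becomes plain 'O'."""
    if tag == "O":
        return tag
    # partition splits on the first '-' only, so 'B-creative-work' -> ('B', '-', 'creative-work')
    prefix, _, entity_type = tag.partition("-")
    new_type = label_name_map.get(entity_type, entity_type)
    if new_type == "O":
        return "O"  # the span is dropped, so it is no longer an entity to learn
    return f"{prefix}-{new_type}"


def resulting_label_types(entity_types, label_name_map):
    """Distinct entity types left after renaming, excluding ignored ('O') ones."""
    remapped = {label_name_map.get(t, t) for t in entity_types}
    remapped.discard("O")
    return sorted(remapped)


if __name__ == "__main__":
    tags = ["B-person", "O", "B-product", "I-product", "B-corporation"]
    print([remap_tag(t, LABEL_NAME_MAP) for t in tags])
    # ['B-PER', 'O', 'O', 'O', 'B-ORG']
    print(["<unk>"] + resulting_label_types(WNUT_17_TYPES, LABEL_NAME_MAP))
    # ['<unk>', 'LOC', 'ORG', 'PER']
```

The sketch sorts the remaining types alphabetically; the order in Flair's printed dictionary may differ, but the count (4) is the same.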

alanakbik merged commit dfcd35f into master on Apr 10, 2022
alanakbik deleted the GH-2717-ignore-labels branch on Apr 10, 2022 at 12:42