Guide for NER Augmentation #19
Comments
This library does not yet support generating augmented data for NER problems. I can add it if there are research papers on this problem. |
Maybe I can help. I have a custom dataset that I need to augment; maybe you could include that in your library?
On 09-Aug-2019, at 10:18 PM, Edward Ma <notifications@github.com> wrote:
This library does not support generate augmented data for NER problem yet.
I can enhance it if there are any research paper related this problem
|
Thanks for your contribution. Please share the corresponding papers with me so I can check whether this can be supported. |
I'm really interested in this as well, as I am trying to do NER with a limited dataset. I'm not aware of any papers looking at this specifically, but I think it might be interesting to combine it with a data-generating DSL like Chatette (I actually asked about the problems nlpaug tackles in this issue!). I think a useful first step might be to make the substitutions tag-aware, so that you never make a substitution that changes a token's tag. You might also want a flag that prevents substitutions on tagged (i.e. not 'O') words altogether. This of course presumes the existence of a labelled, if small, dataset, which I think is totally reasonable. I think combining context-aware vector substitutions with a DSL, and maybe some gazetteer pipelines to streamline external inputs, could be really powerful, and a cool project to work on if anyone is interested! |
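The tag-aware idea above can be sketched in plain Python. This is only an illustration, not part of nlpaug: the `SYNONYMS` table, the function name `tag_aware_substitute`, the `protect_entities` flag, and the BIO tags on the example sentence are all assumptions made for the sketch.

```python
import random

# Toy synonym table standing in for a real substitution source (hypothetical).
SYNONYMS = {"live": ["reside", "stay"], "name": ["title"], "big": ["large"]}


def tag_aware_substitute(tokens, tags, protect_entities=True, p=0.5, rng=None):
    """Substitute tokens one-for-one so the tag sequence stays aligned.

    With protect_entities=True, any token whose tag is not 'O' is left alone.
    """
    rng = rng or random.Random()
    out = []
    for token, tag in zip(tokens, tags):
        candidates = SYNONYMS.get(token.lower(), [])
        blocked = protect_entities and tag != "O"
        if candidates and not blocked and rng.random() < p:
            out.append(rng.choice(candidates))
        else:
            out.append(token)
    # Substitution is one-for-one, so the original tags still apply.
    return out, list(tags)


# Assumed labelling of an example sentence (BIO scheme).
tokens = ["My", "name", "is", "Pratik", ".", "I", "live", "in", "India"]
tags = ["O", "O", "O", "B-PER", "O", "O", "O", "O", "B-LOC"]

new_tokens, new_tags = tag_aware_substitute(
    tokens, tags, p=1.0, rng=random.Random(0)
)
print(new_tokens)
print(new_tags)
```

Because every replacement is a single token, the label sequence can be reused unchanged. Substitutions that insert or delete tokens would need the tags to be realigned as well.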
@Zylatis Until then, you may consider using the "stopwords" attribute to simulate tag-aware behavior. You can change the list of stopwords for each augmentation.
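A minimal sketch of that workaround, assuming token-level BIO tags are available: collect the tagged words and use them as the stopword list for that sentence. The helper name `entity_stopwords` and the example tags are hypothetical. The nlpaug call is shown only in comments because it needs nlpaug and the WordNet data installed.

```python
def entity_stopwords(tokens, tags):
    """Collect the surface forms of all tagged (non-'O') tokens."""
    return sorted({tok for tok, tag in zip(tokens, tags) if tag != "O"})


# Assumed labelling of an example sentence (BIO scheme).
tokens = ["My", "name", "is", "Pratik", ".", "I", "live", "in", "India"]
tags = ["O", "O", "O", "B-PER", "O", "O", "O", "O", "B-LOC"]

protected = entity_stopwords(tokens, tags)
print(protected)

# The list could then be passed to a word augmenter, rebuilt per sentence
# (not executed here; requires nlpaug and WordNet data):
#   import nlpaug.augmenter.word as naw
#   aug = naw.SynonymAug(aug_src="wordnet", stopwords=protected)
#   augmented = aug.augment(" ".join(tokens))
```

Note that stopwords are matched by word, not by position, so an untagged occurrence of the same word would be skipped too. The augmented text also has to be re-tokenized and re-aligned with the labels afterwards.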
|
Hi, I was looking for this too. The code snippet above is definitely helpful, but there is another use case in which we might want to substitute a NER-tagged word with another word. Is there an example of this? |
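One possible way to handle that use case is to swap each tagged entity for another mention of the same type and rebuild the BIO tags for the new span. The sketch below is plain Python and not an nlpaug feature. The `GAZETTEER` contents, the function name `swap_entities`, and the example labels are assumptions.

```python
import random

# Hypothetical gazetteer: pre-tokenized replacement mentions per entity type.
GAZETTEER = {
    "PER": [["Maria"], ["John", "Smith"]],
    "LOC": [["New", "York"], ["Japan"]],
}


def swap_entities(tokens, tags, gazetteer, rng=None):
    """Replace each B-/I- entity span with a same-type mention, re-tagging it."""
    rng = rng or random.Random()
    out_tokens, out_tags = [], []
    i = 0
    while i < len(tokens):
        tag = tags[i]
        if tag.startswith("B-"):
            etype = tag[2:]
            # Find the end of the entity span (B- followed by matching I- tags).
            j = i + 1
            while j < len(tokens) and tags[j] == "I-" + etype:
                j += 1
            options = gazetteer.get(etype)
            # Keep the original mention if the type has no replacements.
            mention = rng.choice(options) if options else tokens[i:j]
            out_tokens.extend(mention)
            out_tags.extend(["B-" + etype] + ["I-" + etype] * (len(mention) - 1))
            i = j
        else:
            out_tokens.append(tokens[i])
            out_tags.append(tag)
            i += 1
    return out_tokens, out_tags


# Assumed labelling of an example sentence (BIO scheme).
tokens = ["My", "name", "is", "Pratik", ".", "I", "live", "in", "India"]
tags = ["O", "O", "O", "B-PER", "O", "O", "O", "O", "B-LOC"]

new_tokens, new_tags = swap_entities(tokens, tags, GAZETTEER, random.Random(0))
print(list(zip(new_tokens, new_tags)))
```

Since a replacement mention may have a different number of tokens than the original, the tags are regenerated for each span rather than copied, which keeps tokens and labels the same length.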
This is a simple custom NER augmenter which might help https://gist.github.com/manishiitg/8fd4209fcb3c6cb08ed34705c1f32c86 |
Hi @makcedward @manishiitg, are there any recent improvements for creating augmented NER data? For example, with Original_text=`My name is Pratik. I live in India`, the augmented text can be:
|
Thanks for sharing your work. I could not find any NLP augmentation library other than this one.
Will this library help in augmenting NER data?
My data looks like this
Thanks in advance