
Guide for NER Augmentation #19

Open
DecentMakeover opened this issue Aug 8, 2019 · 8 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@DecentMakeover

Thanks for sharing your work. I could not find any NLP augmentation library other than this one.

Will this library help with augmenting NER data?

My data looks like this:

Ryan B-PER
Dsouza B-PER
/DOB O
11/11/1997 B-DOB
/MALE O
22 B-NUM
56565 B-NUM
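Any tag-aware augmentation first needs data in this layout read into (token, tag) pairs. A minimal sketch, assuming one token per line, the tag as the last whitespace-separated field, and blank lines between sentences (CoNLL-style); `parse_token_tag_lines` is a hypothetical helper, not part of nlpaug:

```python
from typing import List, Tuple


def parse_token_tag_lines(text: str) -> List[List[Tuple[str, str]]]:
    """Parse 'token TAG' lines into sentences of (token, tag) pairs.

    Assumes the tag is the last whitespace-separated field on each line
    and that blank lines separate sentences. A line with only one field
    raises ValueError.
    """
    sentences: List[List[Tuple[str, str]]] = []
    current: List[Tuple[str, str]] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the current sentence, if there is one.
            if current:
                sentences.append(current)
                current = []
            continue
        token, tag = line.rsplit(maxsplit=1)
        current.append((token, tag))
    if current:
        sentences.append(current)
    return sentences
```

Note that under the BIO scheme, two consecutive B-PER tags (as in "Ryan" / "Dsouza" above) mark two separate one-word entities; if "Ryan Dsouza" is a single name, the second token would usually be tagged I-PER.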

Thanks in advance

@makcedward
Owner

This library does not yet support generating augmented data for NER problems.

I can add support if there is any research paper related to this problem.

@makcedward makcedward added the enhancement New feature or request label Aug 9, 2019
@DecentMakeover
Author

DecentMakeover commented Aug 9, 2019 via email

@makcedward
Owner

Thanks for your contribution.

Please share the corresponding papers with me so I can check whether this can be supported.

@makcedward makcedward added the help wanted Extra attention is needed label Aug 26, 2019
@Zylatis

Zylatis commented Nov 10, 2019

I'm really interested in this as well, as I am trying to do NER with a limited dataset. I'm not aware of any papers looking at this specifically, but I think it might be interesting to combine it with a data-generating DSL like Chatette (I actually asked about the problems nlpaug tackles in this issue: SimGus/Chatette#25).

I think a useful first step might be to make the substitutions tag-aware, so that no substitution changes a word's tag. Potentially you might also want a flag that prevents substitutions on tagged (i.e. non-'O') words altogether.
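A rough sketch of the tag-aware idea, assuming data already in (token, tag) pairs. The `synonyms` dictionary is a hypothetical stand-in for whatever substitution source is used (WordNet, embeddings, a masked language model), and `tag_aware_substitute` / `skip_entities` are illustrative names, not nlpaug API. Tags are copied through unchanged, and with `skip_entities=True` only 'O' tokens are ever replaced:

```python
import random
from typing import Dict, List, Optional, Tuple


def tag_aware_substitute(
    pairs: List[Tuple[str, str]],
    synonyms: Dict[str, List[str]],
    p: float = 0.3,
    skip_entities: bool = True,
    rng: Optional[random.Random] = None,
) -> List[Tuple[str, str]]:
    """Randomly swap tokens for synonyms without changing any tag.

    `synonyms` maps a lowercased word to candidate replacements (a
    hypothetical lookup table). With skip_entities=True, only tokens
    tagged 'O' are candidates for substitution.
    """
    rng = rng or random.Random()
    out: List[Tuple[str, str]] = []
    for token, tag in pairs:
        candidates = synonyms.get(token.lower(), [])
        eligible = tag == "O" or not skip_entities
        if eligible and candidates and rng.random() < p:
            token = rng.choice(candidates)
        # The tag is copied unchanged, so the label sequence is preserved.
        out.append((token, tag))
    return out
```

Setting `skip_entities=False` only makes sense if the substitution source is known to preserve entity type (e.g. a name is only ever swapped for another name).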

This of course presumes the existence of a labelled, if small, dataset, which I think is totally reasonable. I think combining context-aware vector substitutions with a DSL, and maybe some gazetteer pipelines to streamline external inputs, could be really powerful, and a cool project to work on if anyone is interested!
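To make the DSL plus gazetteer idea concrete, here is a minimal sketch of template expansion that emits BIO-tagged tokens. The `{TYPE}` slot syntax, `fill_template`, and the gazetteer contents are all hypothetical (this is not Chatette's actual syntax), and tokenization is plain whitespace splitting, so each slot must stand alone as a word with punctuation separated by spaces:

```python
import random
import re
from typing import Dict, List, Optional, Tuple

# A slot is a whole whitespace-delimited word such as {PER} or {LOC}.
SLOT = re.compile(r"\{(\w+)\}")


def fill_template(
    template: str,
    gazetteer: Dict[str, List[str]],
    rng: Optional[random.Random] = None,
) -> List[Tuple[str, str]]:
    """Expand a template like 'My name is {PER}' into BIO-tagged tokens.

    Slot names double as entity types; each slot is filled with a random
    gazetteer entry whose words get B-/I- tags. Other words get 'O'.
    Raises KeyError if a slot type is missing from the gazetteer.
    """
    rng = rng or random.Random()
    pairs: List[Tuple[str, str]] = []
    for word in template.split():
        match = SLOT.fullmatch(word)
        if match is None:
            pairs.append((word, "O"))
            continue
        label = match.group(1)
        entity = rng.choice(gazetteer[label]).split()
        pairs.append((entity[0], "B-" + label))
        pairs.extend((w, "I-" + label) for w in entity[1:])
    return pairs
```

Calling it repeatedly with a larger gazetteer yields many labelled sentences from one template, which is roughly the role a generating DSL would play.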

@makcedward
Owner

@Zylatis
Thank you for your input. A DSL could be one solution for that. I will look further into how nlpaug could support a DSL.

In the meantime, you may consider leveraging the "stopwords" attribute to simulate tag-aware behavior. You can change the list of stopwords for each augmentation.

import nlpaug.augmenter.word as naw
text = "Peter likes dogs"
aug = naw.ContextualWordEmbsAug()
# Words in stopwords are skipped during augmentation, so the tagged name is kept
aug.stopwords = ['Peter']
aug.augment(text)
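Building on the stopwords suggestion, a small helper could collect the tagged words of each sentence so the list can be swapped in per augmentation. This is a sketch: `entity_stopwords` is a hypothetical name, and it assumes the data is already in (token, tag) pairs:

```python
from typing import List, Tuple


def entity_stopwords(pairs: List[Tuple[str, str]]) -> List[str]:
    """Collect every token whose tag is not 'O', in order, without duplicates.

    Intended to be assigned to an augmenter's `stopwords` attribute before
    augmenting this sentence, so that tagged words are left untouched.
    """
    seen = set()
    words: List[str] = []
    for token, tag in pairs:
        if tag != "O" and token not in seen:
            seen.add(token)
            words.append(token)
    return words
```

For a sentence stored as `pairs`, one would then set `aug.stopwords = entity_stopwords(pairs)` before calling `aug.augment(" ".join(t for t, _ in pairs))`. Re-attaching the original tags to the augmented text additionally assumes the augmenter replaces words one-for-one and keeps the same whitespace tokenization, which is worth checking on real output.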

@manishiitg

Hi,

I was looking for this too. The code snippet above is definitely helpful.

But there is another use case, in which we might want to substitute an NER-tagged word with another word.

Is there any example for this?
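One way to do this is mention replacement: swap each entity span for a random entry of the same type from a gazetteer and regenerate its BIO tags, so multi-word replacements stay labelled correctly. The sketch below assumes BIO-tagged (token, tag) pairs; `bio_spans`, `replace_mentions`, and the gazetteer are hypothetical and not part of nlpaug. A stray I- tag without a preceding B- of the same type is treated as starting a new span:

```python
import random
from typing import Dict, List, Optional, Tuple

Pair = Tuple[str, str]


def bio_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Return (start, end, type) entity spans from BIO tags; end is exclusive."""
    spans: List[Tuple[int, int, str]] = []
    start, kind = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel flushes the last span
        continues = tag.startswith("I-") and kind == tag[2:]
        if start is not None and not continues:
            spans.append((start, i, kind))
            start, kind = None, None
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, kind = i, tag[2:]
    return spans


def replace_mentions(
    pairs: List[Pair],
    gazetteer: Dict[str, List[str]],
    rng: Optional[random.Random] = None,
) -> List[Pair]:
    """Swap each entity span for a random gazetteer entry of the same type.

    Entity types missing from the gazetteer are copied through unchanged.
    """
    rng = rng or random.Random()
    tags = [tag for _, tag in pairs]
    out: List[Pair] = []
    prev_end = 0
    for start, end, kind in bio_spans(tags):
        out.extend(pairs[prev_end:start])  # untouched tokens before the span
        if kind in gazetteer:
            words = rng.choice(gazetteer[kind]).split()
            out.append((words[0], "B-" + kind))
            out.extend((w, "I-" + kind) for w in words[1:])
        else:
            out.extend(pairs[start:end])
        prev_end = end
    out.extend(pairs[prev_end:])
    return out
```

For example, with the gazetteer `{"PER": ["Jon"], "LOC": ["United States"]}`, the tagged sentence "Peter lives in India" becomes "Jon lives in United States", with "United" tagged B-LOC and "States" tagged I-LOC.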

@manishiitg

manishiitg commented Jan 29, 2020

Here is a simple custom NER augmenter that might help:

https://gist.github.com/manishiitg/8fd4209fcb3c6cb08ed34705c1f32c86

@pratikchhapolika

Hi @makcedward @manishiitg, are there any recent improvements for creating synthetic NER data? For example:

Original text: 'My name is Pratik. I live in India'

Augmented versions could be:

  1. 'My name is Jon. I live in U.S.A'
  2. 'My name is Manish. I live in China'

5 participants