
Loading word2vec model cannot be done with a reasonable memory capacity #239

Closed · SMMousaviSP opened this issue Aug 21, 2021 · 2 comments

@SMMousaviSP

Hi,
I wanted to do augmentation based on word2vec similarity, so I downloaded the word2vec model as described in the README:

from nlpaug.util.file.download import DownloadUtil

DownloadUtil.download_word2vec(dest_dir='.') # Download word2vec model

A zip file was downloaded and I extracted it. When I tried to load the model with the code below, it took very long and then crashed because it ran out of memory. I also tried this on Google Colab, which gives me 12 GB of RAM, but it failed for the same reason.

import nlpaug.augmenter.word as naw

text = "Sample text to test augmentation"
aug = naw.WordEmbsAug(
    model_type='word2vec', model_path='GoogleNews-vectors-negative300.bin',
    action="substitute")
augmented_text = aug.augment(text)
print("Original:")
print(text)
print("Augmented Text:")
print(augmented_text)

The .bin file is 3.5 GB, so why does loading fail even with 12 GB of memory?
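For context, a rough back-of-the-envelope of the raw in-memory footprint (assuming the standard GoogleNews model of roughly 3 million vocabulary entries with 300-dimensional float32 vectors):

# Rough estimate of the raw vector storage for GoogleNews-vectors-negative300,
# assuming ~3,000,000 vocabulary entries, 300 dimensions, 4-byte floats.
vocab_size = 3_000_000
dims = 300
bytes_per_float = 4
raw_gib = vocab_size * dims * bytes_per_float / 1024 ** 3
print(f"~{raw_gib:.1f} GiB for the vectors alone")  # ~3.4 GiB

So the vectors themselves should fit well within 12 GB; the extra memory seems to be consumed by the loading process itself.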

@makcedward (Owner)

Switched to the gensim package for loading model files, which improved loading speed and memory consumption. You may retry with the latest dev version (pip install gensim git+https://github.com/makcedward/nlpaug.git).
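For reference, a minimal sketch of loading the vectors with gensim directly (the limit argument is optional and shown only as one way to bound memory; it is an illustration, not necessarily what nlpaug does internally):

from gensim.models import KeyedVectors

# Load the pretrained GoogleNews vectors with gensim's binary reader.
# `limit` optionally caps how many vectors are read, which bounds memory use;
# drop it to load the full vocabulary.
wv = KeyedVectors.load_word2vec_format(
    'GoogleNews-vectors-negative300.bin', binary=True, limit=500_000)
print(wv.most_similar('augmentation', topn=5))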

@makcedward (Owner)

Enhanced in version 1.1.8.
