
seq2seq: Replace the embeddings with pre-trained word embeddings such as word2vec #1075

Open
Liranbz opened this issue Jul 16, 2020 · 3 comments
Labels
medium Text Issues relating to text tutorials

Comments

Liranbz commented Jul 16, 2020

Hi,
Thank you for your tutorial! I tried to replace the embedding with pre-trained word embeddings such as word2vec; here is my code:

from gensim.models import KeyedVectors  # pip install gensim

class Lang:
    def __init__(self, name):
        self.name = name
        self.word2index = {}
        self.word2count = {}
        self.index2word = {0: "SOS", 1: "EOS"}
        self.n_words = 2  # Count SOS and EOS
        # Load the pre-trained vectors once, not on every word lookup
        self.word2vec = KeyedVectors.load_word2vec_format('Models/Word2Vec/wiki.he.vec')

    def addSentence(self, sentence):
        for word in sentence.split(' '):
            self.addWord(word)

    def addWord(self, word):
        if word not in self.word2index:
            # Store the pre-trained vector for this word
            # (note: KeyedVectors must be indexed on the instance, not the method)
            self.word2index[word] = self.word2vec[word]
            self.word2count[word] = 1
            self.index2word[self.n_words] = word
            self.n_words += 1
        else:
            self.word2count[word] += 1

The vectors in this word2vec model are 300-dimensional.
Do I need to change anything else in my Encoder?

Thank you!
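For reference, a more common approach than storing vectors inside `word2index` is to keep `word2index` as plain integer indices and build a separate embedding matrix whose rows are aligned with those indices; that matrix can then be passed to `torch.nn.Embedding.from_pretrained`. A minimal sketch of the matrix-building step, using a tiny toy dict in place of the loaded `KeyedVectors` (the words, vectors, and dimension here are made up to keep the example small, not taken from `wiki.he.vec`):

```python
import numpy as np

EMB_DIM = 3  # would be 300 for wiki.he.vec; 3 keeps the toy example small

# Stand-in for KeyedVectors.load_word2vec_format(...): word -> vector
word2vec = {
    "shalom": np.array([0.1, 0.2, 0.3]),
    "olam":   np.array([0.4, 0.5, 0.6]),
}

# word2index stays a plain word -> integer mapping, as in the tutorial
word2index = {"SOS": 0, "EOS": 1, "shalom": 2, "olam": 3}

# Build a matrix whose row i is the vector for index2word[i];
# words without a pre-trained vector (e.g. SOS/EOS) get small random rows
rng = np.random.default_rng(0)
matrix = np.zeros((len(word2index), EMB_DIM))
for word, idx in word2index.items():
    vec = word2vec.get(word)
    matrix[idx] = vec if vec is not None else rng.normal(scale=0.1, size=EMB_DIM)

print(matrix.shape)  # (4, 3)
print(matrix[2])     # the pre-trained vector for "shalom"
```

In the tutorial's `EncoderRNN` you would then replace `self.embedding = nn.Embedding(input_size, hidden_size)` with something like `nn.Embedding.from_pretrained(torch.FloatTensor(matrix), freeze=False)`, and set `hidden_size` to the vector dimension (300 here).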


NarenInD commented Jul 27, 2020

Yeah, I'm also trying to train with word2vec.
Word2vec vectors can be 100d, 200d, or 300d, i.e. a 1d array with 100 values per word for the 100d model.

Can anyone tell me where I should change the dimension values?
For example, what values should be replaced in the lines below:
self.embedding(input).view(1, 1, -1)
return torch.tensor(indexes, dtype=torch.long, device=device).view(-1, 1)

@Liranbz did you get this sorted out?
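As an aside: neither of the two quoted lines hard-codes a dimension. The `-1` in `view(1, 1, -1)` tells PyTorch to infer that axis from the data, and the index tensor is dimension-agnostic; what actually has to change is the `hidden_size` (embedding dimension) passed to the encoder, e.g. 300 for a 300d model. A quick sketch of why `-1` adapts automatically, using NumPy's `reshape` (which behaves like `torch.view` for this case):

```python
import numpy as np

for dim in (100, 200, 300):
    embedded = np.arange(dim, dtype=np.float32)  # one word's embedding vector
    reshaped = embedded.reshape(1, 1, -1)        # same idea as .view(1, 1, -1)
    print(reshaped.shape)                        # (1, 1, dim): -1 infers dim
```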

@holly1238 holly1238 added the Text Issues relating to text tutorials label Jul 27, 2021
@ivrschool

@NarenInD @Liranbz have you found a solution? I have been looking for the same thing. Thank you.

@svekars svekars added medium docathon-h1-2023 A label for the docathon in H1 2023 labels May 31, 2023

QasimKhan5x commented Jun 4, 2023

torchtext currently supports pretrained GloVe, FastText, and CharNGram embeddings. Other embeddings can be loaded using torchtext.vocab.Vectors. If anyone is interested, I can edit the tutorial to show how you could use those.

@svekars svekars removed the docathon-h1-2023 A label for the docathon in H1 2023 label Oct 30, 2023