Description
Please excuse me for asking this question here, since it's not really an actual issue with gensim.
TL;DR:
I'd like to know how I can get at the word vectors before they are propagated, in order to apply transformations to them while training paragraph/document vectors.
What I'm trying to do is make a modification to `train_cbow_pair` in `gensim.models.word2vec`. However, I struggle a bit to understand what exactly is happening there.
I get that `l1` is the sum of the word vectors in the current context window plus the sum of the document-tag vectors, and that this is what gets passed to `train_cbow_pair`.
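For context, here is a minimal NumPy sketch of how I understand that sum to be composed (this is my own illustration, not gensim's actual code; the array names, sizes, and indices are all assumptions):

```python
import numpy as np

np.random.seed(0)
layer1_size = 4                                    # vector dimensionality (assumed)
word_vectors = np.random.rand(10, layer1_size)     # stand-in for the word-vector array
doctag_vectors = np.random.rand(2, layer1_size)    # stand-in for the doctag-vector array

word2_indexes = [1, 3, 5]   # indices of the context-window words
doctag_indexes = [0]        # indices of the document tags

# sum the context-word vectors and the doctag vectors into one input vector
l1 = np.sum(word_vectors[word2_indexes], axis=0) \
   + np.sum(doctag_vectors[doctag_indexes], axis=0)

# with cbow_mean=1, the sum is averaged over the number of contributing vectors
count = len(word2_indexes) + len(doctag_indexes)
l1_mean = l1 / count

print(l1.shape)  # (4,)
```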
```python
def train_cbow_pair(model, word, input_word_indices, l1, alpha, learn_vectors=True, learn_hidden=True):
    neu1e = np.zeros(l1.shape)

    if model.hs:
        l2a = model.syn1[word.point]  # 2d matrix, codelen x layer1_size
        fa = 1. / (1. + np.exp(-np.dot(l1, l2a.T)))  # propagate hidden -> output
        ga = (1. - word.code - fa) * alpha  # vector of error gradients multiplied by the learning rate
        if learn_hidden:
            model.syn1[word.point] += np.outer(ga, l1)  # learn hidden -> output
        neu1e += np.dot(ga, l2a)
    # ...
```
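To make the shapes concrete: as far as I can tell, `word.point` holds the indices of the inner (non-leaf) Huffman-tree nodes on the path to `word`, and `word.code` its binary left/right code, so the rows of `model.syn1[word.point]` are tree-node output weights rather than word vectors. Here is a self-contained sketch of that hierarchical-softmax step (all sizes and index values below are made up):

```python
import numpy as np

np.random.seed(0)
layer1_size = 4
inner_nodes = 9                                     # number of inner Huffman-tree nodes (assumed)
syn1 = np.zeros((inner_nodes, layer1_size))         # hidden -> output weights, one row per node

point = np.array([0, 2, 5])    # hypothetical word.point: nodes on the path to the word
code = np.array([1, 0, 1])     # hypothetical word.code: left/right decisions on that path
l1 = np.random.rand(layer1_size)                    # combined input vector
alpha = 0.025

l2a = syn1[point]                                   # (codelen, layer1_size); fancy indexing copies
fa = 1. / (1. + np.exp(-np.dot(l1, l2a.T)))         # (codelen,) activations of the path nodes
ga = (1. - code - fa) * alpha                       # (codelen,) scaled error gradients
syn1[point] += np.outer(ga, l1)                     # update the tree-node weights
neu1e = np.dot(ga, l2a)                             # (layer1_size,) error fed back to the input

print(l2a.shape, fa.shape, neu1e.shape)  # (3, 4) (3,) (4,)
```

Note that `syn1[point]` uses advanced indexing and therefore returns a copy, so `l2a` keeps the pre-update weights even after `syn1[point]` is modified.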
Here I'm not sure what I'm looking at. In particular, I struggle to understand the line `l2a = model.syn1[word.point]`: I don't know what `word.point` describes here, and why this input is getting propagated.
Does this provide the word vectors for activating the hidden layer, which would then appear to be `fa`? But that can't actually be the case, since `word` is just the current word of the context window, if I understand it correctly:
```python
# train_document_dm calling train_cbow_pair
def train_document_dm(model, doc_words, doctag_indexes, alpha, work=None, neu1=None,
                      learn_doctags=True, learn_words=True, learn_hidden=True,
                      word_vectors=None, word_locks=None, doctag_vectors=None, doctag_locks=None):
    # ...
    for pos, word in enumerate(word_vocabs):
        # ...
        neu1e = train_cbow_pair(model, word, word2_indexes, l1, alpha,
                                learn_vectors=False, learn_hidden=learn_hidden)
    # ...
```
So what I'd like to know is how I can get at the word vectors before they are propagated, so that I can apply transformations to them beforehand.
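Concretely, the kind of hook I have in mind would sit where `l1` is composed, before `train_cbow_pair` is called. A sketch of what I mean (the `transform` function and all names besides the gensim ones are hypothetical; the arrays are stand-ins):

```python
import numpy as np

np.random.seed(0)

def transform(vecs):
    # hypothetical per-vector transformation applied before propagation,
    # here: L2-normalizing each context-word vector
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    return vecs / np.maximum(norms, 1e-12)

layer1_size = 4
word_vectors = np.random.rand(10, layer1_size)     # stand-in for the word-vector array
doctag_vectors = np.random.rand(2, layer1_size)    # stand-in for the doctag-vector array
word2_indexes = [1, 3, 5]
doctag_indexes = [0]

# transform copies of the raw word vectors first, then build l1 from them
transformed = transform(word_vectors[word2_indexes])
l1 = np.sum(transformed, axis=0) + np.sum(doctag_vectors[doctag_indexes], axis=0)
print(l1.shape)  # (4,)
```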