
@gerritjandebruin
Contributor

Hi Elior Cohen (and others),

This is my first pull request ever, so please bear with me if I miss something.

A while ago we noticed that this Node2Vec implementation yields different results from run to run, even within a single Python process. This problem is also mentioned by @marcovarrone in Issue #26.

In this pull request, we (@pereirabarataap and I) propose adding a simple seed parameter, which sets the random seed of both NumPy and the Python random module.
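As a rough illustration of why a fixed seed matters for walk generation, here is a standard-library sketch. It is not the node2vec code itself (which also draws from NumPy); the toy graph and the names random_walk and walks_for_seed are made up for this example.

```python
import random

# Toy undirected graph as an adjacency dict (hypothetical data).
GRAPH = {
    0: [1, 2],
    1: [0, 2, 3],
    2: [0, 1],
    3: [1],
}


def random_walk(graph, start, length, rng):
    """Uniform random walk visiting `length` nodes, drawing from `rng`."""
    walk = [start]
    while len(walk) < length:
        walk.append(rng.choice(graph[walk[-1]]))
    return walk


def walks_for_seed(seed):
    """Generate one walk per node from a freshly seeded generator."""
    rng = random.Random(seed)
    return [random_walk(GRAPH, node, 5, rng) for node in GRAPH]


# The same seed reproduces the same walks within one process.
print(walks_for_seed(0) == walks_for_seed(0))
```

Without a seed, each call would start from a different generator state, so the walks (and any embedding trained on them) would differ between calls.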

This issue fixes:

Future work:

  • Make the seed also work with multiple workers.
  • Make the seed work across different Python processes.
    At the moment it does not, not even with a fixed PYTHONHASHSEED.
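For context on what PYTHONHASHSEED does control, the sketch below checks that string hashing becomes stable across fresh Python processes once the variable is fixed. It says nothing about node2vec itself, which per the note above is still not reproducible across processes; the helper name string_hash_in_child is made up for this example.

```python
import os
import subprocess
import sys


def string_hash_in_child(hash_seed):
    """Return hash('node2vec') as computed by a fresh Python process."""
    # Copy the current environment and pin the hash seed for the child only.
    env = dict(os.environ, PYTHONHASHSEED=str(hash_seed))
    result = subprocess.run(
        [sys.executable, "-c", "print(hash('node2vec'))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


# Two separate processes with the same PYTHONHASHSEED agree on the hash.
print(string_hash_in_child(0) == string_hash_in_child(0))
```

Since the embeddings still differ between processes with this variable fixed, the remaining nondeterminism presumably comes from somewhere other than string hashing.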

Kind regards,

Gerrit-Jan and António

This makes it possible to get exactly the same results within one process.
@eliorc
Owner

eliorc commented May 20, 2020

If I understand you correctly - do you mean this works only for workers=1?

@gerritjandebruin
Contributor Author

Yes. This is also the case for gensim (https://radimrehurek.com/gensim/models/word2vec.html).

Owner

@eliorc eliorc left a comment


Please see my comments.

@eliorc
Owner

eliorc commented Oct 10, 2020

Hi @gerritjandebruin, please change the target of this pull request to the dev branch. I want to include some changes of my own.

@gerritjandebruin gerritjandebruin changed the base branch from master to dev October 24, 2020 16:45
Contributor Author

@gerritjandebruin gerritjandebruin left a comment


These two commits fix an unrelated typo and add further explanation of the seed parameter.

The behaviour of the seed parameter can be checked with the following example:

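import random

import networkx as nx
import numpy as np
import pandas as pd
from node2vec import Node2Vec

# Also seed the global RNGs, so nothing outside Node2Vec adds randomness.
random.seed(0)
np.random.seed(0)

G = nx.karate_club_graph()

def get_embedding():
    # workers is left unset; the seed only gives reproducible results with one worker.
    model = Node2Vec(G, dimensions=4, seed=0)
    embedding = model.fit()
    # index2entity is the gensim 3.x name; gensim 4 renamed it to index_to_key.
    return pd.DataFrame(embedding.wv.vectors, index=embedding.wv.index2entity)

# Both printouts should show identical vectors if the seed works.
print('Embedding 1')
print(get_embedding().sort_index().head())

print('Embedding 2')
print(get_embedding().sort_index().head())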

@eliorc eliorc merged commit eb30a74 into eliorc:dev Nov 28, 2020