papers100M preprocessing in gsampler-artifact-evaluation  #11

@ZHUZHU-Whu

Description


Hello, sorry to bother you. As in the previous issue, I recently wanted to run the gsampler-artifact-evaluation experiments. I ran into a problem with the way papers100M is loaded in figure7/load_graph_utils.py; here is the code from the project:

import torch
import dgl
import scipy.sparse as sp

def load_100Mpapers():
    # Load the precomputed training-node IDs.
    train_id = torch.load("/home/ubuntu/dataset/ogbn_papers100M/papers100m_train_id.pt")
    splitted_idx = dict()
    splitted_idx['train'] = train_id
    # Build the graph from a precomputed sparse adjacency matrix.
    coo_matrix = sp.load_npz("/home/ubuntu/dataset/ogbn_papers100M/ogbn-papers100M_adj.npz")
    g = dgl.from_scipy(coo_matrix)
    g = dgl.remove_self_loop(g)
    g = dgl.add_self_loop(g)
    g = g.long()
    return g, None, None, None, splitted_idx

This loading path looks more efficient than going through ogb: on some of my devices, loading papers100M with ogb easily runs out of memory. Could you please provide the preprocessing code that produces these papers100M files? Thank you very much. Wish you a happy life and all the best.
