Test set #7

Open
DishaJindal opened this issue May 17, 2020 · 1 comment

Comments

DishaJindal commented May 17, 2020

Hi, thanks for sharing the repo and the dataset. Would it be possible to share the document ids of the documents in the test split ("nyt.test.h5df") of the NYT dataset?

kgarg8 commented Mar 2, 2021

I'm not sure what you mean by document ids.

Here's a sample script to read the h5df file:

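import h5py
import json

filename = "../data/NYT/nyt.test.h5df"

# Open the file read-only and load everything stored under its first top-level key.
with h5py.File(filename, "r") as f:
    a_group_key = list(f.keys())[0]
    data = list(f[a_group_key])

# Parse the first entry as JSON.
res = json.loads(data[0])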

Run res.keys() to see the available keys; you can then extract data in the terminal like this:

res['article'][0]
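
To check whether any entry carries an id-like field, one option is to collect the keys across all entries rather than only the first. This is a minimal sketch, not part of the repo: `collect_keys` is a hypothetical helper, and the sample records (including the "abstract" field) are made up for illustration. With the real file you would pass `data` from the script above, assuming every entry is a JSON object like the first one.

```python
import json


def collect_keys(entries):
    """Return the sorted union of top-level keys across all JSON records."""
    keys = set()
    for raw in entries:
        record = json.loads(raw)  # json.loads accepts both str and bytes
        keys.update(record.keys())
    return sorted(keys)


# Hypothetical stand-in records; the "abstract" field is an assumption.
sample = [
    '{"article": ["first sentence"], "abstract": ["summary"]}',
    b'{"article": ["another sentence"]}',
]
print(collect_keys(sample))
```

If no id-like key shows up in the result, the file probably does not store document ids.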
