Remove non-base datasets. #1275

Merged
merged 4 commits into from
Feb 9, 2019

Contributor

@flauted flauted commented Feb 7, 2019

The various [Datatype]Dataset classes are just shells that define a sort_key static method. Since torchtext.data.Dataset only uses the sort_key attribute attached to the instance self (despite defining it at the class level), we can make sort_key an instance attribute of DatasetBase. Consequently, DatasetBase is now a general Dataset. After this PR, the file dataset_base.py should be renamed to dataset.py.
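
A minimal sketch of the idea described above, not the actual OpenNMT-py or torchtext code: the `Dataset` class here is a hypothetical stand-in, the example objects are built with `SimpleNamespace` purely for illustration, and the body of `text_sort_key` is assumed from the snippet quoted in the review below. It shows that a single general class can take the sort key as an instance attribute, so no per-datatype subclass is needed just to carry a static method.

```python
from types import SimpleNamespace


class Dataset:
    """Hypothetical general dataset; a stand-in, not the real class."""

    def __init__(self, examples, sort_key):
        self.examples = examples
        # The sort key is stored on the instance instead of being a
        # static method defined on a per-datatype subclass.
        self.sort_key = sort_key


def text_sort_key(ex):
    """Sort by source length, then by target length when a target exists."""
    if hasattr(ex, "tgt"):
        return len(ex.src[0]), len(ex.tgt[0])
    return len(ex.src[0])


# Illustrative examples: src/tgt are 1-tuples holding a token list.
examples = [
    SimpleNamespace(src=(["a", "b", "c"],), tgt=(["x"],)),
    SimpleNamespace(src=(["a"],), tgt=(["x", "y"],)),
    SimpleNamespace(src=(["a", "b"],), tgt=(["x"],)),
]

dataset = Dataset(examples, sort_key=text_sort_key)

# Any consumer that reads `dataset.sort_key` works the same way as it
# would with a class-level static method.
ordered = sorted(dataset.examples, key=dataset.sort_key)
lengths = [len(ex.src[0]) for ex in ordered]
```

A different data type would only need a different function passed as `sort_key`, which is the design choice this PR argues for.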

@vince62s
Member

vince62s commented Feb 8, 2019

Looks OK, but @bpopeters, can you please review?

    if hasattr(ex, "tgt"):
        return len(ex.src[0]), len(ex.tgt[0])
    return len(ex.src[0])
def text_sort_key(ex):
Contributor

The style of defining sort_key as a static method of a dataset class was to match torchtext, where I think the general idea is people would define a different dataset class for each dataset (classes with names like IWSLT and WMT14, for example). But for the way we're doing it, with just the single Dataset class, I agree that this makes sense.

@flauted
Contributor Author

flauted commented Feb 9, 2019

Is this waiting on anything?

@vince62s
Member

vince62s commented Feb 9, 2019

I was about to merge it, but Travis is acting weird.
My last PR was fine before merging, and now that it is committed it shows some failures.

@vince62s vince62s merged commit 3c0a4e5 into OpenNMT:master Feb 9, 2019
@vince62s
Member

vince62s commented Feb 9, 2019

@flauted this breaks pre-existing preprocessed files. Not a big deal, but it would be good if this could be fixed.

@francoishernandez

@flauted
Contributor Author

flauted commented Feb 9, 2019

You bumped the major version, so I'll let it be. For what it's worth, fixing it would, I think, require leaving stub AudioDataset, TextDataset, and ImageDataset classes (maybe with a sort_key method, I'm not sure). That would kind of defeat the purpose of this PR.
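
A rough sketch of why stubs would help, under the assumption that the preprocessed files are pickled dataset objects (pickle stores a class by module and name, not by definition). The module name `legacy_inputters`, the `register` helper, and this `Dataset` class are all hypothetical and exist only for the illustration; this is not the project's code.

```python
import pickle
import sys
import types

# Hypothetical module standing in for wherever the old classes lived.
legacy = types.ModuleType("legacy_inputters")
sys.modules["legacy_inputters"] = legacy


class Dataset:
    """Stand-in for the single general dataset class."""

    def __init__(self, examples):
        self.examples = examples


def register(cls, name):
    """Expose cls under `name` in the legacy module so pickle can find it."""
    cls.__module__ = "legacy_inputters"
    cls.__qualname__ = name
    setattr(legacy, name, cls)
    return cls


# 1. An "old" preprocessed file: a pickled instance of the legacy class.
OldTextDataset = register(type("TextDataset", (Dataset,), {}), "TextDataset")
blob = pickle.dumps(OldTextDataset(["a b c", "d e"]))

# 2. The class is removed: the old file can no longer be loaded,
#    because pickle cannot resolve the name it recorded.
del legacy.TextDataset
try:
    pickle.loads(blob)
    load_failed = False
except AttributeError:
    load_failed = True

# 3. An empty stub under the old name makes the old file loadable again.
register(type("TextDataset", (Dataset,), {}), "TextDataset")
restored = pickle.loads(blob)
```

The stub carries no behavior of its own, which is why keeping it around would undo most of the cleanup.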

@vince62s
Member

vince62s commented Feb 9, 2019

Yeah, that's what I figured.

ItaySofer pushed a commit to ItaySofer/OpenNMT-py that referenced this pull request Mar 17, 2019
* Remove non-base datasets.
* Update Dataset documentation.
* Move helper methods out of Dataset and document them.
* Remove , replace w explicit constructor calls.

3 participants