Remove non-base datasets. #1275

flauted · 2019-02-07T18:04:46Z

The various [Datatype]Dataset classes are just shells that define a sort_key static method. Since torchtext.data.Dataset only uses the sort_key attribute attached to the instance self (despite defining it at the class level), we can just make sort_key an instance attribute of DatasetBase. Consequently, DatasetBase is now a general Dataset. After this PR, the file dataset_base.py should be renamed to dataset.py.
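To illustrate the point about instance versus class attributes, here is a minimal sketch. It does not import torchtext; BaseDataset is a hypothetical stand-in that only mimics the relevant behavior (a class-level sort_key that is read through self), and length_key is an invented example key.

```python
class BaseDataset:
    """Hypothetical stand-in for torchtext.data.Dataset (not the real class)."""

    sort_key = None  # declared at class level, as the description notes

    def __init__(self, examples):
        self.examples = list(examples)

    def sorted_examples(self):
        # Lookup goes through the instance first, so an instance
        # attribute named sort_key takes precedence over the class one.
        return sorted(self.examples, key=self.sort_key)


class Dataset(BaseDataset):
    """One general dataset class; the sort key is passed in, not subclassed."""

    def __init__(self, examples, sort_key):
        super().__init__(examples)
        self.sort_key = sort_key  # instance attribute shadows the class default


def length_key(tokens):
    """Invented example key: sort by number of tokens."""
    return len(tokens)


ds = Dataset([["a", "b", "c"], ["a"], ["a", "b"]], length_key)
ordered = ds.sorted_examples()
```

With this arrangement, a per-datatype subclass is only needed if it does more than supply a sort key.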

vince62s · 2019-02-08T07:26:06Z

Looks OK, but @bpopeters can you please review?

bpopeters · 2019-02-08T09:58:06Z

onmt/inputters/text_dataset.py

-        if hasattr(ex, "tgt"):
-            return len(ex.src[0]), len(ex.tgt[0])
-        return len(ex.src[0])
+def text_sort_key(ex):


The style of defining sort_key as a static method of a dataset class was to match torchtext, where I think the general idea is people would define a different dataset class for each dataset (classes with names like IWSLT and WMT14, for example). But for the way we're doing it, with just the single Dataset class, I agree that this makes sense.
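As a rough sketch of the module-level style, the function below reuses the body of the removed static method shown in the diff. That the new text_sort_key keeps exactly this body is an assumption, and the SimpleNamespace objects are hypothetical stand-ins for torchtext examples.

```python
from types import SimpleNamespace


def text_sort_key(ex):
    """Sort by source length, then target length when a target exists.

    Body copied from the removed lines of the diff; assumed unchanged.
    """
    if hasattr(ex, "tgt"):
        return len(ex.src[0]), len(ex.tgt[0])
    return len(ex.src[0])


# Hypothetical examples: src and tgt each hold one token sequence.
paired = SimpleNamespace(src=[["a", "b", "c"]], tgt=[["x", "y"]])
source_only = SimpleNamespace(src=[["a", "b"]])

paired_key = text_sort_key(paired)
source_only_key = text_sort_key(source_only)
```

A plain function like this can be handed to the single Dataset class at construction time instead of being attached to a per-dataset subclass.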

onmt/inputters/inputter.py

flauted · 2019-02-09T11:31:34Z

Is this waiting on anything?

vince62s · 2019-02-09T11:39:01Z

I was about to merge it, but Travis is acting weird.
My last PR was fine before merging, and now that it is committed it shows some failures.

vince62s · 2019-02-09T16:33:15Z

@flauted this breaks pre-existing preprocessed files. Not a big deal, but I'm mentioning it in case it can be fixed.

@francoishernandez

flauted · 2019-02-09T19:29:53Z

You bumped the major version, so I'll let it be. For what it's worth, though, I think fixing it would require leaving stub AudioDataset, TextDataset, and ImageDataset classes (maybe with a sort_key method, I'm not sure), which would kind of defeat the purpose of this PR.
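A minimal sketch of why stubs would be needed, assuming the preprocessed files are pickled dataset objects (an assumption here; the actual on-disk format is not stated in this thread). The module name legacy_datasets and the StubTextDataset class are hypothetical.

```python
import pickle
import sys
import types

# Hypothetical module standing in for the one that used to define TextDataset.
legacy = types.ModuleType("legacy_datasets")
sys.modules["legacy_datasets"] = legacy


class TextDataset:
    def __init__(self, examples):
        self.examples = examples


# Make pickle record the class as legacy_datasets.TextDataset.
TextDataset.__module__ = "legacy_datasets"
TextDataset.__qualname__ = "TextDataset"
legacy.TextDataset = TextDataset

# "Preprocess": serialize an instance while the class still exists.
payload = pickle.dumps(TextDataset(["a", "b"]))

# Simulate the refactor: the class is removed from its module.
del legacy.TextDataset

try:
    pickle.loads(payload)
    load_error = None
except AttributeError as err:
    # pickle stores only the class's import path, which no longer resolves.
    load_error = err


# Leaving an empty stub under the old name makes old files loadable again.
class StubTextDataset:
    pass


legacy.TextDataset = StubTextDataset
restored = pickle.loads(payload)
```

The stub does not need any methods for loading to succeed, because pickle restores the instance's attributes directly; whether it would also need a sort_key depends on how the loaded object is used afterwards.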

vince62s · 2019-02-09T19:44:48Z

yeah that's what I figured out.

* Remove non-base datasets.
* Update Dataset documentation.
* Move helper methods out of Dataset and document them.
* Remove , replace w explicit constructor calls.

flauted added 3 commits February 7, 2019 12:59

Remove non-base datasets.

a0a0b4d

Update Dataset documentation.

0eb9c83

Move helper methods out of Dataset and document them.

d167a9e

bpopeters reviewed Feb 8, 2019


Remove , replace w explicit constructor calls.

1c5c082

vince62s merged commit 3c0a4e5 into OpenNMT:master Feb 9, 2019
