Closed
Labels: bug (Something isn't working)
Description
Describe the bug
The dataset librispeech_asr (standard Librispeech) fails to load.
Steps to reproduce the bug
import datasets
datasets.load_dataset("librispeech_asr")
Expected results
It should download and prepare the whole dataset (all subsets).
The doc says it has two configurations (clean and other).
However, the dataset doc also says that not specifying split should just load the whole dataset, which is what I want.
Also, for this specific dataset, this is the standard that the community uses. When you look at any publication with results on Librispeech, it always uses the whole train dataset for training.
Actual results
...
File "/home/az/.cache/huggingface/modules/datasets_modules/datasets/librispeech_asr/1f4602f6b5fed8d3ab3e3382783173f2e12d9877e98775e34d7780881175096c/librispeech_asr.py", line 119, in LibrispeechASR._split_generators
line: archive_path = dl_manager.download(_DL_URLS[self.config.name])
locals:
archive_path = <not found>
dl_manager = <local> <datasets.utils.download_manager.DownloadManager object at 0x7fc07b426160>
dl_manager.download = <local> <bound method DownloadManager.download of <datasets.utils.download_manager.DownloadManager object at 0x7fc07b426160>>
_DL_URLS = <global> {'clean': {'dev': 'http://www.openslr.org/resources/12/dev-clean.tar.gz', 'test': 'http://www.openslr.org/resources/12/test-clean.tar.gz', 'train.100': 'http://www.openslr.org/resources/12/train-clean-100.tar.gz', 'train.360': 'http://www.openslr.org/resources/12/train-clean-360.tar.gz'}, 'other'...
self = <local> <datasets_modules.datasets.librispeech_asr.1f4602f6b5fed8d3ab3e3382783173f2e12d9877e98775e34d7780881175096c.librispeech_asr.LibrispeechASR object at 0x7fc12a633310>
self.config = <local> BuilderConfig(name='default', version=0.0.0, data_dir='/home/az/i6/setups/2022-03-20--sis/work/i6_core/datasets/huggingface/DownloadAndPrepareHuggingFaceDatasetJob.TV6Nwm6dFReF/output/data_dir', data_files=None, description=None)
self.config.name = <local> 'default', len = 7
KeyError: 'default'
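The traceback suggests the failure is a plain dictionary lookup: `_DL_URLS` is keyed by the configuration names, while `self.config.name` is `'default'` when no configuration is given. A minimal sketch of that lookup, without the datasets library, is below. The helpers `resolve_urls` and `try_resolve` are hypothetical names used only for illustration, and the `'other'` entry is left empty because its contents are truncated in the traceback.

```python
# Stand-in for the module-level _DL_URLS shown in the traceback above.
# The 'other' entry is truncated there, so it is left empty here on purpose.
_DL_URLS = {
    "clean": {
        "dev": "http://www.openslr.org/resources/12/dev-clean.tar.gz",
        "test": "http://www.openslr.org/resources/12/test-clean.tar.gz",
        "train.100": "http://www.openslr.org/resources/12/train-clean-100.tar.gz",
        "train.360": "http://www.openslr.org/resources/12/train-clean-360.tar.gz",
    },
    "other": {},
}


def resolve_urls(config_name):
    """Hypothetical helper mirroring the lookup `_DL_URLS[self.config.name]`."""
    return _DL_URLS[config_name]


def try_resolve(config_name):
    """Return the URL mapping, or the error message if the name is unknown."""
    try:
        return resolve_urls(config_name)
    except KeyError as err:
        return f"KeyError: {err}"


# 'default' is what the builder config is named when no configuration is passed.
print(try_resolve("default"))
# An explicit configuration name is found in the mapping.
print(sorted(try_resolve("clean")))
```

If this reading is right, passing a configuration name explicitly (for example `"clean"` or `"other"`) would avoid the `KeyError`, though it would not by itself load all subsets in one call.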
Environment info
- datasets version: 2.1.0
- Platform: Linux-5.4.0-107-generic-x86_64-with-glibc2.31
- Python version: 3.9.9
- PyArrow version: 6.0.1
- Pandas version: 1.4.2