[Doctests] Correct filenaming #16599

Merged
improve quicktour
patrickvonplaten committed Apr 5, 2022
commit 125609c66b6225b6eb79642ee209734dfb9ac276
14 changes: 7 additions & 7 deletions docs/source/en/quicktour.mdx
@@ -115,23 +115,23 @@ Create a [`pipeline`] with the task you want to solve for and the model you want
>>> speech_recognizer = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h")
```

Next, load a dataset (see the 🤗 Datasets [Quick Start](https://huggingface.co/docs/datasets/quickstart.html) for more details) you'd like to iterate over. For example, let's load the [SUPERB](https://huggingface.co/datasets/superb) dataset:
Next, load a dataset (see the 🤗 Datasets [Quick Start](https://huggingface.co/docs/datasets/quickstart.html) for more details) you'd like to iterate over. For example, let's load the [MInDS-14](https://huggingface.co/datasets/PolyAI/minds14) dataset:
patrickvonplaten (Contributor, Author) commented:
@sgugger @stevhliu - is this ok for you?

I would like to replace superb here with PolyAI/minds14, because SUPERB has 6GB of data, which usually takes around 5 minutes to download and preprocess at a 10MB/s download speed. PolyAI/minds14 is only ~400MB, so it should take around 40 seconds, which makes the quicktour much nicer to follow IMO.

Ok for you?

It's also nicer for the doctests, but that shouldn't be the main reason.
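
For anyone following along, here is a minimal sketch of what the swap looks like in practice. It assumes the PolyAI/minds14 dataset is available on the Hub with an `en-US` configuration; the timing print is purely illustrative and not part of the quicktour.

```py
import time

from datasets import load_dataset

# MInDS-14 (en-US config) is only a few hundred MB, so on a typical
# connection this should finish in well under a minute.
start = time.time()
dataset = load_dataset("PolyAI/minds14", name="en-US", split="train")
print(f"Loaded {len(dataset)} examples in {time.time() - start:.0f}s")

# Inspect the columns the quicktour relies on ("path" holds the audio file paths).
print(dataset.features)
```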

Collaborator:
Completely fine by me!


```py
>>> import datasets

>>> dataset = datasets.load_dataset("superb", name="asr", split="test") # doctest: +IGNORE_RESULT
>>> dataset = datasets.load_dataset("minds14", name="en-US", split="train") # doctest: +IGNORE_RESULT
```

You can pass a whole dataset pipeline:

```py
>>> files = dataset["file"]
>>> files = dataset["path"]
>>> speech_recognizer(files[:4])
[{'text': 'HE HOPED THERE WOULD BE STEW FOR DINNER TURNIPS AND CARROTS AND BRUISED POTATOES AND FAT MUTTON PIECES TO BE LADLED OUT IN THICK PEPPERED FLOWER FAT AND SAUCE'},
{'text': 'STUFFERED INTO YOU HIS BELLY COUNSELLED HIM'},
{'text': 'AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BROTHELS'},
{'text': 'HO BERTIE ANY GOOD IN YOUR MIND'}]
[{'text': 'I WOULD LIKE TO SET UP A JOINT ACCOUNT WITH MY PARTNER HOW DO I PROCEED WITH DOING THAT'},
{'text': "FONDERING HOW I'D SET UP A JOIN TO HELL T WITH MY WIFE AND WHERE THE AP MIGHT BE"},
{'text': "I I'D LIKE TOY SET UP A JOINT ACCOUNT WITH MY PARTNER I'M NOT SEEING THE OPTION TO DO IT ON THE APSO I CALLED IN TO GET SOME HELP CAN I JUST DO IT OVER THE PHONE WITH YOU AND GIVE YOU THE INFORMATION OR SHOULD I DO IT IN THE AP AN I'M MISSING SOMETHING UQUETTE HAD PREFERRED TO JUST DO IT OVER THE PHONE OF POSSIBLE THINGS"},
{'text': 'HOW DO I FURN A JOINA COUT'}]
```

For a larger dataset where the inputs are big (like in speech or vision), you will want to pass along a generator instead of a list that loads all the inputs in memory. See the [pipeline documentation](./main_classes/pipelines) for more information.
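
As a side note for reviewers following the diff, here is a minimal sketch of the generator pattern that sentence refers to; the `data` helper and the printed field are illustrative and not part of the quicktour itself.

```py
def data():
    # Yield one audio file path at a time so the whole dataset is never held in memory
    for path in dataset["path"]:
        yield path

# The pipeline consumes the generator lazily and yields results one by one
for result in speech_recognizer(data()):
    print(result["text"])
```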
13 changes: 6 additions & 7 deletions docs/source/es/quicktour.mdx
@@ -115,23 +115,22 @@ Crea un [`pipeline`] con la tarea que deseas resolver y el modelo que quieres us
>>> speech_recognizer = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h", device=0)
```

A continuación, carga el dataset (ve 🤗 Datasets [Quick Start](https://huggingface.co/docs/datasets/quickstart.html) para más detalles) sobre el que quisieras iterar. Por ejemplo, vamos a cargar el dataset [SUPERB](https://huggingface.co/datasets/superb):
A continuación, carga el dataset (ve 🤗 Datasets [Quick Start](https://huggingface.co/docs/datasets/quickstart.html) para más detalles) sobre el que quisieras iterar. Por ejemplo, vamos a cargar el dataset [MInDS-14](https://huggingface.co/datasets/PolyAI/minds14):

```py
>>> import datasets

>>> dataset = datasets.load_dataset("superb", name="asr", split="test") # doctest: +IGNORE_RESULT
>>> dataset = datasets.load_dataset("minds14", name="en-US", split="train") # doctest: +IGNORE_RESULT
```

Puedes pasar un pipeline para un dataset:

```py
>>> files = dataset["file"]
>>> files = dataset["path"]
>>> speech_recognizer(files[:4])
[{'text': 'HE HOPED THERE WOULD BE STEW FOR DINNER TURNIPS AND CARROTS AND BRUISED POTATOES AND FAT MUTTON PIECES TO BE LADLED OUT IN THICK PEPPERED FLOWER FAT AND SAUCE'},
{'text': 'STUFFERED INTO YOU HIS BELLY COUNSELLED HIM'},
{'text': 'AFTER EARLY NIGHTFALL THE YELLOW LAMPS WOULD LIGHT UP HERE AND THERE THE SQUALID QUARTER OF THE BROTHELS'},
{'text': 'HO BERTIE ANY GOOD IN YOUR MIND'}]
[{'text': 'I WOULD LIKE TO SET UP A JOINT ACCOUNT WITH MY PARTNER HOW DO I PROCEED WITH DOING THAT'},
{'text': "FONDERING HOW I'D SET UP A JOIN TO HELL T WITH MY WIFE AND WHERE THE AP MIGHT BE"},
{'text': "I I'D LIKE TOY SET UP A JOINT ACCOUNT WITH MY PARTNER I'M NOT SEEING THE OPTION TO DO IT ON THE APSO I CALLED IN TO GET SOME HELP CAN I JUST DO IT OVER THE PHONE WITH YOU AND GIVE YOU THE INFORMATION OR SHOULD I DO IT IN THE AP AN I'M MISSING SOMETHING UQUETTE HAD PREFERRED TO JUST DO IT OVER THE PHONE OF POSSIBLE THINGS"},
```

Para un dataset más grande, donde los inputs son de mayor tamaño (como en habla/audio o visión), querrás pasar un generador en lugar de una lista que carga todos los inputs en memoria. Ve la [documentación del pipeline](./main_classes/pipelines) para más información.