Commit: less script docs (#6993)

lhoestq authored Jun 27, 2024
1 parent b275462 commit 6cf563f

Showing 8 changed files with 23 additions and 53 deletions.
4 changes: 1 addition & 3 deletions docs/source/audio_dataset.mdx
@@ -14,8 +14,6 @@ There are several methods for creating and sharing an audio dataset:

* Create an audio dataset repository with the `AudioFolder` builder. This is a no-code solution for quickly creating an audio dataset with several thousand audio files (see the sketch after this list).

- * Create an audio dataset by writing a loading script. This method is for advanced users and requires more effort and coding, but you have greater flexibility over how a dataset is defined, downloaded, and generated which can be useful for more complex or large scale audio datasets.
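
For illustration, a minimal sketch of the no-code `AudioFolder` path, assuming a local folder of audio files (the path is a placeholder):

```py
>>> from datasets import load_dataset
>>> # Each subfolder name under data_dir becomes a class label
>>> dataset = load_dataset("audiofolder", data_dir="/path/to/audio/folder")
```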


<Tip>

@@ -175,7 +173,7 @@ Some audio datasets, like those found in [Kaggle competitions](https://www.kaggl

</Tip>

- ## Loading script
+ ## (Legacy) Loading script

Write a dataset loading script to manually create a dataset.
It defines a dataset's splits and configurations, and handles downloading and generating the dataset examples.
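
Not part of the commit, but as a minimal sketch of what such a script involves (class, file, and label names are placeholders, not a real dataset):

```py
import datasets

class MyAudioDataset(datasets.GeneratorBasedBuilder):
    def _info(self):
        # Declare the schema of the generated examples
        return datasets.DatasetInfo(
            features=datasets.Features(
                {"audio": datasets.Audio(), "label": datasets.ClassLabel(names=["cat", "dog"])}
            )
        )

    def _split_generators(self, dl_manager):
        # Download the data and define one generator per split
        archive = dl_manager.download("data/train.tar.gz")
        return [
            datasets.SplitGenerator(
                name=datasets.Split.TRAIN,
                gen_kwargs={"files": dl_manager.iter_archive(archive)},
            )
        ]

    def _generate_examples(self, files):
        # Yield (key, example) pairs, one per audio file in the archive
        for idx, (path, f) in enumerate(files):
            yield idx, {"audio": {"path": path, "bytes": f.read()}, "label": "cat"}
```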
2 changes: 1 addition & 1 deletion docs/source/create_dataset.mdx
@@ -105,7 +105,7 @@ You can also create a dataset from local files by specifying the path to the dat

## Next steps

- We didn't mention this in the tutorial, but you can also create a dataset with a loading script. A loading script is a more manual and code-intensive method for creating a dataset, but it also gives you the most flexibility and control over how a dataset is generated. It lets you configure additional options such as creating multiple configurations within a dataset, or enabling your dataset to be streamed.
+ We didn't mention this in the tutorial, but you can also create a dataset with a loading script. A loading script is a more manual and code-intensive method for creating a dataset, and it is not well supported on Hugging Face, though in some rare cases it can still be helpful.

To learn more about how to write loading scripts, take a look at the <a href="https://huggingface.co/docs/datasets/main/en/image_dataset#loading-script"><span class="underline decoration-yellow-400 decoration-2 font-semibold">image loading script</span></a>, <a href="https://huggingface.co/docs/datasets/main/en/audio_dataset"><span class="underline decoration-pink-400 decoration-2 font-semibold">audio loading script</span></a>, and <a href="https://huggingface.co/docs/datasets/main/en/dataset_script"><span class="underline decoration-green-400 decoration-2 font-semibold">text loading script</span></a> guides.

8 changes: 0 additions & 8 deletions docs/source/filesystems.mdx
@@ -142,14 +142,6 @@ Load a dataset builder from the Hugging Face Hub (see [how to load from the Hugg
>>> builder.download_and_prepare(output_dir, storage_options=storage_options, file_format="parquet")
```
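
For context, a sketch of the setup this snippet assumes (the bucket name and credentials are placeholders):

```py
>>> from datasets import load_dataset_builder
>>> storage_options = {"key": "aws_access_key_id", "secret": "aws_secret_access_key"}  # s3fs credentials
>>> output_dir = "s3://my-bucket/imdb"
>>> builder = load_dataset_builder("imdb")
>>> builder.download_and_prepare(output_dir, storage_options=storage_options, file_format="parquet")
```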

- Load a dataset builder using a loading script (see [how to load a local loading script](./loading#local-loading-script)):
-
- ```py
- >>> output_dir = "s3://my-bucket/imdb"
- >>> builder = load_dataset_builder("path/to/local/loading_script/loading_script.py")
- >>> builder.download_and_prepare(output_dir, storage_options=storage_options, file_format="parquet")
- ```

Use your own data files (see [how to load local and remote files](./loading#local-and-remote-files)):

```py
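>>> # (hypothetical continuation; the diff view truncates the original example here)
>>> data_files = {"train": "path/to/train.csv"}
>>> builder = load_dataset_builder("csv", data_files=data_files)
>>> builder.download_and_prepare(output_dir, storage_options=storage_options, file_format="parquet")
```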
5 changes: 3 additions & 2 deletions docs/source/image_dataset.mdx
@@ -2,8 +2,9 @@

There are two methods for creating and sharing an image dataset. This guide will show you how to:

+ * Create an image dataset from local files in Python with [`Dataset.push_to_hub`]. This is an easy way that requires only a few steps (see the sketch after this list).

* Create an image dataset with `ImageFolder` and some metadata. This is a no-code solution for quickly creating an image dataset with several thousand images.
- * Create an image dataset by writing a loading script. This method is a bit more involved, but you have greater flexibility over how a dataset is defined, downloaded, and generated which can be useful for more complex or large scale image datasets.
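
As a minimal sketch of the `push_to_hub` path (the folder path and repository name are placeholders):

```py
>>> from datasets import load_dataset
>>> dataset = load_dataset("imagefolder", data_dir="/path/to/images")
>>> dataset.push_to_hub("my-username/my-image-dataset")
```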

<Tip>

@@ -188,7 +189,7 @@ Load your WebDataset and it will create one column per file suffix (here "jpg" an
{"bbox": [[302.0, 109.0, 73.0, 52.0]], "categories": [0]}
```

- ## Loading script
+ ## (Legacy) Loading script

Write a dataset loading script to share a dataset. It defines a dataset's splits and configurations, and handles downloading and generating a dataset. The script is located in the same folder or repository as the dataset and should have the same name.
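
For instance, a script-based repository might be laid out like this (a sketch; the names are placeholders):

```
my_dataset/
├── my_dataset.py
└── data/
    └── train.tar.gz
```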

6 changes: 1 addition & 5 deletions docs/source/load_hub.mdx
@@ -102,11 +102,7 @@ Then load the configuration you want:

## Remote code

- Certain datasets repositories contain a loading script with the Python code used to generate the dataset.
- Those datasets are generally exported to Parquet by Hugging Face, so that 🤗 Datasets can load the dataset fast and without running a loading script.
-
- Even if a Parquet export is not available, you can still use any dataset with Python code in its repository with `load_dataset`.
- All files and code uploaded to the Hub are scanned for malware (refer to the Hub security documentation for more information), but you should still review the dataset loading scripts and authors to avoid executing malicious code on your machine. You should set `trust_remote_code=True` to use a dataset with a loading script, or you will get an error:
+ Certain dataset repositories contain a loading script with the Python code used to generate the dataset. All files and code uploaded to the Hub are scanned for malware (refer to the Hub security documentation for more information), but you should still review the dataset loading scripts and authors to avoid executing malicious code on your machine. You should set `trust_remote_code=True` to use a dataset with a loading script, or you will get an error:

```py
>>> from datasets import get_dataset_config_names, get_dataset_split_names, load_dataset
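>>> # (hypothetical continuation; the diff view truncates the original example here)
>>> dataset = load_dataset("allenai/c4", "en", split="train", trust_remote_code=True)
```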
47 changes: 16 additions & 31 deletions docs/source/loading.mdx
@@ -4,12 +4,12 @@ Your data can be stored in various places; they can be on your local machine's d

This guide will show you how to load a dataset from:

- - The Hub without a dataset loading script
- - Local loading script
+ - The Hugging Face Hub
- Local files
- In-memory data
- Offline
- A specific slice of a split
+ - Local loading script (legacy)

For more details specific to loading other dataset modalities, take a look at the <a class="underline decoration-pink-400 decoration-2 font-semibold" href="./audio_load">load audio dataset guide</a>, the <a class="underline decoration-yellow-400 decoration-2 font-semibold" href="./image_load">load image dataset guide</a>, or the <a class="underline decoration-green-400 decoration-2 font-semibold" href="./nlp_load">load text dataset guide</a>.

@@ -73,35 +73,6 @@ The `split` parameter can also map a data file to a specific split:
>>> c4_validation = load_dataset("allenai/c4", data_files=data_files, split="validation")
```
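
For context, the `data_files` mapping used above might look like this (a sketch; the original definition is hidden by the truncated diff):

```py
>>> data_files = {"validation": "en/c4-validation.*.json.gz"}
>>> c4_validation = load_dataset("allenai/c4", data_files=data_files, split="validation")
```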

- ## Local loading script
-
- You may have a 🤗 Datasets loading script locally on your computer. In this case, load the dataset by passing one of the following paths to [`load_dataset`]:
-
- - The local path to the loading script file.
- - The local path to the directory containing the loading script file (only if the script file has the same name as the directory).
-
- Pass `trust_remote_code=True` to allow 🤗 Datasets to execute the loading script:
-
- ```py
- >>> dataset = load_dataset("path/to/local/loading_script/loading_script.py", split="train", trust_remote_code=True)
- >>> dataset = load_dataset("path/to/local/loading_script", split="train", trust_remote_code=True) # equivalent because the file has the same name as the directory
- ```
-
- ### Edit loading script
-
- You can also edit a loading script from the Hub to add your own modifications. Download the dataset repository locally so any data files referenced by a relative path in the loading script can be loaded:
-
- ```bash
- git clone https://huggingface.co/datasets/eli5
- ```
-
- Make your edits to the loading script and then load it by passing its local path to [`~datasets.load_dataset`]:
-
- ```py
- >>> from datasets import load_dataset
- >>> eli5 = load_dataset("path/to/local/eli5")
- ```

## Local and remote files

Datasets can be loaded from local files stored on your computer and from remote files. The datasets are most likely stored as a `csv`, `json`, `txt` or `parquet` file. The [`load_dataset`] function can load each of these file types.
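
For example, a minimal sketch of loading a local CSV file (the filename is a placeholder):

```py
>>> from datasets import load_dataset
>>> dataset = load_dataset("csv", data_files="my_file.csv")
```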
@@ -534,3 +505,17 @@ In some instances, you may be simultaneously running multiple independent distri
>>> from datasets import load_metric
>>> metric = load_metric('glue', 'mrpc', num_process=num_process, process_id=process_id, experiment_id="My_experiment_10")
```

+ ## (Legacy) Local loading script
+
+ You may have a 🤗 Datasets loading script locally on your computer. In this case, load the dataset by passing one of the following paths to [`load_dataset`]:
+
+ - The local path to the loading script file.
+ - The local path to the directory containing the loading script file (only if the script file has the same name as the directory).
+
+ Pass `trust_remote_code=True` to allow 🤗 Datasets to execute the loading script:
+
+ ```py
+ >>> dataset = load_dataset("path/to/local/loading_script/loading_script.py", split="train", trust_remote_code=True)
+ >>> dataset = load_dataset("path/to/local/loading_script", split="train", trust_remote_code=True) # equivalent because the file has the same name as the directory
+ ```
2 changes: 0 additions & 2 deletions docs/source/repository_structure.mdx
@@ -277,5 +277,3 @@ my_dataset_repository/
├── shard_0.csv
└── shard_1.csv
```
- For more flexibility over how to load and generate a dataset, you can also write a [dataset loading script](./dataset_script).
2 changes: 1 addition & 1 deletion docs/source/upload_dataset.mdx
@@ -129,6 +129,6 @@ From here, you can go on to:

- Learn more about how to use 🤗 Datasets other functions to [process your dataset](process).
- [Stream large datasets](stream) without downloading them locally.
- - [Define your dataset splits and configurations](repository_structure) or [loading script](dataset_script) and share your dataset with the community.
+ - [Define your dataset splits and configurations](repository_structure) and share your dataset with the community.

If you have any questions about 🤗 Datasets, feel free to join and ask the community on our [forum](https://discuss.huggingface.co/c/datasets/10).
