added instruction to config HF dataset viewer for image dataset #42

Open: wants to merge 1 commit into `main`

`docs/wiki-guide/The-Hugging-Face-Dataset-Upload-Guide.md`: 102 additions, 0 deletions

@@ -77,3 +77,105 @@ git add
git commit -m 'comments'
git push
```
## Configure the Dataset Viewer for Image Datasets

The Hugging Face Dataset Viewer lets you browse an image dataset, together with the metadata associated with each image, directly in your web browser.

![Hugging Face Dataset Viewer](images/HF-dataset-upload/hf_dataset_viewer.png){ loading=lazy }

To set up your repository for the Dataset Viewer:

- Create a `data` folder in the root directory of your repository.
- Inside `data`, create one folder per split, named `train`, `test`, and `validation`, and place each split's image files (e.g., `.jpg`, `.png`) in the corresponding folder.

Example structure:
``` bash
repo_root
├── data
│ ├── test
│ │ ├── img_1.png
│ │ ├── img_2.png
│ │ └── img_3.png
│ ├── train
│ └── validation
└── README.md
```
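
If your images currently sit in a single flat folder, a short script can copy them into the split layout above. The sketch below is a minimal example, assuming you already know which split each file belongs to; the function name `organize_splits` and the `assignments` mapping are hypothetical and not part of any Hugging Face tool.

```python
import shutil
from pathlib import Path

# The three split folder names used in this guide.
SPLITS = {"train", "test", "validation"}


def organize_splits(src_dir, repo_root, assignments):
    """Copy images from src_dir into repo_root/data/<split>/.

    assignments (hypothetical input) maps an image file name such as
    "img_1.png" to its split name. Returns the destination paths written.
    """
    src_dir = Path(src_dir)
    data_dir = Path(repo_root) / "data"
    written = []
    for file_name, split in assignments.items():
        if split not in SPLITS:
            raise ValueError(f"unknown split {split!r} for {file_name}")
        dest_dir = data_dir / split
        dest_dir.mkdir(parents=True, exist_ok=True)  # create data/<split> on first use
        dest = dest_dir / file_name
        shutil.copy2(src_dir / file_name, dest)  # copy, leaving the originals untouched
        written.append(dest)
    return written
```

For example, `organize_splits("raw_images", "repo_root", {"img_1.png": "test", "img_2.png": "train"})` would create `repo_root/data/test/img_1.png` and `repo_root/data/train/img_2.png`. Copying rather than moving keeps the source folder intact in case a split assignment needs to change.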

!!! warning "Be careful with folder names"
    Avoid using "test", "train", or "validation" in the names of any other folders in your repository. The HF Dataset Viewer may mistake such folders for splits and display the wrong folder.
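
To catch such names before uploading, you could scan the repository for folders whose names contain a split keyword but are not the split folders themselves. This is a rough heuristic of our own (plain case-insensitive substring matching), not the Dataset Viewer's actual matching rules, and `find_ambiguous_dirs` is a hypothetical helper.

```python
from pathlib import Path

SPLIT_KEYWORDS = ("train", "test", "validation")


def find_ambiguous_dirs(repo_root):
    """Return folders (relative to repo_root, as POSIX strings) whose names
    contain a split keyword but which are not the expected data/<split> folders.

    Heuristic only: simple substring matching, not the Viewer's own rules.
    """
    root = Path(repo_root)
    expected = {root / "data" / split for split in SPLIT_KEYWORDS}
    flagged = []
    for path in sorted(root.rglob("*")):
        if not path.is_dir() or path in expected:
            continue
        # Skip hidden folders such as .git and anything inside them.
        if any(part.startswith(".") for part in path.relative_to(root).parts):
            continue
        name = path.name.lower()
        if any(keyword in name for keyword in SPLIT_KEYWORDS):
            flagged.append(path.relative_to(root).as_posix())
    return flagged
```

A folder such as `scripts/test_utils` would be flagged, which is a prompt to rename it rather than proof that the Viewer will misread it.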

To display additional metadata columns alongside your images in the Dataset Viewer, create a `metadata.csv` file. This file **must** include a `file_name` column that links each image file to its row of metadata. **Place `metadata.csv` either in the same directory as the images it describes or in any parent directory.**
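
If your metadata is already available in Python (for example, loaded from a spreadsheet), the standard-library `csv` module can write a compatible file. A minimal sketch, assuming each record is a dict that contains a `file_name` key; `write_metadata_csv` is a hypothetical helper, and the `genus`/`species` columns used in the examples below are only illustrative.

```python
import csv
from pathlib import Path


def write_metadata_csv(directory, records):
    """Write directory/metadata.csv with file_name as the first column.

    records is a list of dicts; each must contain a "file_name" key.
    Returns the path of the written file.
    """
    if not records:
        raise ValueError("no records to write")
    # Keep file_name first, then any other columns in first-seen order.
    columns = ["file_name"]
    for record in records:
        if "file_name" not in record:
            raise ValueError(f"record missing file_name: {record}")
        for key in record:
            if key not in columns:
                columns.append(key)
    path = Path(directory) / "metadata.csv"
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=columns)
        writer.writeheader()
        writer.writerows(records)  # keys missing from a record become empty cells
    return path
```

Writing through `csv.DictWriter` keeps every row at the same number of fields as the header, which avoids hand-editing mistakes such as a stray trailing comma.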

**Example: metadata in the same directory as images**

Folder structure:
``` bash
repo_root
├── data
│ ├── test
│ │ ├── img_1.png
│ │ ├── img_2.png
│ │ ├── img_3.png
│ │ └── metadata.csv
│ ├── train
│ └── validation
└── README.md
```
`metadata.csv`:
```csv
file_name,genus,species
img_1.png,acinonyx,jubatus
img_2.png,antidorcas,marsupialis
img_3.png,bos,taurus
```
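
Before uploading, it can help to check that `metadata.csv` and the image files agree: every row should have as many fields as the header (a stray trailing comma, for instance, adds an extra empty field), every `file_name` should point at an existing image, and every image should have a row. The sketch below runs these checks with the standard library; `check_metadata` and the set of image extensions are assumptions for illustration.

```python
import csv
from pathlib import Path

# Assumed set of image extensions to look for; extend as needed.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def check_metadata(metadata_path):
    """Return a list of problems found in a metadata.csv file.

    file_name values are resolved relative to the folder holding metadata.csv.
    Row numbers count the header as row 1.
    """
    metadata_path = Path(metadata_path)
    base = metadata_path.parent
    # utf-8-sig also accepts files saved with a byte-order mark (e.g. by Excel).
    with metadata_path.open(newline="", encoding="utf-8-sig") as f:
        rows = list(csv.reader(f))
    if not rows or "file_name" not in rows[0]:
        return ["missing file_name column in header"]
    header = rows[0]
    name_idx = header.index("file_name")
    problems = []
    listed = set()
    for row_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(f"row {row_no}: expected {len(header)} fields, got {len(row)}")
        if len(row) <= name_idx:
            continue  # no file_name value to check
        file_name = row[name_idx]
        listed.add(file_name)
        if not (base / file_name).is_file():
            problems.append(f"row {row_no}: file not found: {file_name}")
    # Report images below the metadata file that have no row.
    for image in sorted(base.rglob("*")):
        if image.is_file() and image.suffix.lower() in IMAGE_EXTENSIONS:
            rel = image.relative_to(base).as_posix()
            if rel not in listed:
                problems.append(f"no metadata row for {rel}")
    return problems
```

An empty list means no problems were found by these particular checks.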


**Example: metadata in a parent directory, referencing images in subfolders**

Folder structure:
``` bash
repo_root
├── data
│ ├── test
│ │ ├── metadata.csv
│ │ ├── bird
│ │ │ └── img_1.png
│ │ ├── insect
│ │ │ └── img_2.png
│ │ └── plant
│ │ └── img_3.png
│ ├── train
│ └── validation
└── README.md
```

!!! note
    When images are in subfolders, write each `file_name` as a path relative to the folder that contains `metadata.csv` (e.g., `bird/img_1.png`).

`metadata.csv`:
```csv
file_name,genus,species
bird/img_1.png,acinonyx,jubatus
insect/img_2.png,antidorcas,marsupialis
plant/img_3.png,bos,taurus
```
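
When images are spread across subfolders, the relative paths for the `file_name` column can be generated instead of typed by hand. A minimal sketch, assuming `metadata.csv` sits in the split folder (e.g., `data/test`) and that forward slashes are wanted on every operating system; `relative_file_names` is a hypothetical helper.

```python
from pathlib import Path

# Assumed set of image extensions to look for; extend as needed.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def relative_file_names(metadata_dir):
    """List image paths below metadata_dir, relative to it, with "/" separators.

    These are the values for the file_name column of a metadata.csv placed
    in metadata_dir, e.g. "bird/img_1.png".
    """
    base = Path(metadata_dir)
    return sorted(
        path.relative_to(base).as_posix()  # as_posix() uses "/" even on Windows
        for path in base.rglob("*")
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
    )
```

For the folder structure above, calling `relative_file_names("data/test")` would yield `bird/img_1.png`, `insect/img_2.png`, and `plant/img_3.png`.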

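Because the images in this example sit in subfolders (`bird`, `insect`, `plant`), the Dataset Viewer may automatically add a `label` column derived from those directory names. You can control this with the `drop_labels` option in the YAML header of the dataset card (`README.md`): `drop_labels: false` keeps the `label` column, and `drop_labels: true` disables it. If your directory names have no special meaning, set `drop_labels: true`.

Dataset card `README.md`:
``` YAML
configs:
- config_name: default
  drop_labels: false
```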
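
To decide whether to keep or drop these labels, it can help to preview which values the directory names would produce. The sketch below approximates this by counting images per parent folder name below a split directory; it is our own simplification for previewing, not the `datasets` library's actual label-inference logic, and `preview_labels` is a hypothetical helper.

```python
from collections import Counter
from pathlib import Path

# Assumed set of image extensions to look for; extend as needed.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def preview_labels(split_dir):
    """Count images per immediate parent folder name below split_dir.

    Images placed directly inside split_dir are skipped, since they have
    no class subfolder. Approximation only, not the library's logic.
    """
    base = Path(split_dir)
    counts = Counter()
    for path in base.rglob("*"):
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS and path.parent != base:
            counts[path.parent.name] += 1
    return dict(sorted(counts.items()))
```

If the result lists folder names that are purely organizational (for example `batch_01`, `batch_02`), that suggests setting `drop_labels: true`.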

**Additional references:**

- Examples:
    - [imageomics/IDLE-OO-Camera-Traps](https://huggingface.co/datasets/imageomics/IDLE-OO-Camera-Traps)
    - [HF Image Dataset Collection](https://huggingface.co/collections/datasets-examples/image-dataset-6568e7cf28639db76eb92d65)
- Hugging Face documentation:
    - [Data files configuration](https://huggingface.co/docs/hub/datasets-data-files-configuration)
    - [Dataset file names & splits](https://huggingface.co/docs/hub/datasets-file-names-and-splits)
    - [Configure a custom dataset structure](https://huggingface.co/docs/hub/datasets-manual-configuration)
    - [Configure an image dataset](https://huggingface.co/docs/hub/datasets-image)
