Conversation

@CloseChoice (Contributor) commented Oct 13, 2025

Add support for NIfTI.

supports #7804

This PR follows #7325 very closely

I am a bit unsure what we need to add to the document_dataset.mdx and document_load.mdx. I should probably create a dataset on the hub first to create this guide instead of copy+pasting from PDF.

Open todos:

  • [ ] create NIfTI dataset on the hub
  • [ ] update document_dataset.mdx and document_load.mdx

EDIT:
I tested with two datasets I created on the hub, one with gzipped files (extension .nii.gz) and one with unzipped files (.nii), and both seem to work fine. Loading locally works as well.
Here is the script that I ran against the hub:

from datasets import load_dataset
import nibabel as nib


dataset = load_dataset(
    "TobiasPitters/test-nifti-unzipped",
    split="test",  # load as a single Dataset, not a DatasetDict
)

print("length dataset unzipped:", len(dataset))
for item in dataset:
    assert isinstance(item["nifti"], nib.nifti1.Nifti1Image)

dataset = load_dataset(
    "TobiasPitters/test-nifti",
    split="train",  # load as a single Dataset, not a DatasetDict
)
print("length dataset zipped:", len(dataset))
for item in dataset:
    assert isinstance(item["nifti"], nib.nifti1.Nifti1Image)

@CloseChoice CloseChoice marked this pull request as ready for review October 14, 2025 17:51
@lhoestq (Member) left a comment


Wow this is awesome ! the code looks all good to me

I am a bit unsure what we need to add to the document_dataset.mdx and document_load.mdx. I should probably create a dataset on the hub first to create this guide instead of copy+pasting from PDF.

imo you could get some inspiration from the PDF docs indeed, but showcase how it works for an actual dataset, and ideally what the main usages of Nifti1Image are, both in general and in a training setting (converting to PIL.Image or a torch tensor, for example)
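For illustration, here is a minimal sketch (not part of this PR) of what that kind of usage could look like, reusing the test dataset and split from the script above; the conversions themselves are plain nibabel / NumPy / PIL / torch calls:

# Minimal sketch (not from this PR): common ways to consume a decoded
# Nifti1Image in a training setting.
import numpy as np
import torch
from PIL import Image

from datasets import load_dataset

dataset = load_dataset("TobiasPitters/test-nifti-unzipped", split="test")

nifti_img = dataset[0]["nifti"]    # nibabel.nifti1.Nifti1Image
volume = nifti_img.get_fdata()     # float64 voxel array, e.g. shape (H, W, D)

# whole volume as a torch tensor, e.g. for a 3D model
tensor = torch.from_numpy(volume).float()

# middle axial slice as a PIL image, rescaled to 0-255 for visualization
slice_2d = volume[:, :, volume.shape[2] // 2]
slice_2d = ((slice_2d - slice_2d.min()) / (slice_2d.max() - slice_2d.min() + 1e-8) * 255).astype(np.uint8)
pil_image = Image.fromarray(slice_2d)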

return test_case


def require_nibabel(test_case):
Member

don't forget to add nibabel to setup.py in the test dependencies

Contributor Author

Done, thanks!

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@lhoestq (Member) commented Oct 20, 2025

Btw I couldn't resist but share your PR with the community online on twitter already, I hope this is fine !

@CloseChoice (Contributor Author) left a comment


Alright, docs are updated, code in the docs is tested as well and works. Happy for another review round.

EDIT: I created a proper NIfTI dataset on the hub: https://huggingface.co/datasets/TobiasPitters/NIfTI-SIRF-exercises-geometry, but I thought it's not good practice to reference personal (even if public) datasets in the docs.


@CloseChoice CloseChoice requested a review from lhoestq October 21, 2025 15:53
@CloseChoice (Contributor Author)

Btw I couldn't resist but share your PR with the community online on twitter already, I hope this is fine !

Wow, that was quick! Thanks, already liked your comment, I appreciate it!

@lhoestq (Member) left a comment


lgtm !

@lhoestq lhoestq merged commit 5138876 into huggingface:main Oct 24, 2025
2 of 14 checks passed
@CloseChoice CloseChoice deleted the add-nifti-support branch October 24, 2025 14:32
@CloseChoice CloseChoice mentioned this pull request Oct 28, 2025
@lhoestq (Member) commented Nov 4, 2025

NIfTI support is out in datasets==4.4.0 ! 🥳

Btw do you know a good NIfTI visualizer in HTML/JS or in Python? We could add something like .to_html() (or equivalent) to view data in a notebook, and enable the Dataset Viewer on HF if it can be useful

cc @cfahlgren1 @georgiachanning for viz

@JINAILAB commented Nov 5, 2025

Hi, I have a quick question while testing the new NIfTI support.

I cloned the latest main branch, installed it locally using pip install -e ., and ran the following:

from datasets import load_dataset

dataset = load_dataset(
    "TobiasPitters/NIfTI-SIRF-exercises-geometry",
    split="train"
)
dataset[0]['nifti'].get_fdata()[0].shape

However, I’m getting the following error:

FileNotFoundError: No such file or no access: 'data/nifti/OBJECT_phantom_T2W_TSE_Sag_18_1.nii'

When I manually place the NIfTI file in that local path, it works fine.
But I assume the intended behavior is for the .nii file to be included in the dataset hosted on the Hub, so that load_dataset() automatically loads it without relying on a local file path.

Interestingly, after adding the embed_storage method, it started working properly.
Could you please confirm whether this is the expected behavior, or if my previous setup was missing something?

@CloseChoice (Contributor Author)

from datasets import load_dataset

dataset = load_dataset(
    "TobiasPitters/NIfTI-SIRF-exercises-geometry",
    split="train"
)
dataset[0]['nifti'].get_fdata()[0].shape

Thanks for the report, I can confirm this. It's a problem with the dataset. Can you try this:

from datasets import load_dataset
import nibabel as nib

dataset = load_dataset(
        "TobiasPitters/test-nifti-unzipped",
        split="train"  # Load as single Dataset, not DatasetDict
)

print("length dataset:", len(dataset))
for item in dataset:
    assert isinstance(item["nifti"], nib.nifti1.Nifti1Image)

If you're interested in "TobiasPitters/NIfTI-SIRF-exercises-geometry" I can give it a shot and reupload it correctly, otherwise I'd take it down.
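For reference, a rough sketch of how such a dataset could be (re)uploaded so that the NIfTI bytes are embedded instead of referenced by local paths. It assumes the Nifti feature is exposed as datasets.Nifti and that embed_storage support is in place (see the discussion further down); file paths and the repo id are placeholders:

from datasets import Dataset, Nifti

# local .nii / .nii.gz files; placeholder paths for illustration
files = ["scans/volume_000.nii", "scans/volume_001.nii.gz"]

ds = Dataset.from_dict({"nifti": files}).cast_column("nifti", Nifti())

# push_to_hub should embed the file bytes into the Parquet shards so that
# load_dataset() works without relying on local file paths
ds.push_to_hub("your-username/your-nifti-dataset")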

@CloseChoice (Contributor Author) commented Nov 5, 2025

NIfTI

I would suggest https://github.com/rii-mango/Papaya, just tested it and it looks quite good. How would it work to add that to the dataset-viewer?

And I assume you'd like to have the to_html feature on the Nifti class?

EDIT: do we have anything like this already for other features? Couldn't find anything. We could do this in different ways: simply inlining Papaya, or building custom components (like e.g. SHAP does). If we decide on the latter, we'll need to build JS components in datasets, so we'll need a bundler etc., but that provides the highest flexibility. If that's of interest, I can take a look into this.
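To sketch the to_html idea (purely illustrative, not an interactive viewer like Papaya; the helper name and approach are assumptions, not part of this PR), something like the following could render the middle slice of a Nifti1Image inline in a notebook:

import base64
import io

import numpy as np
from PIL import Image


def nifti_slice_to_html(nifti_img, axis=2):
    # take the middle slice along the given axis and rescale to 0-255
    volume = nifti_img.get_fdata()
    slice_2d = np.take(volume, volume.shape[axis] // 2, axis=axis)
    lo, hi = slice_2d.min(), slice_2d.max()
    slice_2d = ((slice_2d - lo) / (hi - lo + 1e-8) * 255).astype(np.uint8)
    # encode as an inline base64 PNG <img> tag
    buffer = io.BytesIO()
    Image.fromarray(slice_2d).save(buffer, format="PNG")
    encoded = base64.b64encode(buffer.getvalue()).decode("ascii")
    return f'<img src="data:image/png;base64,{encoded}" alt="NIfTI slice"/>'

In a notebook this could then be displayed with IPython.display.HTML(nifti_slice_to_html(item["nifti"])).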

@JINAILAB commented Nov 5, 2025

Hi, thanks for your help earlier. I tested the dataset you shared (TobiasPitters/test-nifti-unzipped), and it works perfectly — all NIfTI files load correctly and get_fdata() returns valid arrays.

However, when I upload my own dataset to the Hugging Face Hub using my code, it doesn’t work properly. The NIfTI files seem not to decode correctly. can you check it?

from datasets import ClassLabel, Dataset, Nifti

# train_df is a pandas DataFrame with a "nifti" column of file paths and a "label" column
train_dataset = Dataset.from_pandas(train_df)

def cast_dataset(dataset):
    dataset = dataset.cast_column("nifti", Nifti(decode=True))
    dataset = dataset.cast_column("label", ClassLabel(num_classes=10, names=[str(i) for i in range(10)]))
    return dataset

train_dataset = cast_dataset(train_dataset)

@CloseChoice (Contributor Author) commented Nov 6, 2025

Hi, thanks for your help earlier. I tested the dataset you shared (TobiasPitters/test-nifti-unzipped), and it works perfectly — all NIfTI files load correctly and get_fdata() returns valid arrays.

However, when I upload my own dataset to the Hugging Face Hub using my code, it doesn’t work properly. The NIfTI files seem not to decode correctly. can you check it?

from datasets import ClassLabel, Dataset, Nifti

# train_df is a pandas DataFrame with a "nifti" column of file paths and a "label" column
train_dataset = Dataset.from_pandas(train_df)

def cast_dataset(dataset):
    dataset = dataset.cast_column("nifti", Nifti(decode=True))
    dataset = dataset.cast_column("label", ClassLabel(num_classes=10, names=[str(i) for i in range(10)]))
    return dataset

train_dataset = cast_dataset(train_dataset)

Are you using gzipped NIfTI files? It seems like there is an issue with those. I found that in decode_example the path ends up looking like 'gzip://T1.nii::/home/tobias/programming/github/datasets/nitest-balls1/NIFTI/T1.nii.gz', so we go down the remote-path branch, which results in a KeyError since repo_id is not specified. The root cause is the DownloadManager.extract method, where we extract compressed files.

@lhoestq : what do you suggest here? We could probably do something like this in the decode_example:

if path.startswith("gzip:"):
    path = path.split("::")[-1]

Though I would need to test if this is actually OS agnostic.
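For illustration, applying that snippet to the example path from above:

# illustration only: what the proposed split does to the chained gzip path
path = "gzip://T1.nii::/home/tobias/programming/github/datasets/nitest-balls1/NIFTI/T1.nii.gz"
if path.startswith("gzip:"):
    path = path.split("::")[-1]
print(path)  # -> /home/tobias/programming/github/datasets/nitest-balls1/NIFTI/T1.nii.gz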

@CloseChoice CloseChoice mentioned this pull request Nov 6, 2025
@lhoestq (Member) commented Nov 6, 2025

I think the issue with gzip can be fixed using the same code as in Image() imo:

- try:
-     repo_id = string_to_dict(source_url, pattern)["repo_id"]
-     token = token_per_repo_id.get(repo_id)
- except ValueError:
-     token = None
+ source_url_fields = string_to_dict(source_url, pattern)
+ token = (
+     token_per_repo_id.get(source_url_fields["repo_id"]) if source_url_fields is not None else None
+ )

@CloseChoice (Contributor Author)

Hi, thanks for your help earlier. I tested the dataset you shared (TobiasPitters/test-nifti-unzipped), and it works perfectly — all NIfTI files load correctly and get_fdata() returns valid arrays.

However, when I upload my own dataset to the Hugging Face Hub using my code, it doesn’t work properly. The NIfTI files seem not to decode correctly. can you check it?

from datasets import ClassLabel, Dataset, Nifti

# train_df is a pandas DataFrame with a "nifti" column of file paths and a "label" column
train_dataset = Dataset.from_pandas(train_df)

def cast_dataset(dataset):
    dataset = dataset.cast_column("nifti", Nifti(decode=True))
    dataset = dataset.cast_column("label", ClassLabel(num_classes=10, names=[str(i) for i in range(10)]))
    return dataset

train_dataset = cast_dataset(train_dataset)

Can you please try with this branch:

pip install git+https://github.com/CloseChoice/datasets.git@fix-embed-storage-nifti

This should fix the existing problems with NIfTI.

@JINAILAB commented Nov 7, 2025

I checked and it looks like the fix-embed-storage-nifti branch has already been merged into main, and it works fine. Thanks!
