
Refactor tests #55

Open
ZviBaratz opened this issue Jul 20, 2021 · 15 comments
Labels
enhancement (New feature or request) · help wanted (Extra attention is needed) · tests (Modification or enhancement of tests)

Comments

@ZviBaratz
Collaborator

Move test files to an independent repository and refactor fixtures.

ZviBaratz added the enhancement, help wanted and tests labels on Jul 20, 2021
ZviBaratz added this to the 1.2.0 milestone on Jul 20, 2021
@ZviBaratz
Collaborator Author

@matthew-brett better late than never.

A solution that might be beneficial to users and other packages is to create a "DICOM Gallery" repository holding a catalog of references to other repositories containing sample datasets. This approach is highly inspired by @neurolabusc's dcm_qa_* repositories, but aims to provide a more comprehensive source for raw or parsed value validation, as well as any modality or vendor-specific information (e.g. complete JSON headers including private tags, pixel array values in different formats, etc.). It would be best to create a template repository that can easily be forked to exhibit new DICOM series. Would be nice to include an MRIcroGL script to create a standardized visualization of the pixel array.

Once a gallery is available, dicom_parser and other packages can integrate it into their testing as required and keep the core code repository light.
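
To make the catalog idea more concrete, here is a purely illustrative sketch of how packages might query a machine-readable index in the gallery repository; the index.json name, its fields, and the entries are all hypothetical:

import json
from pathlib import Path

# Hypothetical index format: a list of entries, each pointing at a repository
# that holds one sample series plus its converted/parsed reference outputs.
index = json.loads(Path("index.json").read_text())

# Look up all Siemens MR series, e.g. to fetch them as test fixtures.
siemens_mr = [
    entry for entry in index["series"]
    if entry["modality"] == "MR" and entry["vendor"] == "Siemens"
]
for entry in siemens_mr:
    print(entry["repository"], entry["description"])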

@matthew-brett
Contributor

Just checking that I understand what we don't have now.

Is the idea to create a new repository, called - say - open-dicom/dicom-gallery, that would have several submodules, each of which contains example DICOM data with suitable conversions? And would the main repo have some sort of index so you can look up modalities, scanners etc. to find the repos you want?

What do the dcm_qa* repos need in order to be suitable for this?

Is it worth setting up such a thing now to give us an idea of what would work?

@ZviBaratz
Collaborator Author

ZviBaratz commented Feb 16, 2022

I made a couple of quick examples and created the base dicom-gallery repo. @matthew-brett what do you think?

@matthew-brett
Contributor

Aha - so is the idea that each repository is then a separate Series in the DICOM sense? Rather than a separate DICOM-conversion problem as in the dcm_qa* datasets? How would this relate to the dcm_qa* datasets? Would we need to pull out examples as separate repositories?

@ZviBaratz
Collaborator Author

Aha - so is the idea that each repository is then a separate Series in the DICOM sense? Rather than a separate DICOM-conversion problem as in the dcm_qa* datasets?

Exactly. Some items (repos) might be added to demonstrate particular aspects of encoding or conversion, but the idea is that it will be a more general resource.

How would this relate to the dcm_qa* datasets? Would we need to pull out examples as separate repositories?

I'm not sure whether that's possible or not. Seems like most dcm_qa_* repositories have a number of series, and I don't know whether they're complete. If nothing else, they might serve as a checklist of acquisitions required for conversion validation.

@matthew-brett
Contributor

And how about testing for identifying and converting more than one Series contained in a directory? How would you deal with that? Or someone (like me) who would like an example of the set of files from one subject? (By the way - I found one - https://github.com/matthew-brett/insular-dicoms.)

@ZviBaratz
Collaborator Author

And how about testing for identifying and converting more than one Series contained in a directory?

It's not ideal, but that would require a dedicated repo for this test case (which will not be included in the DICOM gallery).

Or someone (like me) who would like an example of the set of files from one subject?

That might be outside the scope of what the DICOM gallery is meant to help with. However, perhaps some series could be declared as belonging to the same subject, so that you would just need to access a number of repos to have all the files.

@neurolabusc

This looks nice. I have a couple comments about the MRIcroGL screenshot:

  • The label numbers are not positioned correctly on the image. Solutions include:
    • Update MRIcroGL.
    • Set your bmpzoom to 1 to disable interpolation.
    • Create mosaics with L- to disable labels.
  • You may want to extract your images. This removes haze from the image and creates a clearer rendering.
import gl
gl.resetdefaults()
# Setting bmpzoom to 1 disables interpolation, so labels stay aligned with the image.
gl.bmpzoom(1)
gl.loadimage("converted")
# Extract the image to remove haze and produce a clearer rendering.
gl.extract(0,1,5)
gl.mosaic("A L+ H -0.1 V -0.1 0.125 0.25 0.375 0.5; 0.625 0.75 0.875 S X R 0.5");
gl.savebmp("preview.png")

(attached: preview.png, the mosaic saved by the script above)

Another option might be to allow the user to interactively view the converted.nii.gz by having a link to a NiiVue page. @hanayik can probably provide an example. I was envisioning a link instead of the current png.

@neurolabusc

All examples in dcm_qa_* are complete series; e.g. for the fMRI scans we intentionally kept the resolution low and acquired only a few volumes to keep the file size small and regression testing fast.

I am not sure I would worry about ensuring that a tool can disambiguate different series that share the same folder. Perhaps a single repository could validate that condition. There are two edge cases you will want to handle. The first is Siemens multi-echo images, where identical series and instance numbers are generated for each echo (in my experience, this is the leading cause of data loss for Siemens users, where PACS and data storage tools assume the images are duplicates and only retain one 2D slice for all echoes). The other odd case is some GE systems where a single DICOM series is stored in multiple folders. Here the system limits each folder to 1000 DICOM images, so a series with more than 1000 2D slices gets stored in multiple folders, and even a series with fewer than 1000 slices can get exported to separate folders if the preceding series use up most of the capacity of the initial folder.
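
As a rough illustration of the Siemens multi-echo edge case (this is not code from dcm2niix or dicom_parser; it assumes pydicom is installed and that EchoTime is present in the headers), files that share a series can still be told apart by echo time rather than instance number:

from collections import defaultdict
from pathlib import Path

import pydicom

# Group files by (series UID, echo time) so that echoes sharing the same
# series and instance numbers are not mistaken for duplicate slices.
groups = defaultdict(list)
for path in Path("dicom_dir").glob("*.dcm"):
    ds = pydicom.dcmread(path, stop_before_pixels=True)
    key = (ds.SeriesInstanceUID, float(getattr(ds, "EchoTime", 0)))
    groups[key].append(path)

for (series_uid, echo_time), paths in groups.items():
    print(series_uid, echo_time, len(paths), "files")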

If you want to segment each dcm_qa_* repository into one repository per series, you could automate this using dcm2niix's renaming function. It would look something like this (the %o avoids overwriting Siemens field maps where the instance number is identical for different images):

mkdir ~/temp
git clone https://github.com/neurolabusc/dcm_qa_stc
# -r y renames the files instead of converting them; -f sets the output path
# pattern (%o avoids overwriting Siemens field maps where the instance number
# is identical for different images).
dcm2niix -r y -f %t/%m/%s/%p/%4r_%o.dcm -o ~/temp ./dcm_qa_stc

Note that some dcm_qa_* repositories provide the same raw scans, just exported in different ways (e.g. classic and enhanced DICOM for Philips and Canon). Each dcm_qa_* repository was designed to showcase a specific edge case, so separating individual images removes the gist of that repository. That may not be important for this use case. While one use of dcm_qa is to validate dcm2niix for regression testing, the other is to ensure that different tools provide similar solutions to edge cases. From a user's perspective, edge cases are handled similarly regardless of whether dicm2nii (@xiangruili) or dcm2niix was used. It is worth noting that our DICOM conversion manuscript was co-authored by the lead developers of four different code bases (dcm2niix, dicm2nii, mriconvert, SPM12): we do not see these tools as competitors. Rather, we collaborate to fill different niches.

@neurolabusc

It might be worth checking that parsed.json correctly reports private tags converted to human-readable text. For explicit DICOM, the VR for private tags is reported. This example is curious, as the DICOM reader knows the manufacturer's name for the private tag but does not extract the value. In contrast, dcmdump does not know the name but extracts the value using the VR, and gdcmdump reports both the name and the value correctly.
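
For reference, here is a minimal way to inspect what a reader sees for a private element (a sketch using pydicom, which dicom_parser builds on; the file name and tag shown are placeholders rather than the element from the example above):

import pydicom

ds = pydicom.dcmread("sample.dcm")

# Placeholder private tag. In explicit-VR files the VR is stored in the file,
# so the VR and raw value should be readable even when the element's name is
# not in the dictionary.
elem = ds[0x0019, 0x100C]
print(elem.tag, elem.VR, elem.name, elem.value)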

Very impressive job parsing the CSA values.

@ZviBaratz
Collaborator Author

This looks nice. I have a couple comments about the MRIcroGL screenshot: ...

Thank you! I created a script template to be used and updated the instructions in the README.

Another option might be to allow the user to interactively view the converted.nii.gz by having a link to a NiiVue page. @hanayik can probably provide an example. I was envisioning a link instead of the current png.

I think that could be great. I wish it could be embedded, but I think that would require some JS and GitHub does not allow using script tags in markdown. In any case, I think a link might work well in addition to the png.

It might be worth checking that parsed.json correctly reports private tags are converted to human readable text.

That's a great point. I use dicom_parser to create the parsed JSON, and I still have some work to do there to improve private tag parsing.

Very impressive job parsing the CSA values.

I appreciate that! Looking at the exported JSON made me want to clean it up a little more. I have to admit the current implementation of the ASCII header parsing is poorly written (looking back, it seems like a lot of it should simply be replaced with a defaultdict), but it works (and I can't currently afford the time to clean it up 😅).
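
As a rough sketch of the defaultdict idea (illustrative only, not the current dicom_parser implementation), flat dotted ASCCONV-style keys can be folded into nested dictionaries in a few lines:

from collections import defaultdict

def tree():
    # Recursive defaultdict: missing keys become nested dictionaries.
    return defaultdict(tree)

def nest(flat):
    root = tree()
    for dotted_key, value in flat.items():
        node = root
        *parents, leaf = dotted_key.split(".")
        for part in parents:
            node = node[part]
        node[leaf] = value
    return root

nested = nest({"sSliceArray.lSize": 3, "sSliceArray.ucMode": 1})
print(nested["sSliceArray"]["lSize"])  # 3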

@matthew-brett
Contributor

@ZviBaratz - is your ASCII parser parsing the ASCCONV part of the Siemens private header? Can you use the Nibabel parser at https://github.com/nipy/nibabel/blob/master/nibabel/nicom/ascconv.py ? It's pretty fast, because it uses Python's own syntax parser.
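
For context, a minimal usage sketch of that module (this assumes the parse_ascconv function and signature found in current nibabel, and uses a made-up ASCCONV fragment rather than a real protocol):

from nibabel.nicom import ascconv

# Made-up ASCCONV fragment; in practice this text comes from the Siemens
# CSA Series header.
ascconv_txt = (
    "### ASCCONV BEGIN ###\n"
    "sSliceArray.lSize = 3\n"
    "alTR[0] = 2000000\n"
    "### ASCCONV END ###"
)

protocol, attrs = ascconv.parse_ascconv(ascconv_txt, str_delim='"')
print(protocol)  # nested, dict-like protocol structure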

@ZviBaratz
Collaborator Author

@ZviBaratz - is your ASCII parser parsing the ASCCONV part of the Siemens private header?

Yes.

Can you use the Nibabel parser at https://github.com/nipy/nibabel/blob/master/nibabel/nicom/ascconv.py ?

I definitely wouldn't mind a PR to replace/improve the current implementation based on it.
I'm sorry I can't currently push this forward myself, but I'm happy to discuss and review if that helps.

@matthew-brett
Contributor

I had a go at the integration - see: #85

@ZviBaratz
Collaborator Author

This issue got a little off track, so I'm just summarizing the current state of things and what still needs to be done in the context of this repo:

dicom_parser testing currently relies on a limited number of test files within the tests/ directory. This makes the package heavier and complicates long-term management, especially if NIfTI conversion functionality is meant to be implemented and tested against dcm2niix in the future (see #67, #70).

Trying to provide a better alternative, I proposed a general template repository for uploading sample .dcm files together with dcm2niix-converted .nii files. In addition, the dicom_gallery repository was created to provide references to existing samples.

To close this issue, the following steps will be required:

  • Create DICOM gallery repositories to replace as many of the existing dicom_parser test files as possible.
  • Make external test files accessible (Git Submodules? @matthew-brett @neurolabusc what do you think?).
  • Remove replaced test files from dicom_parser.
  • Fix tests to use the external test files and pass (a sketch of one possible approach follows below).

I will probably not have time to push this forward in the next couple of months. However, I wanted to share this here so that I can jump back in more easily in the future, or, of course, if anyone else wants to take this on.
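
To illustrate what the last step could look like (a sketch only; the environment variable, directory layout, and fixture name are hypothetical and not part of dicom_parser), a pytest fixture could locate the external data, whether a submodule checkout or a separately downloaded copy, and skip cleanly when it is missing:

import os
from pathlib import Path

import pytest

# Hypothetical location of the external test data, e.g. a git submodule
# checked out under tests/data or a path provided by the CI environment.
DATA_ROOT = Path(os.environ.get("DICOM_GALLERY_PATH", "tests/data/dicom_gallery"))


@pytest.fixture
def siemens_series_path() -> Path:
    # Hypothetical sample directory; skip rather than fail when absent.
    path = DATA_ROOT / "siemens_mprage"
    if not path.exists():
        pytest.skip("external DICOM test data not available")
    return path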

ZviBaratz modified the milestones: v1.3.0, v1.2.0 on Feb 28, 2022