Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: identify invalid json #379

Merged
merged 8 commits into from
Oct 4, 2019
Merged

Conversation

jdkent
Copy link
Contributor

@jdkent jdkent commented Oct 3, 2019

fixes #378

This change prints out what json file was invalid.

Copy link
Member

@yarikoptic yarikoptic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! It will be useful/informative. Please address recommendations ;)

Cheers!

try:
from json.decoder import JSONDecodeError
except ImportError:
JSONDecodeError = ValueError
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so this is to work on PY2. I would have made it explicit.
We don't use/require six at runtime so doomed to use sys.version_info[0] > 2 or define PY2, PY3 e.g. within heudiconv.utils.
That would help later on to find/prune python2-specific code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually -- you do the same in heudiconv.utils, so do it there and just import here the exception

ifname = "invalid.json"
invalid_json_file = tmp_path / ifname
invalid_json_file.write_text(icontent)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, we have create_tree helper, so all above could be expressed as

create_tree(str(tmp_path), {'invalid.json': u"I'm Jason Bourne"})

I believe

with pytest.raises(JSONDecodeError):
load_json(str(invalid_json_file))
captured = capsys.readouterr()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe all the checks should be brought out of the context manager, or otherwise would not actually be executed since above load_json would raise an exception (to be caught by pytest.raises)


with open(str(valid_json_file), "w") as vj:
json.dump(vcontent, vj)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

create_tree here as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would create_tree also be appropriate if the contents are a dictionary, not a string? (I used save_json instead)

try:
from json.decoder import JSONDecodeError
except ImportError:
JSONDecodeError = ValueError
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

explicit PY2 check here please (see above comment)

with open(filename, 'r') as fp:
data = json.load(fp)
except JSONDecodeError:
print("{fname} is not a valid json file".format(fname=filename))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use lgr.error not pure print

- add explicit py2 check
- change file saving strategy
- use logger instead of print
@jdkent
Copy link
Contributor Author

jdkent commented Oct 3, 2019

Thank you for the helpful review! I believe I've covered your suggestions, with the deviation of using save_json instead of create_tree

@yarikoptic
Copy link
Member

cool, thanks!

@yarikoptic yarikoptic merged commit 2b9319d into nipy:master Oct 4, 2019
@jdkent jdkent deleted the json_load_error branch October 5, 2019 00:41
@yarikoptic yarikoptic added this to the 0.6 milestone Nov 22, 2019
yarikoptic added a commit that referenced this pull request Dec 16, 2019
This is largely a bug fix.  Metadata and order of `_key-value` fields in BIDS
could change from the result of converting using previous versions, thus minor
version boost.
14 people contributed to this release -- thanks
[everyone](https://github.com/nipy/heudiconv/graphs/contributors)!

 Enhancement

- Use [etelemetry](https://pypi.org/project/etelemetry) to inform about most
  recent available version of heudiconv. Please set `NO_ET` environment variable
  if you want to disable it ([#369][])
- BIDS:
  - `--bids` flag became an option. It can (optionally) accept `notop` value
    to avoid creation of top level files (`CHANGES`, `dataset_description.json`,
    etc) as a workaround during parallel execution to avoid race conditions etc.
    ([#344][])
  - Generate basic `.json` files with descriptions of the fields for
    `participants.tsv` and `_scans.tsv` files ([#376][])
  - Use `filelock` while writing top level files. Use
    `HEUDICONV_FILELOCK_TIMEOUT` environment to change the default timeout value
    ([#348][])
  - `_PDT2` was added as a suffix for multi-echo (really "multi-modal")
    sequences ([#345][])
- Calls to `dcm2niix` would include full output path to make it easier to
  discern in the logs what file it is working on ([#351][])
- With recent [datalad]() (>= 0.10), created DataLad dataset will use
  `--fake-dates` functionality of DataLad to not leak data conversion dates,
  which might be close to actual data acquisition/patient visit ([#352][])
- Support multi-echo EPI `_phase` data ([#373][] fixes [#368][])
- Log location of a bad .json file to ease troubleshooting ([#379][])
- Add basic pypi classifiers for the package ([#380][])

 Fixed
- Sorting `_scans.tsv` files lacking valid dates field should not cause a crash
  ([#337][])
- Multi-echo files detection based number of echos ([#339][])
- BIDS
  - Use `EchoTimes` from the associated multi-echo files if `EchoNumber` tag is
    missing ([#366][] fixes [#347][])
  - Tolerate empty ContentTime and/or ContentDate in DICOMs ([#372][]) and place
    "n/a" if value is missing ([#390][])
  - Do not crash and store original .json file is "JSON pretification" fails
    ([#342][])
- ReproIn heuristic
  - tollerate WIP prefix on Philips scanners ([#343][])
  - allow for use of `(...)` instead of `{...}` since `{}` are not allowed
    ([#343][])
  - Support pipolar fieldmaps by providing them with `_epi` not `_magnitude`.
    "Loose" BIDS `_key-value` pairs might come now after `_dir-` even if they
    came first before ([#358][] fixes [#357][])
- All heuristics saved under `.heudiconv/` under `heuristic.py` name, to avoid
  discrepancy during reconversion ([#354][] fixes [#353][])
- Do not crash (with TypeError) while trying to sort absent file list ([#360][])
- heudiconv requires nipype >= 1.0.0 ([#364][]) and blacklists `1.2.[12]` ([#375][])

* tag 'v0.6.0': (60 commits)
  Version boost to 0.6.0
  DOC: populate detailed changelog for 0.6.0 and tune up formatting in previous one
  Fix miscellaneous typos in ReproIn heuristic file.
  BF: fix check for the sbatch (SLURM) not being available
  ENH: make test-compare-two-versions take any two worktrees, and just show diff if results already known
  Update heudiconv/convert.py
  apply @mgxd 's suggestions, adding a warning and a timeout environment variable
  need str typecast
  Use empty string not None
  Empty acq_time results in empty cell not 'n/a'
  DOC: Clarify tarball session handling
  remove repetitive import statement
  respond to review - add explicit py2 check - change file saving strategy - use logger instead of print
  fix remaning py2 errors
  MNT: Add Python support metadata to package
  fix some python2/3 incompatibilities
  add return data (accidently removed return)
  make content unicode
  test that load_json provides filename if invalid
  explicitly name invalid json
  ...
yarikoptic added a commit that referenced this pull request Dec 16, 2019
[0.6.0] - 2019-12-16

This is largely a bug fix.  Metadata and order of `_key-value` fields in BIDS
could change from the result of converting using previous versions, thus minor
version boost.
14 people contributed to this release -- thanks
[everyone](https://github.com/nipy/heudiconv/graphs/contributors)!

Enhancement

- Use [etelemetry](https://pypi.org/project/etelemetry) to inform about most
  recent available version of heudiconv. Please set `NO_ET` environment variable
  if you want to disable it ([#369][])
- BIDS:
  - `--bids` flag became an option. It can (optionally) accept `notop` value
    to avoid creation of top level files (`CHANGES`, `dataset_description.json`,
    etc) as a workaround during parallel execution to avoid race conditions etc.
    ([#344][])
  - Generate basic `.json` files with descriptions of the fields for
    `participants.tsv` and `_scans.tsv` files ([#376][])
  - Use `filelock` while writing top level files. Use
    `HEUDICONV_FILELOCK_TIMEOUT` environment to change the default timeout value
    ([#348][])
  - `_PDT2` was added as a suffix for multi-echo (really "multi-modal")
    sequences ([#345][])
- Calls to `dcm2niix` would include full output path to make it easier to
  discern in the logs what file it is working on ([#351][])
- With recent [datalad]() (>= 0.10), created DataLad dataset will use
  `--fake-dates` functionality of DataLad to not leak data conversion dates,
  which might be close to actual data acquisition/patient visit ([#352][])
- Support multi-echo EPI `_phase` data ([#373][] fixes [#368][])
- Log location of a bad .json file to ease troubleshooting ([#379][])
- Add basic pypi classifiers for the package ([#380][])

Fixed

- Sorting `_scans.tsv` files lacking valid dates field should not cause a crash
  ([#337][])
- Multi-echo files detection based number of echos ([#339][])
- BIDS
  - Use `EchoTimes` from the associated multi-echo files if `EchoNumber` tag is
    missing ([#366][] fixes [#347][])
  - Tolerate empty ContentTime and/or ContentDate in DICOMs ([#372][]) and place
    "n/a" if value is missing ([#390][])
  - Do not crash and store original .json file is "JSON pretification" fails
    ([#342][])
- ReproIn heuristic
  - tolerate WIP prefix on Philips scanners ([#343][])
  - allow for use of `(...)` instead of `{...}` since `{}` are not allowed
    ([#343][])
  - Support pipolar fieldmaps by providing them with `_epi` not `_magnitude`.
    "Loose" BIDS `_key-value` pairs might come now after `_dir-` even if they
    came first before ([#358][] fixes [#357][])
- All heuristics saved under `.heudiconv/` under `heuristic.py` name, to avoid
  discrepancy during reconversion ([#354][] fixes [#353][])
- Do not crash (with TypeError) while trying to sort absent file list ([#360][])
- heudiconv requires nipype >= 1.0.0 ([#364][]) and blacklists `1.2.[12]` ([#375][])

* tag 'v0.6.0':
  Boost perspective release date in changelog to today
  ENH(TST): Fix version to older pytest to ease backward compatibility testing
  RF: use tmpdir not tmp_path fixture
  FIX: minor typo in CHANGELOG.md
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

enhance explicitness about what json files heudiconv fails to read
2 participants