Skip to content

Add Allen CCF location validation checks for mouse subjects#671

Open
bendichter wants to merge 5 commits intodevfrom
add-allen-ccf-location-checks
Open

Add Allen CCF location validation checks for mouse subjects#671
bendichter wants to merge 5 commits intodevfrom
add-allen-ccf-location-checks

Conversation

@bendichter
Copy link
Contributor

Summary

  • Validates location fields against Allen Mouse Brain CCF ontology terms when subject species is mouse
  • Adds checks for ImagingPlane.location, electrodes.location column, and IntracellularElectrode.location
  • Bundles 1,327 Allen CCF structure acronyms and full names as a JSON data file with a cached loader
  • Adds icephys best practices documentation page and updates ecephys/ophys docs with check function references

Closes #670

Test plan

  • 5 ophys tests: pass (acronym + full name), fail (invalid location), skip (non-mouse, no subject)
  • 5 ecephys tests: pass (all valid), fail (deduplicates per unique invalid location), skip (non-mouse, no electrodes, no subject)
  • 5 icephys tests: pass (acronym + full name), fail (invalid location), skip (non-mouse, no subject)
  • Full test suite passes (pre-existing failures unrelated)

🤖 Generated with Claude Code

Validate location fields against Allen Mouse Brain CCF ontology terms
when subject species is mouse. Adds checks for ImagingPlane.location,
electrodes.location column, and IntracellularElectrode.location.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@bendichter bendichter force-pushed the add-allen-ccf-location-checks branch from a7c0450 to 2eaf1b1 Compare February 21, 2026 16:12
Copy link
Collaborator

@h-mayorquin h-mayorquin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did the first reading.

Are you including all the tissue types at the level 1 of the hierarchy besides gray?

I think fiber tracts and level 1 ventricular system probably does not make sense for electrode localization but I guess there is not downside from including everything, is it?

from .._registration import Importance, InspectorMessage, register_check
from ..utils import get_data_shape

MOUSE_SPECIES_VALUES = {"Mus musculus", "Mouse", "mouse", "http://purl.obolibrary.org/obo/NCBITaxon_10090"}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is repeated across the sets, maybe we should centralize?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done — centralized into checks/_common.py and updated all three modules (_ophys.py, _ecephys.py, _icephys.py) to import from there.

return None

location = imaging_plane.location
if location is None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't location required in the schema? If so, this if does not make sense, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, this is in case it was ever not required or it ever becomes not required. It can become a problem when checks throw errors, so I like to be extra careful about the existence of attributes.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought that it ended up here just because the agents by default write overly defensive coding.

return None
if nwbfile.electrodes is None:
return None
if "location" not in nwbfile.electrodes.colnames:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is required on the schema as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to leave it in in case it ever becomes optional

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to have this on the pypi distribution? I would just remove the script entirely but if you want to have it we should modify the hatchling configuration so this is not distributed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it staying in. It's clearly labelled within _internal_configs, and it would be useful if someone needs to regenerate these values with a new schema. What is the downside of keeping this in?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think people will know how to extract it from the wheel. Why would like it there? it is here on github which is the code that someone would get if they want to experiment at that level.

valid_terms = get_allen_ccf_location_terms()
invalid_locations = set()
for location in nwbfile.electrodes["location"].data:
if location not in valid_terms and location not in invalid_locations:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will make the matching case-sensitive, is this desirable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, the abbreviations are exact, including case. If they have the wrong case, that should be flagged.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is working as intended. This is a setuptools legacy letover and this project uses hatchling.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, then lets remove it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — reverted. The project uses hatchling, so MANIFEST.in has no effect. The JSON file is already included automatically since it lives under src/nwbinspector/.

bendichter and others added 4 commits February 25, 2026 08:38
Remove MANIFEST.in change (hatchling ignores it) and move the
duplicated MOUSE_SPECIES_VALUES constant into checks/_common.py.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The pyproject.toml skip list doesn't apply when pre-commit passes
explicit filenames to codespell. Using exclude in the hook config
prevents brain region abbreviations from being flagged as typos.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@codecov-commenter
Copy link

Codecov Report

❌ Patch coverage is 95.45455% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.26%. Comparing base (a2dc238) to head (13dd2bd).

Files with missing lines Patch % Lines
src/nwbinspector/checks/_ecephys.py 95.45% 1 Missing ⚠️
src/nwbinspector/checks/_icephys.py 94.11% 1 Missing ⚠️
src/nwbinspector/checks/_ophys.py 94.11% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##              dev     #671      +/-   ##
==========================================
+ Coverage   78.41%   82.26%   +3.84%     
==========================================
  Files          47       49       +2     
  Lines        1756     1821      +65     
==========================================
+ Hits         1377     1498     +121     
+ Misses        379      323      -56     
Files with missing lines Coverage Δ
src/nwbinspector/_internal_configs/_allen_ccf.py 100.00% <100.00%> (ø)
src/nwbinspector/checks/__init__.py 100.00% <ø> (ø)
src/nwbinspector/checks/_common.py 100.00% <100.00%> (ø)
src/nwbinspector/checks/_ecephys.py 95.16% <95.45%> (+0.01%) ⬆️
src/nwbinspector/checks/_icephys.py 96.42% <94.11%> (-3.58%) ⬇️
src/nwbinspector/checks/_ophys.py 98.18% <94.11%> (-1.82%) ⬇️

... and 5 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Check that mouse NWB files use Allen CCF (MBAO) ontology terms for location fields

3 participants