[ENH] convert sample sizes #919
Conversation
Reviewer's Guide by Sourcery

This pull request introduces the ability to use `sample_sizes` as an analysis metadata variable that can be transferred over to a NiMARE dataset.

Sequence diagram for converting an analysis to a dictionary with sample sizes:

```mermaid
sequenceDiagram
    participant Study
    participant Analysis
    participant Result
    Analysis->>Analysis: Get sample_sizes from metadata
    alt sample_sizes is None or empty
        Analysis->>Analysis: Get sample_size from analysis metadata
        alt sample_size is None
            Analysis->>Study: Get sample_size from study metadata
        end
    end
    Analysis->>Result: Add sample_sizes to result metadata
    Result-->>Analysis: Return result
```
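As a plain-Python illustration of the lookup order in the diagram above, here is a minimal sketch; the helper name and the `analysis_metadata`/`study_metadata` arguments are illustrative, not the PR's actual code:

```python
def resolve_sample_sizes(analysis_metadata: dict, study_metadata: dict):
    """Sketch of the fallback order shown in the sequence diagram."""
    sample_sizes = analysis_metadata.get("sample_sizes")
    if not sample_sizes:  # None or empty: fall back to a single sample_size
        sample_size = analysis_metadata.get("sample_size")
        if sample_size is None:
            # Last resort: the study-level sample_size
            sample_size = study_metadata.get("sample_size")
        sample_sizes = [sample_size] if sample_size is not None else None
    return sample_sizes
```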
Updated class diagram for Analysis metadata handling:

```mermaid
classDiagram
    class Study {
        metadata: dict
    }
    class Analysis {
        metadata: dict
        annotations: dict
    }
    class Result {
        metadata: dict
        labels: dict
    }
    Analysis --|> Study : Gets sample_size from
    Analysis --|> Result : Adds sample_sizes to
    Analysis --|> Result : Adds annotations to
    note for Analysis "Handles sample_sizes and annotations from metadata"
```
File-Level Changes
Hey @jdkent - I've reviewed your changes - here's some feedback:
Overall Comments:
- Consider consolidating the sample size handling logic to reduce redundancy.
- It might be good to add a test case that covers the scenario where `sample_size` is in study metadata (a rough sketch follows below).
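A possible shape for such a test, modelled on the fixtures and helpers already used in this PR (`example_nimads_studyset`, `Studyset`, `io.convert_nimads_to_dataset`); the exact assertions would depend on how the converted metadata is exposed:

```python
def test_convert_nimads_to_dataset_study_level_sample_size(example_nimads_studyset):
    """Sample size stored only in study metadata should propagate to the dataset."""
    studyset = Studyset(example_nimads_studyset)
    for study in studyset.studies:
        study.metadata["sample_size"] = 20
        for analysis in study.analyses:
            # Make sure the fallback to study metadata is actually exercised.
            analysis.metadata.pop("sample_sizes", None)
            analysis.metadata.pop("sample_size", None)
    dset = io.convert_nimads_to_dataset(studyset)
    assert "sample_sizes" in dset.metadata.columns
```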
Here's what I looked at during the review
- 🟡 General issues: 1 issue found
- 🟢 Security: all looks good
- 🟡 Testing: 2 issues found
- 🟡 Complexity: 1 issue found
- 🟢 Documentation: all looks good
```python
# Validate sample sizes if present
if sample_sizes is not None and not isinstance(sample_sizes, (list, tuple)):
    raise TypeError(
```
suggestion (bug_risk): Review broad exception handling for annotations.
The try/except block around updating the 'labels' dictionary catches TypeError and AttributeError broadly and re-raises them as ValueError. This might mask issues that are not related to annotation formatting. Consider narrowing the exception scope to only the operations you expect might fail.
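One way to narrow the scope, as a rough sketch: keep only the dict-merge (the operation expected to fail on a malformed annotation) inside the `try`, so unrelated errors propagate unchanged. Variable names follow the refactoring suggestion later in this review rather than the PR's exact code:

```python
labels = {}
for annotation in analysis.annotations.values():
    try:
        # Only the merge of a single annotation is expected to fail on bad input.
        labels.update(annotation)
    except (TypeError, AttributeError) as e:
        raise ValueError(f"Invalid annotation format: {str(e)}") from e
result["labels"] = labels
```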
```python
dset = io.convert_nimads_to_dataset(studyset)

assert isinstance(dset, nimare.dataset.Dataset)
assert "sample_sizes" in dset.metadata.columns
```
suggestion (testing): Add tests for edge cases and error handling.
It would be beneficial to include tests that cover edge cases and error conditions within the `_analysis_to_dict` function, such as when `sample_sizes` is not a list/tuple, `sample_size` is not numeric, or when annotations have an invalid format. This ensures that the error handling logic is robust and functions as expected.
Suggested implementation:
```python
def test_convert_nimads_to_dataset_sample_sizes(
    example_nimads_studyset, example_nimads_annotation
):
    """Conversion of nimads JSON to nimare dataset."""
    studyset = Studyset(example_nimads_studyset)
    for study in studyset.studies:
        for analysis in study.analyses:
            analysis.metadata["sample_sizes"] = [2, 20]


import pytest
from nimare import io, dataset  # adjust imports as needed
from nimare.dataset import Dataset
from nimare.io import _analysis_to_dict  # assuming _analysis_to_dict is accessible for testing


def test_analysis_to_dict_invalid_sample_sizes_type(example_nimads_studyset):
    """Test _analysis_to_dict raises ValueError when sample_sizes is not a list/tuple."""
    studyset = Studyset(example_nimads_studyset)
    # Set sample_sizes to an int rather than list/tuple
    for study in studyset.studies:
        for analysis in study.analyses:
            analysis.metadata["sample_sizes"] = 5
    with pytest.raises(ValueError):
        # Trigger conversion which internally calls _analysis_to_dict
        io.convert_nimads_to_dataset(studyset)


def test_analysis_to_dict_non_numeric_sample_sizes(example_nimads_studyset):
    """Test _analysis_to_dict raises ValueError when sample_sizes contains non-numeric values."""
    studyset = Studyset(example_nimads_studyset)
    # Set sample_sizes to a list with a non-numeric value
    for study in studyset.studies:
        for analysis in study.analyses:
            analysis.metadata["sample_sizes"] = [2, "twenty"]
    with pytest.raises(ValueError):
        io.convert_nimads_to_dataset(studyset)


def test_analysis_to_dict_invalid_annotations_format(example_nimads_studyset):
    """Test _analysis_to_dict raises ValueError when annotations are in an invalid format."""
    studyset = Studyset(example_nimads_studyset)
    # Here we assume that the annotation is expected to be a dict
    # Set annotation to an invalid format (e.g., a string)
    for study in studyset.studies:
        for analysis in study.analyses:
            analysis.metadata["annotations"] = "invalid_format"
    with pytest.raises(ValueError):
        io.convert_nimads_to_dataset(studyset)
```
These tests assume that the underlying `_analysis_to_dict` function (or the conversion process in `io.convert_nimads_to_dataset`) performs robust type checking and raises a `ValueError` when encountering:
- `sample_sizes` that is not a list/tuple,
- `sample_sizes` with non-numeric values, and
- annotations with an unexpected format.

You may need to adjust the exception type or the input modifications to match your actual implementation of error handling.
Also, ensure that the test file has access to the `Studyset` class and the necessary fixture definitions (`example_nimads_studyset`, `example_nimads_annotation`) or import them as needed.
```python
studyset = Studyset(example_nimads_studyset)
for study in studyset.studies:
    for analysis in study.analyses:
        analysis.metadata["sample_sizes"] = [2, 20]
```
suggestion (testing): Add test for single sample size.
Consider adding a test case where the `sample_size` is a single value instead of a list to ensure both code paths are tested.
Suggested implementation:
```python
def test_convert_nimads_to_dataset_single_sample_size(
    example_nimads_studyset, example_nimads_annotation
):
    """Test conversion of nimads JSON to nimare dataset with a single sample size value."""
    studyset = Studyset(example_nimads_studyset)
    for study in studyset.studies:
        for analysis in study.analyses:
            analysis.metadata["sample_sizes"] = 20
    dset = io.convert_nimads_to_dataset(studyset)
    assert isinstance(dset, nimare.dataset.Dataset)
    assert "sample_sizes" in dset.metadata.columns


def test_convert_sleuth_to_dataset_smoke():
```
Make sure your test runner picks up the new test function. You may need to adjust the placement of the new test according to your project's test suite conventions.
```
@@ -62,13 +62,45 @@
            "z": [p.z for p in analysis.points] or [None],
        },
    }
    sample_size = study.metadata.get("sample_size")
```
issue (complexity): Consider extracting the inline sample size and annotation validations into helper functions to improve readability and reduce nesting.
Consider extracting the inline sample size and annotation validations into small helper functions. This will reduce nesting and improve readability. For example, you could do the following:
```python
def validate_sample_sizes(analysis_metadata, study_metadata):
    sample_sizes = analysis_metadata.get("sample_sizes")
    if sample_sizes is not None and not isinstance(sample_sizes, (list, tuple)):
        raise TypeError(
            f"Expected sample_sizes to be list or tuple, but got {type(sample_sizes)}"
        )
    if not sample_sizes:
        sample_size = analysis_metadata.get("sample_size") or study_metadata.get("sample_size")
        if sample_size is not None and not isinstance(sample_size, (int, float)):
            raise TypeError(
                f"Expected sample_size to be numeric, but got {type(sample_size)}"
            )
        return [sample_size] if sample_size is not None else None
    return sample_sizes


def validate_annotations(annotations):
    labels = {}
    for annotation in annotations.values():
        if not isinstance(annotation, dict):
            raise TypeError(
                f"Expected annotation to be dict, but got {type(annotation)}"
            )
        labels.update(annotation)
    return labels
```

Then simplify your main code as follows:

```python
sample_sizes_validated = validate_sample_sizes(analysis.metadata, study.metadata)
if sample_sizes_validated:
    try:
        result["metadata"]["sample_sizes"] = sample_sizes_validated
    except TypeError as e:
        raise TypeError(f"Error converting sample size data to list: {str(e)}") from e

if analysis.annotations:
    try:
        result["labels"] = validate_annotations(analysis.annotations)
    except (TypeError, AttributeError) as e:
        raise ValueError(f"Invalid annotation format: {str(e)}") from e
```
These changes isolate the validation logic into focused units, reducing complexity while preserving functionality.
Codecov Report
Attention: Patch coverage is

Additional details and impacted files:

```
@@            Coverage Diff            @@
##             main     #919     +/-  ##
=========================================
- Coverage   88.23%   88.15%   -0.08%
=========================================
  Files          48       48
  Lines        6383     6400      +17
=========================================
+ Hits         5632     5642      +10
- Misses        751      758       +7
```

☔ View full report in Codecov by Sentry.
references: https://neurostars.org/t/using-neurosynth-compose-and-annotations/32191/2
Changes proposed in this pull request:
- `sample_sizes` as an analysis metadata variable that can be used to transfer over to a NiMARE dataset

Summary by Sourcery
Adds support for 'sample_sizes' as an analysis metadata variable that can be used to transfer over to a NiMARE dataset.
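A minimal usage sketch of that behaviour, mirroring how the tests above construct the `Studyset`; the file name is a placeholder and the `nimare.nimads` import path is assumed:

```python
import json

from nimare import io
from nimare.nimads import Studyset  # import path assumed

# "my_studyset.json" is a placeholder path to a NIMADS-format studyset file.
with open("my_studyset.json") as f:
    studyset = Studyset(json.load(f))

# Record per-analysis sample sizes, as enabled by this PR.
for study in studyset.studies:
    for analysis in study.analyses:
        analysis.metadata["sample_sizes"] = [25]

dset = io.convert_nimads_to_dataset(studyset)
assert "sample_sizes" in dset.metadata.columns
```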
Enhancements: