[ENH] convert sample sizes #919

Merged: jdkent merged 3 commits into neurostuff:main on Mar 11, 2025

Conversation

@jdkent (Member) commented Mar 11, 2025

references: https://neurostars.org/t/using-neurosynth-compose-and-annotations/32191/2

Changes proposed in this pull request:

  • add sample_sizes as an analysis metadata variable that can be used to transfer over to a NiMARE dataset

Summary by Sourcery

Adds support for 'sample_sizes' as an analysis metadata variable that can be used to transfer over to a NiMARE dataset.
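In practice, the intended workflow looks roughly like the sketch below (illustrative only; `nimads_dict` is a placeholder for a NiMADS studyset dictionary such as the `example_nimads_studyset` fixture used in the tests later in this thread):

```python
from nimare import io
from nimare.nimads import Studyset

# Illustrative sketch: nimads_dict stands in for a NiMADS studyset dict.
studyset = Studyset(nimads_dict)

# Attach per-analysis sample sizes as metadata; a list or tuple is expected.
for study in studyset.studies:
    for analysis in study.analyses:
        analysis.metadata["sample_sizes"] = [25, 30]

# The converter carries sample_sizes over into the NiMARE Dataset metadata.
dset = io.convert_nimads_to_dataset(studyset)
assert "sample_sizes" in dset.metadata.columns
```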

Enhancements:

  • Adds validation for sample sizes to ensure they are in the correct format (list, tuple, int, or float).
  • Improves error handling and type checking for annotations to ensure they are dictionaries.

sourcery-ai bot (Contributor) commented Mar 11, 2025

Reviewer's Guide by Sourcery

This pull request introduces the ability to use sample_sizes as an analysis metadata variable when converting NiMADS data to a NiMARE dataset. It also includes validation and error handling improvements for sample sizes and annotations. The implementation involves modifying the _analysis_to_dict function in nimare/io.py to handle sample_sizes and adding a new test case in nimare/tests/test_io.py to verify the functionality.
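As a rough sketch (not the literal diff; variable names follow the guide above and the review excerpts below), the sample-size handling inside `_analysis_to_dict` amounts to:

```python
# Sketch of the sample-size handling added to _analysis_to_dict
# (illustrative only; the actual code in nimare/io.py may differ in detail).
sample_sizes = analysis.metadata.get("sample_sizes")
if sample_sizes is not None and not isinstance(sample_sizes, (list, tuple)):
    raise TypeError(f"Expected sample_sizes to be a list or tuple, got {type(sample_sizes)}")

if not sample_sizes:
    # Fall back to a single sample_size on the analysis, then on the study.
    sample_size = analysis.metadata.get("sample_size") or study.metadata.get("sample_size")
    if sample_size is not None:
        if not isinstance(sample_size, (int, float)):
            raise TypeError(f"Expected sample_size to be numeric, got {type(sample_size)}")
        sample_sizes = [sample_size]

if sample_sizes:
    result["metadata"]["sample_sizes"] = sample_sizes
```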

Sequence diagram for converting analysis to dictionary with sample sizes

sequenceDiagram
  participant Study
  participant Analysis
  participant Result

  Analysis->>Analysis: Get sample_sizes from metadata
  alt sample_sizes is None or empty
    Analysis->>Analysis: Get sample_size from analysis metadata
    alt sample_size is None
      Analysis->>Study: Get sample_size from study metadata
    end
  end
  Analysis->>Result: Add sample_sizes to result metadata
  Result-->>Analysis: Return result

Updated class diagram for Analysis metadata handling

classDiagram
  class Study {
    metadata: dict
  }
  class Analysis {
    metadata: dict
    annotations: dict
  }
  class Result {
    metadata: dict
    labels: dict
  }
  Analysis ..> Study : Gets sample_size from
  Analysis ..> Result : Adds sample_sizes to
  Analysis ..> Result : Adds annotations to
  note for Analysis "Handles sample_sizes and annotations from metadata"

File-Level Changes

Change Details Files
Added support for sample_sizes as an analysis metadata variable when converting NiMADS to a NiMARE dataset.
  • Added validation to ensure sample_sizes is a list or tuple.
  • Added logic to extract a single sample_size from analysis or study metadata if sample_sizes is not available.
  • Added validation to ensure a single sample_size is numeric.
  • Added sample_sizes to the result metadata.
  • Added a test case to verify the correct conversion of sample_sizes from NiMADS to a NiMARE dataset.
nimare/io.py
nimare/tests/test_io.py
Improved error handling for annotations.
  • Added a check to ensure that each annotation is a dictionary.
  • Improved the error message when an invalid annotation format is encountered.
nimare/io.py

sourcery-ai bot (Contributor) left a comment

Hey @jdkent - I've reviewed your changes - here's some feedback:

Overall Comments:

  • Consider consolidating the sample size handling logic to reduce redundancy.
  • It might be good to add a test case that covers the scenario where sample_size is in study metadata.
Here's what I looked at during the review
  • 🟡 General issues: 1 issue found
  • 🟢 Security: all looks good
  • 🟡 Testing: 2 issues found
  • 🟡 Complexity: 1 issue found
  • 🟢 Documentation: all looks good

# Validate sample sizes if present
if sample_sizes is not None and not isinstance(sample_sizes, (list, tuple)):
    raise TypeError(

suggestion (bug_risk): Review broad exception handling for annotations.

The try/except block around updating the 'labels' dictionary catches TypeError and AttributeError broadly and re-raises them as ValueError. This might mask issues that are not related to annotation formatting. Consider narrowing the exception scope to only the operations you expect might fail.
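For illustration, one way to narrow the scope is to validate each annotation explicitly instead of relying on a broad try/except (a sketch only, reusing the variable names from the excerpts in this review):

```python
# Sketch: check each annotation's type up front so that unrelated TypeErrors
# raised elsewhere are not swallowed and re-raised as ValueError.
labels = {}
for annotation in analysis.annotations.values():
    if not isinstance(annotation, dict):
        raise ValueError(f"Invalid annotation format: expected dict, got {type(annotation)}")
    labels.update(annotation)
result["labels"] = labels
```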

dset = io.convert_nimads_to_dataset(studyset)

assert isinstance(dset, nimare.dataset.Dataset)
assert "sample_sizes" in dset.metadata.columns

suggestion (testing): Add tests for edge cases and error handling.

It would be beneficial to include tests that cover edge cases and error conditions within the _analysis_to_dict function, such as when sample_sizes is not a list/tuple, sample_size is not numeric, or when annotations have an invalid format. This ensures that the error handling logic is robust and functions as expected.

Suggested implementation:

def test_convert_nimads_to_dataset_sample_sizes(
    example_nimads_studyset, example_nimads_annotation
):
    """Conversion of nimads JSON to nimare dataset."""
    studyset = Studyset(example_nimads_studyset)
    for study in studyset.studies:
        for analysis in study.analyses:
            analysis.metadata["sample_sizes"] = [2, 20]

import pytest

from nimare import io  # adjust imports as needed
from nimare.nimads import Studyset  # the tests below construct Studyset objects
def test_analysis_to_dict_invalid_sample_sizes_type(example_nimads_studyset):
    """Test _analysis_to_dict raises ValueError when sample_sizes is not a list/tuple."""
    studyset = Studyset(example_nimads_studyset)
    # Set sample_sizes to an int rather than list/tuple
    for study in studyset.studies:
        for analysis in study.analyses:
            analysis.metadata["sample_sizes"] = 5
    with pytest.raises(ValueError):
        # Trigger conversion which internally calls _analysis_to_dict
        io.convert_nimads_to_dataset(studyset)


def test_analysis_to_dict_non_numeric_sample_sizes(example_nimads_studyset):
    """Test _analysis_to_dict raises ValueError when sample_sizes contains non-numeric values."""
    studyset = Studyset(example_nimads_studyset)
    # Set sample_sizes to a list with non-numeric value
    for study in studyset.studies:
        for analysis in study.analyses:
            analysis.metadata["sample_sizes"] = [2, "twenty"]
    with pytest.raises(ValueError):
        io.convert_nimads_to_dataset(studyset)


def test_analysis_to_dict_invalid_annotations_format(example_nimads_studyset):
    """Test _analysis_to_dict raises ValueError when annotations are in an invalid format."""
    studyset = Studyset(example_nimads_studyset)
    # Here we assume that the annotation is expected to be a dict
    # Set annotation to an invalid format (e.g., a string)
    for study in studyset.studies:
        for analysis in study.analyses:
            analysis.metadata["annotations"] = "invalid_format"
    with pytest.raises(ValueError):
        io.convert_nimads_to_dataset(studyset)

These tests assume that the underlying _analysis_to_dict function (or the conversion process in io.convert_nimads_to_dataset) performs robust type checking and raises a ValueError when encountering:

  1. sample_sizes that is not a list/tuple,
  2. sample_sizes with non-numeric values, and
  3. annotations with an unexpected format.

You may need to adjust the exception type or the input modifications to match your actual implementation of error handling. Also, ensure that the test file has access to the Studyset class and the necessary fixture definitions (example_nimads_studyset, example_nimads_annotation) or import them as needed.

studyset = Studyset(example_nimads_studyset)
for study in studyset.studies:
    for analysis in study.analyses:
        analysis.metadata["sample_sizes"] = [2, 20]

suggestion (testing): Add test for single sample size.

Consider adding a test case where the sample_size is a single value instead of a list to ensure both code paths are tested.

Suggested implementation:

def test_convert_nimads_to_dataset_single_sample_size(
    example_nimads_studyset, example_nimads_annotation
):
    """Test conversion of nimads JSON to nimare dataset with a single sample size value."""
    studyset = Studyset(example_nimads_studyset)
    for study in studyset.studies:
        for analysis in study.analyses:
            analysis.metadata["sample_sizes"] = 20

    dset = io.convert_nimads_to_dataset(studyset)

    assert isinstance(dset, nimare.dataset.Dataset)
    assert "sample_sizes" in dset.metadata.columns

def test_convert_sleuth_to_dataset_smoke():

Make sure your test runner picks up the new test function. You may need to adjust the placement of the new test according to your project's test suite conventions.

@@ -62,13 +62,45 @@
            "z": [p.z for p in analysis.points] or [None],
        },
    }
    sample_size = study.metadata.get("sample_size")

issue (complexity): Consider extracting the inline sample size and annotation validations into helper functions to improve readability and reduce nesting.

For example, you could do the following:

```python
def validate_sample_sizes(analysis_metadata, study_metadata):
    sample_sizes = analysis_metadata.get("sample_sizes")
    if sample_sizes is not None and not isinstance(sample_sizes, (list, tuple)):
        raise TypeError(
            f"Expected sample_sizes to be list or tuple, but got {type(sample_sizes)}"
        )
    if not sample_sizes:
        sample_size = analysis_metadata.get("sample_size") or study_metadata.get("sample_size")
        if sample_size is not None and not isinstance(sample_size, (int, float)):
            raise TypeError(
                f"Expected sample_size to be numeric, but got {type(sample_size)}"
            )
        return [sample_size] if sample_size is not None else None
    return sample_sizes

def validate_annotations(annotations):
    labels = {}
    for annotation in annotations.values():
        if not isinstance(annotation, dict):
            raise TypeError(
                f"Expected annotation to be dict, but got {type(annotation)}"
            )
        labels.update(annotation)
    return labels
```

Then simplify your main code as follows:

sample_sizes_validated = validate_sample_sizes(analysis.metadata, study.metadata)
if sample_sizes_validated:
    try:
        result["metadata"]["sample_sizes"] = sample_sizes_validated
    except TypeError as e:
        raise TypeError(f"Error converting sample size data to list: {str(e)}") from e

if analysis.annotations:
    try:
        result["labels"] = validate_annotations(analysis.annotations)
    except (TypeError, AttributeError) as e:
        raise ValueError(f"Invalid annotation format: {str(e)}") from e

These changes isolate the validation logic into focused units, reducing complexity while preserving functionality.

codecov bot commented Mar 11, 2025

Codecov Report

Attention: Patch coverage is 68.18182% with 7 lines in your changes missing coverage. Please review.

Project coverage is 88.15%. Comparing base (895c154) to head (42b749c).

Files with missing lines Patch % Lines
nimare/io.py 68.18% 7 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #919      +/-   ##
==========================================
- Coverage   88.23%   88.15%   -0.08%     
==========================================
  Files          48       48              
  Lines        6383     6400      +17     
==========================================
+ Hits         5632     5642      +10     
- Misses        751      758       +7     

☔ View full report in Codecov by Sentry.

@jdkent merged commit b838dbb into neurostuff:main on Mar 11, 2025
8 of 20 checks passed