
Feat/imputation improvements #74

Merged
jorgeMFS merged 26 commits into main from feat/imputation-improvements
Aug 10, 2025
Conversation


@jorgeMFS jorgeMFS commented Aug 9, 2025

Summary by Sourcery

Add comprehensive imputation‐bias diagnostics throughout the pipeline, including a new bias report function, CLI/GUI configuration options, report visualization enhancements, imputation mask tracking, mapping improvements, and accompanying example scripts and tests.

New Features:

  • Add imputation-bias diagnostic metric to QC reports, CLI, GUI, and batch pipeline
  • Introduce CLI flags and GUI controls to enable bias diagnostics and configure SMD, variance ratio, and KS thresholds
  • Implement imputation_bias_report function computing standardized mean differences, variance ratios, KS‐test triggers, and warning flags
  • Integrate traffic‐light status bar in PDF/Markdown reports for quick visualization of bias warnings
  • Extend central quality_metrics registry to include imputation_bias
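To make the new metric concrete, here is a minimal, numpy/pandas-only sketch of the idea behind `imputation_bias_report` for a single column: compare observed values against imputed values via standardized mean difference, variance ratio, and a KS statistic. The function names, thresholds, and the use of a raw KS statistic (rather than the KS-test p-value the PR describes) are illustrative assumptions, not the PR's actual API:

```python
import numpy as np
import pandas as pd

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic (max CDF gap), numpy only."""
    a, b = np.sort(np.asarray(a, float)), np.sort(np.asarray(b, float))
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side='right') / len(a)
    cdf_b = np.searchsorted(b, grid, side='right') / len(b)
    return float(np.abs(cdf_a - cdf_b).max())

def imputation_bias_sketch(original, imputed, smd_threshold=0.10,
                           var_ratio_low=0.5, var_ratio_high=2.0,
                           ks_threshold=0.3):
    """Compare observed vs. imputed values for one column (illustrative)."""
    observed = original.dropna()
    imputed_vals = imputed[original.isna()]
    if len(observed) < 2 or len(imputed_vals) < 2:
        return None  # nothing meaningful to compare
    sd = observed.std(ddof=1)
    if sd == 0:
        return None
    smd = (imputed_vals.mean() - observed.mean()) / sd
    var_ratio = imputed_vals.var(ddof=1) / observed.var(ddof=1)
    ks = ks_statistic(observed, imputed_vals)
    warn = (abs(smd) > smd_threshold
            or not (var_ratio_low <= var_ratio <= var_ratio_high)
            or ks > ks_threshold)
    return {'smd': float(smd), 'var_ratio': float(var_ratio),
            'ks_stat': ks, 'warn': bool(warn)}
```

With mean imputation, the imputed values are constant, so the variance ratio collapses toward zero and the warning fires even though the SMD stays near zero.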

Enhancements:

  • Track per-cell imputation mask in missing_data engine to support bias diagnostics
  • Allow direct ontology identifier pass-through in mapping (e.g., HP:0001250)
  • Refactor example and test scripts with consistent naming and paths for end-to-end scenarios

Documentation:

  • Update README with new end-to-end and quality metrics example scripts

Tests:

  • Add tests for imputation_bias_report covering mean, KNN, MICE, SVD imputers and warning thresholds

jorgeMFS added 17 commits August 9, 2025 20:05
…: track imputation mask; feat(batch): include bias results in *_qc_summary.json
…: controls for bias diag; test: add basic imputation bias tests
…taset, dedicated schema/config, and QC JSON validation
…_cli_test; use output/end_to_end and e2e_* filenames; improve naming across script
…e2e_medium_cli_test (medium), e2e_small_quality_metrics_cli_test (small); standardize output folder names; update imputation params script outputs
….py wrapper to e2e_medium; keep standardized naming while preserving legacy entry points
…gnostics in PDF/MD and GUI; pass thresholds to report; fix PDF indentation issues
…as thresholds at top-level in session_state; avoid list-index TypeError when enabling bias
… Quality when enabled; pass enabled metrics from pipeline
…or quality_metrics and thresholds; fix report section gating; improve ontology mapping (ID indexing + direct ID pass-through)
…d config/schema under scripts/; scenarios now run end-to-end

sourcery-ai bot commented Aug 9, 2025

Reviewer's Guide

This PR implements a comprehensive imputation-bias diagnostic workflow by introducing a dedicated reporting function, tracking per-cell imputation masks, enhancing mapping logic for ontology IDs, extending the QC report generation with a bias diagnostics section (including traffic-light status bars), and exposing new CLI/GUI controls and example scripts to configure and demonstrate end-to-end bias checks.

Sequence diagram for imputation-bias diagnostics in batch QC pipeline

```mermaid
sequenceDiagram
    participant CLI
    participant BatchProcessor
    participant MissingDataEngine
    participant QualityMetrics
    participant ReportGenerator
    CLI->>BatchProcessor: Start batch_process (with bias thresholds)
    BatchProcessor->>MissingDataEngine: Impute missing data
    MissingDataEngine->>BatchProcessor: Return imputed_df, imputation_mask
    BatchProcessor->>QualityMetrics: imputation_bias_report(original_df, imputed_df, imputation_mask, thresholds)
    QualityMetrics-->>BatchProcessor: Return bias diagnostics DataFrame
    BatchProcessor->>ReportGenerator: generate_qc_report(..., bias_diagnostics, bias_thresholds)
    ReportGenerator-->>BatchProcessor: QC report with bias diagnostics
```

Class diagram for imputation bias diagnostics and mask tracking

```mermaid
classDiagram
    class MissingDataEngine {
        +imputation_mask: Dict[str, pd.Series]
        +fit_transform(df: pd.DataFrame) pd.DataFrame
        +_update_imputation_mask(before: pd.DataFrame, after: pd.DataFrame, cols: List[str])
        _apply_mean(...)
        _apply_median(...)
        _apply_mode(...)
        _apply_knn(...)
        _apply_mice(...)
        _apply_svd(...)
    }
    class BiasRow {
        +column: str
        +n_obs: int
        +n_imp: int
        +mean_diff: float
        +smd: float
        +var_ratio: float
        +ks_stat: Optional[float]
        +ks_p: Optional[float]
        +warn: bool
    }
    class QualityMetrics {
        +imputation_bias_report(...): pd.DataFrame
    }
    MissingDataEngine --> "imputation_mask" BiasRow
    QualityMetrics --> "imputation_bias_report" BiasRow
```
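The BiasRow record in the diagram above translates directly to a Python dataclass. A sketch with field names taken from the diagram (the defaults and the example values are illustrative, not the PR's code):

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class BiasRow:
    """One row of the imputation-bias report: per-column diagnostics."""
    column: str          # column name
    n_obs: int           # count of originally observed values
    n_imp: int           # count of imputed values
    mean_diff: float     # raw mean difference (imputed - observed)
    smd: float           # standardized mean difference
    var_ratio: float     # imputed variance / observed variance
    ks_stat: Optional[float] = None  # KS statistic, if computed
    ks_p: Optional[float] = None     # KS p-value, if computed
    warn: bool = False   # any threshold triggered

# A report is then just a list of these, convertible to a DataFrame:
row = BiasRow(column="age", n_obs=140, n_imp=60,
              mean_diff=0.02, smd=0.12, var_ratio=0.4, warn=True)
```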

Class diagram for enhanced OntologyMapper mapping logic

```mermaid
classDiagram
    class OntologyMapper {
        +PREFIX_ALIASES: dict
        +DEFAULT_ID_REGEX: dict
        +_id_regex: dict
        +_alt_to_primary: Dict[str, Dict[str, str]]
        +map_term(term, target_ontologies, custom_mappings) Dict[str, Optional[str]]
        +_extract_direct_id(term_norm)
        +_normalize_text(text: str)
        +_build_id_regex_from_config(cfg: dict)
        +_scan_alt_map_obo(ontology_file_path: str)
        +_augment_alt_ids_from_obo(ontology_file_path: str, mapping: Dict[str, str])
    }
    OntologyMapper --> "_alt_to_primary" Ontology
```

File-Level Changes

Add imputation-bias reporting function and pipeline integration
  • Define BiasRow dataclass and imputation_bias_report in quality_metrics.py
  • Invoke imputation_bias_report in batch_processing.py and attach bias_diagnostics & thresholds to QC report payload
  • Populate QC summary JSON with bias_rows under quality_metrics.imputation_bias
src/phenoqc/quality_metrics.py
src/phenoqc/batch_processing.py
Track per-cell imputation mask in the missing_data engine
  • Capture before/after DataFrame slices for each imputation strategy
  • Implement _update_imputation_mask to flag newly imputed cells
  • Store imputation_mask on the engine for downstream bias analysis
src/phenoqc/missing_data.py
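The mask-tracking change amounts to diffing NaN positions before and after each imputation pass. A minimal sketch of the `_update_imputation_mask` idea (the helper name and dict-of-Series shape mirror the description above; details of the engine's real implementation are an assumption):

```python
import numpy as np
import pandas as pd

def update_imputation_mask(mask, before, after, cols):
    """Flag cells that were NaN before imputation and filled afterwards.

    mask: dict mapping column name -> boolean Series (True = imputed cell).
    Accumulates across multiple imputation passes via logical OR.
    """
    for col in cols:
        newly_imputed = before[col].isna() & after[col].notna()
        if col in mask:
            mask[col] = mask[col] | newly_imputed
        else:
            mask[col] = newly_imputed
    return mask
```

Downstream, the bias report can then select exactly the imputed cells per column instead of re-deriving them from the original frame.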
Enhance OntologyMapper to support direct IDs, normalization, and alt_id indexing
  • Add PREFIX_ALIASES and DEFAULT_ID_REGEX with _build_id_regex_from_config
  • Introduce _normalize_text for term normalization
  • Extend parse_ontology to index alt_id/xref and add _augment_alt_ids_from_obo and _scan_alt_map_obo
  • Revamp map_term to extract direct IDs, apply prefix mapping, and fall back to fuzzy matching
src/phenoqc/mapping.py
tests/test_mapping.py
Extend QC report generation with bias diagnostics and status visuals
  • Add bias_diagnostics and bias_thresholds parameters to generate_qc_report
  • Insert PDF section showing triggers and traffic-light table for top variables
  • Insert Markdown section rendering triggers and a markdown table for bias diagnostics
src/phenoqc/reporting.py
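The "triggers and traffic-light table" can be pictured as a small pure function mapping one variable's diagnostics to a status label. This is purely illustrative of the idea (the PR's actual rules, labels, and thresholds are not shown in this thread), and it doubles as an example of the shared trigger-reason helper the reviewer suggests extracting:

```python
def traffic_light(smd, var_ratio, ks_p, *, smd_threshold=0.10,
                  var_ratio_low=0.5, var_ratio_high=2.0, ks_alpha=0.05):
    """Return (label, triggers) for one variable's bias diagnostics.

    green  = no trigger fired
    amber  = exactly one trigger fired
    red    = two or more triggers fired
    """
    triggers = []
    if abs(smd) > smd_threshold:
        triggers.append("SMD")
    if not (var_ratio_low <= var_ratio <= var_ratio_high):
        triggers.append("variance ratio")
    if ks_p is not None and ks_p < ks_alpha:
        triggers.append("KS p-value")
    if not triggers:
        return "green", triggers
    return ("red" if len(triggers) >= 2 else "amber"), triggers
```

Keeping this logic in one place lets the PDF builder, Markdown exporter, and GUI all render the same reasons from the same function.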
Expose CLI/GUI configuration flags and update examples/tests
  • Add --bias-* flags to CLI and pass thresholds into batch processing
  • Add GUI controls to enable bias diagnostics and sync thresholds to session_state
  • Rename and add end-to-end and quality-metrics scripts under scripts/, update unified_scenarios_test
  • Add test_imputation_bias.py and update README with new example scripts
src/phenoqc/cli.py
src/phenoqc/gui/gui.py
scripts/
tests/test_imputation_bias.py
README.md



codecov-commenter commented Aug 9, 2025

⚠️ Please install the Codecov GitHub app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

❌ Patch coverage is 38.98305% with 252 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/phenoqc/reporting.py | 2.45% | 113 Missing and 6 partials ⚠️ |
| src/phenoqc/gui/gui.py | 0.00% | 60 Missing ⚠️ |
| src/phenoqc/mapping.py | 71.12% | 31 Missing and 10 partials ⚠️ |
| src/phenoqc/batch_processing.py | 6.25% | 15 Missing ⚠️ |
| src/phenoqc/missing_data.py | 68.00% | 4 Missing and 4 partials ⚠️ |
| src/phenoqc/quality_metrics.py | 88.63% | 4 Missing and 1 partial ⚠️ |
| src/phenoqc/cli.py | 0.00% | 4 Missing ⚠️ |
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.
| Files with missing lines | Coverage Δ |
|---|---|
| src/phenoqc/cli.py | 0.00% <0.00%> (ø) |
| src/phenoqc/quality_metrics.py | 85.29% <88.63%> (+1.16%) ⬆️ |
| src/phenoqc/missing_data.py | 65.54% <68.00%> (+9.45%) ⬆️ |
| src/phenoqc/batch_processing.py | 16.39% <6.25%> (-0.59%) ⬇️ |
| src/phenoqc/mapping.py | 68.37% <71.12%> (+4.41%) ⬆️ |
| src/phenoqc/gui/gui.py | 0.00% <0.00%> (ø) |
| src/phenoqc/reporting.py | 37.97% <2.45%> (-11.26%) ⬇️ |
sourcery-ai[bot]
sourcery-ai bot previously requested changes Aug 9, 2025

@sourcery-ai sourcery-ai bot left a comment


Hey @jorgeMFS - I've reviewed your changes - here's some feedback:

Blocking issues:

  • Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'. (link)

General comments:

  • I see the trigger‐reason logic (SMD, variance ratio, KS p‐value) is copied in the PDF report builder, Markdown exporter, and GUI—consider extracting it to a shared utility so any future tweaks stay in sync.
  • The code for parsing whether bias diagnostics are enabled (list vs dict, top‐level vs nested under quality_metrics) is scattered across CLI, GUI, and batch processing—centralizing that parsing into one helper would reduce subtle inconsistencies.
  • In imputation_bias_report you use sqrt(obs_var) as the pooled standard deviation—double-check this against the conventional two‐sample SMD formula or document this choice to avoid confusion about how SMD is calculated.
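For reference, the conventional two-sample SMD (the Cohen's-d denominator the reviewer alludes to) pools both groups' variances, whereas dividing by the observed-group SD alone gives a different number whenever the imputed values have low variance. A toy numeric illustration, not the PR's code:

```python
import numpy as np

def pooled_sd(a, b):
    """Conventional two-sample pooled standard deviation."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    na, nb = len(a), len(b)
    return float(np.sqrt(((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1))
                         / (na + nb - 2)))

obs = np.array([1.0, 2.0, 3.0, 4.0])   # observed values
imp = np.array([4.0, 4.0, 4.0])        # constant imputed values (zero variance)

# Observed-SD-only denominator vs. pooled denominator:
smd_obs_only = (imp.mean() - obs.mean()) / obs.std(ddof=1)
smd_pooled = (imp.mean() - obs.mean()) / pooled_sd(obs, imp)
```

Because the zero-variance imputed group drags the pooled SD down, the pooled SMD comes out larger here; whichever convention the function uses, documenting it avoids the confusion the reviewer describes.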
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- I see the trigger‐reason logic (SMD, variance ratio, KS p‐value) is copied in the PDF report builder, Markdown exporter, and GUI—consider extracting it to a shared utility so any future tweaks stay in sync.
- The code for parsing whether bias diagnostics are enabled (list vs dict, top‐level vs nested under quality_metrics) is scattered across CLI, GUI, and batch processing—centralizing that parsing into one helper would reduce subtle inconsistencies.
- In `imputation_bias_report` you use sqrt(obs_var) as the pooled standard deviation—double-check this against the conventional two‐sample SMD formula or document this choice to avoid confusion about how SMD is calculated.

## Individual Comments

### Comment 1
<location> `src/phenoqc/mapping.py:185` </location>
<code_context>
                 mappings[ontology_id] = custom_id
             return mappings

+        # If the input already looks like an ontology ID, accept it directly for the matching ontology
+        # e.g., 'hp:0001250', 'doid:9352', 'mp:0000001'
+        if term_lower.startswith('hp:'):
+            direct = {'HPO': term.replace('hp:', 'HP:')}
+            # fill non-targets with None below
+        elif term_lower.startswith('doid:'):
+            direct = {'DO': term.replace('doid:', 'DOID:')}
+        elif term_lower.startswith('mp:'):
+            direct = {'MPO': term.replace('mp:', 'MP:')}
+        else:
+            direct = {}
+
</code_context>

<issue_to_address>
Direct ID pass-through logic may not handle case variations or alternative prefixes.

The current logic only matches lowercase prefixes. To improve robustness, normalize input casing or allow for additional prefix formats.
</issue_to_address>

<suggested_fix>
<<<<<<< SEARCH
        # If the input already looks like an ontology ID, accept it directly for the matching ontology
        # e.g., 'hp:0001250', 'doid:9352', 'mp:0000001'
        if term_lower.startswith('hp:'):
            direct = {'HPO': term.replace('hp:', 'HP:')}
            # fill non-targets with None below
        elif term_lower.startswith('doid:'):
            direct = {'DO': term.replace('doid:', 'DOID:')}
        elif term_lower.startswith('mp:'):
            direct = {'MPO': term.replace('mp:', 'MP:')}
        else:
            direct = {}
=======
        # If the input already looks like an ontology ID, accept it directly for the matching ontology
        # e.g., 'hp:0001250', 'doid:9352', 'mp:0000001', 'HP:0001250', 'DOID:9352', 'MP:0000001'
        import re

        direct = {}
        hp_match = re.match(r'^(hp:|HP:)(\d+)$', term.strip())
        doid_match = re.match(r'^(doid:|DOID:)(\d+)$', term.strip())
        mp_match = re.match(r'^(mp:|MP:)(\d+)$', term.strip())

        if hp_match:
            direct = {'HPO': f'HP:{hp_match.group(2)}'}
        elif doid_match:
            direct = {'DO': f'DOID:{doid_match.group(2)}'}
        elif mp_match:
            direct = {'MPO': f'MP:{mp_match.group(2)}'}
        else:
            direct = {}
>>>>>>> REPLACE

</suggested_fix>

### Comment 2
<location> `src/phenoqc/batch_processing.py:788` </location>
<code_context>
+                        original_df = pd.concat(all_chunks) if all_chunks else sample_df
+                    except Exception:
+                        original_df = sample_df
+                    bias_df = imputation_bias_report(
+                        original_df=original_df if not original_df.empty else chunk,
+                        imputed_df=chunk,
+                        imputation_mask=getattr(engine, 'imputation_mask', {}),
+                        smd_threshold=float(bias_thresholds.get('smd_threshold', 0.10)),
+                        var_ratio_low=float(bias_thresholds.get('var_ratio_low', 0.5)),
+                        var_ratio_high=float(bias_thresholds.get('var_ratio_high', 2.0)),
+                        ks_alpha=float(bias_thresholds.get('ks_alpha', 0.05)),
+                    )
+            except Exception as _bias_ex:
+                log_activity(f"{file_path}: imputation-bias diagnostic failed: {_bias_ex}", level="warning")
</code_context>

<issue_to_address>
original_df fallback logic may not always reflect the true original data.

Please clarify or document the fallback to sample_df, or ensure sample_df is always a valid stand-in for the original data when all_chunks is empty.
</issue_to_address>

### Comment 3
<location> `tests/test_imputation_bias.py:8` </location>
<code_context>
+from phenoqc.quality_metrics import imputation_bias_report
+
+
+def test_imputation_bias_mean_imputer_warns_on_variance():
+    rng = np.random.RandomState(0)
+    n = 200
+    x = rng.normal(loc=0.0, scale=1.0, size=n)
+    # Inject 30% missing completely at random
+    mask = rng.rand(n) < 0.3
+    df = pd.DataFrame({'x': x})
+    df_missing = df.copy()
+    df_missing.loc[mask, 'x'] = np.nan
+
+    engine = ImputationEngine({'strategy': 'mean'})
+    df_imputed = engine.fit_transform(df_missing)
+    imputation_mask = getattr(engine, 'imputation_mask', {})
+
+    bias_df = imputation_bias_report(
+        original_df=df_missing, imputed_df=df_imputed, imputation_mask=imputation_mask,
+        columns=['x'], smd_threshold=0.05
+    )
+    assert isinstance(bias_df, pd.DataFrame)
+    assert not bias_df.empty
+    # With mean imputation, variance typically shrinks and warn flags can appear
+    assert bool(bias_df.iloc[0]['warn']) in (True, False)
+
+
</code_context>

<issue_to_address>
Good coverage of mean imputation bias, but missing explicit checks for edge cases like all missing or all observed.

Please add tests for cases where the input column contains only missing values or only observed values to verify that the bias report returns an empty DataFrame or handles these scenarios without errors.
</issue_to_address>

<suggested_fix>
<<<<<<< SEARCH
def test_imputation_bias_mean_imputer_warns_on_variance():
    rng = np.random.RandomState(0)
    n = 200
    x = rng.normal(loc=0.0, scale=1.0, size=n)
    # Inject 30% missing completely at random
    mask = rng.rand(n) < 0.3
    df = pd.DataFrame({'x': x})
    df_missing = df.copy()
    df_missing.loc[mask, 'x'] = np.nan

    engine = ImputationEngine({'strategy': 'mean'})
    df_imputed = engine.fit_transform(df_missing)
    imputation_mask = getattr(engine, 'imputation_mask', {})

    bias_df = imputation_bias_report(
        original_df=df_missing, imputed_df=df_imputed, imputation_mask=imputation_mask,
        columns=['x'], smd_threshold=0.05
    )
    assert isinstance(bias_df, pd.DataFrame)
    assert not bias_df.empty
    # With mean imputation, variance typically shrinks and warn flags can appear
    assert bool(bias_df.iloc[0]['warn']) in (True, False)
=======
def test_imputation_bias_mean_imputer_warns_on_variance():
    rng = np.random.RandomState(0)
    n = 200
    x = rng.normal(loc=0.0, scale=1.0, size=n)
    # Inject 30% missing completely at random
    mask = rng.rand(n) < 0.3
    df = pd.DataFrame({'x': x})
    df_missing = df.copy()
    df_missing.loc[mask, 'x'] = np.nan

    engine = ImputationEngine({'strategy': 'mean'})
    df_imputed = engine.fit_transform(df_missing)
    imputation_mask = getattr(engine, 'imputation_mask', {})

    bias_df = imputation_bias_report(
        original_df=df_missing, imputed_df=df_imputed, imputation_mask=imputation_mask,
        columns=['x'], smd_threshold=0.05
    )
    assert isinstance(bias_df, pd.DataFrame)
    assert not bias_df.empty
    # With mean imputation, variance typically shrinks and warn flags can appear
    assert bool(bias_df.iloc[0]['warn']) in (True, False)


def test_imputation_bias_all_missing_column():
    # All values missing
    df_missing = pd.DataFrame({'x': [np.nan] * 10})
    engine = ImputationEngine({'strategy': 'mean'})
    df_imputed = engine.fit_transform(df_missing)
    imputation_mask = getattr(engine, 'imputation_mask', {})
    bias_df = imputation_bias_report(
        original_df=df_missing, imputed_df=df_imputed, imputation_mask=imputation_mask,
        columns=['x'], smd_threshold=0.05
    )
    assert isinstance(bias_df, pd.DataFrame)
    # Should be empty or have no rows for 'x'
    assert bias_df.empty or not (bias_df['column'] == 'x').any()


def test_imputation_bias_all_observed_column():
    # All values observed, no missing
    df_observed = pd.DataFrame({'x': np.arange(10)})
    engine = ImputationEngine({'strategy': 'mean'})
    df_imputed = engine.fit_transform(df_observed)
    imputation_mask = getattr(engine, 'imputation_mask', {})
    bias_df = imputation_bias_report(
        original_df=df_observed, imputed_df=df_imputed, imputation_mask=imputation_mask,
        columns=['x'], smd_threshold=0.05
    )
    assert isinstance(bias_df, pd.DataFrame)
    # Should be empty or have no rows for 'x'
    assert bias_df.empty or not (bias_df['column'] == 'x').any()
>>>>>>> REPLACE

</suggested_fix>

## Security Issues

### Issue 1
<location> `scripts/end_to_end_e2e_cli_test.py:222` </location>

<issue_to_address>
**security (python.lang.security.audit.dangerous-subprocess-use-audit):** Detected subprocess function 'run' without a static string. If this data can be controlled by a malicious actor, it may be an instance of command injection. Audit the use of this call to ensure it is not controllable by an external resource. You may consider using 'shlex.escape()'.

*Source: opengrep*
</issue_to_address>


jorgeMFS and others added 9 commits August 10, 2025 01:01
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
…run-tests

Initialize alt ID mapping before loading ontologies
@jorgeMFS

@sourcery-ai dismiss @sourcery-ai review @sourcery-ai summary

@sourcery-ai sourcery-ai bot dismissed their stale review August 10, 2025 11:17

Automated Sourcery review dismissed.

@jorgeMFS

@sourcery-ai guide

@jorgeMFS jorgeMFS merged commit 0936d66 into main Aug 10, 2025
4 checks passed
@jorgeMFS jorgeMFS deleted the feat/imputation-improvements branch August 10, 2025 11:33