Merged
92 commits
6ed71ae
Fix metatraits transform input file path.
realmarcin Mar 21, 2026
8604c09
Fix NCBITaxon adapter to use .db file instead of .owl.
realmarcin Mar 21, 2026
0ecb645
Merge chemical_mappings branch into fix_metatraits
realmarcin Mar 22, 2026
4277bbd
Enhance metatraits transform with chemical mapping and expanded METPO…
realmarcin Mar 22, 2026
d931cc8
Fix ChEBI categories: normalize to ChemicalSubstance
realmarcin Mar 22, 2026
4217349
Fix CHEBI category in microbial_trait_mappings to ChemicalSubstance
realmarcin Mar 22, 2026
decbd97
Add multiprocessing optimization to MetaTraits transforms (2-3x speedup)
realmarcin Mar 23, 2026
f04577e
Clarify NCBITaxon database download message in metatraits transform
realmarcin Mar 23, 2026
be2bbdf
Fix NCBITaxon database handling: use symlink to OAK cache
realmarcin Mar 23, 2026
cd0a365
Add item-level parallelism and limit GTDB metatraits to species-only
realmarcin Mar 24, 2026
d7264c2
Disable OAK adapter for GTDB metatraits - use GTDB metadata only
realmarcin Mar 24, 2026
9c20b0f
Fix worker class instantiation to use correct transform type
realmarcin Mar 24, 2026
e59cde1
Fix chunked parallelism worker count and drop_duplicates API
realmarcin Mar 24, 2026
a06962f
Suppress pkg_resources deprecation warning from eutils
realmarcin Mar 25, 2026
0e9bbb9
Fix metatraits_gtdb writing output to wrong directory
realmarcin Mar 25, 2026
ae151d9
Add comprehensive METPO mapping analysis for metatraits transform
realmarcin Mar 25, 2026
0997784
Fix critical phenotype mapping bugs and implement METPO-first resolution
realmarcin Mar 25, 2026
f7880b4
Add comprehensive documentation for METPO-first implementation
realmarcin Mar 25, 2026
e91fa3b
Fix critical bug in METPO-first Tier 2 filter
realmarcin Mar 26, 2026
ba99e49
Use correct METPO synonym column for metatraits transform
realmarcin Mar 26, 2026
044319f
Reduce multiprocessing semaphore leaks by disposing OAK adapters
realmarcin Mar 26, 2026
f04faff
Add Phase 5 coverage analysis for METPO-first implementation
realmarcin Mar 26, 2026
d3403d5
Add DuckDB query skill for organism and media queries
realmarcin Mar 27, 2026
7d6c404
Fix metatraits issues and add strain resolution strategy
realmarcin Mar 27, 2026
0531bba
Add accession-based GTDB taxonomy mapping and hierarchical edge frame…
realmarcin Mar 28, 2026
9df5685
Fix ImportError: use SAME_AS_PREDICATE instead of non-existent OWL_SA…
realmarcin Mar 28, 2026
3501849
Add comprehensive METPO term proposals for unmapped traits
realmarcin Mar 28, 2026
070fb82
Eliminate all hardcoded mappings and create METPO proposals
realmarcin Apr 6, 2026
77e07d8
Add enzyme GO mappings, required-for-growth resolver, and chemical sy…
realmarcin Apr 6, 2026
2a6f10f
Fix multiprocessing: add enzyme_name_to_go to worker shared data
realmarcin Apr 6, 2026
368fdcd
Add EC2GO mapping and fix Round 2 unmapped traits
realmarcin Apr 7, 2026
14b5ba5
Add Round 3 unmapped trait improvements
realmarcin Apr 7, 2026
481ece4
Add Round 3 implementation summary document
realmarcin Apr 7, 2026
7b417b8
Add audit-mappings skill for data-driven architecture compliance
realmarcin Apr 7, 2026
f990d4d
Merge master into fix_metatraits
realmarcin Apr 7, 2026
3ab6504
Address Copilot PR review feedback
realmarcin Apr 7, 2026
c6c0db3
Address Copilot PR review comments for PR #531
realmarcin Apr 7, 2026
5392c37
Fix CI check errors for PR #531
realmarcin Apr 7, 2026
f83c358
Fix final CI check error for PR #531
realmarcin Apr 7, 2026
82d1d4b
Replace KGM:alkaliphilic placeholder with METPO:1003002
realmarcin Apr 7, 2026
9441559
Update metpo_gaps_metadata.tsv to reflect alkaliphilic term found
realmarcin Apr 7, 2026
87afdcb
Add handlers for growth temperature/NaCl observations (170K+ obs)
realmarcin Apr 8, 2026
2e61e68
Add Priority 2 chemical and enzyme mappings (~120 observations)
realmarcin Apr 8, 2026
8b1cd8b
Add Priority 3 antibiotic/secondary metabolite mappings (52 observati…
realmarcin Apr 8, 2026
a137d45
Add comprehensive final summary of unmapped traits work
realmarcin Apr 8, 2026
2e5a070
Add comprehensive hardcoded mappings audit report
realmarcin Apr 8, 2026
5939d51
Add comprehensive external mapping files report
realmarcin Apr 8, 2026
832f1a9
Add comprehensive provenance documentation for 366 manually curated m…
realmarcin Apr 8, 2026
c9f198a
Add Phase 1 provenance work summary
realmarcin Apr 8, 2026
1f9be5a
Phase 2: Migrate chemical mappings to unified file with provenance
realmarcin Apr 8, 2026
e021d72
Add Phase 2 migration completion summary
realmarcin Apr 8, 2026
b143da1
Fix UnboundLocalError: remove duplicate chebi_id check
realmarcin Apr 8, 2026
4f844a4
Add migration verification report
realmarcin Apr 8, 2026
641cfcd
Fix metatraits_gtdb: remove deprecated chemical_name_synonyms
realmarcin Apr 8, 2026
d4dfba4
Add post-migration unmapped analysis and quick wins plan
realmarcin Apr 8, 2026
92c83f9
Implement 3 quick wins for unmapped trait reduction
realmarcin Apr 8, 2026
740d8de
Fix: Add concentration prefix stripping to growth/metabolic resolvers
realmarcin Apr 8, 2026
f18af23
Fix: Repair corrupted unified chemical mappings file
realmarcin Apr 8, 2026
6c86e48
Fix: Add concentration suffix stripping to all chemical resolvers
realmarcin Apr 8, 2026
443b595
Add 4-aminovalerate synonym to CHEBI:15887 in unified mappings
realmarcin Apr 8, 2026
5afa691
Add Options A & B implementation summary documentation
realmarcin Apr 8, 2026
df14535
Add final comprehensive unmapped analysis after all fixes
realmarcin Apr 9, 2026
7925965
Refactor: replace hardcoded NCBI-GTDB mapping, fix enzyme GO IDs, arc…
realmarcin Apr 9, 2026
ef06941
Fix mapping errors across all 5 metatraits TSV files
realmarcin Apr 9, 2026
bbc7602
Revert gelatinase mapping to GO:0004175 (endopeptidase activity)
realmarcin Apr 9, 2026
5950765
Fix spurious synonyms in unified_chemical_mappings.tsv.gz (Phase 1)
realmarcin Apr 9, 2026
3acc900
Fix additional spurious synonyms and add Phase 2 OLS audit (unified_c…
realmarcin Apr 9, 2026
a9aee46
Fix unified_chemical_mappings.tsv.gz: remove invalid rows and correct…
realmarcin Apr 9, 2026
938118c
Add 31 new ontology mappings for high-frequency unmapped metatraits
realmarcin Apr 9, 2026
65829fe
Add KGM custom terms for 108 secondary metabolites with no public CHE…
realmarcin Apr 9, 2026
63bc66d
Fix negative assertion handling in Tier 3 resolvers
realmarcin Apr 9, 2026
465580e
Fix test expectation for enzyme activity: oxidase mapping
realmarcin Apr 9, 2026
11ce37f
Add kg-model-review skill for KGX/Biolink/METPO alignment checks
realmarcin Apr 14, 2026
78a0234
Migrate KGM: prefix to typed kgmicrobe.X: namespaces
realmarcin Apr 15, 2026
feaef9a
Fix critical KG modeling errors in mediadive, madin_etal, and metatraits
realmarcin Apr 15, 2026
e40fe2e
Fix Tier 1 contamination in unified_chemical_mappings.tsv.gz
realmarcin Apr 15, 2026
33420ba
Fix biolink: predicate used as relation in metatraits quantitative ed…
realmarcin Apr 15, 2026
e9f08f7
Fix duplicate IDs in bacdive/mediadive and improve KG model review co…
realmarcin Apr 16, 2026
12d08c9
Fix CI failures: ruff E501/F401 in madin_etal/mediadive and docstr-co…
realmarcin Apr 16, 2026
9ddf952
Add and fix test_transform_category_alignment: use ChemicalSubstance …
realmarcin Apr 16, 2026
bcc68df
Fix CI: exclude scripts/ from pytest collection via norecursedirs
realmarcin Apr 16, 2026
1de973d
Fix CI: skip chebi_categories tests when ontologies transform not run
realmarcin Apr 16, 2026
d96583a
Address Copilot PR #531 review: fix query_utils bugs and docs
realmarcin Apr 16, 2026
8f168d5
Drop archived mapping directories from the repo and ignore them
realmarcin Apr 16, 2026
353bd49
Organize PR markdown: move root notes to notes/, register docs/ files
realmarcin Apr 16, 2026
14c9b14
Drop one-time mapping fix scripts; keep reusable validator
realmarcin Apr 17, 2026
d3c8f9d
Drop stale PR work-log markdown from notes/ and docs/; gitignore futu…
realmarcin Apr 17, 2026
795de5f
Fix madin_etal empty categories and expand kg-model-review allowlists
realmarcin Apr 17, 2026
247e1b8
Normalize transform/merge TSV schemas and fix post-merge cleanup
realmarcin Apr 18, 2026
07a06af
Address three deferred KG modeling findings
realmarcin Apr 18, 2026
be95a6d
Fix CI: update test_attributes_1_node_header for new `deprecated` column
realmarcin Apr 18, 2026
03f5414
Address Copilot review feedback on PR #531
realmarcin Apr 18, 2026
129 changes: 129 additions & 0 deletions .claude/skills/audit-mappings/SKILL.md
@@ -0,0 +1,129 @@
---
name: audit-mappings
description: Audit transform code for hardcoded ontology mappings and generate coverage report
---

# Audit Mappings Skill

Scans KG-Microbe transform code for hardcoded ontology mappings and generates an audit report. The skill helps maintain a data-driven architecture by flagging inline CURIE mappings that should be moved to mapping files.

## What This Skill Audits

### Python Code Patterns
- **Hardcoded dictionaries** with CURIE values (e.g., `{"trait": "METPO:12345"}`)
- **Inline string assignments** with ontology prefixes (CHEBI:, GO:, EC:, METPO:, etc.)
- **Mapping dictionaries** embedded in transform code
- Filters out false positives: imports, comments, docstrings, configuration constants
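
As a rough illustration of the dictionary check above, the sketch below walks a module's AST and counts dictionary values that look like CURIEs. The function name `find_hardcoded_curie_dicts`, the `CURIE_RE` pattern, and the `min_entries` parameter are assumptions made for this example and are not taken from `audit_mappings.py`.

```python
import ast
import re

# Assumed CURIE shape for illustration: a prefix, a colon, then a local ID.
# URLs such as "https://example.org" do not match because of the slashes.
CURIE_RE = re.compile(r"^[A-Za-z][A-Za-z0-9_.]*:[A-Za-z0-9_]+$")


def find_hardcoded_curie_dicts(source: str, min_entries: int = 1):
    """Return sorted (line_number, curie_count) pairs for dict literals with CURIE values."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Dict):
            continue
        # Count only string constants whose text looks like a CURIE.
        count = sum(
            1
            for value in node.values
            if isinstance(value, ast.Constant)
            and isinstance(value.value, str)
            and CURIE_RE.match(value.value)
        )
        if count >= min_entries:
            findings.append((node.lineno, count))
    return sorted(findings)
```

Working on the AST rather than raw text means comments are never seen, and imports and docstrings are skipped because they are not dictionary values. A scanner built this way may still need an explicit allowlist for configuration constants.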

### Mapping Files
- **TSV files** in `kg_microbe/transform_utils/*/mappings/*.tsv`
- **YAML files** like `custom_curies.yaml`
- **JSON files** matching pattern `*mapping*.json`
- Counts entries and categorizes by type
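
A minimal sketch of the TSV inventory step, assuming each mapping file starts with a single header row and that blank lines should not count as entries. The helper names `count_tsv_entries` and `inventory_mapping_files` are hypothetical; only the glob pattern comes from the list above.

```python
import csv
from pathlib import Path


def count_tsv_entries(path: Path) -> int:
    """Count data rows in a TSV file, assuming the first non-blank row is a header."""
    with path.open(newline="", encoding="utf-8") as handle:
        rows = [
            row
            for row in csv.reader(handle, delimiter="\t")
            if any(cell.strip() for cell in row)  # skip blank lines
        ]
    return max(len(rows) - 1, 0)


def inventory_mapping_files(repo_root: Path) -> dict:
    """Map each mapping TSV (path relative to repo_root) to its entry count."""
    pattern = "kg_microbe/transform_utils/*/mappings/*.tsv"
    return {
        path.relative_to(repo_root).as_posix(): count_tsv_entries(path)
        for path in sorted(repo_root.glob(pattern))
    }
```

YAML and JSON files would need their own counters, since what counts as an "entry" depends on each file's structure.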

## Usage

### Audit all transforms
```bash
/audit-mappings
```

### Audit specific transform
```bash
/audit-mappings --transform metatraits
```

### Generate detailed report with code snippets
```bash
/audit-mappings --transform bacdive --verbose
```

### Generate markdown report
```bash
/audit-mappings --format md > mapping_audit_report.md
```

### Inventory mapping files only (skip code scanning)
```bash
/audit-mappings --mapping-files-only
```

## Options

- `--transform NAME` - Audit specific transform (default: all)
- `--format {text,json,md}` - Output format (default: text)
- `--verbose` - Include code snippets and line numbers
- `--mapping-files-only` - Only scan mapping files, skip Python code
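
The options above could be declared with `argparse` roughly as follows. This is a sketch under the assumption that the skill forwards its arguments to a Python script; `build_parser` is a hypothetical name, and the real `audit_mappings.py` may parse its arguments differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented /audit-mappings options."""
    parser = argparse.ArgumentParser(prog="audit-mappings")
    # None stands for "all transforms", matching the documented default.
    parser.add_argument("--transform", metavar="NAME", default=None,
                        help="Audit a specific transform (default: all)")
    parser.add_argument("--format", choices=["text", "json", "md"], default="text",
                        help="Output format (default: text)")
    parser.add_argument("--verbose", action="store_true",
                        help="Include code snippets and line numbers")
    parser.add_argument("--mapping-files-only", action="store_true",
                        help="Only scan mapping files, skip Python code")
    return parser
```

For example, `build_parser().parse_args(["--transform", "metatraits", "--format", "md"])` yields a namespace with `transform="metatraits"` and `format="md"`, with both boolean flags left at `False`.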

## Output Format

### Text Format (default)
```
=== Hardcoded Mapping Audit Report ===
Date: 2026-04-06

Transform: metatraits
Python hardcoded mappings: 1
- metatraits.py:52-95 (METPO_TO_BIOLINK_PREDICATE, 44 entries)

Mapping files: 4
- chemical_name_synonyms.tsv (44 entries)
- enzyme_name_to_go.tsv (34 entries)
- special_chemical_mappings.tsv (35 entries)
- ec2go.txt (4,822 entries)

Status: ✅ 99.97% data-driven

---
Summary:
Total transforms scanned: 20
Transforms with hardcoded mappings: 15
Total mapping files: 25
Total mapping entries: 5,200+
```

### JSON Format
```json
{
"report_date": "2026-04-06",
"transforms": [
{
"name": "metatraits",
"hardcoded_mappings": [...],
"mapping_files": [...],
"data_driven_percentage": 99.97
}
],
"summary": {...}
}
```

## Classification Rules

### Data-driven (Good) ✅
- Mappings loaded from TSV/YAML/JSON files
- Dynamic lookups via OAK/ChemicalMappingLoader
- Predicate lookups via resolver methods

### Hardcoded (Flag) ⚠️
- Inline dictionaries with >5 CURIE mappings
- String literals with CURIEs in business logic
- Both patterns should be migrated to mapping files

### Acceptable Hardcoded
- API endpoints and URLs
- Configuration constants (paths, file names)
- Schema-level mappings (e.g., METPO → Biolink predicates)
- Fallback placeholders for ontology gaps

## Use Cases

- **Audit data-driven compliance** before releases
- **Identify migration candidates** for refactoring
- **Track mapping coverage** across transforms
- **Prevent regressions** to hardcoded patterns
- **CI/CD quality checks** as part of test suite

## Implementation

See `audit_mappings.py` for the scanning logic.