Merged
14 commits
42 changes: 42 additions & 0 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1,42 @@
## Summary

<!-- Briefly describe what this PR adds or changes. -->

## Evidence for new vocabulary tokens

<!--
Every PR that adds or modifies vocabulary tokens MUST complete this section.
A token is justified when it appears in ≥ 3 distinct IMAS DD paths or
facility signals. The imas-codex `sn gaps` command can provide this data.

Delete this section if the PR does not touch vocabulary files.
-->

- **Token(s) proposed:** <!-- e.g. runaway_electron, curvature_drift -->
- **Number of distinct DD paths demanding this token (N):** <!-- must be ≥ 3 -->
- **Paths (list at least 3):**
  - <!-- path 1 -->
  - <!-- path 2 -->
  - <!-- path 3 -->
- **Why an existing token does not suffice:**
  <!-- One paragraph explaining the semantic gap. -->

## Motivation

<!-- Why is this change needed? Link to an imas-codex issue or VocabGap report if applicable. -->

## Changes

<!-- List the files changed and what was modified. -->

## Testing

- [ ] All existing tests pass (`uv run pytest`)
- [ ] New tests added for any new grammar rules or validation logic
- [ ] Grammar validates correctly (`uv run pytest tests/`)

## Checklist

- [ ] I have read the [CONTRIBUTING.md](../CONTRIBUTING.md) guidelines
- [ ] Conventional commit message format used
- [ ] For vocabulary additions: N ≥ 3 evidence provided above
3 changes: 2 additions & 1 deletion .github/workflows/release.yml
@@ -61,6 +61,7 @@ jobs:

publish-pypi:
needs: test-install
+ if: github.repository == 'iterorganization/IMAS-Standard-Names'
runs-on: ubuntu-latest
environment:
name: pypi
@@ -77,7 +78,7 @@ jobs:
uses: pypa/gh-action-pypi-publish@release/v1

github-release:
- needs: publish-pypi
+ needs: test-install
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
30 changes: 30 additions & 0 deletions AGENTS.md
@@ -68,6 +68,8 @@ imas_standard_names/
├── tools/ # MCP read-only tool implementations
├── grammar/ # Grammar parsing, composition, and validation
├── catalog/ # SQLite catalog management
├── graph/ # NetworkX local graph builder (plan 41, optional)
├── rendering/ # MkDocs catalog renderer
├── repository.py # Main repository facade (read-only)
├── models.py # Pydantic data models
└── validation/ # Validation logic
@@ -84,6 +86,34 @@ tests/ # Test suite
- Group related functionality in focused modules
- Keep modules cohesive and loosely coupled

### Local Graph (plan 41)

The `graph/local_graph.py` module builds a NetworkX `DiGraph` over the
per-domain catalog YAML (`<domain>.yml`). Five edge types are emitted:

| Edge type | Source attribute | Direction |
|-----------|------------------|-----------|
| `HAS_ARGUMENT` | `arguments.*.name` | entry → argument |
| `HAS_ERROR` | `error_variants[]` | **base → variant** (base-centric) |
| `HAS_PREDECESSOR` | `deprecates` | entry → deprecated predecessor |
| `HAS_SUCCESSOR` | `superseded_by` | deprecated entry → successor |
| `REFERENCES` | `links[]` | entry → referenced name |

Forward references and external names appear as stub nodes
(`node["stub"] = True`). The graph requires the optional `graph-local` extra;
install it with `uv sync --extra graph-local`.
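
A dependency-free sketch of how the five edge types and the stub nodes could be derived from catalog entries is shown below. It is not the code in `graph/local_graph.py`: it uses plain dicts and sets instead of a NetworkX `DiGraph`, and it assumes each entry is a dict in which `arguments` is a list of `{"name": ...}` mappings, `error_variants` and `links` are lists of names, and `deprecates`/`superseded_by` each hold a single name. `LocalGraph` and `build_local_graph` are hypothetical names.

```python
from dataclasses import dataclass, field

# Edge types from the table above.
EDGE_TYPES = ("HAS_ARGUMENT", "HAS_ERROR", "HAS_PREDECESSOR", "HAS_SUCCESSOR", "REFERENCES")


@dataclass
class LocalGraph:
    """Stand-in for a NetworkX DiGraph: node attributes plus typed directed edges."""

    nodes: dict = field(default_factory=dict)  # name -> {"stub": bool}
    edges: set = field(default_factory=set)    # {(source, target, edge_type)}

    def add_node(self, name, stub):
        # Never downgrade a real catalog entry to a stub.
        if name not in self.nodes or not stub:
            self.nodes[name] = {"stub": stub}

    def add_edge(self, source, target, edge_type):
        if edge_type not in EDGE_TYPES:
            raise ValueError(f"unknown edge type: {edge_type}")
        self.edges.add((source, target, edge_type))


def build_local_graph(entries):
    """Build the typed graph from catalog entries (dicts with an assumed layout)."""
    graph = LocalGraph()
    known = {entry["name"] for entry in entries}
    for name in known:
        graph.add_node(name, stub=False)

    def link(source, target, edge_type):
        # Names not present in `entries` (another domain file, external) become stubs.
        if target not in known:
            graph.add_node(target, stub=True)
        graph.add_edge(source, target, edge_type)

    for entry in entries:
        name = entry["name"]
        for argument in entry.get("arguments", []):
            link(name, argument["name"], "HAS_ARGUMENT")  # entry -> argument
        for variant in entry.get("error_variants", []):
            link(name, variant, "HAS_ERROR")  # base -> variant
        if entry.get("deprecates"):
            link(name, entry["deprecates"], "HAS_PREDECESSOR")  # entry -> predecessor
        if entry.get("superseded_by"):
            link(name, entry["superseded_by"], "HAS_SUCCESSOR")  # deprecated -> successor
        for referenced in entry.get("links", []):
            link(name, referenced, "REFERENCES")  # entry -> referenced name
    return graph
```

Collecting all known names before adding edges means that, in this sketch, only names missing from the supplied entries end up as stubs.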

Four MCP tools in `tools/graph.py` wrap the graph:

- `get_standard_name_neighbours(name, edge_types=None, direction="both")`
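- `get_standard_name_ancestors(name, max_depth=None)`: transitive closure
  over HAS_ARGUMENT (out) ∪ HAS_ERROR (in), i.e. the entry's ordering parents
  (its arguments and, for an error variant, its base name).
- `get_standard_name_descendants(name, max_depth=None)`: the inverse traversal,
  over HAS_ARGUMENT (in) ∪ HAS_ERROR (out).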
- `shortest_standard_name_path(source, target, edge_types=None)`

All four are registered as optional read-only tools (skipped when the
`networkx` import is unavailable).
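
The ancestor/descendant semantics can be illustrated with a breadth-first traversal over typed edges. This is a stdlib-only sketch, not the NetworkX-based implementation in `tools/graph.py`; the edge tuples, the `ancestors`/`descendants` helpers and the example names are hypothetical, and `max_depth` is interpreted as a cap on the number of hops (an assumption).

```python
from collections import deque

# Which typed edges each traversal follows, and in which direction.
ANCESTOR_RULES = {"HAS_ARGUMENT": "out", "HAS_ERROR": "in"}
DESCENDANT_RULES = {"HAS_ARGUMENT": "in", "HAS_ERROR": "out"}


def _closure(edges, name, rules, max_depth=None):
    """Breadth-first transitive closure from `name`, capped at `max_depth` hops if given."""
    adjacency = {}
    for source, target, edge_type in edges:
        direction = rules.get(edge_type)  # other edge types are ignored
        if direction == "out":
            adjacency.setdefault(source, set()).add(target)
        elif direction == "in":
            adjacency.setdefault(target, set()).add(source)

    seen = {name}
    queue = deque([(name, 0)])
    while queue:
        node, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:  # `seen` also guards against cycles
                seen.add(neighbour)
                queue.append((neighbour, depth + 1))
    return seen - {name}


def ancestors(edges, name, max_depth=None):
    return _closure(edges, name, ANCESTOR_RULES, max_depth)


def descendants(edges, name, max_depth=None):
    return _closure(edges, name, DESCENDANT_RULES, max_depth)


# Hypothetical example: a derived quantity, a base quantity, and their error variants.
EXAMPLE_EDGES = [
    ("temperature_gradient", "temperature", "HAS_ARGUMENT"),
    ("temperature", "temperature_error", "HAS_ERROR"),
    ("temperature_gradient", "temperature_gradient_error", "HAS_ERROR"),
]
```

With these edges, `ancestors(EXAMPLE_EDGES, "temperature_gradient_error")` reaches both `temperature_gradient` and `temperature`, while `max_depth=1` stops at `temperature_gradient`.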


## Project Setup

### Terminal Usage
12 changes: 12 additions & 0 deletions CONTRIBUTING.md
@@ -97,6 +97,18 @@ If you find a problem or have a suggestion:
3. Update the documentation alongside code changes.
4. Reviewers may request changes before merging.

## Vocabulary Token Policy

When proposing new vocabulary tokens (entries in `imas_standard_names/grammar/vocabularies/*.yml`), the following **N ≥ 3 evidence gate** applies:

1. **Minimum evidence:** A new token must appear in **at least 3 distinct** IMAS Data Dictionary paths or facility signals. This prevents one-off jargon from polluting the grammar.

2. **How to gather evidence:** Run `imas-codex sn gaps --direction missing` or `imas-codex sn gaps --direction saturated` to see tokens with occurrence counts. The PR template includes a section for listing the supporting DD paths.

3. **Deprecation:** Tokens that no longer appear in at least 3 DD paths after a Data Dictionary update should be marked for deprecation in the next release cycle.

4. **Exceptions:** Tokens introduced for structural reasons (e.g., uncertainty operators like `upper_uncertainty`) may bypass the N ≥ 3 rule if they serve a systematic grammar purpose documented in the PR.
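
Rules 1 and 4 can be expressed as a small check. This is a hypothetical helper for illustration only (neither ISN nor imas-codex is known to ship it): `TokenProposal` and `passes_evidence_gate` are made-up names, evidence entries are compared as exact strings after trimming whitespace, and a non-empty `structural_purpose` stands in for "a systematic grammar purpose documented in the PR".

```python
from dataclasses import dataclass

MIN_DISTINCT_EVIDENCE = 3  # the N >= 3 gate from rule 1


@dataclass
class TokenProposal:
    token: str
    evidence: list                # DD paths or facility signals cited in the PR
    structural_purpose: str = ""  # documented grammar purpose (rule 4), if any


def passes_evidence_gate(proposal: TokenProposal) -> bool:
    """Return True if the proposal meets rule 1, or qualifies for the rule 4 exception."""
    # Count distinct, non-blank evidence entries; repeating a path adds no evidence.
    distinct = {item.strip() for item in proposal.evidence if item.strip()}
    if len(distinct) >= MIN_DISTINCT_EVIDENCE:
        return True
    return bool(proposal.structural_purpose.strip())
```

Deduplicating before counting is what makes the check about *distinct* paths rather than raw mentions.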

## Proposing Standard Names

To propose new standard names, use the [imas-standard-names-catalog](https://github.com/iterorganization/imas-standard-names-catalog) repository. Names are generated by [imas-codex](https://github.com/iterorganization/imas-codex) and reviewed by domain experts before being merged into the catalog.
58 changes: 58 additions & 0 deletions docs/vocab-retrospective-rc21-rc26.md
@@ -0,0 +1,58 @@
# Vocabulary Token Retrospective: rc21 → rc26

Generated by the imas-codex Phase 6c retrospective audit.

## Method

Tokens added between ISN `v0.7.0rc21` and `v0.7.0rc26` were extracted via
`git diff v0.7.0rc21..v0.7.0rc26 -- imas_standard_names/grammar/vocabularies/`.
Path counts were estimated from the IMAS Data Dictionary paths that demand each
token for standard name composition (via DD MCP search + cluster analysis).

**Verdict rules:**
- **keep**: ≥ 3 distinct DD paths demand this token
- **deprecate**: 1–2 DD paths demand this token
- **retire**: 0 DD paths demand this token
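
The three verdict rules reduce to a threshold function. The sketch below (a hypothetical `verdict` helper) also applies it to the estimated counts in the audit table, reading `1000+`, `12+`, etc. as their minimum values; treating the estimates as lower bounds is an assumption.

```python
from collections import Counter


def verdict(n_paths: int) -> str:
    """Map a distinct-DD-path count to the keep / deprecate / retire verdict."""
    if n_paths < 0:
        raise ValueError("path count cannot be negative")
    if n_paths >= 3:
        return "keep"
    if n_paths >= 1:
        return "deprecate"
    return "retire"


# Lower bounds of the "N paths (est.)" column in the audit table.
AUDIT_N_LOWER_BOUNDS = {
    "upper_uncertainty": 1000,
    "lower_uncertainty": 1000,
    "uncertainty_index": 1000,
    "trapped": 12,
    "passing": 10,
    "e_cross_b_drift": 8,
    "heat_viscosity": 6,
    "counter_passing": 6,
    "co_current": 4,
    "counter_current": 4,
    "ohmic_induction": 4,
    "inertial": 3,
    "sonic": 3,
    "left_hand_circularly_polarized": 3,
    "right_hand_circularly_polarized": 3,
}

VERDICT_COUNTS = Counter(verdict(n) for n in AUDIT_N_LOWER_BOUNDS.values())
```

`VERDICT_COUNTS` comes out as `Counter({'keep': 15})`, consistent with the summary.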

## Audit Table

| Token | Segment | RC | N paths (est.) | Verdict | Notes |
|-------|---------|-----|---------------|---------|-------|
| `upper_uncertainty` | operator | rc23 | 1000+ | **keep** | Systematic operator for every `_error_upper` DD field. Structural purpose — N≥3 rule exempt. |
| `lower_uncertainty` | operator | rc23 | 1000+ | **keep** | Systematic operator for every `_error_lower` DD field. Structural purpose. |
| `uncertainty_index` | operator | rc23 | 1000+ | **keep** | Systematic operator for every `_error_index` DD field. Structural purpose. |
| `trapped` | subject | rc26 | 12+ | **keep** | `distributions/distribution/profiles_1d/trapped/*` — density, collisions, source terms across multiple subtrees. |
| `passing` | subject | rc26 | 10+ | **keep** | Paired with `co_passing`/`counter_passing` subtrees in `distributions` IDS. |
| `e_cross_b_drift` | process | rc26 | 8+ | **keep** | `velocity_exb` fields in `edge_profiles`, `plasma_profiles`, `core_transport` (E×B drift processes). |
| `heat_viscosity` | process | rc26 | 6+ | **keep** | Viscous heating terms in `core_transport/model/profiles_1d` transport coefficients. |
| `counter_passing` | subject | rc26 | 6+ | **keep** | `distributions/distribution/profiles_1d/counter_passing/*` — density, collisions, source. |
| `co_current` | subject | rc26 | 4+ | **keep** | Co-current particle populations in `distributions/distribution/profiles_1d/co_passing/*`. |
| `counter_current` | subject | rc26 | 4+ | **keep** | Counter-current populations, transport model coupling in `core_transport`. |
| `ohmic_induction` | process | rc26 | 4+ | **keep** | Ohmic induction current density and flux terms in `core_transport`. |
| `inertial` | subject | rc26 | 3+ | **keep** | Inertial force/term fields in `core_transport` momentum equations. |
| `sonic` | subject | rc26 | 3+ | **keep** | Sonic Mach number quantities in `core_profiles` and `edge_profiles`. |
| `left_hand_circularly_polarized` | subject | rc26 | 3+ | **keep** | LHCP wave modes in `waves` IDS (wave polarisation). |
| `right_hand_circularly_polarized` | subject | rc26 | 3+ | **keep** | RHCP wave modes in `waves` IDS (wave polarisation). |

## Summary

- **15 tokens** added across rc22–rc26 (3 in rc23, 12 in rc26)
- **15 tokens** meet the N ≥ 3 evidence gate → all **keep**
- **0 tokens** recommended for deprecation
- **0 tokens** recommended for retirement

### Provenance

All rc26 tokens were proposed via the codex `sn gaps --direction saturated`
promotion pipeline which enforces `min_usage_count=3` and `min_review_score=0.75`
before candidacy. The rc23 uncertainty operators were added for structural grammar
completeness (B9 error siblings) and are exempt from the N ≥ 3 rule by the
"systematic grammar purpose" exception in the vocab policy.
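
The candidacy filter can be sketched as follows. Only the two threshold values come from this document; the `GapCandidate` record, the `promotable` function, the assumption that both thresholds are inclusive minima, and the 0 to 1 review-score scale are illustrative guesses, not the imas-codex implementation.

```python
from dataclasses import dataclass

MIN_USAGE_COUNT = 3      # value quoted in the Provenance note
MIN_REVIEW_SCORE = 0.75  # value quoted in the Provenance note


@dataclass
class GapCandidate:
    token: str
    usage_count: int     # distinct DD paths that needed the token
    review_score: float  # assumed to lie in [0, 1]


def promotable(candidates):
    """Return the tokens that clear both thresholds (treated here as inclusive)."""
    return [
        c.token
        for c in candidates
        if c.usage_count >= MIN_USAGE_COUNT and c.review_score >= MIN_REVIEW_SCORE
    ]
```

Requiring both conditions means a heavily used token with a weak review score, or a well-reviewed one-off, is not proposed.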

### Recommendations

1. **No action needed** — all tokens are well-evidenced.
2. **Future audits** should run after each DD version bump using
`imas-codex sn gaps` to detect orphaned tokens.
3. Consider adding a CI check that validates token coverage as part
of the ISN release process.
74 changes: 74 additions & 0 deletions docs/vocab-retrospective.md
@@ -0,0 +1,74 @@
# Vocabulary Token Retrospective

Ongoing record of vocab gap triage runs. Each section corresponds to a
pilot / release cycle. Evidence counts (N) are the number of distinct
IMAS DD paths that required the token for standard-name composition as
recorded by imas-codex VocabGap nodes.

---

## EMW Pilot (rc27 candidates)

Source: imas-codex tier-a-pilot run `2026-04-24` on `electromagnetic_wave_diagnostics`.
Paths processed: 340, Names composed: 272, Vocab gaps identified: 32.

Evidence gate: N ≥ 3 distinct IMAS DD paths required to add a token.

### Added at rc27 (N≥3 evidence)

| Token | Segment | Vocab file | Evidence N | Example refs |
|-------|---------|------------|-----------|--------------|
| `diagnostic_latency` | base → physical_base | `physical_bases.yml` | 4 | `refractometer/latency`, `reflectometer_fluctuation/latency`, `reflectometer_profile/latency`, `ece/latency` |
| `sweep_duration` | base → physical_base | `physical_bases.yml` | 3 | `reflectometer_profile/channel/sweep_time`, `refractometer/channel/sweep_time`, `reflectometer_fluctuation/channel/sweep_time` |
| `x1_coordinate` | base → geometry_carrier | `geometry_carriers.yml` | 3 | `reflectometer_fluctuation/channel/antenna_detection_static/outline/x1`, `reflectometer_profile/channel/antenna_detection/outline/x1` |
| `x2_coordinate` | base → geometry_carrier | `geometry_carriers.yml` | 3 | `reflectometer_fluctuation/channel/antenna_detection_static/outline/x2`, `reflectometer_profile/channel/antenna_detection/outline/x2` |
| `x1_width` | base → physical_base | `physical_bases.yml` | 3 | `reflectometer_fluctuation/channel/antenna_detection_static/x1_width`, `reflectometer_profile/channel/antenna_detection/x1_width` |

### Deferred (N<3 evidence)

Re-evaluate after Tier B pilot adds more electromagnetic-wave diagnostics coverage.

| Token | Segment | Evidence N | Reason |
|-------|---------|-----------|--------|
| `variation_flag` | base | 2 | insufficient evidence at rc27; re-evaluate after Tier B |
| `probing_frequency` | base | 2 | insufficient evidence at rc27; re-evaluate after Tier B |
| `time_window_duration` | base | 2 | insufficient evidence at rc27; re-evaluate after Tier B |
| `x2_width` | base | 2 | insufficient evidence at rc27; re-evaluate after Tier B |
| `calibration_factor` | base | 1 | insufficient evidence at rc27; re-evaluate after Tier B |
| `wave_vector_component` | base | 1 | insufficient evidence at rc27; re-evaluate after Tier B |
| `normalized_toroidal_flux_coordinate` | base | 1 | insufficient evidence at rc27; re-evaluate after Tier B (token already in geometry_carriers.yml — possible parser issue) |
| `fringe_jump_correction_time` | base | 1 | insufficient evidence at rc27; re-evaluate after Tier B |
| `analysis_time_window` | base | 1 | insufficient evidence at rc27; re-evaluate after Tier B |
| `bandwidth` | base | 1 | insufficient evidence at rc27; re-evaluate after Tier B |
| `frequency_axis` | base | 1 | insufficient evidence at rc27; re-evaluate after Tier B |
| `carrier_frequency` | base | 1 | insufficient evidence at rc27; re-evaluate after Tier B |
| `phase_ellipse_rotation_angle` | base | 1 | insufficient evidence at rc27; re-evaluate after Tier B |
| `beam_spot_ellipse_rotation_angle` | base | 1 | insufficient evidence at rc27; re-evaluate after Tier B |
| `emission_position_correction_toroidal_angle` | base | 1 | insufficient evidence at rc27; re-evaluate after Tier B |
| `emission_position_correction_poloidal_angle` | base | 1 | insufficient evidence at rc27; re-evaluate after Tier B |
| `probing_signal_phase` | base | 1 | insufficient evidence at rc27; re-evaluate after Tier B |
| `arc_length` | base | 1 | insufficient evidence at rc27; re-evaluate after Tier B |
| `beam_spot_size` | base | 1 | insufficient evidence at rc27; re-evaluate after Tier B |
| `unit_vector_component` | base | 1 | insufficient evidence at rc27; re-evaluate after Tier B |
| `suprathermal_electron_position_correction` | locus | 1 | insufficient evidence at rc27; re-evaluate after Tier B |
| `beam_path` | locus | 1 | insufficient evidence at rc27; re-evaluate after Tier B |
| `ece_beam_position` | locus | 1 | insufficient evidence at rc27; re-evaluate after Tier B |
| `launched` | operators | 1 | insufficient evidence at rc27; re-evaluate after Tier B |
| `correction_to` | operators | 1 | insufficient evidence at rc27; re-evaluate after Tier B |
| `x1_unit_vector` | position | 1 | insufficient evidence at rc27; re-evaluate after Tier B |
| `x2_unit_vector` | position | 1 | insufficient evidence at rc27; re-evaluate after Tier B |
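
As a consistency check on the two tables, the evidence counts can be transcribed and re-split at the N ≥ 3 gate. This is an illustrative script (the `triage` function is a made-up name); the token names and N values are copied from the tables above.

```python
# Evidence N per gap, transcribed from the "Added" and "Deferred" tables.
EMW_PILOT_GAPS = {
    "diagnostic_latency": 4,
    "sweep_duration": 3,
    "x1_coordinate": 3,
    "x2_coordinate": 3,
    "x1_width": 3,
    "variation_flag": 2,
    "probing_frequency": 2,
    "time_window_duration": 2,
    "x2_width": 2,
    "calibration_factor": 1,
    "wave_vector_component": 1,
    "normalized_toroidal_flux_coordinate": 1,
    "fringe_jump_correction_time": 1,
    "analysis_time_window": 1,
    "bandwidth": 1,
    "frequency_axis": 1,
    "carrier_frequency": 1,
    "phase_ellipse_rotation_angle": 1,
    "beam_spot_ellipse_rotation_angle": 1,
    "emission_position_correction_toroidal_angle": 1,
    "emission_position_correction_poloidal_angle": 1,
    "probing_signal_phase": 1,
    "arc_length": 1,
    "beam_spot_size": 1,
    "unit_vector_component": 1,
    "suprathermal_electron_position_correction": 1,
    "beam_path": 1,
    "ece_beam_position": 1,
    "launched": 1,
    "correction_to": 1,
    "x1_unit_vector": 1,
    "x2_unit_vector": 1,
}


def triage(gaps, threshold=3):
    """Split gaps into (added, deferred) token lists at the evidence threshold."""
    added = sorted(token for token, n in gaps.items() if n >= threshold)
    deferred = sorted(token for token, n in gaps.items() if n < threshold)
    return added, deferred


ADDED, DEFERRED = triage(EMW_PILOT_GAPS)
print(f"added={len(ADDED)} deferred={len(DEFERRED)} total={len(EMW_PILOT_GAPS)}")
```

It prints `added=5 deferred=27 total=32`, matching the 32 vocab gaps reported for the pilot.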

### Pilot context notes

- **Q1 (`x/y/z_cartesian` spatial qualifiers):** The pilot flagged 9 names with
  Cartesian-system coordinate tokens. The graph evidence mapped these to
  `x1_coordinate` (N=3), `x2_coordinate` (N=3), and `x1_width` (N=3), all from
  reflectometer antenna-grid geometry fields. All three met the gate and were
  added. `x2_width` reached only N=2 and is deferred.
- **Q2 (`wave_power_flow` dimensionless power, 2 names):** Unit-rules audit
issue, not a vocab gap. No token addition warranted.
- **Q3 (`fluctuation_power_spectrum` density-unit, 1 name):** Pattern exception,
not a vocab gap. No token addition warranted.
- **Q4 (`line_integrated` cumulative prefix, 1 name):** `operators` segment,
  N=1, so deferred. It was not captured directly in the gap nodes above; the
  `launched` and `correction_to` operator gaps were flagged instead.