feat: optional term_filter for run-inference and latest-inference-cohort endpoint #201

Draft

chapmanhk wants to merge 9 commits into develop from feat/inference-cohort-selection

Conversation

@chapmanhk (Contributor) commented Feb 2, 2026

changes

To review only the latest commit: see Checkpoint term support and cohort_labels below.

Run-inference: optional term_filter list (API → Databricks)

  • API (routers/models.py)

    • InferenceRunRequest: added optional term_filter: list[str] | None = None.
    • When term_filter is present: require at least one non-empty label; reject empty list (400) and empty or whitespace-only labels (400), with logging before each raise.
    • Build DatabricksInferenceRunRequest with term_filter=req.term_filter and pass it to run_pdp_inference.
    • After a successful run, log once: user, term_filter (cohorts), and job_run_id.
    • Docstring updated to describe the optional term_filter field.
    • Batch-not-found error message fixed to use len(batch_result) instead of len(inst_result).
    • Renamed res to inference_run_response in trigger_inference_run for clarity.
  • Databricks (databricks.py)

    • DatabricksInferenceRunRequest: added optional term_filter: list[str] | None = None.
    • In run_pdp_inference(): build job_params dict; when req.term_filter is not None, set job_params["term_filter"] = json.dumps(req.term_filter). Omit the key when term_filter is None.
    • Improved logging: use %-style and include exception in log messages for job lookup and job run failures.
  • Tests (routers/models_test.py)

    • No term_filter in request → 200 and mock called with term_filter=None.
    • Single and multiple cohorts in term_filter → 200 and mock receives the correct list.
    • Empty list, empty string in list, and whitespace-only label in term_filter → 400 with the expected detail messages.

Latest-inference-cohort endpoint and config from Databricks

  • API (routers/data.py)

    • New GET /{inst_id}/latest-inference-cohort returning LatestInferenceCohortResponse (cohort_labels, valid_student_count, status, batch_name, reason). Always returns 200; uses status='invalid' with reason when no batch, missing config, or no cohort has course data.
    • Optional query params:
      • model_name: if omitted, the institution must have exactly one registered model (valid, non-deleted); otherwise the response is invalid. When provided, the model must exist for the institution, or the endpoint returns invalid.
      • batch_name: when provided (URL-decoded), cohort selection uses that batch; when omitted, the most recent batch that has both STUDENT and COURSE files is used.
    • Config is read from the model's training run in the silver volume via DatabricksControl.read_volume_training_config(inst.name, model_run_id). model_run_id is the latest model version's run ID for the chosen model.
    • New/refactored helpers: _resolve_model_name_and_run_id_for_inference (resolve model name, validate existence, fetch run ID); _load_batch_and_dataframes_for_inference (resolve batch, load config, read batch files, validate STUDENT/COURSE); get_batch_for_inference; get_batch_by_name_with_student_and_course; _batch_has_student_and_course. Existing cohort logic is split into _latest_inference_cohort_validation_response, _first_valid_from_ordered_terms, and _resolve_latest_inference_cohort_from_dataframes (≤50 lines per function). Mypy type-narrowing assertions were added where optional values are guaranteed after error checks.
  • Config (config.py)

    • Added ENV_TO_VOLUME_SCHEMA = {"DEV": "dev_sst_02", "STAGING": "staging_sst_01"} for Databricks volume paths. Used by databricks.read_volume_training_config and front_end_tables.get_model_cards (replacing a local dict there).
  • Databricks (databricks.py)

    • New read_volume_training_config(inst_name, model_run_id): reads from /Volumes/{env_schema}/{slug}_silver/silver_volume/{model_run_id}/, finds any .toml under that path that contains [preprocessing.selection], and returns that section only (or None if missing/unreadable). Uses ENV_TO_VOLUME_SCHEMA; returns None for unsupported ENV (e.g. LOCAL). New helpers: _parse_config_toml_to_selection(raw: bytes) for TOML parsing and section extraction, and _find_selection_in_toml_under(w, directory_path, inst_name) to list directory and scan .toml files. Guard for response.contents is None before calling .read() (mypy-safe).
  • Front-end (routers/front_end_tables.py)

    • get_model_cards now uses ENV_TO_VOLUME_SCHEMA from config instead of a local schema dict.
  • Tests

    • routers/data_test.py: Tests for get_latest_inference_cohort (unauthorized; no model_name and no models / models exist but none registered; missing cohort column/term/config; valid; valid with batch_name; model_name not found; batch_name not found; loop-back when no course data for most recent; no cohort has course data; missing student id column; zero students meeting criteria; student_criteria references missing column); for get_latest_batch_with_student_and_course (returns batch when it has STUDENT and COURSE, returns None when no batch or no batch has both); and for apply_student_criteria_count, filter_to_most_recent_checkpoint_term, get_ordered_checkpoint_terms.
    • databricks_test.py: Tests for _parse_config_toml_to_selection (valid TOML with section, invalid/missing section); for read_volume_training_config (empty inst_name, empty model_run_id, ENV not DEV/STAGING, databricksify raises, WorkspaceClient raises, directory list raises, download raises, TOML missing [preprocessing.selection], success when .toml found under run dir). Added return type -> list[DirectoryEntry] to _one_toml_entry for mypy.

Checkpoint term support and cohort_labels (latest commit)

  • API (routers/data.py)

    • Config normalization: Added _normalize_selection_config_for_inference() so pipeline config can use checkpoint_term in student_criteria; at inference entry it is normalized to cohort_term once. Downstream code and DB column stay cohort_term. _allowed_checkpoint_terms_from_config now reads only cohort_term (config is always normalized first).
    • Naming: Renamed inference-path naming to "checkpoint term": e.g. get_ordered_checkpoint_terms, filter_to_most_recent_checkpoint_term, _try_checkpoint_term_candidate, _first_valid_from_ordered_terms, CHECKPOINT_TERM_REQUIRED_MSG, _TERM_ORDER, _UNKNOWN_TERM_SORT_RANK. Column and schema remain cohort_term.
    • Response model: LatestInferenceCohortResponse renames cohort_label to cohort_labels: list[str]; all construction sites updated. The user-facing message for the missing column says "cohort term" (matching the column); other messages keep "checkpoint term."
    • Logic: Defensive fallback message in filter_to_most_recent_checkpoint_term; explicit fields on early "Institution not found" response. Extracted _try_checkpoint_term_candidate, _ordered_checkpoint_terms_from_cohort_column, and _ordered_checkpoint_terms_from_entry_year for single-purpose helpers and 50-line limit.
  • Databricks (databricks.py)

    • _parse_config_toml_to_selection: accepts checkpoint_term in config (normalized to cohort_term by API); behavior unchanged for cohort_term.
  • Tests

    • routers/data_test.py: Normalizer (valid, non-dict student_criteria); checkpoint_term in config; cohort_term as string in config; cohort_term empty list (valid, count 0); response contract (cohort_labels list).
    • databricks_test.py: test_parse_config_toml_to_selection_accepts_checkpoint_term; return type -> None on the other parse test.
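A minimal sketch of the checkpoint_term -> cohort_term normalization described above. The function name mirrors the PR's description; the implementation details (copy semantics, non-dict handling) are assumptions.

```python
from typing import Any


def normalize_selection_config_for_inference(
    selection_config: dict[str, Any],
) -> dict[str, Any]:
    """Rewrite student_criteria so checkpoint_term becomes cohort_term.

    Per the PR: if both keys exist, checkpoint_term wins, and downstream
    helpers only ever see cohort_term. A non-dict student_criteria is
    returned unchanged; the input config is not mutated.
    """
    out = dict(selection_config)
    criteria = out.get("student_criteria")
    if not isinstance(criteria, dict):
        return out
    criteria = dict(criteria)  # shallow copy so the caller's config is untouched
    if "checkpoint_term" in criteria:
        criteria["cohort_term"] = criteria.pop("checkpoint_term")
    out["student_criteria"] = criteria
    return out
```

Normalizing once at the inference entry point means the DB column and every downstream helper stay aligned on cohort_term.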

context

  • Run-inference term_filter: Callers need to be able to choose which cohorts (e.g. "fall 2024-25") to run inference on. This PR adds an optional term_filter list to the run-inference request and passes it through to the Databricks job so the pipeline can filter by those terms/cohorts. When term_filter is omitted, no term_filter key is sent to Databricks and behavior is unchanged (pipeline uses its config default).

  • Latest-inference-cohort: The UI needs a single "latest inference-ready cohort" (label, valid student count, status) for an institution. This endpoint supports optional model_name and batch_name so the UI can show a specific model/batch or rely on defaults (single registered model, latest batch with student+course files). It reads the chosen model's training/preprocessing selection config from Databricks (silver volume, per model run) and returns the first cohort term that has course data and meets the config criteria. Refactors and tests align with project standards (e.g. function length, naming, error-path coverage).
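Returning the "first cohort term that has course data" presumes an ordering over term labels. One plausible sketch of such an ordering follows; the PR's actual _TERM_ORDER values, unknown-term rank, and label format are assumptions here.

```python
# Hypothetical ordering for cohort-term labels like "FALL 2024-25".
# Seasons are ranked chronologically within an academic year
# (FALL 2024-25 falls in autumn 2024; SPRING 2024-25 in spring 2025).
_TERM_ORDER = {"FALL": 0, "WINTER": 1, "SPRING": 2, "SUMMER": 3}
_UNKNOWN_TERM_SORT_RANK = -1  # unknown seasons count as least recent in their year


def term_sort_key(label: str) -> tuple[str, int]:
    """Sort key: academic year first, then season rank within that year."""
    season, _, year = label.partition(" ")
    return (year, _TERM_ORDER.get(season.upper(), _UNKNOWN_TERM_SORT_RANK))


def ordered_checkpoint_terms(labels: list[str]) -> list[str]:
    """Return labels most-recent-first, assuming consistently formatted
    academic years (so the lexicographic year comparison is valid)."""
    return sorted(labels, key=term_sort_key, reverse=True)
```

The endpoint would then walk this ordered list and return the first term whose students actually have course data, which is the "loop-back" behavior the tests exercise.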

  • Checkpoint term and cohort_labels: Pipelines may specify the inference cohort via checkpoint_term in config; the API normalizes this to cohort_term at entry so the rest of the stack stays column-aligned. The response now exposes cohort_labels (list) for consistency and future multi-cohort use. Naming ("checkpoint term") is aligned with product language while keeping the DB column name cohort_term.

questions

  • None.

chapmanhk and others added 4 commits January 30, 2026 10:00
- Add GET /institutions/{inst_id}/latest-inference-cohort endpoint returning
  LatestInferenceCohortResponse (status, cohort_label, valid_student_count,
  batch_name, reason). Always 200; status=invalid with reason when no batch,
  missing config, missing cohort_term, or no cohort has course data.
- Define "most recent cohort" as most recent cohort term (e.g. Fall 2024,
  Spring 2025). Require cohort_term column; support cohort+cohort_term or
  entry_year+cohort_term. Use config preprocessing.selection.cohort_term to
  filter allowed terms. Loop back to next cohort term when current has no
  course data for students.
- Add DatabricksControl.read_volume_training_config to read config.toml from
  institution bronze volume and return [preprocessing.selection] for cohort
  term filtering and student_criteria.
- Add get_ordered_cohort_terms, filter_to_most_recent_cohort,
  _resolve_latest_inference_cohort_from_dataframes, and helpers; integrate
  apply_student_criteria_count with course-data restriction.
- Add unit tests for get_ordered_cohort_terms and filter_to_most_recent_cohort;
  API tests for latest-inference-cohort (unauthorized, missing cohort/term/config,
  valid, loop-back, missing student id, valid_count=0, missing criteria column);
  apply_student_criteria_count tests; tighten response shape assertions.
- Remove unused imports (joinedload, LatestInferenceCohortResponse); apply black.
- Add optional cohort: list[str] to InferenceRunRequest and DatabricksInferenceRunRequest
- Validate cohort when provided: reject empty list and empty/whitespace-only labels
- Serialize cohort as JSON in job_parameters when set; omit key when None (backward compatible)
- Log user, cohorts, and job_run_id once in API after successful run
- Fix batch error message to use len(batch_result) instead of len(inst_result)
- Add tests: backward compat (no cohort), single/multiple cohorts, empty list, empty string, whitespace-only
- Centralize ENV_TO_VOLUME_SCHEMA in config; use in front_end_tables and databricks
- Add read_volume_training_config and _parse_config_toml_to_selection for volume TOML
- Refactor _resolve_latest_inference_cohort_from_dataframes into helpers (≤50 lines)
- Improve run_pdp_inference logging (%-style, exception context); guard None response.contents
- Rename res to inference_run_response in trigger_inference_run
- Add tests: get_latest_batch when no batch has both STUDENT and COURSE; read_volume_training_config error paths

Co-authored-by: Cursor <cursoragent@cursor.com>
@chapmanhk chapmanhk requested a review from Mesh-ach February 3, 2026 17:43
chapmanhk and others added 2 commits February 3, 2026 13:52
…ection

- Add optional batch_name query param; resolve batch by name or latest with student+course
- Resolve model from single registered (valid, non-deleted) when model_name omitted
- Validate model exists for institution when model_name is provided
- Read training config from Databricks volumes for cohort selection
- Refactor endpoint into focused helpers; add mypy type-narrowing assertions
- Add tests for batch_name, model/batch validation, and registered-model resolution

Co-authored-by: Cursor <cursoragent@cursor.com>
@chapmanhk chapmanhk changed the title from "feat: Pass optional cohort list to run-inference API and forward to Databricks." to "feat: optional cohort for run-inference and latest-inference-cohort endpoint" Feb 3, 2026
- Add term_filter to InferenceRunRequest and DatabricksInferenceRunRequest
- Pass term_filter in job_params to Databricks job
- Update models router validation and tests
- Extend databricks tests and remove unused import

Co-authored-by: Cursor <cursoragent@cursor.com>
@chapmanhk chapmanhk changed the title from "feat: optional cohort for run-inference and latest-inference-cohort endpoint" to "feat: optional term_filter for run-inference and latest-inference-cohort endpoint" Feb 3, 2026
chapmanhk and others added 2 commits February 17, 2026 16:22
- Normalize selection config at inference entry (checkpoint_term -> cohort_term)
- Rename inference path to checkpoint term (constants, helpers, docstrings)
- Change LatestInferenceCohortResponse.cohort_label to cohort_labels (list)
- Extract _try_checkpoint_term_candidate, _ordered_checkpoint_terms_from_*
  for single-purpose helpers and 50-line limit; add _UNKNOWN_TERM_SORT_RANK
- Add tests: normalizer edge cases, cohort_term string/empty list, response
  contract; databricks checkpoint_term parse; type hint on test

Co-authored-by: Cursor <cursoragent@cursor.com>
@chapmanhk chapmanhk requested review from kaylawilding and removed request for Mesh-ach and nm3224 February 18, 2026 00:26
) -> Dict[str, Any]:
"""Normalize selection_config so student_criteria uses column name cohort_term for inference.

If checkpoint_term is present in student_criteria, set cohort_term to that value and
@kaylawilding (Contributor) commented Feb 18, 2026:

Sorry, what is checkpoint_term here? Is it meant to be the term config field we added listing the terms for inference?
If so, a couple of thoughts: it shouldn't be a term the way the cohort_term field is (simply FALL or SPRING); instead it also includes the year, so it will be like FALL 2023-24, e.g.

return out


def _latest_inference_cohort_validation_response(

We should talk about how to square this with the fact that we're using terms now; do we want both?



def _first_valid_from_ordered_terms(
ordered_checkpoint_terms: List[Dict[str, Any]],

Ah, I see you are doing both; I will need to change the config to terms, I believe.

remove checkpoint_term so downstream (get_ordered_*, apply_student_criteria_count)
only see cohort_term. If both keys exist, prefer checkpoint_term.
"""
criteria = selection_config.get("student_criteria")

Let's make sure this is not the student criteria field.

@chapmanhk chapmanhk marked this pull request as draft February 18, 2026 19:15
@kaylawilding (Contributor) left a comment:

adding comments but this is on hold
