Skip to content

Conversation

@johncmerfeld
Copy link
Collaborator

@johncmerfeld johncmerfeld commented Jan 5, 2026

Tested by running a CIRCLE file (with updated code) with 100% match rate and confirming that the executor does not wrongly report that there are unmatched students, and that it still reports the correct number of unmatched students otherwise

For a bundle such as CIRCLE or TX_KEA whose input has multiple header rows, the executor’s current approach to checking whether Earthmover produced unmatched students (“does the file contain non-header rows?”) is too brittle.

  • Remove the offending count_unmatched_students function
  • Split handle_unmatched_students into two functions: record_highest_match_rate and report_unmatched_students
  • record_highest_match_rate runs after earthmover run, whether the run succeeds or not. It reads the match rates CSV and records information about the highest match rate found in that file. If there are no much unmatched students, don't upload the (empty but existent) unmatched students CSV
  • report_unmatched_students is a pared-down version of handle_unmatched_students that simply checks whether a) there are any unmatched students to report and b) whether there are so many that we shouldn't bother sending back the unmatched students CSV

@johncmerfeld johncmerfeld self-assigned this Jan 5, 2026
@johncmerfeld johncmerfeld marked this pull request as ready for review January 7, 2026 22:19
@johncmerfeld johncmerfeld requested a review from tomreitz January 7, 2026 22:20
Copy link
Collaborator

@tomreitz tomreitz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving this @johncmerfeld, what you said it does makes sense. Thanks!

self.highest_match_id_name = highest_match["source_column_name"]
self.highest_match_id_type = highest_match["edfi_column_name"]
self.num_unmatched_students = int(highest_match["num_rows"]) - int(highest_match["num_matches"])

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Feel free to take or leave this comment - it's not introduced by this change, just something I noticed while reviewing it.)

The sorting of match_rates could be done by earthmover, rather than here. The best_id_match transformation already does; id_match_rates could as well.

@johncmerfeld johncmerfeld merged commit 8644012 into development Jan 13, 2026
1 check passed
@edandylytics edandylytics mentioned this pull request Jan 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants