-
Notifications
You must be signed in to change notification settings - Fork 0
Executor: Fix check for unmatched student IDs #6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
tomreitz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving this @johncmerfeld, what you said it does makes sense. Thanks!
| self.highest_match_id_name = highest_match["source_column_name"] | ||
| self.highest_match_id_type = highest_match["edfi_column_name"] | ||
| self.num_unmatched_students = int(highest_match["num_rows"]) - int(highest_match["num_matches"]) | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Feel free to take or leave this comment - it's not introduced by this change, just something I noticed while reviewing it.)
The sorting of match_rates could be done by earthmover, rather than here. The best_id_match transformation already does; id_match_rates could as well.
Tested by running a CIRCLE file (with updated code) with 100% match rate and confirming that the executor does not wrongly report that there are unmatched students, and that it still reports the correct number of unmatched students otherwise
For a bundle such as CIRCLE or TX_KEA whose input has multiple header rows, the executor’s current approach to checking whether Earthmover produced unmatched students (“does the file contain non-header rows?”) is too brittle.
count_unmatched_studentsfunctionhandle_unmatched_studentsinto two functions:record_highest_match_rateandreport_unmatched_studentsrecord_highest_match_rateruns afterearthmover run, whether the run succeeds or not. It reads the match rates CSV and records information about the highest match rate found in that file. If there are no much unmatched students, don't upload the (empty but existent) unmatched students CSVreport_unmatched_studentsis a pared-down version ofhandle_unmatched_studentsthat simply checks whether a) there are any unmatched students to report and b) whether there are so many that we shouldn't bother sending back the unmatched students CSV