Skip to content

Conversation

@nurse-the-code
Copy link
Collaborator

@nurse-the-code nurse-the-code commented Jan 9, 2025

Key Changes:

  • Clean up CastVoteRecord identifier logic
    • getId - guaranteed unique identifier per ballot across tabulations. coalesce(computedId, suppliedId)
    • getSuppliedId - vendor provided Id. Not reliably unique. Nullable
  • Uses getId throughout audit log and rctab_cvr.csv
  • rctab_cvr.csv now explicitly includes both. getId provides the "RCTab CVR Id" and getSuppliedId provides the "Vendor Id" (formerly "Ballot Id")
  • Changes Dominion computedId delimiter from pipe (|) to dash (-) for improved readability

Testing Needed:

  1. Verify ID consistency:
    • ✅ Compare IDs in audit log with rctab_cvr.csv output
    • ✅ Ensure id returned from getId method is consistent across multiple tabulations
    • ✅ Check ID format follows tabulator-batch-record pattern for records from Dominion CVR sources
  2. Test fallback behavior:
    • ✅ Cases where computedId is null
    • ✅ Cases where computedId is empty or whitespace
    • ✅ Cases where only suppliedId is available
  3. Test with each CVR provider
    • ✅ CDF/Unisyn
    • ✅ Clear Ballot
    • ✅ Dominion (tested with 2024 Alaska US House race)
    • ✅ ES&S (in test suite)
    • ✅ Hart
    • ✅ generic CSV
  4. Review and update failing tests:
    • ✅ Done for now

Feature Request: Add Standardized Unique Identifiers for Cast Vote Records
#911
Replace UUID-based identification with a more consistent ID system that
uses the same identifier across audit logs and RCTab CVR output. The ID
is primarily derived from the computedId (tabulator-batch-record) with
fallback to suppliedId when computedId is unavailable.

Unlike UUIDs which change between tabulations, this approach ensures the
same ballot maintains the same identifier across multiple tabulations,
making it easier to track specific ballots across different runs.

Changes:
- Remove UUID generation for CVR identification
- Add getPrimaryId() method to centralize ID logic
- Update audit log and RCTab CVR to use the same ID format
- Change ID delimiter from pipe to dash for better readability
- Add TODO to review legacy getId() usage

Note: This change ensures that CVR IDs are consistent between audit logs,
RCTab CVR output, and repeated tabulations, making it easier to track
individual ballots across different output formats and multiple runs.
null. Null handling is now handled where getSuppliedId is called.
@yezr yezr added the ready-for-code-review ready for a live code review label Feb 11, 2025
Stream.of(tabulatorId, batchId, Integer.toString(recordId))
.filter(s -> s != null && !s.isBlank())
.collect(Collectors.joining("|"));
.collect(Collectors.joining("-")); // using a dash since this is an Id
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand why - vs | is important, but the comment makes me think it matters? Is it because it will be used in filenames? If so, perhaps that should be explicit.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an aesthetic choice. Originally I had used pipes as they aren't really used in the wild (certainly less so than -), and I wanted to signal that it was computed. I'm going to revert to pipe to signal that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we discovered another reason why pipes won't work, but I forget what it was. Can we change the comment to be the reason you discovered?

@yezr yezr force-pushed the feature/issue-911-rctab-cvr-primary-id branch from 6669937 to d15c79d Compare March 13, 2025 14:16
@yezr yezr marked this pull request as ready for review March 18, 2025 15:04
artoonie
artoonie previously approved these changes Mar 26, 2025
Copy link
Collaborator

@artoonie artoonie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some [nit]s but this looks good

Stream.of(tabulatorId, batchId, Integer.toString(recordId))
.filter(s -> s != null && !s.isBlank())
.collect(Collectors.joining("|"));
.collect(Collectors.joining("-")); // using a dash since this is an Id
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we discovered another reason why pipes won't work, but I forget what it was. Can we change the comment to be the reason you discovered?

@yezr yezr merged commit 926948c into develop Apr 3, 2025
1 check passed
@yezr yezr deleted the feature/issue-911-rctab-cvr-primary-id branch April 3, 2025 19:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-code-review ready for a live code review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature Request: Add common id to both audit log and rctab_cvr.csv

4 participants