Remove duplicate entries that did not make it to ITEM_SENT #16068

Open
3 tasks
jsutantio opened this issue Oct 2, 2024 · 0 comments
Labels
platform Platform Team tech-debt Anything that is purely a technical issue and does not affect functionality

Comments

@jsutantio
Collaborator

jsutantio commented Oct 2, 2024

User Story
As a product manager and ReportStream team member, I would like my dashboard's data source to be the most accurate and reliable so that the resultant calculations and dashboards are less prone to error.

Description
This is a continuation of #16067, which switched the data source from the ITEM_ROUTED event to the ITEM_SENT event.

The task is to remove duplicate results caused by resubmissions or by reports that were filtered out. These were counted in the ITEM_ROUTED event but never reached ITEM_SENT.
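
A minimal sketch of that filter, written in Python for illustration rather than in Power Query M. It assumes each event row carries a trackingId that is shared by the ITEM_ROUTED and ITEM_SENT rows for the same item, which has not been confirmed yet. The function name `routed_rows_that_were_sent` and the dict row shape are hypothetical. It keeps only ITEM_ROUTED rows whose trackingId also appears on an ITEM_SENT row, i.e. a semi-join on trackingId.

```python
from collections.abc import Iterable

# Hypothetical row shape: each event row is a dict carrying at least an
# "event" name (e.g. "ITEM_ROUTED", "ITEM_SENT") and a "trackingId".
Row = dict[str, str]


def routed_rows_that_were_sent(rows: Iterable[Row]) -> list[Row]:
    """Keep only ITEM_ROUTED rows whose trackingId also has an ITEM_SENT row.

    This is a semi-join on trackingId. It assumes trackingId stays the same
    between the routing and sending steps, which is not yet confirmed.
    """
    materialized = list(rows)  # materialize so the input can be scanned twice
    sent_ids = {r["trackingId"] for r in materialized if r["event"] == "ITEM_SENT"}
    return [
        r
        for r in materialized
        if r["event"] == "ITEM_ROUTED" and r["trackingId"] in sent_ids
    ]


if __name__ == "__main__":
    events = [
        {"event": "ITEM_ROUTED", "trackingId": "a"},
        {"event": "ITEM_ROUTED", "trackingId": "b"},  # filtered out, never sent
        {"event": "ITEM_SENT", "trackingId": "a"},
    ]
    print(routed_rows_that_were_sent(events))
    # [{'event': 'ITEM_ROUTED', 'trackingId': 'a'}]
```

In Power Query the equivalent step would be a join between the two event tables on the shared key; the Python version only shows the logic.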

Considerations
Prior notes

  • Tried to match Properties.reportId from REPORT_SENT with those of ITEM_ROUTED in the aug16explosion table.
  • I could not find any matches, possibly because a new reportId is generated at each step. Next, I'll try matching on parentReportId or trackingId (the most promising field).
  • This means I'll need to go back into Power Query and display the trackingId column for all the existing data, which is a large volume.
  • Try for just the aug16explosion table first.
  • Check whether a lookup function exists that deletes rows with duplicate trackingIds but keeps the one row that has REPORT_SENT. However, I would still need the observational data in ITEM_ROUTED.
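
The lookup idea can be sketched as follows. This is an illustrative Python version, not Power Query M; the function name `collapse_by_tracking_id`, the dict row shape, the `receiver` column, and the `routed.` prefix are all hypothetical, and it assumes trackingId links the ITEM_ROUTED and REPORT_SENT rows, which is still unconfirmed. It keeps one REPORT_SENT row per trackingId and copies the ITEM_ROUTED columns onto it, so the ITEM_ROUTED observational data is not discarded.

```python
from collections.abc import Iterable

# Hypothetical row shape: dicts with "event" and "trackingId" keys plus any
# other observational columns (e.g. a hypothetical "receiver").
Row = dict[str, str]


def collapse_by_tracking_id(
    rows: Iterable[Row],
    preferred_event: str = "REPORT_SENT",
    routed_event: str = "ITEM_ROUTED",
) -> list[Row]:
    """Return one row per trackingId that has a preferred_event row.

    The first preferred_event row is kept, and the first routed_event row's
    other columns are copied onto it under a "routed." prefix, so the
    ITEM_ROUTED observational data survives the deduplication. trackingIds
    with no preferred_event row are dropped.
    """
    preferred: dict[str, Row] = {}
    routed: dict[str, Row] = {}
    for r in rows:
        tid = r["trackingId"]
        if r["event"] == preferred_event:
            preferred.setdefault(tid, r)  # later duplicates are ignored
        elif r["event"] == routed_event:
            routed.setdefault(tid, r)

    result: list[Row] = []
    for tid, row in preferred.items():
        merged = dict(row)  # copy so the input rows are not mutated
        for key, value in routed.get(tid, {}).items():
            if key not in ("event", "trackingId"):
                merged[f"routed.{key}"] = value
        result.append(merged)
    return result


if __name__ == "__main__":
    events = [
        {"event": "ITEM_ROUTED", "trackingId": "a", "receiver": "x"},
        {"event": "ITEM_ROUTED", "trackingId": "a", "receiver": "x"},  # resubmission
        {"event": "REPORT_SENT", "trackingId": "a"},
        {"event": "ITEM_ROUTED", "trackingId": "b", "receiver": "y"},  # never sent
    ]
    print(collapse_by_tracking_id(events))
    # [{'event': 'REPORT_SENT', 'trackingId': 'a', 'routed.receiver': 'x'}]
```

Keeping the first row of each event type is an arbitrary choice in this sketch; if resubmissions carry different ITEM_ROUTED values, a different selection rule may be needed.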

Acceptance Criteria

  • Remove duplicates from the current date back as far as possible (theoretically to Aug 8, when Azure began recording more advanced metadata)
  • Document changes in the documentation PowerPoint
  • Publish updated OKR Metrics Dashboard
@jsutantio jsutantio added the platform Platform Team label Oct 2, 2024
@jsutantio jsutantio self-assigned this Oct 2, 2024
@jsutantio jsutantio changed the title Copy of Switch Data Sources from ITEM_ROUTED to ITEM_SENT Remove duplicate entries that did not make it to ITEM_SENT Oct 2, 2024
@jsutantio jsutantio added the tech-debt Anything that is purely a technical issue and does not affect functionality label Oct 2, 2024