Create scheduled job to process retention data #3074

Closed
@antross

Splitting out the processing as a separate task from the telemetry collection covered in #3056.

Processing

Because some of the records contain redundant data, structured queries aren't suitable for generating the retention chart directly. Instead we'll run a daily scheduled web job to convert the records into a form that's easier to query.

As an example, using a rolling 4-day period (where _ marks data outside the 4-day period), the following table shows the "real" user (not included in the actual data) and the corresponding logged activity, with each activity record listing the most recent day first. It also shows which records would be ignored because they are redundant with data submitted later.

Day  User  Activity Record  Redundant
1    A     [1, _, _, _]     X
1    C     [1, _, _, _]     X
2    A     [1, 1, _, _]
2    C     [1, 1, _, _]     X
2    D     [1, 0, _, _]
3    B     [1, 0, 0, _]     X
4    A     [1, 0, 1, 1]
4    B     [1, 1, 0, 0]

Or alternatively to show how the data aligns across days:

1A          [1, _, _, _] X
1C          [1, _, _, _] X
2A       [1, 1, _, _]
2C       [1, 1, _, _]    X
2D       [1, 0, _, _]
3B    [1, 0, 0, _]       X
4A [1, 0, 1, 1]
4B [1, 1, 0, 0]

Note that marking a record as redundant only means it matches the same usage pattern; it doesn't have to originate from the same user. Since record 4A ends in 1, 1, it needs to cancel out a record from day 2 starting with 1, 1 and a record from day 1 starting with 1. In this example the cancelled record from day 2 actually came from user C, but that's okay: what matters is balancing the numbers so that each day's activity is only counted once.
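A rough sketch of how that cancellation could work, in Python. Nothing here is the actual web job: the `Record` class, the `mark_redundant` function, and the choice to process the newest records first while skipping records that are already cancelled are all assumptions made for illustration. `None` stands in for `_`, and each activity list is newest day first.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    """Hypothetical shape of one logged record (no user id, as in the real data)."""
    day: int
    activity: List[Optional[int]]  # newest day first; None stands in for `_`
    redundant: bool = False


def mark_redundant(records: List[Record]) -> List[Record]:
    """Flag older records whose usage pattern is already covered by a newer one.

    Assumption: newer records are processed first, and a record that has
    itself been cancelled does not cancel anything further.
    """
    for rec in sorted(records, key=lambda r: r.day, reverse=True):
        if rec.redundant:
            continue
        for offset in range(1, len(rec.activity)):
            if rec.activity[offset] != 1:
                continue  # no activity that day, so no older record is implied
            implied = rec.activity[offset:]  # pattern the older record starts with
            for older in records:
                if (older.day == rec.day - offset
                        and not older.redundant
                        and older.activity[:len(implied)] == implied):
                    older.redundant = True  # cancel exactly one matching record
                    break
    return records


_ = None
records = [
    Record(1, [1, _, _, _]),  # 1A
    Record(1, [1, _, _, _]),  # 1C
    Record(2, [1, 1, _, _]),  # 2A
    Record(2, [1, 1, _, _]),  # 2C
    Record(2, [1, 0, _, _]),  # 2D
    Record(3, [1, 0, 0, _]),  # 3B
    Record(4, [1, 0, 1, 1]),  # 4A
    Record(4, [1, 1, 0, 0]),  # 4B
]
mark_redundant(records)
```

This sketch cancels the first matching day 2 record in list order (the one labelled 2A above) rather than 2C. Since the two records are identical and carry no user id, the per-day totals come out the same either way.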

So to determine how many unique users were active for at least two days in this time period, we simply count how many non-redundant records have at least two 1s within the four-day range. That's 2A, 4A, and 4B for a total of three unique users. The actual users who met this criterion were A, B, and C, but we don't need to know that - only how many of them there were.
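The count can be sketched in a few lines of Python. The tuple layout and the `count_active_users` name are hypothetical, the labels are only there for readability, and the redundant flags are copied from the table above rather than computed. `None` stands in for `_`.

```python
from typing import List, Optional, Tuple

# (label, activity newest-day-first, redundant flag taken from the table)
LabelledRecord = Tuple[str, List[Optional[int]], bool]

_ = None
table: List[LabelledRecord] = [
    ("1A", [1, _, _, _], True),
    ("1C", [1, _, _, _], True),
    ("2A", [1, 1, _, _], False),
    ("2C", [1, 1, _, _], True),
    ("2D", [1, 0, _, _], False),
    ("3B", [1, 0, 0, _], True),
    ("4A", [1, 0, 1, 1], False),
    ("4B", [1, 1, 0, 0], False),
]


def count_active_users(rows: List[LabelledRecord], min_days: int = 2) -> int:
    """Count non-redundant records with at least `min_days` active days."""
    return sum(
        1
        for _label, activity, redundant in rows
        if not redundant and activity.count(1) >= min_days
    )


active_two_days = count_active_users(table)  # 2A, 4A and 4B qualify
```

Changing `min_days` gives the other points on the retention chart from the same set of non-redundant records.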
