Problem
Currently, there is no buffering of the data sent to the analytics infrastructure. So, if the connection between dotCMS and the analytics infrastructure goes down (due to network issues, configuration/credential changes, the infrastructure being down, or anything else), any data that was intended to be sent is lost.
Note: Loss of data does not affect the validity of the Experiments statistics, but it does reduce the amount of data collected, which widens the confidence interval and thus extends the time it takes for an Experiment to reach a higher level of confidence. It will also cause gaps in any snapshot (non-aggregated) values used in the Metrics reports.
Solution
Implement some form of buffering for data sent to the analytics infrastructure. For example, all data could be placed into a message queue that is drained by the scheduled job; messages would be retained in the queue if the send fails for any reason.
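The retained-on-failure behavior could be sketched roughly as follows. This is illustrative only: the `AnalyticsEventBuffer` name, the string-payload events, and the `Predicate`-based sender are assumptions for the sketch, not existing dotCMS APIs.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;

// Hypothetical sketch of a retained-on-failure event buffer.
// Events are enqueued immediately; the scheduled job calls flush(sender),
// and any event whose send fails stays in the queue for the next run.
public class AnalyticsEventBuffer {
    private final Deque<String> queue = new ArrayDeque<>();

    public synchronized void enqueue(String eventJson) {
        queue.addLast(eventJson);
    }

    // sender returns true on a successful send; on failure the current
    // event (and everything behind it) is retained, preserving order.
    public synchronized int flush(Predicate<String> sender) {
        int sent = 0;
        while (!queue.isEmpty()) {
            String event = queue.peekFirst();
            if (!sender.test(event)) {
                break; // infrastructure unreachable: retain remaining events
            }
            queue.removeFirst();
            sent++;
        }
        return sent;
    }

    public synchronized int size() { return queue.size(); }

    public synchronized void clear() { queue.clear(); }
}
```

With this shape, a failed send simply leaves the queue intact, and the next scheduled run retries from the oldest event.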
Acceptance Criteria
Data is not lost when any of the following happens for a period of time:
- The network connection to the analytics infrastructure is broken
- The analytics infrastructure is shut down or restarted
- The configuration for the analytics infrastructure is invalidated (e.g. the client key is deleted)

Data is not lost when the dotCMS instance is restarted:
- A restart of the container/pod/server must not clear the data buffer/queue

The data buffer/queue can be manually cleared without a server restart:
- The support/maintenance team must be able to clear it
- It might make sense to do both of the following:
  - Add a button in the Analytics App to clear the queue manually
  - Implement an endpoint (or arguments to existing endpoints) to both check the size of the queue and clear it; this gives cloud engineering the ability to automate management of the queue

The implementation is not specific to Experiments:
- The buffering must also be usable by the Metrics feature and by other analytics features in the future
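The restart criterion implies the queue must live on durable storage rather than only in memory. A minimal sketch, assuming a file on a persistent volume with one serialized event per line; the `DurableEventQueue` name and the file layout are hypothetical, not part of dotCMS:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: back the in-memory queue with a file on a persistent
// volume so a container/pod restart does not clear it. One event per line.
public class DurableEventQueue {
    private final Path store;
    private final Deque<String> queue = new ArrayDeque<>();

    public DurableEventQueue(Path store) throws IOException {
        this.store = store;
        if (Files.exists(store)) { // reload events that survived a restart
            queue.addAll(Files.readAllLines(store, StandardCharsets.UTF_8));
        }
    }

    public synchronized void enqueue(String eventJson) throws IOException {
        queue.addLast(eventJson);
        persist();
    }

    // Manual clear: usable from an admin endpoint or UI, no restart needed.
    public synchronized void clear() throws IOException {
        queue.clear();
        persist();
    }

    public synchronized int size() { return queue.size(); }

    private void persist() throws IOException {
        Files.write(store, List.copyOf(queue), StandardCharsets.UTF_8);
    }
}
```

The `size()` and `clear()` methods are the natural backing for the proposed endpoint: one call reports the queue depth, the other empties it, which is what cloud engineering would script against.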