Message queue to send data to Analytics infrastructure #26589

Closed
Tracked by #27892
john-thomas-dotcms opened this issue Nov 1, 2023 · 2 comments
Labels
dotCMS : Experiments Analytics Umbrella: Experiments Feature Team : Falcon Type : Enhancement

Comments


john-thomas-dotcms commented Nov 1, 2023

Problem

Currently, data sent to the analytics infrastructure is not buffered. If the connection between dotCMS and the analytics infrastructure goes down (due to network issues, config/credential changes, the infrastructure being down, or anything else), any data that should have been sent during the outage is lost.

Note: Loss of data does not affect the validity of the Experiments statistics. But it does reduce the amount of data collected, which widens the confidence intervals and thus extends the time it takes for an Experiment to reach a higher level of confidence. It will also cause gaps in any snapshot (non-aggregated) values used in the Metrics reports.
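As a rough illustration of why lost data slows an Experiment down, assume a simple normal-approximation interval for a conversion rate $\hat p$ measured over $n$ sessions (an illustrative assumption only; Experiments may use a different statistical method). The half-width of the interval shrinks only with $\sqrt{n}$, so halving the data collected widens it by about 41%:

```math
\text{half-width}(n) = z_{\alpha/2}\sqrt{\frac{\hat p\,(1-\hat p)}{n}},
\qquad
\frac{\text{half-width}(n/2)}{\text{half-width}(n)} = \sqrt{2} \approx 1.41
```

Equivalently, assuming constant traffic, if half of the events are lost it takes roughly twice as long to collect the same $n$ and reach the same interval width.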

Solution

Buffer the data sent to the analytics infrastructure. For example, all data could be put into a message queue that the scheduled job then sends; messages would be retained in the queue if a send fails for any reason.
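A minimal sketch of the proposed behavior, written in Python for brevity. dotCMS itself is Java, and none of these names come from its codebase: `AnalyticsBuffer`, `enqueue`, `flush` and the `send` callable are hypothetical. Producers only enqueue; a scheduled job calls `flush`, and a batch is removed from the queue only after the send succeeds.

```python
from collections import deque
from itertools import islice
from typing import Callable, Deque, List


class AnalyticsBuffer:
    """Hypothetical in-memory buffer: events wait here until a send succeeds."""

    def __init__(self, send: Callable[[List[dict]], None], batch_size: int = 100) -> None:
        self._queue: Deque[dict] = deque()
        self._send = send  # hypothetical transport; expected to raise on any failure
        self._batch_size = max(1, batch_size)  # at least one event per batch

    def enqueue(self, event: dict) -> None:
        # Producers only append; they never call the analytics infrastructure directly.
        self._queue.append(event)

    def flush(self) -> int:
        """Run by a scheduled job. Returns the number of events delivered."""
        delivered = 0
        while self._queue:
            # Look at the oldest batch without removing it yet.
            batch = list(islice(self._queue, self._batch_size))
            try:
                self._send(batch)
            except Exception:
                # Send failed: leave everything queued and retry on the next run.
                break
            # The send succeeded, so the batch can now be dropped.
            for _ in batch:
                self._queue.popleft()
            delivered += len(batch)
        return delivered

    def size(self) -> int:
        return len(self._queue)


# Usage: simulate an outage followed by recovery.
sent: List[dict] = []
online = False


def fake_send(batch: List[dict]) -> None:
    if not online:
        raise ConnectionError("analytics infrastructure unreachable")
    sent.extend(batch)


buf = AnalyticsBuffer(fake_send, batch_size=2)
for i in range(5):
    buf.enqueue({"event": "pageview", "n": i})

delivered_while_down = buf.flush()  # 0: all 5 events stay queued
online = True
delivered_after_recovery = buf.flush()  # 5: drained in batches of 2, 2, 1
```

Because a batch is dropped only after a successful send, a send that fails after the server actually received the data will be retried, so this gives at-least-once rather than exactly-once delivery, and the analytics side may need to tolerate duplicates. This sketch keeps the queue in memory, so on its own it does not survive a restart (acceptance criterion 2).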

Acceptance Criteria

  1. Data is not lost when any of the following happen for a period of time:
    • The network connection to the Analytics infrastructure is broken
    • The analytics infrastructure is shut down or restarted
    • The configuration for the analytics infrastructure is invalidated (e.g. the client key is deleted)
  2. Data is not lost when the dotCMS instance is restarted
    • So, a restart of the container/pod/server should not clear the data buffer/queue
  3. The data buffer/queue can be manually cleared without a server restart
    • The support/maintenance team must be able to clear it
    • It might make sense to do both of the following:
      • Add a button in the Analytics App to clear the queue manually
      • Implement an endpoint (or arguments to existing endpoints) to both check the size of the queue, and clear it
        • This will give cloud engineering the ability to automate management of the queue
  4. The implementation is not specific to Experiments
    • The buffering must also be usable by the Metrics feature and by other analytics features in the future
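To meet the durability, manual-clearing and reuse criteria, the queue can be backed by storage that outlives the process. Below is a minimal sketch that uses Python's built-in `sqlite3` as a stand-in store. This is an assumption for illustration; a real dotCMS implementation would be Java and might use its own database instead. The table name `analytics_queue` and the class and method names are hypothetical. Each event carries a `topic` (e.g. `experiments`, `metrics`) so the queue is not tied to Experiments, and `size()`/`clear()` are the operations an admin button or REST endpoint could call.

```python
import json
import os
import sqlite3
import tempfile
from typing import Callable, List, Optional, Tuple

Event = Tuple[str, dict]  # (topic, payload)


class DurableAnalyticsQueue:
    """Hypothetical file-backed queue: survives restarts and can be inspected or cleared."""

    def __init__(self, path: str) -> None:
        self._db = sqlite3.connect(path)
        with self._db:
            self._db.execute(
                "CREATE TABLE IF NOT EXISTS analytics_queue ("
                " id INTEGER PRIMARY KEY AUTOINCREMENT,"
                " topic TEXT NOT NULL,"  # e.g. 'experiments' or 'metrics'
                " payload TEXT NOT NULL)"
            )

    def enqueue(self, topic: str, event: dict) -> None:
        with self._db:  # commits on success, rolls back on error
            self._db.execute(
                "INSERT INTO analytics_queue (topic, payload) VALUES (?, ?)",
                (topic, json.dumps(event)),
            )

    def flush(self, send: Callable[[List[Event]], None], batch_size: int = 100) -> int:
        """Run by a scheduled job; rows are deleted only after `send` succeeds."""
        delivered = 0
        while True:
            rows = self._db.execute(
                "SELECT id, topic, payload FROM analytics_queue ORDER BY id LIMIT ?",
                (max(1, batch_size),),
            ).fetchall()
            if not rows:
                return delivered
            try:
                send([(topic, json.loads(payload)) for _, topic, payload in rows])
            except Exception:
                return delivered  # keep the rows; retry on the next run
            with self._db:
                # AUTOINCREMENT ids only grow, so this deletes exactly the batch just sent.
                self._db.execute("DELETE FROM analytics_queue WHERE id <= ?", (rows[-1][0],))
            delivered += len(rows)

    def size(self, topic: Optional[str] = None) -> int:
        """Queue length, optionally for one topic (what a status endpoint could report)."""
        (count,) = self._db.execute(
            "SELECT COUNT(*) FROM analytics_queue WHERE ? IS NULL OR topic = ?",
            (topic, topic),
        ).fetchone()
        return count

    def clear(self, topic: Optional[str] = None) -> int:
        """Drop queued events without a restart; returns how many were removed."""
        with self._db:
            cur = self._db.execute(
                "DELETE FROM analytics_queue WHERE ? IS NULL OR topic = ?",
                (topic, topic),
            )
        return cur.rowcount

    def close(self) -> None:
        self._db.close()


# Usage: survive a simulated restart, clear one topic, then drain the rest.
path = os.path.join(tempfile.mkdtemp(), "analytics-queue.db")
q = DurableAnalyticsQueue(path)
q.enqueue("experiments", {"event": "pageview"})
q.enqueue("metrics", {"event": "visit"})
q.close()  # stands in for a dotCMS restart

q = DurableAnalyticsQueue(path)
size_after_restart = q.size()  # 2
cleared = q.clear("metrics")  # 1


def infra_down(batch: List[Event]) -> None:
    raise ConnectionError("analytics infrastructure unreachable")


delivered_while_down = q.flush(infra_down)  # 0
received: List[Event] = []
delivered_after = q.flush(received.extend)  # 1
```

Reopening the same file after `close()` stands in for a container or pod restart; in a container, the file would need to live on a persistent volume for the queue to actually survive a restart.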

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 30 days.

@john-thomas-dotcms
Contributor Author

This isn't needed yet; we'll open a new card for this when we feel it's important.

Projects
Status: Done
Development

No branches or pull requests

1 participant