Description
Issue
When a Datafeed is configured, the end user provides a query_delay. At times this delay is too small. As a result, when the Datafeed pulls data from the index(es), it can miss data that has not yet been indexed.
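The following is a simplified, hypothetical simulation of this failure mode, not the Datafeed's actual scheduling logic. It assumes that every frequency seconds the feed searches the window ending at now - query_delay, and that a document is visible only if it has been indexed by the time that search runs. Doc, simulate_datafeed and the frequency parameter are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    timestamp: float   # event time, in seconds
    indexed_at: float  # time the document becomes searchable, in seconds


def simulate_datafeed(docs, query_delay, frequency, start, end):
    """Return the docs a real-time datafeed would see (simplified model)."""
    seen = []
    window_start = start
    now = start + frequency
    while now <= end:
        # Each search only covers data older than now - query_delay.
        window_end = now - query_delay
        if window_end > window_start:
            for doc in docs:
                # A doc is counted only if it falls in the window AND is
                # already indexed when the search runs.
                if window_start <= doc.timestamp < window_end and doc.indexed_at <= now:
                    seen.append(doc)
            # The next search continues from where this one stopped, so a
            # late-indexed doc is never picked up again.
            window_start = window_end
        now += frequency
    return seen
```

With query_delay=5 and frequency=10, a document timestamped 20 but indexed at 50 falls in the window searched at time 30, is not yet visible, and is never seen afterwards.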
We currently do a poor job of indicating whether any data was missed and of alerting the user to it.
Solution
A proposed solution is for a separate process in real-time Datafeeds to look at past finalized bucket(s) and compare each bucket's event_count with the current actual count of documents that match the user-provided query within that bucket's time window.
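One way to choose which buckets to re-check is sketched below. It assumes, as a simplification, that a bucket counts as finalized once the Datafeed has searched past its end, and that bucket boundaries are multiples of bucket_span since the epoch. finalized_check_window and its parameters are hypothetical names.

```python
def finalized_check_window(last_search_end_ms, bucket_span_ms, num_buckets):
    """Return [start, end) in epoch ms covering the last num_buckets finalized buckets.

    Hypothetical helper. Assumes a bucket is finalized once the datafeed has
    searched past its end, and that boundaries are multiples of bucket_span_ms.
    """
    # Round down to the end of the most recent fully searched bucket.
    latest_end = (last_search_end_ms // bucket_span_ms) * bucket_span_ms
    start = latest_end - num_buckets * bucket_span_ms
    return start, latest_end
```

For example, with a 10-minute bucket_span (600000 ms) and a last search end of 3700000 ms, the three most recent finalized buckets span [1800000, 3600000).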
To capture bucket discrepancies over an arbitrary number of past buckets, we can use a date_histogram aggregation with interval=bucket_span. Combined with the Datafeed's query, this gives an accurate count of what the event_count SHOULD be, given the data currently in the index. Then, for each finalized bucket, we compare the event_count to the true count in the matching date_histogram bucket. If the true count is higher than the event_count, that is considered a discrepancy.
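A minimal sketch of the check follows, assuming the Datafeed's query is available as a query DSL dict and the job's finalized buckets are available as a mapping from bucket start (epoch ms) to event_count. The search body uses fixed_interval, which is the Elasticsearch 7.2+ spelling; older versions used interval. The helper names and the aggregation name "buckets" are hypothetical.

```python
def delayed_data_search_body(datafeed_query, time_field, start_ms, end_ms, bucket_span_ms):
    """Build a hypothetical search body counting matching docs per bucket_span."""
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "query": {
            "bool": {
                "filter": [
                    datafeed_query,  # the user-provided Datafeed query
                    {"range": {time_field: {"gte": start_ms, "lt": end_ms,
                                            "format": "epoch_millis"}}},
                ]
            }
        },
        "aggs": {
            "buckets": {
                "date_histogram": {
                    "field": time_field,
                    # One histogram bucket per job bucket.
                    "fixed_interval": f"{bucket_span_ms}ms",
                }
            }
        },
    }


def find_discrepancies(finalized_buckets, histogram_buckets):
    """Return {bucket_start_ms: missed_count} for buckets whose true count is higher.

    finalized_buckets: {bucket_start_ms: event_count} from the job's bucket results.
    histogram_buckets: the "buckets" list from the date_histogram response,
    where each entry has "key" (start in epoch ms) and "doc_count".
    """
    actual = {b["key"]: b["doc_count"] for b in histogram_buckets}
    discrepancies = {}
    for start, event_count in sorted(finalized_buckets.items()):
        true_count = actual.get(start, 0)
        if true_count > event_count:
            discrepancies[start] = true_count - event_count
    return discrepancies
```

Only a higher true count is flagged: a lower count could come from deleted documents rather than from indexing delay.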
If a discrepancy is found, an Audit message should be written suggesting an increase in the query delay. As more capabilities are added (possibly Annotations?), those could be used to give a better indication of how much data was missed over a given time range.
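As an illustration only, the discrepancies could be summarized into a single audit string. The wording and the helper name delayed_data_audit_message are hypothetical, not an existing Elasticsearch message; the input is assumed to be a mapping from bucket start (epoch ms) to missed document count.

```python
from datetime import datetime, timezone


def delayed_data_audit_message(discrepancies):
    """Summarize {bucket_start_ms: missed_count}; return None if nothing was missed."""
    if not discrepancies:
        return None
    total = sum(discrepancies.values())
    latest = max(discrepancies)  # most recent affected bucket start
    latest_iso = datetime.fromtimestamp(latest / 1000, tz=timezone.utc).isoformat()
    return (f"Datafeed missed {total} document(s) across {len(discrepancies)} bucket(s); "
            f"latest affected bucket starts at {latest_iso}. "
            "Consider increasing query_delay.")
```

Reporting a total and the latest affected bucket keeps the audit to one line; per-bucket detail is the kind of information Annotations might carry better.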