[FEATURE] Support dynamic data delay scenario in rollup #877

Open
bowenlan-amzn opened this issue Jul 28, 2023 · 0 comments
Member

bowenlan-amzn commented Jul 28, 2023

During continuous rollup, ingested data may arrive delayed or out of order.
Currently, the delay setting in rollup is the only way to handle this. The delay applies to the field on which rollup runs its date_histogram.
However, the delay is a fixed value, so it cannot handle a dynamic delay or more complicated scenarios.
For example, multiple data sources may be ingested into the index that is being rolled up continuously. One source's ingestion may be up to date while another falls behind by a varying amount due to a variety of issues. So even with a delay defined, the current rollup cannot handle this case.
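
To make the failure mode concrete, here is a minimal Python sketch. It is not the plugin's code: it assumes a simplified model in which each event-time window is rolled up exactly once, at window end plus the fixed delay, and is never revisited. The source names, timestamps, and the `missed_by_fixed_delay` helper are hypothetical.

```python
from datetime import datetime, timedelta


def missed_by_fixed_delay(events, window_start, interval, delay):
    """Return events that belong to the window but arrive after it was rolled up.

    Assumption: the window [window_start, window_start + interval) is
    processed once, at window end + delay.
    """
    window_end = window_start + interval
    rollup_runs_at = window_end + delay
    return [
        e for e in events
        if window_start <= e["timestamp"] < window_end
        and e["ingested_at"] > rollup_runs_at
    ]


t0 = datetime(2023, 1, 1, 0, 0)
events = [
    # Source A is up to date: ingested a few seconds after the event time.
    {"source": "A", "timestamp": t0 + timedelta(minutes=10),
     "ingested_at": t0 + timedelta(minutes=10, seconds=5)},
    # Source B fell behind by about three hours.
    {"source": "B", "timestamp": t0 + timedelta(minutes=20),
     "ingested_at": t0 + timedelta(hours=3, minutes=20)},
]

missed = missed_by_fixed_delay(
    events, t0, interval=timedelta(hours=1), delay=timedelta(minutes=30)
)
print([e["source"] for e in missed])  # source B's event is never rolled up
```

Raising the delay only moves the cutoff: any source that lags by more than the chosen value is still missed, and every up-to-date source now waits longer than it needs to.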

Proposed solution:
During ingestion, the user adds a field, ingested_at, that records the actual ingestion timestamp.
Enhance rollup so that it can act on a range of ingested_at time, for example, running continuous rollup on every hour of ingested_at time.
The first composite search query that rollup runs would look like this:

{
  "query": {
    "range": {
      "ingested_at": {
        "gte": "2023-01-01T00:00:00",
        "lt": "2023-01-01T01:00:00"
      }
    }
  },
  "aggs": {
    ...
  }
}
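
As a rough illustration of how successive windows could be generated, the following Python sketch builds the same query body for consecutive ingested_at intervals. The helper names `ingested_at_windows` and `ingested_at_window_query` are hypothetical, and the aggs body is left to the caller because it is elided above.

```python
from datetime import datetime, timedelta


def ingested_at_windows(first_start, interval, count):
    """Yield (start, end) pairs for consecutive half-open windows."""
    for i in range(count):
        start = first_start + i * interval
        yield start, start + interval


def ingested_at_window_query(start, end, aggs, field="ingested_at"):
    """Build a search body limited to documents ingested in [start, end)."""
    return {
        "query": {
            "range": {
                field: {"gte": start.isoformat(), "lt": end.isoformat()}
            }
        },
        # The aggregation body is not specified in the issue; pass it in.
        "aggs": aggs,
    }


windows = list(
    ingested_at_windows(datetime(2023, 1, 1), timedelta(hours=1), count=2)
)
queries = [ingested_at_window_query(s, e, aggs={}) for s, e in windows]
print(queries[0]["query"])
```

Using gte on the start and lt on the end keeps the windows non-overlapping, so a document with a given ingested_at value is picked up by exactly one run.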

The result of this query gives the buckets (b_new) touched by documents in this ingested_at time range. There are at least two ways to combine the results of b_new with the existing rollup data:

  1. Run another composite search restricted to b_new, effectively re-rolling up the updated buckets. (personally preferred)
  2. Retrieve the existing rollup data for b_new, if any, then combine it with the new results.
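
The two options can be sketched in Python as follows. This is only an illustration under stated assumptions: b_new is reduced to a list of date_histogram bucket start times (other rollup dimensions are ignored), the event-time field name `timestamp` is made up, and the helpers `rerollup_query` and `merge_metrics` are hypothetical rather than part of the plugin.

```python
from datetime import datetime, timedelta


def rerollup_query(bucket_starts, interval, aggs, field="timestamp"):
    """Option 1: search every source document inside the touched buckets.

    Each bucket becomes a half-open range on the event-time field, and the
    ranges are OR-ed together with a bool/should clause.
    """
    ranges = [
        {"range": {field: {"gte": s.isoformat(),
                           "lt": (s + interval).isoformat()}}}
        for s in sorted(set(bucket_starts))
    ]
    return {
        "query": {"bool": {"should": ranges, "minimum_should_match": 1}},
        "aggs": aggs,  # aggregation body supplied by the caller
    }


def merge_metrics(existing, new):
    """Option 2: combine stored rollup metrics with metrics computed from
    the newly ingested documents only."""
    if existing is None:  # no rollup document exists for this bucket yet
        return dict(new)
    return {
        "sum": existing["sum"] + new["sum"],
        "value_count": existing["value_count"] + new["value_count"],
        "min": min(existing["min"], new["min"]),
        "max": max(existing["max"], new["max"]),
    }


b_new = [datetime(2022, 12, 31, 22), datetime(2022, 12, 31, 23)]
query = rerollup_query(b_new, timedelta(hours=1), aggs={})

stored = {"sum": 10.0, "value_count": 2, "min": 4.0, "max": 6.0}
late = {"sum": 3.0, "value_count": 1, "min": 3.0, "max": 3.0}
merged = merge_metrics(stored, late)
print(merged)
```

Option 1 recomputes each touched bucket from the source documents, so it does not depend on how a metric is stored. Option 2 avoids re-reading old documents but only works for metrics that can be merged from their stored values, as the four above can.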
bowenlan-amzn added the good first issue label Sep 14, 2023
bowenlan-amzn added rollup medium and removed good first issue labels Sep 26, 2023