Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add interval time check #16951

Open
wants to merge 4 commits into
base: master
Choose a base branch
from
Open

Add interval time check #16951

wants to merge 4 commits into from

Conversation

GWphua
Copy link
Contributor

@GWphua GWphua commented Aug 23, 2024

Fixes #16945

Description

Add interval check functionality

  • System will throw IAE if check is enabled, and the user-defined rules are violated.
  • Default: Time check functionality set to false. (Do not want systems to break all of a sudden if they actually want years like "20240").
  • Can enable functionality, and set start and end time with JSON input: safeInput (boolean), safeStart (string), safeEnd (string) respectively.
  • Documented changes in ingestion-spec.md.
  • Update equals and hashcode function, adhering to rule that objects that are equal must have the same hashcode.

Unit test changes

  • Renamed test class allows easy navigation between to and from implementation classes with (Cmd Shift T) in IntelliJ.
  • Added unit tests for new functionalities.

Release note


Add functionality to check your ingestion intervals, giving early reporting to erroneous time intervals during ingestion.


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • been tested in a test Druid cluster.

Add tests for Time interval

Add check to disallow intervals later than current time

Resolve NPE

Update interval to accept default of start = now, end = one year later

Refactor logic in constructor

Add boolean check

Rename and allow testing pass and fail case

Remove unnecessary JSON annotations

Rename boolean to make it more user-friendly

Adhere to Druid code style conventions

Update ingestion-spec docs to reflect functionality

Add restricted start and end time

Change test case constructors, revamp Equal checks

Adhere to style checks and conduct test build

Update start safety interval to check violations before timestamp

Fix equal bug in intervals

Update hashcode function
@GWphua GWphua changed the title Update inputIntervals Javadocs Add interval time check Aug 23, 2024
@GWphua
Copy link
Contributor Author

GWphua commented Aug 31, 2024

Bump @kfaraz 😄

Copy link
Contributor

@kfaraz kfaraz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes, @GWphua !
I don't think touching GranularitySpec for this requirement is the correct approach.

Also, I am not sure if the change is even needed. We already support validation of intervals:

Do you want to raise an alert when an out-of-range record is encountered or just filter out such records?

@GWphua
Copy link
Contributor Author

GWphua commented Sep 5, 2024

Thanks for the review @kfaraz,

From the raised issue, it seems that there are cases where year of data is incorrectly labelled. I chose to make the system throw an error when facing an out-of-range data to notify the user.

Noted on the support of validation. If such capabilities are already supported in Druid, I can offer to close this PR 😄

@kfaraz
Copy link
Contributor

kfaraz commented Sep 5, 2024

Thanks, @GWphua !
I have left a comment on the original issue. You can close this PR for the time being.
When we have more clarity on the requirement, you can create another PR with the appropriate implementation.

@GWphua GWphua closed this Sep 6, 2024
@GWphua
Copy link
Contributor Author

GWphua commented Sep 12, 2024

Hi @kfaraz, Reopen pull request based on the explanation in the relevant Issue.

I will explain the concept behind the check. Taking the example of daily intervals, in normal circumstances, tasks will be initialized with the following intervals:

(Supposing task of size 100G per day)
2024-06-11/2024-06-12 // 100G
2024-06-12/2024-06-13 // 100G
...
2024-09-11/2024-09-12 // 100G

Assuming we create the tasks today (12 Sep 2024). Based on my understanding, if payloads are misconfigured with 20240, they will be initialized with the following:

2024-06-11/20240-06-12 // 9200G
2024-06-12/20240-06-13 // 9100G
...
2024-09-11/20240-09-12 // 100G

This is encountered during ingestion. The implementations you raised deal with supervisors during ingestion, and not with the Granularity.

@GWphua GWphua reopened this Sep 12, 2024
@kfaraz
Copy link
Contributor

kfaraz commented Sep 23, 2024

Assuming we create the tasks today (12 Sep 2024). Based on my understanding, if payloads are misconfigured with 20240, they will be initialized with the following:
2024-06-11/20240-06-12 // 9200G
2024-06-12/20240-06-13 // 9100G
...
2024-09-11/20240-09-12 // 100G

@GWphua , I am not sure if this is actually the case in the current code. Could you please share the logs of the ingestion that you tried where you encountered this?

Also, IIUC, the issue alludes to validating task payloads. The right way to do that would be to add new validations in the method TaskQueue.validateTaskPayload() and not modify any of the Granularity classes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Check interval range to avoid cases where year is inappropriately entered
4 participants