-
Notifications
You must be signed in to change notification settings - Fork 25.3k
[Transform] implement retention policy to delete data from a transform #67832
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
45b6960
to
f46a155
Compare
c75a88e
to
029f091
Compare
@elasticmachine update branch |
9c36908
to
158f96b
Compare
run elasticsearch-ci/default-distro |
ac758be
to
1985bd3
Compare
Pinging @elastic/ml-core (:ml/Transform) |
import static org.mockito.Mockito.times; | ||
import static org.mockito.Mockito.verify; | ||
|
||
public class TransformIndexerFailureHandlingTests extends ESTestCase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is the renamed TransformIndexerTests
, no changes done
import static org.mockito.Mockito.mock; | ||
import static org.mockito.Mockito.times; | ||
import static org.mockito.Mockito.verify; | ||
|
||
public class TransformIndexerTests extends ESTestCase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved the former TransformIndexerTests
to TransformIndexerFailureHandlingTests
, so this is a new file.
...-high-level/src/main/java/org/elasticsearch/client/transform/transforms/TransformConfig.java
Outdated
Show resolved
Hide resolved
...-high-level/src/main/java/org/elasticsearch/client/transform/transforms/TransformConfig.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/action/admin/indices/refresh/RefreshResponse.java
Show resolved
Hide resolved
...in/core/src/main/java/org/elasticsearch/xpack/core/transform/transforms/TransformConfig.java
Outdated
Show resolved
Hide resolved
...e/src/main/java/org/elasticsearch/xpack/core/transform/transforms/TransformConfigUpdate.java
Show resolved
Hide resolved
...n/transform/src/main/java/org/elasticsearch/xpack/transform/transforms/TransformIndexer.java
Outdated
Show resolved
Hide resolved
...n/transform/src/main/java/org/elasticsearch/xpack/transform/transforms/TransformIndexer.java
Show resolved
Hide resolved
...n/transform/src/main/java/org/elasticsearch/xpack/transform/transforms/TransformIndexer.java
Outdated
Show resolved
Hide resolved
...n/transform/src/main/java/org/elasticsearch/xpack/transform/transforms/TransformIndexer.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Partial review, will finish in the afternoon.
I wish renaming the unit test base classes was a separate PR but no worries, I'll be able to go through it.
client/rest-high-level/src/test/java/org/elasticsearch/client/RestHighLevelClientTests.java
Show resolved
Hide resolved
...e/src/main/java/org/elasticsearch/xpack/core/transform/transforms/TransformIndexerStats.java
Show resolved
Hide resolved
...re/src/test/java/org/elasticsearch/xpack/core/transform/transforms/TransformConfigTests.java
Outdated
Show resolved
Hide resolved
...rc/test/java/org/elasticsearch/xpack/core/transform/transforms/latest/LatestConfigTests.java
Outdated
Show resolved
Hide resolved
...sform/src/main/java/org/elasticsearch/xpack/transform/transforms/ClientTransformIndexer.java
Outdated
Show resolved
Hide resolved
...n/transform/src/main/java/org/elasticsearch/xpack/transform/transforms/TransformContext.java
Outdated
Show resolved
Hide resolved
...elasticsearch/xpack/transform/transforms/RetentionPolicyToDeleteByQueryRequestConverter.java
Outdated
Show resolved
Hide resolved
...elasticsearch/xpack/transform/transforms/RetentionPolicyToDeleteByQueryRequestConverter.java
Outdated
Show resolved
Hide resolved
...elasticsearch/xpack/transform/transforms/RetentionPolicyToDeleteByQueryRequestConverter.java
Outdated
Show resolved
Hide resolved
if (retentionPolicyConfig instanceof TimeRetentionPolicyConfig) { | ||
request.setQuery(getDeleteQueryFromTimeBasedRetentionPolicy((TimeRetentionPolicyConfig) retentionPolicyConfig, checkpoint)); | ||
} else { | ||
throw new RetentionPolicyException("unsupported retention policy of type [{}]", retentionPolicyConfig.getWriteableName()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't this cause problems when other types of policies are added in subsequent versions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could add the accessors to RetentionPolicyConfig
so that you don't have to do this casting.
Or RetentionPolicyConfig
know everything about itself it could have a toDeleteByQueryRequest
method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it would? I think thats the idea. If you add a new policy, you sure better make sure to write a request converter.
Throwing here is a good pattern. It enforces case handling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or RetentionPolicyConfig know everything about itself it could have a toDeleteByQueryRequest method
Intentionally I avoided that. We had (probably still have) a lot of logic in the config classes in the past and I had a hard time to separate config and logic.
I agree it looks a little dumb at the moment, its just 1 config with 1 implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 from me.
I really like how nicely it falls into the indexer finish flow.
.endObject() | ||
.endObject() | ||
.endObject(); | ||
.startObject("properties") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any integration test added for the new functionality? I couldn't find it in this PR and I think it would be beneficial.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a hard time to craft one, I don't want to slow down the test execution further. It would be nice to manipulate the clock, but I think that's hard to do from the outside. E.g. I defined that max_age
must be at least 60s
. So an integration test must run at least 60s
.
What I did:
- unit test: test a policy is translated to a delete by query
- module test: test that the indexer is calling the delete extension points if a policy is there and not if a policy is
null
-> ensures the right hooks are called at the end of the checkpoint. see the newTransformIndexerTests
out of scope:
- testing correctness of delete by query
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
if (retentionPolicyConfig instanceof TimeRetentionPolicyConfig) { | ||
request.setQuery(getDeleteQueryFromTimeBasedRetentionPolicy((TimeRetentionPolicyConfig) retentionPolicyConfig, checkpoint)); | ||
} else { | ||
throw new RetentionPolicyException("unsupported retention policy of type [{}]", retentionPolicyConfig.getWriteableName()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could add the accessors to RetentionPolicyConfig
so that you don't have to do this casting.
Or RetentionPolicyConfig
know everything about itself it could have a toDeleteByQueryRequest
method
...n/transform/src/main/java/org/elasticsearch/xpack/transform/transforms/TransformIndexer.java
Outdated
Show resolved
Hide resolved
...n/transform/src/main/java/org/elasticsearch/xpack/transform/transforms/TransformIndexer.java
Outdated
Show resolved
Hide resolved
...n/transform/src/main/java/org/elasticsearch/xpack/transform/transforms/TransformIndexer.java
Outdated
Show resolved
Hide resolved
...sform/src/main/java/org/elasticsearch/xpack/transform/transforms/ClientTransformIndexer.java
Outdated
Show resolved
Hide resolved
.endObject() | ||
.endObject() | ||
.endObject(); | ||
.startObject("properties") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had a hard time to craft one, I don't want to slow down the test execution further. It would be nice to manipulate the clock, but I think that's hard to do from the outside. E.g. I defined that max_age
must be at least 60s
. So an integration test must run at least 60s
.
What I did:
- unit test: test a policy is translated to a delete by query
- module test: test that the indexer is calling the delete extension points if a policy is there and not if a policy is
null
-> ensures the right hooks are called at the end of the checkpoint. see the newTransformIndexerTests
out of scope:
- testing correctness of delete by query
...n/transform/src/main/java/org/elasticsearch/xpack/transform/transforms/TransformContext.java
Outdated
Show resolved
Hide resolved
// paranoia: we are not expecting dbq to fail for other reasons | ||
if (bulkByScrollResponse.getBulkFailures().size() > 0 || bulkByScrollResponse.getSearchFailures().size() > 0) { | ||
assert false : "delete by query failed unexpectedly" + bulkByScrollResponse; | ||
logger.warn( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added this case, but actually think I will change it to call onFailure
, it would be inconsistent to search and indexing errors, I mean that's why we have retry-logic.
if (retentionPolicyConfig instanceof TimeRetentionPolicyConfig) { | ||
request.setQuery(getDeleteQueryFromTimeBasedRetentionPolicy((TimeRetentionPolicyConfig) retentionPolicyConfig, checkpoint)); | ||
} else { | ||
throw new RetentionPolicyException("unsupported retention policy of type [{}]", retentionPolicyConfig.getWriteableName()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or RetentionPolicyConfig know everything about itself it could have a toDeleteByQueryRequest method
Intentionally I avoided that. We had (probably still have) a lot of logic in the config classes in the past and I had a hard time to separate config and logic.
I agree it looks a little dumb at the moment, its just 1 config with 1 implementation.
76f3dde
to
8068051
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Just one nit on json formatting in Java code.
+ " \"unique_key\": []," | ||
+ " \"sort\": \"timestamp\"" | ||
+ "}"; | ||
String json = "{" + " \"unique_key\": []," + " \"sort\": \"timestamp\"" + "}"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you plan to apply
// tag::NO_CODE_FORMAT
here and in other test cases?
Now it looks strange IMO to have this concatenations in one line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, I did not fix those, will do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docs changes LGTM. Thanks!
elastic#67832) add a retention policy to transform to delete data that is considered outdated as part of a transform checkpoint. fixes elastic#67916
add retention policy and sync to documentation IT, add builders for sync and retention policy relates #67832
add retention policy and sync to documentation IT, add builders for sync and retention policy relates #67832
add a retention policy to transform to delete data that is considered outdated as part of a transform checkpoint.
fixes #67916
todo:
_cat/transforms/