Skip to content

[Transform] implement retention policy to delete data from a transform #67832

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 39 commits into from
Feb 8, 2021

Conversation

hendrikmuhs
Copy link

@hendrikmuhs hendrikmuhs commented Jan 21, 2021

add a retention policy to transform to delete data that is considered outdated as part of a transform checkpoint.

fixes #67916

todo:

  • documentation changes
  • support for added counters in _cat/transforms/

@hendrikmuhs hendrikmuhs changed the title [Transform] add the possibility to delete data based on a retention policy [Transform] implement retention policy to delete data from a transform Jan 26, 2021
@hendrikmuhs hendrikmuhs force-pushed the transform-retention branch 2 times, most recently from c75a88e to 029f091 Compare February 1, 2021 09:52
@mark-vieira
Copy link
Contributor

@elasticmachine update branch

@hendrikmuhs
Copy link
Author

run elasticsearch-ci/default-distro

@hendrikmuhs hendrikmuhs marked this pull request as ready for review February 4, 2021 07:34
@elasticmachine
Copy link
Collaborator

Pinging @elastic/ml-core (:ml/Transform)

import static org.mockito.Mockito.times;
import static org.mockito.Mockito.verify;

public class TransformIndexerFailureHandlingTests extends ESTestCase {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the renamed TransformIndexerTests, no changes done

import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.times;
import static org.mockito.Mockito.verify;

public class TransformIndexerTests extends ESTestCase {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the former TransformIndexerTests to TransformIndexerFailureHandlingTests, so this is a new file.

@benwtrent benwtrent self-requested a review February 4, 2021 15:27
Copy link
Contributor

@przemekwitek przemekwitek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial review, will finish in the afternoon.

I wish renaming the unit test base classes was a separate PR but no worries, I'll be able to go through it.

if (retentionPolicyConfig instanceof TimeRetentionPolicyConfig) {
request.setQuery(getDeleteQueryFromTimeBasedRetentionPolicy((TimeRetentionPolicyConfig) retentionPolicyConfig, checkpoint));
} else {
throw new RetentionPolicyException("unsupported retention policy of type [{}]", retentionPolicyConfig.getWriteableName());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't this cause problems when other types of policies are added in subsequent versions?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could add the accessors to RetentionPolicyConfig so that you don't have to do this casting.

Or RetentionPolicyConfig know everything about itself it could have a toDeleteByQueryRequest method

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it would? I think thats the idea. If you add a new policy, you sure better make sure to write a request converter.

Throwing here is a good pattern. It enforces case handling.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or RetentionPolicyConfig know everything about itself it could have a toDeleteByQueryRequest method

Intentionally I avoided that. We had (probably still have) a lot of logic in the config classes in the past and I had a hard time to separate config and logic.

I agree it looks a little dumb at the moment, its just 1 config with 1 implementation.

Copy link
Member

@benwtrent benwtrent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 from me.

I really like how nicely it falls into the indexer finish flow.

.endObject()
.endObject()
.endObject();
.startObject("properties")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any integration test added for the new functionality? I couldn't find it in this PR and I think it would be beneficial.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a hard time to craft one, I don't want to slow down the test execution further. It would be nice to manipulate the clock, but I think that's hard to do from the outside. E.g. I defined that max_age must be at least 60s. So an integration test must run at least 60s.

What I did:

  • unit test: test a policy is translated to a delete by query
  • module test: test that the indexer is calling the delete extension points if a policy is there and not if a policy is null -> ensures the right hooks are called at the end of the checkpoint. see the new TransformIndexerTests

out of scope:

  • testing correctness of delete by query

Copy link
Member

@davidkyle davidkyle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

if (retentionPolicyConfig instanceof TimeRetentionPolicyConfig) {
request.setQuery(getDeleteQueryFromTimeBasedRetentionPolicy((TimeRetentionPolicyConfig) retentionPolicyConfig, checkpoint));
} else {
throw new RetentionPolicyException("unsupported retention policy of type [{}]", retentionPolicyConfig.getWriteableName());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could add the accessors to RetentionPolicyConfig so that you don't have to do this casting.

Or RetentionPolicyConfig know everything about itself it could have a toDeleteByQueryRequest method

.endObject()
.endObject()
.endObject();
.startObject("properties")
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a hard time to craft one, I don't want to slow down the test execution further. It would be nice to manipulate the clock, but I think that's hard to do from the outside. E.g. I defined that max_age must be at least 60s. So an integration test must run at least 60s.

What I did:

  • unit test: test a policy is translated to a delete by query
  • module test: test that the indexer is calling the delete extension points if a policy is there and not if a policy is null -> ensures the right hooks are called at the end of the checkpoint. see the new TransformIndexerTests

out of scope:

  • testing correctness of delete by query

// paranoia: we are not expecting dbq to fail for other reasons
if (bulkByScrollResponse.getBulkFailures().size() > 0 || bulkByScrollResponse.getSearchFailures().size() > 0) {
assert false : "delete by query failed unexpectedly" + bulkByScrollResponse;
logger.warn(
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this case, but actually think I will change it to call onFailure, it would be inconsistent to search and indexing errors, I mean that's why we have retry-logic.

if (retentionPolicyConfig instanceof TimeRetentionPolicyConfig) {
request.setQuery(getDeleteQueryFromTimeBasedRetentionPolicy((TimeRetentionPolicyConfig) retentionPolicyConfig, checkpoint));
} else {
throw new RetentionPolicyException("unsupported retention policy of type [{}]", retentionPolicyConfig.getWriteableName());
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or RetentionPolicyConfig know everything about itself it could have a toDeleteByQueryRequest method

Intentionally I avoided that. We had (probably still have) a lot of logic in the config classes in the past and I had a hard time to separate config and logic.

I agree it looks a little dumb at the moment, its just 1 config with 1 implementation.

Copy link
Contributor

@przemekwitek przemekwitek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Just one nit on json formatting in Java code.

+ " \"unique_key\": [],"
+ " \"sort\": \"timestamp\""
+ "}";
String json = "{" + " \"unique_key\": []," + " \"sort\": \"timestamp\"" + "}";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you plan to apply

// tag::NO_CODE_FORMAT

here and in other test cases?
Now it looks strange IMO to have this concatenations in one line.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, I did not fix those, will do.

Copy link
Contributor

@szabosteve szabosteve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docs changes LGTM. Thanks!

@hendrikmuhs hendrikmuhs merged commit 54ed2e3 into elastic:master Feb 8, 2021
@hendrikmuhs hendrikmuhs deleted the transform-retention branch February 8, 2021 14:06
hendrikmuhs pushed a commit to hendrikmuhs/elasticsearch that referenced this pull request Feb 10, 2021
elastic#67832)

add a retention policy to transform to delete data that is considered outdated as part of a
transform checkpoint.

fixes elastic#67916
hendrikmuhs pushed a commit that referenced this pull request Feb 11, 2021
…nsform (#67832) (#68814)

add a retention policy to transform to delete data that is considered outdated as part of a
transform checkpoint.

backport #67832
fixes #67916
hendrikmuhs pushed a commit that referenced this pull request Feb 11, 2021
hendrikmuhs pushed a commit that referenced this pull request Feb 11, 2021
add retention policy and sync to documentation IT, add builders for sync
and retention policy

relates #67832
hendrikmuhs pushed a commit that referenced this pull request Feb 11, 2021
add retention policy and sync to documentation IT, add builders for sync
and retention policy

relates #67832
hendrikmuhs pushed a commit to hendrikmuhs/elasticsearch that referenced this pull request Mar 11, 2021
hendrikmuhs pushed a commit that referenced this pull request Mar 11, 2021
When using wait_for_checkpoint the destination index should be searchable when the call returns.
This issue has been fixed as part of #67832. This change removes the index refresh workarounds still
present.

fixes #51154
hendrikmuhs pushed a commit that referenced this pull request Mar 11, 2021
When using wait_for_checkpoint the destination index should be searchable when the call returns.
This issue has been fixed as part of #67832. This change removes the index refresh workarounds still
present.

fixes #51154
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Transform] add the ability to delete documents from the destination index
8 participants