Skip to content

Introduce forget follower API #39718

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 24 commits into from
Mar 7, 2019
Merged

Conversation

jasontedor
Copy link
Member

This commit introduces the forget follower API. This API is needed in cases that unfollowing a following index fails to remove the shard history retention leases on the leader index. This can happen explicitly through user action, or implicitly through an index managed by ILM. When this occurs, history will be retained longer than necessary. While the retention lease will eventually expire, it can be expensive to allow history to persist for that long, and also prevent ILM from performing actions like shrink on the leader index. As such, we introduce an API to allow for manual removal of the shard history retention leases in this case.

Relates #37165

This commit introduces the forget follower API. This API is needed in cases that
unfollowing a following index fails to remove the shard history retention leases
on the leader index. This can happen explicitly through user action, or
implicitly through an index managed by ILM. When this occurs, history will be
retained longer than necessary. While the retention lease will eventually
expire, it can be expensive to allow history to persist for that long, and also
prevent ILM from performing actions like shrink on the leader index. As such, we
introduce an API to allow for manual removal of the shard history retention
leases in this case.
@jasontedor jasontedor added >enhancement v7.0.0 :Distributed Indexing/CCR Issues around the Cross Cluster State Replication features v6.7.0 v8.0.0 v7.2.0 labels Mar 5, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change looks good. I left some comments and questions.


public class ForgetFollowerAction extends Action<BroadcastResponse> {

public static final String ACTION_NAME = "indices:admin/xpack/ccr/forget_follower";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we not include /xpack/ in new actions?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather to be consistent with all of CCR than introduce inconsistency there. We can address wholesale later, if needed.

public class BroadcastResponseTests extends ESTestCase {

public void testFromXContent() throws IOException {
final String index = randomAlphaOfLength(8);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use AbstractXContentTestCase#xContentTester() static method to test xcontent serialization?
I think it is a bit shorter and we have more test iterations by default.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that AbstractXContentTestCase here is a lie. This is because what we really want to test is that if the REST layer sends back an XContent serialized org.elasticsearch.action.support.broadcast.BroadcastResponse then org.elasticsearch.client.core.BroadcastResponse is able to de-serialize this XContent. AbstractXContentTestCase does not allow to test such a situation, instead only allowing you to go from concrete instance (of org.elasticsearch.client.core.BroadcastResponse) to XContent and then back to a concrete instance of (org.elasticsearch.client.core.BroadcastResponse), and then checks if the result is equal. That is not what actually happens in production though, what happens in production is what I outlined above: a concrete instance of org.elasticsearch.action.support.broadcast.BroadcastResponse is serialized on the REST layer which then has to be de-serialized on the HL client layer by org.elasticsearch.client.core.BroadcastResponse. Hence, this test reflects that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I think we should open an issue, so that goes over the existing hlrc response serialization tests and change those tests to test way is tested here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opened #39745.

@@ -395,6 +401,98 @@ public void onFailure(Exception e) {
assertTrue(latch.await(30L, TimeUnit.SECONDS));
}

public void testForgetFollower() throws InterruptedException, IOException {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the hlrc docs itself are missing? Test looks good though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed 63721a4.


==== Authorization

If the {es} {security-features} are enabled, you must have `manage_follow_index`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The IndexPrivilege#MANAGE_FOLLOW_INDEX_AUTOMATON constant isn't updated to include the forget follower api. Also this text should say: index privileges for the leader index. instead of index privileges for the follower index..

I also I wonder whether manage_follow_index is the right privilege here. Because this is about a leases for the follower index in the leader index. I don't know a good alternative here. I just wanted to mention this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed e68b0c0.

@@ -104,7 +104,7 @@ public class RestTestsFromSnippetsTask extends SnippetsTask {
* format of the response is incompatible i.e. it is not a JSON object.
*/
static shouldAddShardFailureCheck(String path) {
return path.startsWith('_cat') == false && path.startsWith('_ml/datafeeds/') == false
return path.startsWith('_cat') == false && path.startsWith('_ml/datafeeds/') == false
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are the changes to the groovy files needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the docs tests where we fire forget follower requests, it is entirely too hard to set up a request that is not going to fail. For example, to set up a request that is not going to fail, we have to know the index UUID of the follower. Obtaining that in a docs test is not possible to my knowledge, we do not have a way to set a value to use in a later docs test? Maybe I am missing something?

However, the docs infrastructure automatically adds is_false: _shards.failures to every docs test. Therefore, I had to add a hook that enables us to say skip that check for these tests where we expect failures.

Again, if there's a way to avoid the failures that I am not seeing, please let me know!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, thanks for the explanation!

@SuppressWarnings("unchecked") final Map<String, Object> shardStatsAsMap = (Map<String, Object>)shardsStats.get(0);
@SuppressWarnings("unchecked") final Map<String, Object> retentionLeasesStats =
(Map<String, Object>) shardStatsAsMap.get("retention_leases");
@SuppressWarnings("unchecked") final ArrayList<Object> leases = (ArrayList<Object>)retentionLeasesStats.get("leases");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since no unfollow has happened, the shard follow task may add a new lease, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not since it is paused.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right, I missed that pause follow index api call.

@jasontedor jasontedor requested a review from martijnvg March 6, 2019 15:53
@martijnvg
Copy link
Member

I made the above change, but the test is still failing, but now it fails because a lease is still returned, which isn't expected, as the leases should have been removed.

FAILURE 1.84s | FollowIndexSecurityIT.testForgetFollower <<< FAILURES!
   > Throwable #1: java.lang.AssertionError: 
   > Expected: an empty collection
   >      but: <[{id=follow-cluster/forget-follower/FgX6xx8PRdW-7Go66pjbQw-following-leader_cluster/forget-leader/tcYPeEYiRuCQoAVGQWFQLA, retaining_seq_no=0, timestamp=1551903399688, source=ccr}]>
   >    at __randomizedtesting.SeedInfo.seed([BD569197DFE90BBD:31C54051FDB9761D]:0)
   >    at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
   >    at org.elasticsearch.xpack.ccr.FollowIndexSecurityIT.testForgetFollower(FollowIndexSecurityIT.java:227)
   >    at java.lang.Thread.run(Thread.java:748)

I also noticed that the specified leader_remote_cluster was incorrect:

--- a/x-pack/plugin/ccr/qa/security/src/test/java/org/elasticsearch/xpack/ccr/FollowIndexSecurityIT.java
+++ b/x-pack/plugin/ccr/qa/security/src/test/java/org/elasticsearch/xpack/ccr/FollowIndexSecurityIT.java
@@ -208,9 +208,10 @@ public class FollowIndexSecurityIT extends ESCCRRestTestCase {
                         "\"follower_cluster\":\"follow-cluster\"," +
                         "\"follower_index\":\"" +  forgetFollower + "\"," +
                         "\"follower_index_uuid\":\"" + followerIndexUUID + "\"," +
-                        "\"leader_remote_cluster\":\"leader_cluster\"" +
+                        "\"leader_remote_cluster\":\"leader-cluster\"" +
                         "}";
                 request.setJsonEntity(requestBody);

But even with this change there is still a lease returned. I don't yet understand why that happens.

@jasontedor
Copy link
Member Author

@martijnvg leader_cluster is correct because the name of the lease is based on the name of the remote cluster alias, not the actual name of the cluster.

@jasontedor
Copy link
Member Author

@martijnvg The fix is 2e96e66.

@jasontedor jasontedor requested a review from martijnvg March 7, 2019 02:50
@jasontedor
Copy link
Member Author

@elasticmachine run elasticsearch-ci/bwc

Copy link
Member

@martijnvg martijnvg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - I've merged master into your branch, because that fixes the bwc test that was failing.

@jasontedor
Copy link
Member Author

@martijnvg I pushed a commit that would have made the failure more apparent: 238c29b

This gives:

  2> REPRODUCE WITH: ./gradlew :x-pack:plugin:ccr:qa:security:followClusterTestRunner -Dtests.seed=A31A2C218548E5CC -Dtests.class=org.elasticsearch.xpack.ccr.FollowIndexSecurityIT -Dtests.method="testForgetFollower" -Dtests.security.manager=true -Dtests.locale=da -Dtests.timezone=Pacific/Bougainville -Dcompiler.java=11 -Druntime.java=8
FAILURE 1.77s | FollowIndexSecurityIT.testForgetFollower <<< FAILURES!
   > Throwable #1: java.lang.AssertionError: [{shard=0, index=forget-leader, status=INTERNAL_SERVER_ERROR, reason={type=failed_node_exception, reason=Failed node [ZQdq-2YiRkuZ8Rkwhg4IwQ], node_id=ZQdq-2YiRkuZ8Rkwhg4IwQ, caused_by={type=security_exception, reason=action [indices:admin/xpack/ccr/forget_follower[n]] is unauthorized for user [test_ccr]}}}]
   > Expected: <1>
   >      but: was <0>
   >    at __randomizedtesting.SeedInfo.seed([A31A2C218548E5CC:2F89FDE7A718986C]:0)
   >    at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
   >    at org.elasticsearch.xpack.ccr.FollowIndexSecurityIT.testForgetFollower(FollowIndexSecurityIT.java:219)
   >    at java.lang.Thread.run(Thread.java:748)

@jasontedor
Copy link
Member Author

Thanks for the really helpful review @martijnvg.

@jasontedor jasontedor merged commit 8c156ed into elastic:master Mar 7, 2019
jasontedor added a commit that referenced this pull request Mar 7, 2019
This commit introduces the forget follower API. This API is needed in cases that
unfollowing a following index fails to remove the shard history retention leases
on the leader index. This can happen explicitly through user action, or
implicitly through an index managed by ILM. When this occurs, history will be
retained longer than necessary. While the retention lease will eventually
expire, it can be expensive to allow history to persist for that long, and also
prevent ILM from performing actions like shrink on the leader index. As such, we
introduce an API to allow for manual removal of the shard history retention
leases in this case.
jasontedor added a commit that referenced this pull request Mar 7, 2019
This commit introduces the forget follower API. This API is needed in cases that
unfollowing a following index fails to remove the shard history retention leases
on the leader index. This can happen explicitly through user action, or
implicitly through an index managed by ILM. When this occurs, history will be
retained longer than necessary. While the retention lease will eventually
expire, it can be expensive to allow history to persist for that long, and also
prevent ILM from performing actions like shrink on the leader index. As such, we
introduce an API to allow for manual removal of the shard history retention
leases in this case.
jasontedor added a commit that referenced this pull request Mar 7, 2019
This commit introduces the forget follower API. This API is needed in cases that
unfollowing a following index fails to remove the shard history retention leases
on the leader index. This can happen explicitly through user action, or
implicitly through an index managed by ILM. When this occurs, history will be
retained longer than necessary. While the retention lease will eventually
expire, it can be expensive to allow history to persist for that long, and also
prevent ILM from performing actions like shrink on the leader index. As such, we
introduce an API to allow for manual removal of the shard history retention
leases in this case.
@jasontedor jasontedor deleted the forget-follower branch March 7, 2019 16:33
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Mar 7, 2019
* 6.7:
  Fix CCR HLRC docs
  Introduce forget follower API (elastic#39718)
  6.6.2 release notes.
  Update release notes for 6.7.0
  Add documentation for min_hash filter (elastic#39671)
  Unmute testIndividualActionsTimeout
  Unmute testFollowIndexAndCloseNode
  Use unwrapped cause to determine if node is closing (elastic#39723)
  Don’t ack if unable to remove failing replica (elastic#39584)
  Wipe Snapshots Before Indices in RestTests (elastic#39662) (elastic#39765)
  Bug fix for AnnotatedTextHighlighter (elastic#39525)
  Fix Snapshot BwC with Version 5.6.x (elastic#39737)
  Fix occasional SearchServiceTests failure (elastic#39697)
  Correct date in daterange-aggregation.asciidoc (elastic#39727)
  Add a note to purge the ingest-geoip plugin (elastic#39553)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants