.fleet-actions-results data stream cannot be restored via the fleet feature state #89261

Closed
@romain-chanu

Description

Elasticsearch Version

8.3.3

Installed Plugins

No response

Java Version

bundled

OS Version

Deployment in ESS

Problem Description

.fleet-actions-results data stream cannot be restored via the fleet feature state.

Consider the following scenario (observed in the field in ESS):

  1. Due to an unforeseen situation, the cluster becomes red with the following red indices:
health status index                                        uuid                   pri rep docs.count docs.deleted store.size pri.store.size sth
green  open   .ds-.fleet-actions-results-2022.05.04-000002 eZO3mXu3RYOZpygHvC2dgQ   1   1          0            0       450b           225b false
red    open   .ds-.fleet-actions-results-2022.06.03-000003 iBbSWmHaQbqJFn_aBVqaYg   1   1                                                   false
red    open   .ds-.fleet-actions-results-2022.07.03-000004 sF3-S4uoQkybpm7ujaZBVg   1   1                                                   false
red    open   .ds-.fleet-actions-results-2022.08.02-000006 t-U-Wrd_RpqZUqSS2a3TqA   1   1                                                   false
red    open   .fleet-actions-7                             8zgOKVzdQIeS_YGq_JX--w   1   1                                                   false
red    open   .fleet-agents-7                              p7sWhvhPRaWQ_unOHIJQTQ   1   1                                                   false
red    open   .fleet-artifacts-7                           iingfeghRJ2bfqLAGFt0Aw   1   1                                                   false
red    open   .fleet-enrollment-api-keys-7                 8J1tyEuJSfyhMxf5HsfU2A   1   1                                                   false
red    open   .fleet-policies-7                            HufDBhgBQraUYlNosY1ysg   1   1                                                   false
red    open   .fleet-policies-leader-7                     jpqhCaF9SL-S0AjlWqa6xg   1   1                                                   false
red    open   .fleet-servers-7                             5xdgNy-kSXSdsWZbM8mRHw   1   1                                                   false
  2. The user attempts to restore the fleet feature state using the following restore snapshot API request:
POST _snapshot/found-snapshots/cloud-snapshot-2022.08.08-lywsv4teqe-zj3ygvjkria/_restore?wait_for_completion=false
{
  "indices": "-*",
  "ignore_unavailable": "true",
  "include_global_state": "false",
  "include_aliases": "false",
  "feature_states": [
    "fleet"
  ]
}
  3. The above request fails with the following error:
{
  "error": {
    "root_cause": [
      {
        "type": "snapshot_restore_exception",
        "reason": "[found-snapshots:cloud-snapshot-2022.08.08-lywsv4teqe-zj3ygvjkria/H3i28HlrSiKyrLaiDCE6uA] cannot restore index [.ds-.fleet-actions-results-2022.06.03-000003] because an open index with same name already exists in the cluster. Either close or delete the existing index or restore the index under a different name by providing a rename pattern and replacement name"
      }
    ],
    "type": "snapshot_restore_exception",
    "reason": "[found-snapshots:cloud-snapshot-2022.08.08-lywsv4teqe-zj3ygvjkria/H3i28HlrSiKyrLaiDCE6uA] cannot restore index [.ds-.fleet-actions-results-2022.06.03-000003] because an open index with same name already exists in the cluster. Either close or delete the existing index or restore the index under a different name by providing a rename pattern and replacement name"
  },
  "status": 500
}
  4. Checking the fleet feature state, it seems that the SystemIndexDescriptor (cf. the code) does contain the .fleet-actions-results-* pattern. A couple of guesses about the reported problem:
  • The implementation only considers regular indices and not data streams?
  • The implementation considers the data stream but fails to close the backing indices before restoring them?
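
The condition named in the error message (an open index with the same name already exists) can be modeled in a few lines of Python. This is only a sketch of that stated condition, not Elasticsearch's implementation: the function name find_open_conflicts and the restore_targets list are hypothetical, the rows are assumed to have the shape of `GET _cat/indices?format=json` output, and which indices a feature state restore handles on its own is not modeled.

```python
from typing import Iterable, List


def find_open_conflicts(cat_rows: Iterable[dict], restore_targets: Iterable[str]) -> List[str]:
    """Return the restore targets that already exist as open indices.

    Hypothetical helper: it mirrors the wording of the error message only.
    """
    targets = set(restore_targets)
    return sorted(
        row["index"]
        for row in cat_rows
        if row.get("status") == "open" and row["index"] in targets
    )


# Rows reduced to the fields used here; values come from the listing above.
cat_rows = [
    {"health": "green", "status": "open", "index": ".ds-.fleet-actions-results-2022.05.04-000002"},
    {"health": "red", "status": "open", "index": ".ds-.fleet-actions-results-2022.06.03-000003"},
    {"health": "red", "status": "open", "index": ".fleet-agents-7"},
]

# Assumed restore target, taken from the index named in the error message.
conflicts = find_open_conflicts(cat_rows, [".ds-.fleet-actions-results-2022.06.03-000003"])
print(conflicts)
```

If the second guess above is right, every backing index reported by such a check would need to be closed (or deleted) before the restore can proceed.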

Steps to Reproduce

  1. Create a cluster running version 8.3.3 and deploy an Elastic Agent with the Osquery Manager integration.
  2. Run a new live Osquery.
  3. Observe that the .fleet-actions-results data stream is created with the respective backing indices.
  4. Restore the fleet feature state using the restore snapshot API and observe the same error as above.
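
For step 3, the backing indices can be read from a get data stream response (`GET _data_stream/.fleet-actions-results`). The sketch below assumes the documented response shape (`data_streams[].indices[].index_name`); the helper name backing_indices and the sample response are illustrative, and access to this system data stream may require additional privileges or headers that are not shown here.

```python
from typing import List


def backing_indices(response: dict, data_stream: str) -> List[str]:
    """Extract backing index names for one data stream from a
    get data stream API response (hypothetical helper)."""
    for stream in response.get("data_streams", []):
        if stream.get("name") == data_stream:
            return [entry["index_name"] for entry in stream.get("indices", [])]
    return []


# Sample response trimmed to the fields used; index names are from this report.
sample_response = {
    "data_streams": [
        {
            "name": ".fleet-actions-results",
            "indices": [
                {"index_name": ".ds-.fleet-actions-results-2022.05.04-000002"},
                {"index_name": ".ds-.fleet-actions-results-2022.06.03-000003"},
            ],
        }
    ]
}

names = backing_indices(sample_response, ".fleet-actions-results")
print(names)
```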

Workaround

  1. Create the fleet_superuser role:
POST _security/role/fleet_superuser
{
  "indices": [
    {
      "names": [
        ".fleet*"
      ],
      "privileges": [
        "all"
      ],
      "allow_restricted_indices": true
    }
  ]
}
  2. Create the temp_user user with the superuser and fleet_superuser roles:
POST _security/user/temp_user
{
  "password": "temp_password",
  "roles": [
    "superuser",
    "fleet_superuser"
  ]
}
  3. Close the .fleet-actions-results backing indices using the cURL command below:
curl -k -XPOST --user temp_user:temp_password -H 'x-elastic-product-origin:fleet' https://$CLUSTER_ADDRESS/.ds-.fleet-actions-results-2022.05.04-000002,.ds-.fleet-actions-results-2022.06.03-000003,.ds-.fleet-actions-results-2022.07.03-000004,.ds-.fleet-actions-results-2022.08.02-000006/_close

Note: when running the cURL command on Windows, use double quotes instead of single quotes around the header: "x-elastic-product-origin:fleet"

  4. Restore the fleet feature state:
POST _snapshot/found-snapshots/cloud-snapshot-2022.08.08-lywsv4teqe-zj3ygvjkria/_restore?wait_for_completion=false
{
  "indices": "-*",
  "ignore_unavailable": "true",
  "include_global_state": "false",
  "include_aliases": "false",
  "feature_states": [
    "fleet"
  ]
}
  5. Delete the temp_user user:
DELETE _security/user/temp_user
  6. Delete the fleet_superuser role:
DELETE _security/role/fleet_superuser
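
The close step of the workaround hard-codes the backing index names in the URL. As a hedged sketch, the same URL and header can be assembled from a list of names; build_close_request and the example address are hypothetical, and the code only builds the request without sending it.

```python
from typing import Dict, List, Tuple


def build_close_request(cluster_address: str, indices: List[str]) -> Tuple[str, Dict[str, str]]:
    """Build the URL and headers for the close-index call used in the
    workaround (hypothetical helper; it does not send anything)."""
    if not indices:
        raise ValueError("no backing indices to close")
    # Multiple index names are passed as a comma-separated list, as in the cURL command.
    url = "https://{}/{}/_close".format(cluster_address, ",".join(indices))
    # Same header the cURL command sends.
    headers = {"x-elastic-product-origin": "fleet"}
    return url, headers


url, headers = build_close_request(
    "cluster.example.invalid",  # placeholder for $CLUSTER_ADDRESS
    [
        ".ds-.fleet-actions-results-2022.05.04-000002",
        ".ds-.fleet-actions-results-2022.06.03-000003",
    ],
)
print(url)
print(headers)
```

The resulting URL and header can then be passed to any HTTP client together with the temp_user credentials, as a POST request.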

Logs (if relevant)

No response

Metadata

Assignees

No one assigned

    Labels

    :Core/Infra/Core (Core issues without another label)
    >bug
    Supportability (Improve our (devs, SREs, support eng, users) ability to troubleshoot/self-service product better.)
    Team:Core/Infra (Meta label for core/infra team)
