
[Fleet] Packages with a large number of saved objects in them cause Kibana to crash #147695

@xcrzx

Description


We have recently encountered an issue where Kibana crashes when installing a Fleet package that contains a large number of saved objects. The crash occurs during installation and appears to be caused by deleting the saved objects of the previous package version.

Steps to reproduce:

  1. Install a Fleet package that contains a large number of saved objects (e.g. over 10,000) using POST /api/fleet/epm/packages/<package>/<version>.
    To generate such a package and install it, follow the steps from this ticket.
  2. Observe that Kibana crashes during the installation process.
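Step 1 can also be scripted. The sketch below only assembles the request. `buildInstallRequest`, `InstallRequest`, and the package name and version in the usage comment are hypothetical, and authentication is left to the caller. It assumes the standard Kibana requirement that non-GET API calls carry a `kbn-xsrf` header.

```typescript
// Hypothetical helper (not part of Kibana): builds the request for step 1,
// POST /api/fleet/epm/packages/<package>/<version>.
interface InstallRequest {
  method: "POST";
  url: string;
  headers: Record<string, string>;
}

function buildInstallRequest(
  kibanaUrl: string,
  pkgName: string,
  pkgVersion: string,
): InstallRequest {
  const base = kibanaUrl.replace(/\/+$/, ""); // drop trailing slashes
  const path =
    `/api/fleet/epm/packages/${encodeURIComponent(pkgName)}` +
    `/${encodeURIComponent(pkgVersion)}`;
  return {
    method: "POST",
    url: base + path,
    // Kibana rejects non-GET API calls that lack the kbn-xsrf header.
    headers: { "kbn-xsrf": "true" },
  };
}

// Usage (Node 18+ global fetch; add an Authorization header for your cluster).
// "example-package" and "1.0.0" are placeholders:
// const req = buildInstallRequest("http://localhost:5601", "example-package", "1.0.0");
// await fetch(req.url, { method: req.method, headers: req.headers });
```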

Expected result:
The Fleet package should be installed successfully without crashing Kibana.

Actual result:
Kibana crashes during the installation process. Elasticsearch logs show dozens of warnings similar to this:

block until refresh ran out of slots and forced a refresh: [BulkShardRequest [[.kibana_8.7.0_001][0]] containing [delete {[.kibana_8.7.0][security-rule:d8fc1cca-93ed-43c1-bbb6-c0dd3eff2958:102.0.6]}] blocking until refresh]

During that time, all requests to Kibana fail with:

{"statusCode":503,"error":"Service Unavailable","message":"connect EADDRNOTAVAIL 127.0.0.1:9200 - Local (0.0.0.0:0)"}

Notes:

This issue does not occur with smaller packages containing fewer saved objects.

The issue can be temporarily worked around by manually deleting the saved objects of the previous package version before installing the new one, but this is not a permanent solution.

APM logs show hundreds of DELETE requests sent in parallel; they appear to overwhelm Elasticsearch and make it unresponsive:

[Screenshot from 2022-12-16 at 15:01:23]
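The parallel fan-out in the APM trace is the kind of pattern a concurrency cap would bound. The following is a minimal sketch of that idea, not Fleet's actual code or fix: `deleteWithConcurrencyLimit`, the `DeleteFn` callback type, and the default limit of 10 are all hypothetical. In Kibana, `deleteOne` might wrap a saved objects client `delete(type, id)` call.

```typescript
// Hypothetical sketch: run deletions with at most `limit` requests in flight,
// instead of firing every DELETE at once with Promise.all(ids.map(deleteOne)).
type DeleteFn = (id: string) => Promise<void>;

async function deleteWithConcurrencyLimit(
  ids: readonly string[],
  deleteOne: DeleteFn,
  limit = 10,
): Promise<void> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  let next = 0; // index of the next id to hand out
  // Each worker pulls ids until none are left. The check and the increment of
  // `next` run synchronously (no await between them), so no id is taken twice.
  const worker = async (): Promise<void> => {
    while (next < ids.length) {
      const id = ids[next++];
      await deleteOne(id);
    }
  };
  // Start at most `limit` workers; Promise.all rejects on the first failure,
  // while the other workers keep draining the queue.
  const workers = Array.from({ length: Math.min(limit, ids.length) }, () => worker());
  await Promise.all(workers);
}
```

The limit of 10 is arbitrary; a real value would need tuning against what Elasticsearch can absorb.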

Metadata


Labels

8.7 candidate, Feature:Prebuilt Detection Rules, Team: SecuritySolution, Team:Detection Rule Management, Team:Detections and Resp, Team:Fleet, bug, impact:high, v8.7.0
