[Change Proposal] Support Knowledge Base Packages #693

Closed

Description

Summary

This is a proposal, much like #346 or #351, for enabling packages to bundle static data stream data, such that when the package is installed, the data stream is created and the bundled data is ingested into it.

The specific use case here is shipping 'Knowledge Base' content for use by the Elastic Assistants. For example, both the Security and Observability Assistants currently bundle our ES|QL docs with the Kibana distribution for each release. We then take this data, optionally chunk it, and embed/ingest it using ELSER into a 'knowledge base' data stream so the assistants can query it for their ES|QL query generation features. Each release we need to update this content and ship it as part of the Kibana distribution, with no way to ship intermediate content updates outside of the Kibana release cycle.
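For context, the embed/ingest step could look roughly like the following sketch. The model id (.elser_model_2), pipeline id, and field names here are assumptions for illustration, not the actual Kibana implementation:

```typescript
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' });

// Hypothetical ingest pipeline that runs each chunked doc through
// ELSER at ingest time. Model id and field names are assumptions.
await client.ingest.putPipeline({
  id: 'knowledge-base-elser',
  processors: [
    {
      inference: {
        model_id: '.elser_model_2',
        input_output: [
          { input_field: 'content', output_field: 'content_embedding' },
        ],
      },
    },
  ],
});
```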

Additionally, as mentioned in #346 (comment), this essentially gives us the ability to ship 'Custom GPTs' that integrate with our assistants, opening up a world of possibilities for users to configure and expand the capabilities of the Security and Observability Assistants.

Requirement Details

Configuration

The core requirement is the ability to include the following when creating a package:

  • Any number of data streams to create, though realistically one is probably sufficient
  • An arbitrary number of documents, perhaps in JSON format, or zipped as detailed in #346 ([discuss] Support (fairly large) sample data set package)
  • Some configuration for the destination data stream of the bundled documents. If we include a raw dump of the documents from ES, perhaps we could use just the _index field on each document to route them accordingly? (See the layout sketch below.)
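To make this concrete, a package might look something like the following. This layout is purely illustrative and not actual package-spec syntax; the knowledge_base directory and file names are hypothetical:

```
my_knowledge_base_package/
├── manifest.yml                  # package metadata, as today
├── data_stream/
│   └── knowledge_base/
│       ├── manifest.yml          # mappings/settings for the data stream
│       └── fields/fields.yml
└── knowledge_base/
    ├── esql-docs-0001.json       # bundled documents to ingest on install
    └── esql-docs-0002.json       # (or a single zipped dump, per #346)
```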

Behavior

Upon installation, the package should install the included data streams, then ingest the bundled documents into their destination data streams. This initial data should stick around for as long as the package is installed; if the package is removed, the data stream and initial data should be removed as well. When the package is updated, it would be fine to wipe the data stream/initial data and treat it as a fresh install. Whatever is easiest/most resilient would be fine for the first iteration. There is no need to worry about appending new data on upgrade or dealing with mapping changes; just delete the data streams and re-install/re-ingest the initial data.
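As a rough sketch of that lifecycle, assuming the documents ship as a raw ES dump (newline-delimited JSON objects with optional _index and _source fields) and that the package has already installed a matching index template, install/uninstall/upgrade could look like this (all names are hypothetical):

```typescript
import { readFileSync } from 'fs';
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' });

// Hypothetical default destination for bundled documents.
const DATA_STREAM = 'knowledge-base-esql-docs';

async function installBundledData(docsPath: string): Promise<void> {
  // Assumes the package install step has already put an index template
  // whose index_patterns match DATA_STREAM.
  await client.indices.createDataStream({ name: DATA_STREAM });

  // Bundled documents as newline-delimited JSON, e.g. a raw ES dump
  // where each line may carry its own _index for routing.
  const docs = readFileSync(docsPath, 'utf8')
    .split('\n')
    .filter(Boolean)
    .map((line) => JSON.parse(line));

  // Data streams only accept 'create' bulk operations.
  const operations = docs.flatMap((doc) => [
    { create: { _index: doc._index ?? DATA_STREAM } },
    doc._source ?? doc,
  ]);
  await client.bulk({ operations, refresh: true });
}

async function uninstallBundledData(): Promise<void> {
  // Removing the package removes the data stream and all its data.
  await client.indices.deleteDataStream({ name: DATA_STREAM });
}

async function upgradeBundledData(docsPath: string): Promise<void> {
  // Simplest first iteration: treat an upgrade as a fresh install.
  await uninstallBundledData();
  await installBundledData(docsPath);
}
```

Deleting and recreating on upgrade sidesteps mapping-change headaches entirely, at the cost of a brief window where the knowledge base is empty.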


The above would be sufficient for us to start bundling knowledge base documents in packages, at which point we could install them as needed in support of specific assistant features.

