Opened on Jan 18, 2024
## Summary
This is a proposal, much like #346 or #351, for the ability to bundle static data within a package such that, when the package is installed, the data stream is created and the bundled data is ingested into it.
The specific use case here is shipping 'Knowledge Base' content for use by the Elastic Assistants. For example, both the Security and Observability Assistants currently bundle our ES|QL docs with the Kibana distribution for each release. We then take this data, optionally chunk it, and embed/ingest it using ELSER into a 'knowledge base' data stream so the assistants can query it for their ES|QL query generation features. Each release we'll need to update this content and ship it as part of the Kibana distribution, with no ability to ship intermediate content updates outside of the Kibana release cycle.
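To illustrate the chunking step described above, here is a minimal sketch of a naive character-window chunker. The window size and overlap are hypothetical; the real pipeline may well chunk by tokens or semantic boundaries before embedding with ELSER:

```python
def chunk_text(text: str, max_chars: int = 1000, overlap: int = 100) -> list[str]:
    """Split a document into overlapping character-window chunks.

    Hypothetical parameters for illustration only; not the values
    the assistants actually use.
    """
    if max_chars <= overlap:
        raise ValueError("max_chars must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        start += max_chars - overlap
    return chunks
```

Overlap between adjacent chunks helps keep sentences that straddle a boundary retrievable from either side.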
Additionally, as mentioned in #346 (comment), this essentially provides us the ability to ship 'Custom GPTs' that can integrate with our assistants, and so opens up a world of possibilities for users to configure and expand the capabilities of the Security and Observability Assistants.
## Requirement Details
### Configuration
The core requirement here is for the ability to include the following when creating a package:
- Any number of data streams to create, though realistically one is probably sufficient
- An arbitrary number of documents, perhaps in JSON format, or zipped as detailed in [discuss] Support (fairly large) sample data set package #346
- This generally won't be a large amount of data, as detailed in [discuss] Support (fairly large) sample data set package #346 (our ES|QL docs are 196 documents and ~125KB); however, I would expect some users to push this to enable RAG over larger data sets
- Some configuration for the destination data stream of the bundled documents. If we include a raw dump of the documents from ES, perhaps we can use just the `_index` fields to route them accordingly?
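To sketch the `_index`-based routing idea, the bundled documents (assumed here to be a raw ES dump, each with `_index` and `_source` fields) could be turned into an NDJSON `_bulk` body that sends each doc to the data stream named in its own `_index`:

```python
import json

def build_bulk_payload(docs: list[dict]) -> str:
    """Build an NDJSON _bulk request body, routing each raw-dump doc
    to the data stream named in its _index field.

    Data streams only accept the 'create' op type, not 'index'.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"create": {"_index": doc["_index"]}}))
        lines.append(json.dumps(doc["_source"]))
    return "\n".join(lines) + "\n"
```

The document shape is an assumption; if packages bundle documents in some other format, the routing field would just need to be mapped into the bulk action line the same way.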
### Behavior
Upon installation, the package should create the included data streams, then ingest the bundled documents into their destination data streams. This initial data should persist for as long as the package is installed. If the package is removed, the data streams and initial data should be removed as well. When the package is updated, it would be fine to wipe the data streams/initial data and treat it as a fresh install; whatever is easiest/most resilient would be fine for the first iteration here. There's no need to worry about appending new data on upgrade or dealing with mapping changes: just delete the data streams and re-install/re-ingest the initial data.
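The delete-and-reinstall lifecycle above could be sketched as follows. The `store` dict is an in-memory stand-in for Elasticsearch, and the package shape (`data_streams`, `documents`) is hypothetical, not part of any existing package spec:

```python
def install(store: dict, package: dict) -> None:
    """Create the package's data streams and ingest its bundled docs."""
    for stream in package["data_streams"]:
        store[stream] = []
    for doc in package["documents"]:
        store[doc["_index"]].append(doc["_source"])

def uninstall(store: dict, package: dict) -> None:
    """Remove the data streams along with their initial data."""
    for stream in package["data_streams"]:
        store.pop(stream, None)

def upgrade(store: dict, old: dict, new: dict) -> None:
    """First iteration: wipe and treat as a fresh install.

    No appending on upgrade, no mapping migrations.
    """
    uninstall(store, old)
    install(store, new)
```

Treating upgrade as uninstall-then-install keeps the first iteration resilient: there is never a partially migrated state to reason about.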
The above would be sufficient for us to start bundling knowledge base documents in packages, at which point we could install them as needed in support of specific assistant features.