Integrations content #72

Merged · 3 commits · Aug 22, 2023
Changes from 1 commit:
adding integration content including documentation and an integration for the Observability catalog

Signed-off-by: YANGDB <yang.db.dev@gmail.com>
YANG-DB committed Aug 16, 2023
commit 9a8d7a3f14f2b273d2ba433307c34c1a5a5bda5a
Binary file added docs/img/aws_s3_integration-dashboard.png
Binary file added docs/img/aws_s3_integration-details.png
Binary file added docs/img/aws_s3_integration-preview.png
Binary file added docs/img/cloud-integrations-filter.png
Binary file added docs/img/integrations-observability-catalog.png
57 changes: 57 additions & 0 deletions docs/integrations/Integration-creation.md
@@ -0,0 +1,57 @@
## What makes an integration?

An integration has four parts:

* A configuration that ties together the following three parts with some metadata.
* A schema that defines the data format, currently based on SS4O. [Docs](https://github.com/opensearch-project/opensearch-catalog/tree/main/docs/schema)
* Assets that are loaded into Dashboards via the Saved Object API.
* In the future, more asset types will be supported, but for the moment we only support saved objects.
* Statics: logos, screenshots, the works.

For some examples of integrations, see the current list in the [Observability Integrations](../../integrations) directory.

The [Nginx integration](../../integrations/observability/nginx) in particular is a prototypical example that we’ll reference throughout the guide.
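
For concreteness, here is a minimal config sketch modeled on the Apache integration added in this PR (the names and component list are illustrative; see `integrations/observability/apache/apache-1.0.0.json` later in this changeset for a complete, real example):

```json
{
  "name": "sample_service",
  "version": "1.0.0",
  "displayName": "Sample Service Dashboard",
  "description": "Sample service logs collector",
  "license": "Apache-2.0",
  "type": "logs_sample_service",
  "components": [
    { "name": "http", "version": "1.0.0" }
  ],
  "assets": {
    "savedObjects": { "name": "sample_service", "version": "1.0.0" }
  },
  "statics": {
    "logo": { "annotation": "Sample Logo", "path": "logo.png" }
  },
  "sampleData": { "path": "sample.json" }
}
```

The `components` list is the schema part, `assets.savedObjects` points at the Dashboards saved objects, and `statics` covers the logos and screenshots.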

## How to create one?

Getting a working pair of schemas and assets is challenging. Here are seven general steps to get it done. We assume you are creating an integration based on existing dashboards available on the web (e.g., from Beats) rather than making one from scratch.

### Set up example infrastructure based on what’s already available

Follow the target project's documentation to set up a sample infrastructure.

### Collect sample data records from that infrastructure

In particular, you want records as they’re submitted to OpenSearch, in JSON. The tooling in the next step generally depends on the data being in a JSON format.
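For example, a raw Apache access-log record, as an agent might submit it, could look roughly like the following (the field names here are whatever your agent emits and are purely illustrative, not a fixed contract):

```json
{
  "remote": "203.0.113.5",
  "method": "GET",
  "path": "/index.html",
  "code": "200",
  "size": "4523",
  "agent": "Mozilla/5.0"
}
```

This is the shape that the converter in the next step will consume.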

### Make a converter to convert records to SS4O

In the future this can be automated with a code generation framework, but the general idea is to use some agent to make OTEL-compliant or ECS-compliant data. Some options:

* **Jaeger**
* **Fluentbit**
* **Data Prepper**
* **OTEL Collector**
* **Logstash**

For determining what actually needs to be done for conversion, there is an [open PR](https://github.com/opensearch-project/opensearch-catalog/pull/32) for a tool that lets you quickly check how your data differs from a selected SS4O schema. This can be a valuable debugging tool for making an SS4O converter.
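As a hedged sketch of the target shape (double-check every field name against the actual SS4O mapping files in the catalog), the raw Apache record above might convert to something like:

```json
{
  "@timestamp": "2023-08-16T12:00:00Z",
  "body": "203.0.113.5 - - [16/Aug/2023:12:00:00 +0000] \"GET /index.html HTTP/1.1\" 200 4523",
  "http": {
    "request": { "method": "GET" },
    "response": { "status_code": 200, "bytes": 4523 },
    "url": "/index.html"
  },
  "communication": { "source": { "ip": "203.0.113.5" } },
  "attributes": { "data_stream": { "dataset": "apache.access", "type": "logs" } }
}
```

The schema-comparison tool from the PR linked above is the fastest way to find where a record like this still deviates from the selected schema.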

### Make an alternative infrastructure that uses this converter

Docker is ideal here, if possible. Strictly speaking, you only need this for testing; building a “correct” alternative infrastructure for other users is out of scope. The goal is just to get schema-compliant data stored in an OpenSearch data stream.

### Convert or remake assets from the original system using SS4O

In the future this can be automated. For now, the main idea is to go through the panels in the existing dashboard(s) and update their queries for the new schema. If a panel needs a field that isn't present anywhere in the schema, either remove it or ask @YANG-DB about adding it to the schema.

New visualizations should be put in the Visualization Catalog in opensearch-catalog.

### Compile the integration

In the future this can be automated, but for now, model it on one of the existing integrations: put all the relevant files in the integration's directory and write a config (see the layout sketch below).
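
As a rough layout sketch, based on the Apache integration in this PR (the `data` and `schemas` directories are described in the [reference doc](Integration-reference.md); exact directory names should be verified against the existing integrations):

```
integrations/observability/apache/
├── apache-1.0.0.json        <- the integration config
├── assets/
│   └── apache-1.0.0.ndjson  <- saved objects exported from Dashboards
├── data/                    <- sample records for the try-it experience
├── schemas/                 <- SS4O mapping files
└── static/                  <- logo.png, screenshots, ...
```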

### Test the integration

Try using the integration in OpenSearch Dashboards with the Integrations plugin. Make sure all the visualizations work, the logo and screenshots render correctly, and there are no display bugs. This is also a good opportunity to test the Integrations plugin itself for bugs.

A simple sanity check is to use the UI’s “try it” button. There should be no visible errors when the button is pressed and the integration’s dashboard is opened.
58 changes: 58 additions & 0 deletions docs/integrations/Integration-reference.md
@@ -0,0 +1,58 @@
This is a supplement for the [Integration creation guide](Integration-creation.md). It is intended to provide more technical details and references, rather than detailed steps for implementing integrations.

## Integration Stages

For a functioning integration, there are four stages that telemetry data goes through. For the current release, we are mostly focused on stages 2-4, but in the future we can look more carefully at stage 1.

* **Stage 1.** The data is generated by an application and stored in some intermediary storage, such as S3, Spark, or Prometheus. There may be multiple levels of mappings here, and for some architectures we may skip directly to stage 2.
* **Stage 2.** The data is retrieved from either intermediary storage or the application directly, and indexed in OpenSearch as OTEL-compliant records.
* **Stage 3.** OTEL-compliant records at rest in OpenSearch.
* **Stage 4.** OTEL-compliant records are queried from OpenSearch and displayed in OpenSearch Dashboards as integration assets.

For the majority of the integration work we are doing today, we are focused on stages 2 and 3. The purpose of this document is to gather resources to help with each.

## Stage 1: Generating Raw Data

The main focus of this stage is to set up some infrastructure that generates telemetry data. We have a few repositories that demonstrate this data collection:

* The [OpenTelemetry Demo](https://github.com/opensearch-project/opentelemetry-demo/tree/main) repository has an extensive docker setup with many services being ingested into OpenSearch. This can be good to look at for making integrations for more complicated architectures.
* There is an old [Nginx Demo](https://github.com/opensearch-project/observability/tree/main/integrations/nginx/test) that involves parsing and mapping data using a basic FluentBit filter. This is a good approach if the integration is being made for a standalone application. (TODO: @Swiddis has made an updated FluentBit config that is much easier to work with, and needs to update the demo. Ping him if you need it now.)
* For AWS applications, there is currently no working example on hand, but a resource that we’ve found is the [SIEM on OpenSearch](https://github.com/aws-samples/siem-on-amazon-opensearch-service) demo repository. There is also a page with many [examples of AWS log data](https://docs.aws.amazon.com/waf/latest/developerguide/logging-examples.html).

## Stage 2: Converting Records to OTEL

As OTEL is a widespread protocol, there are many tools that can ingest records and convert them to the format. There are three that I’m aware of currently being used with the project:

* [FluentBit](https://fluentbit.io/) is relatively straightforward and can convert to arbitrary data formats, but requires understanding OTEL somewhat well to use.
* [Jaeger Tracing](https://www.jaegertracing.io/docs/1.21/opentelemetry/) has an OTEL collection mode and can export to OpenSearch, but the compatibility with OpenSearch seems slightly off regarding timestamps.
* The [OpenTelemetry Collector](https://opentelemetry.io/docs/collector/) is the most formally correct software, but it doesn’t currently have a native OpenSearch export mode. The author hasn’t yet been able to make it run.

When setting up the collector to output to OpenSearch, it is recommended to make a reproducible configuration that runs in Docker. Retrieving sample data from a successful setup will be useful for testing integrations; we recommend storing that data in the integration.

Within integrations, a sample of the results of this encoding stage should be stored in the `data` directory.

### Further Reading

* [OpenTelemetry Receivers](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main) are available for many applications, under the `receivers` directory.
* [Some examples of current Elasticsearch integrations](https://github.com/elastic/integrations/tree/main), under the `packages` directory.

## Stage 3: OTEL-Compliant Records

The current primary reference for what defines an OTEL data record is the [OTEL Semantic Conventions](https://github.com/open-telemetry/semantic-conventions/tree/main) repository. It can be used to check whether provided data is in an OTEL format. We are working on creating OpenSearch mappings for these conventions, available in the [OpenSearch Catalog](https://github.com/opensearch-project/opensearch-catalog/tree/main/docs/schema). However, note that the mapping files present today might not be perfectly aligned with OTEL, due to typos or otherwise, so until automated checks are added, some level of manual double-checking is advised.

We have an Integrations CLI tool that can automatically check whether provided data records are compliant with the catalog’s mapping files. This can help a lot with debugging conversion, and finding schemas to reference in an integration.

Within integrations, this stage is encoded in the `schemas` directory.
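
For illustration, a mapping file in `schemas` is an OpenSearch component/index template. A minimal sketch, assuming the catalog's component-template style (verify the exact structure and field names against the actual catalog files), might look like:

```json
{
  "template": {
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date" },
        "body": { "type": "text" },
        "http": {
          "properties": {
            "response": {
              "properties": {
                "status_code": { "type": "integer" }
              }
            }
          }
        }
      }
    }
  }
}
```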

### Further Reading

* [Beginner’s Guide to OpenTelemetry](https://logz.io/learn/opentelemetry-guide/).
* [OpenTelemetry: Telemetry Schemas](https://opentelemetry.io/docs/specs/otel/schemas/).

## Stage 4: Displaying OTEL Records

Most dashboards in the wild today are not automatically compliant with our OTEL format. They have to be either converted or remade from scratch; if feasible, remaking from scratch is preferred. To help with this, we’ve started preparing the [Visualization Catalog](https://github.com/opensearch-project/opensearch-catalog/issues/33), which will contain examples of visualizations that query OTEL records. We also have some tooling via the Integrations CLI that can verify whether a visualization is already compliant with OTEL.

The target workflow is to let an integration developer provide sample OTEL data records and suggest visualizations for a dashboard based on the fields present in the records. For the moment, we are focusing on gathering visualizations, so if you do make a new visualization, please consider making a PR to add it to the catalog.

Within integrations, this stage is encoded in the `assets` directory.
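
For reference, each line of an assets `.ndjson` file is a single Dashboards saved object. A heavily trimmed sketch of one such line follows (the IDs and titles here are hypothetical, and real exports carry more fields, such as `migrationVersion`):

```json
{"type": "visualization", "id": "sample-response-codes", "attributes": {"title": "Response codes over time", "visState": "..."}, "references": [{"name": "kibanaSavedObjectMeta.searchSourceJSON.index", "type": "index-pattern", "id": "sample-index-pattern"}]}
```

The usual way to produce such a file is to build the objects in Dashboards and export them from the Saved Objects management page.
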
125 changes: 125 additions & 0 deletions docs/integrations/README.md
@@ -0,0 +1,125 @@
# OpenSearch Integrations

### ***A journey for adding data-aware assets into the user’s workspace***

The integration initiative was envisioned as a simple, elegant and powerful way to allow users to view, query and project their data.

Previously, the way users configured their dashboards was to undergo the following process every time:

* Explore the index content and extract its structure (the mapping file)
* Assemble the index pattern based on this mapping
* Create visualizations for different parts of the mapping’s fields
* Compose the entire dashboard from these visualizations
* Save the dashboard and export it for other users to utilize (expecting the mapping to be the same)

The repeated bootstrapping required for every single index is a long, error-prone, and time-consuming process that requires some degree of knowledge of both the data and the OpenSearch API.

At OpenSearch we support many different types of use cases for a large variety of users:
- Search-related use cases - e-commerce product search, for example
- Observability monitoring and provisioning - trace/metrics analytics
- Security monitoring and threat analysis

All these use cases are accompanied by strong, mature communities that have contributed many resources and much knowledge to these domains.

One important product of these contributions is the domain schema used for the data signals collected for such events (security / observability).

[OpenTelemetry](https://opentelemetry.io/) is a vibrant and productive community that is constantly advancing the monitoring and tracing agenda. The [OTEL protocol](https://opentelemetry.io/docs/specs/otel/), a key concept of the OpenTelemetry products, allows the consolidation and unification of many types of observed data signals.

At OpenSearch we adopted this protocol and developed the [Simple Schema for Observability](https://github.com/opensearch-project/opensearch-catalog/tree/main/docs/schema/observability) to turn these concepts into a concrete index mapping catalog.

Once the schema is in place, the next phase is to select a list of products and services that are very common in users’ topologies (databases, web servers, containers) and to provide our opinionated monitoring dashboards on top of them.

Using the well-defined Observability schema structure simplifies the assumptions about field names and semantic conventions. It allows us to build a common dashboard that reflects the behavior of the different aspects of the observed resource.

The next step after building the dashboards was to actually test them with an ingestion pipeline that simulates a real ingestion flow arriving from the user’s system.

We support the following ingestion pipelines:

- [OpenSearch Data-Prepper](https://github.com/opensearch-project/data-prepper)
- [OTEL collector](https://github.com/open-telemetry/opentelemetry-collector)
- [Fluent-Bit](https://docs.fluentbit.io/manual/pipeline/outputs/opensearch)

Each of these pipelines supports the OTEL schema (or the simple schema directly), so that the signal documents are indexed into the [correct index](https://github.com/opensearch-project/opensearch-catalog/blob/main/docs/schema/observability/Naming-convention.md) representing the observed resource’s signals, as illustrated below.
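
As an illustration of that convention (see the linked Naming-convention.md for the authoritative pattern), signal documents land in data streams named roughly `ss4o_{type}-{dataset}-{namespace}`, for example:

```
ss4o_logs-apache-prod
ss4o_metrics-system-dev
ss4o_traces-default-namespace
```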

Once this was in place, we generalized the concept further to allow more types of “structure-aware assets” that can rely on the mapping structure.

The framework that generalizes these capabilities is called an **integration**, and it has the following characteristics:

* Name & description
* Source URL & license
* Schema spec (mapping / components mapping)
* Sample data (for the try-out experience)
* Assets (dashboards, index patterns, queries, alerts)


[Integration creation](Integration-creation.md) is a process that we aim to make a simple task, allowing users and organizations to organize their domain into a well-structured set of integrations that reflect their use cases and common resources.

---
**Here are some screenshots:**

The following images show the user experience of onboarding the dashboards that reflect the user’s system resources, using the integration dialogs:


![integrations-observability-catalog.png](../img/integrations-observability-catalog.png)
*Selecting the resource to integrate*
![cloud-integrations-filter.png](../img/cloud-integrations-filter.png)
*Filtering to a specific schema aspect (cloud based)*
![aws_s3_integration-preview.png](../img/aws_s3_integration-preview.png)
*Reviewing the AWS S3 Observability integration*
![aws_s3_integration-details.png](../img/aws_s3_integration-details.png)
*Trying out the integration using sample data*
![aws_s3_integration-dashboard.png](../img/aws_s3_integration-dashboard.png)
*Viewing the main S3 Observability dashboard*

This sample shows how users can navigate and explore the existing integrations bundled with the OpenSearch release and select the appropriate resources that are part of their system.

* * *

### Planned ahead

As we continue to evolve the integration framework, we are planning to extend and expand it in the following directions:

* Adding additional assets to the integrations, including:
    * Schema-based alerts
    * Saved search templates (a type of “prepared statement”)
    * Predefined metrics
    * Data sources for connecting data lakes and external data
    * Industry-standard SLOs/SLAs for different services


Adding additional ingestion pipeline specifications (including usage tutorials) for users to take as the basis for their Observability ingestion pipeline definitions. This includes both internal ingestion pipelines, such as the OpenSearch Ingestion Service, and external ones, such as Fluent Bit Lua converters.

We are also planning to expand the scope of integrations to allow users to define an [“Integration Catalog”](https://github.com/opensearch-project/dashboards-observability/issues/900). This catalog will let users experiment with and simplify the loading of a group of related integrations (a catalog can represent the services of a cloud provider, for example).

Customers can also use it to predefine different organizational scopes, realized as sets of integrations with specific dashboards that match the organization’s business flows and responsibilities.

We are very much in the phase of shaping the future of this important feature, and we hope the community will guide us on the next steps.

Please feel free to [participate and contribute](https://github.com/opensearch-project/dashboards-observability/issues/new?assignees=&labels=integration%2C+untriaged&projects=&template=integration_request.md&title=%5BIntegration%5D) so that the next integration helps your organization reach productivity faster and more easily.

* * *

### Additional information

#### Integration with OpenTelemetry

OpenTelemetry is a CNCF project designed to generate, collect, and describe telemetry data from distributed systems. The Observability plugin for OpenSearch Dashboards leverages OpenTelemetry's data collection capabilities, allowing users to send data from their applications to OpenSearch.
> For additional information on setting up the OpenTelemetry Collector with OpenSearch, see the [OpenTelemetry Demo app](https://github.com/opensearch-project/opentelemetry-demo).

The plugin supports the OpenTelemetry Protocol (OTLP) using the [Simple Schema for Observability](https://github.com/opensearch-project/opensearch-catalog/tree/main/docs/schema/observability), making it compatible with a wide range of instrumentation libraries and observability tools.

This allows developers to continue using their existing OpenTelemetry-based observability stack, with the added benefit of advanced analytics and visualization capabilities provided by OpenSearch Dashboards.
> For additional information, see the [OpenSearch OpenTelemetry Example Architecture](https://github.com/opensearch-project/opentelemetry-demo/blob/main/.github/architecture.md).

---
- [Integration RFC](https://github.com/opensearch-project/dashboards-observability/issues/644)
- [Integration Development Tracking](https://github.com/opensearch-project/dashboards-observability/issues/668)
- [Integration schema catalog](https://github.com/opensearch-project/opensearch-catalog/tree/main/docs/schema)
- [Observability Schema catalog](https://github.com/opensearch-project/opensearch-catalog/tree/main/docs/schema/observability)
- [Simple Schema Semantics](https://github.com/opensearch-project/opensearch-catalog/blob/main/docs/schema/observability/Semantic-Convention.md)
- [Simple Schema index Naming](https://github.com/opensearch-project/opensearch-catalog/blob/main/docs/schema/observability/Naming-convention.md)
- [Integration creation](Integration-creation.md)
- [Integration reference](Integration-reference.md)

46 changes: 46 additions & 0 deletions integrations/observability/apache/apache-1.0.0.json
@@ -0,0 +1,46 @@
{
"name": "apache",
"version": "1.0.0",
"displayName": "Apache Dashboard",
"description": "Apache web logs collector",
"license": "Apache-2.0",
"type": "logs_apache",
"labels": ["log", "communication", "http"],
"author": "OpenSearch",
"sourceUrl": "https://github.com/opensearch-project/dashboards-observability/tree/main/server/adaptors/integrations/__data__/repository/apache/info",
"statics": {
"logo": {
"annotation": "Apache Logo",
"path": "logo.png"
},
"gallery": [
{
"annotation": "Apache Dashboard",
"path": "dashboard1.png"
}
]
},
"components": [
{
"name": "communication",
"version": "1.0.0"
},
{
"name": "http",
"version": "1.0.0"
},
{
"name": "logs_apache",
"version": "1.0.0"
}
],
"assets": {
"savedObjects": {
"name": "apache",
"version": "1.0.0"
}
},
"sampleData": {
"path": "sample.json"
}
}
11 changes: 11 additions & 0 deletions integrations/observability/apache/assets/apache-1.0.0.ndjson
