[DOC] Index APIs and Datastream Limitations #2811

Open
1 of 4 tasks
jaredbarranco opened this issue Feb 10, 2023 · 3 comments
Labels
1 - Backlog Issue: The issue is unassigned or assigned but not started

@jaredbarranco

jaredbarranco commented Feb 10, 2023

What do you want to do?

  • Request a change to existing documentation
  • Add new documentation
  • Report a technical problem with the documentation
  • Other

Tell us about your request. Provide a summary of the request and all versions that are affected.
Our application stack collects log streams from several sources using Apache Kafka, then an AWS Lambda worker POSTs _bulk payloads to the OpenSearch API. One of our sources switched to a data stream, and we immediately saw errors in the _bulk requests.

The error message tells us that data streams don't allow INDEX ops and require CREATE ops. The data stream documentation doesn't mention this at all and instead states: "To ingest data into a data stream, you can use the regular indexing APIs."

This issue requests explicit documentation of the API limitations when using data streams, either in the _bulk API documentation or in the data streams documentation.

What other resources are available? Provide links to related issues, POCs, steps for testing, etc.

PUT _index_template/ds-template
{
    "index_patterns": [
        "my-logs*"
    ],
    "data_stream": {
        "timestamp_field": {
            "name": "timestamp"
        }
    },
    "priority": 1,
    "template": {
        "settings": {
            "number_of_shards": 2,
            "number_of_replicas": 1
        },
        "mappings": {
            "properties": {
                "message" : {
                    "type": "text"
                },
                "timestamp": {
                    "type": "date",
                    "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd'T'HH:mm:ss.SSS'Z'||yyyy-MM-dd'T'HH:mm:ss.SSS'T'||yyyy-MM-dd||epoch_millis"
                }
            }
        }
    }
}

Then POST a bulk payload to an index name that the pattern matches:

POST _bulk
{ "index": { "_index": "my-logs" } }
{ "message": "This is a log to test datastream creation via bulk API", "timestamp": "2023-02-09T23:04:08.162Z" }

Result:

{
  "took" : 130,
  "errors" : true,
  "items" : [
    {
      "index" : {
        "_index" : "my-logs",
        "_type" : "_doc",
        "_id" : null,
        "status" : 400,
        "error" : {
          "type" : "illegal_argument_exception",
          "reason" : "only write ops with an op_type of create are allowed in data streams"
        }
      }
    }
  ]
}
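
Note that the failure is reported per item: the response carries "errors": true and the 400 status sits on the item itself. Below is a minimal Python sketch of how a worker could pull such failures out of a parsed response. The function name failed_bulk_items is hypothetical and not part of any OpenSearch client; the sample response is the one shown above.

```python
import json


def failed_bulk_items(response: dict) -> list:
    """Return (action, index, status, reason) for each failed item in a _bulk response."""
    failures = []
    # The top-level "errors" flag is true when at least one item failed.
    if not response.get("errors"):
        return failures
    for item in response.get("items", []):
        # Each item is a single-key dict keyed by the action type (index, create, ...).
        for action, result in item.items():
            if "error" in result:
                failures.append(
                    (
                        action,
                        result.get("_index"),
                        result.get("status"),
                        result["error"].get("reason"),
                    )
                )
    return failures


# The response body from this issue, parsed from JSON.
response = json.loads("""
{
  "took": 130,
  "errors": true,
  "items": [
    {
      "index": {
        "_index": "my-logs",
        "_type": "_doc",
        "_id": null,
        "status": 400,
        "error": {
          "type": "illegal_argument_exception",
          "reason": "only write ops with an op_type of create are allowed in data streams"
        }
      }
    }
  ]
}
""")

for failure in failed_bulk_items(response):
    print(failure)
```

Checking each item rather than only the request outcome is what surfaces rejections like this one.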

There is no current documentation on the OpenSearch website that I can find stating that data streams accept only CREATE ops.
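
For comparison, here is a minimal Python sketch of how a worker might build a bulk body that uses only create actions, which is what the error message says data streams accept. The function name build_create_bulk_body and the sample document are illustrative assumptions, and sending the body to the cluster is left out.

```python
import json


def build_create_bulk_body(stream: str, docs: list) -> str:
    """Build a newline-delimited _bulk body that uses only `create` actions."""
    lines = []
    for doc in docs:
        # Action line: `create` rather than `index`, per the error message in this issue.
        lines.append(json.dumps({"create": {"_index": stream}}))
        # Source line: the document itself, serialized on a single line.
        lines.append(json.dumps(doc))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


docs = [
    {
        "message": "This is a log to test datastream creation via bulk API",
        "timestamp": "2023-02-09T23:04:08.162Z",
    }
]
print(build_create_bulk_body("my-logs", docs), end="")
```

The resulting string can be sent as the body of POST _bulk in place of a payload built with index actions.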

@FrcMoya
Contributor

FrcMoya commented Feb 16, 2023

I had the same problem when writing a Lambda function that collects all documents and sends them in a _bulk operation. I found the restriction described in the Elasticsearch documentation.

Also, in my opinion, the data stream documentation in OpenSearch is quite poor: there is no clear explanation of how data streams work, or that you need to apply an ISM policy before creating the data stream. It would be nice to improve all of this.

@hdhalter
Contributor

Hi @bowenlan-amzn, can you help us update the documentation for this issue?

@bowenlan-amzn
Member

Not sure about this question. Need the data stream expert to help. @ketanv3
