[GUIDE] Add bulk guide #292

Merged
merged 5 commits into from
Apr 14, 2023
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -26,6 +26,7 @@ Inspired from [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
- Adds dynamic type to \_source field ([#158](https://github.com/opensearch-project/opensearch-go/issues/158))
- Adds testcases for Document API ([#280](https://github.com/opensearch-project/opensearch-go/issues/280))
- Adds `index_lifecycle` guide ([#287](https://github.com/opensearch-project/opensearch-go/pull/287))
- Adds `bulk` guide ([#292](https://github.com/opensearch-project/opensearch-go/pull/292))

### Changed

204 changes: 204 additions & 0 deletions guides/bulk.md
@@ -0,0 +1,204 @@
# Bulk

In this guide, you'll learn how to use the OpenSearch Golang Client API to perform bulk operations. You'll learn how to index, update, and delete multiple documents in a single request.

## Setup

First, create a client instance with the following code:

```go
package main

import (
	"github.com/opensearch-project/opensearch-go/v2"
	"log"
)

func main() {
	client, err := opensearch.NewDefaultClient()
	if err != nil {
		log.Printf("error occurred: [%s]", err.Error())
Member

Should we be bailing here in samples, e.g. log.Fatalf?

Contributor Author

I was guided by the rules from the issue, which said not to throw errors. Or did I misunderstand?

Member

Maybe, I don't know what's best.

What's the user experience when NewDefaultClient() returns an error with the code as written?

Contributor Author

The console will show the reason why the client was not created, and the program will still complete successfully; I mean there will be no fatal termination. Having read the reason, the user understands that something was done wrong, resolves the root cause, and tries again.

Member

@dblock dblock Apr 18, 2023

In this example the code doesn't do anything else, however in the example below:

movies := "movies"
books := "books"
createMovieIndex, err := client.Indices.Create(movies)
if err != nil {
    log.Printf("error occurred: [%s]", err.Error())
}
log.Printf("response: [%+v]", createMovieIndex)
createBooksIndex, err := client.Indices.Create(books)
if err != nil {
    log.Printf("error occurred: [%s]", err.Error())
}
log.Printf("response: [%+v]", createBooksIndex)

You get an error the first time on client.Indices.Create(movies), then you get an error again the second time on client.Indices.Create(books). I believe we shouldn't be bailing after the first error, no? Furthermore, there should be a way to auto-cleanup whatever was successfully created even if it was only the first and not the second index.

Contributor Author

I agree about the first one, but again I will refer to the guide ... It's not difficult for me to change it to a fatal error. As for the second point, each guide ends with a cleanup section, and the previously created resources are cleaned up there.

	}
	log.Printf("response: [%+v]", client)
}
```

Next, create an index named `movies` and another named `books` with the default settings:

```go
movies := "movies"
books := "books"

createMovieIndex, err := client.Indices.Create(movies)
if err != nil {
	log.Printf("error occurred: [%s]", err.Error())
}
log.Printf("response: [%+v]", createMovieIndex)

createBooksIndex, err := client.Indices.Create(books)
if err != nil {
	log.Printf("error occurred: [%s]", err.Error())
}
log.Printf("response: [%+v]", createBooksIndex)
```

## Bulk API

The `bulk` API action allows you to perform multiple document operations in a single request. The request body is newline-delimited JSON (NDJSON), not a JSON array: each operation is an action line and, for `index`, `create`, and `update`, is followed by a line containing the document or partial document. The body must end with a newline.
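
Because a bulk body is one JSON object per line, it can also be assembled from Go values instead of being written as a raw string. The following is a minimal sketch using only the standard library; `bulkOp`, `actionMeta`, and `buildBulkBody` are hypothetical helpers made up for this illustration and are not part of `opensearch-go`:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// bulkOp is a hypothetical helper type (not part of opensearch-go) that
// pairs one bulk action with an optional document line.
type bulkOp struct {
	Action string      // "index", "create", "update", or "delete"
	Index  string      // target index
	ID     string      // empty: the "_id" key is omitted from the action line
	Doc    interface{} // nil for actions without a document line (delete)
}

// actionMeta is the metadata object inside an action line.
type actionMeta struct {
	Index string `json:"_index"`
	ID    string `json:"_id,omitempty"`
}

// buildBulkBody serializes the operations as newline-delimited JSON.
func buildBulkBody(ops []bulkOp) (string, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf) // Encode writes a trailing newline after each value
	for _, op := range ops {
		action := map[string]actionMeta{op.Action: {Index: op.Index, ID: op.ID}}
		if err := enc.Encode(action); err != nil {
			return "", err
		}
		if op.Doc == nil {
			continue
		}
		if err := enc.Encode(op.Doc); err != nil {
			return "", err
		}
	}
	return buf.String(), nil
}

func main() {
	body, err := buildBulkBody([]bulkOp{
		{Action: "index", Index: "movies", ID: "1", Doc: map[string]interface{}{"title": "Beauty and the Beast", "year": 1991}},
		{Action: "delete", Index: "movies", ID: "2"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Print(body)
}
```

The resulting string could be passed to `client.Bulk` through `strings.NewReader`, in the same way as the raw string literals used in this guide.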

### Indexing multiple documents

The following code creates two documents in the `movies` index and one document in the `books` index:

```go
res, err := client.Bulk(strings.NewReader(`{ "index": { "_index": "movies", "_id": 1 } }
{ "title": "Beauty and the Beast", "year": 1991 }
{ "index": { "_index": "movies", "_id": 2 } }
{ "title": "Beauty and the Beast - Live Action", "year": 2017 }
{ "index": { "_index": "books", "_id": 1 } }
{ "title": "The Lion King", "year": 1994 }
`))
if err != nil {
	log.Printf("error occurred: [%s]", err.Error())
}
log.Printf("response: [%+v]", res)
```

### Creating multiple documents

Similarly, instead of calling the `create` method for each document, you can use the `bulk` API to create multiple documents in a single request. The following code sends three `create` actions for the `movies` index and one for the `books` index:

```go
res, err = client.Bulk(strings.NewReader(`{ "create": { "_index": "movies" } }
{ "title": "Beauty and the Beast 2", "year": 2030 }
{ "create": { "_index": "movies", "_id": 1 } }
{ "title": "Beauty and the Beast 3", "year": 2031 }
{ "create": { "_index": "movies", "_id": 2 } }
{ "title": "Beauty and the Beast 4", "year": 2049 }
{ "create": { "_index": "books" } }
{ "title": "The Lion King 2", "year": 1998 }
`))
if err != nil {
	log.Printf("error occurred: [%s]", err.Error())
}
log.Printf("response: [%+v]", res)
```

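The first and last actions omit the `_id`, so OpenSearch generates IDs for those documents, just as it does with the `create` method. Because `create` fails when a document with the same ID already exists, the actions that target `_id` 1 and 2 return a `409` conflict if the documents indexed in the previous step are still present.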

### Updating multiple documents

The following code sends two partial updates to documents in the `movies` index:

```go
res, err = client.Bulk(strings.NewReader(`{ "update": { "_index": "movies", "_id": 1 } }
{ "doc": { "year": 1992 } }
{ "update": { "_index": "movies", "_id": 2 } }
{ "doc": { "year": 2018 } }
`))
if err != nil {
	log.Printf("error occurred: [%s]", err.Error())
}
log.Printf("response: [%+v]", res)
```

Note that the updated data is specified in the `doc` field as a full or partial JSON document, depending on how much of the document you want to update.

### Deleting multiple documents

The following code deletes two documents from the `movies` index. Delete actions don't require a document on the next line. If a document doesn't exist, OpenSearch doesn't return an error, but instead returns `not_found` under `result`:

```go
res, err = client.Bulk(strings.NewReader(`{ "delete": { "_index": "movies", "_id": 1 } }
{ "delete": { "_index": "movies", "_id": 2 } }
`))
if err != nil {
	log.Printf("error occurred: [%s]", err.Error())
}
log.Printf("response: [%+v]", res)
```

### Mix and match operations

You can mix and match the different operations in a single request. The following code creates two documents, updates one document, and deletes another document:

```go
res, err = client.Bulk(strings.NewReader(`{ "create": { "_index": "movies", "_id": 3 } }
{ "title": "Beauty and the Beast 5", "year": 2050 }
{ "create": { "_index": "movies", "_id": 4 } }
{ "title": "Beauty and the Beast 6", "year": 2051 }
{ "update": { "_index": "movies", "_id": 3 } }
{ "doc": { "year": 2052 } }
{ "delete": { "_index": "movies", "_id": 4 } }
`))
if err != nil {
	log.Printf("error occurred: [%s]", err.Error())
}
log.Printf("response: [%+v]", res)
```

### Handling errors

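The `bulk` API returns one response item per operation in the request body, in the same order, under `items`, along with a top-level `errors` flag that is `true` if any operation failed. Each item contains a `status` field that indicates whether the operation was successful. If the operation was successful, the `status` field is set to a `2xx` code. Otherwise, the item usually describes the failure in the `error` field; a `delete` for a missing document instead reports `404` with `result` set to `not_found`.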

The following code shows how to look for errors in the response:

```go
// Response mirrors the parts of the bulk response that this example reads.
// Only `delete` items are decoded here.
type Response struct {
	Took   int  `json:"took"`
	Errors bool `json:"errors"`
	Items  []struct {
		Delete struct {
			Index   string `json:"_index"`
			Id      string `json:"_id"`
			Version int    `json:"_version"`
			Result  string `json:"result"`
			Shards  struct {
				Total      int `json:"total"`
				Successful int `json:"successful"`
				Failed     int `json:"failed"`
			} `json:"_shards"`
			SeqNo       int `json:"_seq_no"`
			PrimaryTerm int `json:"_primary_term"`
			Status      int `json:"status"`
		} `json:"delete,omitempty"`
	} `json:"items"`
}

// Delete a document that doesn't exist so that the item reports a non-2xx status.
res, err = client.Bulk(strings.NewReader(`{ "delete": { "_index": "movies", "_id": 10 } }
`))
if err != nil {
	log.Printf("error occurred: [%s]", err.Error())
}

body, err := io.ReadAll(res.Body)
if err != nil {
	log.Printf("error occurred: [%s]", err.Error())
}

var response Response
if err := json.Unmarshal(body, &response); err != nil {
	log.Printf("error occurred: [%s]", err.Error())
}

// Check the status of every item; anything above 299 is treated as a failure.
for _, item := range response.Items {
	if item.Delete.Status > 299 {
		log.Printf("error occurred: [%s]", item.Delete.Result)
	} else {
		log.Printf("success: [%s]", item.Delete.Result)
	}
}
```

## Cleanup

To clean up the resources created in this guide, delete the `movies` and `books` indices:

```go
deleteIndexes, err := client.Indices.Delete(
	[]string{movies, books},
	client.Indices.Delete.WithIgnoreUnavailable(true),
)
if err != nil {
	log.Printf("error occurred: [%s]", err.Error())
}
log.Printf("response: [%+v]", deleteIndexes)
```