Add micro benchmarks. (opensearch-project#537)
* Align pool_maxsize for different connection pool implementations.

Signed-off-by: dblock <dblock@amazon.com>

* Added benchmarks.

Signed-off-by: dblock <dblock@amazon.com>

* Multi-threaded vs. async benchmarks.

Signed-off-by: dblock <dblock@amazon.com>

* Set pool size to the number of threads.

Signed-off-by: dblock <dblock@amazon.com>

* Added sync/async benchmark.

Signed-off-by: dblock <dblock@amazon.com>

* Report client-side latency.

Signed-off-by: dblock <dblock@amazon.com>

* Various updates to benchmarks, demonstrating threading improves throughput.

Signed-off-by: dblock <dblock@amazon.com>

* Bench info.

Signed-off-by: dblock <dblock@amazon.com>

* Fixup format.

Signed-off-by: dblock <dblock@amazon.com>

* Undo async maxsize.

Signed-off-by: dblock <dblock@amazon.com>

* Moved benchmarks folder.

Signed-off-by: dblock <dblock@amazon.com>

* Updated documentation and project description.

Signed-off-by: dblock <dblock@amazon.com>

---------

Signed-off-by: dblock <dblock@amazon.com>
Signed-off-by: roma2023 <romasaparhan19@gmail.com>
dblock authored and roma2023 committed Dec 28, 2023
1 parent ea219a3 commit af0ae87
Showing 11 changed files with 1,293 additions and 1 deletion.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -7,6 +7,7 @@ Inspired from [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
- Added point-in-time APIs (create_pit, delete_pit, delete_all_pits, get_all_pits) and Security Client APIs (health and update_audit_configuration) ([#502](https://github.com/opensearch-project/opensearch-py/pull/502))
- Added new guide for using index templates with the client ([#531](https://github.com/opensearch-project/opensearch-py/pull/531))
- Added `pool_maxsize` for `Urllib3HttpConnection` ([#535](https://github.com/opensearch-project/opensearch-py/pull/535))
- Added benchmarks ([#537](https://github.com/opensearch-project/opensearch-py/pull/537))
### Changed
- Generate `tasks` client from API specs ([#508](https://github.com/opensearch-project/opensearch-py/pull/508))
- Generate `ingest` client from API specs ([#513](https://github.com/opensearch-project/opensearch-py/pull/513))
2 changes: 1 addition & 1 deletion README.md
@@ -26,7 +26,7 @@ For more information, see [opensearch.org](https://opensearch.org/) and the [API

## User Guide

-To get started with the OpenSearch Python Client, see [User Guide](https://github.com/opensearch-project/opensearch-py/blob/main/USER_GUIDE.md).
+To get started with the OpenSearch Python Client, see [User Guide](https://github.com/opensearch-project/opensearch-py/blob/main/USER_GUIDE.md). This repository also contains [working samples](https://github.com/opensearch-project/opensearch-py/tree/main/samples) and [benchmarks](https://github.com/opensearch-project/opensearch-py/tree/main/benchmarks).

## Compatibility with OpenSearch

63 changes: 63 additions & 0 deletions benchmarks/README.md
@@ -0,0 +1,63 @@
- [Benchmarks](#benchmarks)
- [Start OpenSearch](#start-opensearch)
- [Install Prerequisites](#install-prerequisites)
- [Run Benchmarks](#run-benchmarks)

## Benchmarks

Python client benchmarks using [richbench](https://github.com/tonybaloney/rich-bench).

### Start OpenSearch

```
docker run -p 9200:9200 -e "discovery.type=single-node" opensearchproject/opensearch:latest
```

### Install Prerequisites

Install [poetry](https://python-poetry.org/docs/), then install package dependencies.

```
poetry install
```

Benchmarks use the code in this repository by specifying the dependency as `opensearch-py = { path = "..", develop=true, extras=["async"] }` in [pyproject.toml](pyproject.toml).

### Run Benchmarks

Run all available benchmarks as follows.

```
poetry run richbench . --repeat 1 --times 1
```

This prints the results of every benchmark.

```
Benchmarks, repeat=1, number=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃                         Benchmark ┃ Min     ┃ Max     ┃ Mean    ┃ Min (+)         ┃ Max (+)         ┃ Mean (+)        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ 1 client vs. more clients (async) │ 1.640   │ 1.640   │ 1.640   │ 1.102 (1.5x)    │ 1.102 (1.5x)    │ 1.102 (1.5x)    │
│    1 thread vs. 32 threads (sync) │ 5.526   │ 5.526   │ 5.526   │ 1.626 (3.4x)    │ 1.626 (3.4x)    │ 1.626 (3.4x)    │
│    1 thread vs. 32 threads (sync) │ 4.639   │ 4.639   │ 4.639   │ 3.363 (1.4x)    │ 3.363 (1.4x)    │ 3.363 (1.4x)    │
│                sync vs. async (8) │ 3.198   │ 3.198   │ 3.198   │ 0.966 (3.3x)    │ 0.966 (3.3x)    │ 0.966 (3.3x)    │
└───────────────────────────────────┴─────────┴─────────┴─────────┴─────────────────┴─────────────────┴─────────────────┘
```
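
The factor in parentheses in each `(+)` column appears to be the baseline time divided by the comparison time, and the rows above are consistent with that reading. A minimal sketch that re-derives the factors from the mean values in the table; the `speedup` helper is hypothetical and not part of richbench:

```python
def speedup(baseline: float, comparison: float) -> str:
    """Format baseline / comparison the way the table shows it, e.g. '3.4x'."""
    return f"{baseline / comparison:.1f}x"


# (label, baseline mean, comparison mean), copied from the table above.
rows = [
    ("1 client vs. more clients (async)", 1.640, 1.102),
    ("1 thread vs. 32 threads (sync)", 5.526, 1.626),
    ("1 thread vs. 32 threads (sync)", 4.639, 3.363),
    ("sync vs. async (8)", 3.198, 0.966),
]

for label, baseline, comparison in rows:
    print(f"{label}: {speedup(baseline, comparison)}")
```

With `--repeat 1 --times 1` each benchmark runs once, so Min, Max, and Mean are identical; higher values for either flag would presumably make the three columns diverge.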

Run a specific benchmark, e.g. [bench_sync.py](bench_sync.py), by specifying `--benchmark [name]`.

```
poetry run richbench . --repeat 1 --times 1 --benchmark sync
```

This prints the results of that benchmark only.

```
Benchmarks, repeat=1, number=1
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃                      Benchmark ┃ Min     ┃ Max     ┃ Mean    ┃ Min (+)         ┃ Max (+)         ┃ Mean (+)        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ 1 thread vs. 32 threads (sync) │ 6.804   │ 6.804   │ 6.804   │ 3.409 (2.0x)    │ 3.409 (2.0x)    │ 3.409 (2.0x)    │
└────────────────────────────────┴─────────┴─────────┴─────────┴─────────────────┴─────────────────┴─────────────────┘
```
101 changes: 101 additions & 0 deletions benchmarks/bench_async.py
@@ -0,0 +1,101 @@
#!/usr/bin/env python

# SPDX-License-Identifier: Apache-2.0
#
# The OpenSearch Contributors require contributions made to
# this file be licensed under the Apache-2.0 license or a
# compatible open source license.

import asyncio
import uuid

from opensearchpy import AsyncHttpConnection, AsyncOpenSearch

host = "localhost"
port = 9200
auth = ("admin", "admin")
index_name = "test-index-async"
item_count = 100


async def index_records(client, item_count):
    # Issue all index requests concurrently and wait for every one to finish.
    await asyncio.gather(
        *[
            client.index(
                index=index_name,
                body={
                    "title": "Moneyball",
                    "director": "Bennett Miller",
                    "year": "2011",
                },
                id=uuid.uuid4(),
            )
            for _ in range(item_count)
        ]
    )


async def test_async(client_count=1, item_count=1):
    # Create the requested number of independent clients.
    clients = []
    for _ in range(client_count):
        clients.append(
            AsyncOpenSearch(
                hosts=[{"host": host, "port": port}],
                http_auth=auth,
                use_ssl=True,
                verify_certs=False,
                ssl_show_warn=False,
                connection_class=AsyncHttpConnection,
                pool_maxsize=client_count,
            )
        )

    # Start from a clean index.
    if await clients[0].indices.exists(index_name):
        await clients[0].indices.delete(index_name)

    await clients[0].indices.create(index_name)

    # Every client indexes item_count records; all clients run concurrently.
    await asyncio.gather(
        *[index_records(clients[i], item_count) for i in range(client_count)]
    )

    # Refresh so the count below reflects every indexed record.
    await clients[0].indices.refresh(index=index_name)
    print(await clients[0].count(index=index_name))

    await clients[0].indices.delete(index_name)

    await asyncio.gather(*[client.close() for client in clients])


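def test(client_count=1, item_count=1):
    # Parameter order matches test_async and the positional calls below
    # (client count first, then items per client).
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    loop.run_until_complete(test_async(client_count, item_count))
    loop.close()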


def test_1():
    test(1, 32 * item_count)


def test_2():
    test(2, 16 * item_count)


def test_4():
    test(4, 8 * item_count)


def test_8():
    test(8, 4 * item_count)


def test_16():
    test(16, 2 * item_count)


def test_32():
    test(32, item_count)


__benchmarks__ = [(test_1, test_8, "1 client vs. more clients (async)")]
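
`index_records` starts all of its requests at once through `asyncio.gather` rather than awaiting them one at a time. The effect of that choice can be seen without a cluster in a minimal sketch, where the hypothetical `fake_request` stands in for `client.index` and simply sleeps:

```python
import asyncio
import time


async def fake_request(delay: float = 0.05) -> str:
    """Hypothetical stand-in for client.index(): waits, then reports success."""
    await asyncio.sleep(delay)
    return "created"


async def sequential(n: int) -> list:
    # Each request starts only after the previous one has finished.
    return [await fake_request() for _ in range(n)]


async def concurrent(n: int) -> list:
    # All requests are in flight together, as in index_records.
    return await asyncio.gather(*[fake_request() for _ in range(n)])


def timed(coro_fn, n: int):
    """Run coro_fn(n) on a fresh event loop and return (seconds, results)."""
    start = time.perf_counter()
    results = asyncio.run(coro_fn(n))
    return time.perf_counter() - start, results


if __name__ == "__main__":
    seq_seconds, _ = timed(sequential, 5)
    con_seconds, _ = timed(concurrent, 5)
    print(f"sequential={seq_seconds:.2f}s concurrent={con_seconds:.2f}s")
```

Against a real cluster the gain depends on server capacity and connection limits, so the sleep-based numbers only illustrate the mechanism.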
93 changes: 93 additions & 0 deletions benchmarks/bench_info_sync.py
@@ -0,0 +1,93 @@
#!/usr/bin/env python

# SPDX-License-Identifier: Apache-2.0
#
# The OpenSearch Contributors require contributions made to
# this file be licensed under the Apache-2.0 license or a
# compatible open source license.

import logging
import sys
import time

from thread_with_return_value import ThreadWithReturnValue

from opensearchpy import OpenSearch

host = "localhost"
port = 9200
auth = ("admin", "admin")
request_count = 250


root = logging.getLogger()
# root.setLevel(logging.DEBUG)
# logging.getLogger("urllib3.connectionpool").setLevel(logging.DEBUG)

handler = logging.StreamHandler(sys.stdout)
handler.setLevel(logging.DEBUG)
formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
handler.setFormatter(formatter)
root.addHandler(handler)


def get_info(client, request_count):
    # Returns the summed client-side latency of all requests, in milliseconds.
    tt = 0
    for _ in range(request_count):
        start = time.time() * 1000
        client.info()
        total_time = time.time() * 1000 - start
        tt += total_time
    return tt


def test(thread_count=1, request_count=1, client_count=1):
    # Size each client's connection pool to the number of threads.
    clients = []
    for _ in range(client_count):
        clients.append(
            OpenSearch(
                hosts=[{"host": host, "port": port}],
                http_auth=auth,
                use_ssl=True,
                verify_certs=False,
                ssl_show_warn=False,
                pool_maxsize=thread_count,
            )
        )

    # Distribute the threads across the clients round-robin.
    threads = []
    for thread_id in range(thread_count):
        thread = ThreadWithReturnValue(
            target=get_info, args=[clients[thread_id % len(clients)], request_count]
        )
        threads.append(thread)
        thread.start()

    # join() returns each thread's total latency; sum them across threads.
    latency = 0
    for t in threads:
        latency += t.join()

    print(f"latency={latency}")


def test_1():
    test(1, 32 * request_count, 1)


def test_2():
    test(2, 16 * request_count, 2)


def test_4():
    test(4, 8 * request_count, 3)


def test_8():
    test(8, 4 * request_count, 8)


def test_32():
    test(32, request_count, 32)


__benchmarks__ = [(test_1, test_32, "1 thread vs. 32 threads (sync)")]
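
`ThreadWithReturnValue` is imported from `thread_with_return_value`, which is not shown in this excerpt. Judging only from how it is used here (`latency += t.join()`), it could look something like the sketch below; this is an assumption about its shape, not the committed implementation:

```python
import threading


class ThreadWithReturnValue(threading.Thread):
    """Thread whose join() returns the value produced by the target (assumed behavior)."""

    def __init__(self, target, args=()):
        super().__init__()
        # Use our own attribute names to avoid clashing with Thread internals.
        self._fn = target
        self._fn_args = tuple(args)
        self._result = None

    def run(self):
        # Runs in the new thread; keep the result for join() to hand back.
        self._result = self._fn(*self._fn_args)

    def join(self, timeout=None):
        super().join(timeout)
        return self._result


if __name__ == "__main__":
    worker = ThreadWithReturnValue(target=lambda a, b: a + b, args=[20, 22])
    worker.start()
    print(worker.join())
```

`concurrent.futures.ThreadPoolExecutor` offers the same capability through `Future.result()`, which may be a simpler choice where a custom thread class is not needed.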