Add micro benchmarks. (opensearch-project#537)

* Align pool_maxsize for different connection pool implementations.
* Added benchmarks.
* Multi-threaded vs. async benchmarks.
* Set pool size to the number of threads.
* Added sync/async benchmark.
* Report client-side latency.
* Various updates to benchmarks, demonstrating threading improves throughput.
* Bench info.
* Fixup format.
* Undo async maxsize.
* Moved benchmarks folder.
* Updated documentation and project description.

---------

Signed-off-by: dblock <dblock@amazon.com>
Signed-off-by: roma2023 <romasaparhan19@gmail.com>
roma2023 · Dec 28, 2023 · af0ae87
1 parent ea219a3
Showing 11 changed files with 1,293 additions and 1 deletion.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -7,6 +7,7 @@ Inspired from [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
 - Added point-in-time APIs (create_pit, delete_pit, delete_all_pits, get_all_pits) and Security Client APIs (health and update_audit_configuration) ([#502](https://github.com/opensearch-project/opensearch-py/pull/502))
 - Added new guide for using index templates with the client ([#531](https://github.com/opensearch-project/opensearch-py/pull/531))
 - Added `pool_maxsize` for `Urllib3HttpConnection` ([#535](https://github.com/opensearch-project/opensearch-py/pull/535))
+- Added benchmarks ([#537](https://github.com/opensearch-project/opensearch-py/pull/537))
 ### Changed
 - Generate `tasks` client from API specs ([#508](https://github.com/opensearch-project/opensearch-py/pull/508))
 - Generate `ingest` client from API specs ([#513](https://github.com/opensearch-project/opensearch-py/pull/513))

diff --git a/README.md b/README.md
@@ -26,7 +26,7 @@ For more information, see [opensearch.org](https://opensearch.org/) and the [API
 
 ## User Guide
 
-To get started with the OpenSearch Python Client, see [User Guide](https://github.com/opensearch-project/opensearch-py/blob/main/USER_GUIDE.md).
+To get started with the OpenSearch Python Client, see [User Guide](https://github.com/opensearch-project/opensearch-py/blob/main/USER_GUIDE.md). This repository also contains [working samples](https://github.com/opensearch-project/opensearch-py/tree/main/samples) and [benchmarks](https://github.com/opensearch-project/opensearch-py/tree/main/benchmarks).
 
 ## Compatibility with OpenSearch
 

diff --git a/benchmarks/README.md b/benchmarks/README.md
@@ -0,0 +1,63 @@
+- [Benchmarks](#benchmarks)
+  - [Start OpenSearch](#start-opensearch)
+  - [Install Prerequisites](#install-prerequisites)
+  - [Run Benchmarks](#run-benchmarks)
+
+## Benchmarks
+
+Python client benchmarks using [richbench](https://github.com/tonybaloney/rich-bench).
+
+### Start OpenSearch
+
+```
+docker run -p 9200:9200 -e "discovery.type=single-node" opensearchproject/opensearch:latest
+```
+
+### Install Prerequisites
+
+Install [poetry](https://python-poetry.org/docs/), then install package dependencies.
+
+```
+poetry install
+```
+
+Benchmarks use the code in this repository by specifying the dependency as `opensearch-py = { path = "..", develop=true, extras=["async"] }` in [pyproject.toml](pyproject.toml).
+
+### Run Benchmarks
+
+Run all available benchmarks as follows.
+
+```
+poetry run richbench . --repeat 1 --times 1
+```
+
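+This prints the results of all benchmarks. The `(+)` columns report the second function in each pair, with its speedup over the first in parentheses.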
+
+```
+                                             Benchmarks, repeat=1, number=1                                              
+┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
+┃                         Benchmark ┃ Min     ┃ Max     ┃ Mean    ┃ Min (+)         ┃ Max (+)         ┃ Mean (+)        ┃
+┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
+│ 1 client vs. more clients (async) │ 1.640   │ 1.640   │ 1.640   │ 1.102 (1.5x)    │ 1.102 (1.5x)    │ 1.102 (1.5x)    │
+│    1 thread vs. 32 threads (sync) │ 5.526   │ 5.526   │ 5.526   │ 1.626 (3.4x)    │ 1.626 (3.4x)    │ 1.626 (3.4x)    │
+│    1 thread vs. 32 threads (sync) │ 4.639   │ 4.639   │ 4.639   │ 3.363 (1.4x)    │ 3.363 (1.4x)    │ 3.363 (1.4x)    │
+│                sync vs. async (8) │ 3.198   │ 3.198   │ 3.198   │ 0.966 (3.3x)    │ 0.966 (3.3x)    │ 0.966 (3.3x)    │
+└───────────────────────────────────┴─────────┴─────────┴─────────┴─────────────────┴─────────────────┴─────────────────┘
+```
+
+Run a specific benchmark, e.g. [bench_sync.py](bench_sync.py), by specifying `--benchmark [name]`.
+
+```
+poetry run richbench . --repeat 1 --times 1 --benchmark sync
+```
+
+This prints the results of that benchmark only.
+
+```
+                                            Benchmarks, repeat=1, number=1                                            
+┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
+┃                      Benchmark ┃ Min     ┃ Max     ┃ Mean    ┃ Min (+)         ┃ Max (+)         ┃ Mean (+)        ┃
+┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
+│ 1 thread vs. 32 threads (sync) │ 6.804   │ 6.804   │ 6.804   │ 3.409 (2.0x)    │ 3.409 (2.0x)    │ 3.409 (2.0x)    │
+└────────────────────────────────┴─────────┴─────────┴─────────┴─────────────────┴─────────────────┴─────────────────┘
+```
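
The parenthesized factor in each `(+)` column appears to be the first function's time divided by the second's. As a quick check, the sketch below recomputes it from the mean values in the first sample table. The `rows` list only copies numbers from that output, and `speedup` is a hypothetical helper, not part of richbench.

```python
# Rows copied from the first sample table above:
# (benchmark name, mean of first function, mean of second function).
rows = [
    ("1 client vs. more clients (async)", 1.640, 1.102),
    ("1 thread vs. 32 threads (sync)", 5.526, 1.626),
    ("1 thread vs. 32 threads (sync)", 4.639, 3.363),
    ("sync vs. async (8)", 3.198, 0.966),
]


def speedup(first, second):
    """Hypothetical helper: ratio of the first timing to the second, to one decimal."""
    return round(first / second, 1)


for name, first, second in rows:
    print(f"{name}: {speedup(first, second)}x")
```

With `--repeat 1 --times 1` each number is a single sample, so Min, Max, and Mean coincide and the ratios can vary noticeably between runs, as the two differing `sync` results in the README suggest.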
diff --git a/benchmarks/bench_async.py b/benchmarks/bench_async.py
@@ -0,0 +1,101 @@
+#!/usr/bin/env python
+
+# SPDX-License-Identifier: Apache-2.0
+#
+# The OpenSearch Contributors require contributions made to
+# this file be licensed under the Apache-2.0 license or a
+# compatible open source license.
+
+import asyncio
+import uuid
+
+from opensearchpy import AsyncHttpConnection, AsyncOpenSearch
+
+host = "localhost"
+port = 9200
+auth = ("admin", "admin")
+index_name = "test-index-async"
+item_count = 100
+
+
+async def index_records(client, item_count):
+    await asyncio.gather(
+        *[
+            client.index(
+                index=index_name,
+                body={
+                    "title": "Moneyball",
+                    "director": "Bennett Miller",
+                    "year": "2011",
+                },
+                id=uuid.uuid4(),
+            )
+            for j in range(item_count)
+        ]
+    )
+
+
+async def test_async(client_count=1, item_count=1):
+    clients = []
+    for i in range(client_count):
+        clients.append(
+            AsyncOpenSearch(
+                hosts=[{"host": host, "port": port}],
+                http_auth=auth,
+                use_ssl=True,
+                verify_certs=False,
+                ssl_show_warn=False,
+                connection_class=AsyncHttpConnection,
+                pool_maxsize=client_count,
+            )
+        )
+
+    if await clients[0].indices.exists(index_name):
+        await clients[0].indices.delete(index_name)
+
+    await clients[0].indices.create(index_name)
+
+    await asyncio.gather(
+        *[index_records(clients[i], item_count) for i in range(client_count)]
+    )
+
+    await clients[0].indices.refresh(index=index_name)
+    print(await clients[0].count(index=index_name))
+
+    await clients[0].indices.delete(index_name)
+
+    await asyncio.gather(*[client.close() for client in clients])
+
+
+def test(client_count=1, item_count=1):
+    # Argument order matches test_async(client_count, item_count) and the test_N callers below.
+    loop = asyncio.new_event_loop()
+    asyncio.set_event_loop(loop)
+    loop.run_until_complete(test_async(client_count, item_count))
+    loop.close()
+
+
+def test_1():
+    test(1, 32 * item_count)
+
+
+def test_2():
+    test(2, 16 * item_count)
+
+
+def test_4():
+    test(4, 8 * item_count)
+
+
+def test_8():
+    test(8, 4 * item_count)
+
+
+def test_16():
+    test(16, 2 * item_count)
+
+
+def test_32():
+    test(32, item_count)
+
+
+__benchmarks__ = [(test_1, test_8, "1 client vs. more clients (async)")]
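
Each `test_N` function in `bench_async.py` keeps the total work constant (N clients times items per client equals `32 * item_count`) and fans it out with `asyncio.gather`. The following is a minimal sketch of that pattern with no OpenSearch dependency. `fake_index`, `run`, `run_sync`, and `TOTAL_ITEMS` are hypothetical names introduced for illustration, and `fake_index` merely stands in for `client.index`.

```python
import asyncio

TOTAL_ITEMS = 32  # small stand-in for 32 * item_count in the benchmark


async def fake_index(doc_id):
    # Hypothetical stand-in for client.index(); yields to the event loop
    # the way a network call would.
    await asyncio.sleep(0)
    return doc_id


async def index_records(client_id, item_count):
    # One "client" issues all of its requests concurrently.
    return await asyncio.gather(
        *[fake_index((client_id, n)) for n in range(item_count)]
    )


async def run(client_count):
    # Split the fixed workload evenly across clients, as test_1 ... test_32 do.
    per_client = TOTAL_ITEMS // client_count
    batches = await asyncio.gather(
        *[index_records(c, per_client) for c in range(client_count)]
    )
    return sum(len(batch) for batch in batches)


def run_sync(client_count):
    # richbench calls plain functions, so the coroutine is driven to completion here.
    return asyncio.run(run(client_count))
```

Because the workload is fixed, any difference between `run_sync(1)` and `run_sync(8)` in a real benchmark would reflect how requests are distributed across clients rather than how many requests are sent.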
diff --git a/benchmarks/bench_info_sync.py b/benchmarks/bench_info_sync.py
@@ -0,0 +1,93 @@
+#!/usr/bin/env python
+
+# SPDX-License-Identifier: Apache-2.0
+#
+# The OpenSearch Contributors require contributions made to
+# this file be licensed under the Apache-2.0 license or a
+# compatible open source license.
+
+import logging
+import sys
+import time
+
+from thread_with_return_value import ThreadWithReturnValue
+
+from opensearchpy import OpenSearch
+
+host = "localhost"
+port = 9200
+auth = ("admin", "admin")
+request_count = 250
+
+
+root = logging.getLogger()
+# root.setLevel(logging.DEBUG)
+# logging.getLogger("urllib3.connectionpool").setLevel(logging.DEBUG)
+
+handler = logging.StreamHandler(sys.stdout)
+handler.setLevel(logging.DEBUG)
+formatter = logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
+handler.setFormatter(formatter)
+root.addHandler(handler)
+
+
+def get_info(client, request_count):
+    # Returns the total client-side latency, in milliseconds, across all requests.
+    tt = 0
+    for _ in range(request_count):
+        start = time.time() * 1000
+        client.info()
+        total_time = time.time() * 1000 - start
+        tt += total_time
+    return tt
+
+
+def test(thread_count=1, request_count=1, client_count=1):
+    clients = []
+    for i in range(client_count):
+        clients.append(
+            OpenSearch(
+                hosts=[{"host": host, "port": port}],
+                http_auth=auth,
+                use_ssl=True,
+                verify_certs=False,
+                ssl_show_warn=False,
+                pool_maxsize=thread_count,
+            )
+        )
+
+    threads = []
+    for thread_id in range(thread_count):
+        thread = ThreadWithReturnValue(
+            target=get_info, args=[clients[thread_id % len(clients)], request_count]
+        )
+        threads.append(thread)
+        thread.start()
+
+    latency = 0
+    for t in threads:
+        latency += t.join()
+
+    print(f"latency={latency}")
+
+
+def test_1():
+    test(1, 32 * request_count, 1)
+
+
+def test_2():
+    test(2, 16 * request_count, 2)
+
+
+def test_4():
+    test(4, 8 * request_count, 4)
+
+
+def test_8():
+    test(8, 4 * request_count, 8)
+
+
+def test_32():
+    test(32, request_count, 32)
+
+
+__benchmarks__ = [(test_1, test_32, "1 thread vs. 32 threads (sync)")]
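
`bench_info_sync.py` imports `ThreadWithReturnValue` from a `thread_with_return_value` module whose diff is not shown in this section, and it relies on `join()` returning whatever the thread's target returned. The sketch below is one way such a class could be written. It is an assumption for illustration, not the file from this commit, and `work` is a hypothetical target function.

```python
import threading


class ThreadWithReturnValue(threading.Thread):
    """Sketch of a thread whose join() returns the target's return value."""

    def __init__(self, target, args=()):
        super().__init__()
        self._fn = target
        self._fn_args = tuple(args)
        self._result = None

    def run(self):
        # Store the result so join() can hand it back to the caller.
        self._result = self._fn(*self._fn_args)

    def join(self, timeout=None):
        super().join(timeout)
        return self._result


def work(n):
    # Hypothetical target; the benchmark uses get_info(client, request_count) instead.
    return n * 2


threads = [ThreadWithReturnValue(target=work, args=[n]) for n in range(4)]
for t in threads:
    t.start()

# Mirrors the benchmark's "latency += t.join()" accumulation.
total = sum(t.join() for t in threads)
print(total)
```

The standard library's `concurrent.futures.ThreadPoolExecutor` offers the same capability through `Future.result()`, which may be preferable outside a small benchmark script.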