Add stress testing framework, with basic metrics example to demonstrate. #3241

lalitb wants to merge 18 commits into open-telemetry:main
Conversation
✅ Deploy Preview for opentelemetry-cpp-api-docs canceled.
Codecov Report ✅ All modified and coverable lines are covered by tests.

```
@@           Coverage Diff            @@
##            main    #3241     +/-  ##
==========================================
- Coverage   90.06%   90.03%   -0.02%
==========================================
  Files         220      220
  Lines        7069     7069
==========================================
- Hits         6366     6364       -2
- Misses        703      705       +2
```
Pull Request Overview
This PR introduces a multi-threaded stress testing framework and provides a basic OpenTelemetry metrics example to demonstrate its usage under high-concurrency workloads.
- Adds a reusable C++ stress test library with throughput monitoring
- Implements a metrics stress test example leveraging OpenTelemetry
- Integrates the stress tests into the CMake build
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.
Summary per file:
| File | Description |
|---|---|
| stress/common/stress.h | Declares the Stress class and supporting data structures |
| stress/common/stress.cc | Implements thread spawning, monitoring, and graceful shutdown |
| stress/metrics/metrics.cc | Adds a sample metrics stress test using the new framework |
| stress/common/CMakeLists.txt | Builds stress as a static library |
| stress/metrics/CMakeLists.txt | Builds stress_metrics executable and links necessary targets |
| CMakeLists.txt | Hooks the stress directory into the main project build |
Comments suppressed due to low confidence (1)
stress/common/stress.h:72
- The new Stress framework lacks associated unit tests to validate its behavior. Consider adding tests to cover key functionality.

```cpp
class Stress
```
```cpp
std::vector<std::thread> threads_;   // Vector to hold worker threads
std::vector<WorkerStats> stats_;     // Vector to hold statistics for each thread
const size_t numThreads_;            // Number of threads to run
std::atomic<bool> stopFlag_{false};  // signal to stop the test
```
The member variable stopFlag_ is never used, as the global STOP flag is used instead. Remove stopFlag_ or integrate it into the control flow.
```diff
- std::atomic<bool> stopFlag_{false};  // signal to stop the test
+ // Removed unused stopFlag_ member variable
```
```cpp
// Global flags
std::atomic<bool> STOP(
    false);  // Global flag to stop the stress test when signaled (e.g., via Ctrl+C)
std::atomic<bool> READY(false);  // Global flag to synchronize thread start
```
The READY flag is declared but never used. Either remove it or use it to synchronize thread start.
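One way to make the flag earn its keep: have every worker spin-wait on READY so all threads start the measured loop at the same instant. A minimal sketch of that pattern (the `Worker`/`RunWorkers` names are hypothetical, not from the PR):

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Sketch: workers spin-wait on READY so they all begin work together.
std::atomic<bool> READY(false);
std::atomic<int> total(0);

void Worker()
{
  // Busy-wait until the main thread releases all workers at once.
  while (!READY.load(std::memory_order_acquire)) {}
  total.fetch_add(1, std::memory_order_relaxed);
}

int RunWorkers(int n)
{
  std::vector<std::thread> threads;
  for (int i = 0; i < n; ++i) threads.emplace_back(Worker);
  READY.store(true, std::memory_order_release);  // start signal
  for (auto &t : threads) t.join();
  return total.load();
}
```

Without such a gate, threads spawned early get a head start, which slightly skews per-window throughput at the beginning of the run.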
```cpp
void Stress::monitorThroughput()
{
  uint64_t lastTotalCount = 0;
  auto lastTime = std::chrono::steady_clock::now();
```
throughputHistory grows without bound in long-running tests, potentially causing high memory usage. Consider capping its size or computing rolling statistics without storing all entries.
```diff
  auto lastTime = std::chrono::steady_clock::now();
+ const size_t MAX_HISTORY_SIZE = 100;  // Maximum number of entries in throughputHistory
```
```cpp
  return attributes_set;
}

void InitMetrics(const std::string /*&name*/)
```
The parameter 'name' is unused in InitMetrics. Either remove it or use it to parameterize the meter provider.
```diff
- void InitMetrics(const std::string /*&name*/)
+ void InitMetrics(const std::string &name)
```
```cpp
{
  std::srand(static_cast<unsigned int>(std::time(nullptr)));  // Seed the random number generator
  // Pre-generate a set of random attributes
  size_t attribute_count = 1000;  // Number of attribute sets to pre-generate
```
[nitpick] Magic number 1000 used for attribute_count; consider making it a configurable constant or command-line parameter.
```diff
- size_t attribute_count = 1000;  // Number of attribute sets to pre-generate
+ size_t attribute_count = kDefaultAttributeCount;  // Number of attribute sets to pre-generate
```
```cpp
  uint64_t throughput = currentCount / elapsed;
  throughputHistory.push_back(throughput);

  double avg = 0;
```
`avg` can be of type `uint64_t`.
```cpp
{
  uint64_t lastTotalCount = 0;
  auto lastTime = std::chrono::steady_clock::now();
  std::vector<uint64_t> throughputHistory;
```
The full history vector seems unnecessary; a running counter (and sum) would be enough here.
```cpp
  std::this_thread::sleep_for(std::chrono::seconds(SLIDING_WINDOW_SIZE));

  auto currentTime = std::chrono::steady_clock::now();
  auto elapsed = std::chrono::duration_cast<std::chrono::seconds>(currentTime - lastTime).count();
```
Truncating the elapsed time to whole seconds could be a bit inaccurate, since the fractional part is discarded before the throughput division.
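One way to avoid the truncation is to keep the elapsed time as fractional seconds via `std::chrono::duration<double>`. A minimal sketch (the `ComputeThroughput` helper is hypothetical, not part of the PR):

```cpp
#include <chrono>
#include <cstdint>

// Sketch: measure elapsed time as fractional seconds so the throughput
// division is not skewed by integer truncation of the duration.
uint64_t ComputeThroughput(uint64_t count,
                           std::chrono::steady_clock::time_point start,
                           std::chrono::steady_clock::time_point end)
{
  std::chrono::duration<double> elapsed = end - start;  // e.g. 2.5 seconds
  return static_cast<uint64_t>(count / elapsed.count());
}
```

With a 5-second window the error from truncation is small, but for shorter windows (or a future millisecond-resolution mode) the fractional form is safer.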
```cpp
  CPU_SET(threadIndex % std::thread::hardware_concurrency(), &cpuset);
  pthread_setaffinity_np(pthread_self(), sizeof(cpuset), &cpuset);
#endif
```
Is the READY flag meant to start all workers at the same time, or is it superfluous, as Copilot mentioned?
Changes
This PR adds a basic stress testing framework to validate the scalability and reliability of functionality under high-concurrency and long-running workloads. Unlike Google Benchmark, which focuses on micro-benchmarking and latency measurements for isolated operations, this framework simulates sustained, multi-threaded execution of a given workload. The idea is to complement the existing benchmarks with stress tests that address long-duration, high-concurrency use cases.
This is already implemented for .NET and Rust, and most of the ideas are taken from there. I felt the need for this to test some optimizations I am doing for metrics, but feel free to comment if this doesn't seem helpful.
Also added a basic stress-testing example for metrics to demonstrate. Below are the results from the metrics stress test as an example:
It’s still in the early stages and will need further enhancements, but it should be a good starting point. Future improvements could include adding memory and CPU usage information alongside the existing throughput, as well as refining the initial warm-up period to sustain consistent data collection.
Implementation Details:
Worker Threads:
- Worker threads (defaulting to the number of cores) are spawned to execute the workload.
- Each worker thread executes the workload function (func) in a loop until the global STOP flag is set (e.g., via Ctrl+C).
- Each thread maintains its own iteration count to minimize contention.
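The worker-loop structure described above can be sketched roughly as follows; the `WorkerStats` padding and `WorkerLoop` signature here are illustrative assumptions, not the PR's exact code:

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <thread>
#include <vector>

// Sketch: each worker bumps its own counter (no shared atomic increments
// on the hot path) until the global STOP flag is set.
std::atomic<bool> STOP(false);

struct WorkerStats
{
  alignas(64) uint64_t count = 0;  // padded to its own cache line to avoid false sharing
};

void WorkerLoop(WorkerStats &stats, void (*func)())
{
  while (!STOP.load(std::memory_order_relaxed))
  {
    func();          // the workload under test
    ++stats.count;   // thread-local: no contention with other workers
  }
}
```

Keeping the counters per-thread (and cache-line padded) is what lets the monitor aggregate them cheaply without the workers serializing on a shared atomic.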
Throughput Monitoring:
- A separate controller thread monitors throughput by periodically summing up iteration counts across threads.
- Throughput is calculated over a sliding window (SLIDING_WINDOW_SIZE) and displayed dynamically.
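One monitoring step of the controller thread amounts to summing the per-thread counts and dividing the delta by the window length; a minimal sketch under that assumption (the `WindowThroughput` helper is hypothetical):

```cpp
#include <cstdint>
#include <vector>

// Sketch of one monitoring step: sum per-thread iteration counts, then
// derive throughput from the delta since the previous sample.
uint64_t WindowThroughput(const std::vector<uint64_t> &counts,
                          uint64_t &lastTotal,
                          uint64_t windowSeconds)
{
  uint64_t total = 0;
  for (uint64_t c : counts) total += c;  // aggregate across all workers
  uint64_t delta = total - lastTotal;    // iterations completed this window
  lastTotal = total;
  return delta / windowSeconds;          // iterations per second
}
```

In the real framework the controller would sleep for `SLIDING_WINDOW_SIZE` seconds between calls and read each worker's counter with a relaxed atomic load.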
Final Summary:
- At the end of the test, the program calculates and prints the total iterations, duration, and average throughput.
For significant contributions please make sure you have completed the following items:
- `CHANGELOG.md` updated for non-trivial changes