Add performance testing framework #3

tyler5673 · 2025-12-12T18:51:16Z

This PR introduces a comprehensive performance testing infrastructure to measure SDK overhead versus API latency across all endpoints.

Core infrastructure:

- TimingHTTPClient: HTTP client wrapper that captures request/response timing at multiple layers
- metrics.py: statistical analysis utilities (P50/P95/P99 percentiles, mean, standard deviation, overhead calculation)
- test_performance.py: 40 test cases covering parameter combinations across the Search, Agents, and Contents APIs

Test coverage:

- Search API: 19 tests (filters, livecrawl, pagination, country/language options)
- Agents API: 16 tests (agent types, tool combinations, verbosity levels)
- Contents API: 5 tests (single/multiple URLs, HTML/Markdown formats)

Tooling:

- run_performance_tests.sh: convenience script with presets (--quick, --full, --search)
- Configurable via environment variables (target, iterations, output format)
- Supports both a mock server (for CI/CD) and a custom server (for real-world metrics)
- CSV export for further analysis
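
The PR doesn't include the code for metrics.py. Below is a minimal stdlib-only sketch of the statistics it lists, assuming linear-interpolation percentiles (`statistics.quantiles` with `method="inclusive"`) and per-iteration overhead defined as SDK-observed duration minus HTTP-layer duration. `LatencyStats`, `summarize`, and `overhead_ms` are hypothetical names, not the PR's API.

```python
import statistics
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class LatencyStats:
    """Summary statistics for one test case, in milliseconds (hypothetical shape)."""
    p50: float
    p95: float
    p99: float
    mean: float
    stdev: float


def summarize(samples_ms: Sequence[float]) -> LatencyStats:
    """Compute P50/P95/P99, mean, and sample standard deviation."""
    if len(samples_ms) < 2:
        # stdev() needs at least two data points
        raise ValueError("need at least two samples")
    # 99 cut points; index k-1 is the k-th percentile, linearly interpolated
    # between adjacent sorted samples (min and max act as 0th/100th percentiles)
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return LatencyStats(
        p50=cuts[49],
        p95=cuts[94],
        p99=cuts[98],
        mean=statistics.fmean(samples_ms),
        stdev=statistics.stdev(samples_ms),
    )


def overhead_ms(total_ms: Sequence[float], http_ms: Sequence[float]) -> float:
    """Mean SDK overhead: end-to-end time minus HTTP-layer time, per iteration (assumed definition)."""
    if len(total_ms) != len(http_ms):
        raise ValueError("timing lists must be the same length")
    return statistics.fmean([t - h for t, h in zip(total_ms, http_ms)])
```

Under this interpolation, with only 5 iterations per test the P95 and P99 values fall between the two slowest samples, so they mostly reflect the single worst run rather than a stable tail estimate.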

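Results

Running the Search API tests against staging with 5 iterations per test (19 test cases, 95 requests):

- Average SDK overhead: 4.0ms (0.5% of mean request latency)
- Overhead stayed between 2.9ms and 8.1ms across every tested configuration
- Impact stays small even on fast queries: at most 1.0% for requests with a mean latency under 500ms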

Example command to run the Search API performance tests against staging (5 iterations per call):

./scripts/run_performance_tests.sh \
  -t custom \
  -u https://api-staging.you.com \
  -k your_api_key_here \
  -i 5 \
  --search

Output:

============================= test session starts ==============================
platform darwin -- Python 3.12.11, pytest-8.4.2, pluggy-1.6.0 -- /Users/tyler/Workspace/youdotcom-python-sdk/.venv/bin/python3
cachedir: .pytest_cache
rootdir: /Users/tyler/Workspace/youdotcom-python-sdk
configfile: pyproject.toml
plugins: asyncio-0.24.0, anyio-4.12.0
asyncio: mode=Mode.STRICT, default_loop_scope=function
collecting ... collected 19 items

tests/test_performance.py::TestSearchPerformance::test_search_basic PASSED
tests/test_performance.py::TestSearchPerformance::test_search_with_count PASSED
tests/test_performance.py::TestSearchPerformance::test_search_with_freshness_day PASSED
tests/test_performance.py::TestSearchPerformance::test_search_with_freshness_week PASSED
tests/test_performance.py::TestSearchPerformance::test_search_with_country_us PASSED
tests/test_performance.py::TestSearchPerformance::test_search_with_country_gb PASSED
tests/test_performance.py::TestSearchPerformance::test_search_with_language_en PASSED
tests/test_performance.py::TestSearchPerformance::test_search_with_language_es PASSED
tests/test_performance.py::TestSearchPerformance::test_search_with_safesearch_off PASSED
tests/test_performance.py::TestSearchPerformance::test_search_with_safesearch_moderate PASSED
tests/test_performance.py::TestSearchPerformance::test_search_with_safesearch_strict PASSED
tests/test_performance.py::TestSearchPerformance::test_search_with_pagination PASSED
tests/test_performance.py::TestSearchPerformance::test_search_with_livecrawl_web PASSED
tests/test_performance.py::TestSearchPerformance::test_search_with_livecrawl_news PASSED
tests/test_performance.py::TestSearchPerformance::test_search_with_livecrawl_all PASSED
tests/test_performance.py::TestSearchPerformance::test_search_with_livecrawl_html PASSED
tests/test_performance.py::TestSearchPerformance::test_search_with_livecrawl_markdown PASSED
tests/test_performance.py::TestSearchPerformance::test_search_with_all_filters PASSED
tests/test_performance.py::TestSearchPerformance::test_search_with_filters_and_livecrawl PASSED
========================================================================================================================
                                       Performance Test Results - Target: custom                                        
========================================================================================================================

Endpoint                                                  P50        P95        P99       Mean   Overhead      %
-------------------------------------------------- ---------- ---------- ---------- ---------- ---------- ------
Search: basic query                                   649.9ms    911.3ms    946.2ms    704.8ms      4.1ms   0.6%
Search: with count=10                                 541.4ms    640.0ms    641.0ms    522.5ms      2.9ms   0.6%
Search: freshness=DAY                                 856.3ms    936.7ms    942.5ms    846.8ms      2.9ms   0.3%
Search: freshness=WEEK                                616.8ms    774.9ms    778.2ms    670.6ms      5.7ms   0.9%
Search: country=US                                    558.7ms    667.4ms    677.1ms    562.7ms      3.1ms   0.6%
Search: country=GB                                    526.7ms    706.2ms    720.7ms    576.9ms      3.2ms   0.6%
Search: language=EN                                   412.1ms    600.1ms    612.8ms    462.1ms      3.1ms   0.7%
Search: language=ES                                   560.3ms    639.2ms    646.7ms    580.4ms      3.7ms   0.7%
Search: safesearch=OFF                                510.2ms    558.7ms    565.6ms    493.3ms      4.6ms   1.0%
Search: safesearch=MODERATE                           377.9ms    493.2ms    514.8ms    400.9ms      3.2ms   0.8%
Search: safesearch=STRICT                             396.6ms    519.4ms    525.5ms    434.8ms      3.8ms   0.9%
Search: with pagination (offset=2)                    493.0ms    571.9ms    573.0ms    474.7ms      3.7ms   0.8%
Search: livecrawl=WEB                                   3.67s      4.11s      4.14s      3.74s      3.8ms   0.1%
Search: livecrawl=NEWS                                  2.71s      3.37s      3.38s      2.84s      6.4ms   0.2%
Search: livecrawl=ALL                                   3.76s      3.81s      3.82s      3.04s      3.4ms   0.1%
Search: livecrawl HTML format                           3.70s      3.77s      3.78s      2.95s      8.1ms   0.3%
Search: livecrawl Markdown format                       3.66s      4.24s      4.28s      3.43s      3.2ms   0.1%
Search: all filters combined                          713.1ms    813.4ms    815.4ms    727.3ms      3.3ms   0.5%
Search: filters + livecrawl                             1.20s      1.53s      1.53s      1.31s      3.0ms   0.2%
------------------------------------------------------------------------------------------------------------------------

Summary:
  Total test cases: 19
  Total iterations: 95
  Success rate: 95/95 (100.0%)

SDK Overhead Analysis:
  Average overhead: 4.0ms (0.5%)
  Min overhead: 2.9ms (best case)
  Max overhead: 8.1ms (worst case)
  Median overhead: 3.4ms
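
As a sanity check, the per-row table values reproduce the summary above if the `%` column is overhead divided by mean latency and the summary percentage is the average of the per-row percentages. That is an inference from the numbers, not something the output states; the row labels below are shortened.

```python
import statistics

# (endpoint, overhead_ms, mean_ms) copied from the table above; means
# reported in seconds are converted to milliseconds.
ROWS = [
    ("basic query", 4.1, 704.8),
    ("count=10", 2.9, 522.5),
    ("freshness=DAY", 2.9, 846.8),
    ("freshness=WEEK", 5.7, 670.6),
    ("country=US", 3.1, 562.7),
    ("country=GB", 3.2, 576.9),
    ("language=EN", 3.1, 462.1),
    ("language=ES", 3.7, 580.4),
    ("safesearch=OFF", 4.6, 493.3),
    ("safesearch=MODERATE", 3.2, 400.9),
    ("safesearch=STRICT", 3.8, 434.8),
    ("pagination", 3.7, 474.7),
    ("livecrawl=WEB", 3.8, 3740.0),
    ("livecrawl=NEWS", 6.4, 2840.0),
    ("livecrawl=ALL", 3.4, 3040.0),
    ("livecrawl HTML", 8.1, 2950.0),
    ("livecrawl Markdown", 3.2, 3430.0),
    ("all filters", 3.3, 727.3),
    ("filters + livecrawl", 3.0, 1310.0),
]


def overhead_summary(rows):
    overheads = [o for _, o, _ in rows]
    # Per-row percentage = overhead / mean latency, consistent with the
    # table's % column up to rounding
    pcts = [100.0 * o / m for _, o, m in rows]
    return {
        "avg_ms": statistics.fmean(overheads),
        "avg_pct": statistics.fmean(pcts),
        "min_ms": min(overheads),
        "max_ms": max(overheads),
        "median_ms": statistics.median(overheads),
    }


s = overhead_summary(ROWS)
print(f"Average overhead: {s['avg_ms']:.1f}ms ({s['avg_pct']:.1f}%)")  # Average overhead: 4.0ms (0.5%)
print(f"Min {s['min_ms']}ms, max {s['max_ms']}ms, median {s['median_ms']}ms")  # Min 2.9ms, max 8.1ms, median 3.4ms
```

Note that total overhead divided by total latency would give a lower figure (about 0.3%), because the multi-second livecrawl rows dominate the denominator; averaging per-row percentages weights fast and slow endpoints equally.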

EdwardIrby · 2025-12-12T20:04:15Z

tests/timing_client.py

+    def __init__(self, *args, **kwargs):
+        super().__init__(*args, **kwargs)
+        self._timings: List[RequestTiming] = []
+        self._lock = Lock()


Is this aliasing to threading.Lock? Should we be using asyncio?

Yes, good call! That was intentional: IMO we'd want asyncio.Lock if we were doing async I/O inside the lock, but here we're only protecting a simple in-memory list mutation, and threading.Lock avoids the extra helper methods asyncio.Lock would require. That being said, I don't see the need for async operation in the perf tests at all, so I removed the async client entirely. LMK if you feel differently about removing that functionality.

Nope, I'm not a Python expert, just checking. I actually decided to modify some of my work based on what I saw here: https://linear.app/you/issue/DX-148/centralize-performance-tests-and-automate-weekly-performancemd-updates
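
The diff above shows only the constructor. Here is a minimal sketch of the pattern under discussion, assuming the wrapper times each send and appends a record under a `threading.Lock`. `TimingClient`, the injected `send` callable, and the `RequestTiming` fields are hypothetical stand-ins; the real TimingHTTPClient subclasses an HTTP client (note the `super().__init__` call) whose code isn't shown here.

```python
import threading
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class RequestTiming:
    """One timed request (field names are illustrative)."""
    url: str
    duration_ms: float


class TimingClient:
    """Hypothetical stand-in for TimingHTTPClient: wraps a send function and times it."""

    def __init__(self, send: Callable[[str], bytes]) -> None:
        self._send = send
        self._timings: List[RequestTiming] = []
        self._lock = threading.Lock()

    def send(self, url: str) -> bytes:
        start = time.perf_counter()
        try:
            return self._send(url)  # the (potentially slow) I/O happens outside the lock
        finally:
            # Runs on success or failure, so errored requests are timed too
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            # The lock guards only the in-memory append; it is never held
            # across I/O or an await, which is why threading.Lock suffices.
            with self._lock:
                self._timings.append(RequestTiming(url, elapsed_ms))

    def timings(self) -> List[RequestTiming]:
        """Return a snapshot copy so callers can't race with ongoing appends."""
        with self._lock:
            return list(self._timings)


if __name__ == "__main__":
    from concurrent.futures import ThreadPoolExecutor

    client = TimingClient(lambda url: b"ok")  # fake transport, no network
    with ThreadPoolExecutor(max_workers=8) as pool:
        list(pool.map(client.send, [f"/search?q={i}" for i in range(100)]))
    print(len(client.timings()))  # 100
```

Because the critical section is a single list append, the same lock stays correct whether the client is driven sequentially or from a thread pool.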

EdwardIrby

Did we want to add perf regression checks?

tyler5673 · 2025-12-12T21:31:06Z


Yes, that's a great idea! Do you have something similar for MCP? If so, what thresholds did you land on? CI for this repo will come together once @kevmalek's schema unification changes are in, and I'd be happy to add a performance regression check then.

EdwardIrby · 2025-12-12T21:44:17Z


Nope, I'm making it part of https://linear.app/you/issue/DX-148/centralize-performance-tests-and-automate-weekly-performancemd-updates
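
The thresholds were left open in this thread. One hypothetical shape for such a check: compare each endpoint's current mean overhead against a stored baseline and flag it only when it exceeds both an absolute and a relative limit, so millisecond-level jitter on a ~3ms baseline doesn't fail CI. The 2.0ms / 1.5x limits and the endpoint keys are placeholders, not agreed values.

```python
from dataclasses import dataclass
from typing import Dict, List

# Placeholder limits for illustration only; real thresholds were not agreed here.
MAX_INCREASE_MS = 2.0
MAX_RATIO = 1.5


@dataclass(frozen=True)
class Regression:
    endpoint: str
    baseline_ms: float
    current_ms: float


def find_regressions(
    baseline: Dict[str, float],
    current: Dict[str, float],
    max_increase_ms: float = MAX_INCREASE_MS,
    max_ratio: float = MAX_RATIO,
) -> List[Regression]:
    """Flag endpoints whose mean overhead grew past BOTH an absolute and a relative limit."""
    regressions = []
    for endpoint, base in sorted(baseline.items()):
        cur = current.get(endpoint)
        if cur is None:
            continue  # endpoint not measured in this run
        if cur - base > max_increase_ms and cur > base * max_ratio:
            regressions.append(Regression(endpoint, base, cur))
    return regressions


if __name__ == "__main__":
    baseline = {"search_basic": 4.1, "search_livecrawl_html": 8.1}
    current = {"search_basic": 4.5, "search_livecrawl_html": 20.0}
    for r in find_regressions(baseline, current):
        print(f"{r.endpoint}: {r.baseline_ms}ms -> {r.current_ms}ms")
```

Requiring both conditions trades some sensitivity for stability; with only 5 iterations per test, a single slow run can move the mean by several milliseconds.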

Add performance testing framework

fad04f5

tyler5673 marked this pull request as ready for review December 12, 2025 19:07

tyler5673 requested review from EdwardIrby, itsakhilyou and kevmalek December 12, 2025 19:07

EdwardIrby reviewed Dec 12, 2025


remove async functionality of timing client

4d5ae7a

EdwardIrby approved these changes Dec 12, 2025


tyler5673 merged commit e295ef9 into main Dec 15, 2025

3 participants
