@tyler5673 tyler5673 commented Dec 12, 2025

This PR introduces a comprehensive performance testing infrastructure to measure SDK overhead versus API latency across all endpoints.

Core Infrastructure:

  • TimingHTTPClient - HTTP client wrapper that captures request/response timing at multiple layers
  • metrics.py - Statistical analysis utilities (P50/P95/P99 percentiles, mean, std dev, overhead calculation)
  • test_performance.py - 40 comprehensive test cases covering all endpoint combinations

Test Coverage:

  • Search API: 19 tests (filters, livecrawl, pagination, country/language options)
  • Agents API: 16 tests (agent types, tool combinations, verbosity levels)
  • Contents API: 5 tests (single/multiple URLs, HTML/Markdown formats)

Tooling:

  • run_performance_tests.sh - Convenience script with presets (--quick, --full, --search)
  • Configurable via environment variables (target, iterations, output format)
  • Supports both mock server (CI/CD) and custom server (real-world metrics)
  • CSV export for further analysis
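
As a rough illustration of the statistics listed for metrics.py, here is a minimal sketch. It is not the code from this PR: the function names, the linear-interpolation percentile, and the split into "total" versus "API" timings are all assumptions made for the example.

```python
import statistics
from typing import Dict, Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Return the pct-th percentile (0-100) using linear interpolation."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Fractional index into the sorted samples.
    rank = (len(ordered) - 1) * pct / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def summarize(total_ms: Sequence[float], api_ms: Sequence[float]) -> Dict[str, float]:
    """Summarize one endpoint (hypothetical helper).

    total_ms: time measured around the whole SDK call (assumed).
    api_ms:   time measured around the HTTP request alone (assumed).
    """
    if len(total_ms) != len(api_ms):
        raise ValueError("need one API timing per total timing")
    # SDK overhead per call is whatever time was not spent in the HTTP request.
    overhead = [t - a for t, a in zip(total_ms, api_ms)]
    mean_total = statistics.fmean(total_ms)
    mean_overhead = statistics.fmean(overhead)
    return {
        "p50": percentile(total_ms, 50),
        "p95": percentile(total_ms, 95),
        "p99": percentile(total_ms, 99),
        "mean": mean_total,
        "stdev": statistics.stdev(total_ms) if len(total_ms) > 1 else 0.0,
        "overhead_ms": mean_overhead,
        "overhead_pct": 100.0 * mean_overhead / mean_total,
    }
```

With only a handful of iterations per test, P95 and P99 land close to the slowest sample, so they are best read as rough upper bounds.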

Results
Testing against staging with 5 iterations per test:

  • Average SDK overhead: 4.0ms (0.5%)
  • Consistent performance across all endpoint types
  • Minimal impact even on fast queries (<500ms)

Example bash command to run perf test (5 iterations of each call) of our search API against staging:
./scripts/run_performance_tests.sh \
  -t custom \
  -u https://api-staging.you.com \
  -k your_api_key_here \
  -i 5 \
  --search

Output:

============================= test session starts ==============================
platform darwin -- Python 3.12.11, pytest-8.4.2, pluggy-1.6.0 -- /Users/tyler/Workspace/youdotcom-python-sdk/.venv/bin/python3
cachedir: .pytest_cache
rootdir: /Users/tyler/Workspace/youdotcom-python-sdk
configfile: pyproject.toml
plugins: asyncio-0.24.0, anyio-4.12.0
asyncio: mode=Mode.STRICT, default_loop_scope=function
collecting ... collected 19 items

tests/test_performance.py::TestSearchPerformance::test_search_basic PASSED
tests/test_performance.py::TestSearchPerformance::test_search_with_count PASSED
tests/test_performance.py::TestSearchPerformance::test_search_with_freshness_day PASSED
tests/test_performance.py::TestSearchPerformance::test_search_with_freshness_week PASSED
tests/test_performance.py::TestSearchPerformance::test_search_with_country_us PASSED
tests/test_performance.py::TestSearchPerformance::test_search_with_country_gb PASSED
tests/test_performance.py::TestSearchPerformance::test_search_with_language_en PASSED
tests/test_performance.py::TestSearchPerformance::test_search_with_language_es PASSED
tests/test_performance.py::TestSearchPerformance::test_search_with_safesearch_off PASSED
tests/test_performance.py::TestSearchPerformance::test_search_with_safesearch_moderate PASSED
tests/test_performance.py::TestSearchPerformance::test_search_with_safesearch_strict PASSED
tests/test_performance.py::TestSearchPerformance::test_search_with_pagination PASSED
tests/test_performance.py::TestSearchPerformance::test_search_with_livecrawl_web PASSED
tests/test_performance.py::TestSearchPerformance::test_search_with_livecrawl_news PASSED
tests/test_performance.py::TestSearchPerformance::test_search_with_livecrawl_all PASSED
tests/test_performance.py::TestSearchPerformance::test_search_with_livecrawl_html PASSED
tests/test_performance.py::TestSearchPerformance::test_search_with_livecrawl_markdown PASSED
tests/test_performance.py::TestSearchPerformance::test_search_with_all_filters PASSED
tests/test_performance.py::TestSearchPerformance::test_search_with_filters_and_livecrawl PASSED
========================================================================================================================
                                       Performance Test Results - Target: custom                                        
========================================================================================================================

Endpoint                                                  P50        P95        P99       Mean   Overhead      %
-------------------------------------------------- ---------- ---------- ---------- ---------- ---------- ------
Search: basic query                                   649.9ms    911.3ms    946.2ms    704.8ms      4.1ms   0.6%
Search: with count=10                                 541.4ms    640.0ms    641.0ms    522.5ms      2.9ms   0.6%
Search: freshness=DAY                                 856.3ms    936.7ms    942.5ms    846.8ms      2.9ms   0.3%
Search: freshness=WEEK                                616.8ms    774.9ms    778.2ms    670.6ms      5.7ms   0.9%
Search: country=US                                    558.7ms    667.4ms    677.1ms    562.7ms      3.1ms   0.6%
Search: country=GB                                    526.7ms    706.2ms    720.7ms    576.9ms      3.2ms   0.6%
Search: language=EN                                   412.1ms    600.1ms    612.8ms    462.1ms      3.1ms   0.7%
Search: language=ES                                   560.3ms    639.2ms    646.7ms    580.4ms      3.7ms   0.7%
Search: safesearch=OFF                                510.2ms    558.7ms    565.6ms    493.3ms      4.6ms   1.0%
Search: safesearch=MODERATE                           377.9ms    493.2ms    514.8ms    400.9ms      3.2ms   0.8%
Search: safesearch=STRICT                             396.6ms    519.4ms    525.5ms    434.8ms      3.8ms   0.9%
Search: with pagination (offset=2)                    493.0ms    571.9ms    573.0ms    474.7ms      3.7ms   0.8%
Search: livecrawl=WEB                                   3.67s      4.11s      4.14s      3.74s      3.8ms   0.1%
Search: livecrawl=NEWS                                  2.71s      3.37s      3.38s      2.84s      6.4ms   0.2%
Search: livecrawl=ALL                                   3.76s      3.81s      3.82s      3.04s      3.4ms   0.1%
Search: livecrawl HTML format                           3.70s      3.77s      3.78s      2.95s      8.1ms   0.3%
Search: livecrawl Markdown format                       3.66s      4.24s      4.28s      3.43s      3.2ms   0.1%
Search: all filters combined                          713.1ms    813.4ms    815.4ms    727.3ms      3.3ms   0.5%
Search: filters + livecrawl                             1.20s      1.53s      1.53s      1.31s      3.0ms   0.2%
------------------------------------------------------------------------------------------------------------------------

Summary:
  Total test cases: 19
  Total iterations: 95
  Success rate: 95/95 (100.0%)

SDK Overhead Analysis:
  Average overhead: 4.0ms (0.5%)
  Min overhead: 2.9ms (best case)
  Max overhead: 8.1ms (worst case)
  Median overhead: 3.4ms
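
The summary figures can be cross-checked against the Overhead and % columns of the table. The sketch below only re-derives them from the printed (already rounded) values; averaging the % column is an assumption about how the 0.5% figure was produced, not something the output states.

```python
import statistics

# Overhead column (ms), copied from the results table in row order.
overhead_ms = [4.1, 2.9, 2.9, 5.7, 3.1, 3.2, 3.1, 3.7, 4.6, 3.2,
               3.8, 3.7, 3.8, 6.4, 3.4, 8.1, 3.2, 3.3, 3.0]

# Percent column, copied from the same rows.
overhead_pct = [0.6, 0.6, 0.3, 0.9, 0.6, 0.6, 0.7, 0.7, 1.0, 0.8,
                0.9, 0.8, 0.1, 0.2, 0.1, 0.3, 0.1, 0.5, 0.2]

print(f"Average overhead: {statistics.fmean(overhead_ms):.1f}ms "
      f"({statistics.fmean(overhead_pct):.1f}%)")
print(f"Min overhead: {min(overhead_ms):.1f}ms")
print(f"Max overhead: {max(overhead_ms):.1f}ms")
print(f"Median overhead: {statistics.median(overhead_ms):.1f}ms")
```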

@tyler5673 tyler5673 marked this pull request as ready for review December 12, 2025 19:07
def __init__(self, *args, **kwargs):
    super().__init__(*args, **kwargs)
    self._timings: List[RequestTiming] = []
    self._lock = Lock()

Member

Is this aliasing to threading.Lock? Should we be using asyncio?

Contributor Author

Yes, good call! That was intentional. IMO we'd want asyncio.Lock if we were doing async I/O inside the lock, but here we're just protecting a simple list mutation, so everything stays in memory, and threading.Lock avoids the extra helper methods that asyncio would require. That said, I don't see a need for async operation in the perf test at all, so I removed the async client entirely. LMK if you feel differently about removing that functionality.
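
To illustrate the point in the reply above, here is a minimal sketch of a threading.Lock guarding an in-memory list append. It is not the TimingHTTPClient from this PR: TimingRecorder and the fields of RequestTiming are hypothetical, since the thread does not show the real definitions.

```python
import threading
import time
from dataclasses import dataclass
from typing import List


@dataclass
class RequestTiming:
    """Hypothetical record; the real RequestTiming fields are not shown in this thread."""
    url: str
    elapsed_ms: float


class TimingRecorder:
    """Collects timings from multiple threads (illustrative stand-in)."""

    def __init__(self) -> None:
        self._timings: List[RequestTiming] = []
        self._lock = threading.Lock()

    def record(self, url: str, started: float) -> None:
        # Compute the elapsed time outside the lock to keep the critical section short.
        elapsed_ms = (time.perf_counter() - started) * 1000.0
        # The critical section is a single in-memory append: no I/O, nothing to await.
        with self._lock:
            self._timings.append(RequestTiming(url, elapsed_ms))

    def snapshot(self) -> List[RequestTiming]:
        # Return a copy so callers can iterate without holding the lock.
        with self._lock:
            return list(self._timings)
```

Because nothing is awaited while the lock is held, a plain threading.Lock is enough here; asyncio.Lock matters when a coroutine has to suspend inside the critical section.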

Member

Nope, I'm not a Python expert, just checking. I actually decided to modify some of my work based on what I saw here: https://linear.app/you/issue/DX-148/centralize-performance-tests-and-automate-weekly-performancemd-updates

@EdwardIrby EdwardIrby left a comment

Did we want to add perf regression checks?

@tyler5673

Did we want to add perf regression checks?

Yes, that's a great idea! Do you have something similar for MCP? If so, what thresholds did you land on? CI for this repo will come together when @kevmalek's schema unification changes are in, and I'd be happy to add a performance regression check then.
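
For reference, a regression check of the kind discussed here could be as small as the sketch below. No thresholds were agreed in this thread, so the 10.0 ms limit and the find_regressions helper are purely hypothetical; the two sample values come from the results table in the PR description.

```python
from typing import Dict, List


def find_regressions(overhead_ms: Dict[str, float], max_overhead_ms: float) -> List[str]:
    """Return the endpoints whose measured SDK overhead exceeds the limit (hypothetical helper)."""
    return [name for name, value in sorted(overhead_ms.items())
            if value > max_overhead_ms]


# Two rows taken from the results table; the threshold is an assumed placeholder.
measured = {
    "Search: basic query": 4.1,
    "Search: livecrawl HTML format": 8.1,
}
HYPOTHETICAL_MAX_OVERHEAD_MS = 10.0

regressions = find_regressions(measured, HYPOTHETICAL_MAX_OVERHEAD_MS)
if regressions:
    raise SystemExit(f"SDK overhead regression in: {', '.join(regressions)}")
```

Checking overhead rather than end-to-end latency keeps the gate insensitive to API-side variance, which dominates the totals in the table.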

@EdwardIrby

Did we want to add perf regression checks?

Yes, that's a great idea! Do you have something similar for MCP? If so, what thresholds did you land on? CI for this repo will come together when @kevmalek's schema unification changes are in, and I'd be happy to add a performance regression check then.

Nope, I'm making it part of https://linear.app/you/issue/DX-148/centralize-performance-tests-and-automate-weekly-performancemd-updates

@tyler5673 tyler5673 merged commit e295ef9 into main Dec 15, 2025