Improve performance of schema validation #1580

@chrisjsewell

There have been reports of a significant slowdown in build times when moving from v5 to v6, for projects with a large number of needs (~100,000), potentially due to schema analysis (taking up to 280 seconds).

In #1574, I added needs_schema_validation_enabled as a workaround to avoid this entirely, and I am now waiting on feedback on build times.
Ideally, though, people should be able to use schema validation with less performance overhead.

In #1579, I added a benchmark test for schema validation, so that we can analyse this.

Below is the result from:

tox -e py312-benchmark -- tests/benchmarks/test_schema_benchmark.py --benchmark-columns=min,max,mean
--------------------------------------- benchmark: 4 tests --------------------------------------
Name (time in ms)                       Min                   Max                  Mean          
-------------------------------------------------------------------------------------------------
test_schema_benchmark[10]            2.7564 (1.0)          3.0168 (1.0)          2.8645 (1.0)    
test_schema_benchmark[100]          24.9164 (9.04)        26.0315 (8.63)        25.2987 (8.83)   
test_schema_benchmark[1000]        238.9974 (86.71)      246.3840 (81.67)      243.4033 (84.97)  
test_schema_benchmark[10000]     2,566.9278 (931.27)   2,670.7933 (885.30)   2,630.2005 (918.22) 
-------------------------------------------------------------------------------------------------
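
The mean times above grow roughly linearly with the number of needs. As a rough sketch using only the numbers from the table, the per-need cost can be worked out as below. The extrapolation to 100,000 needs assumes the linear trend continues and that the benchmark is representative of a real project, neither of which has been verified.

```python
# Mean times (ms) copied from the benchmark table above, keyed by number of needs.
mean_ms = {10: 2.8645, 100: 25.2987, 1000: 243.4033, 10000: 2630.2005}

# Cost per need in ms for each benchmark size.
per_need_ms = {count: total / count for count, total in mean_ms.items()}

for count, cost in per_need_ms.items():
    print(f"{count:>6} needs: {cost:.3f} ms per need")

# Naive linear extrapolation to 100,000 needs (an assumption, not a measurement).
estimate_s = per_need_ms[10000] * 100_000 / 1000
print(f"estimated time for 100,000 needs: {estimate_s:.1f} s")
```

The per-need cost stays between roughly 0.24 ms and 0.29 ms across all four sizes, which is what linear scaling would look like.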

And this is the profile for the 10000 case, generated and viewed with:

tox -e py312-benchmark -- tests/benchmarks/test_schema_benchmark.py -k "10000" --benchmark-cprofile=function_name --benchmark-cprofile-dump=schema.profile
uv run --with=snakeviz snakeviz schema.profile-test_schema_benchmark\[10000\].prof 
[Image: snakeviz visualisation of the profile]

Two performance improvements initially come to mind:

  1. Move validator creation to before looping through each need; this would mean it only gets created once, rather than for each need
  2. Try out jsonschema-rs (https://pypi.org/project/jsonschema-rs/)
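
A minimal sketch of the first idea is below. It is not the sphinx-needs code: `StandInValidator`, `validate_per_need` and `validate_hoisted` are hypothetical names, and the stand-in class only imitates a validator whose construction does some up-front work. With the Python `jsonschema` package, the hoisted object would presumably be a validator instance such as `Draft202012Validator(schema)`, but that mapping is an assumption.

```python
from typing import Any


class StandInValidator:
    """Hypothetical stand-in for a compiled schema validator."""

    constructed = 0  # counts how many validators have been built

    def __init__(self, schema: dict[str, Any]) -> None:
        StandInValidator.constructed += 1
        # Pretend "compilation": precompute the required keys once.
        self.required = tuple(schema.get("required", ()))

    def errors(self, need: dict[str, Any]) -> list[str]:
        """Return one message per required key missing from the need."""
        return [f"missing field: {key}" for key in self.required if key not in need]


def validate_per_need(schema: dict[str, Any], needs: list[dict[str, Any]]) -> list[list[str]]:
    """Pattern described in the issue: a new validator for every need."""
    return [StandInValidator(schema).errors(need) for need in needs]


def validate_hoisted(schema: dict[str, Any], needs: list[dict[str, Any]]) -> list[list[str]]:
    """Proposed pattern: build the validator once, before the loop."""
    validator = StandInValidator(schema)
    return [validator.errors(need) for need in needs]


schema = {"required": ["id", "status"]}
needs = [{"id": f"REQ_{i}", "status": "open"} for i in range(1000)]
needs.append({"id": "REQ_BAD"})  # one need that fails validation

StandInValidator.constructed = 0
per_need_result = validate_per_need(schema, needs)
per_need_builds = StandInValidator.constructed

StandInValidator.constructed = 0
hoisted_result = validate_hoisted(schema, needs)
hoisted_builds = StandInValidator.constructed

print(per_need_builds, hoisted_builds)
```

Both functions return the same errors. The only difference is how many times the validator is constructed, which is where the saving would come from if construction is as costly as the profile suggests.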
