perf: eliminate duplicate by_index() call in ZIP extraction #54

Merged
bug-ops merged 1 commit into main from perf/zip-double-by-index
Feb 6, 2026

@bug-ops bug-ops commented Feb 6, 2026

Summary

  • Restructure ZipArchive::process_entry() from two by_index() calls per entry to one
  • Use single by_index() with three-way branching on entry type (directory/symlink/file)
  • Add ZipEntryAdapter::is_symlink_from_mode() for symlink detection from Unix mode bits
  • Remove unused to_entry_type() and is_symlink() methods

Benchmark Results

| Benchmark | Change | Statistical significance |
| --- | --- | --- |
| file_count_scaling/5000 | -7.1% | p < 0.05 |
| many_small_files/1000 | -4.6% | p < 0.05 |
| file_extraction/zip_small | -11.8% | p < 0.05 |
| large_files/100MB | noise | p = 0.71 |
| path_validation | unchanged | p = 0.60 |

Test plan

  • 750 tests pass (3 skipped, known zip crate symlink limitation)
  • Clippy clean, zero warnings
  • Documentation builds cleanly
  • Security audit: validation before disk writes preserved in all branches
  • No regressions in any benchmark category

Closes #52

Restructure ZipArchive::process_entry() from two by_index() calls per
entry to one. The previous approach called by_index() once for metadata
and again for extraction due to borrow checker constraints.

The new approach uses a single by_index() call with three-way branching
on entry type (directory/symlink/file), dropping the ZipFile borrow
explicitly before validator calls in directory and symlink branches.
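
The borrow pattern described above can be sketched roughly as follows. This is a minimal illustration, not the exarch-core implementation: `MockArchive`, `MockEntry`, `Validator`, `Extractor`, and `handle_non_file()` are hypothetical stand-ins, and the only property borrowed from a real ZIP reader is that the value returned by `by_index()` holds a mutable borrow of the archive. The sketch assumes the directory and symlink branches call a `&mut self` helper, which is why the entry has to be released first, while the file branch validates through a disjoint field and keeps the entry alive for reading.

```rust
// Hypothetical stand-ins for illustration; not the exarch-core or zip crate types.
#[derive(Debug, PartialEq)]
enum Kind { Directory, Symlink, File }

struct MockEntry<'a> { name: &'a str, unix_mode: Option<u32>, data: &'a [u8] }

struct MockArchive {
    entries: Vec<(String, Option<u32>, Vec<u8>)>,
    lookups: usize, // counts by_index() calls so the single lookup is observable
}

impl MockArchive {
    // The returned entry keeps the archive mutably borrowed while it is alive.
    fn by_index(&mut self, i: usize) -> MockEntry<'_> {
        self.lookups += 1;
        let (name, mode, data) = &self.entries[i];
        MockEntry { name, unix_mode: *mode, data }
    }
}

struct Validator { seen: Vec<String> }

impl Validator {
    // Assumed check: reject any path containing a ".." component.
    fn check(&mut self, path: &str) -> Result<(), String> {
        if path.split('/').any(|c| c == "..") {
            return Err(format!("path traversal: {}", path));
        }
        self.seen.push(path.to_owned());
        Ok(())
    }
}

struct Extractor { archive: MockArchive, validator: Validator, bytes_written: usize }

impl Extractor {
    // Takes &mut self, so no entry borrowed from self.archive may still be live.
    fn handle_non_file(&mut self, path: &str) -> Result<(), String> {
        self.validator.check(path)
    }

    fn process_entry(&mut self, i: usize) -> Result<Kind, String> {
        let entry = self.archive.by_index(i); // the only lookup for this entry
        // Copy out the metadata needed after the borrow ends.
        let name = entry.name.to_owned();
        let is_dir = name.ends_with('/');
        let is_symlink = entry.unix_mode.map_or(false, |m| (m & 0o170000) == 0o120000);

        if is_dir {
            drop(entry); // release the archive borrow before the &mut self call
            self.handle_non_file(&name)?;
            Ok(Kind::Directory)
        } else if is_symlink {
            drop(entry);
            self.handle_non_file(&name)?;
            Ok(Kind::Symlink)
        } else {
            // Disjoint field borrow: validator and archive can be used together.
            self.validator.check(&name)?; // validation happens before any write
            self.bytes_written += entry.data.len(); // stands in for streaming to disk
            Ok(Kind::File)
        }
    }
}

fn demo_extractor() -> Extractor {
    let entries = vec![
        ("docs/".to_owned(), Some(0o040755), Vec::new()),
        ("link".to_owned(), Some(0o120777), b"target".to_vec()),
        ("a.txt".to_owned(), Some(0o100644), b"hello".to_vec()),
        ("../evil.txt".to_owned(), Some(0o100644), b"x".to_vec()),
    ];
    Extractor {
        archive: MockArchive { entries, lookups: 0 },
        validator: Validator { seen: Vec::new() },
        bytes_written: 0,
    }
}

fn main() {
    let mut ex = demo_extractor();
    for i in 0..4 {
        println!("{:?}", ex.process_entry(i));
    }
    println!("lookups: {}, validated: {}", ex.archive.lookups, ex.validator.seen.len());
}
```

In this sketch the lookup counter ends up equal to the number of entries processed, which is the property the restructuring is after. Because the mock entry has no destructor, the compiler would end the borrow at its last use anyway; the explicit `drop` mainly documents where the borrow ends, and it becomes necessary for entry types that implement `Drop`.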

Add ZipEntryAdapter::is_symlink_from_mode() to detect symlinks from
Unix mode bits without needing a live ZipFile reference. Remove the now
unused to_entry_type() and is_symlink() methods.
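
Detecting a symlink from mode bits alone could look like the sketch below. The free function and its `Option<u32>` parameter are assumptions made for illustration; the actual `ZipEntryAdapter::is_symlink_from_mode()` signature is not shown in this PR. The constants are the standard POSIX file-type mask and symlink type value.

```rust
// Standard POSIX file-type bits (see <sys/stat.h>).
const S_IFMT: u32 = 0o170000; // mask selecting the file-type bits
const S_IFLNK: u32 = 0o120000; // file-type value for a symbolic link

// Hypothetical standalone version: takes the mode as a plain value, so no
// live entry reference is needed at the call site.
fn is_symlink_from_mode(unix_mode: Option<u32>) -> bool {
    // Entries without Unix mode information are treated as non-symlinks.
    unix_mode.map_or(false, |mode| (mode & S_IFMT) == S_IFLNK)
}

fn main() {
    println!("{}", is_symlink_from_mode(Some(0o120777))); // symlink
    println!("{}", is_symlink_from_mode(Some(0o100644))); // regular file
    println!("{}", is_symlink_from_mode(None)); // no Unix mode recorded
}
```

Taking a copied mode value rather than a reference to the entry is what lets the check run after the entry borrow has been released.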

Benchmarks show a 4.6-11.8% improvement on ZIP extraction workloads:
- file_count_scaling/5000: -7.1% (p < 0.05)
- many_small_files/1000: -4.6% (p < 0.05)
- file_extraction/zip_small: -11.8% (p < 0.05)

No regressions in single-file or validation benchmarks.

Closes #52
@github-actions github-actions bot added the core label (Changes to exarch-core) Feb 6, 2026
@codecov-commenter

Codecov Report

❌ Patch coverage is 65.27778% with 25 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| crates/exarch-core/src/formats/zip.rs | 65.27% | 25 Missing ⚠️ |

❌ Your patch status has failed because the patch coverage (65.27%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

@@           Coverage Diff            @@
##             main      #54    +/-   ##
========================================
  Coverage   90.30%   90.31%            
========================================
  Files          62       58     -4     
  Lines       10347    10191   -156     
========================================
- Hits         9344     9204   -140     
+ Misses       1003      987    -16     
Flag Coverage Δ
exarch-python ?

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
| --- | --- |
| crates/exarch-core/src/formats/zip.rs | 78.79% <65.27%> (-0.57%) ⬇️ |

... and 4 files with indirect coverage changes

@bug-ops bug-ops merged commit 928d14a into main Feb 6, 2026
20 checks passed
@bug-ops bug-ops deleted the perf/zip-double-by-index branch February 6, 2026 15:44
Labels

core Changes to exarch-core

Development

Successfully merging this pull request may close these issues.

perf: ZIP extraction calls by_index() twice per entry

2 participants