jrey8343 commented Feb 7, 2026

Summary

  • Add 7 new fuzz targets covering the entire rust-url workspace (url, idna, percent-encoding, form_urlencoded, data-url)
  • Include roundtrip invariant checks, differential testing, and setter-based mutation checks
  • Add seed corpus, fuzzing dictionary, and CIFuzz workflow for continuous fuzzing on PRs

Fuzz Targets

| Target | Crate(s) | Strategy |
| --- | --- | --- |
| fuzz_url_parse_roundtrip | url | Parse → serialize → re-parse roundtrip + component consistency |
| fuzz_url_differential | url | Relative URL resolution, join/make_relative roundtrip |
| fuzz_url_setters | url | Mutation via all setters, post-mutation validity invariant |
| fuzz_idna | idna | domain_to_ascii / domain_to_unicode roundtrip, Punycode roundtrip |
| fuzz_data_url | data-url | DataUrl::process + decode, forgiving base64 |
| fuzz_form_urlencoded | form_urlencoded | Parse → serialize → re-parse roundtrip |
| fuzz_percent_encoding | percent-encoding | Encode → decode roundtrip across multiple AsciiSets |
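
As a rough illustration of the encode → decode roundtrip invariant used by the last target, here is a minimal, std-only sketch. `toy_encode`, `toy_decode`, `roundtrips`, and the two predicate functions are hypothetical stand-ins written for this example; they are not the `percent-encoding` crate's API. The only point is that the roundtrip can be asserted safely when the encode set covers `%` itself.

```rust
/// Encode set that escapes everything except ASCII letters and digits,
/// so `%` itself is always escaped.
fn non_alphanumeric(b: u8) -> bool {
    !b.is_ascii_alphanumeric()
}

/// Encode set that escapes only non-printable and non-ASCII bytes,
/// so a literal `%` in the input is emitted unchanged.
fn leaves_percent_raw(b: u8) -> bool {
    !b.is_ascii_graphic()
}

/// Toy percent-encoder: escapes every byte for which `must_encode` is true.
fn toy_encode(input: &[u8], must_encode: fn(u8) -> bool) -> String {
    let mut out = String::new();
    for &b in input {
        if must_encode(b) {
            out.push_str(&format!("%{:02X}", b));
        } else {
            // Both predicates above only leave ASCII bytes unescaped.
            out.push(b as char);
        }
    }
    out
}

/// Value of one ASCII hex digit, if it is one.
fn hex_val(b: u8) -> Option<u8> {
    (b as char).to_digit(16).map(|d| d as u8)
}

/// Toy percent-decoder: turns each valid `%XX` into one byte and
/// copies everything else through unchanged.
fn toy_decode(input: &str) -> Vec<u8> {
    let bytes = input.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' && i + 2 < bytes.len() {
            if let (Some(hi), Some(lo)) = (hex_val(bytes[i + 1]), hex_val(bytes[i + 2])) {
                out.push(hi * 16 + lo);
                i += 3;
                continue;
            }
        }
        out.push(bytes[i]);
        i += 1;
    }
    out
}

/// The invariant a fuzz target would assert: decode(encode(x)) == x.
fn roundtrips(input: &[u8], must_encode: fn(u8) -> bool) -> bool {
    toy_decode(&toy_encode(input, must_encode)).as_slice() == input
}
```

With these helpers, `roundtrips(b"%41", non_alphanumeric)` holds, while `roundtrips(b"%41", leaves_percent_raw)` does not: the literal `%41` is emitted unchanged and then decodes to `A`.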

Motivation

This is part of an effort to integrate rust-url into OSS-Fuzz for continuous fuzzing. URL parsing is a classic fuzzing target — it processes untrusted input, implements a complex spec (WHATWG URL Standard), and the workspace includes several sub-crates (IDNA, Punycode, percent-encoding, form-urlencoded, data-url) that each independently parse untrusted data.

The existing url/fuzz/ targets only cover the url crate itself. These new workspace-level targets extend coverage to all sub-crates with invariant-checking strategies that are most likely to surface real bugs.

Test plan

  • All targets compile with cargo check in fuzz/
  • Targets run successfully with cargo fuzz run <target> -- -max_total_time=60
  • CIFuzz workflow triggers on PRs

Add 7 fuzz targets covering the entire rust-url workspace:

- fuzz_url_parse_roundtrip: URL parse/serialize roundtrip invariant checking
- fuzz_url_differential: relative URL resolution and make_relative roundtrip
- fuzz_url_setters: URL mutation via setters with validity invariant checks
- fuzz_idna: IDNA domain_to_ascii/domain_to_unicode roundtrip + Punycode
- fuzz_data_url: data: URL processing and base64 decoding
- fuzz_form_urlencoded: form-urlencoded parse/serialize roundtrip
- fuzz_percent_encoding: percent encode/decode roundtrip across ASCII sets

Also includes:
- Seed corpus with representative URL samples
- Fuzzing dictionary for URL/IDNA/data-url tokens
- CIFuzz workflow to fuzz all pull requests automatically
- fuzz_percent_encoding: use NON_ALPHANUMERIC for roundtrip assertions
  since it encodes '%', preventing spurious decode mismatches
- fuzz_url_differential: use char_indices() to split UTF-8 input on
  valid character boundaries, preventing panics on multi-byte chars
- fuzz.dict: replace C-style escapes (\t, \n, \r, \\) with the \xHH hex
  escapes required by the libFuzzer dictionary format
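
For context on the `char_indices()` change, here is a minimal sketch of splitting a string on a character boundary. The helper name `split_after_chars` is hypothetical and not taken from the fuzz target; it only illustrates the general technique, assuming the target needs to cut one UTF-8 input into two string pieces.

```rust
/// Splits `s` after its first `n` characters (not bytes).
/// The split point is always a valid UTF-8 boundary, so this never panics,
/// unlike slicing at an arbitrary byte offset.
fn split_after_chars(s: &str, n: usize) -> (&str, &str) {
    // `char_indices` yields the byte offset at which each char starts.
    // The entry at position `n` is the start of the (n+1)-th char, which is
    // a safe place to cut. If the string has `n` or fewer chars, cut at the end.
    let byte_idx = s.char_indices().nth(n).map_or(s.len(), |(i, _)| i);
    s.split_at(byte_idx)
}
```

For example, `split_after_chars("héllo", 2)` yields `("hé", "llo")`. Byte offset 2 falls inside the two-byte `é`, so `"héllo".split_at(2)` would panic instead.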

jrey8343 commented Feb 8, 2026

Hi — apologies for pushing this PR without coordinating more closely first. I should have discussed with you before adding fuzzing infrastructure.

That said, the fuzz targets did uncover a couple of bugs (#1101, #1102) which I've submitted a fix for in #1103. Hopefully that demonstrates some value.

If you'd prefer a lighter-weight approach, I'd be happy to set up ClusterFuzzLite instead — it runs in your GitHub Actions CI so you'd have full control. Just let me know what works best for the project and I'm happy to help however I can.

Manishearth (Member) commented

Thank you for doing all this! I don't have time to review this soon but I appreciate the effort! I'm in favor of merging these.
