@jrey8343 commented Feb 7, 2026

Summary

This PR fixes two bugs found through OSS-Fuzz that caused file:// URLs to fail roundtrip tests (parse → serialize → parse).

Fixes #1101 - File URLs with hosts and paths containing multiple slashes were losing their host component during roundtrip
Fixes #1102 - set_host("localhost") on file:// URLs didn't normalize localhost to empty host like the parser does

Changes

Bug #1101: Path structure preservation

Problem: When parsing file URLs like file://host//path, the path normalization was stripping all leading slashes, causing the host to be lost on re-parse.

Fix: Modified parse_path() in parser.rs to preserve path structure when a host component is present. Leading slash normalization now only applies to hostless file:// URLs.

Example:

use url::Url;

// Parse, serialize, and re-parse; the host must survive the roundtrip.
let url1 = Url::parse("file://.host//path").unwrap();
let serialized = url1.to_string();
let url2 = Url::parse(&serialized).unwrap();
assert_eq!(url1, url2); // Now passes

Bug #1102: localhost normalization in set_host()

Problem: The URL parser normalizes "localhost" to empty host for file:// URLs per WHATWG spec, but set_host() wasn't applying the same normalization, causing asymmetric behavior.

Fix: Modified set_host() in lib.rs to normalize "localhost" to None for file:// URLs, matching the parser's behavior.

Example:

use url::Url;

// After set_host, the serialized URL must re-parse to an equal value.
let mut url = Url::parse("file:///path").unwrap();
url.set_host(Some("localhost")).unwrap();
let reparsed = Url::parse(&url.to_string()).unwrap();
assert_eq!(url, reparsed); // Now passes
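
The normalization rule itself can be sketched without the `url` crate. This is a minimal, std-only illustration, not the code in lib.rs: `normalize_host_for_scheme` is a hypothetical helper, and it assumes the host string has already been lowercased by host parsing.

```rust
/// Hypothetical helper (not part of rust-url): maps a host passed to a setter
/// onto the value the parser would store for the given scheme.
fn normalize_host_for_scheme<'a>(scheme: &str, host: Option<&'a str>) -> Option<&'a str> {
    match host {
        // Assumption: the host was already lowercased by host parsing.
        Some("localhost") if scheme == "file" => None,
        // Every other host, and every other scheme, is left untouched.
        other => other,
    }
}

fn main() {
    assert_eq!(normalize_host_for_scheme("file", Some("localhost")), None);
    assert_eq!(
        normalize_host_for_scheme("file", Some("example.com")),
        Some("example.com")
    );
    assert_eq!(
        normalize_host_for_scheme("http", Some("localhost")),
        Some("localhost")
    );
}
```

Applying one rule in both the parser and the setter is what keeps the two code paths symmetric.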

Impact

  • ✅ All existing tests pass (14,095 tests)
  • ✅ Fuzzer verified with 1M+ iterations, no crashes
  • ✅ Improved WHATWG spec compliance
  • ✅ Resolved 4 Web Platform Tests that were previously expected failures:
    • file://spider///
    • file://monkey/ with pathname set to \\\\
    • file:///unicorn with pathname set to //\\/
    • file:///unicorn with pathname set to //monkey/..//

Testing

Added comprehensive test suite in url/tests/roundtrip_bugs.rs that reproduces both bugs and verifies the fixes.
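
For readers unfamiliar with the invariant, a roundtrip check can be sketched generically. The block below is an illustrative, std-only sketch and not the contents of roundtrip_bugs.rs: `check_roundtrip` and the toy `host|path` format are hypothetical stand-ins for `Url::parse` and `to_string`.

```rust
/// Checks parse -> serialize -> parse for any pair of functions.
/// Inputs that do not parse are skipped, since there is nothing to compare.
fn check_roundtrip<T, P, S>(input: &str, parse: P, serialize: S) -> Result<(), String>
where
    T: PartialEq + std::fmt::Debug,
    P: Fn(&str) -> Option<T>,
    S: Fn(&T) -> String,
{
    let first = match parse(input) {
        Some(value) => value,
        None => return Ok(()),
    };
    let text = serialize(&first);
    match parse(&text) {
        Some(second) if second == first => Ok(()),
        Some(second) => Err(format!("{:?} reparsed as {:?} via {:?}", first, second, text)),
        None => Err(format!("serialization {:?} of {:?} no longer parses", text, first)),
    }
}

// Toy stand-in for a URL type: a (host, path) pair written as "host|path".
fn toy_parse(s: &str) -> Option<(String, String)> {
    let (host, path) = s.split_once('|')?;
    Some((host.to_string(), path.to_string()))
}

fn toy_serialize(v: &(String, String)) -> String {
    format!("{}|{}", v.0, v.1)
}

// Deliberately buggy serializer that drops the host, to show a failing roundtrip.
fn lossy_serialize(v: &(String, String)) -> String {
    format!("|{}", v.1)
}

fn main() {
    assert!(check_roundtrip("host|/path", toy_parse, toy_serialize).is_ok());
    assert!(check_roundtrip("host|/path", toy_parse, lossy_serialize).is_err());
    // Unparseable input is not a roundtrip failure.
    assert!(check_roundtrip("no separator", toy_parse, toy_serialize).is_ok());
}
```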

WHATWG Spec Compliance

Both fixes align with the WHATWG URL Standard.


Found while integrating rust-url with OSS-Fuzz for continuous fuzzing.

Add 7 fuzz targets covering the entire rust-url workspace:

- fuzz_url_parse_roundtrip: URL parse/serialize roundtrip invariant checking
- fuzz_url_differential: relative URL resolution and make_relative roundtrip
- fuzz_url_setters: URL mutation via setters with validity invariant checks
- fuzz_idna: IDNA domain_to_ascii/domain_to_unicode roundtrip + Punycode
- fuzz_data_url: data: URL processing and base64 decoding
- fuzz_form_urlencoded: form-urlencoded parse/serialize roundtrip
- fuzz_percent_encoding: percent encode/decode roundtrip across ASCII sets
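
A percent encode/decode roundtrip only holds for encode sets that escape `%` itself. The sketch below illustrates this with toy, std-only `encode` and `decode` functions; they are assumptions written for this example, not the `percent_encoding` crate's API, and the `non_alnum` predicate merely mimics a set that encodes every non-alphanumeric byte.

```rust
/// Toy encoder: percent-encodes non-ASCII bytes and any byte the predicate selects.
fn encode(input: &[u8], must_encode: impl Fn(u8) -> bool) -> String {
    let mut out = String::new();
    for &b in input {
        if !b.is_ascii() || must_encode(b) {
            out.push_str(&format!("%{:02X}", b));
        } else {
            out.push(b as char);
        }
    }
    out
}

/// Toy decoder: decodes valid %XX escapes and passes every other byte through.
fn decode(input: &str) -> Vec<u8> {
    let bytes = input.as_bytes();
    let hex = |b: u8| (b as char).to_digit(16);
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' && i + 2 < bytes.len() {
            if let (Some(hi), Some(lo)) = (hex(bytes[i + 1]), hex(bytes[i + 2])) {
                out.push((hi * 16 + lo) as u8);
                i += 3;
                continue;
            }
        }
        out.push(bytes[i]);
        i += 1;
    }
    out
}

fn main() {
    let input = b"%41";
    // A set that leaves '%' alone: the input is emitted verbatim and then
    // decoded as an escape, so the roundtrip yields "A" instead of "%41".
    let leaves_percent = |b: u8| b == b' ';
    assert_eq!(encode(input, leaves_percent), "%41");
    assert_eq!(decode(&encode(input, leaves_percent)), b"A");

    // A set that encodes every non-alphanumeric byte also escapes '%',
    // so the original bytes come back unchanged.
    let non_alnum = |b: u8| !b.is_ascii_alphanumeric();
    assert_eq!(encode(input, non_alnum), "%2541");
    assert_eq!(decode(&encode(input, non_alnum)), input);
}
```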

Also includes:
- Seed corpus with representative URL samples
- Fuzzing dictionary for URL/IDNA/data-url tokens
- CIFuzz workflow to fuzz all pull requests automatically
- fuzz_percent_encoding: use NON_ALPHANUMERIC for roundtrip assertions
  since it encodes '%', preventing spurious decode mismatches
- fuzz_url_differential: use char_indices() to split UTF-8 input on
  valid character boundaries, preventing panics on multi-byte chars
- fuzz.dict: replace C-style escapes (\t, \n, \r, \\) with \xHH hex
  escapes required by the libFuzzer dictionary format
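
The `char_indices()` point can be illustrated with a small std-only sketch. `split_at_char_boundary` is a hypothetical helper written for this example and is not the code in the fuzz target; it shows one way to turn an arbitrary byte offset into a split point that `str::split_at` accepts.

```rust
/// Splits `s` at the last character boundary that is at or before `byte_hint`.
/// Hypothetical helper: `str::split_at` panics on a non-boundary index,
/// so the index is chosen from the boundaries reported by `char_indices()`.
fn split_at_char_boundary(s: &str, byte_hint: usize) -> (&str, &str) {
    let idx = s
        .char_indices()
        .map(|(i, _)| i)
        // The end of the string is a valid boundary too.
        .chain(std::iter::once(s.len()))
        .take_while(|&i| i <= byte_hint)
        .last()
        .unwrap_or(0);
    s.split_at(idx)
}

fn main() {
    // 'é' occupies bytes 1 and 2, so byte offset 2 is not a character boundary.
    let s = "aéb";
    assert!(!s.is_char_boundary(2));

    assert_eq!(split_at_char_boundary(s, 2), ("a", "éb"));
    assert_eq!(split_at_char_boundary(s, 3), ("aé", "b"));
    // Offsets past the end clamp to the full string.
    assert_eq!(split_at_char_boundary(s, 100), ("aéb", ""));
    assert_eq!(split_at_char_boundary("", 5), ("", ""));
}
```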

This commit fixes two bugs found through fuzzing that caused file:// URLs
to fail roundtrip tests (parse → serialize → parse).

Bug servo#1101: File URLs with hosts and paths starting with multiple slashes
were losing their host component during roundtrip. The path normalization
logic was too aggressive in stripping leading slashes, which changed how
the URL was interpreted on re-parsing.

Fix: Preserve path structure when a host component is present, only
normalizing leading slashes for hostless file:// URLs.

Bug servo#1102: Calling set_host("localhost") on file:// URLs didn't apply
the same normalization as the parser, which converts "localhost" to an
empty host per WHATWG spec.

Fix: Normalize "localhost" to empty host in set_host() for file:// URLs,
matching parser behavior.

Both fixes improve WHATWG URL spec compliance and resolve 4 previously
failing Web Platform Tests:
- file://spider///
- file://monkey/ with pathname set to \\\\
- file:///unicorn with pathname set to //\\/
- file:///unicorn with pathname set to //monkey/..//

codecov bot commented Feb 9, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (main@a66f422). Learn more about missing BASE report.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1103   +/-   ##
=======================================
  Coverage        ?   86.41%           
=======================================
  Files           ?       27           
  Lines           ?     5337           
  Branches        ?        0           
=======================================
  Hits            ?     4612           
  Misses          ?      725           
  Partials        ?        0           

@Manishearth
Member

Could you rebase this fix off of the fuzz PR?

Successfully merging this pull request may close these issues.

set_host("localhost") on file:// URL produces non-roundtripping serialization
file:// URL parse roundtrip mismatch