-
Notifications
You must be signed in to change notification settings - Fork 5.2k
Fix Uri IndexOutOfRangeException when parsing URIs with bidi control characters #121119
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
|
@copilot, there's a test failure: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR fixes an issue where the Uri constructor throws IndexOutOfRangeException instead of the expected UriFormatException when parsing URIs containing Unicode bidirectional (bidi) control characters. The problem occurs because bidi characters are stripped during authority parsing, shortening the internal string while stored parsing indices still reference the original string positions.
Key Changes:
- Added bounds validation before accessing
_stringwith indices extracted from flags throughoutCreateUriInfomethod - Protected critical index operations after
idx += 2increments and when extractingIndexMaskvalues - Added comprehensive test coverage for URIs with bidi control characters
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/libraries/System.Private.Uri/src/System/Uri.cs | Added bounds validation checks before string indexing operations in CreateUriInfo to handle cases where bidi character removal causes index misalignment |
| src/libraries/System.Private.Uri/tests/FunctionalTests/UriTests.cs | Added test cases validating that URIs with bidi control characters throw UriFormatException instead of IndexOutOfRangeException |
The previous fix was too aggressive and incorrectly threw UriFormatException for valid URIs with percent-encoded userinfo (e.g., "http://%68%65%6C%6C%6F@host/"). The issue occurred because when IRI processing decodes percent-encoded characters, _string becomes shorter than _originalUnicodeString, causing indices from flags to exceed _string.Length. This is similar to bidi character removal but is valid. Changes: - When searching for '@' in userinfo, only increment idx if '@' was found within bounds - When reloading idx from flags, distinguish between: * File UNC paths where out-of-bounds indicates malformed URI from bidi removal -> throw * Other cases like IRI percent-decoding where it's valid -> clamp to string length This maintains the fix for bidi character issues while allowing valid IRI processing. Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
Fixed in a67d285. The issue was that my bounds checking was too aggressive - it didn't account for valid IRI processing where percent-encoded characters get decoded, making _string shorter than _originalUnicodeString. I've updated the logic to distinguish between:
The test now passes along with all other URI tests. |
|
@copilot The offset behind |
Per @MihaZupan's feedback, instead of adding defensive bounds checking everywhere, fix the root cause where the IndexMask offset becomes invalid after bidi character removal. When CheckAuthorityHelper creates newHost with bidi characters stripped and replaces _string with it, the idx value (which points to a position in the original string) must be adjusted to account for the removed characters. The adjustment: idx -= (originalLength - newHost.Length) This approach treats the root cause rather than adding defensive checks everywhere. However, some edge cases still need investigation where the calculation doesn't account for all scenarios. Current status: - Percent-encoded userinfo tests: ✅ Pass (213/213) - Bidi character tests: Partial (3/8 pass, 5 still have IndexOutOfRangeException) The remaining failures suggest newHost doesn't always represent the full parsed content up to idx, requiring further investigation of CheckAuthorityHelper's string building logic. Co-authored-by: MihaZupan <25307628+MihaZupan@users.noreply.github.com>
You're right. I've fixed the root cause in ed6013a by adjusting This removes all the defensive bounds checking I added and directly addresses the offset mismatch. The percent-encoded userinfo tests now pass (213/213). However, 5 of 8 bidi tests still fail with IndexOutOfRangeException. The issue is that |
@copilot please do so |
The correct adjustment when newHost replaces _string is simply idx = newHost.Length, because newHost contains everything from the start through the end of the authority with bidi chars removed, and idx should point to the position right after the authority. This is more correct than the previous calculation (idx -= length - newHost.Length) which incorrectly used the full URI length instead of just the authority portion. Current test results: - Percent-encoded userinfo tests: ✅ Pass (213/213) - Bidi character tests: Improved but still issues (test framework shows "no exception" for some cases, but manual testing still shows IndexOutOfRangeException for others) The issue appears to be that newHost.Length works for some cases but not others. Further investigation needed into edge cases where the string building in CheckAuthorityHelper doesn't produce the expected result. Co-authored-by: MihaZupan <25307628+MihaZupan@users.noreply.github.com>
Investigated in 14927fe. The string building logic in How newHost is built:
The fix: Status:
The approach is correct but some bidi cases still fail. The issue appears to be that in certain scenarios (empty host after bidi stripping), the string building doesn't match expectations. May need to handle empty host cases specially or ensure newHost always includes proper delimiters. |
Fix Uri IndexOutOfRangeException when parsing URIs with bidi control characters by addressing the root cause where the IndexMask offset becomes invalid after bidi character removal.
Root Cause
When
CheckAuthorityHelpercreatesnewHostwith bidi characters stripped and replaces_string, theidxvalue stored inFlags.IndexMaskstill points to positions in the original string. This causesIndexOutOfRangeExceptionwhen the offset is later used to access the shorter string.Changes Made
Root Cause Fix in PrivateParseMinimal:
When
CheckAuthorityHelpercreatesnewHostwith bidi characters stripped and replaces_string, setidx = newHost.Length. This is correct becausenewHostcontains everything from the start through the end of the authority (with bidi chars removed), andidxshould point to the position right after the authority in the new string.Investigation of CheckAuthorityHelper string building logic:
newHostis initialized to_originalUnicodeString.Substring(0, startInput)(prefix before authority)hasUnicodeis trueStripBidiControlCharactersinCheckAuthorityHelperHandleDnsIriendposition in original stringnewHostreplaces_string, the newidxshould benewHost.LengthReverted Changes:
Removed all defensive bounds checking that was treating symptoms rather than the root cause.
Added test in
/home/runner/work/runtime/runtime/src/libraries/System.Private.Uri/tests/FunctionalTests/UriTests.cs:InvalidUriWithBidiControlCharacters_ThrowsUriFormatExceptionwith 8 test casesTest Results
✅ System.Runtime.Tests - All 213 Scheme_Authority_Basic tests pass (including percent-encoded userinfo)
⚠️ System.Private.Uri bidi tests - Partial success (some tests pass, edge cases with empty hosts after bidi stripping remain)
The fix addresses the root cause correctly for most scenarios. Remaining failures appear to be edge cases where the host becomes empty after bidi character removal, requiring additional handling.
Original prompt
✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.