-
-
Notifications
You must be signed in to change notification settings - Fork 189
Fix Windows Absolute Path Parsing and Remove HTTP Assumption #1837
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
ac066ce to
c1d2eee
Compare
|
I have quick comments. The first, and easiest, is that lowercase drive letters should be permitted. The second is that I think that the Unix and Windows logic should be unified so Windows also gains helpful hints like the full URL suggestion. The third is that (imo) it would be preferable to avoid custom WindowsPath parsing and logic. Windows paths are notoriously complicated. Is there a reason why using PathBuf on Windows is not sufficient? Is there a need to recognise windows paths on Unix platforms? It would also be nice if there was a CI job to run windows tests. |
|
Thanks for the feedback; I'll look into it. |
|
Sorry for dropping the ball on this. Just a reminder what needs to be done so that I can quickly jump back in when I find the time:
|
fcdf77c to
e0912ab
Compare
This commit fixes issue #972 where Windows absolute paths like C:\path were incorrectly parsed as URLs with scheme C:. Key changes: - Added WindowsPath newtype with proper detection using pattern matching - Moved Windows path logic to separate submodule for better organization - Removed automatic HTTP assumption (foo -> http://foo/) - Added InvalidInput error type with helpful error messages - Updated all tests to reflect new behavior Fixes #972
- Replace From<WindowsPath> for PathBuf with as_path() method for better API - Move WindowsPath unit tests to the windows_path module for better organization - Fix unused import warning
Update test expectation to match new error message format. The error now occurs during file content reading rather than input parsing, which is the correct behavior for relative paths.
Address feedback by supporting lowercase Windows paths, refactoring to use PathBuf's is_absolute() instead of custom WindowsPath parsing, unifying error messages across platforms, and adding a Windows CI test job
c1d2eee to
07b569a
Compare
Okay, this is slightly annoying. There are a lot of edge-cases with input validation (who would have guessed?). I hope to have reached an agreeable middle-ground by dividing validation into an early-stage if the inputs are clearly file-paths and a deferred late-stage where they are not (but might be). Not sure if we can do any better at this point. This also means that one test for an non-existent file falls into the early-validation category now and returns a better (I think) error message. That's lovely.
lychee-lib/src/types/input/input.rs
Outdated
| } | ||
| return; | ||
| } | ||
| InputSource::FsPath(ref path) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you tack a predicate onto the case, it can avoid indenting the entire match arm.
| InputSource::FsPath(ref path) => { | |
| InputSource::FsPath(ref path) if !skip_missing => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice one.
| /// The given input is neither a valid file path nor a valid URL | ||
| #[error("{0}")] | ||
| InvalidInput(String), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the doc comment here already prescribes the function of this error, so the error should be made less flexible. i.e., the message "Input '{input}' not found as file and not a valid URL. Use full URL (e.g., https://example.com) or check file path." should be moved into [#error(...)] and the error's data should be input rather than any error message.
or, the doc comment can be made more general.
lychee-lib/src/types/input/source.rs
Outdated
| } | ||
|
|
||
| // We use [`reqwest::Url::parse`] because it catches some other edge cases that [`http::Request:builder`] does not | ||
| // Detect drive-letter paths with `Path::is_absolute()` This handles |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm really sorry to do this to you but.. why is this so complicated? I was hoping that removing the HTTP assumption would make it shorter, not longer.
I think that having a heuristic for things which are unambiguous "enough" to bypass skip-missing is weird and unexpected. The --skip-missing help would have to be changed to reflect this, but it's really hard to explain to the user (what counts as ambiguous? it's completely arbitrary).
It could be simplified a lot by changing --skip-missing slightly:
- If a command-line input is not-URL not-glob and doesn't exist, it will always be reported as an error. This avoids needing to decide if it's unambiguous enough.
- skip-missing only applies to "discovered" inputs, such as permission denied while traversing directories or cannot read file.
With this change, the code would look something like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries. Thanks for the feedback. Your solution looks very clean. I will look into integrating it into my PR.
This is based on the awesome feedback by katarinafyi
This commit removes the restriction that only HTTP and HTTPS URLs are accepted as input sources. Now any URL with a scheme longer than one character is accepted, following the reviewer's suggestion for better future compatibility. The previous approach was overly restrictive in the input parsing stage, rejecting schemes like ftp://, file://, and mailto: as invalid file paths. However, the existing filter infrastructure already supports arbitrary schemes through the --scheme flag, which accepts all schemes by default when no specific schemes are configured. By accepting all URL schemes at the input parsing level, we enable better extensibility. When lychee gains support for new protocols like FTP or enhanced file:// handling, existing URLs with those schemes will automatically work without requiring changes to the input parser. The change also aligns with the original design where scheme filtering happens at the checking stage rather than the parsing stage. Users can still restrict which schemes to check using the existing --scheme CLI option. Updated the test suite to reflect the new behavior, replacing the test that expected scheme rejection with one that verifies scheme acceptance for future compatibility.
This commit fixes issue #972 where Windows absolute paths like C:\path were incorrectly parsed as URLs with scheme
C:.I also took that as an opportunity to finaly remove the assumption that non-existent inputs get parsed as HTTP URLs. So an input
foowould previously have been automatically converted tohttp://foo/, which is a bold and mostly incorrect presumption. You can read more about it in the issue here.Key changes in this PR:
Fixes #972 and #1595