Skip to content

Conversation

@mre
Copy link
Member

@mre mre commented Sep 3, 2025

This commit fixes issue #972 where Windows absolute paths like C:\path were incorrectly parsed as URLs with scheme C:.

I also took that as an opportunity to finaly remove the assumption that non-existent inputs get parsed as HTTP URLs. So an input foo would previously have been automatically converted to http://foo/, which is a bold and mostly incorrect presumption. You can read more about it in the issue here.

Key changes in this PR:

  • Added WindowsPath newtype with proper detection using pattern matching
  • Moved Windows path logic to separate submodule for better organization
  • Removed automatic HTTP assumption (foo -> http://foo/)
  • Added InvalidInput error type with helpful error messages
  • Updated all tests to reflect new behavior

Fixes #972 and #1595

@katrinafyi
Copy link
Contributor

I have quick comments. The first, and easiest, is that lowercase drive letters should be permitted. The second is that I think that the Unix and Windows logic should be unified so Windows also gains helpful hints like the full URL suggestion. The third is that (imo) it would be preferable to avoid custom WindowsPath parsing and logic. Windows paths are notoriously complicated. Is there a reason why using PathBuf on Windows is not sufficient? Is there a need to recognise windows paths on Unix platforms?

It would also be nice if there was a CI job to run windows tests.

@mre
Copy link
Member Author

mre commented Sep 8, 2025

Thanks for the feedback; I'll look into it.

@mre mre added the triage label Sep 23, 2025
@mre
Copy link
Member Author

mre commented Oct 9, 2025

Sorry for dropping the ball on this. Just a reminder what needs to be done so that I can quickly jump back in when I find the time:

  • rebase
  • support lowercase Windows paths
  • Try to unify Unix and Windows logic
  • Try to use PathBuf for Windows instead of WindowsPath

@thomas-zahner thomas-zahner force-pushed the master branch 2 times, most recently from fcdf77c to e0912ab Compare October 21, 2025 12:53
mre added 6 commits December 16, 2025 10:50
This commit fixes issue #972 where Windows absolute paths like C:\path were incorrectly parsed as URLs with scheme C:.

Key changes:
- Added WindowsPath newtype with proper detection using pattern matching
- Moved Windows path logic to separate submodule for better organization
- Removed automatic HTTP assumption (foo -> http://foo/)
- Added InvalidInput error type with helpful error messages
- Updated all tests to reflect new behavior

Fixes #972
- Replace From<WindowsPath> for PathBuf with as_path() method for better API
- Move WindowsPath unit tests to the windows_path module for better organization
- Fix unused import warning
Update test expectation to match new error message format.
The error now occurs during file content reading rather than input parsing,
which is the correct behavior for relative paths.
  Address feedback by supporting lowercase Windows paths, refactoring to
  use PathBuf's is_absolute() instead of custom WindowsPath parsing,
  unifying error messages across platforms, and adding a Windows CI test
  job
@mre mre force-pushed the absolute-local-windows-paths branch from c1d2eee to 07b569a Compare December 16, 2025 11:14
mre added 7 commits December 16, 2025 13:26
Okay, this is slightly annoying. There are a lot of edge-cases with
input validation (who would have guessed?). I hope to have reached an
agreeable middle-ground by dividing validation into an early-stage if
the inputs are clearly file-paths and a deferred late-stage where they
are not (but might be). Not sure if we can do any better at this point.

This also means that one test for an non-existent file falls into the
early-validation category now and returns a better (I think) error
message. That's lovely.
}
return;
}
InputSource::FsPath(ref path) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you tack a predicate onto the case, it can avoid indenting the entire match arm.

Suggested change
InputSource::FsPath(ref path) => {
InputSource::FsPath(ref path) if !skip_missing => {

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one.

Comment on lines 119 to 121
/// The given input is neither a valid file path nor a valid URL
#[error("{0}")]
InvalidInput(String),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the doc comment here already prescribes the function of this error, so the error should be made less flexible. i.e., the message "Input '{input}' not found as file and not a valid URL. Use full URL (e.g., https://example.com) or check file path." should be moved into [#error(...)] and the error's data should be input rather than any error message.

or, the doc comment can be made more general.

}

// We use [`reqwest::Url::parse`] because it catches some other edge cases that [`http::Request:builder`] does not
// Detect drive-letter paths with `Path::is_absolute()` This handles
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm really sorry to do this to you but.. why is this so complicated? I was hoping that removing the HTTP assumption would make it shorter, not longer.

I think that having a heuristic for things which are unambiguous "enough" to bypass skip-missing is weird and unexpected. The --skip-missing help would have to be changed to reflect this, but it's really hard to explain to the user (what counts as ambiguous? it's completely arbitrary).

It could be simplified a lot by changing --skip-missing slightly:

  • If a command-line input is not-URL not-glob and doesn't exist, it will always be reported as an error. This avoids needing to decide if it's unambiguous enough.
  • skip-missing only applies to "discovered" inputs, such as permission denied while traversing directories or cannot read file.

With this change, the code would look something like this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries. Thanks for the feedback. Your solution looks very clean. I will look into integrating it into my PR.

mre added 2 commits December 18, 2025 16:23
This is based on the awesome feedback by katarinafyi
mre added 3 commits December 18, 2025 16:52
This commit removes the restriction that only HTTP and HTTPS URLs are accepted as input sources. Now any URL with a scheme longer than one character is accepted, following the reviewer's suggestion for better future compatibility.

The previous approach was overly restrictive in the input parsing stage, rejecting schemes like ftp://, file://, and mailto: as invalid file paths. However, the existing filter infrastructure already supports arbitrary schemes through the --scheme flag, which accepts all schemes by default when no specific schemes are configured.

By accepting all URL schemes at the input parsing level, we enable better extensibility. When lychee gains support for new protocols like FTP or enhanced file:// handling, existing URLs with those schemes will automatically work without requiring changes to the input parser.

The change also aligns with the original design where scheme filtering happens at the checking stage rather than the parsing stage. Users can still restrict which schemes to check using the existing --scheme CLI option.

Updated the test suite to reflect the new behavior, replacing the test that expected scheme rejection with one that verifies scheme acceptance for future compatibility.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Absolute local paths on Windows not recognized

3 participants