-
Notifications
You must be signed in to change notification settings - Fork 22
fix: replaces spaces with %20 for filenames
#164
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| mime_guess = "2.0.4" | ||
| rustls = "0.20.6" | ||
| rustls-pemfile = "1.0.0" | ||
| serde = { version = "1.0.139", features = ["derive"] } | ||
| serde_json = "1.0.82" | ||
| structopt = { version = "0.3.26", default-features = false } | ||
| termcolor = "1.1.3" | ||
| tokio = { version = "1.19.2", features = ["fs", "rt-multi-thread", "signal", "macros"] } | ||
| tokio-rustls = "0.23.4" | ||
| toml = "0.5.9" | ||
| serde = { version = "1.0.139", features = ["derive"] } | ||
| serde_json = "1.0.82" | ||
| structopt = { version = "0.3.26", default-features = false } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reorganizes dependencies to be alphabetically sorted
src/addon/file_server/mod.rs
Outdated
|
|
||
| if let Some(path_and_query) = uri_parts.path_and_query { | ||
| let path = path_and_query.path(); | ||
| let path = path.replace("%20", " "); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As of today, spaces are the only characters causing this issue, so I will start by just replacing them instead of applying a full change to the whole path. The path here is then used to resolve the filename in the scoped file system.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just replacing spaces with %20 and vice versa doesn't solve the general problem. For example, things still break with a file called ?test.txt.
I think the safest thing to do here is to pull in the urlencoding crate or something similar to get a reasonably-edge-case-free transformation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I missed the comment about not wanting to encode slashes... that's kinda tricky.
You probably would have to url-encode each path segment separately, but I think that gets even messier if you want to support filenames that aren't valid UTF-8.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I started to think about those scenarios as well. Theres many static file servers out there, I think I will have to dig further into best practices for this use case on URLs and find a better solution.
Perhaps another solution could be: Splitting by slashes, encode each resulting element and then join with slashes.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's what I would go for. I ended up just writing an implementation of this and turns it it's way uglier than I thought it would be.
My general strategy was:
- Encode each segment that is valid unicode with standard UTF-8 percent-encoding
- Encode any segments that are not valid unicode in the platform's native representation
That "in the platform's native representation" part is painful though, because windows paths are UTF-16. Rust's standard library abstracts this away pretty well, but as far as I can tell the only thing to do when they aren't valid unicode is to handle the UTF-16 nonsense explicitly.
I wrote up a rough implementation of this using the percent_encoding crate (I abandoned the urlencoding crate due to this issue):
use percent_encoding::{utf8_percent_encode, percent_encode, AsciiSet, NON_ALPHANUMERIC};
const PERCENT_ENCODE_SET: &AsciiSet = &NON_ALPHANUMERIC
.remove(b'-')
.remove(b'_')
.remove(b'.')
.remove(b'~');
let file_path: PathBuf = ...;
let url_path = file_path.iter().flat_map(|component| {
let segment = match component.to_str() {
Some(component) => utf8_percent_encode(component, PERCENT_ENCODE_SET),
None => {
let bytes = {
#[cfg(windows)]
{
// Not having [T; N]: IntoIterator sucks
let mut bytes = Vec::with_capacity(component.len() * 2);
for wc in component.encode_wide() {
let wc = wc.to_be_bytes();
bytes.push(wc[0]);
bytes.push(wc[1]);
}
bytes
}
#[cfg(not(windows))]
component.as_bytes()
};
percent_encode(&bytes, PERCENT_ENCODE_SET)
}
};
std::iter::once("/").chain(segment)
}).collect::<String>();Going the other direction has another fun windows-specific case to handle, where the percent-decoded path segment doesn't break into pairs:
let segments = url.path_segments().ok_or(Err(...))?;
let file_path = segments.map(|segment| {
match percent_decode_str(segment).decode_utf8() {
Ok(component) => Ok(OsStr::new(&component[..]).to_owned()),
Err(_) => {
let bytes = percent_decode_str(segment).collect::<Vec<u8>>();
#[cfg(windows)]
{
let wide_bytes = bytes.iter().chunks(2).map(|slice| {
match slice.try_into::<[u8; 2]>() {
Ok(pair) => Ok(u16::from_be_bytes(pair)),
// This path segment decodes as an odd number of bytes,
// which means we can't convert it to an OsString on
// windows, and it will never represent a valid path.
// Should probably just return a 404 in this situation.
Err(err) => Err(...)
}
}).collect::<Result<Vec<u16>, _>>()?;
Ok(OsString::from_wide(&wide_bytes))
}
#[cfg(not(windows))]
Ok(OsStr::from_bytes(&bytes).to_owned())
}
}
}).collect::<Result<PathBuf, _>>();I know OsString uses WTF-8 internally, and my understanding is that if OsString/OsStr exposed that representation directly as a byte slice we could just use that without any platform-specific shenanigans. Unfortunately, afaict no such API exists. I'd be happy to find out I'm wrong on this one though.
also uhhh... I don't have a windows machine to test on. Very likely that there are logic bugs lurking in here. Probably want to write some tests and run CI on windows if you aren't already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Benjamin-L thanks so much for these snippets and the explanation, its very helpful to be in context!
I agree with you on the approach you propose here, if you want to give it a shot in a PR please do so, I'm glad to work with you on this.
I think I will have some time to take care of this between later today and tomorrow.
Thanks again!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty busy the next few days, and probably don't have the time to familiarize myself with the http-server codebase enough to put together a PR. I'd be happy to review one though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries! I will keep pushing on this PR soon and ping you!
| hyper-rustls = { version = "0.23.0", features = ["webpki-roots"] } | ||
| local-ip-address = "0.4.5" | ||
| mime_guess = "2.0.4" | ||
| percent-encoding = "2.1.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Benjamin-L I'm using this crate based on your suggestion!
| fn make_dir_entry_link(root_dir: &Path, entry_path: &Path) -> String { | ||
| // format!("/{}", &entry_path[current_dir_path.len()..]) | ||
| let root_dir = root_dir.to_str().unwrap(); | ||
| let entry_path = entry_path.to_str().unwrap(); | ||
| let path_buf = PathBuf::from_str(&entry_path[root_dir.len()..]).unwrap(); | ||
|
|
||
| entry_path[root_dir.len()..].to_string() | ||
| encode_uri(&path_buf) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's the code that turns the file path into a valid URI string. This is then passed to the File Explorer to provide links to navigate.
| if let Some(path_and_query) = uri_parts.path_and_query { | ||
| let path = path_and_query.path(); | ||
|
|
||
| return Ok(PathBuf::from_str(path)?); | ||
| return decode_uri(path); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's the code that turns the URI back into a file path
src/utils/url_encode.rs
Outdated
| .remove(b'.') | ||
| .remove(b'~'); | ||
|
|
||
| pub fn encode_uri(file_path: &PathBuf) -> String { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Used the implementation suggested here
|
@Benjamin-L here we have the implementation as suggested with your snippet! Please let me know your thoughts! |
|
Oooof, I just realized I made all these comments directly on the commit rather than as a review on this PR. I didn't even know that was a feature github supported. Sorry! |
No worries! |
Suggested on: 9ab6a80#r79215624
Suggested on: 9ab6a80#r79215901
|
@Benjamin-L very good insights and suggestions, tysm! I just applied all of your comments. Theres still an issue for windows compilation due to the way path components are built and passed to Please let me know any improvements you find! |
|
@Benjamin-L to avoid leaving this PR stale and have all of our progress in the source code, I will merge this PR and continue working on fixing Windows related issues in a separate PR. Thanks so much for your time and patience! |
Replaces spaces with
%20in order to support filenames with spaces.The whole request path is not being URL Encoded given that slashes
must remain.
Resolve: #163