Skip to content

Conversation

@LeoBorai
Copy link
Member

Replaces spaces with %20 in order to support filenames with spaces.
The whole request path is not being URL Encoded given that slashes
must remain.

Resolve: #163

Comment on lines 44 to 54
mime_guess = "2.0.4"
rustls = "0.20.6"
rustls-pemfile = "1.0.0"
serde = { version = "1.0.139", features = ["derive"] }
serde_json = "1.0.82"
structopt = { version = "0.3.26", default-features = false }
termcolor = "1.1.3"
tokio = { version = "1.19.2", features = ["fs", "rt-multi-thread", "signal", "macros"] }
tokio-rustls = "0.23.4"
toml = "0.5.9"
serde = { version = "1.0.139", features = ["derive"] }
serde_json = "1.0.82"
structopt = { version = "0.3.26", default-features = false }

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reorganizes dependencies to be alphabetically sorted


if let Some(path_and_query) = uri_parts.path_and_query {
let path = path_and_query.path();
let path = path.replace("%20", " ");
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As of today, spaces are the only characters causing this issue, so I will start by just replacing them instead of applying a full change to the whole path. The path here is then used to resolve the filename in the scoped file system.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just replacing spaces with %20 and vice versa doesn't solve the general problem. For example, things still break with a file called ?test.txt.

I think the safest thing to do here is to pull in the urlencoding crate or something similar to get a reasonably-edge-case-free transformation.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I missed the comment about not wanting to encode slashes... that's kinda tricky.

You probably would have to url-encode each path segment separately, but I think that gets even messier if you want to support filenames that aren't valid UTF-8.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I started to think about those scenarios as well. Theres many static file servers out there, I think I will have to dig further into best practices for this use case on URLs and find a better solution.

Perhaps another solution could be: Splitting by slashes, encode each resulting element and then join with slashes.

What do you think?

Copy link

@olivia-fl olivia-fl Jul 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's what I would go for. I ended up just writing an implementation of this and turns it it's way uglier than I thought it would be.

My general strategy was:

  • Encode each segment that is valid unicode with standard UTF-8 percent-encoding
  • Encode any segments that are not valid unicode in the platform's native representation

That "in the platform's native representation" part is painful though, because windows paths are UTF-16. Rust's standard library abstracts this away pretty well, but as far as I can tell the only thing to do when they aren't valid unicode is to handle the UTF-16 nonsense explicitly.

I wrote up a rough implementation of this using the percent_encoding crate (I abandoned the urlencoding crate due to this issue):

use percent_encoding::{utf8_percent_encode, percent_encode, AsciiSet, NON_ALPHANUMERIC};

const PERCENT_ENCODE_SET: &AsciiSet = &NON_ALPHANUMERIC
    .remove(b'-')
    .remove(b'_')
    .remove(b'.')
    .remove(b'~');

let file_path: PathBuf = ...;

let url_path = file_path.iter().flat_map(|component| {
    let segment = match component.to_str() {
        Some(component) => utf8_percent_encode(component, PERCENT_ENCODE_SET),
        None => {
            let bytes = {
                #[cfg(windows)]
                {
                    // Not having [T; N]: IntoIterator sucks
                    let mut bytes = Vec::with_capacity(component.len() * 2);
                    for wc in component.encode_wide() {
                        let wc = wc.to_be_bytes();
                        bytes.push(wc[0]);
                        bytes.push(wc[1]);
                    }
                    bytes
                }
                #[cfg(not(windows))]
                component.as_bytes()
            };
            percent_encode(&bytes, PERCENT_ENCODE_SET)
        }
    };
    std::iter::once("/").chain(segment)
}).collect::<String>();

Going the other direction has another fun windows-specific case to handle, where the percent-decoded path segment doesn't break into pairs:

let segments = url.path_segments().ok_or(Err(...))?;
let file_path = segments.map(|segment| {
    match percent_decode_str(segment).decode_utf8() {
        Ok(component) => Ok(OsStr::new(&component[..]).to_owned()),
        Err(_) => {
            let bytes = percent_decode_str(segment).collect::<Vec<u8>>();
            #[cfg(windows)]
            {
                let wide_bytes = bytes.iter().chunks(2).map(|slice| {
                    match slice.try_into::<[u8; 2]>() {
                        Ok(pair) => Ok(u16::from_be_bytes(pair)),
                        // This path segment decodes as an odd number of bytes,
                        // which means we can't convert it to an OsString on
                        // windows, and it will never represent a valid path.
                        // Should probably just return a 404 in this situation.
                        Err(err) => Err(...)
                    }
                }).collect::<Result<Vec<u16>, _>>()?;
                Ok(OsString::from_wide(&wide_bytes))
            }
            #[cfg(not(windows))]
            Ok(OsStr::from_bytes(&bytes).to_owned())
        }
    }
}).collect::<Result<PathBuf, _>>();

I know OsString uses WTF-8 internally, and my understanding is that if OsString/OsStr exposed that representation directly as a byte slice we could just use that without any platform-specific shenanigans. Unfortunately, afaict no such API exists. I'd be happy to find out I'm wrong on this one though.


also uhhh... I don't have a windows machine to test on. Very likely that there are logic bugs lurking in here. Probably want to write some tests and run CI on windows if you aren't already.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Benjamin-L thanks so much for these snippets and the explanation, its very helpful to be in context!
I agree with you on the approach you propose here, if you want to give it a shot in a PR please do so, I'm glad to work with you on this.

I think I will have some time to take care of this between later today and tomorrow.

Thanks again!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty busy the next few days, and probably don't have the time to familiarize myself with the http-server codebase enough to put together a PR. I'd be happy to review one though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries! I will keep pushing on this PR soon and ping you!

hyper-rustls = { version = "0.23.0", features = ["webpki-roots"] }
local-ip-address = "0.4.5"
mime_guess = "2.0.4"
percent-encoding = "2.1.0"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Benjamin-L I'm using this crate based on your suggestion!

Comment on lines 199 to 205
fn make_dir_entry_link(root_dir: &Path, entry_path: &Path) -> String {
// format!("/{}", &entry_path[current_dir_path.len()..])
let root_dir = root_dir.to_str().unwrap();
let entry_path = entry_path.to_str().unwrap();
let path_buf = PathBuf::from_str(&entry_path[root_dir.len()..]).unwrap();

entry_path[root_dir.len()..].to_string()
encode_uri(&path_buf)
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the code that turns the file path into a valid URI string. This is then passed to the File Explorer to provide links to navigate.

Comment on lines 66 to 70
if let Some(path_and_query) = uri_parts.path_and_query {
let path = path_and_query.path();

return Ok(PathBuf::from_str(path)?);
return decode_uri(path);
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's the code that turns the URI back into a file path

.remove(b'.')
.remove(b'~');

pub fn encode_uri(file_path: &PathBuf) -> String {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used the implementation suggested here

@LeoBorai
Copy link
Member Author

@Benjamin-L here we have the implementation as suggested with your snippet! Please let me know your thoughts!

@olivia-fl
Copy link

olivia-fl commented Jul 23, 2022

Oooof, I just realized I made all these comments directly on the commit rather than as a review on this PR. I didn't even know that was a feature github supported. Sorry!

@LeoBorai
Copy link
Member Author

Oooof, I just realized I made all these comments directly on the commit rather than as a review on this PR. I didn't even know that was a feature github supported. Sorry!

No worries!

@LeoBorai
Copy link
Member Author

@Benjamin-L very good insights and suggestions, tysm! I just applied all of your comments. Theres still an issue for windows compilation due to the way path components are built and passed to percent_encode function, I'm thinking about using a block and conditional compilation to solve it.

Please let me know any improvements you find!

@LeoBorai
Copy link
Member Author

@Benjamin-L to avoid leaving this PR stale and have all of our progress in the source code, I will merge this PR and continue working on fixing Windows related issues in a separate PR. Thanks so much for your time and patience!

@LeoBorai LeoBorai merged commit e5c4aaa into main Jul 28, 2022
@LeoBorai LeoBorai deleted the fix/filsystem-paths-not-urlencoded branch July 28, 2022 00:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Filesystem paths are not urlencoded

3 participants