-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Fix ListingTableUrl to decode percent
#3750
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| parking_lot = "0.12" | ||
| parquet = { version = "24.0.0", features = ["arrow", "async"] } | ||
| paste = "^1.0" | ||
| percent-encoding = "2.2.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
datafusion/core uses url crate by servo.
percent-encoding is also published by the same repository.
| /// Creates a new [`ListingTableUrl`] from a url and optional glob expression | ||
| fn new(url: Url, glob: Option<Pattern>) -> Self { | ||
| let prefix = Path::parse(url.path()).expect("should be URL safe"); | ||
| let decoded_path = percent_encoding::percent_decode_str(url.path()).decode_utf8_lossy(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the url.path(), it is still percent encoded string.
percent_decode_str decodes it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this will cause the parse call below to fail in the event of non-ASCII characters.
I think it should just be a case of change Path::parse to Path::from
|
@alamb PTAL? |
| let child = Path::parse("/foob/bar").unwrap(); | ||
| assert!(url.strip_prefix(&child).is_none()); | ||
|
|
||
| let url = ListingTableUrl::parse("file:///with space/foo/bar").unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we get a test of a non-ASCII character such as a percent encoded emoji or something
We should also test non-URL safe characters like a percent encoded ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I added some test cases at 439f6dc
if it's not enough or wrong to test, please tell me about it.
tustvold
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me 👍
|
I think this just needs a cargo fmt and then we can get it in 😃 |
|
@tustvold Oh sorry, I added a commit for that😺 |
xudong963
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @unvalley
|
Benchmark runs are scheduled for baseline = ac1631a and contender = 2352f3e. 2352f3e is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
Closes #3589
Rationale for this change
What changes are included in this PR?
percent_encodingcrate todatafusion/coreto decode percent encoded url path.Are there any user-facing changes?