ListingTableUrl Inconsistent Percent Encoding #8009

Closed
@tustvold

Describe the bug

The behaviour of ListingTableUrl with respect to paths containing percent characters is rather confusing, and I suspect not entirely intentional.

Consider a filesystem containing a file named bar%2Ffoo: there is no obvious way to address this file.

// A percent-encoded slash in a file URL is decoded into a path separator.
let url = ListingTableUrl::parse("file:///foo/bar%2Ffoo").unwrap();
assert_eq!(url.prefix.as_ref(), "foo/bar/foo");

// A doubly encoded slash (%252F) is left untouched rather than decoded once to %2F.
let url = ListingTableUrl::parse("file:///foo/a%252Fb.txt").unwrap();
assert_eq!(url.prefix.as_ref(), "foo/a%252Fb.txt");

// A plain filesystem path is percent-encoded, so the literal % becomes %25.
let dir = tempdir().unwrap();
let path = dir.path().join("bar%2Ffoo");
std::fs::File::create(&path).unwrap();

let url = ListingTableUrl::parse(path.to_str().unwrap()).unwrap();
assert!(url.prefix.as_ref().ends_with("bar%252Ffoo"), "{}", url.prefix);

To Reproduce

No response

Expected behavior

The "correct" behaviour is that a file URL should be URL-encoded. That is, according to the URL specification, the correct way to reference a file named a%2Fb.txt under /foo would be file:///foo/a%252Fb.txt. Similarly, the non-URL version should be foo/a%2Fb.txt.
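To make the round trip concrete, here is a minimal sketch of percent-encoding and decoding a single path segment using only the standard library. The helpers `hex_val`, `encode_segment` and `decode_segment` are hypothetical names for illustration; they are not part of DataFusion or the `url` crate, and the set of characters left unescaped (the RFC 3986 "unreserved" set) is an assumption that is stricter than what most URL libraries apply to paths.

```rust
/// Value of a single ASCII hex digit, or None if `b` is not one.
/// (Hypothetical helper for this sketch.)
fn hex_val(b: u8) -> Option<u8> {
    match b {
        b'0'..=b'9' => Some(b - b'0'),
        b'a'..=b'f' => Some(b - b'a' + 10),
        b'A'..=b'F' => Some(b - b'A' + 10),
        _ => None,
    }
}

/// Percent-encode one path segment: every byte outside the RFC 3986
/// "unreserved" set is written as %XX, including '%' itself.
fn encode_segment(segment: &str) -> String {
    let mut out = String::with_capacity(segment.len());
    for &b in segment.as_bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(b as char)
            }
            _ => out.push_str(&format!("%{:02X}", b)),
        }
    }
    out
}

/// Decode %XX escapes exactly once; malformed escapes are copied through.
fn decode_segment(segment: &str) -> String {
    let bytes = segment.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' && i + 2 < bytes.len() {
            if let (Some(hi), Some(lo)) = (hex_val(bytes[i + 1]), hex_val(bytes[i + 2])) {
                out.push(hi * 16 + lo);
                i += 3;
                continue;
            }
        }
        out.push(bytes[i]);
        i += 1;
    }
    String::from_utf8_lossy(&out).into_owned()
}
```

Under these assumptions, `encode_segment("a%2Fb.txt")` yields `a%252Fb.txt` and `decode_segment("a%252Fb.txt")` yields `a%2Fb.txt` again, whereas `decode_segment("bar%2Ffoo")` yields `bar/foo`, which can no longer be told apart from a directory `bar` containing `foo`.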

That being said, various tools instead interpret the URL path verbatim:

$ touch 'a%2Fb.txt'

$ aws --endpoint-url=http://localhost:4566 s3 cp 'a%2Fb.txt' s3://tustvold/

$ aws --endpoint-url=http://localhost:4566 s3 ls s3://tustvold/
2023-10-31 15:40:13          0 a%2Fb.txt

$ aws --endpoint-url=http://localhost:4566 s3 cp 's3://tustvold/a%2Fb.txt' foo.txt

$ gsutil cp a\%2Fb.txt gs://tustvold

$ gsutil cp gs://tustvold/a\%2Fb.txt test

I'm not entirely sure how to classify DataFusion's current behaviour other than confusing. I think we should probably strive to replicate tools like the aws-cli and gsutil.
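As a rough illustration of what "verbatim" could mean, the sketch below splits a URL string into scheme, authority and key without decoding anything. `split_verbatim` is a hypothetical helper written for this example; it is not how DataFusion, the aws-cli or gsutil actually parse URLs, and it ignores query strings and fragments.

```rust
/// Split `scheme://authority/key` without any percent-decoding.
/// Returns None if the input has no "://" separator.
/// (Hypothetical helper; not part of any real library.)
fn split_verbatim(url: &str) -> Option<(&str, &str, &str)> {
    let (scheme, rest) = url.split_once("://")?;
    // Everything after the first '/' following the authority is the key,
    // taken exactly as written.
    let (authority, key) = rest.split_once('/').unwrap_or((rest, ""));
    Some((scheme, authority, key))
}
```

With this interpretation `s3://tustvold/a%2Fb.txt` maps to the key `a%2Fb.txt`, matching the listing shown above, and `file:///foo/bar%2Ffoo` maps to `foo/bar%2Ffoo` rather than `foo/bar/foo`.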

Additional context

The current behaviour was added in #3750.

No response

Labels

bug: Something isn't working