Multipart uploads do not properly handle UTF-8 encoded names #76

Open
badvision opened this issue Sep 28, 2022 · 3 comments

Comments

@badvision

When building the multipart request, the charset of the filename part is set to ISO-8859-1 instead of UTF-8, so filename characters that take 3 bytes in UTF-8 (such as Korean and Chinese glyphs) are squashed to ? characters.
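
A minimal sketch of the symptom, using only the JDK and not this project's code: encoding a name with ISO-8859-1 replaces every character the charset cannot represent with `?`. The class name `FilenameEncodingDemo`, the helper `roundTrip`, and the sample filename are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class FilenameEncodingDemo {
    // Hypothetical helper: encodes the name with the given charset and decodes
    // it again, mimicking a receiver that trusts the declared charset.
    static String roundTrip(String name, Charset charset) {
        // getBytes(Charset) substitutes the charset's replacement byte
        // ('?' for ISO-8859-1) for every character it cannot map.
        return new String(name.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // Two Hangul syllables followed by an ASCII extension (sample value).
        String name = "\uD30C\uC77C.txt";
        System.out.println(roundTrip(name, StandardCharsets.ISO_8859_1)); // prints "??.txt"
        System.out.println(roundTrip(name, StandardCharsets.UTF_8).equals(name)); // prints "true"
    }
}
```

The ASCII part of the name survives in both cases, which is consistent with the problem only showing up for non-Latin filenames.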

@joerghoh
Contributor

Strings in Java are Unicode (UTF-16 internally); are you sure that this squashing is not done much earlier? For example, if you are on a Windows platform and provide the filename via the command line.

@badvision
Author

badvision commented Sep 28, 2022 via email

@badvision
Author

Specifically, the content type of the filename part of the multipart request indicates it is text with ISO-8859-1 encoding, even though the request itself has UTF-8 encoding specified. Most tests don't pick up on this encoding mismatch because the ASCII range is the same in both encodings. Only multi-byte characters in the filename reveal this issue.
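
To make the per-part charset concrete, here is a hand-rolled sketch of a single text part that declares UTF-8 in its own `Content-Type` header and encodes the value with that same charset. It uses only the JDK; the class `TextPartSketch`, the helper `textPart`, the boundary string, and the field name `fileName` are assumptions for illustration, not this library's actual API or the shape of a fix.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class TextPartSketch {
    // Hypothetical helper: serializes one text field as a multipart/form-data
    // part. The charset named in the part header is the one used for the value.
    static byte[] textPart(String boundary, String field, String value, Charset charset) {
        String headers = "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"" + field + "\"\r\n"
                + "Content-Type: text/plain; charset=" + charset.name() + "\r\n"
                + "\r\n";
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // The header lines here are plain ASCII (assumes an ASCII field name).
        byte[] head = headers.getBytes(StandardCharsets.US_ASCII);
        out.write(head, 0, head.length);
        // The value bytes must match the charset declared above.
        byte[] body = value.getBytes(charset);
        out.write(body, 0, body.length);
        byte[] crlf = "\r\n".getBytes(StandardCharsets.US_ASCII);
        out.write(crlf, 0, crlf.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] part = textPart("demo-boundary", "fileName", "\uD30C\uC77C.txt", StandardCharsets.UTF_8);
        // Decoding with the declared charset recovers the original name.
        System.out.print(new String(part, StandardCharsets.UTF_8));
    }
}
```

Passing `StandardCharsets.ISO_8859_1` to the same helper would write `??.txt` as the value, which matches the behavior described in this issue.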
