screenshot: long filename handling improvements #10052
base: master
That's done.
Force-pushed from b8eb5bb to 0738e2e.
Force-pushed from 0738e2e to 8d62bb6.
Force-pushed from 8d62bb6 to a89f2dd.
I'd take the UTF-8 fix without the win32-specific changes. There's #12119 for that now.
Force-pushed from a89f2dd to ca0c65a.
Force-pushed to remove the Win32 changes and to make a couple of the screenshot file-writing functions use
player/screenshot.c (outdated)
@@ -127,6 +127,30 @@ static void append_filename(char **s, const char *f)
    talloc_free(append);
}

static void trim_invalid_utf8(char *s, size_t len)
Why would you want to "fix" invalid names? If the name is invalid then mpv should fail when it tries to use it.
There's no end to trying to "fix" bad names, and mpv should not try that, except if there's a very good reason.
truncate_long_base_filename can leave the basename ending in an invalid (partial) UTF-8 codepoint, so trim_invalid_utf8 is used to remove it. (This is described in the PR text above.)
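A minimal sketch of that trimming step, assuming the basename was valid UTF-8 before being cut at an arbitrary byte, so the only possible damage is an incomplete sequence at the very end. The name trim_partial_utf8_tail is hypothetical; this is not the PR's trim_invalid_utf8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical helper, not the PR's trim_invalid_utf8. Assumes `s` was valid
// UTF-8 before it was cut at an arbitrary byte, so only the last sequence
// can be incomplete.
static void trim_partial_utf8_tail(char *s)
{
    assert(s != NULL);
    size_t i = strlen(s);
    size_t cont = 0;

    // Step back over trailing continuation bytes (10xxxxxx), at most 3.
    while (i > 0 && cont < 3 && ((unsigned char)s[i - 1] & 0xC0) == 0x80) {
        i--;
        cont++;
    }
    if (i == 0)
        return; // empty string, or only continuation bytes: nothing to do

    // s[i - 1] is the lead byte of the last sequence; decode its length.
    unsigned char lead = (unsigned char)s[i - 1];
    size_t need;
    if (lead < 0x80)
        need = 1;
    else if ((lead & 0xE0) == 0xC0)
        need = 2;
    else if ((lead & 0xF0) == 0xE0)
        need = 3;
    else if ((lead & 0xF8) == 0xF0)
        need = 4;
    else
        return; // not a lead byte: outside this sketch's assumption

    // Fewer bytes present than the lead byte announces: drop the sequence.
    if (cont + 1 < need)
        s[i - 1] = '\0';
}
```

For example, "a" followed by only the first two bytes of "ウ" (E3 82) is trimmed back to "a", while a complete "aウ" (61 E3 82 A6) is left unchanged.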
Force-pushed from ca0c65a to 25c1a4c.
Force-pushed from 25c1a4c to a7b5097.
For screenshot filenames, it was possible for the basename to be longer than what filesystems generally support. On Linux, this limit is 255 bytes; on Windows, it is 255 wchar_t (UTF-16) units.
Thus truncate_long_base_filename truncates basenames so that basename + "." + extension is at most 255 bytes. It also makes sure not to leave an invalid (partial) UTF-8 codepoint at the end of the filename.
For testing, filling screenshot-template= with 3-byte or 4-byte UTF-8 codepoints is best, such as "ウ" (3 bytes) or "🌂" (4 bytes).
Example: 84 * strlen("ウ") + strlen(".jpg") == 256. The last "ウ" is removed, and the filename becomes 83 "ウ" characters plus ".jpg", totalling 253 bytes.
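The arithmetic in the example can be checked with a small sketch. It assumes a 255-byte limit passed in as name_max and an extension length that excludes the dot (matching the diff's ext_len + 1); truncate_basename_sketch is a hypothetical name, not the PR's function. Instead of a separate trimming pass, it backs the cut point up to the nearest UTF-8 lead byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Sketch, not the PR's code: shorten `base` so that base + '.' + extension
// fits in `name_max` bytes, without splitting a UTF-8 sequence.
// `name_max` and the function name are assumptions for illustration.
static void truncate_basename_sketch(char *base, size_t ext_len, size_t name_max)
{
    assert(base != NULL);
    if (ext_len + 1 >= name_max)
        return; // no room for any basename; leave the decision to the caller

    size_t max_bytes = name_max - (ext_len + 1); // +1 for the '.'
    if (strlen(base) <= max_bytes)
        return;

    // base[cut] is the first byte that gets dropped. If it is a UTF-8
    // continuation byte (10xxxxxx), the cut splits a codepoint, so move
    // back to that codepoint's lead byte and cut there instead.
    size_t cut = max_bytes;
    while (cut > 0 && ((unsigned char)base[cut] & 0xC0) == 0x80)
        cut--;
    base[cut] = '\0';
}
```

With 84 "ウ" (252 bytes) and "jpg", max_bytes is 255 - 4 = 251. Byte 251 is the third byte of the 84th character, so the cut backs up to byte 249, leaving 83 characters: 249 + 4 = 253 bytes including ".jpg".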
Force-pushed from a7b5097 to 03e91b6.
// If truncation produces an invalid UTF-8 codepoint, then chop that off.
static void truncate_long_base_filename(char *s, const size_t ext_len)
{
    const size_t max_utf8_bytes = 255 - (ext_len + 1); // ext_len+1 for '.'
255 is a magic number. It should not be hardcoded. It should probably be MAX_PATH.
On Windows, MAX_PATH can be more than 255, because it has enough space to hold the UTF-8 form of 260 wchar_t elements.
Also, if ext_len is 255 or more, then max_utf8_bytes will wrap around to a huge number...
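The wrap-around can be shown directly: size_t arithmetic is unsigned, so 255 - (ext_len + 1) with ext_len = 255 yields SIZE_MAX rather than -1. The guarded variant below is a hypothetical helper (the name max_base_bytes is an assumption) that reports failure instead of returning a wrapped value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Same expression as in the diff: unsigned, so it wraps instead of
// going negative once ext_len + 1 exceeds 255.
static size_t max_base_bytes_unguarded(size_t ext_len)
{
    return 255 - (ext_len + 1);
}

// Hypothetical guarded variant: fails when '.' + extension alone
// would exceed name_max, so no wrapped value can escape.
static bool max_base_bytes(size_t name_max, size_t ext_len, size_t *out)
{
    assert(out != NULL);
    if (ext_len >= name_max) // equivalent to ext_len + 1 > name_max, without overflow
        return false;
    *out = name_max - (ext_len + 1);
    return true;
}
```

The check is written as ext_len >= name_max rather than ext_len + 1 > name_max so that the guard itself cannot overflow.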
It should not be MAX_PATH, because MAX_PATH is not the basename length. MAX_PATH is meant to hold drive letter + ":\" + 256 wchar_t/char elements + NUL terminator (3 + 256 + 1 = 260).
Per Maximum Path Length Limitation for Windows:
This type of path is composed of components separated by backslashes, each up to the value returned in the lpMaximumComponentLength parameter of the GetVolumeInformation function (this value is commonly 255 characters).
Bothering to check GetVolumeInformation isn't worth doing, though.
All relevant filesystems use 255 for filename segments (including on non-Windows OSes).
If ext_len is 255, let mpv blow up, because that's an absurd case to care about.
It should not be MAX_PATH
Which is why I said probably, i.e. you should figure out whether it should be MAX_PATH or something else, like MAX_NAME.
The point is that 255 should not be hardcoded. It should be appropriate for the current platform, and if it's not MAX_PATH and not MAX_NAME, then you should figure out what it needs to be, without calling APIs.
It should probably be some existing constant of the platform, and not hardcoded inside this function.
If ext_len is 255 let mpv blow up because that's an absurd case to care about.
In your applications maybe. Not in mpv.
You mean that the examples you gave, which should be fixed, are not absurd cases, and so we should really care about them, like this?
screenshot-template="a🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂"
Luckily for you though, it won't blow up, but it will also not work.
It's not rocket science. Please fix it correctly.
Which is why I said probably, i.e. you should figure out whether it should be MAX_PATH or something else, like MAX_NAME.
Which is what I did, and why 255.
The point is that 255 should not be hardcoded. It should be appropriate for the current platform, and if it's not MAX_PATH and not MAX_NAME, then you should figure out what it needs to be, without calling APIs.
Don't hardcode, but also figure it out without calling APIs? Okay...
If ext_len is 255 let mpv blow up because that's an absurd case to care about.
In your applications maybe. Not in mpv.
The extension comes from mpv. If mpv decides to use 255-character extensions, then that is mpv's fault.
screenshot-template="a🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂🌂"
Luckily for you though, it won't blow up, but it will also not work.
It's not rocket science. Please fix it correctly.
If you think that won't work, you're missing something, including why it was listed as a test case for removing UTF-8 codepoints that truncation invalidates.
It should probably be some existing constant of the platform
I don't know if it's available.
We have this:
Lines 539 to 547 in ef4c6df
// Windows' MAX_PATH/PATH_MAX/FILENAME_MAX is fixed to 260, but this limit
// applies to unicode paths encoded with wchar_t (2 bytes on Windows). The UTF-8
// version could end up bigger in memory. In the worst case each wchar_t is
// encoded to 3 bytes in UTF-8, so in the worst case we have:
//      wcslen(wpath) * 3 <= strlen(utf8path)
// Thus we need MP_PATH_MAX as the UTF-8/char version of PATH_MAX.
// Also make sure there's free space for the terminating \0.
// (For codepoints encoded as UTF-16 surrogate pairs, UTF-8 has the same length.)
#define MP_PATH_MAX (FILENAME_MAX * 3 + 1)
But it's only used privately in this C file, when enumerating files in a directory.
Not sure how to solve this in general. I don't think we should change the global MAX_PATH either.
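The ratios in that comment can be made concrete by counting UTF-16 code units (wchar_t on Windows) for a UTF-8 string: codepoints of 1 to 3 UTF-8 bytes take one unit, and 4-byte codepoints take a surrogate pair. A UTF-8 string therefore never has fewer bytes than UTF-16 units, so a 255-byte UTF-8 cap can never exceed a 255-unit Windows limit, though it truncates more than necessary for names like "ウウウ...". The function name below is hypothetical and the input is assumed to be valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>

// Sketch: UTF-16 code units needed for a valid UTF-8 string.
// One unit per codepoint, two (a surrogate pair) for 4-byte sequences.
static size_t utf16_units_from_utf8(const char *s)
{
    assert(s != NULL);
    size_t units = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p; p++) {
        if ((*p & 0xC0) == 0x80)
            continue; // continuation byte: its codepoint was already counted
        units += (*p >= 0xF0) ? 2 : 1;
    }
    return units;
}
```

"ウ" is 3 UTF-8 bytes but 1 unit (the worst case the comment describes), while "🌂" is 4 bytes and 2 units.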
NAME_MAX seems to be available in limits.h on Linux/macOS as 255. It's also in my mingw64/msys2 limits.h, but behind a _POSIX_ ifdef. BSDs have MAXNAMELEN, which is 255 as far as I can tell.
The options are something like this, hardcoding 255, or calling out to GetVolumeInformation/pathconf(_PC_NAME_MAX):
#include <limits.h>
#ifndef NAME_MAX
#ifdef MAXNAMELEN
#define NAME_MAX MAXNAMELEN
#else
#define NAME_MAX 255
#endif
#endif
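For comparison, the runtime option mentioned above could look roughly like this on POSIX systems: ask the filesystem holding the target directory for its limit via pathconf(dir, _PC_NAME_MAX), and fall back to the compile-time NAME_MAX when the query fails or reports no limit. The helper name is hypothetical, and Windows would need GetVolumeInformation instead.

```c
#include <assert.h>
#include <limits.h>
#include <unistd.h>

#ifndef NAME_MAX
#define NAME_MAX 255 // fallback when the platform does not define it
#endif

// Hypothetical helper: maximum filename length for files created in `dir`.
// pathconf() returns -1 both on error and when there is no fixed limit.
static long name_max_for_dir(const char *dir)
{
    assert(dir != NULL);
    long n = pathconf(dir, _PC_NAME_MAX);
    if (n <= 0)
        return NAME_MAX;
    return n;
}
```

This is the per-filesystem query the thread tries to avoid; it is shown only to make that trade-off concrete.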
For win32 you can use _MAX_FNAME: https://learn.microsoft.com/en-us/cpp/c-runtime-library/path-field-limits
Note that _MAX_FNAME includes space for the terminating null, while NAME_MAX does not.
EDIT: And just as I mentioned in the other PR: shouldn't we enable long path support in the manifest on Windows?
Note that _MAX_FNAME includes space for the terminating null, while NAME_MAX does not.
I didn't include it because of that, but yeah, a Windows-specific define could be (_MAX_FNAME-1), which is 255.
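Combining the two suggestions, a compile-time selection might look like the sketch below. MP_NAME_MAX and max_basename_bytes are hypothetical names; _MAX_FNAME is assumed to come from stdlib.h on MSVC/MinGW and includes the terminating NUL, hence the - 1, while NAME_MAX already excludes it.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h> // _MAX_FNAME on Windows CRTs

// Hypothetical macro: longest filename component in bytes, without NUL.
#if defined(_WIN32) && defined(_MAX_FNAME)
#  define MP_NAME_MAX (_MAX_FNAME - 1)
#elif defined(NAME_MAX)
#  define MP_NAME_MAX NAME_MAX
#else
#  define MP_NAME_MAX 255
#endif

// Hypothetical helper: room left for the basename once '.' + extension
// are accounted for. The caller must reject absurdly long extensions.
static size_t max_basename_bytes(size_t ext_len)
{
    assert(ext_len < (size_t)MP_NAME_MAX);
    return (size_t)MP_NAME_MAX - (ext_len + 1);
}
```

On Windows this is still a count in the CRT's character units, so the UTF-16 versus UTF-8 difference discussed above still applies.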
EDIT: And just as I mentioned in the other PR: shouldn't we enable long path support in the manifest on Windows?
In my opinion, yes.
After an IRC discussion, it turned out that choosing the right length to truncate at is hard, because the limit is not simply counted in bytes on every OS. I see three options:
Subtitle text via
Converting the basename with edit: doesn't append the basename back to the path or anything so would need to be edited to do that
You have to admit that this is a niche use case. Users may very well write a script to correctly take screenshots named after subtitle text if they want to do that.
Sure, but this is a good example of platform-specific, complicated support code that I'd like to avoid.
Terrible idea IMO.
That's much more work than just throwing
I'd like to avoid it too, especially since it could cause file access issues if you had a filename on NTFS that would be longer than 255 UTF-8 bytes on Linux.
Networked file shares, and also file access issues from Linux -> Windows again.
Honestly, I think long/unsupported filenames should be rejected with an error for the user to act on. Truncating them implicitly doesn't really help anyone.
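The reject-instead-of-truncate alternative is small by comparison. The sketch below assumes a 255-byte limit passed in by the caller; check_basename_len is a hypothetical name, and fprintf stands in for mpv's own logging.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

// Hypothetical check: refuse names that exceed `name_max` bytes instead of
// silently truncating them, and tell the user what to change.
static bool check_basename_len(const char *base, const char *ext, size_t name_max)
{
    assert(base != NULL && ext != NULL);
    size_t total = strlen(base) + 1 + strlen(ext); // base + '.' + ext
    if (total > name_max) {
        fprintf(stderr, "screenshot filename too long (%zu > %zu bytes); "
                "shorten screenshot-template\n", total, name_max);
        return false;
    }
    return true;
}
```

Because it never cuts the string, this variant needs no UTF-8 handling at all; the trade-off is that the screenshot is not written until the user fixes the template.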
I only tested on Windows 10 21H2 x64. Also, here are some copy-and-paste screenshot-templates