gv.c - add ${^FORCE_UPGRADE} for triggering upgrade by concatenation #20459
This is a read-only global variable which contains an upgraded (i.e. UTF8-on) empty string. When concatenated into a string it (like any other upgraded string) causes the resulting string to also be upgraded. This is similar to calling utf8::upgrade() on the string, but suitable for use in double-quoted strings, regex patterns, or other places where adding a utf8::upgrade() call might be awkward. Assuming this gets merged I can think of some test code that could be cleaned up by using this, and I definitely would use it in one-liners and whatnot as well.
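A minimal sketch of the effect described above. The proposed `${^FORCE_UPGRADE}` was never merged, so the hypothetical `$upgraded_empty` below stands in for it and is built with `utf8::upgrade()`. Whether concatenation sets the flag is an internal detail of the running perl, so the script only reports it rather than assuming a result.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the proposed ${^FORCE_UPGRADE}:
# an empty string whose internal UTF8 flag is on.
utf8::upgrade(my $upgraded_empty = "");

my $bytes  = "caf\xe9";                  # 8-bit string, UTF8 flag off
my $joined = $bytes . $upgraded_empty;   # same characters after concatenation

# utf8::is_utf8() reports the internal flag, not the string's content.
printf "flag before: %s\n", utf8::is_utf8($bytes)  ? "on" : "off";
printf "flag after:  %s\n", utf8::is_utf8($joined) ? "on" : "off";
printf "same characters: %s\n", $bytes eq $joined ? "yes" : "no";
```

The two strings always compare equal with `eq`; only the internal representation can differ.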
It seems like it would encourage the pre-5.12/unicode_strings model of Perl Unicode. (And why do we upgrade when appending a zero-length string?)
Anyone who sufficiently understands the Perl Unicode model to correctly use this does not need it (thus it would only provide a footgun for people who shouldn't be using it), and Perl does not guarantee anything about how concatenation affects the internal state of a string.
To clarify my comment: forcing upgrades on strings is only useful for working around bugs, and though such bugs are common even within Perl core, I don't feel it would be prudent to present additional features suggesting that it is a reliable and coherent operation to those unfamiliar with the Unicode string model, which I would conservatively estimate is 99% of Perl users. And regarding this specific operation, it would be perfectly reasonable, and likely, for Perl to choose differently in different versions how the internal storage of a string should be affected by appending an empty string of any sort.
Yeah, that. |
I don't really follow. This is just the same as using a
We upgrade whenever we append a UTF8-on string to a non-UTF8 string. It doesn't matter that it's empty. It would be strange if it didn't work like this, actually. Consider code like this:
It would be pretty weird if those didn't output the same thing, don't you think?
I have to say I am equally unsure why this would be a useful feature. This appears to be a baked-in variable that contains an empty string. It's hard to explain to people how this variable differs from a plain empty string. The only real difference comes in terms of deep internals that most end-user programmers ought not to be touching, or really even have any awareness of. It's possible this value is useful in a few special-case situations, like when you are writing unit tests that check your XS code correctly handles string values of both kinds of internal encoding, but aside from that special-purpose case I don't know when it would ever be useful to have.
@leonerd yes, I was thinking this would mostly be useful for testing, especially of perl itself. It is distinct from the normal empty string; it's the Unicode empty string. Anyway, if people really think it's a problem having this then we don't have to have it, but I would have rewritten a bunch of our tests to use it if we had it, and reduced the complexity of the generated test code while doing so. (Which makes understanding and debugging what you broke easier.) Part of the problem is that many of these tests are fresh_perl type, so being brief is helpful, and at the same time "putting it in a module" or whatnot isn't really an attractive option. I mean, personally, who cares if we have a variable like this? It's in a reserved namespace, it is helpful to perl core devs (at least me to start, and I bet others over time), and it is as useful as utf8::upgrade() is, and we expose that.
They do return the same thing under the model we (try to) educate people to use:
I'm not really suggesting we break compatibility with non-unicode_strings code by changing the behaviour of appending an upgraded empty string, but appending this value only has an effect on non-unicode_strings code, a model we tell our users to avoid. If tests are made easier with this type of value, they can add `utf8::upgrade(my $upgrade = "");` at the top and use that value. Or write a trivial function that returns the upgraded string.
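To make the point about the two models concrete, here is a small sketch using `uc()` on the character 0xE9 (e-acute). The helper names `legacy_uc_works` and `modern_uc_works` are hypothetical, and the example assumes an ASCII-based platform with no `use locale` in effect.

```perl
use strict;
use warnings;
no feature 'unicode_strings';   # make the legacy default explicit

my $native = "\xe9";                    # e-acute, UTF8 flag off
utf8::upgrade(my $upgraded = "\xe9");   # same character, UTF8 flag on

# Returns 1 when uc() maps e-acute (0xE9) to E-acute (0xC9).
sub legacy_uc_works { return uc($_[0]) eq "\xc9" ? 1 : 0 }

sub modern_uc_works {
    use feature 'unicode_strings';      # lexically scoped to this sub
    return uc($_[0]) eq "\xc9" ? 1 : 0;
}

printf "legacy:          native=%d upgraded=%d\n",
    legacy_uc_works($native), legacy_uc_works($upgraded);
printf "unicode_strings: native=%d upgraded=%d\n",
    modern_uc_works($native), modern_uc_works($upgraded);
```

Under the legacy rules the result depends on the internal flag; with `unicode_strings` enabled both strings behave the same, so forcing an upgrade has no visible effect there.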
On Mon, Oct 31, 2022 at 05:18:39PM -0700, Tony Cook wrote:
If tests are made easier with this type of value, they can add `utf8::upgrade(my $upgrade = "");` at the top and use that value.
Or write a trivial function that returns the upgraded string.
Or just substr("\x{100}",0,0)
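The three alternatives suggested in this reply can be sketched side by side. The function name `upgraded_empty` is hypothetical; the flag is printed rather than assumed, since it is an internal detail.

```perl
use strict;
use warnings;

# Option 1: upgrade a lexical once and reuse it.
utf8::upgrade(my $upgrade = "");

# Option 2: a trivial function (hypothetical name) returning an upgraded empty string.
sub upgraded_empty {
    my $s = "";
    utf8::upgrade($s);
    return $s;
}

# Option 3: a zero-length slice of a string that contains a wide character.
my $slice = substr("\x{100}", 0, 0);

for my $pair (["lexical", $upgrade], ["function", upgraded_empty()], ["substr", $slice]) {
    my ($name, $value) = @$pair;
    printf "%-8s length=%d flag=%s\n",
        $name, length($value), utf8::is_utf8($value) ? "on" : "off";
}
```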
I like this idea, but others don't. Closing and deleting.