Skip to content

gv.c - add ${^FORCE_UPGRADE} for triggering upgrade by concatenation #20459

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

demerphq
Copy link
Collaborator

This is a readonly global variable which contains an upgraded empty string (eg UTF8-on). When concatenated into a string it (like any other upgraded string) causes the resulting string to also be upgraded. Similar to calling utf8::upgrade() on the string, but suitable for use in double quoted strings or regex patterns or other places where adding a utf8::upgrade() call might be awkward.

Assuming this gets merged I can think of some test code that could be cleaned up by using this, and I definitely would use it in one-liners and what not as well.

This is a readonly global variable which contains an upgraded empty
string (eg UTF8-on). When concatenated into a string it (like any other
upgraded string) causes the resulting string to also be upgraded.
Similar to calling utf8::upgrade() on the string, but suitable for use
in double quoted strings or regex patterns or other places where adding
a utf8::upgrade() call might be awkward.

Assuming this gets merged I can think of some test code that could
be cleaned up by using this, and I definitely would use it in one-liners
and what not as well.
@tonycoz
Copy link
Contributor

tonycoz commented Oct 31, 2022

It seems like it would encourage the pre-5.12/unicode_strings model of Perl Unicode.

(and why do we upgrade when appending a zero-length string?)

@Grinnz
Copy link
Contributor

Grinnz commented Oct 31, 2022

Anyone who sufficiently understands the Perl unicode model to correctly use this does not need it (thus it would only provide a footgun for people who shouldn't be using it), and Perl does not guarantee anything about how concatenation affects the internal state of a string.

@Grinnz
Copy link
Contributor

Grinnz commented Oct 31, 2022

To clarify my comment; forcing upgrades on strings is only useful for working around bugs, and though such bugs are common even within Perl core, I don't feel it would be prudent to present additional features suggesting that it is a reliable and coherent operation to those unfamiliar with the Unicode string model, which I would conservatively estimate is 99% of Perl users. And regarding this specific operation, it would be perfectly reasonable and likely that Perl would choose differently in different versions how the internal storage of a string should be affected by appending an empty string of any sort.

@Leont
Copy link
Contributor

Leont commented Oct 31, 2022

It seems like it would encourage the pre-5.12/unicode_strings model of Perl Unicode.

Yeah, that.

@demerphq
Copy link
Collaborator Author

It seems like it would encourage the pre-5.12/unicode_strings model of Perl Unicode.

I don't really follow. This is just the same as using a \N{U+...} escape in a string, except it doesn't inject anything into the string. Any user could do the same thing by saying my $UTF8= ""; utf8::upgrade($UTF8); under every version of perl that supports Unicode. So I dont get it. Why would it be a problem to have a var pre-available like this given that anyone can construct one anytime? I imagine its use likely to be oneliners and maybe fresh_perl() tests, if that.

(and why do we upgrade when appending a zero-length string?)

We upgrade whenever we append a UTF8-on string with a non-UT8-string. It doesn't matter that its empty.

It would be strange it if it didn't work like this actually. Consider code like this:

my $str = "\x{100}";  
my $alpha= "\xDF" . $str;
chop($str);
chop($alpha);
my $bravo= "\xDF" . $str;
print "ss"=~/$alpha/i ? "yes" : "no";
print "ss"=~/$bravo/i ? "yes" : "no";

it would be pretty weird if those didn't output the same thing don't you think?

@leonerd
Copy link
Contributor

leonerd commented Oct 31, 2022

I have to say I am equally unsure why this would be a useful feature. This appears to be a baked-in variable that contains an empty string. It's hard to explain to people how this variable differs from the simple "".

The only real difference comes in terms of deep internals that most end-user programmers ought to not be touching - or really even have any awareness of. It's possible this value is useful in a few special-case situations like when you are writing unit tests that check your XS code correctly handles string values of both kinds of internal encoding, but aside from that special-purposes time I don't know when it would ever be useful to have.

@demerphq
Copy link
Collaborator Author

@leonerd yes, I was thinking this would mostly be useful for testing, especially perl itself. It is distinct from the normal empty string, its the unicode empty string. Anyway, if people really think its a problem having this then we dont have to have it, but i would have rewritten a bunch of our tests to use it if we had it, and reduced the complexity of the generated test code while doing so. (Which makes understanding and debugging what you broke easier.) Part of the problem is that many of these tests are fresh_perl type, so being brief is helpful, and at the same time "putting it in a module" or whatnot isnt really an attractive option. I mean, personally who cares if we have a variable like this? Its in reserved namespace, it is helpful to perl core devs (at least me to start, and i bet others over time), and it is as useful as utf8::upgrade() is, and we expose that.

@tonycoz
Copy link
Contributor

tonycoz commented Nov 1, 2022

It seems like it would encourage the pre-5.12/unicode_strings model of Perl Unicode.

I don't really follow. This is just the same as using a \N{U+...} escape in a string, except it doesn't inject anything into the string. Any user could do the same thing by saying my $UTF8= ""; utf8::upgrade($UTF8); under every version of perl that supports Unicode. So I dont get it. Why would it be a problem to have a var pre-available like this given that anyone can construct one anytime? I imagine its use likely to be oneliners and maybe fresh_perl() tests, if that.

(and why do we upgrade when appending a zero-length string?)

We upgrade whenever we append a UTF8-on string with a non-UT8-string. It doesn't matter that its empty.

It would be strange it if it didn't work like this actually. Consider code like this:

my $str = "\x{100}";  
my $alpha= "\xDF" . $str;
chop($str);
chop($alpha);
my $bravo= "\xDF" . $str;
print "ss"=~/$alpha/i ? "yes" : "no";
print "ss"=~/$bravo/i ? "yes" : "no";

it would be pretty weird if those didn't output the same thing don't you think?

They do return the same thing under the model we (try to) educate people to use:

tony@venus:.../git/perl5$ cat ../20459.pl
use v5.12;
my $str = "\x{100}";
my $alpha= "\xDF" . $str;
#chop($str);
$str = "";  # no longer UTF8 on
chop($alpha);
my $bravo= "\xDF" . $str;
print "ss"=~/$alpha/i ? "yes" : "no";
print "ss"=~/$bravo/i ? "yes" : "no";
tony@venus:.../git/perl5$ ./perl -Ilib -l ../20459.pl 
yes
yes

I'm not really suggesting we break compatibility with non-unicode_strings code by changing the behaviour of appending an upgraded empty string, but appending this value only has an effect on non-unicode_strings code, a model we tell our users to avoid.

If tests are made easier with this type of value, they can add utf8::upgrade(my $upgrade = ""); at the top and use that value.

Or write a trivial function that returns the upgraded string.

@iabyn
Copy link
Contributor

iabyn commented Nov 7, 2022 via email

@demerphq
Copy link
Collaborator Author

demerphq commented Feb 7, 2023

I like this idea, but others dont. closing and deleting.

@demerphq demerphq closed this Feb 7, 2023
@demerphq demerphq deleted the yves/force_upgrade_var branch February 7, 2023 13:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants