gv.c - add ${^FORCE_UPGRADE} for triggering upgrade by concatenation #20459

demerphq · 2022-10-30T15:06:09Z

This is a readonly global variable which contains an upgraded empty string (eg UTF8-on). When concatenated into a string it (like any other upgraded string) causes the resulting string to also be upgraded. Similar to calling utf8::upgrade() on the string, but suitable for use in double quoted strings or regex patterns or other places where adding a utf8::upgrade() call might be awkward.

Assuming this gets merged I can think of some test code that could be cleaned up by using this, and I definitely would use it in one-liners and what not as well.

This is a readonly global variable which contains an upgraded empty string (eg UTF8-on). When concatenated into a string it (like any other upgraded string) causes the resulting string to also be upgraded. Similar to calling utf8::upgrade() on the string, but suitable for use in double quoted strings or regex patterns or other places where adding a utf8::upgrade() call might be awkward. Assuming this gets merged I can think of some test code that could be cleaned up by using this, and I definitely would use it in one-liners and what not as well.

tonycoz · 2022-10-31T04:02:45Z

It seems like it would encourage the pre-5.12/unicode_strings model of Perl Unicode.

(and why do we upgrade when appending a zero-length string?)

Grinnz · 2022-10-31T04:21:03Z

Anyone who sufficiently understands the Perl unicode model to correctly use this does not need it (thus it would only provide a footgun for people who shouldn't be using it), and Perl does not guarantee anything about how concatenation affects the internal state of a string.

Grinnz · 2022-10-31T13:21:29Z

To clarify my comment; forcing upgrades on strings is only useful for working around bugs, and though such bugs are common even within Perl core, I don't feel it would be prudent to present additional features suggesting that it is a reliable and coherent operation to those unfamiliar with the Unicode string model, which I would conservatively estimate is 99% of Perl users. And regarding this specific operation, it would be perfectly reasonable and likely that Perl would choose differently in different versions how the internal storage of a string should be affected by appending an empty string of any sort.

Leont · 2022-10-31T15:48:07Z

It seems like it would encourage the pre-5.12/unicode_strings model of Perl Unicode.

Yeah, that.

demerphq · 2022-10-31T17:59:01Z

It seems like it would encourage the pre-5.12/unicode_strings model of Perl Unicode.

I don't really follow. This is just the same as using a \N{U+...} escape in a string, except it doesn't inject anything into the string. Any user could do the same thing by saying my $UTF8= ""; utf8::upgrade($UTF8); under every version of perl that supports Unicode. So I dont get it. Why would it be a problem to have a var pre-available like this given that anyone can construct one anytime? I imagine its use likely to be oneliners and maybe fresh_perl() tests, if that.

(and why do we upgrade when appending a zero-length string?)

We upgrade whenever we append a UTF8-on string with a non-UT8-string. It doesn't matter that its empty.

It would be strange it if it didn't work like this actually. Consider code like this:

my $str = "\x{100}";  
my $alpha= "\xDF" . $str;
chop($str);
chop($alpha);
my $bravo= "\xDF" . $str;
print "ss"=~/$alpha/i ? "yes" : "no";
print "ss"=~/$bravo/i ? "yes" : "no";

it would be pretty weird if those didn't output the same thing don't you think?

leonerd · 2022-10-31T19:26:05Z

I have to say I am equally unsure why this would be a useful feature. This appears to be a baked-in variable that contains an empty string. It's hard to explain to people how this variable differs from the simple "".

The only real difference comes in terms of deep internals that most end-user programmers ought to not be touching - or really even have any awareness of. It's possible this value is useful in a few special-case situations like when you are writing unit tests that check your XS code correctly handles string values of both kinds of internal encoding, but aside from that special-purposes time I don't know when it would ever be useful to have.

demerphq · 2022-10-31T19:46:38Z

@leonerd yes, I was thinking this would mostly be useful for testing, especially perl itself. It is distinct from the normal empty string, its the unicode empty string. Anyway, if people really think its a problem having this then we dont have to have it, but i would have rewritten a bunch of our tests to use it if we had it, and reduced the complexity of the generated test code while doing so. (Which makes understanding and debugging what you broke easier.) Part of the problem is that many of these tests are fresh_perl type, so being brief is helpful, and at the same time "putting it in a module" or whatnot isnt really an attractive option. I mean, personally who cares if we have a variable like this? Its in reserved namespace, it is helpful to perl core devs (at least me to start, and i bet others over time), and it is as useful as utf8::upgrade() is, and we expose that.

tonycoz · 2022-11-01T00:18:22Z

It seems like it would encourage the pre-5.12/unicode_strings model of Perl Unicode.

I don't really follow. This is just the same as using a \N{U+...} escape in a string, except it doesn't inject anything into the string. Any user could do the same thing by saying my $UTF8= ""; utf8::upgrade($UTF8); under every version of perl that supports Unicode. So I dont get it. Why would it be a problem to have a var pre-available like this given that anyone can construct one anytime? I imagine its use likely to be oneliners and maybe fresh_perl() tests, if that.

(and why do we upgrade when appending a zero-length string?)

We upgrade whenever we append a UTF8-on string with a non-UT8-string. It doesn't matter that its empty.

It would be strange it if it didn't work like this actually. Consider code like this:
my $str = "\x{100}";  
my $alpha= "\xDF" . $str;
chop($str);
chop($alpha);
my $bravo= "\xDF" . $str;
print "ss"=~/$alpha/i ? "yes" : "no";
print "ss"=~/$bravo/i ? "yes" : "no";
it would be pretty weird if those didn't output the same thing don't you think?

They do return the same thing under the model we (try to) educate people to use:

tony@venus:.../git/perl5$ cat ../20459.pl
use v5.12;
my $str = "\x{100}";
my $alpha= "\xDF" . $str;
#chop($str);
$str = "";  # no longer UTF8 on
chop($alpha);
my $bravo= "\xDF" . $str;
print "ss"=~/$alpha/i ? "yes" : "no";
print "ss"=~/$bravo/i ? "yes" : "no";
tony@venus:.../git/perl5$ ./perl -Ilib -l ../20459.pl 
yes
yes

I'm not really suggesting we break compatibility with non-unicode_strings code by changing the behaviour of appending an upgraded empty string, but appending this value only has an effect on non-unicode_strings code, a model we tell our users to avoid.

If tests are made easier with this type of value, they can add utf8::upgrade(my $upgrade = ""); at the top and use that value.

Or write a trivial function that returns the upgraded string.

iabyn · 2022-11-07T13:42:14Z

On Mon, Oct 31, 2022 at 05:18:39PM -0700, Tony Cook wrote: If tests are made easier with this type of value, they can add `utf8::upgrade(my $upgrade = "");` at the top and use that value. Or write a trivial function that returns the upgraded string.

Or just substr("\x{100}",0,0)

…

-- A major Starfleet emergency breaks out near the Enterprise, but fortunately some other ships in the area are able to deal with it to everyone's satisfaction. -- Things That Never Happen in "Star Trek" #13

demerphq · 2023-02-07T13:49:12Z

I like this idea, but others dont. closing and deleting.

github-actions bot added hasConflicts and removed hasConflicts labels Nov 20, 2022

github-actions bot removed the hasConflicts label Dec 20, 2022

github-actions bot added hasConflicts and removed hasConflicts labels Jan 19, 2023

demerphq closed this Feb 7, 2023

demerphq deleted the yves/force_upgrade_var branch February 7, 2023 13:49

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

gv.c - add ${^FORCE_UPGRADE} for triggering upgrade by concatenation #20459

gv.c - add ${^FORCE_UPGRADE} for triggering upgrade by concatenation #20459

Uh oh!

demerphq commented Oct 30, 2022

Uh oh!

tonycoz commented Oct 31, 2022

Uh oh!

Grinnz commented Oct 31, 2022 •

edited

Loading

Uh oh!

Grinnz commented Oct 31, 2022

Uh oh!

Leont commented Oct 31, 2022

Uh oh!

demerphq commented Oct 31, 2022

Uh oh!

leonerd commented Oct 31, 2022

Uh oh!

demerphq commented Oct 31, 2022

Uh oh!

tonycoz commented Nov 1, 2022

Uh oh!

iabyn commented Nov 7, 2022 via email

Uh oh!

demerphq commented Feb 7, 2023

Uh oh!

Uh oh!

gv.c - add ${^FORCE_UPGRADE} for triggering upgrade by concatenation #20459

gv.c - add ${^FORCE_UPGRADE} for triggering upgrade by concatenation #20459

Uh oh!

Conversation

demerphq commented Oct 30, 2022

Uh oh!

tonycoz commented Oct 31, 2022

Uh oh!

Grinnz commented Oct 31, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Grinnz commented Oct 31, 2022

Uh oh!

Leont commented Oct 31, 2022

Uh oh!

demerphq commented Oct 31, 2022

Uh oh!

leonerd commented Oct 31, 2022

Uh oh!

demerphq commented Oct 31, 2022

Uh oh!

tonycoz commented Nov 1, 2022

Uh oh!

iabyn commented Nov 7, 2022 via email

Uh oh!

demerphq commented Feb 7, 2023

Uh oh!

Uh oh!

Grinnz commented Oct 31, 2022 •

edited

Loading