Description
Preamble:
The goal of this ticket is to document the current behavior of the various case modifiers and how they interact with each other.
Various people have already looked at this and a lot has been written about it.
This ticket tries to summarize that and put all the information in one place.
The contents of this ticket are based on:
- Interaction of \U/\L and \u/\l escapes are undocumented #5467
- \U ... \Q ... \E ... \E #8846
- Bug report: error with \L, \l, \U and \u operators #11145
- \Q buggy, eg /x on \Q#foo\E doesn't match '#foo', # becomes special #13257
- # parsed incorrectly in qr/\Q...\E/x #18981
- \U and \L in interpolated strings don't actually stack #19670
- https://www.nntp.perl.org/group/perl.perl5.porters/2012/01/msg181429.html : "\Questions about the \Future of \Escapes"
- https://www.nntp.perl.org/group/perl.perl5.porters/2011/11/msg179078.html : "What's the difference between qr/\U\x{39}/ and qr/\U\x{3a}/ ?"
- https://www.nntp.perl.org/group/perl.perl5.porters/2013/08/msg206466.html : "changes to \Q \F \Etc and "casemod escapes"
- https://www.nntp.perl.org/group/perl.perl5.porters/2022/07/msg264466.html : "Escape sequences \L, \U fall when we use them together"
Description
Reading the various tickets shows there are (at least) two cases:
- using the case modifiers inside a double-quoted string (qq)
- using the case modifiers inside a regex and/or inside qr
In this ticket I'm focusing only on double-quoted strings and ignoring the case modifiers inside a regex.
Case modifiers can be divided into three groups (names borrowed from an older message by @demerphq):
- "non-case" modifiers: \Q (quotemeta())
- "inner" case modifiers: \U (uc()), \L (lc()), \F (fc())
- "outer" case modifiers: \u (ucfirst()), \l (lcfirst())
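As a rough illustration of the three groups, the snippet below puts each modifier next to the built-in it is described as corresponding to. The sample strings and variable names are arbitrary choices made for this sketch and are not taken from any of the referenced tickets.

```perl
use strict;
use warnings;
use feature "fc";    # needed for \F and fc()

# "non-case" modifier: \Q ... \E corresponds to quotemeta()
my $quoted = "\Qa.b\E";        # compare with quotemeta("a.b")

# "inner" case modifiers act on everything up to \E (or the end of the string)
my $upper  = "\Ufoo\E bar";    # compare with uc("foo") . " bar"
my $lower  = "\LFOO\E BAR";    # compare with lc("FOO") . " BAR"
my $folded = "\FFoO\E";        # compare with fc("FoO")

# "outer" case modifiers act on the next character only
my $ucf = "\ufoo";             # compare with ucfirst("foo")
my $lcf = "\lFOO";             # compare with lcfirst("FOO")

print "$quoted $upper $lower $folded $ucf $lcf\n";
```

The "inner" and "outer" names describe scope: an inner modifier covers a run of text, while an outer modifier covers a single character.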
A test script to show/test/document the behavior is included at the end.
(Note: this also includes 'crazy cases' and cases that don't make a lot of sense.)
An attempt at a text-based description of the various "rules" (numbered from 0):
0. Escape sequences (\n, \t, \x.., \N{...}, ...) are applied before case modifiers.
1. An "inner" case modifier overrules an "outer" case modifier, but the outer modifier is applied first. There are also two special cases:
   - \U\l is treated as \l\U (i.e. lcfirst(uc(...)));
   - \L\u is treated as \u\L (i.e. ucfirst(lc(...)));
   - \F\l and \F\u are not special-cased and are treated as fc(lcfirst(...)) and fc(ucfirst(...));
   - \Lfoo\ubar really is treated as: lc("foo" . ucfirst("bar")); [1]
2. \E following an inner or outer case modifier cancels it, but with a special case due to the special casing in rule 1 (see the \Ufoo\L\u\Ebar example under rule 3).
3. An "inner" case modifier implicitly ends another "inner" case modifier (i.e. no stacking) when it's not a 'cancelled modifier' (see rule 2):
   - \Ufoo\Lbar is treated as uc("foo") . lc("bar")
   - \Ufoo\L\Ebar is treated as uc("foo" . "bar")
   - \Ufoo\L\u\Ebar due to the special casing is equivalent to: \Ufoo\u\L\Ebar which makes it equivalent to: \Ufoo\ubar
4. The quotemeta modifier can stack:
   - \Q\Q.\E\E is treated as quotemeta(quotemeta("."))
5. An "inner" case modifier ends the quotemeta modifier when another "inner" case modifier was applied:
   - \Ua\Qb\Lc is equivalent to: uc("a" . quotemeta("b")) . lc("c")
6. An "inner" case modifier does not end the quotemeta modifier when another "inner" case modifier wasn't applied:
   - a\Qb\Lc is equivalent to: "a" . quotemeta("b" . lc("c"))
7. 'Immediately' repeating an inner case modifier is an error, unless it was a 'cancelled modifier' (see rule 2):
   - \U\L is an error;
   - \U\U is an error;
   - \U\Q\U is an error;
   - \U\u\U is an error;
   - \U\l\U is an error with a confusing message; [2]
   - \U\L\E is not an error;
8. Repeating an outer case modifier is not an error:
   - \u\l\u\l is not an error
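A few of the rules above can be checked directly. The following is a minimal sketch that reuses examples from the list; the variable names are made up for this illustration, and the behavior shown is the one described in this ticket, so it may differ on other perl versions.

```perl
use strict;
use warnings;

# Outer modifier inside an inner one: applied first, then overruled.
my $overruled = "\Lfoo\ubar";      # described as lc("foo" . ucfirst("bar"))

# Special case: \U\l is treated as \l\U.
my $swapped = "\U\lfoo";           # described as lcfirst(uc("foo"))

# Inner modifiers do not stack: \L ends the preceding \U.
my $no_stack = "\Ufoo\Lbar";       # described as uc("foo") . lc("bar")

# An immediately cancelled \L leaves the surrounding \U in effect.
my $cancelled = "\Ufoo\L\Ebar";    # described as uc("foo" . "bar")

# The quotemeta modifier does stack.
my $stacked = "\Q\Q.\E\E";         # described as quotemeta(quotemeta("."))

# Immediately repeating an inner modifier is reported as a compile error,
# so it has to be wrapped in a string eval to observe it.
eval q{"\U\L"};
my $err = $@;

print join(" ", $overruled, $swapped, $no_stack, $cancelled, $stacked,
    (length $err ? "error" : "no error")), "\n";
```

The string eval is used only because the error is raised at compile time; the test script at the end of this ticket uses the same approach.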
Test script:
#!/usr/bin/perl -l
use strict;
use warnings;
use feature "fc";
use Test::More;
# Definitions:
# - "non-case" modifiers: \Q (quotemeta())
# - "inner" case modifiers: \U (uc()), \L (lc()), \F (fc())
# - "outer" case modifiers: \u (ucfirst()), \l (lcfirst())
# For the `fc()` tests use a character where:
# - `lc($x) ne fc($x)` and
# - `lc($x) ne $x`
my $fc_char = "\N{GREEK CAPITAL LETTER ALPHA WITH PSILI AND VARIA AND PROSGEGRAMMENI}";
isnt($fc_char, lc($fc_char), "Test \$fc_char ne lc(\$fc_char)");
isnt(lc($fc_char), fc($fc_char), "Test lc(\$fc_char) ne fc(\$fc_char)");
# Basic tests: no stacking, no mixing;
is("aa\UbB", "aa" . uc("bB"), "Basic test (..\\U..)");
is("aa\LbB", "aa" . lc("bB"), "Basic test (..\\L..)");
is("aa\FbB$fc_char", "aa" . fc("bB$fc_char"), "Basic test (..\\F..)");
is("aa\ubB", "aa" . ucfirst("bB"), "Basic test (..\\u..)");
is("aa\lCc", "aa" . lcfirst("Cc"), "Basic test (..\\l..)");
is("aa\Q1+2", "aa" . quotemeta("1+2"), "Basic test (..\\Q..)");
# Basic tests: with \E
is("aa\UbB\EcC", "aa" . uc("bB") . "cC", "Basic test with \\E (..\\U..\\E..)");
is("aa\LbB\EcC", "aa" . lc("bB") . "cC", "Basic test with \\E (..\\L..\\E..)");
is("aa\FbB$fc_char\EcC", "aa" . fc("bB$fc_char") . "cC", "Basic test with \\E (..\\F..\\E..)");
is("aa\Q1+2\E3+4", "aa" . quotemeta("1+2") . "3+4", "Basic test with \\E (..\\Q..\\E..)");
# \E cancels a \u, \l, \U, \L, \F
is("aa\UbB\U\EcC\EdD", "aa" . uc("bB" . "cC") . "dD", "\\E cancel a \\U (..\\U..\\U\\E..\\E");
is("aa\UbB\L\EcC\EdD", "aa" . uc("bB" . "cC") . "dD", "\\E cancel a \\L (..\\U..\\L\\E..\\E");
is("aa\UbB\F\EcC\EdD", "aa" . uc("bB" . "cC") . "dD", "\\E cancel a \\F (..\\U..\\F\\E..\\E");
is("aa\UbB\u\EcC\EdD", "aa" . uc("bB" . "cC") . "dD", "\\E cancel a \\u (..\\U..\\u\\E..\\E");
is("aa\UbB\l\EcC\EdD", "aa" . uc("bB" . "cC") . "dD", "\\E cancel a \\l (..\\U..\\l\\E..\\E");
is("aa\LbB\U\EcC\EdD", "aa" . lc("bB" . "cC") . "dD", "\\E cancel a \\U (..\\L..\\U\\E..\\E");
is("aa\LbB\L\EcC\EdD", "aa" . lc("bB" . "cC") . "dD", "\\E cancel a \\L (..\\L..\\L\\E..\\E");
is("aa\LbB\F\EcC\EdD", "aa" . lc("bB" . "cC") . "dD", "\\E cancel a \\F (..\\L..\\F\\E..\\E");
is("aa\LbB\u\EcC\EdD", "aa" . lc("bB" . "cC") . "dD", "\\E cancel a \\u (..\\L..\\u\\E..\\E");
is("aa\LbB\l\EcC\EdD", "aa" . lc("bB" . "cC") . "dD", "\\E cancel a \\l (..\\L..\\l\\E..\\E");
is("aa\FbB\U\EcC\EdD", "aa" . fc("bB" . "cC") . "dD", "\\E cancel a \\U (..\\F..\\U\\E..\\E");
is("aa\FbB\L\EcC\EdD", "aa" . fc("bB" . "cC") . "dD", "\\E cancel a \\L (..\\F..\\L\\E..\\E");
is("aa\FbB\F\EcC\EdD", "aa" . fc("bB" . "cC") . "dD", "\\E cancel a \\F (..\\F..\\F\\E..\\E");
is("aa\FbB\u\EcC\EdD", "aa" . fc("bB" . "cC") . "dD", "\\E cancel a \\u (..\\F..\\u\\E..\\E");
is("aa\FbB\l\EcC\EdD", "aa" . fc("bB" . "cC") . "dD", "\\E cancel a \\l (..\\F..\\l\\E..\\E");
# Immediately repeating case modifiers is an error
eval q#"\U\U"#; isnt($@, "", "immediately repeating inner modifiers is an error (\\U\\U)");
eval q#"\U\L"#; isnt($@, "", "immediately repeating inner modifiers is an error (\\U\\L)");
eval q#"\U\F"#; isnt($@, "", "immediately repeating inner modifiers is an error (\\U\\F)");
eval q#"\L\U"#; isnt($@, "", "immediately repeating inner modifiers is an error (\\L\\U)");
eval q#"\L\L"#; isnt($@, "", "immediately repeating inner modifiers is an error (\\L\\L)");
eval q#"\L\F"#; isnt($@, "", "immediately repeating inner modifiers is an error (\\L\\F)");
eval q#"\F\U"#; isnt($@, "", "immediately repeating inner modifiers is an error (\\F\\U)");
eval q#"\F\L"#; isnt($@, "", "immediately repeating inner modifiers is an error (\\F\\L)");
eval q#"\F\F"#; isnt($@, "", "immediately repeating inner modifiers is an error (\\F\\F)");
# repeating an inner modifier after an outer modifier is an error
eval q#"\U\u\U"#; isnt($@, "", "repeating inner modifiers is an error (\\U\\u\\U)");
eval q#"\U\l\U"#; isnt($@, "", "repeating inner modifiers is an error (\\U\\l\\U)"); # special
eval q#"\U\Q\U"#; isnt($@, "", "repeating inner modifiers is an error (\\U\\Q\\U)");
eval q#"\U\u\L"#; isnt($@, "", "repeating inner modifiers is an error (\\U\\u\\L)");
eval q#"\U\l\L"#; isnt($@, "", "repeating inner modifiers is an error (\\U\\l\\L)"); # special
eval q#"\U\Q\L"#; isnt($@, "", "repeating inner modifiers is an error (\\U\\Q\\L)");
eval q#"\U\u\F"#; isnt($@, "", "repeating inner modifiers is an error (\\U\\u\\F)");
eval q#"\U\l\F"#; isnt($@, "", "repeating inner modifiers is an error (\\U\\l\\F)"); # special
eval q#"\U\Q\F"#; isnt($@, "", "repeating inner modifiers is an error (\\U\\Q\\F)");
eval q#"\L\u\U"#; isnt($@, "", "repeating inner modifiers is an error (\\L\\u\\U)"); # special
eval q#"\L\l\U"#; isnt($@, "", "repeating inner modifiers is an error (\\L\\l\\U)");
eval q#"\L\Q\U"#; isnt($@, "", "repeating inner modifiers is an error (\\L\\Q\\U)");
eval q#"\L\u\L"#; isnt($@, "", "repeating inner modifiers is an error (\\L\\u\\L)"); # special
eval q#"\L\l\L"#; isnt($@, "", "repeating inner modifiers is an error (\\L\\l\\L)");
eval q#"\L\Q\L"#; isnt($@, "", "repeating inner modifiers is an error (\\L\\Q\\L)");
eval q#"\L\u\F"#; isnt($@, "", "repeating inner modifiers is an error (\\L\\u\\F)"); # special
eval q#"\L\l\F"#; isnt($@, "", "repeating inner modifiers is an error (\\L\\l\\F)");
eval q#"\L\Q\F"#; isnt($@, "", "repeating inner modifiers is an error (\\L\\Q\\F)");
eval q#"\F\u\U"#; isnt($@, "", "repeating inner modifiers is an error (\\F\\u\\U)");
eval q#"\F\l\U"#; isnt($@, "", "repeating inner modifiers is an error (\\F\\l\\U)");
eval q#"\F\Q\U"#; isnt($@, "", "repeating inner modifiers is an error (\\F\\Q\\U)");
eval q#"\F\u\L"#; isnt($@, "", "repeating inner modifiers is an error (\\F\\u\\L)");
eval q#"\F\l\L"#; isnt($@, "", "repeating inner modifiers is an error (\\F\\l\\L)");
eval q#"\F\Q\L"#; isnt($@, "", "repeating inner modifiers is an error (\\F\\Q\\L)");
eval q#"\F\u\F"#; isnt($@, "", "repeating inner modifiers is an error (\\F\\u\\F)");
eval q#"\F\l\F"#; isnt($@, "", "repeating inner modifiers is an error (\\F\\l\\F)");
eval q#"\F\Q\F"#; isnt($@, "", "repeating inner modifiers is an error (\\F\\Q\\F)");
# For the special cases the error message is a bit misleading:
# - the parser changes "\U\l" into "\l\U" so the pattern becomes "\l\U\U" and it shows
# that in the error instead of the original pattern ("\U\l\U")
# - the parser changes "\L\u" into "\u\L" so the pattern becomes "\u\L\L" and it shows
# that in the error instead of the original pattern ("\L\u\L")
eval q#"\U\l\U"#; like($@, qr/\\l\\U\\U/, "check (misleading) error message (\\U\\l\\U)");
eval q#"\U\l\L"#; like($@, qr/\\l\\U\\L/, "check (misleading) error message (\\U\\l\\L)");
eval q#"\U\l\F"#; like($@, qr/\\l\\U\\F/, "check (misleading) error message (\\U\\l\\F)");
eval q#"\L\u\U"#; like($@, qr/\\u\\L\\U/, "check (misleading) error message (\\L\\u\\U)");
eval q#"\L\u\L"#; like($@, qr/\\u\\L\\L/, "check (misleading) error message (\\L\\u\\L)");
eval q#"\L\u\F"#; like($@, qr/\\u\\L\\F/, "check (misleading) error message (\\L\\u\\F)");
# A cancelled repeating case modifier is not an error
eval q#"\U\U\E"#; is($@, "", "cancelled repeated inner modifiers not an error (\\U\\U\\E)");
eval q#"\U\L\E"#; is($@, "", "cancelled repeated inner modifiers not an error (\\U\\L\\E)");
eval q#"\U\F\E"#; is($@, "", "cancelled repeated inner modifiers not an error (\\U\\F\\E)");
eval q#"\L\U\E"#; is($@, "", "cancelled repeated inner modifiers not an error (\\L\\U\\E)");
eval q#"\L\L\E"#; is($@, "", "cancelled repeated inner modifiers not an error (\\L\\L\\E)");
eval q#"\L\F\E"#; is($@, "", "cancelled repeated inner modifiers not an error (\\L\\F\\E)");
eval q#"\F\U\E"#; is($@, "", "cancelled repeated inner modifiers not an error (\\F\\U\\E)");
eval q#"\F\L\E"#; is($@, "", "cancelled repeated inner modifiers not an error (\\F\\L\\E)");
eval q#"\F\F\E"#; is($@, "", "cancelled repeated inner modifiers not an error (\\F\\F\\E)");
# Cancelling an outer modifier resulting in a repeated inner modifier is an error but with exceptions
eval q#"\U\u\E\U"#; isnt($@, "", "cancelled outer modifier resulting in a repeated inner modifier is an error (\\U\\u\\E\\U)");
eval q#"\U\u\E\L"#; isnt($@, "", "cancelled outer modifier resulting in a repeated inner modifier is an error (\\U\\u\\E\\L)");
eval q#"\U\u\E\F"#; isnt($@, "", "cancelled outer modifier resulting in a repeated inner modifier is an error (\\U\\u\\E\\F)");
eval q#"\L\l\E\U"#; isnt($@, "", "cancelled outer modifier resulting in a repeated inner modifier is an error (\\L\\l\\E\\U)");
eval q#"\L\l\E\L"#; isnt($@, "", "cancelled outer modifier resulting in a repeated inner modifier is an error (\\L\\l\\E\\L)");
eval q#"\L\l\E\F"#; isnt($@, "", "cancelled outer modifier resulting in a repeated inner modifier is an error (\\L\\l\\E\\F)");
eval q#"\F\u\E\U"#; isnt($@, "", "cancelled outer modifier resulting in a repeated inner modifier is an error (\\F\\u\\E\\U)");
eval q#"\F\u\E\L"#; isnt($@, "", "cancelled outer modifier resulting in a repeated inner modifier is an error (\\F\\u\\E\\L)");
eval q#"\F\u\E\F"#; isnt($@, "", "cancelled outer modifier resulting in a repeated inner modifier is an error (\\F\\u\\E\\F)");
eval q#"\F\l\E\U"#; isnt($@, "", "cancelled outer modifier resulting in a repeated inner modifier is an error (\\F\\l\\E\\U)");
eval q#"\F\l\E\L"#; isnt($@, "", "cancelled outer modifier resulting in a repeated inner modifier is an error (\\F\\l\\E\\L)");
eval q#"\F\l\E\F"#; isnt($@, "", "cancelled outer modifier resulting in a repeated inner modifier is an error (\\F\\l\\E\\F)");
# Exceptions:
# - parser turns '\U\l' into '\l\U' so the pattern '\U\l\E\U' becomes: '\l\U\E\U' which makes the final pattern '\l\U' which is
# not an error.
# - parser turns '\L\u' into '\u\L' so the pattern '\L\u\E\L' becomes: '\u\L\E\L' which makes the final pattern '\u\L' which is
# not an error.
eval q#"\U\l\E\U"#; is($@, "", "special case: cancelled outer modifier not an error (\\U\\l\\E\\U)");
eval q#"\U\l\E\L"#; is($@, "", "special case: cancelled outer modifier not an error (\\U\\l\\E\\L)");
eval q#"\U\l\E\F"#; is($@, "", "special case: cancelled outer modifier not an error (\\U\\l\\E\\F)");
eval q#"\L\u\E\U"#; is($@, "", "special case: cancelled outer modifier not an error (\\L\\u\\E\\U)");
eval q#"\L\u\E\L"#; is($@, "", "special case: cancelled outer modifier not an error (\\L\\u\\E\\L)");
eval q#"\L\u\E\F"#; is($@, "", "special case: cancelled outer modifier not an error (\\L\\u\\E\\F)");
is("aa\U\l\E\UbBcC", "aa" . lcfirst(uc("bBcC")), "special case: cancelled outer modifier (\\U\\l\\E\\U");
is("aa\U\l\E\LbBcC", "aa" . lcfirst(lc("bBcC")), "special case: cancelled outer modifier (\\U\\l\\E\\L");
is("aa\U\l\E\FbBcC", "aa" . lcfirst(fc("bBcC")), "special case: cancelled outer modifier (\\U\\l\\E\\F");
is("aa\L\u\E\UbBcC", "aa" . ucfirst(uc("bBcC")), "special case: cancelled outer modifier (\\L\\u\\E\\U");
is("aa\L\u\E\LbBcC", "aa" . ucfirst(lc("bBcC")), "special case: cancelled outer modifier (\\L\\u\\E\\L");
is("aa\L\u\E\FbBcC", "aa" . ucfirst(fc("bBcC")), "special case: cancelled outer modifier (\\L\\u\\E\\F");
# escape sequences take precedence over \U, \L, \F (and \u, \l but can't think of a way to test those)
# \t
is("aa\UbB\tcC", "aa" . uc("bB\tcC"), "\\t applied before case modifiers (..\\U..\\t..");
is("aa\LbB\tcC", "aa" . lc("bB\tcC"), "\\t applied before case modifiers (..\\L..\\t..");
is("aa\FbB\tcC", "aa" . fc("bB\tcC"), "\\t applied before case modifiers (..\\F..\\t..");
# \n
is("aa\UbB\ncC", "aa" . uc("bB\ncC"), "\\n applied before case modifiers (..\\U..\\n..");
is("aa\LbB\ncC", "aa" . lc("bB\ncC"), "\\n applied before case modifiers (..\\L..\\n..");
is("aa\FbB\ncC", "aa" . fc("bB\ncC"), "\\n applied before case modifiers (..\\F..\\n..");
# \r
is("aa\UbB\rcC", "aa" . uc("bB\rcC"), "\\r applied before case modifiers (..\\U..\\r..");
is("aa\LbB\rcC", "aa" . lc("bB\rcC"), "\\r applied before case modifiers (..\\L..\\r..");
is("aa\FbB\rcC", "aa" . fc("bB\rcC"), "\\r applied before case modifiers (..\\F..\\r..");
# \f
is("aa\UbB\fcC", "aa" . uc("bB\fcC"), "\\f applied before case modifiers (..\\U..\\f..");
is("aa\LbB\fcC", "aa" . lc("bB\fcC"), "\\f applied before case modifiers (..\\L..\\f..");
is("aa\FbB\fcC", "aa" . fc("bB\fcC"), "\\f applied before case modifiers (..\\F..\\f..");
# \b
is("aa\UbB\bcC", "aa" . uc("bB\bcC"), "\\b applied before case modifiers (..\\U..\\b..");
is("aa\LbB\bcC", "aa" . lc("bB\bcC"), "\\b applied before case modifiers (..\\L..\\b..");
is("aa\FbB\bcC", "aa" . fc("bB\bcC"), "\\b applied before case modifiers (..\\F..\\b..");
# \a
is("aa\UbB\acC", "aa" . uc("bB\acC"), "\\a applied before case modifiers (..\\U..\\a..");
is("aa\LbB\acC", "aa" . lc("bB\acC"), "\\a applied before case modifiers (..\\L..\\a..");
is("aa\FbB\acC", "aa" . fc("bB\acC"), "\\a applied before case modifiers (..\\F..\\a..");
# \e
is("aa\UbB\ecC", "aa" . uc("bB\ecC"), "\\e applied before case modifiers (..\\U..\\e..");
is("aa\LbB\ecC", "aa" . lc("bB\ecC"), "\\e applied before case modifiers (..\\L..\\e..");
is("aa\FbB\ecC", "aa" . fc("bB\ecC"), "\\e applied before case modifiers (..\\F..\\e..");
# \x{..}
is("aa\UbB\x{61}cC", "aa" . uc("bB\x{61}cC"), "\\x{..} applied before case modifiers (..\\U..\\x{..}..)");
is("aa\LbB\x{41}cC", "aa" . lc("bB\x{41}cC"), "\\x{..} applied before case modifiers (..\\L..\\x{..}..)");
is("aa\FbB\x{41}cC", "aa" . fc("bB\x{41}cC"), "\\x{..} applied before case modifiers (..\\F..\\x{..}..)");
# \x..
is("aa\UbB\x61cC", "aa" . uc("bB\x61cC"), "\\x.. applied before case modifiers (..\\U..\\x61..)");
is("aa\LbB\x41cC", "aa" . lc("bB\x41cC"), "\\x.. applied before case modifiers (..\\L..\\x41..)");
is("aa\FbB\x41cC", "aa" . fc("bB\x41cC"), "\\x.. applied before case modifiers (..\\F..\\x41..)");
# \N{..}
is("aa\UbB\N{LATIN SMALL LETTER A}cC", "aa" . uc("bB\N{LATIN SMALL LETTER A}cC"), "\\N{..} applied before case modifiers (..\\U..\\N{..}..)");
is("aa\LbB\N{LATIN CAPITAL LETTER A}cC", "aa" . lc("bB\N{LATIN CAPITAL LETTER A}cC"), "\\N{..} applied before case modifiers (..\\L..\\N{..}..)");
is("aa\FbB\N{LATIN CAPITAL LETTER A}cC", "aa" . fc("bB\N{LATIN CAPITAL LETTER A}cC"), "\\N{..} applied before case modifiers (..\\F..\\N{..}..)");
# \N{U+....}
is("aa\UbB\N{U+0061}cC", "aa" . uc("bB\N{U+0061}cC"), "\\N{U+....} applied before case modifiers (..\\U..\\N{U+....}..)");
is("aa\LbB\N{U+0041}cC", "aa" . lc("bB\N{U+0041}cC"), "\\N{U+....} applied before case modifiers (..\\L..\\N{U+....}..)");
is("aa\FbB\N{U+0041}cC", "aa" . fc("bB\N{U+0041}cC"), "\\N{U+....} applied before case modifiers (..\\F..\\N{U+....}..)");
# \c
is("aa\UbB\cbcC", "aa" . uc("bB\cbcC"), "\\c. applied before case modifiers (..\\U..\\c...)");
is("aa\LbB\cbcC", "aa" . lc("bB\cbcC"), "\\c. applied before case modifiers (..\\L..\\c...)");
is("aa\FbB\cbcC", "aa" . fc("bB\cbcC"), "\\c. applied before case modifiers (..\\F..\\c...)");
is("aa\UbB\cBcC", "aa" . uc("bB\cBcC"), "\\c. applied before case modifiers (..\\U..\\c...)");
is("aa\LbB\cBcC", "aa" . lc("bB\cBcC"), "\\c. applied before case modifiers (..\\L..\\c...)");
is("aa\FbB\cBcC", "aa" . fc("bB\cBcC"), "\\c. applied before case modifiers (..\\F..\\c...)");
# \o{.....}
is("aa\UbB\o{23072}cC", "aa" . uc("bB\o{23072}cC"), "\\o{...} applied before case modifiers (..\\U..\\o{...}..)");
is("aa\LbB\o{23072}cC", "aa" . lc("bB\o{23072}cC"), "\\o{...} applied before case modifiers (..\\L..\\o{...}..)");
is("aa\FbB\o{23072}cC", "aa" . fc("bB\o{23072}cC"), "\\o{...} applied before case modifiers (..\\F..\\o{...}..)");
# \... (octal)
is("aa\UbB\141cC", "aa" . uc("bB\141cC"), "\\... (octal) applied before case modifiers (..\\U..\\.....)");
is("aa\LbB\101cC", "aa" . lc("bB\101cC"), "\\... (octal) applied before case modifiers (..\\L..\\.....)");
is("aa\FbB\101cC", "aa" . fc("bB\101cC"), "\\... (octal) applied before case modifiers (..\\F..\\.....)");
# "inner" case modifiers take precedence over "outer" case modifiers but
# with some caveats:
# - special case exist for "\U\l"
# - special case exist for "\L\u"
# - "outer" case modifier is applied before the "inner" case modifier
is("aa\UbB\lCc", "aa" . uc("bB" . lcfirst("Cc")), "inner modifier overrides outer modifier (..\\U..\\l..)");
is("aa\LbB\udD", "aa" . lc("bB" . ucfirst("dD")), "inner modifier overrides outer modifier (..\\L..\\u..)");
is("aa\FbB\l${fc_char}Cc", "aa" . fc("bB" . lcfirst("${fc_char}Cc")), "inner modifier overrides outer modifier (..\\F..\\l..)");
is("aa\FbB\u${fc_char}dD", "aa" . fc("bB" . ucfirst("${fc_char}dD")), "inner modifier overrides outer modifier (..\\F..\\u..)");
# special cases: \U\l and \L\u
is("aa\U\lbB", "aa" . lcfirst(uc("bB")), "inner modifier does not override outer modifier (..\\U\\l..)");
is("aa\L\ubB", "aa" . ucfirst(lc("bB")), "inner modifier does not override outer modifier (..\\L\\u..)");
# not-special: \F\l and \F\u
is("aa\F\l${fc_char}Cc", "aa" . fc(lcfirst("${fc_char}Cc")), "inner modifier overrides outer modifier (..\\F\\l..)");
is("aa\F\u${fc_char}bB", "aa" . fc(ucfirst("${fc_char}bB")), "inner modifier overrides outer modifier (..\\F\\u..)");
# To test that the "outer" case modifier was applied the 'LATIN SMALL LETTER DOTLESS I'
# can be used:
# "\N{LATIN SMALL LETTER DOTLESS I}" = "\x{0131}"
# lc("\N{LATIN SMALL LETTER DOTLESS I}") = "\x{0131}"
# fc("\N{LATIN SMALL LETTER DOTLESS I}") = "\x{0131}"
# uc("\N{LATIN SMALL LETTER DOTLESS I}") = "I" (== "\x{49}")
# lc(uc("\N{LATIN SMALL LETTER DOTLESS I}")) = "i" (== "\x{69}")
# fc(uc("\N{LATIN SMALL LETTER DOTLESS I}")) = "i" (== "\x{69}")
# In other words:
# lc("\N{LATIN SMALL LETTER DOTLESS I}") ne lc(uc("\N{LATIN SMALL LETTER DOTLESS I}"))
isnt(lc("\N{LATIN SMALL LETTER DOTLESS I}"), lc(uc("\N{LATIN SMALL LETTER DOTLESS I}")), "lc() not equal to lc(uc()) for 'LATIN SMALL LETTER DOTLESS I'");
isnt(fc("\N{LATIN SMALL LETTER DOTLESS I}"), fc(uc("\N{LATIN SMALL LETTER DOTLESS I}")), "fc() not equal to fc(uc()) for 'LATIN SMALL LETTER DOTLESS I'");
is("aa\LbB\u\N{LATIN SMALL LETTER DOTLESS I}", "aa" . lc("bB" . ucfirst("\N{LATIN SMALL LETTER DOTLESS I}")), "..\\L..\\u.. first converts to uppercase");
is("aa\FbB\u\N{LATIN SMALL LETTER DOTLESS I}", "aa" . fc("bB" . ucfirst("\N{LATIN SMALL LETTER DOTLESS I}")), "..\\F..\\u.. first converts to uppercase");
# There does not appear to be a character where `uc(lc($x)) ne uc($x)` :-(
# -> can't test that "\U..\l.." does a `lcfirst()`
# "inner" case modifiers do not stack
is("aa\UbB\LcC", "aa" . uc("bB") . lc("cC"), "no stacking for inner case modifiers (..\\U..\\L..)");
is("aa\UbB\FcC", "aa" . uc("bB") . fc("cC"), "no stacking for inner case modifiers (..\\U..\\F..)");
is("aa\LbB\UcC", "aa" . lc("bB") . uc("cC"), "no stacking for inner case modifiers (..\\L..\\U..)");
is("aa\LbB\FcC", "aa" . lc("bB") . fc("cC"), "no stacking for inner case modifiers (..\\L..\\F..)");
is("aa\FbB\LcC", "aa" . fc("bB") . lc("cC"), "no stacking for inner case modifiers (..\\F..\\L..)");
is("aa\FbB\UcC", "aa" . fc("bB") . uc("cC"), "no stacking for inner case modifiers (..\\F..\\U..)");
is("aa\UbB\UcC\EdD", "aa" . uc("bB") . uc("cC") . "dD", "no stacking for inner case modifiers (..\\U..\\U..\\E..)");
is("aa\UbB\LcC\EdD", "aa" . uc("bB") . lc("cC") . "dD", "no stacking for inner case modifiers (..\\U..\\L..\\E..)");
is("aa\UbB\FcC\EdD", "aa" . uc("bB") . fc("cC") . "dD", "no stacking for inner case modifiers (..\\U..\\F..\\E..)");
is("aa\LbB\LcC\EdD", "aa" . lc("bB") . lc("cC") . "dD", "no stacking for inner case modifiers (..\\L..\\L..\\E..)");
is("aa\LbB\UcC\EdD", "aa" . lc("bB") . uc("cC") . "dD", "no stacking for inner case modifiers (..\\L..\\U..\\E..)");
is("aa\LbB\FcC\EdD", "aa" . lc("bB") . fc("cC") . "dD", "no stacking for inner case modifiers (..\\L..\\F..\\E..)");
is("aa\FbB\FcC\EdD", "aa" . fc("bB") . fc("cC") . "dD", "no stacking for inner case modifiers (..\\F..\\F..\\E..)");
is("aa\FbB\LcC\EdD", "aa" . fc("bB") . lc("cC") . "dD", "no stacking for inner case modifiers (..\\F..\\L..\\E..)");
is("aa\FbB\UcC\EdD", "aa" . fc("bB") . uc("cC") . "dD", "no stacking for inner case modifiers (..\\F..\\U..\\E..)");
# Quotemeta does stack
is("aa\Q1+2\Q3+4\E5+6\E7+8", "aa" . quotemeta("1+2" . quotemeta("3+4") . "5+6") . "7+8", "stacking quotemeta (..\\Q..\\Q..\\E..\\E..)");
# inner modifier does not end quotemeta
is("aa\Qb+b\Uc+C", "aa" . quotemeta("b+b" . uc("c+C")), "inner modifier not ending quotemeta (..\\Q..\\U..)");
is("aa\Qb+b\Lc+C", "aa" . quotemeta("b+b" . lc("c+C")), "inner modifier not ending quotemeta (..\\Q..\\L..)");
is("aa\Qb+b\Fc+C", "aa" . quotemeta("b+b" . fc("c+C")), "inner modifier not ending quotemeta (..\\Q..\\F..)");
is("aa\Qb+B\Uc+C\Ed+D\Ef+F", "aa" . quotemeta("b+B" . uc("c+C") . "d+D") . "f+F", "\\E ends inner modifier (..\\Q..\\U..\\E..\\E..)");
is("aa\Qb+B\Lc+C\Ed+D\Ef+F", "aa" . quotemeta("b+B" . lc("c+C") . "d+D") . "f+F", "\\E ends inner modifier (..\\Q..\\L..\\E..\\E..)");
is("aa\Qb+B\Fc+C\Ed+D\Ef+F", "aa" . quotemeta("b+B" . fc("c+C") . "d+D") . "f+F", "\\E ends inner modifier (..\\Q..\\F..\\E..\\E..)");
# quotemeta doesn't terminate inner case modifier
is("aa\Ub+B\Qc+C", "aa" . uc("b+B" . quotemeta("c+C")), "quotemeta doesn't terminate inner modifier (..\\U..\\Q..)");
is("aa\Lb+B\Qc+C", "aa" . lc("b+B" . quotemeta("c+C")), "quotemeta doesn't terminate inner modifier (..\\L..\\Q..)");
is("aa\Fb+B\Qc+C", "aa" . fc("b+B" . quotemeta("c+C")), "quotemeta doesn't terminate inner modifier (..\\F..\\Q..)");
is("aa\Ub+B\Qc+C\Ed+D", "aa" . uc("b+B" . quotemeta("c+C") . "d+D"), "\\E ends quotemeta (..\\U..\\Q..\\E..)");
is("aa\Lb+B\Qc+C\Ed+D", "aa" . lc("b+B" . quotemeta("c+C") . "d+D"), "\\E ends quotemeta (..\\L..\\Q..\\E..)");
is("aa\Fb+B\Qc+C\Ed+D", "aa" . fc("b+B" . quotemeta("c+C") . "d+D"), "\\E ends quotemeta (..\\F..\\Q..\\E..)");
# inner modifier ends quotemeta
is("aa\Ub+B\Qc+C\Ud+D", "aa" . uc("b+B" . quotemeta("c+C")) . uc("d+D"), "inner modifier ends quotemeta (..\\U..\\Q..\\U..)");
is("aa\Ub+B\Qc+C\Ld+D", "aa" . uc("b+B" . quotemeta("c+C")) . lc("d+D"), "inner modifier ends quotemeta (..\\U..\\Q..\\L..)");
is("aa\Ub+B\Qc+C\Fd+D", "aa" . uc("b+B" . quotemeta("c+C")) . fc("d+D"), "inner modifier ends quotemeta (..\\U..\\Q..\\F..)");
is("aa\Lb+B\Qc+C\Ud+D", "aa" . lc("b+B" . quotemeta("c+C")) . uc("d+D"), "inner modifier ends quotemeta (..\\L..\\Q..\\U..)");
is("aa\Lb+B\Qc+C\Ld+D", "aa" . lc("b+B" . quotemeta("c+C")) . lc("d+D"), "inner modifier ends quotemeta (..\\L..\\Q..\\L..)");
is("aa\Lb+B\Qc+C\Fd+D", "aa" . lc("b+B" . quotemeta("c+C")) . fc("d+D"), "inner modifier ends quotemeta (..\\L..\\Q..\\F..)");
is("aa\Fb+B\Qc+C\Ud+D", "aa" . fc("b+B" . quotemeta("c+C")) . uc("d+D"), "inner modifier ends quotemeta (..\\F..\\Q..\\U..)");
is("aa\Fb+B\Qc+C\Ld+D", "aa" . fc("b+B" . quotemeta("c+C")) . lc("d+D"), "inner modifier ends quotemeta (..\\F..\\Q..\\L..)");
is("aa\Fb+B\Qc+C\Fd+D", "aa" . fc("b+B" . quotemeta("c+C")) . fc("d+D"), "inner modifier ends quotemeta (..\\F..\\Q..\\F..)");
# empty var is not an error
my $foo = "";
is("aa\U$foo\UbB", "aa" . uc("bB"), "repeating modifier after empty var is not an error (..\\U\$foo\\U..)");
is("aa\U$foo\LbB", "aa" . lc("bB"), "repeating modifier after empty var is not an error (..\\U\$foo\\L..)");
is("aa\U$foo\FbB", "aa" . fc("bB"), "repeating modifier after empty var is not an error (..\\U\$foo\\F..)");
is("aa\L$foo\UbB", "aa" . uc("bB"), "repeating modifier after empty var is not an error (..\\L\$foo\\U..)");
is("aa\L$foo\LbB", "aa" . lc("bB"), "repeating modifier after empty var is not an error (..\\L\$foo\\L..)");
is("aa\L$foo\FbB", "aa" . fc("bB"), "repeating modifier after empty var is not an error (..\\L\$foo\\F..)");
is("aa\F$foo\UbB", "aa" . uc("bB"), "repeating modifier after empty var is not an error (..\\F\$foo\\U..)");
is("aa\F$foo\LbB", "aa" . lc("bB"), "repeating modifier after empty var is not an error (..\\F\$foo\\L..)");
is("aa\F$foo\FbB", "aa" . fc("bB"), "repeating modifier after empty var is not an error (..\\F\$foo\\F..)");
# outer case modifier immediately after inner case modifier not an error
is("aa\U\ubB", "aa" . uc("bB"), "outer modifier after inner modifier not an error (..\\U\\u..)");
is("aa\U\lbB", "aa" . lcfirst(uc("bB")), "outer modifier after inner modifier not an error (..\\U\\l..)"); # special, translated to \l\U
is("aa\L\uCc", "aa" . ucfirst(lc("Cc")), "outer modifier after inner modifier not an error (..\\L\\u..)"); # special, translated to \u\L
is("aa\L\lCc", "aa" . lc("Cc"), "outer modifier after inner modifier not an error (..\\L\\l..)");
is("aa\F\ubB", "aa" . fc("bB"), "outer modifier after inner modifier not an error (..\\F\\u..)");
is("aa\F\lbB", "aa" . fc("bB"), "outer modifier after inner modifier not an error (..\\F\\l..)");
done_testing();
Footnotes
1. This can be tested/seen when testing with the LATIN SMALL LETTER DOTLESS I character:
     $ perl -wle '
         my $x = "\N{LATIN SMALL LETTER DOTLESS I}";
         print "\Lfoo\u$x" eq "\Lfoo$x" ? "True" : "False";
         print "\Lfoo\u$x" eq lc("foo$x") ? "True" : "False";
         print "\Lfoo\u$x" eq lc("foo" . ucfirst($x)) ? "True" : "False";'
     False
     False
     True
2. Without looking at the code: what happens first is that \U\l is replaced with \l\U, so the string becomes: \l\U\E\Ubar which causes it to cancel the first \U and not the \l.
   This can also be seen in some error messages:
     $ perl -e '"\U\l\Ubar"'
     syntax error at -e line 1, near "\l\U\U"