Description
Preamble
While doing some testing for @khwilliamson, a segfault was observed on OmniOS when a particular locale is used with a particular character.
The purpose of this ticket is to describe the issue and the diagnostic steps.
Test environment
Running on OmniOS:
$ cat /etc/release
OmniOS v11 r151042
Copyright (c) 2012-2017 OmniTI Computer Consulting, Inc.
Copyright (c) 2017-2022 OmniOS Community Edition (OmniOSce) Association.
All rights reserved. Use is subject to licence terms.
and on:
$ cat /etc/release
OmniOS v11 r151044
Copyright (c) 2012-2017 OmniTI Computer Consulting, Inc.
Copyright (c) 2017-2022 OmniOS Community Edition (OmniOSce) Association.
All rights reserved. Use is subject to licence terms.
Perl version: blead (commit 4d21c02)
Configure args:
./Configure -des -Dusedevel -Duseithreads -DDEBUGGING -Doptimize='-g -O0'
Initial testing
Initial symptoms: on khw's testing branch the lib/locale_threads.t test was segfaulting.
The line that triggered the segfault:
add_trials('LC_COLLATE',
'quotemeta join "", sort reverse map { chr } (0..255)');
The short version: it runs that particular code for all installed locales.
Standalone segfault:
$ LD_LIBRARY_PATH=. LC_ALL='vi_VN.UTF-8' ./perl -Ilib -wle 'use locale; my $x = quotemeta join "", sort reverse map { chr } (0..255);'
Segmentation Fault
And also:
$ LD_LIBRARY_PATH=. LC_ALL='' LANG='' LC_COLLATE='vi_VN.UTF-8' ./perl -Ilib -wle 'use locale; my $x = quotemeta join "", sort reverse map { chr } (0..255);'
Segmentation Fault
Minimal test
After some attempts the code was reduced to:
$ LD_LIBRARY_PATH=. LC_ALL='vi_VN.UTF-8' ./perl -Ilib -wle 'use locale; my @x = sort map { chr } (0..255);'
Segmentation Fault
Reducing the character range revealed that it only happens when chr(168) is included:
$ LD_LIBRARY_PATH=. LC_ALL='vi_VN.UTF-8' ./perl -Ilib -wle 'use locale; my @x = sort map { chr } (0..167, 169..255);print "OK"'
OK
And one final reduction:
$ LD_LIBRARY_PATH=. LC_ALL='vi_VN.UTF-8' ./perl -Ilib -wle 'use locale; print chr(168) cmp "A";'
Segmentation Fault
Testing other locales
Checking the other installed locales:
Testing code
Code:
#!/usr/bin/perl -l
use strict;
use warnings;
# list copied from `locale -a`
my @locales = qw#
C
C.UTF-8
POSIX
af_ZA.UTF-8
ar_AE.UTF-8
ar_BH.UTF-8
ar_DZ.UTF-8
ar_EG.UTF-8
ar_IQ.UTF-8
ar_JO.UTF-8
ar_KW.UTF-8
ar_LB.UTF-8
ar_LY.UTF-8
ar_MA.UTF-8
ar_OM.UTF-8
ar_QA.UTF-8
ar_SA.UTF-8
ar_TN.UTF-8
ar_YE.UTF-8
as_IN.UTF-8
az_AZ.UTF-8
be_BY.UTF-8
bg_BG.ISO8859-5
bg_BG.UTF-8
bn_BD.UTF-8
bn_IN.UTF-8
bo_CN.UTF-8
bo_IN.UTF-8
bs_BA.UTF-8
ca_ES.ISO8859-15
ca_ES.UTF-8
cs_CZ.ISO8859-2
cs_CZ.UTF-8
da_DK.ISO8859-1
da_DK.ISO8859-15
da_DK.UTF-8
de_AT.ISO8859-15
de_AT.UTF-8
de_BE.UTF-8
de_CH.ISO8859-1
de_CH.UTF-8
de_DE.ISO8859-1
de_DE.ISO8859-15
de_DE.UTF-8
de_LI.UTF-8
de_LU.UTF-8
el_CY.UTF-8
el_GR.ISO8859-7
el_GR.UTF-8
en_AU.ISO8859-1
en_AU.UTF-8
en_BW.UTF-8
en_BZ.UTF-8
en_CA.ISO8859-1
en_CA.UTF-8
en_GB.ISO8859-1
en_GB.ISO8859-15
en_GB.UTF-8
en_HK.UTF-8
en_IE.ISO8859-15
en_IE.UTF-8
en_IN.UTF-8
en_JM.UTF-8
en_MH.UTF-8
en_MT.UTF-8
en_NA.UTF-8
en_NZ.ISO8859-1
en_NZ.UTF-8
en_PH.UTF-8
en_PK.UTF-8
en_SG.UTF-8
en_TT.UTF-8
en_US.ISO8859-1
en_US.ISO8859-15
en_US.UTF-8
en_ZA.UTF-8
en_ZW.UTF-8
es_AR.ISO8859-1
es_AR.UTF-8
es_BO.ISO8859-1
es_BO.UTF-8
es_CL.ISO8859-1
es_CL.UTF-8
es_CO.ISO8859-1
es_CO.UTF-8
es_CR.UTF-8
es_DO.UTF-8
es_EC.ISO8859-1
es_EC.UTF-8
es_ES.ISO8859-1
es_ES.ISO8859-15
es_ES.UTF-8
es_GQ.UTF-8
es_GT.ISO8859-1
es_GT.UTF-8
es_HN.UTF-8
es_MX.ISO8859-1
es_MX.UTF-8
es_NI.ISO8859-1
es_NI.UTF-8
es_PA.ISO8859-1
es_PA.UTF-8
es_PE.ISO8859-1
es_PE.UTF-8
es_PR.UTF-8
es_PY.UTF-8
es_SV.ISO8859-1
es_SV.UTF-8
es_US.UTF-8
es_UY.ISO8859-1
es_UY.UTF-8
es_VE.ISO8859-1
es_VE.UTF-8
et_EE.UTF-8
fi_FI.ISO8859-15
fi_FI.UTF-8
fil_PH.UTF-8
fr_BE.ISO8859-15
fr_BE.UTF-8
fr_CA.ISO8859-1
fr_CA.UTF-8
fr_CF.UTF-8
fr_CH.ISO8859-1
fr_CH.UTF-8
fr_FR.ISO8859-1
fr_FR.ISO8859-15
fr_FR.UTF-8
fr_GN.UTF-8
fr_LU.UTF-8
fr_MC.UTF-8
fr_MG.UTF-8
fr_ML.UTF-8
fr_NE.UTF-8
fr_SN.UTF-8
ga_IE.UTF-8
gu_IN.UTF-8
he_IL.UTF-8
hi_IN.UTF-8
hr_HR.ISO8859-2
hr_HR.UTF-8
hu_HU.ISO8859-2
hu_HU.UTF-8
hy_AM.UTF-8
id_ID.UTF-8
ii_CN.UTF-8
is_IS.ISO8859-1
is_IS.UTF-8
it_CH.ISO8859-1
it_CH.UTF-8
it_IT.ISO8859-1
it_IT.ISO8859-15
it_IT.UTF-8
ja_JP.UTF-8
ka_GE.UTF-8
kk_KZ.UTF-8
km_KH.UTF-8
kn_IN.UTF-8
ko_KR.UTF-8
kok_IN.UTF-8
lt_LT.ISO8859-13
lt_LT.UTF-8
lv_LV.ISO8859-13
lv_LV.UTF-8
mk_MK.ISO8859-5
mk_MK.UTF-8
ml_IN.UTF-8
mn_CN.UTF-8
mn_MN.UTF-8
mr_IN.UTF-8
ms_MY.UTF-8
mt_MT.UTF-8
nb_NO.UTF-8
ne_IN.UTF-8
ne_NP.UTF-8
nl_BE.ISO8859-15
nl_BE.UTF-8
nl_NL.ISO8859-15
nl_NL.UTF-8
nn_NO.UTF-8
or_IN.UTF-8
pa_IN.UTF-8
pa_PK.UTF-8
pl_PL.ISO8859-2
pl_PL.UTF-8
pt_BR.UTF-8
pt_GW.UTF-8
pt_MZ.UTF-8
pt_PT.ISO8859-15
pt_PT.UTF-8
ro_MD.UTF-8
ro_RO.UTF-8
ru_MD.UTF-8
ru_RU.ISO8859-5
ru_RU.KOI8-R
ru_RU.UTF-8
ru_UA.UTF-8
sa_IN.UTF-8
si_LK.UTF-8
sk_SK.UTF-8
sl_SI.UTF-8
sq_AL.ISO8859-2
sq_AL.UTF-8
sr_BA.UTF-8
sr_ME.UTF-8
sr_RS.UTF-8
sv_FI.ISO8859-15
sv_FI.UTF-8
sv_SE.ISO8859-1
sv_SE.ISO8859-15
sv_SE.UTF-8
ta_IN.UTF-8
ta_LK.UTF-8
te_IN.UTF-8
th_TH.ISO8859-11
th_TH.UTF-8
tr_TR.ISO8859-9
tr_TR.UTF-8
ug_CN.UTF-8
uk_UA.UTF-8
ur_IN.UTF-8
ur_PK.UTF-8
vi_VN.UTF-8
zh_CN.GB18030
zh_CN.UTF-8
zh_HK.UTF-8
zh_MO.UTF-8
zh_SG.UTF-8
zh_TW.UTF-8
#;
foreach my $locale (@locales) {
print $locale;
system("LD_LIBRARY_PATH=. LC_ALL= LANG= LC_COLLATE='$locale' ./perl -Ilib -wle 'use locale; my \@x = sort map { chr } (0..255);'");
}
Running it:
$ perl ~/check-locales.pl
...
ur_PK.UTF-8
vi_VN.UTF-8
sh: 15207: Memory fault
zh_CN.GB18030
zh_CN.UTF-8
...
It only fails for vi_VN.UTF-8.
C reproducer
foo.c:
#include <stdio.h>
#include <string.h>
#include <locale.h>
int main (int argc, char ** argv) {
char src[100] = "A \xA8 B";
char dst[100];
if (argc < 2) {
printf("Specify locale name as argument\n");
return 1;
}
locale_t loc = newlocale(LC_COLLATE_MASK, argv[1], (locale_t) 0);
if (! loc) {
printf("newlocale failed?\n");
return 1;
}
size_t res = strxfrm_l(dst, src, 100, loc);
printf("res = %zu\n", res); /* res is a size_t, so %zu rather than %d */
printf("src = %s\n", src);
printf("dst = %s\n", dst);
}
Compiling and running:
- vi_VN.UTF-8 locale:
$ gcc foo.c && ./a.out vi_VN.UTF-8
Segmentation Fault
- en_US.UTF-8 locale:
$ gcc foo.c && ./a.out en_US.UTF-8 | less
res = 48
src = A <A8> B
dst = 0g060614.1111.>11>.1F0R0R33.0011000P002X000P0012
Segfault for vi_VN.UTF-8, no segfault for other locales.
Additional information
The segfault is not really related to strxfrm_l/POSIX 2008 locales.
Even before Perl used them, it segfaulted:
$ git checkout v5.12.0
$ git clean -dxfq && ./Configure -des -Dusedevel -Duseithreads -DDEBUGGING -Doptimize='-g -O0' -Dcc=gcc && make
$ LD_LIBRARY_PATH=. LC_ALL= LANG= LC_COLLATE='vi_VN.UTF-8' ./perl -Ilib -wle 'use locale; print "\xA8" cmp "A";'
Segmentation Fault
As a side note, this did work on older versions (setting LC_ALL instead of only LC_COLLATE):
$ LD_LIBRARY_PATH=. LC_ALL='vi_VN.UTF-8' ./perl -Ilib -wle 'use locale; print "\xA8" cmp "A";'
1
C reproducer for strxfrm:
foo2.c:
#include <stdio.h>
#include <string.h>
#include <locale.h>
int main (int argc, char ** argv) {
char src[100] = "A \xA8 B";
char dst[100];
if (argc < 2) {
printf("Specify locale name as argument\n");
return 1;
}
if (! setlocale(LC_COLLATE, argv[1])) {
printf("setlocale failed?\n");
return 1;
}
size_t res = strxfrm(dst, src, 100);
printf("res = %zu\n", res); /* res is a size_t, so %zu rather than %d */
printf("src = %s\n", src);
printf("dst = %s\n", dst);
}
Running:
$ gcc foo2.c && ./a.out vi_VN.UTF-8
Segmentation Fault