OmniOS (Solaris based): segfault with vi_VN.UTF-8 and character 168 ("\xA8", "\250") #20578

Open
@bram-perl

Description

Preamble

During some testing for @khwilliamson, a segfault was observed on OmniOS when a particular locale (vi_VN.UTF-8) and a particular character (chr 168) are used together.
This ticket describes the issue and the diagnostic steps taken so far.

Test environment

Running on OmniOS:

$ cat /etc/release
  OmniOS v11 r151042
  Copyright (c) 2012-2017 OmniTI Computer Consulting, Inc.
  Copyright (c) 2017-2022 OmniOS Community Edition (OmniOSce) Association.
  All rights reserved. Use is subject to licence terms.

and on:

$ cat /etc/release
  OmniOS v11 r151044
  Copyright (c) 2012-2017 OmniTI Computer Consulting, Inc.
  Copyright (c) 2017-2022 OmniOS Community Edition (OmniOSce) Association.
  All rights reserved. Use is subject to licence terms.

Perl version: blead (commit 4d21c02)
Configure args:

./Configure -des -Dusedevel -Duseithreads -DDEBUGGING -Doptimize='-g -O0'

Initial testing

Initial symptoms: on khw's testing branch, the lib/locale_threads.t test segfaulted.
The line that triggered the segfault:

            add_trials('LC_COLLATE',
                       'quotemeta join "", sort reverse map { chr } (0..255)');

In short, the test runs that code under every installed locale.
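For readers less familiar with Perl: under `use locale`, `sort` orders strings according to the current LC_COLLATE category, and perl builds that on the C library's collation functions. Below is a rough C analog of `sort map { chr } (0..255)`, assuming a POSIX system. It is a simplification: it calls `strcoll()` from a `qsort()` comparator, whereas perl itself derives collation keys with `strxfrm()`, and it skips chr(0) because a C string cannot hold a NUL byte. The function name `sort_bytes_collated()` is hypothetical.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* A one-character C string; chr(0) is skipped because a NUL byte
   would terminate the string. */
struct one_char {
    char s[2];
};

/* qsort() comparator: order two one-character strings with strcoll(),
   which follows the LC_COLLATE category of the current locale. */
static int cmp_collate(const void *a, const void *b)
{
    const struct one_char *x = a;
    const struct one_char *y = b;
    return strcoll(x->s, y->s);
}

/* Fill out[0..254] with the bytes 1..255 in the collation order of
   `locale_name`. Returns 0 on success, -1 if the locale can't be set.
   Note: this changes the process-wide LC_COLLATE setting. */
int sort_bytes_collated(const char *locale_name, unsigned char out[255])
{
    /* NULL would make setlocale() query instead of set */
    assert(locale_name != NULL);
    if (!setlocale(LC_COLLATE, locale_name))
        return -1;

    struct one_char chars[255];
    for (int i = 0; i < 255; i++) {
        chars[i].s[0] = (char)(i + 1);
        chars[i].s[1] = '\0';
    }
    qsort(chars, 255, sizeof chars[0], cmp_collate);

    for (int i = 0; i < 255; i++)
        out[i] = (unsigned char)chars[i].s[0];
    return 0;
}
```

In the "C" locale `strcoll()` behaves like `strcmp()`, so the result is simply 1..255 in byte order; a locale such as vi_VN.UTF-8 is where the collation tables, and apparently the bug, come into play.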

Standalone reproduction:

$ LD_LIBRARY_PATH=. LC_ALL='vi_VN.UTF-8' ./perl -Ilib -wle 'use locale; my $x = quotemeta join "", sort reverse map { chr } (0..255);'
Segmentation Fault

It also segfaults when only LC_COLLATE is set:

$ LD_LIBRARY_PATH=. LC_ALL='' LANG='' LC_COLLATE='vi_VN.UTF-8' ./perl -Ilib -wle 'use locale; my $x = quotemeta join "", sort reverse map { chr } (0..255);'
Segmentation Fault

Minimal test

After some attempts the code was reduced to:

$ LD_LIBRARY_PATH=. LC_ALL='vi_VN.UTF-8' ./perl -Ilib -wle 'use locale; my @x = sort map { chr } (0..255);'
Segmentation Fault

Narrowing down the character range showed that it only happens when chr 168 is included:

$ LD_LIBRARY_PATH=. LC_ALL='vi_VN.UTF-8' ./perl -Ilib -wle 'use locale; my @x = sort map { chr } (0..167, 169..255);print "OK"'
OK

And one final reduction, down to a single comparison:

$ LD_LIBRARY_PATH=. LC_ALL='vi_VN.UTF-8' ./perl -Ilib -wle 'use locale; print chr(168) cmp "A";'
Segmentation Fault
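The character-range reduction above was done by hand. The same search can be automated; the sketch below assumes a POSIX system with `fork()` and `waitpid()`. For each byte 1..255 it transforms `"A <byte> B"` (the input pattern used by the C reproducers later in this report) with `strxfrm()` in a separate child process, so a segfault only kills that child and the offending byte is recorded. `probe_byte()` and `scan_bytes()` are hypothetical names, and the input pattern is an assumption: the crash might also depend on the surrounding characters.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Transform "A <byte> B" with strxfrm() in a child process so that a
   crash only kills the child. Returns the signal that terminated the
   child, 0 if it exited normally, or -1 if fork()/waitpid() failed. */
static int probe_byte(unsigned char byte)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        char src[] = "A ? B";
        char dst[256];
        src[2] = (char)byte;              /* put the byte under test in place of '?' */
        size_t n = strxfrm(dst, src, sizeof dst);
        _exit(n < sizeof dst ? 0 : 1);    /* 1 = buffer too small, not a crash */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}

/* Probe bytes 1..255 under LC_COLLATE=locale_name. crashed[b] receives
   the signal number for byte b (0 if none, -1 if the probe failed).
   Returns how many bytes crashed, or -1 if the locale can't be set. */
int scan_bytes(const char *locale_name, int crashed[256])
{
    assert(locale_name != NULL);
    /* children forked below inherit this LC_COLLATE setting */
    if (!setlocale(LC_COLLATE, locale_name))
        return -1;

    int count = 0;
    crashed[0] = 0;                       /* NUL would end the C string early */
    for (int b = 1; b < 256; b++) {
        crashed[b] = probe_byte((unsigned char)b);
        if (crashed[b] > 0)
            count++;
    }
    return count;
}
```

Going by the Perl results above, `scan_bytes("vi_VN.UTF-8", crashed)` on the affected OmniOS host would be expected to flag byte 168 (and possibly others); this has not been verified with the sketch.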

Testing other locales

Checking all the other installed locales with this script (~/check-locales.pl):

#!/usr/bin/perl -l

use strict;
use warnings;

# list copied from `locale -a`
my @locales = qw#
        C
        C.UTF-8
        POSIX
        af_ZA.UTF-8
        ar_AE.UTF-8
        ar_BH.UTF-8
        ar_DZ.UTF-8
        ar_EG.UTF-8
        ar_IQ.UTF-8
        ar_JO.UTF-8
        ar_KW.UTF-8
        ar_LB.UTF-8
        ar_LY.UTF-8
        ar_MA.UTF-8
        ar_OM.UTF-8
        ar_QA.UTF-8
        ar_SA.UTF-8
        ar_TN.UTF-8
        ar_YE.UTF-8
        as_IN.UTF-8
        az_AZ.UTF-8
        be_BY.UTF-8
        bg_BG.ISO8859-5
        bg_BG.UTF-8
        bn_BD.UTF-8
        bn_IN.UTF-8
        bo_CN.UTF-8
        bo_IN.UTF-8
        bs_BA.UTF-8
        ca_ES.ISO8859-15
        ca_ES.UTF-8
        cs_CZ.ISO8859-2
        cs_CZ.UTF-8
        da_DK.ISO8859-1
        da_DK.ISO8859-15
        da_DK.UTF-8
        de_AT.ISO8859-15
        de_AT.UTF-8
        de_BE.UTF-8
        de_CH.ISO8859-1
        de_CH.UTF-8
        de_DE.ISO8859-1
        de_DE.ISO8859-15
        de_DE.UTF-8
        de_LI.UTF-8
        de_LU.UTF-8
        el_CY.UTF-8
        el_GR.ISO8859-7
        el_GR.UTF-8
        en_AU.ISO8859-1
        en_AU.UTF-8
        en_BW.UTF-8
        en_BZ.UTF-8
        en_CA.ISO8859-1
        en_CA.UTF-8
        en_GB.ISO8859-1
        en_GB.ISO8859-15
        en_GB.UTF-8
        en_HK.UTF-8
        en_IE.ISO8859-15
        en_IE.UTF-8
        en_IN.UTF-8
        en_JM.UTF-8
        en_MH.UTF-8
        en_MT.UTF-8
        en_NA.UTF-8
        en_NZ.ISO8859-1
        en_NZ.UTF-8
        en_PH.UTF-8
        en_PK.UTF-8
        en_SG.UTF-8
        en_TT.UTF-8
        en_US.ISO8859-1
        en_US.ISO8859-15
        en_US.UTF-8
        en_ZA.UTF-8
        en_ZW.UTF-8
        es_AR.ISO8859-1
        es_AR.UTF-8
        es_BO.ISO8859-1
        es_BO.UTF-8
        es_CL.ISO8859-1
        es_CL.UTF-8
        es_CO.ISO8859-1
        es_CO.UTF-8
        es_CR.UTF-8
        es_DO.UTF-8
        es_EC.ISO8859-1
        es_EC.UTF-8
        es_ES.ISO8859-1
        es_ES.ISO8859-15
        es_ES.UTF-8
        es_GQ.UTF-8
        es_GT.ISO8859-1
        es_GT.UTF-8
        es_HN.UTF-8
        es_MX.ISO8859-1
        es_MX.UTF-8
        es_NI.ISO8859-1
        es_NI.UTF-8
        es_PA.ISO8859-1
        es_PA.UTF-8
        es_PE.ISO8859-1
        es_PE.UTF-8
        es_PR.UTF-8
        es_PY.UTF-8
        es_SV.ISO8859-1
        es_SV.UTF-8
        es_US.UTF-8
        es_UY.ISO8859-1
        es_UY.UTF-8
        es_VE.ISO8859-1
        es_VE.UTF-8
        et_EE.UTF-8
        fi_FI.ISO8859-15
        fi_FI.UTF-8
        fil_PH.UTF-8
        fr_BE.ISO8859-15
        fr_BE.UTF-8
        fr_CA.ISO8859-1
        fr_CA.UTF-8
        fr_CF.UTF-8
        fr_CH.ISO8859-1
        fr_CH.UTF-8
        fr_FR.ISO8859-1
        fr_FR.ISO8859-15
        fr_FR.UTF-8
        fr_GN.UTF-8
        fr_LU.UTF-8
        fr_MC.UTF-8
        fr_MG.UTF-8
        fr_ML.UTF-8
        fr_NE.UTF-8
        fr_SN.UTF-8
        ga_IE.UTF-8
        gu_IN.UTF-8
        he_IL.UTF-8
        hi_IN.UTF-8
        hr_HR.ISO8859-2
        hr_HR.UTF-8
        hu_HU.ISO8859-2
        hu_HU.UTF-8
        hy_AM.UTF-8
        id_ID.UTF-8
        ii_CN.UTF-8
        is_IS.ISO8859-1
        is_IS.UTF-8
        it_CH.ISO8859-1
        it_CH.UTF-8
        it_IT.ISO8859-1
        it_IT.ISO8859-15
        it_IT.UTF-8
        ja_JP.UTF-8
        ka_GE.UTF-8
        kk_KZ.UTF-8
        km_KH.UTF-8
        kn_IN.UTF-8
        ko_KR.UTF-8
        kok_IN.UTF-8
        lt_LT.ISO8859-13
        lt_LT.UTF-8
        lv_LV.ISO8859-13
        lv_LV.UTF-8
        mk_MK.ISO8859-5
        mk_MK.UTF-8
        ml_IN.UTF-8
        mn_CN.UTF-8
        mn_MN.UTF-8
        mr_IN.UTF-8
        ms_MY.UTF-8
        mt_MT.UTF-8
        nb_NO.UTF-8
        ne_IN.UTF-8
        ne_NP.UTF-8
        nl_BE.ISO8859-15
        nl_BE.UTF-8
        nl_NL.ISO8859-15
        nl_NL.UTF-8
        nn_NO.UTF-8
        or_IN.UTF-8
        pa_IN.UTF-8
        pa_PK.UTF-8
        pl_PL.ISO8859-2
        pl_PL.UTF-8
        pt_BR.UTF-8
        pt_GW.UTF-8
        pt_MZ.UTF-8
        pt_PT.ISO8859-15
        pt_PT.UTF-8
        ro_MD.UTF-8
        ro_RO.UTF-8
        ru_MD.UTF-8
        ru_RU.ISO8859-5
        ru_RU.KOI8-R
        ru_RU.UTF-8
        ru_UA.UTF-8
        sa_IN.UTF-8
        si_LK.UTF-8
        sk_SK.UTF-8
        sl_SI.UTF-8
        sq_AL.ISO8859-2
        sq_AL.UTF-8
        sr_BA.UTF-8
        sr_ME.UTF-8
        sr_RS.UTF-8
        sv_FI.ISO8859-15
        sv_FI.UTF-8
        sv_SE.ISO8859-1
        sv_SE.ISO8859-15
        sv_SE.UTF-8
        ta_IN.UTF-8
        ta_LK.UTF-8
        te_IN.UTF-8
        th_TH.ISO8859-11
        th_TH.UTF-8
        tr_TR.ISO8859-9
        tr_TR.UTF-8
        ug_CN.UTF-8
        uk_UA.UTF-8
        ur_IN.UTF-8
        ur_PK.UTF-8
        vi_VN.UTF-8
        zh_CN.GB18030
        zh_CN.UTF-8
        zh_HK.UTF-8
        zh_MO.UTF-8
        zh_SG.UTF-8
        zh_TW.UTF-8
#;

foreach my $locale (@locales) {
        print $locale;
        system("LD_LIBRARY_PATH=. LC_ALL= LANG= LC_COLLATE='$locale' ./perl -Ilib -wle 'use locale; my \@x = sort map { chr } (0..255)");
}

Running it:

$ perl ~/check-locales.pl
...
ur_PK.UTF-8
vi_VN.UTF-8
sh: 15207: Memory fault
zh_CN.GB18030
zh_CN.UTF-8	
...

It fails only for vi_VN.UTF-8.
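The Perl script above starts one perl process per locale. The same idea can be expressed directly in C with the POSIX 2008 calls the C reproducer below uses (`newlocale()` with `LC_COLLATE_MASK`, then `strxfrm_l()`); each locale is probed in a child process so that a crash is reported rather than ending the run. `probe_locale()` and `enum probe_result` are hypothetical names introduced for this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Outcome of transforming one string under one locale. */
enum probe_result { PROBE_OK, PROBE_NO_LOCALE, PROBE_CRASHED, PROBE_ERROR };

/* Create an LC_COLLATE-only locale object for `locale_name` and run
   strxfrm_l() on `src`, all inside a child process, so that a crash is
   reported as PROBE_CRASHED instead of ending the caller. */
enum probe_result probe_locale(const char *locale_name, const char *src)
{
    assert(locale_name != NULL && src != NULL);
    pid_t pid = fork();
    if (pid < 0)
        return PROBE_ERROR;
    if (pid == 0) {
        locale_t loc = newlocale(LC_COLLATE_MASK, locale_name, (locale_t)0);
        if (loc == (locale_t)0)
            _exit(3);                     /* locale not available */
        char dst[256];
        size_t n = strxfrm_l(dst, src, sizeof dst, loc);
        freelocale(loc);
        _exit(n < sizeof dst ? 0 : 1);    /* 1 = buffer too small */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return PROBE_ERROR;
    if (WIFSIGNALED(status))
        return PROBE_CRASHED;
    if (WIFEXITED(status) && WEXITSTATUS(status) == 3)
        return PROBE_NO_LOCALE;
    /* any other normal exit, including "buffer too small", is not a crash */
    return PROBE_OK;
}
```

A caller could pass each name printed by `locale -a` together with `"A \xA8 B"` and report the ones for which `probe_locale()` returns `PROBE_CRASHED`.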

C reproducer

foo.c:

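#include <stdio.h>
#include <string.h>
#include <locale.h>

int main (int argc, char ** argv) {
    /* Input containing the problematic byte 0xA8 (chr 168) */
    char src[100] = "A \xA8 B";
    char dst[100];

    if (argc < 2) {
        printf("Specify locale name as argument\n");
        return 1;
    }

    /* POSIX 2008 locale object with only LC_COLLATE set */
    locale_t loc = newlocale(LC_COLLATE_MASK, argv[1], (locale_t) 0);
    if (! loc) {
        printf("newlocale failed?\n");
        return 1;
    }

    /* Expected to segfault here for vi_VN.UTF-8 */
    size_t res = strxfrm_l(dst, src, sizeof dst, loc);
    printf("res = %zu\n", res);
    printf("src = %s\n", src);
    /* dst is only guaranteed to be NUL-terminated if res < sizeof dst */
    printf("dst = %s\n", res < sizeof dst ? dst : "(buffer too small)");

    freelocale(loc);
    return 0;
}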

Compiling and running:

  • vi_VN.UTF-8 locale:
$ gcc foo.c && ./a.out vi_VN.UTF-8
Segmentation Fault
  • en_US.UTF-8 locale:
$ gcc foo.c && ./a.out en_US.UTF-8 | less
res = 48
src = A <A8> B
dst = 0g060614.1111.>11>.1F0R0R33.0011000P002X000P0012	

It segfaults for vi_VN.UTF-8, but not for other locales.
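The reproducer uses a fixed 100-byte buffer. A too-small buffer should not cause a crash, since `strxfrm_l()` must not write more than `n` bytes, but to rule it out one could size the buffer exactly with the usual two-call idiom: POSIX allows a null destination when `n` is 0, in which case the call only returns the required length. A minimal sketch, assuming POSIX 2008 locale support; `collation_key_l()` is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Return a malloc()ed collation key for `src` under `loc`, sized by a
   first strxfrm_l() call with n == 0 (dst may then be NULL). Stores the
   key length in *key_len if non-NULL. Returns NULL if malloc() fails.
   The caller must free() the result. */
char *collation_key_l(const char *src, locale_t loc, size_t *key_len)
{
    assert(src != NULL);
    size_t needed = strxfrm_l(NULL, src, 0, loc);   /* length without the NUL */
    char *key = malloc(needed + 1);
    if (key == NULL)
        return NULL;
    size_t written = strxfrm_l(key, src, needed + 1, loc);
    assert(written == needed);       /* same input and locale: same length */
    if (key_len != NULL)
        *key_len = written;
    return key;
}
```

If vi_VN.UTF-8 still crashes with an exactly sized buffer (for instance already in the sizing call), the buffer size can be excluded as a factor.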

Additional information

The segfault is not specific to strxfrm_l()/POSIX 2008 locales.
Older perls, which predate that code, segfault too, e.g. v5.12.0:

$ git checkout v5.12.0
$ git clean -dxfq && ./Configure -des -Dusedevel -Duseithreads -DDEBUGGING -Doptimize='-g -O0' -Dcc=gcc && make
$ LD_LIBRARY_PATH=. LC_ALL= LANG= LC_COLLATE='vi_VN.UTF-8' ./perl -Ilib -wle 'use locale; print "\xA8" cmp "A";'
Segmentation Fault

On a side note, setting LC_ALL (instead of only LC_COLLATE) did work on older versions:

$ LD_LIBRARY_PATH=. LC_ALL='vi_VN.UTF-8' ./perl -Ilib -wle 'use locale; print "\xA8" cmp "A";'
1

C reproducer using setlocale() and plain strxfrm():

foo2.c:

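#include <stdio.h>
#include <string.h>
#include <locale.h>

int main (int argc, char ** argv) {
    /* Input containing the problematic byte 0xA8 (chr 168) */
    char src[100] = "A \xA8 B";
    char dst[100];

    if (argc < 2) {
        printf("Specify locale name as argument\n");
        return 1;
    }

    /* Set only the LC_COLLATE category of the global locale */
    if (! setlocale(LC_COLLATE, argv[1])) {
        printf("setlocale failed?\n");
        return 1;
    }

    /* Expected to segfault here for vi_VN.UTF-8 */
    size_t res = strxfrm(dst, src, sizeof dst);
    printf("res = %zu\n", res);
    printf("src = %s\n", src);
    /* dst is only guaranteed to be NUL-terminated if res < sizeof dst */
    printf("dst = %s\n", res < sizeof dst ? dst : "(buffer too small)");
    return 0;
}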

Running:

$ gcc foo2.c && ./a.out vi_VN.UTF-8
Segmentation Fault
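Both C reproducers crash inside `strxfrm()`/`strxfrm_l()`. A further check, not part of the tests above, would be whether `strcoll()` alone also crashes on the same characters. A minimal sketch using the same `setlocale()` setup as foo2.c; `collate_with_strcoll()` is a hypothetical helper.

```c
#include <assert.h>
#include <locale.h>
#include <string.h>

/* Set LC_COLLATE to `locale_name`, then compare `a` and `b` with
   strcoll() only, so strxfrm() is never called. On success stores the
   strcoll() result in *result and returns 0; returns -1 if the locale
   cannot be selected. */
int collate_with_strcoll(const char *locale_name, const char *a,
                         const char *b, int *result)
{
    assert(locale_name != NULL && a != NULL && b != NULL && result != NULL);
    if (!setlocale(LC_COLLATE, locale_name))
        return -1;
    *result = strcoll(a, b);
    return 0;
}
```

If `collate_with_strcoll("vi_VN.UTF-8", "\xA8", "A", &r)` returns normally while foo2.c crashes, that would point at the `strxfrm()` code path specifically; this has not been tried.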
