Avoidable "PCRE compilation error" by allowing more than ASCII #35322

@PallHaraldsson

Missing "PCRE_UTF8 option" for pcre_compile() or something else?

"ð" (and "þ") are some of the extra letters in the Icelandic alphabet, and I would want to make sure those work too.

From memory, the same kind of regex worked in Python, and as I was new to this, it took a long time to figure out what was wrong (I was porting something from Python that supposedly should have worked). I had a longer regex, and it was (too) tedious to count out to the right letter:

julia> r"""(?P<viðburðarnafn>((\w)*))"""
ERROR: LoadError: PCRE compilation error: syntax error in subpattern name (missing terminator) at offset 6
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] compile(::String, ::UInt32) at ./pcre.jl:123
 [3] compile(::Regex) at ./regex.jl:72
 [4] Regex(::String, ::UInt32, ::UInt32) at ./regex.jl:37
 [5] Regex(::String) at ./regex.jl:60
 [6] @r_str(::LineNumberNode, ::Module, ::Any) at ./regex.jl:109
in expression starting at REPL[51]:1
julia> versioninfo
versioninfo (generic function with 2 methods)

julia> versioninfo()
Julia Version 1.5.0-DEV.360
Commit 012b270df6 (2020-02-28 07:57 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU D-1541 @ 2.10GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, broadwell)
Environment:
  JULIA_LLVM_ARGS = -unroll-threshold=500
  JULIA_NUM_THREADS = 4

http://pcre.sourceforge.net/pcre.txt

UTF-8 SUPPORT

 To build PCRE with support for UTF-8 character strings, add

   --enable-utf8

 to the configure command. Of itself, this does not make PCRE
 treat  strings as UTF-8. As well as compiling PCRE with this
 option, you also have to set the PCRE_UTF8 option  when
 you call the pcre_compile() function.
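
A possible workaround, sketched under the assumption that the failure comes from the non-ASCII letter in the group name and not from the pattern body (offset 6 in the error above is the position of the first "ð" in `(?P<viðburðarnafn>`). The name `vidburdarnafn` is a hypothetical ASCII transliteration chosen only for illustration, and the subject string is made up:

```julia
# Workaround sketch: spell the capture group name with ASCII letters only.
# "vidburdarnafn" is a hypothetical transliteration of "viðburðarnafn".
re = r"(?P<vidburdarnafn>\w*)"

# Made-up subject text containing the Icelandic letters ð and þ.
m = match(re, "viðburðarnafn og þorn")

# The group is looked up by its ASCII name; the matched text itself
# may still contain non-ASCII letters.
name = m[:vidburdarnafn]
println(name)
```

If this runs as expected, it suggests that `\w` already handles "ð" and "þ" in the subject text, and that only the group name is restricted to ASCII in this build.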

Labels: help wanted, strings