Matches unexpectedly fail when there are soft hyphens (U+00AD)

As you probably know, in German, there are a lot of very long compound words.  
So websites (e.g. newspapers) like to use automatic hyphenation, which inserts soft hyphen characters into these words so that lines can break gracefully.  
Normally, soft hyphens are completely invisible, unless the text is pasted into a program that cannot handle them (for example a Linux terminal, where they show up as spaces).

As a result, patterns that users expect to work fail, and it is practically impossible for non-experts to even find out why.

One example: `An­fän­ge­rin­nen` does not match `Anfängerinnen`. To a search, the first looks like `An-fän-ge-rin-nen`, except that each `-` is a U+00AD.
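A minimal reproduction in Python, using the standard `re` module only as a stand-in (the tool's actual regex engine may differ), shows that the two strings merely look identical:

```python
import re

# Text as copied from a hyphenated web page; each "\u00ad" is an invisible soft hyphen.
page_text = "An\u00adfän\u00adge\u00adrin\u00adnen"
query = "Anfängerinnen"

print(page_text == query)                        # False: the strings differ
print(re.search(query, page_text))               # None: the plain search fails
print(page_text.replace("\u00ad", "") == query)  # True once the soft hyphens are gone
```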

This can be worked around with regexes, of course, but because the hyphens are invisible, the result is cumbersome and write-only: `An­?fän­?ge­?rin­?nen` (each question mark has a U+00AD in front of it) matches, but it still fails if there are soft hyphens in unexpected places.
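The same workaround can at least be written with an escape instead of invisible characters, which keeps it readable. A sketch in Python (`re` again standing in for the tool's engine; the test strings are illustrative):

```python
import re

# One optional soft hyphen (written as the escape \u00AD) after each syllable
# at which the page is expected to break the word.
pattern = re.compile(r"An\u00AD?fän\u00AD?ge\u00AD?rin\u00AD?nen")

expected = "An\u00adfän\u00adge\u00adrin\u00adnen"  # breaks where the pattern allows them
unhyphenated = "Anfängerinnen"                    # no soft hyphens at all
unexpected = "Anfänger\u00adinnen"                # a break point the pattern does not allow

for text in (expected, unhyphenated, unexpected):
    print(pattern.search(text) is not None)
# True
# True
# False
```

The escape fixes the readability problem, but not the fragility: any break point the pattern author did not anticipate still makes the match fail.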

It would be much nicer if this were taken care of automatically. Or, given that this is less common in other languages, perhaps as a setting that can be enabled?
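For plain (non-regex) search terms, automatic handling could be as simple as allowing an optional soft hyphen between every pair of characters. This is only a sketch under that assumption: `soft_hyphen_tolerant` is a hypothetical helper, not part of the tool, and arbitrary regexes would still need their parsed pattern rewritten.

```python
import re

SOFT_HYPHEN = "\u00ad"


def soft_hyphen_tolerant(literal: str) -> re.Pattern:
    """Hypothetical helper: compile `literal` so that it also matches when
    soft hyphens (U+00AD) appear between any two of its characters."""
    # Ignore soft hyphens pasted into the query itself, escape every character
    # so regex metacharacters stay literal, and allow an optional U+00AD between them.
    chars = [re.escape(c) for c in literal.replace(SOFT_HYPHEN, "")]
    return re.compile(r"\u00AD?".join(chars))


pattern = soft_hyphen_tolerant("Anfängerinnen")
print(pattern.search("Die An\u00adfän\u00adge\u00adrin\u00adnen") is not None)  # True
print(pattern.search("Anfänger\u00adinnen") is not None)                       # True
print(pattern.search("Anfänger") is not None)                                  # False
```

Because the match span includes the soft hyphens of the original text, highlighting would work without modifying the page.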

Just removing all U+00AD before running any regexes is a quick and easy workaround that is definitely acceptable, if you don’t want to dive into how to handle this without modifying the regex parse tree. ;)  
It is also what I resorted to.  
But of course it breaks the graceful line breaking and leaves large gaps at the ends of lines in narrower text columns.  
Still, it is much better than mysteriously failing matches, or than the useless “broken regex” bug report I was about to write just before I realized what was going on.
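One possible refinement of the stripping workaround (a sketch, not how the tool actually works) is to strip soft hyphens only in a copy used for matching and keep a map back to the original offsets. The displayed text then keeps its soft hyphens for line breaking. `strip_soft_hyphens` and `search_ignoring_soft_hyphens` are hypothetical names, and Python's `re` stands in for whatever engine the tool uses.

```python
from __future__ import annotations

import re

SOFT_HYPHEN = "\u00ad"


def strip_soft_hyphens(text: str) -> tuple[str, list[int]]:
    """Return `text` without U+00AD plus, for every kept character,
    its index in the original `text`."""
    kept: list[str] = []
    origin: list[int] = []
    for i, ch in enumerate(text):
        if ch != SOFT_HYPHEN:
            kept.append(ch)
            origin.append(i)
    return "".join(kept), origin


def search_ignoring_soft_hyphens(pattern: str, text: str) -> tuple[int, int] | None:
    """Hypothetical helper: match `pattern` against a soft-hyphen-free copy of
    `text`, then map the match back to (start, end) offsets in the ORIGINAL text."""
    stripped, origin = strip_soft_hyphens(text)
    m = re.search(pattern, stripped)
    if m is None:
        return None
    if m.start() == m.end():
        # Empty match: map the single position (or the end of the text).
        pos = origin[m.start()] if m.start() < len(origin) else len(text)
        return pos, pos
    # Non-empty match: start at the first matched character, end just after the last.
    return origin[m.start()], origin[m.end() - 1] + 1


text = "Die An\u00adfän\u00adge\u00adrin\u00adnen sind da."
span = search_ignoring_soft_hyphens("Anfängerinnen", text)
start, end = span
print(repr(text[start:end]))  # the original word, soft hyphens included
```

The page text itself is never modified; only the throwaway copy used for matching loses its soft hyphens.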

Matches unexpectedly fail when there are soft hyphens (U+00AD) #393

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Matches unexpectedly fail when there are soft hyphens (U+00AD) #393

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions