&str.words should be using the unicode word boundary algorithm #15628

Closed
@huonw


Currently it just splits on whitespace (i.e. it is equivalent to splitting with regex!("\s+")). Preferably it should use the Unicode word boundary rules: http://www.unicode.org/reports/tr29/#Word_Boundaries

(The old behaviour is easy to replicate with the above regex, or with s.split(|c: char| c.is_whitespace()).filter(|s| !s.is_empty()).)
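A minimal sketch of the closure-based workaround mentioned above, wrapped in a helper so it can be reused. The function name `whitespace_words` is hypothetical and not part of the standard library; the body is the expression from the issue, collected into a `Vec`.

```rust
/// Reproduces the whitespace-only splitting described in the issue:
/// split on every Unicode whitespace character, then drop the empty
/// pieces produced by leading, trailing, or repeated whitespace.
/// The name `whitespace_words` is a placeholder chosen for illustration.
fn whitespace_words(s: &str) -> Vec<&str> {
    s.split(|c: char| c.is_whitespace())
        .filter(|piece| !piece.is_empty())
        .collect()
}
```

Called on `"  Hello,  world!\tcan't "`, this should return the three pieces `Hello,`, `world!` and `can't`. Punctuation stays attached to the neighbouring word, which is the limitation that segmenting by the Unicode word boundary rules is meant to address.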
