Description
There is more to this issue than meets the eye. `stringr::str_to_sentence()` does two things:
- capitalise the first word
- if multiple sentences are provided as a single string, attempt to find the sentence breaks and capitalise the first word of each sentence
The `stringr` implementation wraps `stringi::stri_trans_totitle()`, which in turn uses ICU’s BreakIterator to locate specific text boundaries. As a consequence, `stringr::str_to_sentence()` is not able to identify a full stop / period (".") as a sentence end and does not capitalise the word following it. Thus, there is a discrepancy between the behaviour of the `utf8_capitalize` kernel (which capitalises the first word of a string without making any attempt to break it into sentences) and the behaviour of `stringr::str_to_sentence()`.
For more extensive discussions around the `stringi` / `stringr` implementation, see `stringr` issues 202 and 231.
Due to the complexity of this issue and the relatively niche use cases, the recommendation is to postpone implementation.
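
To make the two behaviours described above concrete, here is a minimal Python sketch. It is an illustration only: `capitalize_first` and `capitalize_sentences` are hypothetical helper names, the first merely approximates a capitalise-the-whole-string operation (upper-case the first character, lower-case the rest), and the second uses a naive regular expression rather than ICU's BreakIterator rules, so it should not be read as what `stringr` or the Arrow kernel actually does.

```python
import re


def capitalize_first(text: str) -> str:
    """Capitalise only the first character of the whole string.

    Assumption: the rest of the string is lower-cased, similar in spirit
    to Python's str.capitalize(). No sentence detection is attempted.
    """
    return text[:1].upper() + text[1:].lower()


def capitalize_sentences(text: str) -> str:
    """Capitalise the first character of each naively detected sentence.

    Assumption: a sentence starts at the beginning of the string or after
    '.', '!' or '?' followed by whitespace. This is a deliberately simple
    rule, not ICU's sentence-break algorithm, and it will misfire on
    abbreviations such as "e.g. this".
    """
    lowered = text.lower()
    return re.sub(
        r"(^|[.!?]\s+)(\w)",
        lambda m: m.group(1) + m.group(2).upper(),
        lowered,
    )


sample = "hello there. how are you?"
print(capitalize_first(sample))      # Hello there. how are you?
print(capitalize_sentences(sample))  # Hello there. How are you?
```

The gap between the two outputs is the kind of discrepancy a binding would have to reconcile. The abbreviation caveat in the second helper hints at why sentence-boundary detection is harder than it looks.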
Reporter: Nicola Crane / @thisisnic
Assignee: Dragoș Moldovan-Grünfeld / @dragosmg
Related issues:
- [C++] String capitalize kernel (depends upon)
PRs and other links:
Note: This issue was originally created as ARROW-13615. Please see the migration documentation for further details.