[R] Bindings for stringr::str_to_sentence #29256

@asfimport

Description

There is more to this issue than meets the eye. stringr::str_to_sentence() does two things:

  • capitalises the first word of the string

  • if multiple sentences are provided as a single string, attempts to find the sentence breaks and capitalises the first word of each sentence.

    The stringr implementation wraps stringi::stri_trans_totitle(), which in turn uses ICU’s BreakIterator to locate text boundaries. As a consequence, stringr::str_to_sentence() is not able to identify a full stop / period (".") as a sentence end and does not capitalise the words following it. Thus, there is a discrepancy between the behaviour of the utf8_capitalize kernel (which capitalises the first word of a string without attempting to break it into sentences) and the behaviour of stringr::str_to_sentence().
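
    To make the gap concrete, here is a minimal Python sketch using only the standard library. It is not the Arrow kernel, stringr, or ICU. capitalize_first() mimics the behaviour described above for utf8_capitalize (first character uppercased, no sentence detection). capitalize_sentences() is a hypothetical helper with a deliberately naive boundary rule: a sentence ends at ".", "!" or "?" followed by whitespace. ICU's BreakIterator rules are more elaborate, so this only illustrates the kind of extra logic a sentence-aware binding would need.

```python
import re


def capitalize_first(text: str) -> str:
    """Uppercase the first character and lowercase the rest.

    No attempt is made to detect sentence boundaries.
    """
    return text[:1].upper() + text[1:].lower()


# Assumption for illustration only: a sentence starts at the beginning of the
# string or after '.', '!' or '?' followed by whitespace. This is NOT the
# rule set used by ICU's BreakIterator.
_SENTENCE_START = re.compile(r"(^\s*|[.!?]\s+)(\w)")


def capitalize_sentences(text: str) -> str:
    """Lowercase the text, then uppercase the first letter of each sentence."""
    lowered = text.lower()
    return _SENTENCE_START.sub(
        lambda m: m.group(1) + m.group(2).upper(), lowered
    )


sample = "first sentence. second sentence! third one"
print(capitalize_first(sample))      # First sentence. second sentence! third one
print(capitalize_sentences(sample))  # First sentence. Second sentence! Third one
```

    The two functions agree on single-sentence input and diverge as soon as the string contains more than one sentence, which is the discrepancy a binding would have to resolve.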

    For more extensive discussions around the stringi / stringr implementation see stringr issues 202 and 231.

    Due to the complexity of this issue and the relatively niche use cases, the recommendation is to postpone implementation.

Reporter: Nicola Crane / @thisisnic
Assignee: Dragoș Moldovan-Grünfeld / @dragosmg

Related issues:

PRs and other links:

Note: This issue was originally created as ARROW-13615. Please see the migration documentation for further details.
