Type checking for base string functions #22

Open
@alistaire47

Description

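Most of base R's string functions (the regex functions such as gsub, plus substr, paste, etc.) implicitly coerce non-character inputs (particularly factors, but also numbers) to character vectors; strsplit, by contrast, already errors on non-character input. For example: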

# Nice ordered factor
mon <- factor(month.name, month.name, ordered = TRUE)
mon
#>  [1] January   February  March     April     May       June      July     
#>  [8] August    September October   November  December 
#> 12 Levels: January < February < March < April < May < June < ... < December

# Implicitly coerces factor to character
substr(mon, 1, 3)
#>  [1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov"
#> [12] "Dec"

# Operating on levels keeps types properly
levels(mon) <- substr(levels(mon), 1, 3)
mon
#>  [1] Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
#> 12 Levels: Jan < Feb < Mar < Apr < May < Jun < Jul < Aug < Sep < ... < Dec

or, more ridiculously:

gsub(12L, 21, 1234L)
#> [1] "2134"

The coercion is at least consistent, and probably expected by anyone who has used R for a while. Still, it would occasionally be convenient to have a stricter type-safety requirement under which all coercion must be explicit (as strict already does for apply on data.frames), making R

paste0('foo', 47L)
# Error: [strict] ...

paste0('foo', as.character(47L))
#> [1] "foo47"

more like Python:

print('foo' + 47)
# TypeError: must be str, not int

print('foo' + str(47))
#> foo47
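
For illustration only, the requested behaviour could be approximated in user code by a wrapper along these lines. This is a minimal sketch, not how strict works or would implement it: `strict_paste0` is a hypothetical name, and the error message is made up to mirror the `[strict]` prefix used above.

```r
# Hypothetical helper (not part of strict or base R): refuse any
# argument that is not already a character vector.
strict_paste0 <- function(..., collapse = NULL) {
  args <- list(...)
  ok <- vapply(args, is.character, logical(1))
  if (!all(ok)) {
    # Report the first class of each offending argument
    bad <- vapply(args[!ok], function(x) class(x)[1], character(1))
    stop(
      "[strict] paste0() needs character inputs; got ",
      paste(unique(bad), collapse = ", "),
      ". Use as.character() to convert explicitly.",
      call. = FALSE
    )
  }
  paste0(..., collapse = collapse)
}

# Explicit coercion passes through to paste0()
strict_paste0("foo", as.character(47L))

# Implicit coercion is rejected (wrapped in try() so a script keeps running)
try(strict_paste0("foo", 47L))
```

A wrapper like this only covers calls that go through it. The request here is for the check to apply to the base functions themselves, the way strict overrides other risky base behaviour.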
