Commit 7a53cb3

address Milan's review comments
1 parent 2dd0413 commit 7a53cb3

1 file changed

+29
-24
lines changed

base/strings/basic.jl

Lines changed: 29 additions & 24 deletions
@@ -16,19 +16,24 @@ about strings:
 * Each `Char` in a string is encoded by one or more code units
 * Only the index of the first code unit of a `Char` is a valid index
 * The encoding of a `Char` is independent of what precedes or follows it
-* String encodings are "self-synchronizing" – i.e. `isvalid(s,i)` is O(1)
-
-Some string functions error if you use an out-of-bounds or invalid string index,
-including code unit extraction `codeunit(s,i)`, string indexing `s[i]`, and
-string iteration `next(s,i)`. Other string functions take a more relaxed
-approach to indexing and give you the closest valid string index when in-bounds,
-or when out-of-bounds, behave as if there were an infinite number of characters
-padding each side of the string. Usually these imaginary padding characters have
-code unit length `1`, but string types may choose different sizes. Relaxed
-indexing functions include those intended for index arithmetic: `thisind`,
-`nextind` and `prevind`. This model allows index arithmetic to work with out-of-
-bounds indices as intermediate values so long as one never uses them to retrieve
-a character, which often helps avoid needing to code around edge cases.
+* String encodings are [self-synchronizing] – i.e. `isvalid(s,i)` is O(1)
+
+[self-synchronizing]: https://en.wikipedia.org/wiki/Self-synchronizing_code
+
+Some string functions that extract code units, characters or substrings from
+strings error if you pass them out-of-bounds or invalid string indices. This
+includes `codeunit(s, i)`, `s[i]`, and `next(s, i)`. Functions that do string
+index arithmetic take a more relaxed approach to indexing and give you the
+closest valid string index when in-bounds, or when out-of-bounds, behave as if
+there were an infinite number of characters padding each side of the string.
+Usually these imaginary padding characters have code unit length `1` but string
+types may choose different "imaginary" character sizes as makes sense for their
+implementations (e.g. substrings may pass index arithmetic through to the
+underlying string they provide a view into). Relaxed indexing functions include
+those intended for index arithmetic: `thisind`, `nextind` and `prevind`. This
+model allows index arithmetic to work with out-of-bounds indices as
+intermediate values so long as one never uses them to retrieve a character,
+which often helps avoid needing to code around edge cases.
 
 See also: [`codeunit`](@ref), [`ncodeunits`](@ref), [`thisind`](@ref), [`nextind`](@ref), [`prevind`](@ref)
 """
@@ -75,8 +80,7 @@ I.e. the value returned by `codeunit(s, i)` is of the type returned by
 See also: [`ncodeunits`](@ref), [`checkbounds`](@ref)
 """
 codeunit(s::AbstractString, i::Integer) = typeof(i) === Int ?
-    throw(MethodError(codeunit, Tuple{typeof(s),Int})) :
-    codeunit(s, Int(i))
+    throw(MethodError(codeunit, Tuple{typeof(s),Int})) : codeunit(s, Int(i))
 
 """
     isvalid(s::AbstractString, i::Integer) -> Bool
@@ -113,8 +117,7 @@ Stacktrace:
 ```
 """
 isvalid(s::AbstractString, i::Integer) = typeof(i) === Int ?
-    throw(MethodError(isvalid, Tuple{typeof(s),Int})) :
-    isvalid(s, Int(i))
+    throw(MethodError(isvalid, Tuple{typeof(s),Int})) : isvalid(s, Int(i))
 
 """
     next(s::AbstractString, i::Integer) -> Tuple{Char, Int}
@@ -128,8 +131,7 @@ a Unicode index error is raised.
 See also: [`getindex`](@ref), [`start`](@ref), [`done`](@ref), [`checkbounds`](@ref)
 """
 next(s::AbstractString, i::Integer) = typeof(i) === Int ?
-    throw(MethodError(next, Tuple{typeof(s),Int})) :
-    next(s, Int(i))
+    throw(MethodError(next, Tuple{typeof(s),Int})) : next(s, Int(i))
 
 ## basic generic definitions ##
 
@@ -182,10 +184,12 @@ promote_rule(::Type{<:AbstractString}, ::Type{<:AbstractString}) = String
 ## string & character concatenation ##
 
 """
-    *(s::Union{AbstractString, Char}, t::Union{AbstractString, Char}...) -> String
+    *(s::Union{AbstractString, Char}, t::Union{AbstractString, Char}...) -> AbstractString
 
 Concatenate strings and/or characters, producing a [`String`](@ref). This is equivalent
-to calling the [`string`](@ref) function on the arguments.
+to calling the [`string`](@ref) function on the arguments. Concatenation of built-in
+string types always produces a value of type `String` but other string types may choose
+to return a string of a different type as appropriate.
 
 # Examples
 ```jldoctest
@@ -299,9 +303,10 @@ isless(a::Symbol, b::Symbol) = cmp(a, b) < 0
 The number of characters in string `s` from indices `lo` through `hi`. This is
 computed as the number of code unit indices from `lo` to `hi` which are valid
 character indices. With only a single string argument, this computes the
-number of characters in the entire string. If `lo` or `hi` are out of ranges
-each out of range code unit is considered to be one character. This matches the
-"loose" indexing model of `thisind`, `nextind` and `prevind`.
+number of characters in the entire string. With `lo` and `hi` arguments it computes
+the number of indices between `lo` and `hi` inclusive that are valid indices in
+the string `s`. Note that the trailing character may include code units past `hi`
+and still be counted.
 
 See also: [`isvalid`](@ref), [`ncodeunits`](@ref), [`endof`](@ref), [`thisind`](@ref), [`nextind`](@ref), [`prevind`](@ref)
 
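
The distinction the docstrings above draw between strict extraction functions and relaxed index arithmetic can be sketched as follows. This is an illustrative sketch, not part of the commit: the sample string `"αβγ"` and the variable name `threw` are made up for the example, it assumes Julia 1.x behavior for these functions, and it deliberately stays in-bounds because out-of-bounds handling may differ from the development version this diff targets.

```julia
# Hypothetical sample: each of these three characters takes two UTF-8 code units.
s = "αβγ"

ncodeunits(s)      # 6 code units in total
isvalid(s, 1)      # true: index 1 is the first code unit of 'α'
isvalid(s, 2)      # false: index 2 is the second code unit of 'α'

# Strict extraction: indexing at a non-character index throws an error.
threw = try
    s[2]
    false
catch
    true
end

# Relaxed index arithmetic: these return indices and never extract a character.
thisind(s, 2)      # 1: start of the character that contains code unit 2
nextind(s, 1)      # 3: start of the character after the one at index 1
prevind(s, 3)      # 1: start of the character before the one at index 3

# Counts the valid character indices in 1:3, namely 1 and 3. The character
# starting at index 3 extends to code unit 4 and is still counted.
length(s, 1, 3)    # 2
```

The last call matches the note in the final hunk: a trailing character is counted even when some of its code units lie past `hi`.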
