@@ -16,19 +16,24 @@ about strings:
1616 * Each `Char` in a string is encoded by one or more code units
1717 * Only the index of the first code unit of a `Char` is a valid index
1818 * The encoding of a `Char` is independent of what precedes or follows it
19- * String encodings are "self-synchronizing" – i.e. `isvalid(s,i)` is O(1)
20-
21- Some string functions error if you use an out-of-bounds or invalid string index,
22- including code unit extraction `codeunit(s,i)`, string indexing `s[i]`, and
23- string iteration `next(s,i)`. Other string functions take a more relaxed
24- approach to indexing and give you the closest valid string index when in-bounds,
25- or when out-of-bounds, behave as if there were an infinite number of characters
26- padding each side of the string. Usually these imaginary padding characters have
27- code unit length `1`, but string types may choose different sizes. Relaxed
28- indexing functions include those intended for index arithmetic: `thisind`,
29- `nextind` and `prevind`. This model allows index arithmetic to work with out-of-
30- bounds indices as intermediate values so long as one never uses them to retrieve
31- a character, which often helps avoid needing to code around edge cases.
19+ * String encodings are [self-synchronizing] – i.e. `isvalid(s,i)` is O(1)
20+
21+ [self-synchronizing]: https://en.wikipedia.org/wiki/Self-synchronizing_code
22+
23+ Some string functions that extract code units, characters or substrings from
24+ strings error if you pass them out-of-bounds or invalid string indices. This
25+ includes `codeunit(s, i)`, `s[i]`, and `next(s, i)`. Functions that do string
26+ index arithmetic take a more relaxed approach to indexing and give you the
27+ closest valid string index when in-bounds, or when out-of-bounds, behave as if
28+ there were an infinite number of characters padding each side of the string.
29+ Usually these imaginary padding characters have code unit length `1` but string
30+ types may choose different "imaginary" character sizes as makes sense for their
31+ implementations (e.g. substrings may pass index arithmetic through to the
32+ underlying string they provide a view into). Relaxed indexing functions include
33+ those intended for index arithmetic: `thisind`, `nextind` and `prevind`. This
34+ model allows index arithmetic to work with out-of-bounds indices as
35+ intermediate values so long as one never uses them to retrieve a character,
36+ which often helps avoid needing to code around edge cases.
3237
3338See also: [`codeunit`](@ref), [`ncodeunits`](@ref), [`thisind`](@ref), [`nextind`](@ref), [`prevind`](@ref)
3439"""
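Review note: a minimal sketch of the two indexing regimes this docstring describes, assuming Julia 1.x rather than the development version this diff targets (where `next` still existed). The sample string `"αβγ"` is an illustrative choice. The only out-of-bounds values used are `0` and `ncodeunits(s) + 1`, because released versions may reject indices further outside the string.

```julia
s = "αβγ"                      # each character takes two UTF-8 code units

# Strict functions: only the first code unit of a character is a valid index.
@assert ncodeunits(s) == 6
@assert isvalid(s, 1) && !isvalid(s, 2)
@assert s[3] == 'β'
strict_failed = try
    s[2]                       # index 2 points into the middle of 'α'
    false
catch
    true
end

# Relaxed index arithmetic: results snap to character boundaries ...
@assert thisind(s, 2) == 1
@assert nextind(s, 1) == 3
@assert prevind(s, 5) == 3

# ... and may step just outside the string as intermediate values.
@assert nextind(s, 0) == 1
@assert prevind(s, 1) == 0
@assert nextind(s, 5) == 7     # ncodeunits(s) + 1
@assert prevind(s, 7) == 5
```

The out-of-bounds results (`0` and `7` here) are safe to carry through further index arithmetic, but should not be passed to `s[i]` or `codeunit(s, i)`.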
@@ -75,8 +80,7 @@ I.e. the value returned by `codeunit(s, i)` is of the type returned by
7580See also: [`ncodeunits`](@ref), [`checkbounds`](@ref)
7681"""
7782codeunit(s::AbstractString, i::Integer) = typeof(i) === Int ?
78- throw(MethodError(codeunit, Tuple{typeof(s),Int})) :
79- codeunit(s, Int(i))
83+ throw(MethodError(codeunit, Tuple{typeof(s),Int})) : codeunit(s, Int(i))
8084
8185"""
8286 isvalid(s::AbstractString, i::Integer) -> Bool
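Review note: the `Integer` fallbacks reformatted in these hunks share one pattern. A non-`Int` index is converted to `Int` and the call is retried; if the index is already an `Int`, no concrete method exists for that string type, so a `MethodError` is thrown instead of recursing forever. The sketch below reproduces the pattern with a hypothetical function `unit_at` (not part of Base), assuming Julia 1.x.

```julia
# Hypothetical generic fallback, mirroring the shape of the Base definitions.
unit_at(s::AbstractString, i::Integer) = typeof(i) === Int ?
    throw(MethodError(unit_at, Tuple{typeof(s),Int})) : unit_at(s, Int(i))

# Concrete method that a string type is expected to provide for Int indices.
unit_at(s::String, i::Int) = codeunit(s, i)

# A UInt8 index is converted to Int and dispatched to the concrete method.
converted = unit_at("abc", 0x02)

# SubString has no concrete `unit_at` method here, so the fallback throws.
err = try
    unit_at(SubString("abc", 1, 2), 1)
    nothing
catch e
    e
end
```

Without the `typeof(i) === Int` guard, the second call would dispatch back to the fallback itself and overflow the stack.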
@@ -113,8 +117,7 @@ Stacktrace:
113117```
114118"""
115119isvalid(s::AbstractString, i::Integer) = typeof(i) === Int ?
116- throw(MethodError(isvalid, Tuple{typeof(s),Int})) :
117- isvalid(s, Int(i))
120+ throw(MethodError(isvalid, Tuple{typeof(s),Int})) : isvalid(s, Int(i))
118121
119122"""
120123 next(s::AbstractString, i::Integer) -> Tuple{Char, Int}
@@ -128,8 +131,7 @@ a Unicode index error is raised.
128131See also: [`getindex`](@ref), [`start`](@ref), [`done`](@ref), [`checkbounds`](@ref)
129132"""
130133next(s::AbstractString, i::Integer) = typeof(i) === Int ?
131- throw(MethodError(next, Tuple{typeof(s),Int})) :
132- next(s, Int(i))
134+ throw(MethodError(next, Tuple{typeof(s),Int})) : next(s, Int(i))
133135
134136# # basic generic definitions ##
135137
@@ -182,10 +184,12 @@ promote_rule(::Type{<:AbstractString}, ::Type{<:AbstractString}) = String
182184# # string & character concatenation ##
183185
184186"""
185- *(s::Union{AbstractString, Char}, t::Union{AbstractString, Char}...) -> String
187+ *(s::Union{AbstractString, Char}, t::Union{AbstractString, Char}...) -> AbstractString
186188
187189Concatenate strings and/or characters, producing a [`String`](@ref). This is equivalent
188- to calling the [`string`](@ref) function on the arguments.
190+ to calling the [`string`](@ref) function on the arguments. Concatenation of built-in
191+ string types always produces a value of type `String`, but other string types may choose
192+ to return a string of a different type as appropriate.
189193
190194# Examples
191195```jldoctest
@@ -299,9 +303,10 @@ isless(a::Symbol, b::Symbol) = cmp(a, b) < 0
299303The number of characters in string `s` from indices `lo` through `hi`. This is
300304computed as the number of code unit indices from `lo` to `hi` which are valid
301305character indices. With only a single string argument, this computes the
302- number of characters in the entire string. If `lo` or `hi` are out of ranges
303- each out of range code unit is considered to be one character. This matches the
304- "loose" indexing model of `thisind`, `nextind` and `prevind`.
306+ number of characters in the entire string. With `lo` and `hi` arguments it computes
307+ the number of indices between `lo` and `hi` inclusive that are valid indices in
308+ the string `s`. Note that the trailing character may include code units past `hi`
309+ and still be counted.
305310
306311See also: [`isvalid`](@ref), [`ncodeunits`](@ref), [`endof`](@ref), [`thisind`](@ref), [`nextind`](@ref), [`prevind`](@ref)
307312
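Review note: a small sketch of the `length(s, lo, hi)` semantics described in this docstring, assuming Julia 1.x and in-bounds `lo` and `hi`. The sample string is an illustrative choice. The count is the number of valid character indices in `lo:hi`, so a character that starts at `hi` is counted even though its remaining code units lie past `hi`.

```julia
s = "αβγ"                      # valid character indices are 1, 3 and 5

whole    = length(s)           # all three characters
full     = length(s, 1, 6)     # valid indices in 1:6 are 1, 3, 5
trailing = length(s, 1, 5)     # 'γ' starts at 5 and extends to code unit 6
inner    = length(s, 2, 5)     # valid indices in 2:5 are 3 and 5
single   = length(s, 2, 4)     # only index 3 is valid in 2:4
```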