feat: adds substring method for Text with tests #523

krpeacock · 2023-02-13T18:20:21Z

No description provided.

crusso · 2023-02-13T19:25:16Z

src/Text.mo

+  /// Text.substring("This is a sentence.", 5, 4) // "is a"
+  /// Text.substring("This is a sentence.", 0, 0) // ""
+  /// ```
+  public func substring(t : Text, start : Int, len : Int) : Text {


Unless you've got something more clever planned for negative args, I'd use Nat for start and len and then just two (explicit) loops over the iterator's elements (using iter.next() explicitly): one loop to advance the iterator by count, the second to append the chars. The other question is whether you want to trap when out-of-bounds or return null with return type ?Text, not Text.

Yeah, that makes sense in terms of Motoko conventions. I was basing this off of a script I was writing where I was using it in conjunction with indexOf, which returned a -1 if the pattern wasn't found, but both can simply return options and use Nats instead

This was based off of the JavaScript patterns, where you are allowed to use negative indexes to count backward from the end of iterable structures. There's no need to introduce that to Motoko, however

In my opinion it's nicer to return an empty string than null. I've defined those cases in the doc comment: where start or length exceeds the length of the base string, it will

return "" if start is beyond input.size()

return the substring from the starting position to the end of the input if length exceeds the rest
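The approach discussed in this thread (Nat arguments, two explicit loops over `iter.next()`, an empty string when start is out of range, and truncation when the length overshoots) could be sketched roughly as follows. This is an illustration only, not the code from this PR; the name `subText` is hypothetical.

```motoko
import Text "mo:base/Text";

// Hypothetical sketch, not the implementation from this PR.
func subText(t : Text, start : Nat, len : Nat) : Text {
  let chars = t.chars();

  // First loop: advance the iterator past `start` characters.
  var skipped = 0;
  while (skipped < start) {
    switch (chars.next()) {
      case null { return "" }; // start is beyond the end of t
      case (?_) { skipped += 1 };
    };
  };

  // Second loop: append up to `len` characters.
  var result = "";
  var taken = 0;
  while (taken < len) {
    switch (chars.next()) {
      case null { return result }; // len exceeds the rest of t
      case (?c) {
        result #= Text.fromChar(c);
        taken += 1;
      };
    };
  };
  result
};
```

With these semantics there is no failure case, so the return type can stay Text rather than ?Text. Returning ?Text would instead let callers distinguish an out-of-range start from a legitimately empty result.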

chenyan-dfinity · 2023-02-13T21:16:07Z

@crusso Do you think it makes sense to expose str[i] or even str[i..j] as primitives? If the base library implementation ever becomes a bottleneck.

rossberg · 2023-02-14T09:58:32Z

Correct index-based access is a linear-time operation on Unicode strings, which immediately makes any looping based on it quadratic. It hence was a very deliberate decision to favour iterators and not to support random access on Text in Motoko, nor any other index-based operations.

We have discussed in the past that a subtext operator(*) should hence take two iterators as boundaries (which also eliminates the out-of-bounds error case), i.e., something like:

extract : (text : Text, start : Iter<Text>, end : Iter<Text>) -> Text

where end is exclusive.

The open problem, as I remember it, was how to ensure that the iterators belong to the text string in question. That will have to be a dynamic check, but that and extracting the position would require piercing the closure representing the iterator. (Maybe a more bearable solution is to introduce a subtype TextIter that has an additional field data : TextIterData, where TextIterData is an opaque primitive type that encapsulates subject text and position; its opaqueness guarantees that the value is well-formed.)

(*) substring wouldn't be a fitting name in Motoko, since the type is called Text.
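To make the quadratic-cost point above concrete, here is a small sketch that counts spaces in two ways. `charAt` is a hypothetical index-based accessor written only for this comparison; it does not exist in the base library, and it has to walk the text from the start on every call.

```motoko
// Hypothetical index-based accessor: linear in i, because it must
// iterate from the beginning of the text each time it is called.
func charAt(t : Text, i : Nat) : ?Char {
  var n = 0;
  for (c in t.chars()) {
    if (n == i) { return ?c };
    n += 1;
  };
  null
};

// Quadratic overall: every charAt call re-walks the text.
func countSpacesByIndex(t : Text) : Nat {
  var count = 0;
  var i = 0;
  let size = t.size();
  while (i < size) {
    switch (charAt(t, i)) {
      case (?c) { if (c == ' ') { count += 1 } };
      case null {};
    };
    i += 1;
  };
  count
};

// Linear overall: a single pass with one iterator.
func countSpacesByIter(t : Text) : Nat {
  var count = 0;
  for (c in t.chars()) {
    if (c == ' ') { count += 1 };
  };
  count
};
```

Both functions return the same count; only the amount of work differs, which is the motivation given for keeping Text access iterator-based.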

krpeacock · 2023-02-16T15:37:55Z

I agree that the substring name may not be fitting, but I think that passing Iters instead of Nats is really unintuitive, as a consumer of the API.

Could we call this Text.slice?

rossberg · 2023-02-17T14:20:31Z

There isn't really a useful alternative to an iterators-based API. Because how would you compute an index? The only available way is by iterating over the text -- with an iterator. What would be the point in having to convert that into an index first?

It's perhaps unusual from a JS perspective, but there are other container libraries that work that way as well.

timohanke · 2023-02-24T08:03:36Z

How do iterators as boundaries work? How do I have to define them to get the boundaries I want?

crusso · 2023-03-01T17:14:37Z

> How do iterators as boundaries work? How do I have to define them to get the boundaries I want?

Yeah, I'm curious about that too.

In the past, I've wanted to represent a Text slice as a pair of a private, cloneable text iterator plus a length. Iterating the slice clones the iterator and enumerates (up to) length values. But our text iterators don't support cloning at the moment, though I don't think that would be too hard to add.
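A rough sketch of the slice idea described above, under a loud assumption: since text iterators cannot be cloned today, this version stands in for cloning by keeping the base text and a start offset, then re-creating an iterator and skipping to the offset on each enumeration. The `Slice` type and `sliceChars` function are hypothetical names, and the skip step makes every enumeration linear in `start`, which a real cloneable iterator would avoid.

```motoko
import Iter "mo:base/Iter";
import Text "mo:base/Text";

// Hypothetical slice representation: base text, start offset, length.
type Slice = { base : Text; start : Nat; len : Nat };

// Enumerates (up to) `len` characters of the slice.
// Assumption: re-creating and advancing an iterator replaces cloning.
func sliceChars(s : Slice) : Iter.Iter<Char> {
  let chars = s.base.chars();

  // Skip to the start offset; stop early if the text runs out.
  var skipped = 0;
  while (skipped < s.start) {
    switch (chars.next()) {
      case null { skipped := s.start };
      case (?_) { skipped += 1 };
    };
  };

  var remaining = s.len;
  object {
    public func next() : ?Char {
      if (remaining == 0) { return null };
      remaining -= 1;
      chars.next()
    };
  }
};

// Usage: a slice can be enumerated more than once.
let slice : Slice = { base = "This is a sentence."; start = 5; len = 4 };
let first = Text.fromIter(sliceChars(slice));
let second = Text.fromIter(sliceChars(slice));
```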

krpeacock added 2 commits February 13, 2023 10:20

feat: adds substring method for Text

4b050a6

missing comma

89b067a

krpeacock changed the title ~~feat: adds substring method for Text~~ feat: adds substring method for Text with tests Feb 13, 2023

krpeacock added 3 commits February 13, 2023 10:51

single test experiment

d0a6025

uncommenting tests

f5043b0

cases for negative length and negative start

916cf45

crusso reviewed Feb 13, 2023


krpeacock added 2 commits February 13, 2023 12:42

switching to nat inputs and adjusting test cases

61db4f7

eject early if substring is complete

4856220
