Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: adds substring method for Text with tests #523

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

krpeacock
Copy link
Contributor

No description provided.

@krpeacock krpeacock changed the title feat: adds substring method for Text feat: adds substring method for Text with tests Feb 13, 2023
src/Text.mo Outdated
/// Text.substring("This is a sentence.", 5, 4) // "is a"
/// Text.substring("This is a sentence.", 0, 0) // ""
/// ```
public func substring(t : Text, start : Int, len : Int) : Text {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless you've got something more clever planned for negative args, I'd use Nat for start and len and then just two (explicit) loops over the iterator's elements (using iter.next() explicitly). One to loop advance the iterator by count, the second to append the chars. The other question is whether you want to trap when out-of-bounds or return null with return type ?Text, not Text.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that makes sense in terms of Motoko conventions. I was basing this off of a script I was writing where I was using it in conjunction with indexOf, which returned a -1 if the pattern wasn't found, but both can simply return options and use Nats instead

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was based off of the JavaScript patterns, where you are allowed to use negative indexes to count backward from the end of iterable structures. There's no need to introduce that to Motoko, however

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in my opinion it's nicer to return an empty string over null. I've defined those cases in the doc comment, where start or length exceeds the length of a base string, it will

  • return "" if start is beyond input.size()
  • return the substring from the starting position to the end of the input if length exceeds the rest

@chenyan-dfinity
Copy link
Contributor

@crusso Do you think it makes sense to expose str[i] or even str[i..j] as primitives? If the base library implementation ever becomes a bottleneck.

@rossberg
Copy link
Contributor

rossberg commented Feb 14, 2023

Correct index-based access is a linear-time operation on Unicode strings, which immediately makes any looping based on it quadratic. It hence was a very deliberate decision to favour iterators and not to support random access on Text in Motoko, nor any other index-based operations.

We have discussed in the past that a subtext operator(*) should hence take two iterators as boundaries (which also eliminates the out-of-bounds error case), i.e., something like:

extract : (text : Text, start : Iter<Text>, end : Iter<Text>) -> Text

where end is exclusive.

The open problem, as I remember it, was how to ensure that the iterators belong to the text string in question. That will have to be a dynamic check, but that and extracting the position would require piercing the closure representing the iterator. (Maybe a more bearable solution is to introduce a subtype TextIter that has an additional field data : TextIterData, where TextIterData is an opaque primitive type that encapsulates subject text and position; its opaqueness guarantees that the value is well-formed.)

(*) substring wouldn't be a fitting name in Motoko, since the type is called Text.

@krpeacock
Copy link
Contributor Author

krpeacock commented Feb 16, 2023

I agree that the substring name may not be fitting, but I think that passing Iters instead of Nats is really unintuitive, as a consumer of the API.

Could we call this Text.slice?

@rossberg
Copy link
Contributor

There isn't really a useful alternative to an iterators-based API. Because how would you compute an index? The only available way is by iterating over the text -- with an iterator. What would be the point in having to convert that into an index first?

It's perhaps unusual from a JS perspective, but there are other container libraries that work that way as well.

@timohanke
Copy link

How do iterators as boundaries work? How do I have to define them to get the boundaries I want?

@crusso
Copy link
Contributor

crusso commented Mar 1, 2023

How do iterators as boundaries work? How do I have to define them to get the boundaries I want?

Yeah, I'm curious about that too.

In the past, I've want to represent a Text slice as a pair of private and cloneable text iterator, plus length. Iterating the slice clones the iterator and enumerates (up to) length values. But our text iterators don't support cloning at the moment, though I don't think that would be too hard to add.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants