Strings and UTF-8. #1104

Open
otrho opened this issue Mar 30, 2022 · 5 comments
Labels
bikeshedding (For bikeshedding trivialities)
compiler (General compiler. Should eventually become more specific as the issue is triaged)
language feature (Core language features visible to end users)
P: medium

Comments

@otrho
Contributor

otrho commented Mar 30, 2022

We need to make some decisions around how we use strings in Sway. I can see some potential problems if we explicitly support UTF-8 and keep the string size in its type.

Right now we can declare a string to have type str[N] where N is the number of bytes in the string. We can then compare those types and if N doesn't match then it's a different string type.

But is this useful? If a string type is 4 chars then I could use "abcd" or "🎸" -- both are 4 bytes, but the latter is one character and doesn't 'feel' like a str[4]. If a function or initialiser has the type str[2] then I just can't put "🎸" in there.
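To make the byte/character mismatch concrete, here is a minimal sketch. It is written in Rust rather than Sway, only because Rust string literals are also UTF-8 and the snippet can be run anywhere; the helper names `byte_and_char_len` and `fits_exactly` are made up for illustration and are not part of any Sway API.

```rust
/// Returns (size in bytes, number of Unicode scalar values) for a UTF-8 string.
fn byte_and_char_len(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

/// Hypothetical analogue of checking a literal against `str[N]`:
/// the literal only "fits" when its byte length is exactly N.
fn fits_exactly<const N: usize>(s: &str) -> bool {
    s.len() == N
}
```

With these helpers, "abcd" and "🎸" both report 4 bytes, but the first has four characters and the second has one. The guitar fits a 4-byte slot and is rejected by a 2-byte slot, which is the str[2] situation described above.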

But then, what is the point of having a fixed length string as a variable anyway? If I have a function which takes a string for a tag, or log, or even just to hash, how big do I make it?

I'm thinking we should instead just have a str type, without a size. When type checking, all strings match regardless of their length. But we won't support mutable or variable-length strings; they'll all still be a fixed, known size at compile time. Internally the compiler would always use them by reference and compare them with MEQ or similar.
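A rough sketch of what that could look like from the user's side, again in Rust as a stand-in for Sway. The functions `str_eq` and `tag_size` are hypothetical, and the byte-slice comparison is only an approximation of a memory-equality instruction such as MEQ, not a description of what the compiler would actually emit.

```rust
/// Compares two size-less strings the way a memory-equality check would:
/// the lengths must match first, then the bytes are compared.
fn str_eq(a: &str, b: &str) -> bool {
    a.len() == b.len() && a.as_bytes() == b.as_bytes()
}

/// A function taking a tag of any length. The size is a property of each
/// value, not of the parameter type, so no `N` has to be chosen up front.
fn tag_size(tag: &str) -> usize {
    tag.len()
}
```

Both "abcd" and "🎸" can be passed to `tag_size` without changing its signature, which is the ergonomic gain being proposed over str[N].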

We may also need a len() function or method which would be a compile-time constant for that string value. OTOH, maybe not; it depends on how much functionality we want to provide. Proper UTF-8 support would imply allowing iteration over each character/glyph in the string, etc. But does Sway need this?
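For comparison, this is roughly how the two ideas look in Rust, used here purely as an illustration and not as a proposal for Sway's API. The constant and function names are invented for the example. Note that `chars()` yields Unicode scalar values, which is not the same as user-visible glyphs.

```rust
// The byte length of a string constant can be evaluated at compile time.
const GUITAR: &str = "🎸";
const GUITAR_LEN: usize = GUITAR.len();

/// Collects the Unicode scalar values of a string, decoding UTF-8 as it goes.
fn scalars(s: &str) -> Vec<char> {
    s.chars().collect()
}

/// Counts scalar values. A glyph built from a base letter plus a combining
/// mark counts as two, so this is not a true glyph count.
fn scalar_count(s: &str) -> usize {
    s.chars().count()
}
```

The gap between scalar values and glyphs (for example "e" followed by a combining acute accent) is part of why full UTF-8 iteration is a larger feature than a constant len().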

@otrho added the question, bikeshedding, compiler, and language feature labels on Mar 30, 2022
@sezna
Contributor

sezna commented Mar 31, 2022

I also don't know if str[N] is a useful thing to have, and it very well could be worth removing. These strings that are sized at compile time but opaque to the user sound pretty novel; I've not encountered another language that tracks string size at compile time and hides it from the user. Would this be useful and/or more ergonomic?

That being said, I think variable-length strings could be offloaded to libraries, perhaps? We would want to use a collection under the hood, which would be library code. Once that is implemented, we could add some syntactic sugar to the compiler to make string literals more convenient to use with library code. It's very possible that after we do that, our primitive fixed-length str type could be removed entirely, in favor of this new stdlib version.

@mohammadfawaz
Contributor

mohammadfawaz commented Jan 31, 2023

I'm moving this to the Sway project so that we can give it some attention. Not sure what direction we want to go with just yet.

@mohammadfawaz
Contributor

The library String can go a long way if we can:

  1. Make it more ergonomic to use, initialize, and update using actual characters/strings.
  2. Figure out how to use it in storage. A special StorageStrings type could be an option here, but that seems like overkill. Storage is actually one situation where a lightweight str type could be useful.

@sezna
Contributor

sezna commented Jan 31, 2023

We don't have to tackle this all at once. We could make a string library with a decent API wrapping a vector of characters. Eventually we can follow that up with some syntax support. Tackling it as two distinct issues like that is probably easiest?
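As a sketch of the first step, a library string can be a thin wrapper around a growable byte vector. The example below is in Rust, not Sway, and the type `LibString` and its methods are hypothetical; it makes no claim about how the actual Sway String library is laid out.

```rust
/// Hypothetical library string: UTF-8 bytes stored in a vector.
pub struct LibString {
    bytes: Vec<u8>,
}

impl LibString {
    /// Creates an empty string.
    pub fn new() -> Self {
        LibString { bytes: Vec::new() }
    }

    /// Appends one character, encoded as 1 to 4 UTF-8 bytes.
    pub fn push(&mut self, c: char) {
        let mut buf = [0u8; 4];
        self.bytes.extend_from_slice(c.encode_utf8(&mut buf).as_bytes());
    }

    /// Length in bytes, which is what the vector tracks directly.
    pub fn len(&self) -> usize {
        self.bytes.len()
    }

    /// Number of characters (Unicode scalar values), found by decoding.
    pub fn char_count(&self) -> usize {
        std::str::from_utf8(&self.bytes)
            .map(|s| s.chars().count())
            .unwrap_or(0)
    }
}
```

Everything here is ordinary library code; only literal syntax would need compiler support, which is why the work splits naturally into two issues.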

@mohammadfawaz
Contributor

The String library is pretty good at the moment! But we still can't do something like let hello = String::from("Hello, world!");. Either way, splitting this issue into multiple issues is a good idea indeed.
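For reference, this is the kind of conversion being asked for, shown in Rust, where a `From<&str>` implementation is what makes that call compile. The type name `ByteString` is hypothetical, and the trait mechanism is Rust's; whether Sway would expose the same shape is an open question in this thread.

```rust
/// Hypothetical library string backed by a byte vector.
pub struct ByteString {
    bytes: Vec<u8>,
}

impl From<&str> for ByteString {
    /// Copies the UTF-8 bytes of a string literal into the vector.
    fn from(literal: &str) -> Self {
        ByteString { bytes: literal.as_bytes().to_vec() }
    }
}

impl ByteString {
    /// Length in bytes.
    pub fn len(&self) -> usize {
        self.bytes.len()
    }
}
```

With that in place, `let hello = ByteString::from("Hello, world!");` builds the library value directly from a literal.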

Projects
Status: Todo
3 participants