Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: Go 2: permit converting a string constant to a byte array type #36890

Closed
jfcg opened this issue Jan 30, 2020 · 52 comments
Closed

proposal: Go 2: permit converting a string constant to a byte array type #36890

jfcg opened this issue Jan 30, 2020 · 52 comments
Labels
FrozenDueToAge LanguageChange Suggested changes to the Go language Proposal Proposal-FinalCommentPeriod v2 An incompatible library change
Milestone

Comments

@jfcg
Copy link

jfcg commented Jan 30, 2020

byte slices and strings are convertable to each other:

var bs = []byte("şevkı")
fmt.Println(len(bs), string(bs))

I propose to extend this by allowing to convert:

  • string constants to byte arrays:
var ba = [...]byte("şevkı")
  • byte arrays to strings
var ba = [...]byte{20, 40, 60}
var s = string(ba)

Also:

// should not compile
var ba = [6]byte("şevkı")

// ok
var ba = [7]byte("şevkı")

// pad with zeros
var ba = [8]byte("şevkı")

I believe this is backward-compatible with Go 1. What do you think?

@gopherbot gopherbot added this to the Proposal milestone Jan 30, 2020
@bcmills bcmills added v2 An incompatible library change LanguageChange Suggested changes to the Go language labels Jan 30, 2020
@bcmills bcmills modified the milestones: Proposal, Go2 Jan 30, 2020
@randall77
Copy link
Contributor

The latter you can do with string(ba[:]).
The former would only make sense with constant strings. Can you elaborate on situations where that comes up?

@jfcg
Copy link
Author

jfcg commented Jan 30, 2020

Most systems nowadays are 64-bit. A string takes 16 bytes and a slice takes 24 bytes. It is especially useful if you want to:

  • stay compact with a small buffer
  • provide human readable initial values for your byte buffer.
    Btw I think this is Go 1 compatible, it is not only applicable to Go 2.

@randall77
Copy link
Contributor

Most systems nowadays are 64-bit. A string takes 16 bytes and a slice takes 24 bytes. It is especially useful if you want to:

stay compact with a small buffer

I don't see how this matters. Your first proposal would happen at compile time, and the second one just needs an extra 8 bytes of stack space (which could be optimized away if we cared).

provide human readable initial values for your byte buffer.

That seems like a reasonable thing to want. You could do

var ba [5]byte
copy(ba[:], "hello")

The only thing that [...]byte{"hello"} really gets you is automatic sizing.

@ianlancetaylor ianlancetaylor changed the title proposal: byte array <-> string conversions proposal: Go 2: byte array <-> string conversions Jan 30, 2020
@ianlancetaylor
Copy link
Contributor

How often does code want to initialize a byte array (not slice) with a constant string?

@bcmills
Copy link
Contributor

bcmills commented Jan 30, 2020

I have, on occasion, wanted a byte-array from a constant string in order to have something addressable to pass to a C function (https://play.golang.org/p/vTuzoWnwRhA).

I typically end up working around it by using a variable and copy instead, but that is more verbose (and, as @randall77 notes, requires more care about sizing).

I do not know how representative that use-case is.

@josharian
Copy link
Contributor

@bcmills another verbose fix in that kind of case could come from #395 --you'd convert twice, once to []byte, and then to *[N]byte.

@jfcg
Copy link
Author

jfcg commented Jan 30, 2020

stay compact with a small buffer

I don't see how this matters. Your first proposal would happen at compile time, and the second one just needs an extra 8 bytes of stack space (which could be optimized away if we cared).

this will use extra 24 bytes on the stack compared to a byte array:

func f() {
    var buf = []byte("file000")
    ...
}

If buf is a fixed size buffer and f() is recursive, then redundant extra space rapidly grows.

Initialzing a byte array from a string can be really convenient: It can be used for enumerations of strings with a specific format like above, among other useful cases.

provide human readable initial values for your byte buffer.

That seems like a reasonable thing to want. You could do
var ba [5]byte
copy(ba[:], "hello")
The only thing that [...]byte{"hello"} really gets you is automatic sizing.

Ignoring having to type an extra line, yes automatic sizing is also a convenient gain ;)

@jimmyfrasche
Copy link
Member

Minor nit but it seems like it should be [...]byte("string").

@networkimprov
Copy link

@jfcg no "v2.0" is planned. The "Go2" label means "defer until language changes resume" (they resumed in 1.13).

@randall77
Copy link
Contributor

this will use extra 24 bytes on the stack compared to a byte array:

func f() {
var buf = []byte("file000")
...
}

Comparing, say,

func f() {
    ba := [3]byte{65, 65, 65}
    s := string(ba[:])
}

and

func g() {
    ba := [3]byte{65, 65, 65}
    s := string(ba) // proposed new feature
}

f has to call runtime.slicebytetostring which takes 24 bytes for the slice.
g would have to call a hypothetical runtime.arraybytetostring which needs to take a pointer and a size, 16 bytes.
In either case, this space is shared between all calls out of f (or g). So if you have a recursive call with 3 word-sized arguments (a string and an int, say), the conversion mentioned here costs no space at all. At worst it costs 8 bytes.

We could easily optimize the expression string(ba[:]) to only pass a ptr+len instead of a ptr+len+cap, and then it wouldn't cost any space, ever.

@jfcg
Copy link
Author

jfcg commented Jan 30, 2020

Having to call a function like runtime.slicebytetostring for s := string(ba[:]) is weird, let alone passing cap as a parameter to that function redundantly.
I thought Go compiler allocates space for string on stack, and just copies ptr and len.

Ok, this is different than having a fixed size buffer and seems the proposal has a slight 8 byte advantage, and three characters [:] less to type ;)

@randall77
Copy link
Contributor

let alone passing cap as a parameter to that function redundantly.

That is true, slicebytetostring never uses the capacity. I guess just fixing that is equivalent to my runtime.arraybytetostring proposal.

I thought Go compiler allocates space for string on stack, and just copies ptr and len.

It always has to do at least a copy of the data, because the compiler currently has no analysis to prove that the byte array is not modified subsequently (#31506, #2205).

If the result doesn't escape, it can allocate the backing array on the stack. But I don't think the details of that are much different in the two cases.
Read runtime/string.go:slicebytetostring for the gory details.

@josharian
Copy link
Contributor

I've started on removing unnecessary cap arguments to a few runtime calls. It's mostly done. I'll plan to finish it up and mail during 1.15. (Writing here to try to avoid duplicate work.)

@ianlancetaylor
Copy link
Contributor

As noted above, we can already convert byte arrays to strings by adding [:], so this is really about converting strings to byte arrays. One of the criteria for Go 2 language changes from https://blog.golang.org/go2-here-we-come is that a language change should "address an important issue for many people".

How often does this really come up in practice? For example, can you find examples of existing code that would be simplified if we added this conversion to the language?

@jfcg
Copy link
Author

jfcg commented Feb 11, 2020

In the top 15 trending Go repos this month

grep -Pr "\[[^]]+\]byte{('[^']+', )+" .

yields 74 results:

./go-master/doc/gccgo_install.html:var name = [4]byte{'f', 'o', 'o', 0};
./go-master/src/encoding/xml/marshal_test.go:	{Value: &Plain{[3]byte{'<', '/', '>'}}, ExpectXML: `<Plain><V>&lt;/&gt;</V></Plain>`},
./go-master/src/fmt/fmt_test.go:	{"%s", [3]byte{'a', 'b', 'c'}, "abc"},
./go-master/src/fmt/fmt_test.go:	{"%s", &[3]byte{'a', 'b', 'c'}, "&abc"},
./go-master/src/reflect/all_test.go:		s := [...]byte{'_', '_', '_', '_', '_', '_', '_', '_'}
./go-master/src/runtime/race.go:var qq = [...]byte{'?', '?', 0}
./go-master/src/runtime/race.go:var dash = [...]byte{'-', 0}
./go-master/src/syscall/exec_linux.go:	none  = [...]byte{'n', 'o', 'n', 'e', 0}
./go-master/src/syscall/exec_linux.go:	slash = [...]byte{'/', 0}
./go-master/test/fixedbugs/bug102.go:	var b1 = [5]byte{'h', 'e', 'l', 'l', 'o'}
./go-master/test/fixedbugs/issue15528.go:	deep interface{} = [1]struct{ a *[2]byte }{{a: &[2]byte{'z', 'w'}}}
./go-master/test/fixedbugs/issue15528.go:	if !reflect.DeepEqual(*(deep.([1]struct{ a *[2]byte })[0].a), [2]byte{'z', 'w'}) {
./go-master/test/rotate.go:	cop = [2]byte{'|', '^'}
./kops-master/vendor/github.com/hashicorp/go-msgpack/codec/time.go:	timeDigits = [...]byte{'0', '1', '2', '3', '4', '5', '6', '7', '8', '9'}
./kops-master/vendor/golang.org/x/crypto/salsa20/salsa/hsalsa20.go:var Sigma = [16]byte{'e', 'x', 'p', 'a', 'n', 'd', ' ', '3', '2', '-', 'b', 'y', 't', 'e', ' ', 'k'}
./libpod-master/vendor/github.com/uber/jaeger-client-go/thrift/simple_json_protocol.go:		fill := [...]byte{'=', '=', '='}
./libpod-master/vendor/golang.org/x/text/internal/language/lookup.go:		[maxLen]byte{'a', 'r', 't', '-', 'l', 'o', 'j', 'b', 'a', 'n'}: _jbo, // art-lojban
./libpod-master/vendor/golang.org/x/text/internal/language/lookup.go:		[maxLen]byte{'i', '-', 'a', 'm', 'i'}:                          _ami, // i-ami
./libpod-master/vendor/golang.org/x/text/internal/language/lookup.go:		[maxLen]byte{'i', '-', 'b', 'n', 'n'}:                          _bnn, // i-bnn
./libpod-master/vendor/golang.org/x/text/internal/language/lookup.go:		[maxLen]byte{'i', '-', 'h', 'a', 'k'}:                          _hak, // i-hak
./libpod-master/vendor/golang.org/x/text/internal/language/lookup.go:		[maxLen]byte{'i', '-', 'k', 'l', 'i', 'n', 'g', 'o', 'n'}:      _tlh, // i-klingon
./libpod-master/vendor/golang.org/x/text/internal/language/lookup.go:		[maxLen]byte{'i', '-', 'l', 'u', 'x'}:                          _lb,  // i-lux
./libpod-master/vendor/golang.org/x/text/internal/language/lookup.go:		[maxLen]byte{'i', '-', 'n', 'a', 'v', 'a', 'j', 'o'}:           _nv,  // i-navajo
./libpod-master/vendor/golang.org/x/text/internal/language/lookup.go:		[maxLen]byte{'i', '-', 'p', 'w', 'n'}:                          _pwn, // i-pwn
./libpod-master/vendor/golang.org/x/text/internal/language/lookup.go:		[maxLen]byte{'i', '-', 't', 'a', 'o'}:                          _tao, // i-tao
./libpod-master/vendor/golang.org/x/text/internal/language/lookup.go:		[maxLen]byte{'i', '-', 't', 'a', 'y'}:                          _tay, // i-tay
./libpod-master/vendor/golang.org/x/text/internal/language/lookup.go:		[maxLen]byte{'i', '-', 't', 's', 'u'}:                          _tsu, // i-tsu
./libpod-master/vendor/golang.org/x/text/internal/language/lookup.go:		[maxLen]byte{'n', 'o', '-', 'b', 'o', 'k'}:                     _nb,  // no-bok
./libpod-master/vendor/golang.org/x/text/internal/language/lookup.go:		[maxLen]byte{'n', 'o', '-', 'n', 'y', 'n'}:                     _nn,  // no-nyn
./libpod-master/vendor/golang.org/x/text/internal/language/lookup.go:		[maxLen]byte{'s', 'g', 'n', '-', 'b', 'e', '-', 'f', 'r'}:      _sfb, // sgn-BE-FR
./libpod-master/vendor/golang.org/x/text/internal/language/lookup.go:		[maxLen]byte{'s', 'g', 'n', '-', 'b', 'e', '-', 'n', 'l'}:      _vgt, // sgn-BE-NL
./libpod-master/vendor/golang.org/x/text/internal/language/lookup.go:		[maxLen]byte{'s', 'g', 'n', '-', 'c', 'h', '-', 'd', 'e'}:      _sgg, // sgn-CH-DE
./libpod-master/vendor/golang.org/x/text/internal/language/lookup.go:		[maxLen]byte{'z', 'h', '-', 'g', 'u', 'o', 'y', 'u'}:           _cmn, // zh-guoyu
./libpod-master/vendor/golang.org/x/text/internal/language/lookup.go:		[maxLen]byte{'z', 'h', '-', 'h', 'a', 'k', 'k', 'a'}:           _hak, // zh-hakka
./libpod-master/vendor/golang.org/x/text/internal/language/lookup.go:		[maxLen]byte{'z', 'h', '-', 'm', 'i', 'n', '-', 'n', 'a', 'n'}: _nan, // zh-min-nan
./libpod-master/vendor/golang.org/x/text/internal/language/lookup.go:		[maxLen]byte{'z', 'h', '-', 'x', 'i', 'a', 'n', 'g'}:           _hsn, // zh-xiang
./libpod-master/vendor/golang.org/x/text/internal/language/lookup.go:		[maxLen]byte{'c', 'e', 'l', '-', 'g', 'a', 'u', 'l', 'i', 's', 'h'}: -1, // cel-gaulish
./libpod-master/vendor/golang.org/x/text/internal/language/lookup.go:		[maxLen]byte{'e', 'n', '-', 'g', 'b', '-', 'o', 'e', 'd'}:           -2, // en-GB-oed
./libpod-master/vendor/golang.org/x/text/internal/language/lookup.go:		[maxLen]byte{'i', '-', 'd', 'e', 'f', 'a', 'u', 'l', 't'}:           -3, // i-default
./libpod-master/vendor/golang.org/x/text/internal/language/lookup.go:		[maxLen]byte{'i', '-', 'e', 'n', 'o', 'c', 'h', 'i', 'a', 'n'}:      -4, // i-enochian
./libpod-master/vendor/golang.org/x/text/internal/language/lookup.go:		[maxLen]byte{'i', '-', 'm', 'i', 'n', 'g', 'o'}:                     -5, // i-mingo
./libpod-master/vendor/golang.org/x/text/internal/language/lookup.go:		[maxLen]byte{'z', 'h', '-', 'm', 'i', 'n'}:                          -6, // zh-min
./libpod-master/vendor/golang.org/x/text/internal/language/lookup.go:		[maxLen]byte{'r', 'o', 'o', 't'}:                                    0,  // root
./libpod-master/vendor/golang.org/x/text/internal/language/lookup.go:		[maxLen]byte{'e', 'n', '-', 'u', 's', '-', 'p', 'o', 's', 'i', 'x'}: -7, // en_US_POSIX"
./terraform-master/vendor/github.com/Azure/go-ntlmssp/messageheader.go:var signature = [8]byte{'N', 'T', 'L', 'M', 'S', 'S', 'P', 0}
./terraform-master/vendor/github.com/ugorji/go/codec/binc.go:// var timeDigits = [...]byte{'0', '1', '2', '3', '4', '5', '6', '7', '8', '9'}
./terraform-master/vendor/golang.org/x/text/internal/language/lookup.go:		[maxLen]byte{'a', 'r', 't', '-', 'l', 'o', 'j', 'b', 'a', 'n'}: _jbo, // art-lojban
./terraform-master/vendor/golang.org/x/text/internal/language/lookup.go:		[maxLen]byte{'i', '-', 'a', 'm', 'i'}:                          _ami, // i-ami
./terraform-master/vendor/golang.org/x/text/internal/language/lookup.go:		[maxLen]byte{'i', '-', 'b', 'n', 'n'}:                          _bnn, // i-bnn
./terraform-master/vendor/golang.org/x/text/internal/language/lookup.go:		[maxLen]byte{'i', '-', 'h', 'a', 'k'}:                          _hak, // i-hak
./terraform-master/vendor/golang.org/x/text/internal/language/lookup.go:		[maxLen]byte{'i', '-', 'k', 'l', 'i', 'n', 'g', 'o', 'n'}:      _tlh, // i-klingon
./terraform-master/vendor/golang.org/x/text/internal/language/lookup.go:		[maxLen]byte{'i', '-', 'l', 'u', 'x'}:                          _lb,  // i-lux
./terraform-master/vendor/golang.org/x/text/internal/language/lookup.go:		[maxLen]byte{'i', '-', 'n', 'a', 'v', 'a', 'j', 'o'}:           _nv,  // i-navajo
./terraform-master/vendor/golang.org/x/text/internal/language/lookup.go:		[maxLen]byte{'i', '-', 'p', 'w', 'n'}:                          _pwn, // i-pwn
./terraform-master/vendor/golang.org/x/text/internal/language/lookup.go:		[maxLen]byte{'i', '-', 't', 'a', 'o'}:                          _tao, // i-tao
./terraform-master/vendor/golang.org/x/text/internal/language/lookup.go:		[maxLen]byte{'i', '-', 't', 'a', 'y'}:                          _tay, // i-tay
./terraform-master/vendor/golang.org/x/text/internal/language/lookup.go:		[maxLen]byte{'i', '-', 't', 's', 'u'}:                          _tsu, // i-tsu
./terraform-master/vendor/golang.org/x/text/internal/language/lookup.go:		[maxLen]byte{'n', 'o', '-', 'b', 'o', 'k'}:                     _nb,  // no-bok
./terraform-master/vendor/golang.org/x/text/internal/language/lookup.go:		[maxLen]byte{'n', 'o', '-', 'n', 'y', 'n'}:                     _nn,  // no-nyn
./terraform-master/vendor/golang.org/x/text/internal/language/lookup.go:		[maxLen]byte{'s', 'g', 'n', '-', 'b', 'e', '-', 'f', 'r'}:      _sfb, // sgn-BE-FR
./terraform-master/vendor/golang.org/x/text/internal/language/lookup.go:		[maxLen]byte{'s', 'g', 'n', '-', 'b', 'e', '-', 'n', 'l'}:      _vgt, // sgn-BE-NL
./terraform-master/vendor/golang.org/x/text/internal/language/lookup.go:		[maxLen]byte{'s', 'g', 'n', '-', 'c', 'h', '-', 'd', 'e'}:      _sgg, // sgn-CH-DE
./terraform-master/vendor/golang.org/x/text/internal/language/lookup.go:		[maxLen]byte{'z', 'h', '-', 'g', 'u', 'o', 'y', 'u'}:           _cmn, // zh-guoyu
./terraform-master/vendor/golang.org/x/text/internal/language/lookup.go:		[maxLen]byte{'z', 'h', '-', 'h', 'a', 'k', 'k', 'a'}:           _hak, // zh-hakka
./terraform-master/vendor/golang.org/x/text/internal/language/lookup.go:		[maxLen]byte{'z', 'h', '-', 'm', 'i', 'n', '-', 'n', 'a', 'n'}: _nan, // zh-min-nan
./terraform-master/vendor/golang.org/x/text/internal/language/lookup.go:		[maxLen]byte{'z', 'h', '-', 'x', 'i', 'a', 'n', 'g'}:           _hsn, // zh-xiang
./terraform-master/vendor/golang.org/x/text/internal/language/lookup.go:		[maxLen]byte{'c', 'e', 'l', '-', 'g', 'a', 'u', 'l', 'i', 's', 'h'}: -1, // cel-gaulish
./terraform-master/vendor/golang.org/x/text/internal/language/lookup.go:		[maxLen]byte{'e', 'n', '-', 'g', 'b', '-', 'o', 'e', 'd'}:           -2, // en-GB-oed
./terraform-master/vendor/golang.org/x/text/internal/language/lookup.go:		[maxLen]byte{'i', '-', 'd', 'e', 'f', 'a', 'u', 'l', 't'}:           -3, // i-default
./terraform-master/vendor/golang.org/x/text/internal/language/lookup.go:		[maxLen]byte{'i', '-', 'e', 'n', 'o', 'c', 'h', 'i', 'a', 'n'}:      -4, // i-enochian
./terraform-master/vendor/golang.org/x/text/internal/language/lookup.go:		[maxLen]byte{'i', '-', 'm', 'i', 'n', 'g', 'o'}:                     -5, // i-mingo
./terraform-master/vendor/golang.org/x/text/internal/language/lookup.go:		[maxLen]byte{'z', 'h', '-', 'm', 'i', 'n'}:                          -6, // zh-min
./terraform-master/vendor/golang.org/x/text/internal/language/lookup.go:		[maxLen]byte{'r', 'o', 'o', 't'}:                                    0,  // root
./terraform-master/vendor/golang.org/x/text/internal/language/lookup.go:		[maxLen]byte{'e', 'n', '-', 'u', 's', '-', 'p', 'o', 's', 'i', 'x'}: -7, // en_US_POSIX"

This is an understated statistics. It is pretty clear this feature is direly needed...

@ianlancetaylor ianlancetaylor changed the title proposal: Go 2: byte array <-> string conversions proposal: Go 2: permit converting a string constant to a byte array type Feb 18, 2020
@earthboundkid
Copy link
Contributor

This compiles on the playground:

	const staticname = "foo\x00"
	var name [len(staticname)]byte
	copy(name[:], staticname)

Do we really need it to be more convenient than this?

@jfcg
Copy link
Author

jfcg commented Feb 19, 2020

Are you seriously comparing that to:

var name = [...]byte("foo\x00")

C and C++ has this for years:

char name[] = "foo";

@josharian
Copy link
Contributor

@jfcg I think he was serious, yes. Please be polite. Although you may disagree, it is not an unreasonable position; the bar for language changes is very high.

@earthboundkid
Copy link
Contributor

earthboundkid commented Feb 19, 2020

The C/C++ version is not directly relevant to the discussion:

  • Null terminated strings are the biggest security disaster in the history of computing
  • Hiding a null at the end of a string literal leads to a lot of beginner confusion and even mistakes for advanced programmers
  • C/C++ don't distinguish the types of different array lengths

The question is not "is [...]byte("") shorter/better?" Of course it is. But everything is a tradeoff. Are byte arrays useful enough, often enough to be worth the cost of updating the language and adding a special casing for this? One point against special casing it is that you can use [len(constant)]byte as a type.

@ianlancetaylor
Copy link
Contributor

The discussion about "24 bytes of waste" is assuming specific compiler behavior. It's misleading because compilers can change. In the example above it's true that buf is likely to be allocated as 24 bytes on the stack in order to get a value to pass to fmt.Println. But it's also true that using a byte array is likely to cause two copies of the array to exist, in order to get a separate copy of the array to pass to fmt.Println. Exactly what happens is going to depend deeply on what the compiler is able to do. If the compiler never needs to store the capacity of a slice anywhere, then perhaps it won't. If the compiler never needs to copy an array, then perhaps it won't.

In general, we should avoid making language decisions based on compiler behavior.

(Which is not to say that I am opposed to this change, but if there is a reason to do it it has to do with making code simpler and more readable, and is not because of saving space on the stack.)

@earthboundkid
Copy link
Contributor

Your example is very artificial and hard to draw conclusions from. In this case, fmt.Println accepts a slice of interface{}, so you're already "wasting" many more bytes on that. But the program probably isn't memory constrained, so there's no reason to optimize it either.

@jfcg
Copy link
Author

jfcg commented Feb 21, 2020

@carlmjohnson, any fixed-size local buffer that does not escape the function is better as a byte array.

@ianlancetaylor, byte slices are officially seperate types that reference to memory buffers, am I wrong? How can a compiler choose to not allocate space for that slice, can you explain?

@ianlancetaylor
Copy link
Contributor

Both byte slices and byte arrays have an array of bytes. We can disregard that when comparing them.

A byte slice is three different values: a pointer to the array of bytes, a length, and a capacity. A byte array will internally be represented using a pointer to the array, and a constant length. When the compiler sees []byte("abc") it knows that the length and capacity are 3, so they are also constant, just as with a byte array. So internally when the compiler is working with a byte slice initialized in that way, it has a pointer to an array, and a constant length and capacity. In other words, it's just the same as an array.

The 24 bytes only arises when the compiler has to construct the slice in memory. In practice that only happens if the slice is converted to an interface type. When the slice is indexed, the compiler can compare the index with the length, in this case a constant. When the slice is passed to a function, the compiler will pass three separate values, where in this case two of them are constants. The compiler does not need to construct a 24 byte value in order to do that (the arguments to the function will take 24 bytes, but exactly the same would happen when passing a byte array to a function that expects a byte slice, by slicing the array).

In other words, the compiler doesn't think of a byte slice as a single value of size 24 bytes. It thinks of it as three different values, one pointer and two ints. Only when the slice must be stored into memory, as when converting to an interface type or setting a global variable, does the compiler need to assemble the 24 byte value.

@jfcg
Copy link
Author

jfcg commented Feb 21, 2020

Ok I see, so the compiler is smart for many cases.
Do we have the 24-byte problem for the example above for gc toolchain as of today?

@mrkanister
Copy link

It's not just the space savings: if you have a protocol that requires a fixed-length byte array, obtaining such an array from a variable-length slice is verbose at best, and turns what ought to be a compile-time length check into a run-time one (or, worse, an undetected under- or over-run, which may be particularly dangerous when populating fields in a struct passed in from C).

@bcmills, you are right, that is another valid use case for byte arrays. I wasn't trying to ignore your comment from a few weeks back, but rather wanted to emphasize that focussing on the (supposedly) saved space might not be the way to gain enough support for the proposal. Sorry, if I didn't make that clear enough.

@ianlancetaylor
Copy link
Contributor

As far as I know the gc compiler doesn't have the "24 byte problem." And, if it does, we should treat as something to fix in the compiler, rather than something to fix in the language.

(As I mentioned above I'm not opposed to this change, I just don't think that optimization is a reason for it.)

@jfcg
Copy link
Author

jfcg commented Feb 25, 2020

Hi. To better understand slice/array initialization differences, I disassembled
slice.go:

package mypkg

//import "fmt"

func slc(x int) {
	if x > 99 {
		return
	}
	var buf = []byte("file000")

	buf[6] += byte(x % 10) // just do something with x & buf
	//	fmt.Println(buf)

	slc(x + 1)
	slc(x + 2)
}

array.go:

package mypkg

//import "fmt"

func arr(x int) {
	if x > 99 {
		return
	}
	var buf = [...]byte{'f', 'i', 'l', 'e', '0', '0', '0'}

	buf[6] += byte(x % 10) // just do something with x & buf
	//	fmt.Println(buf[:])

	arr(x + 1)
	arr(x + 2)
}

with go tool compile -S array.go > array.S etc. Dissamblies are almost the same:

  • array version has two more Go assembly instructions
  • array version has "file000" in a readonly data segment
  • slice version has "file000" in a Go string
  • slice assembly has no definition of autotmp_2 (string to slice library function?)

I think both of these simplified versions spend same amount of stack as @ianlancetaylor said.

This is the vimdiff shot:
slice-arr

Could someone please explain these little differences?
Thanks..

@josharian
Copy link
Contributor

I’m on my phone, but I suspect that after https://go-review.googlesource.com/c/go/+/220499 goes in, those differences may disappear. That change description may help you understand the difference as well. If you want to know more, look for array and slice initialization in src/cmd/compile/internal/gc/sinit.go.

@ianlancetaylor
Copy link
Contributor

Thanks for finding the examples where this could be used. I note that many of the cases are one map in x/text/internal/language, which appears twice in the examples. Many of the other cases are only in tests. This doesn't seem to be used often enough to justify changing the language, per the criteria for language changes at https://blog.golang.org/go2-here-we-come.

It's a bit odd to restrict this conversion only to constant strings. But it's also odd to permit non-constant strings, as it's not obvious what should happen if the string is too long.

Given that it only applies to constant strings, it's syntactic sugar for listing out the bytes.

As discussed above, the effects on the compiler should be fixed in the compiler, not in the language.

For these reasons, this is a likely decline. Leaving open for four weeks for final comments.

@jfcg
Copy link
Author

jfcg commented Mar 24, 2020

  • This change makes sense not just for constant strings but constant-size strings:
func fn(s string) {
    if len(s) != 4 {
        return
    }
    buf := [...]byte(s)
  • On top of the dozens of examples above, there are 4076+ examples of creating slices from simple strings. People are likely always choosing slices in these cases because only slices have a convenient initialization from strings. If only 1% of them could be written with arrays, you have 40+
    more examples for this proposal.
  • This proposal will close an assymmetry in slice and array initialization. It is like being able to initialize float32 from an int constant, but not float64.
  • It is Go 1 compatible.

If this proposal does not satisfy the condition of inclusion, I dont see what does. This is very wrong. Go is a programming language, not a religion. I dont get it.

@ianlancetaylor
Copy link
Contributor

@jfcg The language can't rely only tests like if len(s) != 4 that might be omitted or incorrect. We have to define exactly what should happen when converting a non-constant string to a byte array type.

The blog post lists three criteria for language changes. One of those is that a language change must "address an important issue for many people." The cases in x/text/internal/language/lookup.go we can effectively treat as a single example, as they are all the same. Ignoring cases in tests, I count 12 examples in #36890 (comment). That isn't many, especially considering that you looked at the whole standard library. And each of those 12 examples is easy to write today; this language change would make it slightly more convenient, but it wouldn't add a feature that is not already available.

I don't see any particular reason to assume that any cases of converting a string to a slice could instead convert a string to a byte array. Perhaps some could, perhaps not. Slices and arrays are different and serve different purposes.

I don't agree that adding this feature would close an asymmetry in the language. You can't convert an array of bytes to a string.

@jfcg
Copy link
Author

jfcg commented Mar 24, 2020

@jfcg The language can't rely only tests like if len(s) != 4 that might be omitted or incorrect. We have to define exactly what should happen when converting a non-constant string to a byte array type.

Any string whose size can be determined at compile-time can be converted to a byte array. How is constant-size string not well defined?

The blog post lists three criteria for language changes. One of those is that a language change must "address an important issue for many people." The cases in x/text/internal/language/lookup.go we can effectively treat as a single example, as they are all the same. Ignoring cases in tests, I count 12 examples in #36890 (comment). That isn't many, especially considering that you looked at the whole standard library. And each of those 12 examples is easy to write today; this language change would make it slightly more convenient, but it wouldn't add a feature that is not already available.

Ian, how many slice(string) conversions were there in the global Go code bases before Go team added that ability to Go? How many are there now?

I don't see any particular reason to assume that any cases of converting a string to a slice could instead convert a string to a byte array. Perhaps some could, perhaps not. Slices and arrays are different and serve different purposes.

I don't agree that adding this feature would close an asymmetry in the language. You can't convert an array of bytes to a string.

we can write []byte("input") but not [...]byte("input"). Do you agree?
i guess you dont count string(array[:]) as a conversion.

@ianlancetaylor
Copy link
Contributor

Any string whose size can be determined at compile-time can be converted to a byte array. How is constant-size string not well defined?

The Go language is defined by a language spec, not by an implementation. There are multiple Go compilers. If we implement this feature, all compilers must agree on exactly what it is permitted to convert a string to a byte array, and when it is not. Otherwise a program might compile with one compiler but not with another. Therefore, we can't just say "size can be determined at compile-time." We must write down the precise rules by which the size can be determined at compile time.

Ian, how many slice(string) conversions were there in the global Go code bases before Go team added that ability to Go? How many are there now?

I understand what you are asking, but I don't think it's comparable. Before we added the ability to convert a string to a []byte, there was no way to do that conversion. We can already today convert a constant string to a byte array, by writing down the bytes one by one, as in the examples you listed above. I don't see an obvious reason to believe that people would write more conversions of constant strings to byte arrays if they could do the conversion directly, rather than by doing what they can already do today.

we can write []byte("input") but not [...]byte("input"). Do you agree?

Yes.

i guess you dont count string(array[:]) as a conversion.

That is one slice expression and one conversion.

@mrkanister
Copy link

mrkanister commented Mar 25, 2020

On top of the dozens of examples above, there are 4076+ examples of creating slices from simple strings. People are likely always choosing slices in these cases because only slices have a convenient initialization from strings. If only 1% of them could be written with arrays, you have 40+ more examples for this proposal.

Just curious, why would you want to rewrite them to use arrays when they currently work fine with slices?

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/227163 mentions this issue: cmd/compile,runtime: pass only ptr and len to some runtime calls

gopherbot pushed a commit that referenced this issue Apr 8, 2020
Some runtime calls accept a slice, but only use ptr and len.
This change modifies most such routines to accept only ptr and len.

After this change, the only runtime calls that accept an unnecessary
cap arg are concatstrings and slicerunetostring.
Neither is particularly common, and both are complicated to modify.

Negligible compiler performance impact. Shrinks binaries a little.
There are only a few regressions; the one I investigated was
due to register allocation fluctuation.

Passes 'go test -race std cmd', modulo #38265 and #38266.
Wow, does that take a long time to run.

Updates #36890

file      before    after     Δ       %       
compile   19655024  19655152  +128    +0.001% 
cover     5244840   5236648   -8192   -0.156% 
dist      3662376   3658280   -4096   -0.112% 
link      6680056   6675960   -4096   -0.061% 
pprof     14789844  14777556  -12288  -0.083% 
test2json 2824744   2820648   -4096   -0.145% 
trace     11647876  11639684  -8192   -0.070% 
vet       8260472   8256376   -4096   -0.050% 
total     115163736 115118808 -44928  -0.039% 

Change-Id: Idb29fa6a81d6a82bfd3b65740b98cf3275ca0a78
Reviewed-on: https://go-review.googlesource.com/c/go/+/227163
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
@tv42
Copy link

tv42 commented Apr 14, 2020

@bcmills #36890 (comment)

I have, on occasion, wanted a byte-array from a constant string in order to have something addressable to pass to a C function (https://play.golang.org/p/vTuzoWnwRhA).

&slice[0] works just fine. https://play.golang.org/p/3mtOk5XJWJB

@ianlancetaylor
Copy link
Contributor

No change in consensus.

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/245099 mentions this issue: compiler,runtime: pass only ptr and len to some runtime calls

gopherbot pushed a commit to golang/gofrontend that referenced this issue Jul 28, 2020
This ports https://golang.org/cl/227163 to the Go frontend.
This is a step toward moving up to the go1.15rc1 release.

Original CL description:

    cmd/compile,runtime: pass only ptr and len to some runtime calls

    Some runtime calls accept a slice, but only use ptr and len.
    This change modifies most such routines to accept only ptr and len.

    After this change, the only runtime calls that accept an unnecessary
    cap arg are concatstrings and slicerunetostring.
    Neither is particularly common, and both are complicated to modify.

    Negligible compiler performance impact. Shrinks binaries a little.
    There are only a few regressions; the one I investigated was
    due to register allocation fluctuation.

    Passes 'go test -race std cmd', modulo golang/go#38265 and golang/go#38266.
    Wow, does that take a long time to run.

    file      before    after     Δ       %
    compile   19655024  19655152  +128    +0.001%
    cover     5244840   5236648   -8192   -0.156%
    dist      3662376   3658280   -4096   -0.112%
    link      6680056   6675960   -4096   -0.061%
    pprof     14789844  14777556  -12288  -0.083%
    test2json 2824744   2820648   -4096   -0.145%
    trace     11647876  11639684  -8192   -0.070%
    vet       8260472   8256376   -4096   -0.050%
    total     115163736 115118808 -44928  -0.039%

For golang/go#36890

Change-Id: I1dc1424ccb092a9ad70472e560a743c35dd27bfc
Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/245099
Reviewed-by: Cherry Zhang <cherryyz@google.com>
jpf91 pushed a commit to D-Programming-GDC/gcc that referenced this issue Jul 28, 2020
This ports https://golang.org/cl/227163 to the Go frontend.
This is a step toward moving up to the go1.15rc1 release.

Original CL description:

    cmd/compile,runtime: pass only ptr and len to some runtime calls

    Some runtime calls accept a slice, but only use ptr and len.
    This change modifies most such routines to accept only ptr and len.

    After this change, the only runtime calls that accept an unnecessary
    cap arg are concatstrings and slicerunetostring.
    Neither is particularly common, and both are complicated to modify.

    Negligible compiler performance impact. Shrinks binaries a little.
    There are only a few regressions; the one I investigated was
    due to register allocation fluctuation.

    Passes 'go test -race std cmd', modulo golang/go#38265 and golang/go#38266.
    Wow, does that take a long time to run.

    file      before    after     Δ       %
    compile   19655024  19655152  +128    +0.001%
    cover     5244840   5236648   -8192   -0.156%
    dist      3662376   3658280   -4096   -0.112%
    link      6680056   6675960   -4096   -0.061%
    pprof     14789844  14777556  -12288  -0.083%
    test2json 2824744   2820648   -4096   -0.145%
    trace     11647876  11639684  -8192   -0.070%
    vet       8260472   8256376   -4096   -0.050%
    total     115163736 115118808 -44928  -0.039%

For golang/go#36890

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/245099
@golang golang locked and limited conversation to collaborators Jul 27, 2021
@ianlancetaylor ianlancetaylor modified the milestones: Go2, Proposal Aug 6, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge LanguageChange Suggested changes to the Go language Proposal Proposal-FinalCommentPeriod v2 An incompatible library change
Projects
None yet
Development

No branches or pull requests