the unmarshal and decode API are wasteful and somewhat misleading #349

bluebrown · 2022-04-04T16:57:58Z

Hi, I was just looking at ways the unmarshalling and decoding is handled by this package. I must say im a bit terrifed with what I saw.

So Unmarshal converts bytes to string and passes it to Decode

func Unmarshal(p []byte, v interface{}) error {
	_, err := Decode(string(p), v)
	return err
}

Then Decode converts the string to io.Reader and creates a new Decoder

func Decode(data string, v interface{}) (MetaData, error) {
	return NewDecoder(strings.NewReader(data)).Decode(v)
}

Finally, decoder.Decode, uses io.ReadAll to read everything into []bytes and converts it back into a string before calling parse.

data, err := ioutil.ReadAll(dec.r)
if err != nil {
	return MetaData{}, err
}
p, err := parse(string(data))
if err != nil {
	return MetaData{}, err
}

So it seems like using the Unmarshal function is pretty wasteful since it does this coversion roudtrip.

Additionally. it is not possible to use the decoder created by NewDecoder in a true decoder fashion, since usually it is read until EOF or another error. But EOF is never returned by decoder.Decode.

So, the below will just hang. Although it is the expected way to use a decoder.

for {
    if err := dec.Decode(&data); err == io.EOF {
        break
    } else if err != nil {
        panic(err)
    }
}

It's just the same, but with extra steps. Ref: #349

arp242 · 2022-04-04T18:03:44Z

We can't change the return value of Decode() since it's not a compatible change. It could be done in a v2 release, but it's not a standard interface and returning nil seems fine to me. Even for a v2 I'd like to minimize changes, especially if they're runtime changes in behaviour, and not compiletime changes.

I changed Unmarshal() to call Decoder.Decode() directly, rather than through Decode(). That was indeed a bit pointless 😅

The []byte→string conversations could probably be improved more, but it's a large change as the lexer and parser operate on strings and not bytes. Changing the lexer to operate on bytes is not a huge change, but makes no difference in performance (it's actually a wee bit slower, probably because it does loads of "little" []byte→string conversations when sending it to the parser, rather than one big one). Changing the parser to use []byte is a much bigger change, so I didn't try that.

Changing a 3,000 character []byte to a string takes ~360ns on my system. It would be nice to eliminate, but it's not a huge priority; the example takes about 100,000ns to decode, so we'd be looking at a ~0.01% performance boost (rounded up).

All of that being said, I wish the API surface of this package was smaller and we didn't have 3 different ways to decode a piece of TOML (toml.Unmarshal, toml.Decode, toml.Decoder.Decode).

bluebrown · 2022-04-04T18:15:38Z

Amazing work.

And I am sorry if my tone was somewhat harsh or rude. I didn't mean to critique in that fashion. After all my knowledge of the go isn't expert level, so I don't actually know what performance impact these things have.

Maybe for v2 the NewDecoder could just be removed since it's not actually reading from a stream. It just calls io.ReadAll internally. Perhaps that would make it clearer to the user code what is happening here.

In any case, thank you for the swift action, and keep up the good work.

arp242 · 2022-04-04T22:22:06Z

Maybe for v2 the NewDecoder could just be removed since it's not actually reading from a stream. It just calls io.ReadAll internally. Perhaps that would make it clearer to the user code what is happening here.

It might do in the future :-) I think we should mostly think about the user-visible API and what would be most ideal for that, keeping the future in mind, rather than match the API with the specifics of the current internals.

And I am sorry if my tone was somewhat harsh or rude. I didn't mean to critique in that fashion.

Yeah, it did come of like that a bit to be honest 😅 But it's okay, I figured it was maybe a bit of language barrier and/or maybe a bad day (we all have those).

arp242 · 2023-10-01T10:43:57Z

Additionally. it is not possible to use the decoder created by NewDecoder in a true decoder fashion, since usually it is read until EOF or another error. But EOF is never returned by decoder.Decode.

So, the below will just hang. Although it is the expected way to use a decoder.
for {
    if err := dec.Decode(&data); err == io.EOF {
        break
    } else if err != nil {
        panic(err)
    }
}

JSON doesn't do this:

func main() {
	dec := json.NewDecoder(strings.NewReader(`{"a": 1}`))
	var m any
	err := dec.Decode(&m)
	fmt.Println(err, m)
}

Outputs <nil> map[a:1].

I'm not sure if I've ever seen Decode() return io.EOF 🤔

As for the rest: I'll look at the API when (if!) a v2 gets released – don't really need a keep an issue open for that IMHO.

arp242 added a commit that referenced this issue Apr 4, 2022

Don't run Unmarshal() through Decode()

ef65e34

It's just the same, but with extra steps. Ref: #349

arp242 added the v2 label Apr 4, 2022

arp242 closed this as completed Oct 1, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

the unmarshal and decode API are wasteful and somewhat misleading #349

the unmarshal and decode API are wasteful and somewhat misleading #349

bluebrown commented Apr 4, 2022

arp242 commented Apr 4, 2022

bluebrown commented Apr 4, 2022

arp242 commented Apr 4, 2022

arp242 commented Oct 1, 2023

the unmarshal and decode API are wasteful and somewhat misleading #349

the unmarshal and decode API are wasteful and somewhat misleading #349

Comments

bluebrown commented Apr 4, 2022

arp242 commented Apr 4, 2022

bluebrown commented Apr 4, 2022

arp242 commented Apr 4, 2022

arp242 commented Oct 1, 2023