Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

the unmarshal and decode API are wasteful and somewhat misleading #349

Closed
bluebrown opened this issue Apr 4, 2022 · 4 comments
Closed
Labels

Comments

@bluebrown
Copy link

Hi, I was just looking at ways the unmarshalling and decoding is handled by this package. I must say im a bit terrifed with what I saw.

So Unmarshal converts bytes to string and passes it to Decode

func Unmarshal(p []byte, v interface{}) error {
	_, err := Decode(string(p), v)
	return err
}

Then Decode converts the string to io.Reader and creates a new Decoder

func Decode(data string, v interface{}) (MetaData, error) {
	return NewDecoder(strings.NewReader(data)).Decode(v)
}

Finally, decoder.Decode, uses io.ReadAll to read everything into []bytes and converts it back into a string before calling parse.

data, err := ioutil.ReadAll(dec.r)
if err != nil {
	return MetaData{}, err
}
p, err := parse(string(data))
if err != nil {
	return MetaData{}, err
}

So it seems like using the Unmarshal function is pretty wasteful since it does this coversion roudtrip.

Additionally. it is not possible to use the decoder created by NewDecoder in a true decoder fashion, since usually it is read until EOF or another error. But EOF is never returned by decoder.Decode.

So, the below will just hang. Although it is the expected way to use a decoder.

for {
    if err := dec.Decode(&data); err == io.EOF {
        break
    } else if err != nil {
        panic(err)
    }
}
arp242 added a commit that referenced this issue Apr 4, 2022
It's just the same, but with extra steps.

Ref: #349
@arp242
Copy link
Collaborator

arp242 commented Apr 4, 2022

We can't change the return value of Decode() since it's not a compatible change. It could be done in a v2 release, but it's not a standard interface and returning nil seems fine to me. Even for a v2 I'd like to minimize changes, especially if they're runtime changes in behaviour, and not compiletime changes.

I changed Unmarshal() to call Decoder.Decode() directly, rather than through Decode(). That was indeed a bit pointless 😅

The []byte→string conversations could probably be improved more, but it's a large change as the lexer and parser operate on strings and not bytes. Changing the lexer to operate on bytes is not a huge change, but makes no difference in performance (it's actually a wee bit slower, probably because it does loads of "little" []byte→string conversations when sending it to the parser, rather than one big one). Changing the parser to use []byte is a much bigger change, so I didn't try that.

Changing a 3,000 character []byte to a string takes ~360ns on my system. It would be nice to eliminate, but it's not a huge priority; the example takes about 100,000ns to decode, so we'd be looking at a ~0.01% performance boost (rounded up).

All of that being said, I wish the API surface of this package was smaller and we didn't have 3 different ways to decode a piece of TOML (toml.Unmarshal, toml.Decode, toml.Decoder.Decode).

@bluebrown
Copy link
Author

Amazing work.

And I am sorry if my tone was somewhat harsh or rude. I didn't mean to critique in that fashion. After all my knowledge of the go isn't expert level, so I don't actually know what performance impact these things have.

Maybe for v2 the NewDecoder could just be removed since it's not actually reading from a stream. It just calls io.ReadAll internally. Perhaps that would make it clearer to the user code what is happening here.

In any case, thank you for the swift action, and keep up the good work.

@arp242 arp242 added the v2 label Apr 4, 2022
@arp242
Copy link
Collaborator

arp242 commented Apr 4, 2022

Maybe for v2 the NewDecoder could just be removed since it's not actually reading from a stream. It just calls io.ReadAll internally. Perhaps that would make it clearer to the user code what is happening here.

It might do in the future :-) I think we should mostly think about the user-visible API and what would be most ideal for that, keeping the future in mind, rather than match the API with the specifics of the current internals.

And I am sorry if my tone was somewhat harsh or rude. I didn't mean to critique in that fashion.

Yeah, it did come of like that a bit to be honest 😅 But it's okay, I figured it was maybe a bit of language barrier and/or maybe a bad day (we all have those).

@arp242
Copy link
Collaborator

arp242 commented Oct 1, 2023

Additionally. it is not possible to use the decoder created by NewDecoder in a true decoder fashion, since usually it is read until EOF or another error. But EOF is never returned by decoder.Decode.

So, the below will just hang. Although it is the expected way to use a decoder.

for {
    if err := dec.Decode(&data); err == io.EOF {
        break
    } else if err != nil {
        panic(err)
    }
}

JSON doesn't do this:

func main() {
	dec := json.NewDecoder(strings.NewReader(`{"a": 1}`))
	var m any
	err := dec.Decode(&m)
	fmt.Println(err, m)
}

Outputs <nil> map[a:1].

I'm not sure if I've ever seen Decode() return io.EOF 🤔

As for the rest: I'll look at the API when (if!) a v2 gets released – don't really need a keep an issue open for that IMHO.

@arp242 arp242 closed this as completed Oct 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants