Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make ListFormatter allocation datagen test not allocate #2738

Open
sffc opened this issue Oct 11, 2022 · 5 comments
Open

Make ListFormatter allocation datagen test not allocate #2738

sffc opened this issue Oct 11, 2022 · 5 comments
Labels
A-performance Area: Performance (CPU, Memory) C-list Component - ListFormat help wanted Issue needs an assignee S-medium Size: Less than a week (larger bug fix or enhancement) T-enhancement Type: Nice-to-have but not required

Comments

@sffc
Copy link
Member

sffc commented Oct 11, 2022

It seems that ListFormatter invokes a nontrivial validation of the regex-automata bytes that involves memory allocations and a BTreeMap. We should find a way to avoid this, either by validating with no allocations or by performing GIGO. This may involve patching the regex-automata crate.

@sffc sffc added T-enhancement Type: Nice-to-have but not required good first issue Good for newcomers S-medium Size: Less than a week (larger bug fix or enhancement) C-list Component - ListFormat A-performance Area: Performance (CPU, Memory) labels Oct 11, 2022
@sffc
Copy link
Member Author

sffc commented Oct 12, 2022

So it looks like to avoid the BTreeMap we just need to disable the alloc feature on regex automata.

https://github.com/BurntSushi/regex-automata/blob/a4900a9ba0d14b27f2ed7d782b296f54d300b376/src/dfa/sparse.rs#L1372

It says:

        // In order to validate everything, we not only need to make sure we
        // can decode every state, but that every transition in every state
        // points to a valid state. There are many duplicative transitions, so
        // we record state IDs that we've verified so that we don't redo the
        // decoding work.
        //
        // Except, when in no_std mode, we don't have dynamic memory allocation
        // available to us, so we skip this optimization. It's not clear
        // whether doing something more clever is worth it just yet. If you're
        // profiling this code and need it to run faster, please file an issue.
        //
        // ---AG

I think we should disable the alloc feature. If, in the future, we discover a bottleneck (ideally one that a client reports), we can file an issue. There are a few ways to make it faster, one of which is to GIGO it.

CC @robertbastian

@sffc sffc added discuss Discuss at a future ICU4X-SC meeting and removed good first issue Good for newcomers discuss Discuss at a future ICU4X-SC meeting labels Oct 12, 2022
@robertbastian
Copy link
Member

We already deactivate alloc, however the zero-copy test runs in the datagen crate, with datagen's dependencies, which all have std (datagen actually, which can't work without regex-automata/alloc) enabled. It actually has a hard dependency on datagen, because it uses DatagenProvider to convert from buffers to concrete data payloads. We'd have to redesign how this tests gets its deserialization implementations. @Manishearth.

@sffc
Copy link
Member Author

sffc commented Oct 12, 2022

Hmm, interesting. If it's an issue we could probably just submit an upstream PR splitting this out into a different feature we can disable.

@robertbastian
Copy link
Member

Upstream will likely tell us to fix our features. It would be good to find a way to run the zero-copy test without the datagen feature enabled.

@Manishearth
Copy link
Member

We can at least ask, I guess.

@sffc sffc changed the title Investigate ListFormatter regex validation code size Make ListFormatter allocation datagen test not allocate Oct 17, 2022
@sffc sffc added backlog help wanted Issue needs an assignee labels Oct 17, 2022
@sffc sffc added this to the Backlog milestone Dec 22, 2022
@sffc sffc removed the backlog label Dec 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-performance Area: Performance (CPU, Memory) C-list Component - ListFormat help wanted Issue needs an assignee S-medium Size: Less than a week (larger bug fix or enhancement) T-enhancement Type: Nice-to-have but not required
Projects
None yet
Development

No branches or pull requests

3 participants