
[wc] faster character count #248

Merged
merged 3 commits into from
Sep 25, 2024


gcanat (Contributor) commented Sep 23, 2024

On my machine, the time to count characters in a 1 GB utf8.txt dropped by 75%, from 7.4 s to 1.85 s.
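The thread does not include the patch itself. One common way to make a character count fast is to skip decoding UTF-8 into chars and instead count every byte that is not a continuation byte (bit pattern 10xxxxxx): in valid UTF-8, each character has exactly one such leading byte. A minimal sketch under that assumption; the name count_chars is hypothetical and not taken from the PR:

```rust
/// Counts UTF-8 characters by counting bytes that are NOT continuation
/// bytes (0b10xxxxxx). Assumes valid UTF-8: invalid sequences are not
/// detected and may be counted differently than GNU wc -m would.
fn count_chars(bytes: &[u8]) -> usize {
    bytes.iter().filter(|&&b| (b & 0xC0) != 0x80).count()
}

fn main() {
    let text = "héllo, 生履";
    // 'é' is 2 bytes and each CJK character is 3 bytes,
    // but each contributes exactly one non-continuation byte.
    assert_eq!(count_chars(text.as_bytes()), text.chars().count());
    println!("{}", count_chars(text.as_bytes())); // prints 9
}
```

On invalid UTF-8 this approach may disagree with GNU wc -m, so treat it as an illustration of the idea rather than a drop-in implementation.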

jgarzik (Contributor) commented Sep 23, 2024

Nice.

Tiny comment: there is no need to pass the global variable table as a function parameter.
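The reviewer's point can be illustrated with a sketch that assumes the table is a 256-entry lookup array marking which byte values start a UTF-8 character (the real contents of the PR's table are not shown in this thread; CHAR_START and build_table are hypothetical names). Because the table is a static built at compile time, the counting function can read it directly instead of receiving it as an argument:

```rust
/// Hypothetical 256-entry lookup table: 1 for bytes that start a UTF-8
/// character (anything except continuation bytes 0x80..=0xBF), 0 otherwise.
/// Built at compile time, so no runtime initialization is needed.
static CHAR_START: [u8; 256] = build_table();

const fn build_table() -> [u8; 256] {
    let mut table = [0u8; 256];
    let mut i = 0;
    while i < 256 {
        // Continuation bytes have the bit pattern 0b10xxxxxx.
        if (i & 0xC0) != 0x80 {
            table[i] = 1;
        }
        i += 1;
    }
    table
}

/// Reads the global table directly; no table parameter is needed.
fn count_chars(bytes: &[u8]) -> usize {
    bytes.iter().map(|&b| CHAR_START[b as usize] as usize).sum()
}

fn main() {
    let text = "wc -m: 生履";
    assert_eq!(count_chars(text.as_bytes()), text.chars().count());
}
```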

Will review and test in depth tomorrow.

jgarzik (Contributor) commented Sep 24, 2024

Does this work if a multi-byte character straddles the boundary between two input buffers?
That is, the first part of the character is returned by one call to file.read(), and the rest by the next call?
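Assuming the count is done by classifying individual bytes (counting those that are not UTF-8 continuation bytes), a split character is harmless: its leading byte lands in exactly one buffer, and its continuation bytes are never counted, whichever read() returns them. An approach that decodes each buffer separately with std::str::from_utf8 would instead hit an incomplete sequence at the buffer edge. The sketch below (hypothetical function count_chars_reader, not the PR's code) reads in deliberately small chunks to force such splits:

```rust
use std::io::{self, Read};

/// Counts characters from any reader in fixed-size chunks.
/// Each byte is classified on its own, so a multi-byte character whose
/// bytes arrive in two different read() calls is still counted once:
/// only its leading byte counts, and that byte sits in exactly one chunk.
fn count_chars_reader<R: Read>(mut reader: R, buf_size: usize) -> io::Result<usize> {
    let mut buf = vec![0u8; buf_size];
    let mut total = 0;
    loop {
        let n = match reader.read(&mut buf) {
            Ok(0) => break, // end of input
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        total += buf[..n].iter().filter(|&&b| (b & 0xC0) != 0x80).count();
    }
    Ok(total)
}

fn main() -> io::Result<()> {
    // Each CJK character is 3 bytes, so buffer sizes 1, 2, 4 and 5 split characters.
    let text = "生履".repeat(1000);
    for &buf_size in &[1usize, 2, 4, 5, 4096] {
        let n = count_chars_reader(text.as_bytes(), buf_size)?;
        assert_eq!(n, text.chars().count());
    }
    println!("all buffer sizes agree");
    Ok(())
}
```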

gcanat (Contributor, Author) commented Sep 24, 2024

Well, I just tested with a 200 MB file filled with the two characters 生履 repeated, with no spaces, and got the same result as GNU wc -m, wc2 -m and wz -c.

jgarzik merged commit 21e1ef1 into rustcoreutils:main Sep 25, 2024
2 checks passed
gcanat deleted the wc_chars branch September 25, 2024 07:44