Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve wc #243

Merged
merged 2 commits into from
Sep 23, 2024
Merged

Improve wc #243

merged 2 commits into from
Sep 23, 2024

Conversation

gcanat
Copy link
Contributor

@gcanat gcanat commented Sep 23, 2024

As suggested in #137, I took inspiration from wc2 to improve wc_file_bytes.

In my benchmarks (tested with 1G utf8.txt file generated with wctool), it is a bit faster than wc2, and on par with wz, except when passing multiple files as argument because wz will parallelize computation, but I guess we don't want to pull rayon as a dependency here :)

@jgarzik
Copy link
Contributor

jgarzik commented Sep 23, 2024

Looks good at a quick glance.

The table initialization can be improved further with a static table, yes? e.g. https://www.reddit.com/r/rust/comments/1dokkh2/recipe_for_compiletime_sparse_array_initialization/

@gcanat
Copy link
Contributor Author

gcanat commented Sep 23, 2024

OK I made the change.
For what it's worth here's the result on 1G utf8.txt

tool branch time
posixutils-rs main 2.66s
posixutils-rs wc2 1.50s
wc2.c main 2.10s
wz main 1.57s
wc v8.32 15.33

@jgarzik jgarzik merged commit 1b1ab55 into rustcoreutils:main Sep 23, 2024
2 checks passed
@jgarzik
Copy link
Contributor

jgarzik commented Sep 23, 2024

Thanks, merged.

Interested in tackling wc_file_chars()? :)

@jgarzik
Copy link
Contributor

jgarzik commented Sep 23, 2024

Post-merge comment: was_space need not appear in CountInfo at all, as far as I can tell.

@gcanat
Copy link
Contributor Author

gcanat commented Sep 23, 2024

Oh yeah, I can have a look at wc_file_chars() in the coming days. And will try to handle was_space differently.

@jgarzik
Copy link
Contributor

jgarzik commented Sep 23, 2024

Oh yeah, I can have a look at wc_file_chars() in the coming days. And will try to handle was_space differently.

Thanks.

re was_space, it appears just a local, since it is not accumulated, and should not count across different files.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants