Skip to content

Conversation

zleyyij
Copy link
Contributor

@zleyyij zleyyij commented Sep 30, 2025

Closes #38

This PR introduces the addition of several crates to improve the ergonomics of the CLI.

The new CLI has been designed to be simple, ergonomic, and scale well as more features are added. It has a real time progress bar, along with an ETA. Once processing is complete, it presents basic stats about what happened.

If demand is present, backwards compatibility could be added under a compat subcommand.

Future work building on this PR might involve the addition of a bench subcommand, and/or a gen-dict subcommand.

Attached is a recording of the decompress command in action

asciicast

@KillingSpark
Copy link
Owner

KillingSpark commented Oct 1, 2025

That progress bar is slick!

I think the result page should show the ratio inverted

0.93GiB ——> 406.89MiB (2.34x) makes we wonder why it would come out 2.34x time bigger, when it is actually 0.43x smaller. Maybe that's just me though :)

I was also wondering if it might make sense to make this repo a cargo workspace and put the zstd binary into it's onw crate? This might make dependency management and crate features less messy

@zleyyij
Copy link
Contributor Author

zleyyij commented Oct 1, 2025

I agree that the ratio feels inverted, but it's how Facebook represents it (and I think most entities). It could be replaced with y% of original, or z% smaller if you feel it would be more intuitive.

https://facebook.github.io/zstd/

Switching to a workspace makes sense, I'd be down for that and can implement that change if needed.

@fintelia
Copy link
Contributor

fintelia commented Oct 2, 2025

The C-based zstd utility outputs the compression ratio as a percentage:

$ zstd foo.txt 
foo.txt              : 33.91%   (  4.43 KiB =>   1.50 KiB, foo.txt.zst)

@KillingSpark
Copy link
Owner

Switching to a workspace makes sense, I'd be down for that and can implement that change if needed.

That would be awesome!

I agree that the ratio feels inverted, but it's how Facebook represents it (and I think most entities).

I think that's just for benchmarking purposes (or rather for purposes of cool graphs). "Line go up" is visually more impressive than "Line go down". Since the C impl seem to show this as a precentage then that's the way I'd go too :)

@zleyyij
Copy link
Contributor Author

zleyyij commented Oct 4, 2025

Okay, switching to a workspace has been done, and the ratio was switched to a percentage (as well as a more granular time display)

Some future discussion topics:

  • Can we make the test that decodes every file in local_corpus_files skipped by default, or removed entirely? IMO those are more integration tests, and I find myself renaming the local_corpus_files directory temporarily just so I can avoid processing gigabytes of data for every feature flag combo. Alternatively, updating the .gitignore to ignore any file with the local_ prefix
  • What do you think about adding a task runner to the repository? (i.e just or cargo xtask). This would be done to make running CI locally easier, you could have something like just nightly-ci and just check. While you can run CI locally with act, or by copy pasting the commands, act is slower than bare metal, repeats a lot of work, and creates a 30+gb docker cache for this repo alone. Copy pasting the commands is slow and easy to miss.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve the zstd binary provided
3 participants