rut is a tool for selecting bytes, characters, or fields from files. It is heavily based on
cut, but with additional support for delimiting fields with a regular expression. Note that there
are multiple implementations of cut which differ in various ways. rut is primarily based on
the GNU Coreutils implementation.
Below is a table briefly describing rut's features and how it compares to cut.
| Option | Description | POSIX `cut` ¹ | GNU Coreutils `cut` | `rut` |
|---|---|---|---|---|
| `-b` | Select bytes. | ✔ | ✔ (also supports `--bytes`) | ✔ (also supports `--bytes`) |
| `-c` | Select characters. | ✔ | ⚠ (also supports `--characters`; behaves the same as `-b`) | ✔ (also supports `--characters`; requires UTF-8 input) |
| `-f` | Select fields (strings separated by a delimiter). | ✔ | ✔ (also supports `--fields`; treats each byte as a character, without regard for encoding) | ✔ (also supports `--fields`; requires UTF-8 input) |
| `-d` | Specify a single-character delimiter when used with `-f`. | ✔ | ⚠ (also supports `--delimiter`; requires a single-byte character) | ✔ (also supports `--delimiter`; must be a UTF-8 character) |
| `-s` | Do not print lines without a delimiter. Normal behavior is to print the full line. | ✔ | ✔ (also supports `--only-delimited`) | ✔ (also supports `--only-delimited`) |
| `-n` | Do not split multi-byte characters when used with `-b`. | ✔ | ⚠ (no-op) | ⚠ (no-op) |
| `--output-delimiter` | Specify a string to delimit selected fields. Normal behavior is to use the input delimiter. | ❌ | ✔ | ✔ (also supports `-o`) |
| `--complement` | Select the complement of the selected bytes/characters/fields. | ❌ | ✔ | ✔ |
| `-z` / `--zero-terminated` | Delimit "lines" with a zero byte rather than a newline. | ❌ | ✔ | ✔ |
| `-r` / `--regex-delimiter` | Specify a regular expression as a delimiter when used with `-f`. | ❌ | ❌ | ✔ |

¹ This column describes the POSIX definition of `cut`, not any particular implementation.
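The field options in the table (`-f`, `-d`, `-s`, `--output-delimiter`, `--complement`) interact in a specific way. Below is a minimal sketch of those semantics in Rust using only the standard library. The names `parse_list`, `is_selected` and `cut_fields` are hypothetical and do not come from rut's source. The open-ended LIST forms `N-` and `-M` follow GNU `cut`; it is assumed here that rut accepts them too. Because the standard library has no regular expressions, the delimiter is a `char` predicate: that is enough to express a character class such as `[ _:]`, but not a general `-r` pattern.

```rust
use std::ops::RangeInclusive;

// Hypothetical helpers sketching cut-style `-f` semantics; not rut's actual code.

/// Parses a LIST such as "1-5", "2,4,6" or "2-4,6-8" into 1-based inclusive ranges.
/// "N-" means "from N to the end" and "-M" means "from 1 to M", as in GNU cut.
fn parse_list(list: &str) -> Result<Vec<RangeInclusive<usize>>, String> {
    list.split(',')
        .map(|part| -> Result<RangeInclusive<usize>, String> {
            let err = || format!("invalid range: {:?}", part);
            if part.is_empty() || part == "-" {
                return Err(err());
            }
            // A bare "N" selects a single position.
            let (lo, hi) = part.split_once('-').unwrap_or((part, part));
            let lo = if lo.is_empty() { 1 } else { lo.parse::<usize>().map_err(|_| err())? };
            let hi = if hi.is_empty() { usize::MAX } else { hi.parse::<usize>().map_err(|_| err())? };
            if lo == 0 || lo > hi {
                return Err(err());
            }
            Ok(lo..=hi)
        })
        .collect()
}

/// True if the 1-based position `n` falls inside any selected range.
fn is_selected(ranges: &[RangeInclusive<usize>], n: usize) -> bool {
    ranges.iter().any(|r| r.contains(&n))
}

/// Returns the selected fields of `line` joined by `output_delim`, or None when the
/// line has no delimiter and `only_delimited` (-s) is set.
fn cut_fields<F: Fn(char) -> bool>(
    line: &str,
    is_delim: F,
    ranges: &[RangeInclusive<usize>],
    only_delimited: bool,
    output_delim: &str,
    complement: bool,
) -> Option<String> {
    if !line.chars().any(|c| is_delim(c)) {
        // Lines without a delimiter are printed whole unless -s is given.
        return if only_delimited { None } else { Some(line.to_string()) };
    }
    let fields: Vec<&str> = line
        .split(|c: char| is_delim(c))
        .enumerate()
        // Fields are numbered from 1; --complement inverts the selection.
        .filter(|(i, _)| is_selected(ranges, i + 1) != complement)
        .map(|(_, field)| field)
        .collect();
    Some(fields.join(output_delim))
}

fn main() {
    // Hypothetical sample lines; the real tests/files/ascii.txt may differ.
    let input = ["abcdefgh", "a b c d e f g h", "a_b_c_d_e_f_g_h", "a:b:c:d:e:f:g:h"];

    // Roughly `rut -f2,4,6 -d'_' -s`: only the '_' line is printed.
    let list = parse_list("2,4,6").expect("valid LIST");
    for line in input.iter() {
        if let Some(out) = cut_fields(line, |c| c == '_', &list, true, "_", false) {
            println!("{}", out);
        }
    }

    // Roughly `rut -f 2-4,6-8 -r '[ _:]' -s -o#`, with the character class `[ _:]`
    // written as a predicate instead of a regular expression.
    let list = parse_list("2-4,6-8").expect("valid LIST");
    for line in input.iter() {
        if let Some(out) = cut_fields(line, |c| matches!(c, ' ' | '_' | ':'), &list, true, "#", false) {
            println!("{}", out);
        }
    }
}
```

Filtering by field number, rather than walking the LIST in the order it was written, means each field is printed at most once and in input order (so `-f6,2` behaves like `-f2,6`), which matches GNU `cut`.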
`rut` is intended to be a drop-in replacement for the GNU Coreutils implementation of `cut` in many
cases. In particular, you should be able to replace `cut` with `rut` in any valid `cut` command
with ASCII input; if you cannot, that is considered a bug.
`rut` also adds support for additional options and for UTF-8 encoded input. This has the following
consequences:

- Some `cut` commands that would fail due to invalid or unrecognized options will succeed with `rut`.
- The output of commands using `-c` or `-f` will differ for non-ASCII input.
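The difference between `-b` and `-c` comes down to what a "position" counts. The following minimal sketch, in Rust (the language rut is written in), selects by 1-based byte position and by 1-based `char` position. The function names `select_bytes` and `select_chars` are hypothetical and not taken from rut's source, and a single range stands in for a full LIST. On ASCII input the two agree, because every character is one byte; on UTF-8 input they diverge, and byte selection can cut a character in half.

```rust
use std::io::{self, Write};
use std::ops::RangeInclusive;

// Hypothetical helpers contrasting `-b` and `-c`; not rut's actual code.
// A single range stands in for a full LIST.

/// Like `-b`: keeps bytes whose 1-based position is in `range`.
/// The result can end in the middle of a multi-byte character, so it is raw bytes.
fn select_bytes(line: &str, range: RangeInclusive<usize>) -> Vec<u8> {
    line.bytes()
        .enumerate()
        .filter(|(i, _)| range.contains(&(i + 1)))
        .map(|(_, b)| b)
        .collect()
}

/// Like `-c` on UTF-8 input: keeps `char`s (Unicode scalar values) by 1-based position.
fn select_chars(line: &str, range: RangeInclusive<usize>) -> String {
    line.chars()
        .enumerate()
        .filter(|(i, _)| range.contains(&(i + 1)))
        .map(|(_, c)| c)
        .collect()
}

fn main() -> io::Result<()> {
    let stdout = io::stdout();
    let mut out = stdout.lock();
    // Hypothetical sample lines in the spirit of tests/files/utf8.txt.
    for line in ["abcdefgh", "αβγδεζ", "abαβγδ", "😀😁😂😃😄"].iter() {
        // Byte selection 1-4: written as raw bytes, exactly as selected.
        out.write_all(&select_bytes(line, 1..=4))?;
        out.write_all(b"\t")?;
        // Character selection 1-4: always whole characters.
        writeln!(out, "{}", select_chars(line, 1..=4))?;
    }
    Ok(())
}
```

For these sample lines the byte column reads `abcd`, `αβ`, `abα`, `😀` and the character column reads `abcd`, `αβγδ`, `abαβ`, `😀😁😂😃`, the same pattern as the byte/character comparison in the examples below. Selecting bytes 1-3 of `αβγδ` instead would yield `α` plus the first byte of `β`, which is not valid UTF-8 on its own.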
Select bytes from a file:

```
$ rut -b1-5 tests/files/ascii.txt
abcde
a b c
a_b_c
a:b:c
```

Select from stdin:

```
$ cat tests/files/ascii.txt | rut -b1-5
abcde
a b c
a_b_c
a:b:c
```

Comparison of bytes versus characters:

```
$ rut -b1-4 tests/files/utf8.txt
abcd
αβ
abα
😀
$ rut -c1-4 tests/files/utf8.txt
abcd
αβγδ
abαβ
😀😁😂😃
```

Select fields:

```
$ rut -f1 -dd tests/files/ascii.txt
abc
a b c
a_b_c_
$ rut -f2,4,6 -d'_' -s tests/files/ascii.txt
b_d_f
```

Select fields with regex delimiter:

```
$ rut -f 2-4,6-8 -r '[ _:]' -s -o# tests/files/ascii.txt
b#c#d#f#g#h
b#c#d#f#g#h
b#c#d#f#g#h
```

rut is written in Rust. It has been tested with Rust 1.45.2 but may
work with earlier or later versions. The following instructions assume you have Rust installed.
To run the full test suite, run:

```
$ cargo test
```

To build rut in debug mode (faster compile times, slower executable), use:

```
$ cargo build
```

For release mode (slower compile times, faster executable), use:

```
$ cargo build --release
```

Build scripts are provided to create a release package. Run one of the following to build an archive (`.zip` or `.tar.gz`) in the `target` directory.

With PowerShell:

```
$ .\Create-Release.ps1
```

With a POSIX shell:

```
$ ./create-release.sh
```