dcat

Description

An efficient, zero-dependency command-line utility for concatenating, merging, and joining files, written in the D programming language.

Features

  • Uses low-level OS-specific functions for fast file preallocation and efficient data copying.
  • Optional stacked bar chart visualizing byte distribution across input files.
  • Optional support for pattern-based trimming from file contents.
  • Single binary with zero dependencies — written in a fast, compiled language.

Implementation Details

| OS | Preallocation | Copying Method |
| --- | --- | --- |
| Linux | fallocate(2) | copy_file_range(2) / sendfile(2) |
| Windows | SetEndOfFile | Parallel double mmap |
| POSIX | posix_fallocate() | Single mmap |
| macOS | ftruncate(2) | Buffer-based copying |

All platforms also include a fallback sequential buffer-based implementation accessible via -F | --fallback.

For a deeper dive into performance, see the Benchmarking section.
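To illustrate what preallocation means in the table above, here is a minimal sketch in Python (chosen only for brevity; it is not dcat's D code). The helper name `preallocate` and its return value are hypothetical. It assumes the output size is known up front, tries `posix_fallocate` where Python exposes it, and otherwise falls back to `ftruncate`, which only sets the file length.

```python
import os


def preallocate(fd, size):
    """Reserve `size` bytes for the open file descriptor `fd`.

    Returns the name of the strategy that was used (illustration only).
    """
    if size == 0:
        return "none"  # posix_fallocate rejects a zero length
    if hasattr(os, "posix_fallocate"):
        try:
            # Ask the filesystem to allocate the blocks up front.
            os.posix_fallocate(fd, 0, size)
            return "posix_fallocate"
        except OSError:
            pass  # e.g. the filesystem does not support it; fall through
    # Fallback: only sets the length, which may leave the file sparse.
    os.ftruncate(fd, size)
    return "ftruncate"
```

Sizing the output before copying lets each input be written at a known offset, which is what makes parallel copying possible.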

Getting Started

Command-Line Interface

This utility follows POSIX Utility Conventions.

In short, the following is recognized:

  • Flags: Short or long strings with no value (e.g., -f or --flag). May be specified multiple times for options that act as counters.
  • Options: Short or long strings with an associated value (e.g., -f <value> or --flag[=]value).
  • Parameters: Required strings without prefixes (e.g., <in1> <in2>).

General Options

  • -h | --help: Displays a help message with all command-line options.
  • -V | --version: Shows the dcat version and compiler information.

File Selection

  • <input1>, <input2>, ...: A required list of input files to be concatenated.
  • -O <PATH> | --output=<PATH>: A required output file path (must not already exist).

Behavior Control

  • -N | --dry-run: Simulate execution; analyzes files and output schema without writing data.
  • -T <HEX> | --trim[=]<HEX>: A hexadecimal pattern to greedily trim from both the beginning and end of each input file.
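The trimming option can be pictured with the following sketch, written in Python for illustration rather than taken from dcat. It assumes that "greedily" means removing every repeated whole occurrence of the pattern at each end; the function name `trim_bounds` is hypothetical. It returns offsets rather than a new buffer, so a copy routine could use them as a start offset and length.

```python
from typing import Tuple


def trim_bounds(data: bytes, pattern_hex: str) -> Tuple[int, int]:
    """Return (start, end) so that data[start:end] is `data` with every
    leading and trailing occurrence of the pattern removed."""
    pattern = bytes.fromhex(pattern_hex)
    start, end = 0, len(data)
    n = len(pattern)
    if n == 0:
        return start, end
    # Strip repeated occurrences from the front.
    while end - start >= n and data[start:start + n] == pattern:
        start += n
    # Strip repeated occurrences from the back, never crossing `start`.
    while end - start >= n and data[end - n:end] == pattern:
        end -= n
    return start, end
```

For example, with the pattern `0d0a` the input `b"\r\nhello\r\n\r\n"` yields the bounds `(2, 7)`, which select `b"hello"`.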

Implementation Control

  • -F | --fallback: Forces dcat to use the sequential buffer-based implementation. This is generally slower but more stable across various environments.
  • -P | --posix: (Linux only) Use POSIX single mmap copy approach instead of Linux-specific methods.
  • --no-cow: (Linux only) Use sendfile(2) instead of copy_file_range(2).

Verbosity and Visualization

  • -v | --verbose: Increases the application's verbosity. Can be specified up to two times (-vv) to print detailed debug messages.
  • --bar-padding <int>: Left padding of the bar chart (default: 4).
  • --bar-height <int>: Height of the bar chart (default: 10).
  • --bar-width <int>: Width of the bar chart (default: 10).

Building

This project uses DUB, the official D package manager.

It also ships an optional Taskfile.yml for Task, a cross-platform task runner. Task pairs well with D, since both support multiple architectures.

Prerequisites

  • D compiler (DMD, GDC, or LDC)
  • DUB package manager (usually bundled with the D compiler)
  • Task (optional task runner)

Build Instructions

# Clone the repository
git clone https://github.com/nadvotsky/dcat.git && cd dcat

# List available tasks
task list

# Build for the current platform (binary will be in `dcat/bin/dcat[.exe]`)
task [default]

# Or build manually with DUB
dub build [--build=release] --root ./dcat

# Run benchmarks
task benchmark

Feel free to customize the Taskfile.yml. For example, BUILD_DIR specifies the path to the project, and the BENCH_* variables configure the benchmarking suite.

Benchmarking

dcat includes a flexible benchmarking suite for evaluating performance across different environments.

Requirements

Configuration

The Taskfile.yml defines the following variables for customizing the benchmark:

  • BENCH_SRC: Source code directory for the benchmarking suite.
  • BENCH_TMP: Temporary directory for test files and binaries.
  • BENCH_SZ: A list of file sizes (in MB) to benchmark.
  • BENCH_IN: A list of filenames to be created and used for benchmarking, each of size BENCH_SZ.
  • BENCH_OUT: The name of the output file.
  • BENCH_LINUX, BENCH_NT, BENCH_POSIX: A list of specific dcat variants to build and benchmark. See the Variants section for details.

Challenges

Low-level file operations can be highly sensitive to a variety of system factors, which can significantly influence benchmark results. It is highly recommended to run the benchmarks on your specific system to get relevant performance data!

Key factors include:

  1. Number and size of files.
  2. Storage (RAM, HDD, SSD, NVMe, MMC), and their respective variations (e.g., DRAM cache).
  3. Filesystems (ext4, Btrfs, XFS, NTFS, ReFS, APFS), including support for features like Copy-on-Write (CoW).
  4. D compiler (DMD, LDC, GDC).
  5. Kernel version and I/O scheduler (kyber, bfq, cfq).
  6. System load, page cache state, hugepages, access times (atime), etc.

Theory

NOTE: The explanations below are only the tip of the iceberg of "zero-copy" theory. Filesystem manipulation cannot be standardized across all operating systems and filesystems; trying to do so would be a mistake. There is much more to consider: system cache behavior, buffer strategies, filesystem optimizations, portability issues between UNIX variants, and so on.

Buffer

Traditionally, one of the most portable and straightforward ways to copy files is to use a buffered read/write loop. This approach remains common in thousands of applications and there is nothing wrong with it.

However, with the rise of Copy-On-Write (CoW) capable filesystems and the increasing complexity of modern operating systems (e.g., optimized transfers from the system cache to a NIC), more efficient alternatives were developed.
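The buffered read/write loop described above can be sketched as follows. This is a Python illustration, not dcat's implementation; the function name `buffered_concat` and the 1 MiB buffer size are arbitrary assumptions.

```python
def buffered_concat(inputs, output, bufsize=1 << 20):
    """Concatenate the files in `inputs` into a new file `output`
    using a single reusable buffer. Returns the number of bytes written."""
    buf = bytearray(bufsize)
    view = memoryview(buf)
    total = 0
    # Mode "xb" refuses to overwrite an existing output file.
    with open(output, "xb") as dst:
        for path in inputs:
            # buffering=0 gives a raw file so readinto() fills our buffer directly.
            with open(path, "rb", buffering=0) as src:
                while True:
                    n = src.readinto(buf)
                    if not n:
                        break  # end of this input
                    dst.write(view[:n])
                    total += n
    return total
```

Every byte passes through the user-space buffer once on the way in and once on the way out, which is the overhead the alternatives below try to avoid.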

Specific System Calls

The sendfile(2) system call has been available since Linux kernel 2.2, and since 2.6.33 its destination may be a regular file rather than only a socket. It copies data entirely within kernel space, avoiding the round trip through a user-space buffer and the extra read/write calls that come with it.

Later, Linux 4.5 introduced the copy_file_range(2) syscall, designed with CoW in mind. It offers a more flexible way to copy files: the filesystem may share extents (reflink) instead of duplicating data, and since Linux 5.3 it can, in some cases, copy between different filesystems.
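A minimal sketch of how these two calls can be combined is shown below, using Python's `os.copy_file_range` and `os.sendfile` wrappers. The fallback order and the names `kernel_copy` and `_read_write` are assumptions made for illustration; they do not describe dcat's actual control flow. Any method the kernel or filesystem rejects is dropped, and copying continues with the next one from the current file offsets.

```python
import os
import sys


def _read_write(src_fd, dst_fd, count):
    # Last resort: move up to 1 MiB through a user-space buffer.
    chunk = os.read(src_fd, min(count, 1 << 20))
    view = memoryview(chunk)
    while len(view) > 0:  # os.write may write fewer bytes than requested
        view = view[os.write(dst_fd, view):]
    return len(chunk)


def kernel_copy(src_fd, dst_fd, count):
    """Copy `count` bytes from src_fd to dst_fd at their current offsets.
    Returns the number of bytes actually copied."""
    steps = []
    if hasattr(os, "copy_file_range"):  # Linux, Python 3.8+
        steps.append(lambda n: os.copy_file_range(src_fd, dst_fd, n))
    if sys.platform.startswith("linux") and hasattr(os, "sendfile"):
        # offset=None: read from, and advance, the current position of src_fd.
        steps.append(lambda n: os.sendfile(dst_fd, src_fd, None, n))
    steps.append(lambda n: _read_write(src_fd, dst_fd, n))

    copied = 0
    while copied < count:
        try:
            n = steps[0](count - copied)
        except OSError:
            if len(steps) == 1:
                raise  # even the plain read/write loop failed
            steps.pop(0)  # unsupported here; try the next method
            continue
        if n == 0:
            break  # input ended earlier than expected
        copied += n
    return copied
```

Both system calls may copy fewer bytes than requested, so the loop keeps calling until the full count has been transferred.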

Memory-Mapped Files

Another alternative is to use memory-mapped files. This technique maps a file directly into a process's address space, allowing file I/O to be handled through standard memory operations. The kernel's virtual memory subsystem transparently handles page swapping.

The downside of memory-mapped files is that large files may exceed the address space available to 32-bit applications. Additionally, behavior can vary significantly by OS: some may overcommit memory, while others may perform inefficient copying under the hood.
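As a rough illustration of the single-mapping approach, the sketch below maps an input file read-only and writes the mapping straight to an already open output file. It uses Python's `mmap` module; the function name `mmap_append` is hypothetical and the code is not taken from dcat.

```python
import mmap
import os


def mmap_append(src_path, dst):
    """Append the contents of `src_path` to the open binary file `dst`.
    Returns the number of bytes appended."""
    with open(src_path, "rb") as src:
        size = os.fstat(src.fileno()).st_size
        if size == 0:
            return 0  # an empty file cannot be mapped
        # Length 0 maps the whole file; pages are loaded on demand.
        with mmap.mmap(src.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            dst.write(mapped)
        return size
```

A dual-mapping variant would map the preallocated output as well and copy from one mapping into the other, at the cost of needing address space for both files.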

Variants

The benchmarking suite includes the following implementation variants:

  • sendseq/sendpar: Sequential/parallel sendfile(2)
    • sendpar: Opens an additional output file handle for each thread and performs fseek
  • mmapseq/mmappar: Sequential/parallel memory-mapped input
    • mmappar: Opens an additional output file handle for each thread and performs fseek
  • dmmapseq/dmmappar: Sequential/parallel dual memory-mapped input/output
  • copyseq/copypar: Sequential/parallel copy_file_range(2)
    • copypar: Opens an additional output file handle for each thread
  • chunkseq: D language high-level chunked copy
  • blockseq: Basic C-style buffer copy
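The parallel variants above share one idea: once the output has its final size, every input can be written at a precomputed offset through its own file handle. The sketch below shows that idea in Python with a plain buffered copy per worker. The names `parallel_concat` and `_copy_at` are hypothetical, and the benchmark variants themselves use sendfile(2), mmap, or copy_file_range(2) instead of this read/write loop.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from itertools import accumulate


def _copy_at(src_path, dst_path, offset, bufsize=1 << 20):
    # Each worker opens its own output handle so seeks do not interfere.
    with open(src_path, "rb") as src, open(dst_path, "r+b") as dst:
        dst.seek(offset)
        while True:
            chunk = src.read(bufsize)
            if not chunk:
                break
            dst.write(chunk)


def parallel_concat(inputs, output):
    """Concatenate `inputs` into a new file `output`, one worker per input.
    Returns the total number of bytes in the output."""
    sizes = [os.path.getsize(p) for p in inputs]
    offsets = [0] + list(accumulate(sizes))  # offsets[i]: where input i starts
    with open(output, "xb") as dst:          # fails if the output exists
        dst.truncate(offsets[-1])            # give the output its final size
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(_copy_at, path, output, off)
                   for path, off in zip(inputs, offsets)]
        for future in futures:
            future.result()                  # re-raise any worker error
    return offsets[-1]
```

Whether this beats a sequential copy depends heavily on the storage device and filesystem, as the results below show.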

Results

NOTE: Results are machine-specific and may not be representative of your particular environment! Refer to Challenges for more information.

Windows

dbench_dmmappar.exe ran
  1.37 ± 0.04 times faster than COPY /B
  2.24 ± 0.01 times faster than dbench_dmmapseq.exe
  4.73 ± 0.43 times faster than dbench_chunkseq.exe
  4.89 ± 0.58 times faster than dbench_blockseq.exe
  5.00 ± 2.31 times faster than dbench_mmappar.exe
  6.23 ± 4.60 times faster than dbench_mmapseq.exe

Dual memory mapping on Windows yields promising performance.

Linux

  cat ran
    1.45 ± 0.14  times faster than dbench_sendseq
    1.46 ± 0.19  times faster than dbench_copyseq
    1.59 ± 0.16  times faster than dbench_sendpar
    1.61 ± 0.23  times faster than dbench_copypar
  275.09 ± 26.85 times faster than dbench_dmmappar
  352.58 ± 34.36 times faster than dbench_mmapseq
  361.62 ± 35.24 times faster than dbench_chunkseq
  365.65 ± 37.74 times faster than dbench_blockseq
  389.06 ± 37.91 times faster than dbench_dmmapseq
  560.34 ± 91.21 times faster than dbench_mmappar

Unsurprisingly, GNU cat is highly optimized for untrimmed copies: it has used copy_file_range(2) since coreutils 9.0 (released in late 2021). For reference, FreeBSD also uses this approach.

However, dcat is competitive as a fast alternative on systems where a similarly optimized cat is not available, such as BusyBox-based Linux distributions and other POSIX systems. It also supports trimming, which may be challenging to implement efficiently via shell scripts.

macOS

dbench_blockseq ran
    1.23 ± 0.46 times faster than dbench_chunkseq
    1.30 ± 0.48 times faster than dbench_dmmapseq
    1.34 ± 0.51 times faster than cat
    1.46 ± 0.56 times faster than dbench_dmmappar
    1.69 ± 0.63 times faster than dbench_mmappar
    1.70 ± 0.63 times faster than dbench_mmapseq

While macOS's libc provides file copying functions, it lacks partial file moving capabilities. See the Apple developer notes on NSFileManager and FSCopyObjectAsync.

Acknowledgments

Thanks for checking out dcat! Special thanks to:

Here are some related links:

License

This project is licensed under the MIT License. You are free to use, modify, and distribute this software, but please provide attribution.