binpash/fractal

Fractal: Fault-Tolerant Shell-Script Distribution

Overview | Quick Setup | More Info | Structure | Community | Citing | License & Contributions

For issues and ideas, email fractal@brown.edu or, better, open a GitHub issue.

Fractal executes unmodified POSIX shell scripts across a cluster and recovers automatically from node failures. It bolts failure tolerance on top of DiSh, a state-of-the-art shell-script distribution system, and is described in an upcoming NSDI'26 paper.

Overview:

Fractal is an open-source, MIT-licensed system that offers fault-tolerant distributed execution of unmodified shell scripts. It first separates recoverable regions from side-effectful ones and augments them with additional runtime support for fault recovery. It tracks dependencies and progress precisely at the subgraph level to offer sound and efficient fault recovery. During recovery, it minimizes the number of upstream regions that are re-executed and ensures exactly-once semantics for downstream regions. Fractal's fault-free performance is comparable to state-of-the-art failure-intolerant distributed shell-script execution engines, while in cases of failure it recovers 7.8–16.4× faster than Hadoop Streaming.

At a glance:

  • No script changes – full POSIX shell semantics.
  • Exactly-once semantics via remote pipes and replay suppression.
  • Per-subgraph dynamic decision to persist or stream data.
  • Millisecond-scale re-scheduling driven by HDFS heartbeats + 17-byte events.
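
To make the replay-suppression idea concrete, here is a minimal sketch in plain POSIX shell. It is not Fractal's code: the function name suppress_replay and the byte-offset bookkeeping are assumptions made for illustration. It shows how a reader that knows how many bytes its consumer already received can discard that prefix when a re-executed producer replays its output from the start.

```sh
#!/bin/sh
# Hypothetical sketch of replay suppression, not Fractal's actual code.
# suppress_replay OFFSET: copy stdin to stdout, dropping the first OFFSET
# bytes, which the consumer is assumed to have received before the fault.
suppress_replay() {
    # tail -c +N starts output at byte N (1-indexed), so skip OFFSET bytes.
    tail -c "+$(( $1 + 1 ))"
}

# Demo: the consumer had already received 6 bytes ("alpha\n") before the
# producer failed; the re-executed producer replays its full output.
delivered=6
printf 'alpha\nbeta\ngamma\n' | suppress_replay "$delivered"
```

Because the replayed prefix is dropped, the consumer sees each byte once even though the producer ran twice. This only works if the re-executed producer emits the same bytes in the same order.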

Quick Setup

To quickly set up Fractal on a single host (Docker Compose, tested on Linux):

# Clone with submodule so PaSh code is present
$ git clone --recurse-submodules https://github.com/binpash/dish.git
$ cd dish/docker-hadoop
# Spin up 1 namenode, 1 datanode, 1 client container
$ ./setup-compose.sh

To tear Fractal down: ./stop-compose.sh (add -v to prune volumes).

More Information

After installing Fractal, run it inside the client container:

# put a sample file in HDFS
hdfs dfs -put /etc/hosts /hosts
# Execute a tiny script with fault tolerance on (dynamic persistence)
cd /opt/dish
./fractal.sh --ft dynamic scripts/sample.sh   # output identical to bash

Inject a fail-stop fault: ./fractal.sh --ft dynamic --kill regular scripts/sample.sh.
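
One way to check that a run, with or without an injected fault, produces the same output as bash is to capture both outputs and compare them byte for byte. The helper below is a sketch under assumptions: same_output is a hypothetical name, and it assumes that both commands write their results to standard output and that the sample script also runs under plain bash inside the client container.

```sh
#!/bin/sh
# same_output CMD_A CMD_B: run both command strings and report whether
# their standard outputs are byte-identical (exit status 0 if they are).
same_output() {
    a=$(mktemp) && b=$(mktemp) || return 2
    sh -c "$1" >"$a"
    sh -c "$2" >"$b"
    if cmp -s "$a" "$b"; then status=0; else status=1; fi
    rm -f "$a" "$b"
    return "$status"
}

# Demo with stand-in commands so the sketch runs anywhere.
same_output 'printf "b\na\n" | sort' 'printf "a\nb\n"' && echo "outputs match"
```

Inside the client container, the stand-in commands could be replaced with 'bash scripts/sample.sh' and './fractal.sh --ft dynamic --kill regular scripts/sample.sh'.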

Repository Structure

Here are the key components of the Fractal repository:

  • pash/: PaSh submodule – compiler & JIT groundwork
  • runtime/: Remote Pipe, DFS reader, Go libraries
  • pash/compiler/dspash/: Fractal scheduler, executor, along with health and progress monitors
  • docker-hadoop/: Local and CloudLab cluster bootstrap
  • evaluation/: Benchmarks & fault-injection scripts
  • scripts/: Miscellaneous helper scripts

Detailed system architecture: The figure below describes Fractal's key components. A1–A6 annotate control-plane stages; B1–B4 run on each executor. Fractal first isolates side-effectful regions from recoverable regions; it then executes recoverable subgraphs on nodes, tracking locality, dependencies, progress, and health; and it detects failures, re-scheduling the minimal set of unfinished subgraphs for re-execution.

Fractal architecture

The list of components is explained below, along with their location in the code:

  • A1: DFG augmentation and isolation of the unsafe-main subgraph (in prepare_graph_for_remote_exec)
  • A2: Remote pipe instrumentation, which injects read/write nodes that track byte offsets (in remote_pipe, pipes)
  • A3: Dynamic output persistence, a heuristic that chooses between spilling to disk or streaming (in add_singular_flags, check_persisted_discovery, writeOptimized)
  • A4: Scheduler and batched dispatch of subgraphs to executors (in worker_manager)
  • A5: Progress monitor and discovery, built on 17-byte completion events and an endpoint registry (in discovery, datastream)
  • A6: Health monitor, which polls HDFS Namenode JMX to identify slow/failed nodes (in hdfs_utils)
  • B1: Executor non-blocking event loop, which launches subgraphs (in EventLoop)
  • B2: Remote pipe data path within the executor (socket/file, buffered I/O) (in datastream)
  • B3: Distributed file reader, which streams HDFS splits locally (in dfs)
  • B4: On-node cache of persisted outputs, which avoids re-computation after faults (in writeOptimized)
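
The re-scheduling step described above can be sketched as a filter over a progress table. Everything here is hypothetical: the "ID NODE STATE" line format, the node names, and the to_reschedule function are invented for illustration and do not reflect Fractal's internal data structures. The sketch also ignores the case where a finished subgraph's output lived only on the failed node and has to be regenerated.

```sh
#!/bin/sh
# Hypothetical progress table: one subgraph per line as "ID NODE STATE",
# where STATE is "done" or "running". This format is illustrative only.
# to_reschedule FAILED_NODE: print IDs of unfinished subgraphs on that node.
to_reschedule() {
    awk -v failed="$1" '$2 == failed && $3 != "done" { print $1 }'
}

# Demo: node-b fails while sg2 is still running there.
printf '%s\n' \
    'sg1 node-a done' \
    'sg2 node-b running' \
    'sg3 node-b done' \
    'sg4 node-c running' | to_reschedule node-b
```

Only sg2 is selected: sg1 and sg4 ran on healthy nodes, and sg3 had already finished.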

Community and More

Fractal is a member of the PaSh family of systems, available through the Linux Foundation. Please join the community:

Citing Fractal

Fractal is backed by state-of-the-art research. If you are using it to accelerate your processing, consider citing the following paper:

@inproceedings{fractal:nsdi:2026,
 author = {Zhicheng Huang and Ramiz Dundar and Yizheng Xie and Konstantinos Kallas and Nikos Vasilakis},
 title = {Fractal: Fault-Tolerant Shell-Script Distribution},
 booktitle = {23rd USENIX Symposium on Networked Systems Design and Implementation (NSDI 26)},
 year = {2026},
 address = {Renton, WA},
 publisher = {USENIX Association},
 month = may
}

More BibTeX: Fractal builds on DiSh and PaSh.

The DiSh paper, from NSDI'23:

@inproceedings{dish:nsdi:2023,
 author = {Tammam Mustafa and Konstantinos Kallas and Pratyush Das and Nikos Vasilakis},
 title = {{DiSh}: Dynamic {Shell-Script} Distribution},
 booktitle = {20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23)},
 year = {2023},
 isbn = {978-1-939133-33-5},
 address = {Boston, MA},
 pages = {341--356},
 url = {https://www.usenix.org/conference/nsdi23/presentation/mustafa},
 publisher = {USENIX Association},
 month = apr
}

The PaSh paper, from OSDI'22:

@inproceedings{pash:osdi:2022,
 author = {Konstantinos Kallas and Tammam Mustafa and Jan Bielak and Dimitris Karnikis and Thurston H.Y. Dang and Michael Greenberg and Nikos Vasilakis},
 title = {Practically Correct, {Just-in-Time} Shell Script Parallelization},
 booktitle = {16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 22)},
 year = {2022},
 isbn = {978-1-939133-28-1},
 address = {Carlsbad, CA},
 pages = {769--785},
 url = {https://www.usenix.org/conference/osdi22/presentation/kallas},
 publisher = {USENIX Association},
 month = jul
}

License & Contributions

Fractal is an open-source, collaborative, MIT-licensed project available through the Linux Foundation and developed by researchers at Brown University and UCLA. If you'd like to contribute, please see the CONTRIBUTING.md file; we welcome contributions! And please come talk to us if you're looking to optimize shell programs!
