Overview | Quick Setup | More Info | Structure | Community | Citing | License & Contributions
For issues and ideas, email fractal@brown.edu or, better, open a GitHub issue.
Fractal executes unmodified POSIX shell scripts across a cluster and recovers automatically from node failures. It adds fault tolerance on top of DiSh, a state-of-the-art shell-script distribution system, and is described in an upcoming NSDI'26 paper.
Fractal is an open-source, MIT-licensed system that offers fault-tolerant distributed execution of unmodified shell scripts. It first separates recoverable regions from side-effectful ones and augments them with additional runtime support aimed at fault recovery. It employs precise dependency and progress tracking at the subgraph level to offer sound and efficient fault recovery, minimizing the number of upstream regions that are re-executed during recovery and ensuring exactly-once semantics for downstream regions. Fractal's fault-free performance is comparable to state-of-the-art failure-intolerant distributed shell-script execution engines, while under failures it recovers 7.8–16.4× faster than Hadoop Streaming.
At a glance:
- No script changes – full POSIX shell semantics.
- Exactly-once semantics via remote pipes and replay suppression.
- Per-subgraph dynamic decision to persist or stream data.
- Millisecond-scale re-scheduling driven by HDFS heartbeats + 17-byte events.
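The replay-suppression idea can be illustrated in a few lines of plain shell (a hedged sketch with hypothetical file names, not Fractal's actual remote-pipe code): the consumer's progress is tracked as a byte offset, and after a fault the re-executed producer's output is trimmed so already-delivered bytes are never replayed:

```shell
#!/bin/sh
# Hedged sketch of replay suppression via byte offsets (hypothetical file
# names; not Fractal's actual remote-pipe implementation).
set -eu
tmp=$(mktemp -d)
printf 'hello world\n' > "$tmp/stream"      # the producer's full output
head -c 6 "$tmp/stream" > "$tmp/received"   # consumer got 6 bytes, then a node failed
offset=$(wc -c < "$tmp/received")           # progress is tracked as a byte offset
# On re-execution, suppress the already-delivered prefix of the replayed stream:
tail -c +$((offset + 1)) "$tmp/stream" >> "$tmp/received"
cat "$tmp/received"                         # prints: hello world
rm -r "$tmp"
```

The downstream output is byte-for-byte identical to the fault-free run, which is what makes exactly-once delivery possible without any script changes.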
To quickly set up Fractal on a single host (Docker Compose, tested on Linux):

```shell
# Clone with submodules so the PaSh code is present
$ git clone --recurse-submodules https://github.com/binpash/dish.git
$ cd dish/docker-hadoop

# Spin up 1 namenode, 1 datanode, 1 client container
$ ./setup-compose.sh
```

To tear Fractal down: `./stop-compose.sh` (add `-v` to prune volumes).
After installing Fractal, run it inside the client container:

```shell
# Put a sample file in HDFS
hdfs dfs -put /etc/hosts /hosts

# Execute a tiny script with fault tolerance on (dynamic persistence)
cd /opt/dish
./fractal.sh --ft dynamic scripts/sample.sh   # output identical to bash
```

To inject a fail-stop fault: `./fractal.sh --ft dynamic --kill regular scripts/sample.sh`.
Here are the key components of the Fractal repository:
- `pash/`: PaSh submodule – compiler & JIT groundwork
- `runtime/`: Remote pipe, DFS reader, Go libraries
- `pash/compiler/dspash/`: Fractal scheduler, executor, along with health and progress monitors
- `docker-hadoop/`: Local and CloudLab cluster bootstrap
- `evaluation/`: Benchmarks & fault-injection scripts
- `scripts/`: Miscellaneous helper scripts
Detailed system architecture: The figure below describes Fractal's key components. A1–A6 annotate control-plane stages; B1–B4 run on each executor. Fractal first isolates side-effectful regions from recoverable regions; it then executes recoverable subgraphs on nodes, tracking locality, dependencies, progress, and health; and it detects failures, re-scheduling the minimal set of unfinished subgraphs for re-execution.
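The re-scheduling step can be caricatured in a few lines of shell (a hedged sketch in which hypothetical `.done` marker files stand in for Fractal's progress events): only subgraphs without a completion record are re-executed after a fault.

```shell
#!/bin/sh
# Hedged sketch of minimal re-execution (not Fractal's actual scheduler):
# skip every subgraph that already has a completion marker.
set -eu
tmp=$(mktemp -d)
touch "$tmp/sg1.done" "$tmp/sg3.done"   # sg1 and sg3 finished before the fault
for sg in sg1 sg2 sg3; do
  if [ -e "$tmp/$sg.done" ]; then
    echo "skip $sg (already complete)"
  else
    echo "re-execute $sg"
  fi
done
rm -r "$tmp"
```

Only `sg2` is re-executed; precise dependency tracking is what lets the real system keep this re-execution set minimal.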
The list of components is explained below, along with their location in the code:
- A1: DFG augmentation and isolation of the unsafe-main subgraph (in `prepare_graph_for_remote_exec`)
- A2: Remote pipe instrumentation, which injects read/write nodes that track byte offsets (in `remote_pipe`, `pipes`)
- A3: Dynamic output persistence, a heuristic that chooses between spilling to disk or streaming (in `add_singular_flags`, `check_persisted_discovery`, `writeOptimized`)
- A4: Scheduler and batched dispatch of subgraphs to executors (`worker_manager`)
- A5: Progress monitor and discovery: 17-byte completion events and an endpoint registry (`discovery`, `datastream`)
- A6: Health monitor, which polls HDFS Namenode JMX to identify slow/failed nodes (`hdfs_utils`)
- B1: Executor non-blocking event loop, which launches subgraphs (`EventLoop`)
- B2: Remote pipe data path within the executor (socket/file, buffered I/O) (`datastream`)
- B3: Distributed file reader, which streams HDFS splits locally (`dfs`)
- B4: On-node cache of persisted outputs, which avoids re-computation after faults (`writeOptimized`)
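The persist-vs-stream trade-off behind A3 can also be illustrated with plain shell (a hedged sketch, not Fractal's `writeOptimized` code): persisting amounts to teeing an intermediate stream to disk so that, after a fault, a restarted downstream consumer can replay the cached copy instead of re-running the upstream stage:

```shell
#!/bin/sh
# Hedged sketch of output persistence (not Fractal's actual mechanism):
# tee the upstream stage's output to disk while still streaming it onward.
set -eu
tmp=$(mktemp -d)
printf '3\n1\n2\n' > "$tmp/input"
# Persisting path: the intermediate sorted stream is also cached on disk.
sort "$tmp/input" | tee "$tmp/persisted" | head -n 2 > "$tmp/downstream"
# Recovery: the downstream stage replays the cached copy instead of
# re-executing `sort` on the original input.
head -n 2 "$tmp/persisted" > "$tmp/recovered"
diff "$tmp/downstream" "$tmp/recovered" && echo "recovered without re-execution"
rm -r "$tmp"
```

Streaming avoids the extra disk I/O but forces upstream re-execution on failure; the dynamic heuristic picks between the two per subgraph.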
Fractal is a member of the PaSh family of systems, available through the Linux Foundation. Please join the community:
- Chat: Discord
- Email: fractal@brown.edu
- Issues: Open a GitHub issue
Fractal is backed by state-of-the-art research. If you are using it to accelerate your processing, consider citing the following paper:
```bibtex
@inproceedings{fractal:nsdi:2026,
  author = {Zhicheng Huang and Ramiz Dundar and Yizheng Xie and Konstantinos Kallas and Nikos Vasilakis},
  title = {Fractal: Fault-Tolerant Shell-Script Distribution},
  booktitle = {23rd USENIX Symposium on Networked Systems Design and Implementation (NSDI 26)},
  year = {2026},
  address = {Renton, WA},
  publisher = {USENIX Association},
  month = may
}
```

(More bibtex) Fractal builds on DiSh and PaSh.
The DiSh paper, from NSDI'23:
```bibtex
@inproceedings{dish:nsdi:2023,
  author = {Tammam Mustafa and Konstantinos Kallas and Pratyush Das and Nikos Vasilakis},
  title = {{DiSh}: Dynamic {Shell-Script} Distribution},
  booktitle = {20th USENIX Symposium on Networked Systems Design and Implementation (NSDI 23)},
  year = {2023},
  isbn = {978-1-939133-33-5},
  address = {Boston, MA},
  pages = {341--356},
  url = {https://www.usenix.org/conference/nsdi23/presentation/mustafa},
  publisher = {USENIX Association},
  month = apr
}
```

The PaSh paper, from OSDI'22:
```bibtex
@inproceedings{pash:osdi:2022,
  author = {Konstantinos Kallas and Tammam Mustafa and Jan Bielak and Dimitris Karnikis and Thurston H.Y. Dang and Michael Greenberg and Nikos Vasilakis},
  title = {Practically Correct, {Just-in-Time} Shell Script Parallelization},
  booktitle = {16th USENIX Symposium on Operating Systems Design and Implementation (OSDI 22)},
  year = {2022},
  isbn = {978-1-939133-28-1},
  address = {Carlsbad, CA},
  pages = {769--785},
  url = {https://www.usenix.org/conference/osdi22/presentation/kallas},
  publisher = {USENIX Association},
  month = jul
}
```

Fractal is an open-source, collaborative, MIT-licensed project available through the Linux Foundation and developed by researchers at Brown University and UCLA. If you'd like to contribute, please see the CONTRIBUTING.md file; we welcome contributions! And please come talk to us if you're looking to optimize shell programs!
