GitHub - RNABioInfo/mcaat: Finding all CRISPR Arrays in Metagenomic Datasets using Graph-Based Strategies

😸 MCAAT - Metagenomic CRISPR Array Analysis Tool

CRISPR-Cas is a bacterial immune system also famous for its use in genome editing. The diversity of known systems could be significantly increased by metagenomic data.
Here we present the Metagenomic CRISPR Array Analysis Tool MCAAT, a highly sensitive algorithm for finding CRISPR Arrays in un-assembled metagenomic data.
It takes advantage of the properties of CRISPR arrays that form multicycles in de Bruijn graphs.
MCAAT's assembly-free graph-based strategy outperforms assembly-based workflows and other assembly-free methods on synthetic and real metagenomes.
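To make the multicycle idea concrete, here is a toy illustration in Python. It is not MCAAT's implementation: the repeat and spacer sequences are made up, `k = 5` is arbitrary, and both function names are hypothetical. The sketch builds a small de Bruijn graph from one array and enumerates the simple cycles through the first k-mer of the repeat. Each spacer closes its own cycle, so the repeat k-mers lie on several cycles at once.

```python
from collections import defaultdict

def de_bruijn_graph(seq, k):
    """Link each k-mer in seq to the k-mer that starts one base later."""
    graph = defaultdict(set)
    for i in range(len(seq) - k):
        graph[seq[i:i + k]].add(seq[i + 1:i + k + 1])
    return graph

def cycles_through(graph, start, max_len):
    """Enumerate simple cycles through `start` with at most max_len nodes (iterative DFS)."""
    cycles = []
    stack = [[start]]
    while stack:
        path = stack.pop()
        for nxt in graph.get(path[-1], ()):
            if nxt == start:
                cycles.append(path)          # closed a cycle back to the start k-mer
            elif nxt not in path and len(path) < max_len:
                stack.append(path + [nxt])   # extend the simple path
    return cycles

# Made-up toy array: repeat, spacer, repeat, spacer, ...
REPEAT = "GATTACAG"
SPACERS = ["CCCGTG", "TTGGCA", "ACTCGT"]
array = REPEAT + "".join(spacer + REPEAT for spacer in SPACERS)

K = 5
graph = de_bruijn_graph(array, K)
cycles = cycles_through(graph, REPEAT[:K], max_len=len(REPEAT) + 10)

# The first base of every k-mer on a cycle spells one repeat followed by one spacer.
for cycle in cycles:
    print(len(cycle), "".join(kmer[0] for kmer in cycle))
```

With these inputs the search finds three cycles of 14 k-mers each, one per spacer, all sharing the k-mers that lie inside the repeat.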

🥳 NEWS

Docker container available under: https://hub.docker.com/r/feeka94/mcaat
Version 0.3 uses the following optimization techniques:
- Better data structures for preprocessing (phmap::flat_hash_set)
- Compiler intrinsics to guide the hardware in the right direction
- Reserved capacity to prevent rehashing

In-depth technical details: educational resource and optimization developer notes.

As a result of these optimizations, we achieved a 17-25x speedup on a graph with 1 billion nodes (from 3 days to 3 hours). Considering the complexity of the graphs, this is a huge improvement.

Installation using docker

Docker Build

docker build -t mcaat .

Run the Tool Using Docker

Mount your working directory to access input/output files:

docker run --rm -v $(pwd):/data mcaat \
  --input_files /data/reads_R1.fastq /data/reads_R2.fastq \
  --output-folder /data/results

Final Image Size

The final image is based on debian:bookworm-slim and includes only:

The mcaat binary
Runtime libraries: libomp5, zlib1g

This keeps the image small and portable.

Clean Up

To remove the image:

docker rmi mcaat

Compiling the project

🔧 Build the Project

To make ./install.sh executable, run the following command:

chmod +x ./install.sh

Then build the project. The working version is saved in the build folder.

./install.sh

To install the build as well, pass the --install flag.

./install.sh --install

To clean up, use the --clean flag.

Usage

./mcaat --input_files <file1> [file2] [--ram <amount>] [--threads <num>] [--output-folder <path>] [--help]

🧾 Command-Line Arguments

✅ Required

| Argument | Description |
|---|---|
| `--input_files <file1> [file2]` | One or two input FASTA/FASTQ files. If one file is provided, it is treated as single-end data. If two files are provided, they are treated as paired-end reads. |

⚙️ Optional

| Argument | Description |
|---|---|
| `--ram <amount>` | Maximum RAM to use. Units: `B`, `K`, `M`, `G`. Default: 95% of system RAM. Example: `--ram 4G`. |
| `--threads <num>` | Number of threads to use. Default: total CPU cores minus 2. |
| `--output-folder <path>` | Output directory for results. If not provided, a timestamped folder is created automatically. If provided, the folder is used exactly as given. |
| `--help`, `-h` | Show usage information and exit. |
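When scripting runs, it can help to turn `--ram` style values into bytes and to reproduce the documented thread default. The following Python sketch does both. It makes assumptions the README does not confirm: units are treated as binary multiples (1K = 1024 B), lowercase units are accepted, and fractional amounts are rejected. `parse_ram` and `default_threads` are hypothetical helper names, not part of MCAAT.

```python
import os
import re

# Assumption: units are binary multiples (1K = 1024 B); the README does not say.
UNIT_FACTORS = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3}

def parse_ram(value):
    """Convert a string such as '4G' or '512M' into a number of bytes."""
    match = re.fullmatch(r"(\d+)([BKMG])", value.strip().upper())
    if match is None:
        raise ValueError(f"expected a whole number followed by B, K, M or G, got {value!r}")
    amount, unit = match.groups()
    return int(amount) * UNIT_FACTORS[unit]

def default_threads():
    """Documented default: total CPU cores minus 2 (clamped to at least 1 in this sketch)."""
    return max(1, (os.cpu_count() or 1) - 2)

print(parse_ram("4G"))
print(default_threads())
```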

📁 Output Structure

The tool creates the following directory structure inside the specified output folder:

<output-folder>/
├── CRISPR_Arrays.txt         # Raw CRISPR array output

🧪 Example Usage

| Scenario | Command |
|---|---|
| Paired-end input with custom output | `./mcaat --input_files reads_R1.fastq reads_R2.fastq --ram 8G --threads 12 --output-folder results/my_run` |
| Single-end input with default output | `./mcaat --input_files reads.fastq` (creates a folder like `mcaat_run_2025-07-07_15-30-00/`) |
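If you collect several default-named runs, the timestamp in the folder name can be parsed to find the most recent one. The sketch below assumes the naming pattern inferred from the single example above (`mcaat_run_YYYY-MM-DD_HH-MM-SS`). The format string and both helper names are hypothetical and may not match what MCAAT does internally.

```python
from datetime import datetime

# Assumption: pattern inferred from the example name mcaat_run_2025-07-07_15-30-00.
RUN_FOLDER_FORMAT = "mcaat_run_%Y-%m-%d_%H-%M-%S"

def run_folder_name(moment):
    """Build a folder name of the documented shape for a given datetime."""
    return moment.strftime(RUN_FOLDER_FORMAT)

def run_folder_time(name):
    """Recover the datetime encoded in a run folder name (trailing slash allowed)."""
    return datetime.strptime(name.rstrip("/"), RUN_FOLDER_FORMAT)

folders = ["mcaat_run_2025-07-07_15-30-00/", "mcaat_run_2025-07-08_09-05-12/"]
latest = max(folders, key=run_folder_time)
print(latest)
```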

Notes

Input files must exist and be accessible.
If RAM is set below 1 GB or above system capacity, the program will exit with an error.
If only one input file is provided, the tool assumes single-end data.
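The notes above can be turned into a small pre-flight check that runs before launching a long job. This is a sketch under stated assumptions, not MCAAT's own validation: `preflight_errors` is a hypothetical helper, 1 GB is taken to mean 2**30 bytes, and the system RAM figure is passed in by the caller rather than detected.

```python
import os

ONE_GB = 1024**3  # assumption: 1 GB is treated as 2**30 bytes

def preflight_errors(input_files, ram_bytes, system_ram_bytes):
    """Return a list of problems that, per the notes above, would make a run fail."""
    errors = []
    if len(input_files) not in (1, 2):
        errors.append("expected one (single-end) or two (paired-end) input files")
    for path in input_files:
        # Input files must exist and be accessible.
        if not os.access(path, os.R_OK):
            errors.append(f"input file missing or unreadable: {path}")
    if ram_bytes < ONE_GB:
        errors.append("RAM limit is below 1 GB")
    elif ram_bytes > system_ram_bytes:
        errors.append("RAM limit exceeds system capacity")
    return errors

print(preflight_errors(["no_such_reads.fastq"], 512 * 1024**2, 16 * ONE_GB))
```

An empty list means none of the documented failure conditions were detected.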

⚙️ Settings file support

Create a simple key=value text file (one setting per line) and pass it with --settings /path/to/file.

The program reads values from this file unless you override them with CLI flags. If you change the file, run the program again to pick up the new values.

Example of settings.txt (must include input_files):

# MUST INCLUDE
input_files=/data/sample_folder/1.fastq /data/sample_folder/2.fastq
ram=128G
threads=26
output_folder=results/run_2025-11-19
# OPTIONAL
cycle_max_length=77
cycle_min_length=27
threshold_multiplicity=20
low_abundance=true

Notes:

input_files accepts one or two paths; entries may be separated by spaces, commas, or semicolons.
Values given on the command line override those in settings.txt. For example, you can keep everything in settings.txt and change only the -i parameter on the command line.
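The rules above (key=value lines, `input_files` split on spaces, commas, or semicolons, command-line values taking precedence) can be sketched in Python as follows. This only illustrates the documented behaviour and is not MCAAT's parser. `parse_settings` and `merge_with_cli` are hypothetical names, and skipping `#` lines is inferred from the example file.

```python
import re

def parse_settings(text):
    """Parse key=value lines; blank lines, '#' lines, and lines without '=' are skipped."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    if "input_files" in settings:
        # One or two paths, separated by spaces, commas, or semicolons.
        paths = re.split(r"[\s,;]+", settings["input_files"])
        settings["input_files"] = [p for p in paths if p]
    return settings

def merge_with_cli(settings, cli_values):
    """CLI values win over the file; None means 'not given on the command line'."""
    merged = dict(settings)
    merged.update({k: v for k, v in cli_values.items() if v is not None})
    return merged

example = """\
# MUST INCLUDE
input_files=/data/sample_folder/1.fastq, /data/sample_folder/2.fastq
ram=128G
threads=26
"""
settings = parse_settings(example)
final = merge_with_cli(settings, {"threads": "8", "ram": None})
print(final)
```

Here `threads` comes from the command line while `ram` and `input_files` keep their values from the file.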

Requirements

C++17 compiler
RapidFuzz (for fuzzy string matching)
Filesystem support (<filesystem>)

Support

If you encounter issues or have questions, feel free to open an issue or email us at fikrat.talibli@ibmg.uni-stuttgart.de. If you use this software, please cite this paper: https://academic.oup.com/microlife/article/doi/10.1093/femsml/uqaf016/8205558.

