3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -6,6 +6,9 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),

 ## [Unreleased]

+### Added
+- Added support for the T2T-CHM13v2.0 human reference genome (CHM13-T2T), nuclear chromosomes only (1–22, X, Y), WGS matrices, distributed via the AlexandrovLab FTP.
+
 ## [1.3.6] - 2025-10-28

 ### Added
7 changes: 5 additions & 2 deletions README.md
@@ -29,7 +29,7 @@ The framework is written in PYTHON, however, it also requires the following soft
 * WGET version 1.9 or RSYNC if you have a firewall

 By default the installation process will save the FASTA files for all chromosomes for the default genome
-assemblies (GRCh37, GRCH38, mm10, mm9, rn6). As a result, ~3 Gb of storage must be available for the downloads for each genome.
+assemblies (CHM13-T2T, GRCh37, GRCh38, mm10, mm9, rn6). As a result, ~3 Gb of storage must be available for the downloads for each genome.

 **QUICK START GUIDE**

@@ -83,7 +83,7 @@ View the table below for the full list of parameters.
 | ------ | ----------- | ----------- | ----------- |
 | Required | | | |
 | | project | String | The name of the project. |
-| | reference_genome | String | The name of the reference genome. Full list of genomes under **Supported Genomes** section. Supported values include the following: {c_elegans, dog, ebv, GRCh37, GRCh38, mm9, mm10, mm39, rn6, yeast} |
+| | reference_genome | String | The name of the reference genome. Full list of genomes under **Supported Genomes** section. Supported values include the following: {c_elegans, CHM13-T2T, dog, ebv, GRCh37, GRCh38, mm9, mm10, mm39, rn6, yeast} |
 | | path_to_input_files | String | The path to the input files. |
 | Optional | | | |
 | | exome | Boolean | Downsamples mutational matrices to the exome regions of the genome. Default value False. |
@@ -204,6 +204,9 @@ SigProfilerMatrixGenerator cnv_matrix_generator BATTENBERG ./SigProfilerMatrixGe

 This tool currently supports the following genomes:

+T2T-CHM13v2.0 [CHM13-T2T] (Telomere-to-Telomere Consortium CHM13 assembly v2.0), INSDC
+Assembly GCA_009914755.4. Nuclear chromosomes only (1–22, X, Y; no mitochondrion). WGS matrices supported.
+
 GRCh38.p12 [GRCh38] (Genome Reference Consortium Human Reference 38), INSDC
 Assembly GCA_000001405.27, Dec 2013. Released July 2014. Last updated January 2018. This genome was downloaded from ENSEMBL database version 93.38.

4 changes: 2 additions & 2 deletions SigProfilerMatrixGenerator/controllers/cli_controller.py
@@ -53,7 +53,7 @@ def parse_arguments_install(args: List[str]) -> argparse.Namespace:
 parser = argparse.ArgumentParser(description="Install reference genome files.")
 parser.add_argument(
 "genome",
-help="The reference genome to install. Supported genomes include {c_elegans, dog, ebv, GRCh37, GRCh38, mm9, mm10, mm39, rn6, rn7, yeast}.",
+help="The reference genome to install. Supported genomes include {c_elegans, CHM13-T2T, dog, ebv, GRCh37, GRCh38, mm9, mm10, mm39, rn6, rn7, yeast}.",
 )
 parser.add_argument(
 "-l",
@@ -85,7 +85,7 @@ def parse_arguments_matrix_generator(args: List[str]) -> argparse.Namespace:
 parser.add_argument("project", help="The name of the project.")
 parser.add_argument(
 "reference_genome",
-help="The name of the reference genome. Supported values {c_elegans, dog, ebv, GRCh37, GRCh38, mm9, mm10, mm39, rn6, rn7, yeast}.",
+help="The name of the reference genome. Supported values {c_elegans, CHM13-T2T, dog, ebv, GRCh37, GRCh38, mm9, mm10, mm39, rn6, rn7, yeast}.",
 )
 parser.add_argument("path_to_input_files", help="The path to the input files.")
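
For orientation, the `genome` positional touched in `cli_controller.py` can be exercised with a stand-in parser. The sketch below mirrors only the `add_argument("genome", ...)` call visible in the diff. The names `SUPPORTED_GENOMES`, `build_install_parser`, and `parse_install` are hypothetical, and the real `parse_arguments_install` defines further options (such as `-l`) that are omitted here.

```python
import argparse
from typing import List

# Hypothetical constant: genome names copied from the updated help string.
SUPPORTED_GENOMES = (
    "c_elegans", "CHM13-T2T", "dog", "ebv", "GRCh37", "GRCh38",
    "mm9", "mm10", "mm39", "rn6", "rn7", "yeast",
)


def build_install_parser() -> argparse.ArgumentParser:
    """Stand-in for the install parser; only the positional is reproduced."""
    parser = argparse.ArgumentParser(description="Install reference genome files.")
    parser.add_argument(
        "genome",
        help="The reference genome to install. Supported genomes include {"
        + ", ".join(SUPPORTED_GENOMES)
        + "}.",
    )
    return parser


def parse_install(args: List[str]) -> argparse.Namespace:
    """Parse an argument list the way a CLI entry point would."""
    return build_install_parser().parse_args(args)


if __name__ == "__main__":
    namespace = parse_install(["CHM13-T2T"])
    print(namespace.genome)
```

Because the supported names appear only in the help text, a parser like this one accepts any string for `genome`. Whether the package validates the name further downstream is not visible in this diff.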

Large diffs are not rendered by default.

@@ -223,7 +223,7 @@ def BED_sorting(bed_file_path, genome):
 ],
 **{
 g: common_chrom_list
-for g in ["GRCh37", "GRCh38", "dog", "ebv", "mm10", "mm9", "mm39", "rn6"]
+for g in ["GRCh37", "GRCh38", "CHM13-T2T", "dog", "ebv", "mm10", "mm9", "mm39", "rn6"]
 },
 }

@@ -1597,6 +1597,7 @@ def SigProfilerMatrixGeneratorFunc(
 for genome in [
 "GRCh37",
 "GRCh38",
+"CHM13-T2T",
 "dog",
 "ebv",
 "mm10",
@@ -1729,6 +1730,7 @@ def SigProfilerMatrixGeneratorFunc(
 for genome in [
 "GRCh37",
 "GRCh38",
+"CHM13-T2T",
 "dog",
 "ebv",
 "mm10",
@@ -1880,6 +1882,7 @@ def SigProfilerMatrixGeneratorFunc(
 for genome in [
 "GRCh37",
 "GRCh38",
+"CHM13-T2T",
 "dog",
 "ebv",
 "mm10",
@@ -2013,6 +2016,7 @@ def SigProfilerMatrixGeneratorFunc(
 for genome in [
 "GRCh37",
 "GRCh38",
+"CHM13-T2T",
 "dog",
 "ebv",
 "mm10",
@@ -2240,6 +2244,7 @@ def SigProfilerMatrixGeneratorFunc(
 for genome in [
 "GRCh37",
 "GRCh38",
+"CHM13-T2T",
 "dog",
 "ebv",
 "mm10",
@@ -2370,6 +2375,7 @@ def SigProfilerMatrixGeneratorFunc(
 for genome in [
 "GRCh37",
 "GRCh38",
+"CHM13-T2T",
 "dog",
 "ebv",
 "mm10",
@@ -2502,6 +2508,7 @@ def SigProfilerMatrixGeneratorFunc(
 for genome in [
 "GRCh37",
 "GRCh38",
+"CHM13-T2T",
 "dog",
 "ebv",
 "mm10",
@@ -2637,6 +2644,7 @@ def SigProfilerMatrixGeneratorFunc(
 for genome in [
 "GRCh37",
 "GRCh38",
+"CHM13-T2T",
 "dog",
 "ebv",
 "mm10",
@@ -2768,6 +2776,7 @@ def SigProfilerMatrixGeneratorFunc(
 for genome in [
 "GRCh37",
 "GRCh38",
+"CHM13-T2T",
 "dog",
 "ebv",
 "mm10",
26 changes: 26 additions & 0 deletions SigProfilerMatrixGenerator/scripts/reference_genome_manager.py
@@ -102,6 +102,32 @@
 "X": "d5edbea3cf5d1716765dd4a7b41b7656",
 "MT": "dfd6db5743d399516d5c8dadee5bee78",
 },
+"CHM13-T2T": {
+"1": "8160d885ea68cbd9e1a85f1a62f5c0f1",
+"2": "5041bd8918b488102bd3fc17f4eff66e",
+"3": "b499fea5aea8934cb8de2531157d49ec",
+"4": "a2a3a52de2cca0e9fb5781901e1dfba8",
+"5": "ee7f9612de6e8b1ec484c02b2b31e9fe",
+"6": "2f9202597543ae1adc1dd31571a9c3d2",
+"7": "be89f4bcc4553389fcc4e5f225db77da",
+"8": "b0934433605dca7314235cc4059de0bb",
+"9": "c669a1e483304201a6afa38ea0cedc53",
+"10": "272a6ea1ec1bbbc3d5f34bd59df76457",
+"11": "7bfa1f75f0c08c7ae7ff660f3171c6a6",
+"12": "5fb792acd7f9cb00b1cc707b28d13723",
+"13": "40cc90706b67aca49d409ef158e00685",
+"14": "1a10df3d0fb9206d93fc3791666a54ae",
+"15": "c60b41404acf203ffff4251c95ccb94c",
+"16": "130a73687af01742dd35b0b5fbc225bb",
+"17": "3b0ad5efe98372b5bf3e18d5938098de",
+"18": "df7f62810c54fa7aa2d0c20481344252",
+"19": "1e9f2c8bb1f11e45c1789a3e69c03094",
+"20": "548df0167cbd6401d5f5b1c1050f7065",
+"21": "460c01106a2402a25591a625c6a8ff73",
+"22": "8d3aa47b4456add9d0b78cb70b155fda",
+"X": "13dea22db0695374c214c15211c65424",
+"Y": "3a5cc4f01eaa1eef9a34f9d5bd09ca3b",
+},
 "GRCh38_havana": {
 "1": "c4ef4ee14a4f0f7b319e9ed01f2a9742",
 "2": "189cb33da673afcf161b7724d78d314b",
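
The 32-character hex values added for CHM13-T2T have the shape of per-chromosome MD5 digests. As a rough illustration of how a table like this could be checked against files on disk, here is a minimal sketch using only the standard library. The helper names `md5_of_file` and `find_mismatches`, and the `<chromosome>.txt` file naming, are assumptions made for the example. The diff does not show how `reference_genome_manager.py` actually consumes these values.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(
    genome_dir: Path, expected: Dict[str, str], suffix: str = ".txt"
) -> List[str]:
    """List chromosomes whose file is missing or whose digest differs.

    The "<chromosome><suffix>" naming is a hypothetical layout, not one
    confirmed by the diff.
    """
    mismatched = []
    for chrom, checksum in expected.items():
        candidate = genome_dir / f"{chrom}{suffix}"
        if not candidate.is_file() or md5_of_file(candidate) != checksum:
            mismatched.append(chrom)
    return mismatched
```

An empty list from `find_mismatches` would mean every listed chromosome file is present and matches its recorded digest. Any returned name marks a file worth downloading again.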