Commit

change to binf.science because google is sunsetting page.link
hasindu2008 committed Sep 26, 2024
1 parent c9b5c39 commit 3a2a479
Showing 9 changed files with 23 additions and 22 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -27,6 +27,7 @@
cmake-build-debug/
prebuilt-hdf5/
test/data/out
/*.slow5.idx

# OS generated files #
######################
24 changes: 12 additions & 12 deletions docs/datasets.md
@@ -69,9 +69,9 @@ An NA24385 R10.4.1 LSK114 dataset sequenced on a PromethION is available on [SRA

| <sub>Description</sub> | <sub>SRA/ENA run Data access</sub> | <sub>Direct download link (md5sum)</sub> |
|------------------------------------------------------|------------------------------------------------------------------------------------------------------------|----------------------|
| <sub>~20K reads subsubset (BLOW5 format)</sub> | | <sub>[hg2_prom_lsk114_subsubsample.tar](https://slow5.page.link/hg2_prom_subsub)</sub> <sub>(`4d338e1cffd6dbf562cc55d9fcca040c`)</sub> |
| <sub>~500K reads subset (BLOW5 format)</sub> | <sub>[SRR23215365](https://trace.ncbi.nlm.nih.gov/Traces/?view=run_browser&acc=SRR23215365&display=data-access)</sub> | <sub>[hg2_subsample_slow5.tar](https://slow5.page.link/hg2_prom_sub_slow5)</sub> <sub>(`65386e1da1d82b892677ad5614e8d84d`)</sub> |
| <sub>~15M reads complete PromethION dataset (BLOW5 format)</sub> | <sub>[SRR23215366](https://trace.ncbi.nlm.nih.gov/Traces/?view=run_browser&acc=SRR23215366&display=data-access)/[ERR11777845](https://www.ebi.ac.uk/ena/browser/view/ERR11777845)</sub> | <sub> [PGXX22394_reads.blow5](https://slow5.page.link/hg2_prom_slow5) (`3498b595ac7c79a3d2dce47454095610`), [PGXX22394_reads.blow5.idx](https://slow5.page.link/hg2_prom_slow5_idx) (`1e11735c10cf63edc4a7114f010cc472`)</sub>* |
| <sub>~20K reads subsubset (BLOW5 format)</sub> | | <sub>[hg2_prom_lsk114_subsubsample.tar](https://slow5.bioinf.science/hg2_prom_subsub)</sub> <sub>(`4d338e1cffd6dbf562cc55d9fcca040c`)</sub> |
| <sub>~500K reads subset (BLOW5 format)</sub> | <sub>[SRR23215365](https://trace.ncbi.nlm.nih.gov/Traces/?view=run_browser&acc=SRR23215365&display=data-access)</sub> | <sub>[hg2_subsample_slow5.tar](https://slow5.bioinf.science/hg2_prom_sub_slow5)</sub> <sub>(`65386e1da1d82b892677ad5614e8d84d`)</sub> |
| <sub>~15M reads complete PromethION dataset (BLOW5 format)</sub> | <sub>[SRR23215366](https://trace.ncbi.nlm.nih.gov/Traces/?view=run_browser&acc=SRR23215366&display=data-access)/[ERR11777845](https://www.ebi.ac.uk/ena/browser/view/ERR11777845)</sub> | <sub> [PGXX22394_reads.blow5](https://slow5.bioinf.science/hg2_prom_slow5) (`3498b595ac7c79a3d2dce47454095610`), [PGXX22394_reads.blow5.idx](https://slow5.bioinf.science/hg2_prom_slow5_idx) (`1e11735c10cf63edc4a7114f010cc472`)</sub>* |

*This dataset is hosted in the [gtgseq AWS bucket](https://aws.amazon.com/marketplace/pp/prodview-rve772jpfevtw) granted by the AWS open data sponsorship programme, for which the documentation available under the [gtgseq GitHub repository](https://github.com/GenTechGp/gtgseq).

@@ -81,7 +81,7 @@ An NA12878 R10.4.1 LSK114 dataset sequenced on a PromethION at 4KHz sampling rat

| <sub>Description</sub> | <sub>ENA run Data access</sub> | <sub>Direct download link (md5sum)</sub> |
|------------------------------------------------------|------------------------------------------------------------------------------------------------------------|----------------------|
| <sub>~11M reads complete PromethION dataset (BLOW5 format)</sub> | <sub>[ERR11777844](https://www.ebi.ac.uk/ena/browser/view/ERR11777844)</sub> | <sub> [PGXXHX230142_reads.blow5](https://slow5.page.link/na12878_prom2_slow5) (`24266f6dabb8d679f7f520be6aa22694`), [PGXXHX230142_reads.blow5.idx](https://slow5.page.link/na12878_prom2_slow5_idx) (`a5659f829b9410616391427b2526b853`) </sub>* |
| <sub>~11M reads complete PromethION dataset (BLOW5 format)</sub> | <sub>[ERR11777844](https://www.ebi.ac.uk/ena/browser/view/ERR11777844)</sub> | <sub> [PGXXHX230142_reads.blow5](https://slow5.bioinf.science/na12878_prom2_slow5) (`24266f6dabb8d679f7f520be6aa22694`), [PGXXHX230142_reads.blow5.idx](https://slow5.bioinf.science/na12878_prom2_slow5_idx) (`a5659f829b9410616391427b2526b853`) </sub>* |

*This dataset is hosted in the [gtgseq AWS bucket](https://aws.amazon.com/marketplace/pp/prodview-rve772jpfevtw) granted by the AWS open data sponsorship programme, for which the documentation available under the [gtgseq GitHub repository](https://github.com/GenTechGp/gtgseq).

@@ -97,9 +97,9 @@ The NA12878 R9.4.1 PromethION dataset sequenced for the [SLOW5 paper](https://ww

| <sub>Description</sub> | <sub>SRA run Data access</sub> | <sub>Direct download link (md5sum)</sub> |
|------------------------------------------------------|------------------------------------------------------------------------------------------------------------|----------------------|
| <sub>~20K reads subsubset</sub> | - | <sub>[NA12878_prom_subsubsample.tar.gz](https://slow5.page.link/na12878_prom_subsub)</sub> <sub>(`f64074151d25d6e35c73f668d4146032`)</sub> |
| <sub>~500K reads subset</sub> | <sub>[SRR22186403](https://trace.ncbi.nlm.nih.gov/Traces/?view=run_browser&acc=SRR22186403&display=data-access)</sub> | <sub>[subsample_slow5.tar](https://slow5.page.link/na12878_prom_sub_slow5)</sub> <sub>(`6cdbe02c3844960bb13cf94b9c3173bb`)</sub> |
| <sub>~9M reads complete PromethION dataset</sub> | <sub>[SRR22186402](https://trace.ncbi.nlm.nih.gov/Traces/?view=run_browser&acc=SRR22186402&display=data-access)</sub> | <sub>[na12878_prom_merged.blow5](https://slow5.page.link/na12878_prom_slow5) (`7e1a5900aff10e2cf1b97b8d3c6ecd1e`), [na12878_prom_merged.blow5.idx](https://slow5.page.link/na12878_prom_slow5_idx) (`a78919e8ac8639788942dbc3f1a2451a`) </sub> |
| <sub>~20K reads subsubset</sub> | - | <sub>[NA12878_prom_subsubsample.tar.gz](https://slow5.bioinf.science/na12878_prom_subsub)</sub> <sub>(`f64074151d25d6e35c73f668d4146032`)</sub> |
| <sub>~500K reads subset</sub> | <sub>[SRR22186403](https://trace.ncbi.nlm.nih.gov/Traces/?view=run_browser&acc=SRR22186403&display=data-access)</sub> | <sub>[subsample_slow5.tar](https://slow5.bioinf.science/na12878_prom_sub_slow5)</sub> <sub>(`6cdbe02c3844960bb13cf94b9c3173bb`)</sub> |
| <sub>~9M reads complete PromethION dataset</sub> | <sub>[SRR22186402](https://trace.ncbi.nlm.nih.gov/Traces/?view=run_browser&acc=SRR22186402&display=data-access)</sub> | <sub>[na12878_prom_merged.blow5](https://slow5.bioinf.science/na12878_prom_slow5) (`7e1a5900aff10e2cf1b97b8d3c6ecd1e`), [na12878_prom_merged.blow5.idx](https://slow5.bioinf.science/na12878_prom_slow5_idx) (`a78919e8ac8639788942dbc3f1a2451a`) </sub> |


### MinION R9.4.1 selective sequencing datasets
@@ -113,19 +113,19 @@ MinION datsets sequenced with readfish selective sequencing for [Comprehensive g
Following public datasets from others have been converted to BLOW5 format. Relatively smaller datasets (hundreds of GBs) are directly available for download. Larger datasets (terabytes) have been uploaded to [SRA](https://www.ncbi.nlm.nih.gov/bioproject/PRJNA932454) and are available for cloud delivery. Alternatively, these converted BLOW5 files are currently stored locally in a archive storage at Garvan Institute, if anyone is interested contact.

1. [SP1 SARS-CoV-2 dataset](https://community.artic.network/t/links-to-raw-fast5-fastq-data-for-artic-protocol/17):
- [SP1-raw-mapped.blow5](https://slow5.page.link/SP1-raw-mapped) (md5sum: `d87c60f70bf8646ee56bcee2795e7535`)
- [SP1-raw-mapped.blow5.idx](https://slow5.page.link/SP1-raw-mapped-idx) (md5sum: `c79ef9280be63fad7c07e4352402ce7a`)
- [SP1-raw-mapped.blow5](https://slow5.bioinf.science/SP1-raw-mapped) (md5sum: `d87c60f70bf8646ee56bcee2795e7535`)
- [SP1-raw-mapped.blow5.idx](https://slow5.bioinf.science/SP1-raw-mapped-idx) (md5sum: `c79ef9280be63fad7c07e4352402ce7a`)

2. Some of the [Zymo Mock community](https://github.com/LomanLab/mockcommunity) data:
- [Zymo-GridION-EVEN-BB-SN.blow5](https://slow5.page.link/Zymo-GridION-EVEN-BB-SN) (md5sum: `d7c894164aef398907adc6c034dd3049`)
- [Zymo-GridION-EVEN-BB-SN.blow5.idx](https://slow5.page.link/Zymo-GridION-EVEN-BB-SN-idx) (md5sum: `d7d5feae1107c6d4517ebb416dc02683`)
- [Zymo-GridION-EVEN-BB-SN.blow5](https://slow5.bioinf.science/Zymo-GridION-EVEN-BB-SN) (md5sum: `d7c894164aef398907adc6c034dd3049`)
- [Zymo-GridION-EVEN-BB-SN.blow5.idx](https://slow5.bioinf.science/Zymo-GridION-EVEN-BB-SN-idx) (md5sum: `d7d5feae1107c6d4517ebb416dc02683`)

3. All raw nanopore data from [Telomere-to-telomere consortium CHM13 project](https://github.com/marbl/CHM13)
- BLOW5 files available from [SRR23371619](https://trace.ncbi.nlm.nih.gov/Traces/?view=run_browser&acc=SRR23371619&display=data-access). file name: `CHM13_T2T_ONT_blow5.tar` (md5sum: `04f9d1c6ea2d11ccfc131c8244f059d3`).

4. All [nanopore-wgs-consortium](https://github.com/nanopore-wgs-consortium/NA12878) datasets:
- BLOW5 files for the DNA dataset available from [SRR23513620](https://trace.ncbi.nlm.nih.gov/Traces/?view=run_browser&acc=SRR23513620&display=data-access). filename: `na12878_DNA_blow5.tar` (md5sum: `2d02a7706d00572dcd9fcfa96e0357f4`)
- BLOW5 files for the direct-RNA dataset available from [SRR23513624](https://trace.ncbi.nlm.nih.gov/Traces/?view=run_browser&acc=SRR23513624&display=data-access). filename: `na12878_directRNA_blow5.tar` (md5sum: `282e305f2b6a72d28980a8d5c803d54e`. Also available for direct download from [na12878_rna_merged.blow5](https://slow5.page.link/na12878_rna) (md5sum: `36bc164e9d885838245073f6cd2ecd79`), [na12878_rna_merged.blow5.idx](https://slow5.page.link/na12878_rna_idx) (md5sum: `82f96208ac2f42574abe0cf5a3954602`)
- BLOW5 files for the direct-RNA dataset available from [SRR23513624](https://trace.ncbi.nlm.nih.gov/Traces/?view=run_browser&acc=SRR23513624&display=data-access). filename: `na12878_directRNA_blow5.tar` (md5sum: `282e305f2b6a72d28980a8d5c803d54e`. Also available for direct download from [na12878_rna_merged.blow5](https://slow5.bioinf.science/na12878_rna) (md5sum: `36bc164e9d885838245073f6cd2ecd79`), [na12878_rna_merged.blow5.idx](https://slow5.bioinf.science/na12878_rna_idx) (md5sum: `82f96208ac2f42574abe0cf5a3954602`)
- BLOW5 files for the cDNA-RNA dataset available from [SRR23513622](https://trace.ncbi.nlm.nih.gov/Traces/?view=run_browser&acc=SRR23513622&display=data-access). filename: `na12878_cDNA_blow5.tar` (md5sum: `cba2ce651d8c33528e594a9e45ff6515`)

5. All RNA datasets from the [Singapore Nanopore-Expression Project (SG-NEx)](https://github.com/GoekeLab/sg-nex-data) are available in BLOW5 format in the [sg-nex-data-blow5 AWS S3 bucket](http://sg-nex-data-blow5.s3-website-ap-southeast-1.amazonaws.com/). We highly acknowledge [Jonathan Göke](https://github.com/jonathangoeke), [Chen Ying](https://github.com/cying111) for being open and hosting BLOW5 files through the AWS Open Data. Please visit [SG-NEx_blow5_tutorial](https://github.com/GoekeLab/sg-nex-data/blob/master/docs/SG-NEx_blow5_tutorial.md) on how you could use/analyse this data.
2 changes: 1 addition & 1 deletion docs/faq.md
@@ -31,7 +31,7 @@ See [this post](https://github.com/nanoporetech/vbz_compression/issues/5) for tr

**Q4:** Are there any small example datasets that can be used for testing slow5tools?

A tiny subset (~20K reads) of the original NA12878 R9.4.1 PromethION dataset used for benchmaking in the [SLOW5 paper](https://www.nature.com/articles/s41587-021-01147-4) is available [here](https://slow5.page.link/na12878_prom_subsub). A tiny subset (~20K reads) of a NA24385 R10.4.1 PromethION dataset is available [here](https://slow5.page.link/hg2_prom_subsub). Links and information on complete datasets of those samples as well as additional datasets can be found [here](https://hasindu2008.github.io/slow5tools/datasets.html).
A tiny subset (~20K reads) of the original NA12878 R9.4.1 PromethION dataset used for benchmaking in the [SLOW5 paper](https://www.nature.com/articles/s41587-021-01147-4) is available [here](https://slow5.bioinf.science/na12878_prom_subsub). A tiny subset (~20K reads) of a NA24385 R10.4.1 PromethION dataset is available [here](https://slow5.bioinf.science/hg2_prom_subsub). Links and information on complete datasets of those samples as well as additional datasets can be found [here](https://hasindu2008.github.io/slow5tools/datasets.html).

**Q5:** How can I make SLOW5 to FAST5 conversion fast?

2 changes: 1 addition & 1 deletion scripts/install-vbz.sh
@@ -21,7 +21,7 @@ print() {
echo -e "${GREEN}$1${NC}" >&2
}

MANUAL_LINK="https://f5c.page.link/troubleshoot"
MANUAL_LINK="https://f5c.bioinf.science/troubleshoot"

uname -o || die "Could not determine the O/S. See ${MANUAL_LINK}"
uname -m || die "Could not determine the architecture. See ${MANUAL_LINK}"
2 changes: 1 addition & 1 deletion src/cmd.h
@@ -73,7 +73,7 @@
" none - no special signal compression\n" \
" svb-zd - StreamVByte with zig-zag delta\n" \
" ex-zd - exception with zig-zag delta\n\n" \
"See https://slow5.page.link/man for detailed description of these command-line options.\n"
"See https://slow5.bioinf.science/man for detailed description of these command-line options.\n"



2 changes: 1 addition & 1 deletion src/split.c
@@ -329,7 +329,7 @@ int split_func(std::vector<std::string> slow5_files_input, opt_t user_opts, meta
if (read_group_count_i > 1 &&
meta_split_method_object.splitMethod != GROUP_SPLIT &&
meta_split_method_object.splitMethod != DEMUX_SPLIT) {
ERROR("The file %s contains multiple read groups. You must first separate the read groups using -g. See https://slow5.page.link/faq for more info.", slow5_files_input[i].c_str());
ERROR("The file %s contains multiple read groups. You must first separate the read groups using -g. See https://slow5.bioinf.science/faq for more info.", slow5_files_input[i].c_str());
return -1;
}
if(user_opts.flag_lossy==0 && input_slow5_file_i->header->aux_meta == NULL){
4 changes: 2 additions & 2 deletions test/download_test_dataset.sh
@@ -13,10 +13,10 @@ download (){

download_dir=test/
# test -d $download_dir/NA12878_prom_subsubsample && rm -r $download_dir/NA12878_prom_subsubsample
# link="https://slow5.page.link/na12878_prom_subsub"
# link="https://slow5.bioinf.science/na12878_prom_subsub"
# download

test -d $download_dir/fast5_soup && rm -r $download_dir/fast5_soup
link="https://slow5.page.link/fast5-soup"
link="https://slow5.bioinf.science/fast5-soup"
download

6 changes: 3 additions & 3 deletions test/test_extensive.sh
@@ -49,7 +49,7 @@ guppy_basecaller --version > /dev/null || die "guppy_basecaller not in path"

echo "*******************************NA12878_prom_subsubsample**************************************"
DATA_NA12878_SUBSUB=/data/slow5-testdata/NA12878_prom_subsubsample
test -d $DATA_NA12878_SUBSUB || die "ERROR: $DATA_NA12878_SUBSUB not found. Download from https://slow5.page.link/na12878_prom_subsub and extract"
test -d $DATA_NA12878_SUBSUB || die "ERROR: $DATA_NA12878_SUBSUB not found. Download from https://slow5.bioinf.science/na12878_prom_subsub and extract"
mkdir $TMP_DIR || die "Creating $TMP_DIR failed"
test/test_with_guppy.sh $DATA_NA12878_SUBSUB/fast5 $TMP_DIR ./slow5tools guppy_basecaller &> test_s2f_with_guppy_subsub.log || die "test_s2f_with_guppy failed"
rm -r $TMP_DIR
@@ -58,7 +58,7 @@ echo ""

echo "********************************NA12878_prom_subsample****************************************"
DATA_NA12878=/data/slow5-testdata/NA12878_prom_subsample
test -d $DATA_NA12878 || die "ERROR: $DATA_NA12878 not found. Download from https://slow5.page.link/na12878_prom_sub and extract"
test -d $DATA_NA12878 || die "ERROR: $DATA_NA12878 not found. Download from https://slow5.bioinf.science/na12878_prom_sub and extract"
mkdir $TMP_DIR || die "Creating $TMP_DIR failed"
test/test_with_guppy.sh $DATA_NA12878/fast5 $TMP_DIR ./slow5tools guppy_basecaller &> test_s2f_with_guppy_sub.log || die "test_s2f_with_guppy failed"
rm -r $TMP_DIR
@@ -71,7 +71,7 @@ echo ""

echo "**************************************fast5-soup**********************************************"
DATA_MISC=/data/slow5-testdata/fast5-soup/
test -d $DATA_MISC || die "ERROR: $DATA_MISC not found. Download from https://slow5.page.link/fast5-soup and extract"
test -d $DATA_MISC || die "ERROR: $DATA_MISC not found. Download from https://slow5.bioinf.science/fast5-soup and extract"
mkdir $TMP_DIR || die "Creating $TMP_DIR failed"
test/test_with_guppy.sh $DATA_MISC $TMP_DIR ./slow5tools guppy_basecaller &> test_s2f_with_guppy_soup.log || die "test_s2f_with_guppy failed for fast5 soup"
rm -r $TMP_DIR
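
Every change above that touches a URL swaps a `page.link` host for the matching `bioinf.science` host, keeping the subdomain (`slow5` or `f5c`) and the path. The sketch below illustrates that substitution as a stream filter. It is not how the commit was produced: `rewrite_links` is a hypothetical helper name, and the pattern assumes only the two subdomains seen in this diff.

```shell
#!/bin/sh
# Hypothetical helper, not part of the commit: rewrite slow5/f5c page.link
# URLs read from stdin to the corresponding bioinf.science URLs.
# Assumption: only the "slow5" and "f5c" subdomains need handling,
# since those are the only two that appear in this diff.
rewrite_links() {
    sed -E 's#https://(slow5|f5c)\.page\.link/#https://\1.bioinf.science/#g'
}

# Example with one of the URLs changed in src/split.c.
echo "See https://slow5.page.link/faq for more info." | rewrite_links
```

To check a working tree for leftover references after such a rewrite, something like `grep -rn 'page\.link' .` should be enough.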