Simon L. Grimm, Jason A. Rothman✉︎, William J. Bradshaw, Kylie Langlois, Joshua A. Steele, John F. Griffith, Jeff T. Kaufman✉︎, Katrine L. Whiteson
Wastewater-based pathogen monitoring has matured greatly over the course of the COVID-19 pandemic. While most current wastewater surveillance programs target only specific pathogens using qPCR or amplicon sequencing, untargeted wastewater metatranscriptomic sequencing (W-MTS) offers broader detection capabilities. Here we present a dataset of 13.1 terabases (43B read pairs) of untargeted Illumina W-MTS data generated from 20 wastewater samples, with 1.4B to 2.8B 150 bp read pairs per sample. The samples were collected between December 2023 and April 2024 at the Hyperion Water Reclamation Plant (HWRP) in Los Angeles, which serves approximately 4 million residents. The resulting dataset, one of the largest W-MTS collections to date, contains bacterial, archaeal, eukaryotic, and viral taxa (including human-infecting viruses) as well as many sequences of unknown origin. The data have been uploaded to the NCBI Sequence Read Archive, and we expect them to spur further research into the composition and usefulness of wastewater sequencing data for wastewater-based epidemiology and the early detection of novel pathogens.
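Each 150 bp read pair contributes 2 x 150 = 300 sequenced bases, so the read-pair counts above convert directly to base counts. The minimal sketch below does this conversion using only the numbers stated in the abstract, assuming every read is exactly 150 bp long (trimmed or shorter reads would lower the totals slightly); the function name is illustrative.

```python
# Convert paired-end read counts to sequenced bases.
# Assumption: every read is exactly 150 bp and each pair has 2 reads.
READ_LENGTH_BP = 150
READS_PER_PAIR = 2


def bases_from_read_pairs(read_pairs: float) -> float:
    """Return the number of sequenced bases for a given number of read pairs."""
    return read_pairs * READS_PER_PAIR * READ_LENGTH_BP


if __name__ == "__main__":
    # Per-sample range stated in the abstract: 1.4B to 2.8B read pairs.
    for pairs in (1.4e9, 2.8e9):
        print(f"{pairs / 1e9:.1f}B pairs -> {bases_from_read_pairs(pairs) / 1e9:.0f} Gb")
    # Whole dataset: about 43B read pairs.
    print(f"43B pairs -> {bases_from_read_pairs(43e9) / 1e12:.1f} Tb")
```

This gives roughly 420 to 840 Gb per sample and about 12.9 Tb for 43B pairs; the small difference from the stated 13.1 Tb presumably reflects rounding of the pair count.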
Scripts to create the manuscript figures are in the figures/ directory.
Scripts to generate Tables 2 and S1 are in the table_scripts/ directory.
Table 1 is based on the SRA metadata, available at https://www.ncbi.nlm.nih.gov/bioproject/PRJNA1198001.
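If you want to pull the SRA run metadata behind Table 1 programmatically, one option is NCBI's E-utilities. The sketch below is illustrative and is not one of the repository's table scripts. It assumes that an SRA search for the BioProject accession returns all of its runs and that the RunInfo CSV exposes `Run` and `bases` columns. The helper names are hypothetical.

```python
import csv
import io
import json
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"
BIOPROJECT = "PRJNA1198001"


def esearch_url(term: str, retmax: int = 500) -> str:
    """URL that searches the SRA database and returns matching UIDs as JSON."""
    params = {"db": "sra", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


def runinfo_url(uids: list) -> str:
    """URL that fetches the SRA RunInfo table (CSV text) for the given UIDs."""
    params = {"db": "sra", "id": ",".join(uids), "rettype": "runinfo", "retmode": "text"}
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"


def summarize_runinfo(csv_text: str) -> tuple:
    """Return (number of runs, total bases) from RunInfo CSV text.

    Blank rows and repeated header rows (Run == "Run") are skipped.
    """
    rows = [
        r for r in csv.DictReader(io.StringIO(csv_text))
        if r.get("Run") not in (None, "", "Run")
    ]
    return len(rows), sum(int(r.get("bases") or 0) for r in rows)


def fetch_runinfo(bioproject: str = BIOPROJECT) -> str:
    """Download the RunInfo CSV for a BioProject (requires network access)."""
    with urlopen(esearch_url(bioproject)) as resp:
        uids = json.load(resp)["esearchresult"]["idlist"]
    with urlopen(runinfo_url(uids)) as resp:
        return resp.read().decode()
```

Usage, with network access: `n_runs, total_bases = summarize_runinfo(fetch_runinfo())`. The URL builders are kept separate from the download so they can be checked offline. NCBI asks clients to keep E-utilities traffic to a few requests per second.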
The data used by the figure and table scripts are available at https://doi.org/10.6084/m9.figshare.28454990.v1. The data were generated with a bioinformatics pipeline (mgs-workflow 2.5.0), available at https://github.com/naobservatory/mgs-workflow/tree/2.5.0.
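To list the files in the figshare record without using the web interface, you can query the public figshare v2 REST API. The sketch below assumes that the numeric part of the DOI (28454990) is the figshare article ID. It also assumes that the `/articles/{id}/files` endpoint returns a JSON list whose entries carry `name` and `download_url` fields. Note that this endpoint may list the files of the current public version rather than specifically v1. The helper names are hypothetical.

```python
import json
import re
from typing import List, Optional, Tuple
from urllib.request import urlopen

FIGSHARE_API = "https://api.figshare.com/v2"


def parse_figshare_doi(doi: str) -> Tuple[int, Optional[int]]:
    """Extract (article_id, version) from a DOI like 10.6084/m9.figshare.123.v1."""
    m = re.search(r"figshare\.(\d+)(?:\.v(\d+))?$", doi)
    if m is None:
        raise ValueError(f"not a figshare DOI: {doi}")
    version = int(m.group(2)) if m.group(2) else None
    return int(m.group(1)), version


def files_url(article_id: int) -> str:
    """URL of the endpoint listing an article's files."""
    return f"{FIGSHARE_API}/articles/{article_id}/files"


def list_files(doi: str) -> List[Tuple[str, str]]:
    """Return (file name, download URL) pairs for a figshare DOI (needs network)."""
    article_id, _version = parse_figshare_doi(doi)
    with urlopen(files_url(article_id)) as resp:
        files = json.load(resp)
    return [(f["name"], f["download_url"]) for f in files]
```

Usage, with network access: `list_files("10.6084/m9.figshare.28454990.v1")`. Parsing the DOI separately keeps the article-ID logic testable offline.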
For questions, please contact Simon Grimm.