New Atlantis is an open ocean regeneration project that seeks to address biodiversity loss in our oceans by providing a viable business model for Marine Protected Areas (MPAs). We do this by building an open marine biodiversity analytics platform that monitors and forecasts the health of MPAs and from which marine biocredits and blue carbon credits can be generated.
This is the metagenomics section of the New Atlantis GitHub: an easy-to-use pipeline for generating metagenomic data from ocean samples. It is currently known as the Living Oceans Metagenome Assembly Pipeline, or LOMAP for short.
Photo used with permission by Paul Nicklen, co-founder of SeaLegacy.org, New Atlantis Founding Advisor, NatGeo Contributor, Instagram
You can set up and use LOMAP in the cloud by following along with the Google Colab notebook.
Please note that Google Colab does not provide the computational resources necessary to run LOMAP in full on a real dataset. The notebook demonstrates how to set up and use LOMAP by performing the first steps of the workflow on a toy dataset.
You can set up LOMAP on your own computer in one line:
git clone https://github.com/new-atlantis-dao/Oceanomics.git && rm -rf Oceanomics/.git && cd Oceanomics/Metagenomics
Congratulations, you can now start using LOMAP.
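To sanity-check a fresh checkout, a short script like the one below could confirm that a few of the entries listed in this README's project tree are present. This is a minimal sketch, not part of LOMAP: the function name `missing_entries` and the choice of entries to check are illustrative assumptions, and the repository layout may differ from the tree shown here.

```python
from pathlib import Path

# Entries taken from the project tree in this README (an assumption about
# the checkout; adjust the tuple if the repository layout differs).
EXPECTED_ENTRIES = ("data", "notebooks", "src", "requirements.txt")


def missing_entries(project_root, expected=EXPECTED_ENTRIES):
    """Return the expected files or directories that are absent under project_root."""
    root = Path(project_root)
    return [name for name in expected if not (root / name).exists()]


if __name__ == "__main__":
    # Run from the directory you cloned into.
    missing = missing_entries(".")
    if missing:
        print("Missing from this checkout:", ", ".join(missing))
    else:
        print("Checkout layout looks complete.")
```

Run it from the project directory; an empty result means every listed entry was found.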
LOMAP can be used to explore the planktonic network of a local section of ocean. A written tutorial on how to use the pipeline will be released at a later date.
├── LICENSE
├── Makefile           <- Makefile with commands like `make data` or `make train`
├── README.md          <- The top-level README for developers using this project.
├── data
│   ├── external       <- Data from third party sources.
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
│
├── docs               <- A default Sphinx project; see sphinx-doc.org for details
│
├── models             <- Trained and serialized models, model predictions, or model summaries
│
├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
│                         the creator's initials, and a short `-` delimited description, e.g.
│                         `1.0-jqp-initial-data-exploration`.
│
├── references         <- Data dictionaries, manuals, and all other explanatory materials.
│
├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures        <- Generated graphics and figures to be used in reporting
│
├── requirements.txt   <- The requirements file for reproducing the analysis environment, e.g.
│                         generated with `pip freeze > requirements.txt`
│
├── setup.py           <- makes project pip installable (pip install -e .) so src can be imported
├── src                <- Source code for use in this project.
│   ├── __init__.py    <- Makes src a Python module
│   │
│   ├── data           <- Scripts to download or generate data
│   │   └── make_dataset.py
│   │
│   ├── features       <- Scripts to turn raw data into features for modeling
│   │   └── build_features.py
│   │
│   ├── models         <- Scripts to train models and then use trained models to make
│   │   │                 predictions
│   │   ├── predict_model.py
│   │   └── train_model.py
│   │
│   └── visualization  <- Scripts to create exploratory and results oriented visualizations
│       └── visualize.py
│
└── tox.ini            <- tox file with settings for running tox; see tox.readthedocs.io
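The notebook naming convention in the tree above (an ordering number, the creator's initials, and a `-` delimited description) can be checked mechanically. The sketch below is one possible interpretation: the regular expression, the function name `parse_notebook_name`, and the decision to allow letters and digits in the description are assumptions, not rules stated by the project.

```python
import re

# One possible reading of the convention, e.g. `1.0-jqp-initial-data-exploration`.
NOTEBOOK_NAME = re.compile(
    r"^(?P<order>\d+(?:\.\d+)*)"                           # ordering number, e.g. 1.0
    r"-(?P<initials>[A-Za-z]+)"                            # creator's initials, e.g. jqp
    r"-(?P<description>[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*)$"   # `-` delimited description
)


def parse_notebook_name(filename):
    """Split a notebook file name into its parts, or return None if it does not match."""
    suffix = ".ipynb"
    stem = filename[:-len(suffix)] if filename.endswith(suffix) else filename
    match = NOTEBOOK_NAME.match(stem)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(parse_notebook_name("1.0-jqp-initial-data-exploration.ipynb"))
    print(parse_notebook_name("Untitled.ipynb"))
```

A check like this could run over the `notebooks` directory to flag files that do not follow the convention.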
The software and marker gene sequences used to build a plankton-specific database for taxonomic profiling derive from the following publications and tools:
Microbial abundance, activity and population genomic profiling with mOTUs2 (2019)
read_counter: a tool that counts the number of reads (from a FASTQ file) that map to a set of nucleotide sequences (in FASTA format).
A robust approach to estimate relative phytoplankton cell abundances from metagenomes (2022)
Toward a global reference database of COI barcodes for marine zooplankton (2021)
A simple Taxonomic Plankton Profiler Tool (unpublished work).
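The tools listed above revolve around counting the reads that map to marker sequences and turning those counts into a taxonomic profile. As a rough illustration of that final normalization step only, and not the method of any cited tool, the sketch below converts per-taxon read counts into relative abundances. The function name `relative_abundances`, the taxon labels, and the counts are made-up placeholders.

```python
def relative_abundances(read_counts):
    """Normalize a mapping of taxon -> mapped read count so the values sum to 1."""
    total = sum(read_counts.values())
    if total == 0:
        # No mapped reads at all: report zero abundance rather than dividing by zero.
        return {taxon: 0.0 for taxon in read_counts}
    return {taxon: count / total for taxon, count in read_counts.items()}


# Placeholder counts for illustration; real counts would come from a read-mapping step.
counts = {"taxon_a": 60, "taxon_b": 30, "taxon_c": 10}
profile = relative_abundances(counts)

if __name__ == "__main__":
    for taxon, fraction in profile.items():
        print(f"{taxon}\t{fraction:.2f}")
```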
Please reach out with any comments, concerns, or discussion regarding LOMAP.