remove ref constructor and update doc
DongzeHE committed Feb 17, 2024
1 parent d6d63b5 commit 8b4e0c4
Showing 6 changed files with 35 additions and 31 deletions.
11 changes: 9 additions & 2 deletions README.md
@@ -1,7 +1,14 @@
# pyroe

## About `pyroe`
The main purpose of `pyroe` is to provide a Python interface for loading the quantification results of single-cell sequencing data generated by [`alevin-fry`](https://github.com/COMBINE-lab/alevin-fry) and [`simpleaf`](https://github.com/COMBINE-lab/simpleaf).
- The major function of pyroe is the [`load_fry`](https://pyroe.readthedocs.io/en/latest/processing_fry_quants.html#load-fry-full-usage) function, which loads the quantification results into an [`anndata`](https://anndata.readthedocs.io/en/latest/) object to perform downstream analysis provided by [`scanpy`](https://scanpy.readthedocs.io/en/stable/). It provides many options for constructing the final `anndata` object by combining the count matrices that represent different splicing statuses in different ways.
- Moreover, `pyroe` provides the interface for the [`quantaf`](https://combine-lab.github.io/quantaf/) project, which is a database containing the quantification results of many publicly available datasets.
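
To make "combining the count matrices representing different splicing statuses" concrete, here is a minimal pure-Python sketch. It is not pyroe's implementation: the `combine_layers` helper, the `spec` dictionary, and the toy matrices are hypothetical and exist only for illustration. The only thing taken from alevin-fry is the labeling of counts as spliced (`S`), unspliced (`U`), and ambiguous (`A`) in USA mode.

```python
from typing import Dict, List

Matrix = List[List[int]]  # rows are cells, columns are genes


def combine_layers(counts: Dict[str, Matrix], spec: Dict[str, List[str]]) -> Dict[str, Matrix]:
    """Sum the per-status matrices named in `spec` into output layers.

    Hypothetical helper for illustration only; not part of pyroe.
    """
    combined = {}
    for layer_name, statuses in spec.items():
        matrices = [counts[status] for status in statuses]
        # Walk the matrices row by row, then add the entries column by column.
        combined[layer_name] = [
            [sum(values) for values in zip(*rows)]
            for rows in zip(*matrices)
        ]
    return combined


# Toy data: 2 cells x 3 genes for each splicing status.
usa_counts = {
    "S": [[5, 0, 2], [1, 3, 0]],
    "U": [[1, 1, 0], [0, 2, 4]],
    "A": [[0, 2, 1], [1, 0, 0]],
}

# Assumed layout: the main matrix holds spliced + ambiguous counts,
# and the unspliced counts are kept as a separate layer.
layers = combine_layers(usa_counts, {"X": ["S", "A"], "unspliced": ["U"]})
print(layers["X"])
print(layers["unspliced"])
```

Different specs give different final objects from the same three matrices; for instance, a spec that sums `S`, `U`, and `A` into one layer would give total counts per gene.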


### Background
[`Alevin-fry`](https://github.com/COMBINE-lab/alevin-fry) is a fast, accurate, and memory-frugal quantification tool for preprocessing single-cell RNA-sequencing data. Detailed information can be found in the alevin-fry [pre-print](https://www.biorxiv.org/content/10.1101/2021.06.29.450377v2) and [paper](https://www.nature.com/articles/s41592-022-01408-3).

The `pyroe` package provides useful functions for analyzing single-cell or single-nucleus RNA-sequencing data using `alevin-fry`. The documentation for `pyroe` has its own dedicated website. Please visit the [ReadTheDocs pyroe website here](https://pyroe.readthedocs.io).
[`simpleaf`](https://github.com/COMBINE-lab/simpleaf) provides a simple and easy-to-use interface for running `alevin-fry`, and also more advanced features such as designing and executing custom workflows for single-cell data analysis. ([Paper](https://doi.org/10.1093/bioinformatics/btad614) and [Documentation](https://simpleaf.readthedocs.io/en/latest/))

## Major Updates
Since Pyroe v0.10.0, the functionality for creating augmented transcriptome references and generating the gene ID to gene name file has been moved to the [`roers`](https://github.com/COMBINE-lab/roers) package, which is automatically installed together with [`simpleaf`](https://github.com/COMBINE-lab/simpleaf). For all our users, we recommend using the simplified command line interface provided in [`simpleaf`](https://simpleaf.readthedocs.io/en/latest/) to process your single-cell sequencing data. The [`simpleaf index`](https://simpleaf.readthedocs.io/en/latest/index-command.html) command will automatically generate the augmented transcriptome reference (including the gene ID to gene name file) and index the reference for you.
17 changes: 9 additions & 8 deletions docs/source/building_splici_index.rst
@@ -1,11 +1,11 @@
#################################################################################
Preparing an expanded transcriptome reference for quantification with alevin-fry
(Deprecated since v0.10.0) Preparing an expanded transcriptome reference for quantification with alevin-fry
#################################################################################

The USA mode in alevin-fry requires an expanded index reference, in which sequences represent spliced and unspliced transcripts. Pyroe provides CLI programs and python functions to build the pre-defined expanded references, the spliced + intronic (*splici*) reference, which includes the spliced transcripts plus the (merged and collapsed) intronic sequences of each gene and the spliced + unspliced (*spliceu*) reference, which consists of the spliced transcripts plus the unspliced transcript (genes' entire genomic interval) of each gene. The ``make_splici_txome()`` and ``make_spliceu_txome()`` python functions are designed to make the *splici* and *spliceu* reference by taking a genome FASTA file and a gene annotation GTF file as the input. Furthermore, the

Preparing a *spliced+intronic* transcriptome reference
-------------------------------------------------------
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The *splici* index reference of a given species consists of the transcriptome of the species, i.e., the spliced transcripts and the intronic sequences of the species. Within a gene, if the flanked intronic sequences overlap with each other, the overlapped intronic sequences will be collapsed as a single intronic sequence to make sure each base will appear only once in the intronic sequences of each gene.
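
The collapsing step described above can be sketched as a standard interval merge. This is an illustrative sketch only, not the code pyroe or roers uses: the ``collapse_intervals`` helper and the coordinates are made up, and the intervals are assumed to be half-open ``[start, end)`` ranges for a single gene that already include any flanking extension.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) coordinates within one gene


def collapse_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping intervals so that each base is covered only once.

    Hypothetical helper for illustration only.
    """
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or directly abuts) the previous interval: extend it.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


# Made-up flanked intron coordinates for one gene.
gene_introns = [(100, 250), (200, 400), (600, 700), (650, 680)]
collapsed = collapse_intervals(gene_introns)
print(collapsed)
```

Sorting first means each interval only needs to be compared with the most recently merged one, so the pass over a gene's introns is linear after the sort.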

@@ -29,8 +29,8 @@ The `pyroe make-spliced+intronic` program writes three files to your specified o
* A three-column transcript-name-to-gene-name file that stores the name of each reference sequence in the splici index reference, their corresponding gene name, and the splicing status (`S` for spliced and `U` for unspliced) of those transcripts.
* A two-column TSV file that maps gene ids (used as the keys in eventual alevin-fry output) to gene names. This can later be used with the ``pyroe convert`` command line program to convert gene ids to gene names in the count matrix.

Full usage
^^^^^^^^^^
**Full usage**


.. code::
@@ -120,7 +120,7 @@ The ``pyroe make-spliced+intronic`` command line program calls the ``make_splici
Nothing will be returned. The splici reference files will be written to disk.
Preparing a *spliced+unspliced* transcriptome reference
-------------------------------------------------------
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Recently, `He et al., 2023 <https://www.biorxiv.org/content/10.1101/2023.01.04.522742>`_ introduced the spliced + unspliced (*spliceu*) index in alevin-fry. This requires the spliced + unspliced transcriptome reference, where the unspliced transcripts of each gene represent the entire genomic interval of that gene. Details about the *spliceu* can be found in `the preprint <https://www.biorxiv.org/content/10.1101/2023.01.04.522742>`_. To make the spliceu reference using pyroe, one can call the ``make_spliceu_txome()`` python function or ``pyroe make-spliced+unspliced`` or its alias ``pyroe make-spliceu`` from the command line. The following example shows the shell command of building a spliceu reference from a given reference set in the directory ``spliceu_txome``.

@@ -132,8 +132,8 @@ Recently, `He et al., 2023 <https://www.biorxiv.org/content/10.1101/2023.01.04.5
spliceu_txome \
--filename-prefix spliceu
Full usage
^^^^^^^^^^
**Full usage**


.. code::
@@ -208,7 +208,8 @@ The ``pyroe make-spliced+unspliced`` command line program calls the ``make_splic
Notes on the input gene annotation GTF files for building an expanded reference
----------------------------------------------------------------------------------
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Pyroe builds expanded transcriptome references, the spliced + intronic (*splici*) and the spliced + unspliced (*spliceu*) transcriptome reference, based on a genome build FASTA file and a gene annotation GTF file.

The input GTF file will be processed before extracting unspliced sequences. If pyroe finds invalid records, a ``clean_gtf.gtf`` file will be generated in the specified output directory. **Note** : The features extracted in the spliced + unspliced transcriptome will not necessarily be those present in the ``clean_gtf.gtf`` file — as this command will prefer the input in the user-provided file wherever possible. One can rerun pyroe using the ``clean_gtf.gtf`` file if needed. More specifically:
2 changes: 1 addition & 1 deletion docs/source/geneid_to_name.rst
@@ -1,4 +1,4 @@
Generating a gene id to gene name mapping
(Deprecated since v0.10.0) Generating a gene id to gene name mapping
=========================================

It is often useful to perform analyses with gene *names* rather than gene *identifiers*. The `convert <https://pyroe.readthedocs.io/en/latest/converting_quants.html>`_ command of ``pyroe`` allows you to specify an id to name mapping so that the converted output matrix will be labeled with gene names rather than identifiers. However, you must provide it with a 2-column tab-separated file mapping IDs to names. This command can help you with that task.
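
For readers who want to see what such a 2-column mapping file looks like, the following sketch derives one from GTF-style records using only the Python standard library. It is an assumption-laden illustration, not the deprecated pyroe command or its replacement in roers: the ``gene_id_to_name`` helper, the toy records, and the choice to fall back to the gene id when no ``gene_name`` attribute is present are all hypothetical.

```python
import csv
import io
import re
from typing import Dict, Iterable

# Toy GTF records (tab-separated, 9 columns); the content is made up.
GTF_TEXT = (
    'chr1\tsrc\tgene\t1\t100\t.\t+\t.\tgene_id "G1"; gene_name "Alpha";\n'
    'chr1\tsrc\texon\t1\t50\t.\t+\t.\tgene_id "G1"; gene_name "Alpha";\n'
    'chr2\tsrc\tgene\t5\t90\t.\t-\t.\tgene_id "G2";\n'
)

# Matches attribute pairs of the form: key "value"
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def gene_id_to_name(lines: Iterable[str]) -> Dict[str, str]:
    """Collect a gene id -> gene name mapping from GTF lines.

    Hypothetical helper for illustration only.
    """
    mapping: Dict[str, str] = {}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        attrs = dict(ATTR_RE.findall(line.rstrip("\n").split("\t")[8]))
        gene_id = attrs.get("gene_id")
        if gene_id is not None:
            # Assumed fallback: reuse the id when no name is recorded.
            mapping.setdefault(gene_id, attrs.get("gene_name", gene_id))
    return mapping


mapping = gene_id_to_name(io.StringIO(GTF_TEXT))

# Write the mapping as a 2-column tab-separated table.
out = io.StringIO()
csv.writer(out, delimiter="\t", lineterminator="\n").writerows(mapping.items())
print(out.getvalue(), end="")
```

In practice the table would be written to a file on disk rather than to an in-memory buffer; the buffer is used here only to keep the sketch self-contained.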
10 changes: 7 additions & 3 deletions docs/source/index.rst
@@ -4,18 +4,22 @@ Welcome to the documentation for pyroe
What is pyroe?
===================

The pyroe package provides useful functions for analyzing single-cell or single-nucleus RNA-sequencing data using `alevin-fry`. Since `simpleaf` version 0.14.0, `roers <https://github.com/COMBINE-lab/roers>`_, instead of pyroe, became as the default augmented reference constructor for `alevin-fry` and `simpleaf`. Now, the main purpose of `pyroe` is to provide the function `load_fry` to load `alevin-fry` quantification results into Python as an `anndata <http://anndata.readthedocs.io/>`_ object, so as to be compatible with `scanpy <https://scanpy.readthedocs.io/en/stable/index.html>`_.
The pyroe package provides useful functions for analyzing single-cell or single-nucleus RNA-sequencing data using `alevin-fry`. Since `simpleaf` version 0.14.0, `roers <https://github.com/COMBINE-lab/roers>`_, instead of pyroe, became the default augmented reference constructor for `alevin-fry` and `simpleaf`. Now, the main purpose of `pyroe` is to provide the function `load_fry` to load `alevin-fry` and `simpleaf` quantification results into Python as an `anndata <http://anndata.readthedocs.io/>`_ object, so as to perform downstream analysis provided by `scanpy <https://scanpy.readthedocs.io/en/stable/index.html>`_.

Although pyroe is available on bioconda and can be easily installed, if you encounter any problem during installation, you can also define the `load_fry` function locally in your Python script by copying the function definition from `here <https://github.com/COMBINE-lab/pyroe/blob/main/src/pyroe/load_fry.py>`_. The only dependency of `load_fry` is `scanpy <https://scanpy.readthedocs.io/en/stable/installation.html>`_.

Additionally,

.. toctree::
:maxdepth: 2
:caption: Contents:

installing
building_splici_index
processing_fry_quants
converting_quants
fetching_processed_quants
building_splici_index
geneid_to_name
converting_quants
LICENSE.rst

Indices and tables
12 changes: 2 additions & 10 deletions setup.cfg
@@ -16,22 +16,14 @@ classifiers =
packages = find:
package_dir =
= src
scripts =
bin/pyroe
# scripts =
# bin/pyroe
python_requires = >=3.7
include_package_data = True
install_requires =
packaging >= 21.0
scanpy >= 1.8.2

[options.extras_require]
ref =
pyranges == 0.0.129
biopython >= 1.77
pandas >= 1.3.0, <= 2.2.0
# bedtools >= 2.30.0
# scanpy =
# scanpy >= 1.8.2

[options.packages.find]
where = src
14 changes: 7 additions & 7 deletions src/pyroe/__init__.py
@@ -8,12 +8,12 @@
from pyroe.pyroe_utils import output_formats


try:
from pyroe.make_txome import make_splici_txome, make_spliceu_txome
from pyroe.id_to_name import id_to_name
except ImportError:
make_splici_txome = None
make_spliceu_txome = None
id_to_name = None
# try:
# from pyroe.make_txome import make_splici_txome, make_spliceu_txome
# from pyroe.id_to_name import id_to_name
# except ImportError:
# make_splici_txome = None
# make_spliceu_txome = None
# id_to_name = None

# flake8: noqa
