remove ref constructor and update doc

COMBINE-lab · Feb 17, 2024 · 8b4e0c4
1 parent d6d63b5
commit 8b4e0c4

Showing 6 changed files with 35 additions and 31 deletions.
diff --git a/README.md b/README.md
@@ -1,7 +1,14 @@
 # pyroe
 
-## About `pyroe`
+The main purpose of `pyroe` is to provide a Python interface for loading the quantification results of single-cell sequencing data generated by [`alevin-fry`](https://github.com/COMBINE-lab/alevin-fry) and [`simpleaf`](https://github.com/COMBINE-lab/simpleaf).
+- The major function of pyroe is the [`load_fry`](https://pyroe.readthedocs.io/en/latest/processing_fry_quants.html#load-fry-full-usage) function, which loads the quantification results into an [`anndata`](https://anndata.readthedocs.io/en/latest/) object to perform downstream analysis provided by [`scanpy`](https://scanpy.readthedocs.io/en/stable/). It provides many options for constructing the final `anndata` object by combining the count matrices representing different splicing statuses in different ways.
+- Moreover, `pyroe` provides the interface for the [`quantaf`](https://combine-lab.github.io/quantaf/) project, which is a database containing the quantification results of many publicly available datasets.
 
+
+### Background
 [`Alevin-fry`](https://github.com/COMBINE-lab/alevin-fry) is a fast, accurate, and memory frugal quantification tool for preprocessing single-cell RNA-sequencing data. Detailed information can be found in the alevin-fry [pre-print](https://www.biorxiv.org/content/10.1101/2021.06.29.450377v2), and [paper](https://www.nature.com/articles/s41592-022-01408-3).
 
-The `pyroe` package provides useful functions for analyzing single-cell or single-nucleus RNA-sequencing data using `alevin-fry`.  The documentation for `pyroe` has its own dedicated website.  Please visit the [ReadTheDocs pyroe website here](https://pyroe.readthedocs.io).
+[`simpleaf`](https://github.com/COMBINE-lab/simpleaf) provides a simple and easy-to-use interface for running `alevin-fry`, and also more advanced features such as designing and executing custom workflows for single-cell data analysis. ([Paper](https://doi.org/10.1093/bioinformatics/btad614) and [Documentation](https://simpleaf.readthedocs.io/en/latest/))
+
+## Major Updates
+Since Pyroe v0.10.0, the functionality for creating augmented transcriptome references and generating the gene ID to gene name file has been moved to the [`roers`](https://github.com/COMBINE-lab/roers) package, which is automatically installed together with [`simpleaf`](https://github.com/COMBINE-lab/alevin-fry). For all our users, we recommend using the simplified command line interface provided in [`simpleaf`](https://simpleaf.readthedocs.io/en/latest/) to process your single-cell sequencing data. The [`simpleaf index`](https://simpleaf.readthedocs.io/en/latest/index-command.html) command will automatically generate the augmented transcriptome reference (including the gene ID to gene name file) and index the reference for you.
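
The README text above says `load_fry` builds the final object by combining count matrices for different splicing statuses. The idea can be sketched in plain Python as follows. This is an illustration only, not pyroe's implementation: the helper names `add_matrices` and `combine_layers`, the status keys `S`, `U`, `A`, and the dictionary-shaped layer spec are all assumptions made for the example.

```python
# Illustrative sketch only: not pyroe's code. All names here are hypothetical.

def add_matrices(a, b):
    """Element-wise sum of two equally shaped cell-by-gene count matrices."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def combine_layers(counts, spec):
    """Build named layers by summing the matrices listed for each layer.

    counts: dict mapping a splicing-status key (e.g. "S", "U", "A")
            to a cell-by-gene matrix stored as a list of rows.
    spec:   dict mapping an output layer name to the status keys to sum.
    """
    layers = {}
    for layer_name, statuses in spec.items():
        if not statuses:
            raise ValueError(f"layer {layer_name!r} lists no splicing statuses")
        # Copy the first matrix so the input counts are never modified.
        total = [row[:] for row in counts[statuses[0]]]
        for status in statuses[1:]:
            total = add_matrices(total, counts[status])
        layers[layer_name] = total
    return layers


# Two cells by two genes, one matrix per (assumed) splicing status.
counts = {
    "S": [[1, 0], [2, 3]],  # spliced
    "U": [[0, 4], [1, 0]],  # unspliced
    "A": [[1, 1], [0, 0]],  # ambiguous
}
layers = combine_layers(counts, {"X": ["S", "A"], "unspliced": ["U"]})
```

Here `layers["X"]` holds the spliced and ambiguous counts summed together, while `layers["unspliced"]` keeps the unspliced counts separate. For the options `load_fry` actually accepts, consult the linked documentation.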
diff --git a/docs/source/building_splici_index.rst b/docs/source/building_splici_index.rst
@@ -1,11 +1,11 @@
 #################################################################################
-Preparing an expanded transcriptome reference for quantification with alevin-fry
+(Deprecated since v0.10.0) Preparing an expanded transcriptome reference for quantification with alevin-fry
 #################################################################################
 
 The USA mode in alevin-fry requires an expanded index reference, in which sequences represent spliced and unspliced transcripts. Pyroe provides CLI programs and python functions to build the pre-defined expanded references, the spliced + intronic (*splici*) reference, which includes the spliced transcripts plus the (merged and collapsed) intronic sequences of each gene and the spliced + unspliced (*spliceu*) reference, which consists of the spliced transcripts plus the unspliced transcript (genes' entire genomic interval) of each gene. The ``make_splici_txome()`` and ``make_spliceu_txome()`` python functions are designed to make the *splici* and *spliceu* reference by taking a genome FASTA file and a gene annotation GTF file as the input. Furthermore, the 
 
 Preparing a *spliced+intronic* transcriptome reference
--------------------------------------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 The *splici* index reference of a given species consists of the transcriptome of the species, i.e., the spliced transcripts and the intronic sequences of the species. Within a gene, if the flanked intronic sequences overlap with each other, the overlapped intronic sequences will be collapsed as a single intronic sequence to make sure each base will appear only once in the intronic sequences of each gene.
 
@@ -29,8 +29,8 @@ The `pyroe make-spliced+intronic` program writes three files to your specified o
 * A three-column transcript-name-to-gene-name file that stores the name of each reference sequence in the splici index reference, their corresponding gene name, and the splicing status (`S` for spliced and `U` for unspliced) of those transcripts.
 * A two-column TSV file that maps gene ids (used as the keys in eventual alevin-fry output) to gene names. This can later be used with the ``pyroe convert`` command line program to convert gene ids to gene names in the count matrix.
 
-Full usage
-^^^^^^^^^^
+**Full usage**
+
 
 .. code::
 
@@ -120,7 +120,7 @@ The ``pyroe make-spliced+intronic`` command line program calls the ``make_splici
   Nothing will be returned. The splici reference files will be written to disk.
 
 Preparing a *spliced+unspliced* transcriptome reference
--------------------------------------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 Recently, `He et al., 2023 <https://www.biorxiv.org/content/10.1101/2023.01.04.522742>`_ introduced the spliced + unspliced (*spliceu*) index in alevin-fry. This requires the spliced + unspliced transcriptome reference, where the unspliced transcripts of each gene represent the entire genomic interval of that gene. Details about the *spliceu* can be found in `the preprint <https://www.biorxiv.org/content/10.1101/2023.01.04.522742>`_. To make the spliceu reference using pyroe, one can call the ``make_spliceu_txome()`` python function or ``pyroe make-spliced+unspliced`` or its alias ``pyroe make-spliceu`` from the command line. The following example shows the shell command of building a spliceu reference from a given reference set in the directory ``spliceu_txome``.
 
@@ -132,8 +132,8 @@ Recently, `He et al., 2023 <https://www.biorxiv.org/content/10.1101/2023.01.04.5
   spliceu_txome \
   --filename-prefix spliceu
 
-Full usage
-^^^^^^^^^^
+**Full usage**
+
 
 .. code::
 
@@ -208,7 +208,8 @@ The ``pyroe make-spliced+unspliced`` command line program calls the ``make_splic
 
 
 Notes on the input gene annotation GTF files for building an expanded reference
-----------------------------------------------------------------------------------
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
 Pyroe builds expanded transcriptome references, the spliced + intronic (*splici*) and the spliced + unspliced (*spliceu*) transcriptome reference, based on a genome build FASTA file and a gene annotation GTF file.
 
 The input GTF file will be processed before extracting unspliced sequences. If pyroe finds invalid records, a ``clean_gtf.gtf`` file will be generated in the specified output directory.  **Note** : The features extracted in the spliced + unspliced transcriptome will not necessarily be those present in the ``clean_gtf.gtf`` file — as this command will prefer the input in the user-provided file wherever possible. One can rerun pyroe using the ``clean_gtf.gtf`` file if needed. More specifically:

diff --git a/docs/source/geneid_to_name.rst b/docs/source/geneid_to_name.rst
@@ -1,4 +1,4 @@
-Generating a gene id to gene name mapping
+(Deprecated since v0.10.0) Generating a gene id to gene name mapping
 =========================================
 
 It is often useful to perform analyses with gene *names* rather than gene *identifiers*. The `convert <https://pyroe.readthedocs.io/en/latest/converting_quants.html>`_ command of ``pyroe`` allows you to specify an id to name mapping so that the converted output matrix will be labeled with gene names rather than identifiers.  However, you must provide it with a 2-column tab-separated file mapping IDs to names.  This command can help you with that task.
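
For readers who only need the 2-column tab-separated mapping described above, it can be approximated with the standard library. This is a rough sketch, not the deprecated pyroe command or the `roers` implementation. The function names `gene_id_to_name` and `write_tsv` are hypothetical, and it assumes GTF attributes are written as `key "value";` pairs with `gene_id` and an optional `gene_name`.

```python
# Rough sketch, not pyroe's or roers' implementation. Names are hypothetical.
import io
import re

# Matches GTF attribute pairs of the assumed form: key "value"
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def gene_id_to_name(gtf_lines):
    """Return a dict mapping gene_id to gene_name from GTF record lines.

    Records without a gene_name fall back to the gene_id itself.
    """
    mapping = {}
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a well-formed 9-column GTF record
        attrs = dict(ATTR_RE.findall(fields[8]))
        gene_id = attrs.get("gene_id")
        if gene_id is None:
            continue
        name = attrs.get("gene_name")
        # Keep the first real name seen; replace an id-only fallback later.
        if gene_id not in mapping or (name and mapping[gene_id] == gene_id):
            mapping[gene_id] = name or gene_id
    return mapping


def write_tsv(mapping, handle):
    """Write one 'gene_id<TAB>gene_name' line per gene to a text handle."""
    for gene_id, name in mapping.items():
        handle.write(f"{gene_id}\t{name}\n")


example_gtf = [
    'chr1\tsrc\tgene\t1\t100\t.\t+\t.\tgene_id "G1"; gene_name "Alpha";',
    'chr1\tsrc\texon\t1\t50\t.\t+\t.\tgene_id "G2"; transcript_id "T1";',
]
buffer = io.StringIO()
write_tsv(gene_id_to_name(example_gtf), buffer)
```

The resulting text in `buffer` has one line per gene, which is the shape the `convert` command expects according to the paragraph above.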

diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -4,18 +4,22 @@ Welcome to the documentation for pyroe
 What is pyroe?
 ===================
 
-The pyroe package provides useful functions for analyzing single-cell or single-nucleus RNA-sequencing data using `alevin-fry`. Since `simpleaf` version 0.14.0, `roers <https://github.com/COMBINE-lab/roers>`_, instead of pyroe, became as the default augmented reference constructor for `alevin-fry` and `simpleaf`. Now, the main purpose of `pyroe` is to provide the function `load_fry` to load `alevin-fry` quantification results into Python as an `anndata <http://anndata.readthedocs.io/>`_ object, so as to be compatible with `scanpy <https://scanpy.readthedocs.io/en/stable/index.html>`_.
+The pyroe package provides useful functions for analyzing single-cell or single-nucleus RNA-sequencing data using `alevin-fry`. Since `simpleaf` version 0.14.0, `roers <https://github.com/COMBINE-lab/roers>`_, instead of pyroe, became the default augmented reference constructor for `alevin-fry` and `simpleaf`. Now, the main purpose of `pyroe` is to provide the function `load_fry` to load `alevin-fry` and `simpleaf` quantification results into Python as an `anndata <http://anndata.readthedocs.io/>`_ object, so as to perform downstream analysis provided by `scanpy <https://scanpy.readthedocs.io/en/stable/index.html>`_.
+
+Although pyroe is available on bioconda and can be easily installed, if you encounter any problem during installation, you can also define the `load_fry` function locally in your Python script by copying the function definition from `here <https://github.com/COMBINE-lab/pyroe/blob/main/src/pyroe/load_fry.py>`_. The only dependency of `load_fry` is `scanpy <https://scanpy.readthedocs.io/en/stable/installation.html>`_.
+
+Additionally, 
 
 .. toctree::
    :maxdepth: 2
    :caption: Contents:
 
    installing
-   building_splici_index
    processing_fry_quants
+   converting_quants
    fetching_processed_quants
+   building_splici_index
    geneid_to_name
-   converting_quants
    LICENSE.rst
 
 Indices and tables

diff --git a/setup.cfg b/setup.cfg
@@ -16,22 +16,14 @@ classifiers =
 packages = find:
 package_dir =
     = src
-scripts =
-    bin/pyroe
+# scripts =
+#     bin/pyroe
 python_requires = >=3.7
 include_package_data = True
 install_requires = 
     packaging >= 21.0
     scanpy >= 1.8.2
 
-[options.extras_require]
-ref = 
-    pyranges == 0.0.129
-    biopython >= 1.77
-    pandas >= 1.3.0, <= 2.2.0
-    # bedtools >= 2.30.0
-# scanpy = 
-#     scanpy >= 1.8.2
 
 [options.packages.find]
 where = src

diff --git a/src/pyroe/__init__.py b/src/pyroe/__init__.py
@@ -8,12 +8,12 @@
 from pyroe.pyroe_utils import output_formats
 
 
-try:
-    from pyroe.make_txome import make_splici_txome, make_spliceu_txome
-    from pyroe.id_to_name import id_to_name
-except ImportError:
-    make_splici_txome = None 
-    make_spliceu_txome = None 
-    id_to_name = None 
+# try:
+#     from pyroe.make_txome import make_splici_txome, make_spliceu_txome
+#     from pyroe.id_to_name import id_to_name
+# except ImportError:
+#     make_splici_txome = None 
+#     make_spliceu_txome = None 
+#     id_to_name = None 
 
 # flake8: noqa
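
Downstream scripts that still reference the names commented out above may want to check for them before calling. A minimal sketch, assuming only the three names shown in this hunk: the helpers `try_import` and `missing_ref_helpers` are hypothetical and not part of pyroe.

```python
# Defensive check for the reference-construction helpers removed in this commit.
# try_import and missing_ref_helpers are hypothetical helper names.
import importlib
from types import ModuleType
from typing import List, Optional

REMOVED_NAMES = ("make_splici_txome", "make_spliceu_txome", "id_to_name")


def try_import(name: str) -> Optional[ModuleType]:
    """Import a module by name, returning None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


def missing_ref_helpers(module: Optional[ModuleType]) -> List[str]:
    """List the removed helper names that the module does not provide.

    A name counts as missing when the attribute is absent (after this commit)
    or set to None (the old ImportError fallback shown in the removed lines).
    """
    if module is None:
        return list(REMOVED_NAMES)
    return [name for name in REMOVED_NAMES if getattr(module, name, None) is None]


missing = missing_ref_helpers(try_import("pyroe"))
if missing:
    print("Unavailable in this environment:", ", ".join(missing))
```

If any names are reported as missing, the README text in this commit points to `roers` and `simpleaf index` as the replacement.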