Merge pull request #92 from paulsengroup/docs/update
Update the docs
robomics authored Oct 22, 2024
2 parents 628d8cf + 4a1bde7 commit 85b4fc5
Showing 16 changed files with 490 additions and 64 deletions.
62 changes: 62 additions & 0 deletions .github/workflows/lint-cff.yml
@@ -0,0 +1,62 @@
# Copyright (C) 2024 Roberto Rossini <roberros@uio.no>
# SPDX-License-Identifier: MIT

name: Lint CITATION.cff

on:
push:
branches: [main]
paths:
- ".github/workflows/lint-cff.yml"
- "CITATION.cff"

pull_request:
paths:
- ".github/workflows/lint-cff.yml"
- "CITATION.cff"

# https://stackoverflow.com/a/72408109
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true

defaults:
run:
shell: bash

jobs:
lint-cff:
runs-on: ubuntu-latest
name: Lint CITATION.cff

steps:
- uses: actions/checkout@v4
with:
sparse-checkout: CITATION.cff
sparse-checkout-cone-mode: false

- name: Generate DESCRIPTION file
run: |
cat << EOF > DESCRIPTION
Package: hictkpy
Title: What the Package Does (One Line, Title Case)
Version: 0.0.0.9000
Authors@R:
person("First", "Last", , "first.last@example.com", role = c("aut", "cre"))
Description: What the package does (one paragraph).
License: MIT
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.3.2
Imports:
cffr
EOF
- name: Setup R
uses: r-lib/actions/setup-r@v2

- name: Add requirements
uses: r-lib/actions/setup-r-dependencies@v2

- name: Lint CITATION.cff
run: Rscript -e 'cffr::cff_validate("CITATION.cff")'
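Review note: the `concurrency.group` expression in this workflow falls back from the pull request number to the git ref. The sketch below mimics that fallback in plain Python; `concurrency_group` is a hypothetical helper written for illustration and is not part of GitHub Actions or of this repository.

```python
from typing import Optional


def concurrency_group(workflow: str, pr_number: Optional[int], ref: str) -> str:
    """Mimic `${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}`.

    Assumption: the PR number is empty (None here) for push events,
    so the `||` operator falls back to the git ref.
    """
    key = str(pr_number) if pr_number else ref
    return f"{workflow}-{key}"


# Runs triggered by the same pull request share one group, so a newer run
# cancels the older one when cancel-in-progress is true.
print(concurrency_group("Lint CITATION.cff", 92, "refs/pull/92/merge"))
# Pushes to main are grouped by ref instead.
print(concurrency_group("Lint CITATION.cff", None, "refs/heads/main"))
```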
22 changes: 12 additions & 10 deletions .readthedocs.yaml
@@ -5,18 +5,20 @@
version: 2

build:
os: ubuntu-22.04
apt_packages:
- librsvg2-bin
os: ubuntu-24.04
tools:
python: "3.11"
python: "3.12"

sphinx:
configuration: docs/conf.py

python:
install:
- requirements: docs/requirements.txt
commands:
- pip install -r docs/requirements.txt
- pip install . -v
- docs/update_index_links.py --root-dir "$PWD" --inplace
- make -C docs linkcheck
- make -C docs html
- make -C docs latexpdf
- mkdir -p "$READTHEDOCS_OUTPUT/pdf"
- cp -r docs/_build/html "$READTHEDOCS_OUTPUT/"
- cp docs/_build/latex/hictkpy.pdf "$READTHEDOCS_OUTPUT/pdf/"

formats:
- pdf
34 changes: 27 additions & 7 deletions CITATION.cff
@@ -11,12 +11,20 @@ authors:
email: roberros@uio.no
affiliation: 'Department of Biosciences, University of Oslo'
title: hictkpy
abstract: 'Python bindings for hictk.'
abstract: 'Python bindings for hictk: read and write .cool and .hic files directly from Python.'
doi: '10.5281/zenodo.8220299'
url: 'https://github.com/paulsengroup/hictkpy'
repository-code: 'https://github.com/paulsengroup/hictkpy'
type: software
license: MIT
keywords:
- bindings
- bioinformatics
- conversion
- cooler
- hic
- hictk
- python
preferred-citation:
type: article
authors:
@@ -30,10 +38,22 @@ preferred-citation:
orcid: 'https://orcid.org/0000-0002-7918-5495'
email: jonas.paulsen@ibv.uio.no
affiliation: 'Department of Biosciences, University of Oslo'
doi: '10.1101/2023.11.26.568707'
url: 'https://doi.org/10.1101/2023.11.26.568707'
journal: 'Cold Spring Harbor Laboratory'
year: 2023
month: 11
doi: '10.1093/bioinformatics/btae408'
url: 'https://academic.oup.com/bioinformatics/article/40/7/btae408/7698028'
journal: 'Bioinformatics'
year: 2024
month: 06
title: 'hictk: blazing fast toolkit to work with .hic and .cool files'
abstract: 'We developed hictk, a toolkit that can transparently operate on .hic and .cool files with excellent performance. The toolkit is written in C++ and consists of a C++ library with Python bindings as well as CLI tools to perform common operations directly from the shell, including converting between .hic and .mcool formats. We benchmark the performance of hictk and compare it with other popular tools and libraries. We conclude that hictk significantly outperforms existing tools while providing the flexibility of natively working with both file formats without code duplication.'
abstract: >
Hi-C is gaining prominence as a method for mapping genome organization.
With declining sequencing costs and a growing demand for higher-resolution data, efficient tools for processing Hi-C datasets at different resolutions are crucial.
Over the past decade, the .hic and Cooler file formats have become the de-facto standard to store interaction matrices produced by Hi-C experiments in binary format.
Interoperability issues make it unnecessarily difficult to convert between the two formats and to develop applications that can process each format natively.
We developed hictk, a toolkit that can transparently operate on .hic and .cool files with excellent performance.
The toolkit is written in C++ and consists of a C++ library with Python and R bindings as well as CLI tools to perform common operations directly from the shell, including converting between .hic and .mcool formats. We benchmark the performance of hictk and compare it with other popular tools and libraries.
We conclude that hictk significantly outperforms existing tools while providing the flexibility of natively working with both file formats without code duplication.
The hictk library, Python bindings and CLI tools are released under the MIT license as a multi-platform application available at github.com/paulsengroup/hictk.
Pre-built binaries for Linux and macOS are available on bioconda.
Python bindings for hictk are available on GitHub at github.com/paulsengroup/hictkpy, while R bindings are available on GitHub at github.com/paulsengroup/hictkR.
10 changes: 5 additions & 5 deletions README.md
@@ -18,9 +18,9 @@ Python bindings for hictk, a blazing fast toolkit to work with .hic and .cool fi

## Installing hictkpy

hictkpy can be installed in various ways. The simples method is using pip: `pip install hictkpy`.
hictkpy can be installed in various ways. The simplest method is using pip: `pip install hictkpy[all]`.

Refer to [Installation](https://hictkpy.readthedocs.io/en/latest/installation.html) for alternative methods.
Refer to [Installation](https://hictkpy.readthedocs.io/en/stable/installation.html) for alternative methods.

## Using hictkpy

@@ -37,13 +37,13 @@ m1 = sel.to_numpy() # Get interactions as a numpy matrix
m2 = sel.to_coo() # Get interactions as a scipy.sparse.coo_matrix
```

For more detailed examples refer to [Quickstart](https://hictkpy.readthedocs.io/en/latest/quickstart.html).
For more detailed examples refer to [Quickstart](https://hictkpy.readthedocs.io/en/stable/quickstart.html).

The complete documentation for hictkpy API is available [here](https://hictkpy.readthedocs.io/en/latest/hictkpy.html).
The complete documentation for hictkpy API is available [here](https://hictkpy.readthedocs.io/en/stable/hictkpy.html).

## Citing

If you use hictkpy in you reaserch, please cite the following publication:
If you use hictkpy in your research, please cite the following publication:

Roberto Rossini, Jonas Paulsen, hictk: blazing fast toolkit to work with .hic and .cool files
_Bioinformatics_, Volume 40, Issue 7, July 2024, btae408, [https://doi.org/10.1093/bioinformatics/btae408](https://doi.org/10.1093/bioinformatics/btae408)
1 change: 1 addition & 0 deletions docs/api/cooler.rst
@@ -11,6 +11,7 @@ Cooler API
.. autoclass:: SingleCellFile

.. automethod:: __init__
.. automethod:: __getitem__
.. automethod:: attributes
.. automethod:: bins
.. automethod:: cells
62 changes: 62 additions & 0 deletions docs/api/generic.rst
@@ -18,6 +18,7 @@ Generic API
.. autoclass:: MultiResFile

.. automethod:: __init__
.. automethod:: __getitem__
.. automethod:: chromosomes
.. automethod:: path
.. automethod:: resolutions
@@ -46,9 +47,68 @@ Generic API
.. automethod:: coord2
.. automethod:: nnz
.. automethod:: sum
.. automethod:: to_arrow
.. automethod:: to_coo
.. automethod:: to_csr
.. automethod:: to_df
.. automethod:: to_numpy
.. automethod:: to_pandas

.. automethod:: __iter__

.. code-block:: ipythonconsole
In [1]: import hictkpy as htk
In [2]: f = htk.File("file.cool")
In [3]: sel = f.fetch("chr2L:10,000,000-20,000,000")
In [4]: for i, pixel in enumerate(sel):
...: print(pixel.bin1_id, pixel.bin2_id, pixel.count)
...: if i > 10:
...: break
...:
1000 1000 6759
1000 1001 3241
1000 1002 760
1000 1003 454
1000 1004 289
1000 1005 674
1000 1006 354
1000 1007 124
1000 1008 130
1000 1009 105
1000 1010 99
1000 1011 120
It is also possible to iterate over pixels together with their genomic coordinates by specifying ``join=True`` when calling :py:meth:`hictkpy.File.fetch()`:

.. code-block:: ipythonconsole
In [5]: sel = f.fetch("chr2L:10,000,000-20,000,000", join=True)
In [6]: for i, pixel in enumerate(sel):
...: print(
...: pixel.chrom1, pixel.start1, pixel.end1,
...: pixel.chrom2, pixel.start2, pixel.end2,
...: pixel.count
...: )
...: if i > 10:
...: break
...:
chr2L 10000000 10010000 chr2L 10000000 10010000 6759
chr2L 10000000 10010000 chr2L 10010000 10020000 3241
chr2L 10000000 10010000 chr2L 10020000 10030000 760
chr2L 10000000 10010000 chr2L 10030000 10040000 454
chr2L 10000000 10010000 chr2L 10040000 10050000 289
chr2L 10000000 10010000 chr2L 10050000 10060000 674
chr2L 10000000 10010000 chr2L 10060000 10070000 354
chr2L 10000000 10010000 chr2L 10070000 10080000 124
chr2L 10000000 10010000 chr2L 10080000 10090000 130
chr2L 10000000 10010000 chr2L 10090000 10100000 105
chr2L 10000000 10010000 chr2L 10100000 10110000 99
chr2L 10000000 10010000 chr2L 10110000 10120000 120
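Review note: the two listings above show the same pixels, first as bin IDs and then with genomic coordinates. The relationship between them can be sketched in plain Python. This is an illustration only: it assumes a fixed bin size of 10,000 bp and that ``chr2L`` starts at bin 0, both inferred from the printed output rather than taken from the hictkpy API, and it ignores the shorter last bin of a chromosome. ``bin_to_coords`` is a hypothetical helper.

```python
def bin_to_coords(bin_id, chrom="chr2L", chrom_offset=0, resolution=10_000):
    """Map a genome-wide bin ID to (chrom, start, end).

    chrom_offset is the ID of the first bin on the chromosome
    (assumed to be 0 for chr2L in the example file).
    """
    start = (bin_id - chrom_offset) * resolution
    return chrom, start, start + resolution


# First and last pixels from the listings above: bin1_id=1000 with bin2_id=1000 and 1011
print(bin_to_coords(1000))
print(bin_to_coords(1011))
```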
.. autoclass:: Bin

@@ -63,3 +123,5 @@ Generic API
.. automethod:: resolution
.. automethod:: to_df
.. automethod:: type

.. automethod:: __iter__
Binary file added docs/assets/heatmap_001.pdf
Binary file not shown.
38 changes: 27 additions & 11 deletions docs/conf.py
@@ -1,9 +1,19 @@
#!/usr/bin/env python3

# Copyright (C) 2023 Roberto Rossini <roberros@uio.no>
#
# SPDX-License-Identifier: MIT


import os

# Define the canonical URL if you are using a custom domain on Read the Docs
html_baseurl = os.environ.get("READTHEDOCS_CANONICAL_URL", "")

# Tell Jinja2 templates the build is running on Read the Docs
if os.environ.get("READTHEDOCS", "") == "True":
if "html_context" not in globals():
html_context = {}
html_context["READTHEDOCS"] = True

# -- General configuration ------------------------------------------------

# If your documentation needs a minimal Sphinx version, state it here.
@@ -14,8 +24,6 @@
# ones.
extensions = [
"sphinx_copybutton",
"sphinxcontrib.rsvgconverter",
"sphinxcontrib.moderncmakedomain",
"sphinx.ext.autodoc",
"sphinx.ext.intersphinx",
"sphinx.ext.autosummary",
@@ -27,8 +35,14 @@

intersphinx_mapping = {
"python": ("https://docs.python.org/3", None),
"numpy": ("https://numpy.org/doc/stable/", None),
"pandas": ("https://pandas.pydata.org/docs/", None),
"pyarrow": ("https://arrow.apache.org/docs/", None),
"scipy": ("https://docs.scipy.org/doc/scipy/", None),
}

intersphinx_timeout = 30

# Add any paths that contain templates here, relative to this directory.
templates_path = [".templates"]

@@ -205,6 +219,7 @@
copybutton_selector = "div:not(.no-copybutton) > div.highlight > pre"
copybutton_exclude = ".linenos, .gp, .go"
copybutton_copy_empty_lines = False
copybutton_prompt_text = "user@dev:/tmp$"

# -- Options for LaTeX output ---------------------------------------------

@@ -214,13 +229,6 @@
"papersize": "a4paper",
"pointsize": "10pt",
"classoptions": ",openany,oneside",
"preamble": r"""
\usepackage{MnSymbol}
\DeclareUnicodeCharacter{25CB}{\ensuremath{\circ}}
\DeclareUnicodeCharacter{25CF}{\ensuremath{\bullet}}
\DeclareUnicodeCharacter{21B5}{\ensuremath{\rhookswarrow}}
\DeclareUnicodeCharacter{2194}{\ensuremath{\leftrightarrow}}
""",
}

# Grouping the document tree into LaTeX files. List of tuples
@@ -249,3 +257,11 @@

# If false, no module index is generated.
# latex_domain_indices = True

linkcheck_ignore = [
r"https://hictk.*\.readthedocs\.build.*",
r"https://hictk.*readthedocs.*/_/downloads/en/.*/pdf/",
]

primary_domain = "py"
highlight_language = "py"
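Review note: a quick way to sanity-check the two `linkcheck_ignore` patterns added above is to try them against a few URLs. The sketch below assumes the patterns are applied with `re.match` (anchored at the start of the URL); the example URLs are hypothetical and chosen only to exercise each pattern.

```python
import re

# Patterns copied from the linkcheck_ignore list in docs/conf.py
linkcheck_ignore = [
    r"https://hictk.*\.readthedocs\.build.*",
    r"https://hictk.*readthedocs.*/_/downloads/en/.*/pdf/",
]


def is_ignored(url):
    """Return True if any ignore pattern matches the beginning of url."""
    return any(re.match(pattern, url) for pattern in linkcheck_ignore)


# Hypothetical URLs used only for illustration
for url in (
    "https://hictkpy--92.org.readthedocs.build/en/92/",           # PR preview build
    "https://hictkpy.readthedocs.io/_/downloads/en/stable/pdf/",  # PDF download
    "https://hictkpy.readthedocs.io/en/stable/quickstart.html",   # regular page
):
    print(url, is_ignored(url))
```

Under these assumptions, preview builds and PDF download links are skipped, while regular documentation pages are still checked.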
6 changes: 3 additions & 3 deletions docs/creating_cool_hic_files.rst
@@ -7,13 +7,13 @@ Creating .cool and .hic files

hictkpy supports creating .cool and .hic files from pre-binned interactions in COO or BedGraph2 format.

The example use file `4DNFIOTPSS3L.hic <https://data.4dnucleome.org/files-processed/4DNFIOTPSS3L>`_, which can be downloaded from `here <https://4dn-open-data-public.s3.amazonaws.com/fourfront-webprod/wfoutput/7386f953-8da9-47b0-acb2-931cba810544/4DNFIOTPSS3L.hic>`_.
The examples in this section use the file `4DNFIOTPSS3L.hic <https://data.4dnucleome.org/files-processed/4DNFIOTPSS3L>`_, which can be downloaded from `here <https://4dn-open-data-public.s3.amazonaws.com/fourfront-webprod/wfoutput/7386f953-8da9-47b0-acb2-931cba810544/4DNFIOTPSS3L.hic>`_.

Preparation
-----------

The first step consists of converting interactions from ``4DNFIOTPSS3L.hic`` to bedGraph2 format.
This can be achieved using ``hictk dump``
This can be achieved using ``hictk dump`` (or alternatively with :py:meth:`hictkpy.File.fetch()`).

.. code-block:: console
@@ -33,7 +33,7 @@ This can be achieved using ``hictk dump``
2L 0 50000 2L 450000 500000 756
Next, we also generate the list of chromosomes.
Next, we also generate the list of chromosomes to use as reference.

.. code-block:: console