VRS-Python provides Python language support and a reference implementation for the GA4GH Variation Representation Specification (VRS).
- Pydantic implementation of GKS core models and VRS models
- Algorithm for generating consistent, globally unique identifiers for variation without a central authority
- Algorithm for performing fully justified allele normalization
- Translating from and to other variant formats
- Annotate VCFs with VRS
- Convert GA4GH objects between inlined and referenced forms
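As a rough illustration of how identifiers can be generated without a central authority, the sketch below computes a GA4GH-style truncated digest (`sha512t24u`: SHA-512, keep the first 24 bytes, base64url-encode). It uses only the standard library and is not the VRS-Python implementation; real identifiers are computed over a canonical serialization of the object, which this sketch leaves out.

```python
import base64
import hashlib


def sha512t24u(blob: bytes) -> str:
    """Return the base64url encoding of the first 24 bytes of SHA-512(blob)."""
    digest = hashlib.sha512(blob).digest()
    # 24 bytes encode to exactly 32 base64url characters, so there is no padding.
    return base64.urlsafe_b64encode(digest[:24]).decode("ascii")


if __name__ == "__main__":
    # The same input always yields the same digest, so two parties can
    # derive matching identifiers independently of each other.
    print(sha512t24u(b"ACGT"))
```

Because the digest depends only on the input bytes, no registry is needed to agree on an identifier; agreement on the serialization rules is enough.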
You are encouraged to browse issues. All known issues are listed there. Please report any issues you find.
- Python >= 3.10
- Note: Python 3.12 is required for developers contributing to VRS-Python. The Makefile sets up a virtual environment in `venv/3.12` and expects Python to be available as `python3.12`.
- libpq
- postgresql
You can use Homebrew to install the prerequisites. See the Homebrew documentation for installation instructions. Make sure Homebrew is up to date by running `brew update`.
brew install libpq
brew install python3
brew install postgresql@14
sudo apt install gcc libpq-dev python3-dev
VRS-Python is available on PyPI.
pip install 'ga4gh.vrs[extras]'
The `[extras]` argument tells pip to install the packages needed to fulfill the dependencies of the `ga4gh.vrs.extras` package.
The `ga4gh.vrs.extras` modules are not part of the VRS specification per se. They are bundled with ga4gh.vrs for development and installation convenience. These modules depend directly and indirectly on external data sources of sequences, transcripts, and genome-transcript alignments.
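To confirm that the extras were installed, a small check like the one below may help. It is a sketch using only the standard library; the helper name `module_available` is made up for illustration and is not part of VRS-Python.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if the named module can be located in this environment."""
    try:
        # For dotted names, find_spec imports the parent package first,
        # which raises ImportError when the parent is missing or broken.
        return importlib.util.find_spec(name) is not None
    except ImportError:
        return False


if __name__ == "__main__":
    for name in ("ga4gh.vrs", "ga4gh.vrs.extras"):
        status = "found" if module_available(name) else "missing"
        print(f"{name}: {status}")
```

This only reports whether the modules can be located; it does not verify that the external data sources they rely on are reachable.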
First, you must install a local SeqRepo:
pip install seqrepo
export SEQREPO_VERSION=2024-02-20 # or newer if available -- check `seqrepo list-remote-instances`
sudo mkdir -p /usr/local/share/seqrepo
sudo chown $USER /usr/local/share/seqrepo
seqrepo pull -i $SEQREPO_VERSION
seqrepo update-latest
If you encounter a permission error similar to the one below:
PermissionError: [Error 13] Permission denied: '/usr/local/share/seqrepo/2024-02-20._fkuefgd' -> '/usr/local/share/seqrepo/2024-02-20'
Try moving the data manually with `sudo`:
sudo mv /usr/local/share/seqrepo/$SEQREPO_VERSION.* /usr/local/share/seqrepo/$SEQREPO_VERSION
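To see whether a pull finished or left a temporary directory behind, you could list the instance directories with a short script. The sketch below assumes that an interrupted pull leaves a directory named like `<version>._<suffix>`, as in the error message above; the helper name is hypothetical.

```python
from pathlib import Path


def list_seqrepo_instances(root: str = "/usr/local/share/seqrepo") -> list[str]:
    """Return names of instance directories under root, skipping temporary ones.

    Assumption: temporary directories contain '._' in their name, as in the
    PermissionError message shown above.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        return []
    return sorted(
        entry.name
        for entry in root_path.iterdir()
        if entry.is_dir() and "._" not in entry.name
    )


if __name__ == "__main__":
    instances = list_seqrepo_instances()
    print(instances or "no SeqRepo instances found")
```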
To make installation easy, we recommend using Docker to install the other Biocommons tools: SeqRepo REST and UTA. If you would like to use a local instance of UTA instead, see UTA directly. We provide some additional setup help here.
Next, run the following commands:
docker volume create --name=uta_vol
docker volume create --name=seqrepo_vol
docker-compose up
This should start three containers:
- seqrepo: downloads seqrepo into a docker volume and exits
- seqrepo-rest-service: a REST service on seqrepo (localhost:5000)
- uta: a database of transcripts and alignments (localhost:5432)
Check that the containers are running:
$ docker ps
CONTAINER ID IMAGE // NAMES
86e872ab0c69 biocommons/seqrepo-rest-service:latest // vrs-python_seqrepo-rest-service_1
a40576b8cf1f biocommons/uta:uta_20210129b // vrs-python_uta_1
Depending on your network and host, the first run is likely to take 5-15 minutes in order to download and install data. Subsequent startups should be nearly instantaneous.
You can test UTA and seqrepo installations like so:
$ psql -XAt postgres://anonymous@localhost/uta -c 'select count(*) from uta_20210129b.transcript'
314227
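If `psql` is not installed, a lighter check is to see whether the two ports listed above accept TCP connections. The sketch below uses only the standard library; it confirms that something is listening on each port, not that the service behind it is healthy. The helper name is chosen for illustration.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Ports taken from the container list above.
    for name, port in (("seqrepo-rest-service", 5000), ("uta", 5432)):
        state = "reachable" if port_is_open("localhost", port) else "not reachable"
        print(f"{name} (localhost:{port}): {state}")
```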
Here are some things to try.
- Bring up one service at a time. For example, if you haven't downloaded seqrepo yet, you might see this:
$ docker-compose up seqrepo-rest-service
Starting vrs-python_seqrepo-rest-service_1 ... done
Attaching to vrs-python_seqrepo-rest-service_1
seqrepo-rest-service_1 | 2022-07-26 15:59:59 seqrepo_rest_service.__main__[1] INFO Using seqrepo_dir='/usr/local/share/seqrepo/2024-02-20' from command line
⋮
seqrepo-rest-service_1 | OSError: Unable to open SeqRepo directory /usr/local/share/seqrepo/2024-02-20
vrs-python_seqrepo-rest-service_1 exited with code 1
The ga4gh/vrs-python repo embeds the ga4gh/vrs repo as a git submodule for testing purposes. Each ga4gh.vrs package on PyPI embeds a particular version of VRS. The correspondences between the packages that are currently maintained may be summarized as:
vrs-python branch | vrs-python tag/version | vrs branch | vrs version |
---|---|---|---|
main (default branch) | 2.x | 2.x | 2.x |
1.x | 0.8.x | 1.x | 1.x |
⚠ Note: Only the 2.x branch is actively maintained. The 1.x branch receives bug fixes only.
⚠ Developers: See the development section below for recommendations for using submodules gracefully (and without causing problems for others!).
The correspondences between the packages that are no longer maintained may be summarized as:
vrs-python branch | vrs-python tag/version | vrs branch | vrs version |
---|---|---|---|
0.9 | 0.9.x | metaschema-update | N/A |
0.7 | 0.7.x | 1.2 | 1.2.x |
0.6 | 0.6.x | 1.1 | 1.1.x |
This section is intended for developers who contribute to VRS-Python.
Fork the repo at https://github.com/ga4gh/vrs-python/ and initialize a development environment.
git clone --recurse-submodules git@github.com:YOUR_GITHUB_ID/vrs-python.git
cd vrs-python
make devready
source venv/3.12/bin/activate
This setup includes pre-commit hooks. If you create a virtual environment manually, be sure to install the hooks yourself; otherwise, commits may fail during CI/CD checks:
source venv/3.12/bin/activate
pre-commit install
If you already cloned the repo but forgot to include `--recurse-submodules`, you can run:
git submodule update --init --recursive
vrs-python embeds vrs as a submodule, for testing purposes only. When checking out vrs-python and switching branches, it is important to make sure that the submodule tracks vrs-python correctly. The recommended way to do this is `git config --global submodule.recurse true`. If you don't set `submodule.recurse`, developers and reviewers must be extremely careful not to accidentally upgrade or downgrade schemas with respect to vrs-python.
Alternatively, see `misc/githooks/`.
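To check whether `submodule.recurse` is already set before switching branches, a small script along these lines could be used. This is a sketch, not part of the repository's tooling; the helper name is hypothetical, and it relies on `git config --type=bool`, which requires a reasonably recent git.

```python
import subprocess


def submodule_recurse_enabled() -> bool:
    """Return True if git reports submodule.recurse as true."""
    try:
        result = subprocess.run(
            ["git", "config", "--type=bool", "--get", "submodule.recurse"],
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:
        # git is not installed or not on PATH.
        return False
    # git exits non-zero when the key is unset; --type=bool normalizes
    # values such as "yes" or "on" to "true".
    return result.returncode == 0 and result.stdout.strip() == "true"


if __name__ == "__main__":
    if submodule_recurse_enabled():
        print("submodule.recurse is enabled")
    else:
        print("submodule.recurse is not set; consider enabling it")
```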
This package implements typical unit tests for ga4gh.core and ga4gh.vrs. This package also implements the compliance tests from vrs (vrs/validation) in the tests/validation/ directory.
To run tests:
make test
The notebooks do not require you to set up SeqRepo or UTA as described in Install External Data Sources.
Binder allows you to create custom computing environments that can be shared and used by many remote users.
You can access the notebooks on Binder here.
Terra is a cloud platform for biomedical research developed by the Broad Institute, Microsoft and Verily. The platform includes preconfigured environments that provide user-friendly access to various applications commonly used in bioinformatics, including Jupyter Notebooks.
We have created a public `VRS-demo-notebooks` workspace in Terra that contains the demo notebooks along with instructions for running them with minimal setup. To get started, see either the `VRS-demo-notebooks` workspace or the `Terra.ipynb` notebook in this repository.
VS Code is a code editor developed by Microsoft. It is lightweight, highly customizable, and supports a wide range of programming languages, with a robust extension system. You can download VS Code here.
- Open VS Code.
- Use the Extensions view (Ctrl+Shift+X or ⌘+Shift+X) to install the Jupyter extension.
- Navigate to your vrs-python project folder and open it in VS Code.
- In a notebook, click `Select Kernel` at the top right. Select the option where the path is `venv/3.12/bin/python3`. See here for more information on managing Jupyter Kernels in VS Code.
- After selecting the kernel, you can run the notebook.
A stand-alone security review has been performed on the specification itself. This implementation is offered as-is, and without any security guarantees. It will need an independent security review before it can be considered ready for use in security-critical applications. If you integrate this code into your application it is AT YOUR OWN RISK AND RESPONSIBILITY to arrange for a security audit.