User-facing documentation is hosted at refgenie.org/refget.
This repository includes:
This repository includes:

- `/refget`: The `refget` Python package, which provides a Python interface to both remote and local use of refget standards. It has clients and functions for both refget sequences and refget sequence collections (seqcol).
- `/seqcolapi`: Sequence collections API software, a FastAPI wrapper built on top of the `refget` package. It provides a bare-bones Sequence Collections API service.
- `/deployment`: Server configurations for demo instances and public deployed instances. There are also GitHub workflows (in `.github/workflows`) that deploy the demo server instance from this repository.
- `/test_fasta` and `/test_api`: Dummy data and a compliance test, to test external implementations of the Refget Sequence Collections API.
- `/frontend`: A React front-end for seqcolapi.
To deploy the public demo instance, you can either:
- Create a GitHub release - This triggers the `deploy_release_software.yml` workflow, which builds and pushes the Docker image to DockerHub. After that completes, it automatically triggers `deploy_primary.yml` to deploy to AWS ECS.
- Manual dispatch - You can manually trigger either workflow from the GitHub Actions tab.
This builds seqcolapi, pushes to DockerHub, and deploys to ECS.
Integration tests run with `pytest` against an ephemeral PostgreSQL database in Docker:

```
./scripts/test-integration.sh
```

This starts the test database, runs the tests, and cleans up automatically.
In a moment I'll show you how to do these steps individually, but if you're in a hurry, the easiest way to get a development API running for testing is to use this simple shell script (no data persistence; it just loads demo data):

```
bash deployment/demo_up.sh
```

This will:
- populate env vars
- launch postgres container with docker
- run the refget service with uvicorn
- load up the demo data
- block the terminal until you press Ctrl+C, which will shut down all services.
Alternatively, if you want to run each step separately to see what's really going on, start here.
First configure a database connection through environment variables. Choose one of these:
```
source deployment/local_demo/local_demo.env                # local demo (see below to create the database using docker)
source deployment/seqcolapi.databio.org/production.env     # connect to production database
```
If you're using the local_demo configuration, use Docker to launch a local PostgreSQL database service like this:
```
docker run --rm --name refget-postgres -p 127.0.0.1:5432:5432 \
  -e POSTGRES_PASSWORD \
  -e POSTGRES_USER \
  -e POSTGRES_DB \
  -e POSTGRES_HOST \
  postgres:17.0
```
If you need to load test data into your server, first install gtars (`pip install gtars`), a Python package for computing GA4GH digests. You can then load test data like this:

```
PYTHONPATH=. python data_loaders/load_demo_seqcols.py
```

or:

```
refget add-fasta -p test_fasta/test_fasta_metadata.csv -r test_fasta
```
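If you're curious what gtars is computing, here is a rough, unofficial Python sketch of the GA4GH digest scheme: `sha512t24u` (URL-safe base64 of a truncated SHA-512) applied to canonical JSON, roughly following the sequence collections specification. The helper names and the toy collection below are illustrative only; use gtars for real digests.

```python
import base64
import hashlib
import json

def sha512t24u(data: bytes) -> str:
    """GA4GH digest: URL-safe base64 of the first 24 bytes of SHA-512."""
    return base64.urlsafe_b64encode(hashlib.sha512(data).digest()[:24]).decode("ascii")

def canonical(obj) -> bytes:
    """RFC 8785-style canonical JSON: sorted keys, no extra whitespace."""
    return json.dumps(obj, separators=(",", ":"), sort_keys=True).encode("utf-8")

def seqcol_digest(names, lengths, sequences) -> str:
    """Top-level digest: digest each attribute array, then digest the resulting object."""
    level1 = {
        "lengths": sha512t24u(canonical(lengths)),
        "names": sha512t24u(canonical(names)),
        "sequences": sha512t24u(canonical(sequences)),
    }
    return sha512t24u(canonical(level1))

# A toy two-sequence collection (made-up data)
digest = seqcol_digest(
    names=["chr1", "chr2"],
    lengths=[8, 4],
    sequences=["SQ." + sha512t24u(b"ACGTACGT"), "SQ." + sha512t24u(b"ACGT")],
)
print(digest)  # a 32-character sha512t24u string
```

A 24-byte truncation encodes to exactly 32 base64 characters with no padding, which is why the digests you see in the API are always 32 characters long.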
Run the demo seqcolapi service like this:

```
uvicorn seqcolapi.main:app --reload --port 8100
```
To build the Docker image, run this from the root of this repository:

```
docker build -f deployment/dockerhub/Dockerfile -t databio/seqcolapi seqcolapi
```
To run it in a container:

```
source deployment/seqcolapi.databio.org/production.env
docker run --rm -p 8000:80 --name seqcolapi \
  --env "POSTGRES_USER" \
  --env "POSTGRES_DB" \
  --env "POSTGRES_PASSWORD" \
  --env "POSTGRES_HOST" \
  databio/seqcolapi
```
Use the GitHub Action in this repo, which deploys on release or through manual dispatch.
Once you have a backend running, you can run a front-end to interact with it.

Against a local backend:

```
cd frontend
npm i
VITE_API_BASE="http://localhost:8100" npm run dev
```

Against the production backend:

```
cd frontend
npm i
VITE_API_BASE="https://seqcolapi.databio.org" npm run dev
```
The `/digest` feature uses `@databio/gtars` for WASM-based FASTA processing. To use a local gtars-wasm build instead of the npm package:

```
LOCAL_GTARS=../../gtars/gtars-wasm/pkg npm run dev
```

The `LOCAL_GTARS` env var should point to the `pkg/` directory of a built gtars-wasm package (run `wasm-pack build --target web` in `gtars-wasm` to build it).
The streaming API handles files of any size:

```typescript
import * as gtars from '@databio/gtars';

await gtars.default(); // Initialize WASM

// Streaming API (for large files)
const handle = gtars.fastaHasherNew();
gtars.fastaHasherUpdate(handle, chunk); // Feed Uint8Array chunks
const result = gtars.fastaHasherFinish(handle); // Get SeqColResult

// Batch API (for small files)
const batchResult = gtars.digestSeqcol(fastaBytes);
```

Result object:
```typescript
interface SeqColResult {
  digest: string;            // Collection digest (SHA512t24u)
  names_digest: string;
  sequences_digest: string;
  lengths_digest: string;
  n_sequences: number;
  sequences: Array<{
    name: string;
    length: number;
    alphabet: string;        // dna2bit, dna3bit, etc.
    sha512t24u: string;
    md5: string;
    description?: string;
  }>;
}
```

- Ensure the refget package master branch is as you want it.
- Deploy the updated seqcolapi app to DockerHub (using manual dispatch, or deploy on GitHub release).
- Finally, deploy the instance with manual dispatch using the included GitHub action.
The objects and attributes are represented as SQLModel objects in `refget/models.py`. To add a new attribute:

- Create a new model. This will create a table for that model, etc.
- Change the function that creates the objects to populate the new attribute.
```
refget add-fasta -p ref_fasta.csv -r $BRICKYARD/datasets_downloaded/pangenome_fasta/reference_fasta
```