A utility for packaging objects and validating metadata for FAIRSCAPE.
Documentation: https://fairscape.github.io/fairscape-cli/
fairscape-cli provides a command-line interface (CLI) for creating RO-Crates, a lightweight approach to packaging research data together with its metadata.
The CLI allows users to:
- Create Research Object Crates (RO-Crates)
- Add (transfer) digital objects to the RO-Crate
- Register metadata of the objects
- Describe the schema of tabular dataset objects as metadata and perform validation.
Requires Python 3.8 or later. Install with pip:
$ pip install fairscape-cli
- Show all commands, arguments, and options
$ fairscape-cli --help
- Create an RO-Crate in a specified directory
$ fairscape-cli rocrate create \
--name "test rocrate" \
--description "Example RO Crate for Tests" \
--organization-name "UVA" \
--project-name "B2AI" \
--keywords "b2ai" \
--keywords "cm4ai" \
--keywords "U2OS" \
"./test_rocrate"
- Create an RO-Crate in the current working directory
$ fairscape-cli rocrate init \
--name "test rocrate" \
--description "Example RO Crate for Tests" \
--organization-name "UVA" \
--project-name "B2AI" \
--keywords "b2ai" \
--keywords "cm4ai" \
--keywords "U2OS"
- Add a dataset to the RO-Crate
$ fairscape-cli rocrate add dataset \
--name "AP-MS embeddings" \
--author "Krogan lab (https://kroganlab.ucsf.edu/krogan-lab)" \
--version "1.0" \
--date-published "2021-04-23" \
--description "Affinity purification mass spectrometer (APMS) embeddings for each protein in the study, generated by node2vec predict." \
--keywords "b2ai" \
--keywords "cm4ai" \
--keywords "U2OS" \
--data-format "CSV" \
--source-filepath "./tests/data/APMS_embedding_MUSIC.csv" \
--destination-filepath "./test_rocrate/APMS_embedding_MUSIC.csv" \
"./test_rocrate"
- Add software to the RO-Crate
$ fairscape-cli rocrate add software \
--name "calibrate pairwise distance" \
--author "Qin, Y." \
--version "1.0" \
--description "script written in python to calibrate pairwise distance." \
--keywords "b2ai" \
--keywords "cm4ai" \
--keywords "U2OS" \
--file-format "py" \
--source-filepath "./tests/data/calibrate_pairwise_distance.py" \
--destination-filepath "./test_rocrate/calibrate_pairwise_distance.py" \
--date-modified "2021-04-23" \
"./test_rocrate"
- Register a computation in the RO-Crate
$ fairscape-cli rocrate register computation \
--name "calibrate pairwise distance" \
--run-by "Qin, Y." \
--date-created "2021-05-23" \
--description "Average the predicted proximities" \
--keywords "b2ai" \
--keywords "cm4ai" \
--keywords "U2OS" \
"./test_rocrate"
- Create a schema
$ fairscape-cli schema create-tabular \
--name 'APMS Embedding Schema' \
--description 'Tabular format for APMS music embeddings from PPI networks from the music pipeline from the B2AI Cellmaps for AI project' \
--separator ',' \
--header False \
./schema_apms_music_embedding.json
- Add a string property
$ fairscape-cli schema add-property string \
--name 'Experiment Identifier' \
--index 0 \
--description 'Identifier for the APMS experiment responsible for generating the raw PPI used to create this embedding vector' \
--pattern '^APMS_[0-9]*$' \
./schema_apms_music_embedding.json
- Add another string property
$ fairscape-cli schema add-property string \
--name 'Gene Symbol' \
--index 1 \
--description 'Gene Symbol for the APMS bait protein' \
--pattern '^[A-Za-z0-9\-]*$' \
--value-url 'http://edamontology.org/data_1026' \
./schema_apms_music_embedding.json
- Add an array property
$ fairscape-cli schema add-property array \
--name 'MUSIC APMS Embedding' \
--index '2::' \
--description 'Embedding Vector values for genes determined by running node2vec on APMS PPI networks. Vector has 1024 values for each bait protein' \
--items-datatype 'number' \
--unique-items False \
--min-items 1024 \
--max-items 1024 \
./schema_apms_music_embedding.json
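To make the three properties above concrete, here is a minimal Python sketch of the checks they describe for a single row of the CSV file. It is an illustration under stated assumptions, not fairscape-cli's implementation: the helper `check_row` is hypothetical, `--index '2::'` is assumed to mean "every column from index 2 onward" (like the Python slice `[2:]`), and a `number` item is assumed to be anything `float()` can parse. The sample values in the usage lines are made up.

```python
import re

# Patterns copied from the add-property commands above.
EXPERIMENT_ID = re.compile(r"^APMS_[0-9]*$")
GENE_SYMBOL = re.compile(r"^[A-Za-z0-9\-]*$")
EMBEDDING_LENGTH = 1024  # --min-items and --max-items are both 1024


def check_row(row):
    """Return a list of problems for one row; an empty list means it conforms.

    Hypothetical helper for illustration only; not part of fairscape-cli.
    """
    if len(row) < 2:
        return ["row has fewer than 2 columns"]

    problems = []
    if not EXPERIMENT_ID.match(row[0]):
        problems.append(f"column 0 is not a valid Experiment Identifier: {row[0]!r}")
    if not GENE_SYMBOL.match(row[1]):
        problems.append(f"column 1 is not a valid Gene Symbol: {row[1]!r}")

    embedding = row[2:]  # assumed meaning of --index '2::'
    if len(embedding) != EMBEDDING_LENGTH:
        problems.append(
            f"expected {EMBEDDING_LENGTH} embedding values, found {len(embedding)}"
        )
    for value in embedding:
        try:
            float(value)  # assumed meaning of --items-datatype 'number'
        except ValueError:
            problems.append(f"embedding value is not a number: {value!r}")
            break  # report only the first non-numeric value
    return problems


# Made-up sample rows: one conforming, one with a bad identifier and a short vector.
conforming = ["APMS_1", "RRS1"] + ["0.5"] * 1024
malformed = ["XYZ_1", "RRS1"] + ["0.5"] * 3
print(check_row(conforming))
print(check_row(malformed))
```

Because `--unique-items` is `False`, the sketch does not check for duplicate values within the vector.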
- Validate a dataset that conforms to the schema (validation succeeds)
$ fairscape-cli schema validate \
--data ./examples/schemas/MUSIC_embedding/APMS_embedding_MUSIC.csv \
--schema ./examples/schemas/MUSIC_embedding/music_apms_embedding_schema.json
- Validate a corrupted dataset against the same schema (validation fails)
$ fairscape-cli schema validate \
--data examples/schemas/MUSIC_embedding/APMS_embedding_corrupted.csv \
--schema examples/schemas/MUSIC_embedding/music_apms_embedding_schema.json
- Validate using default schemas
# validate imageloader files
$ fairscape-cli schema validate \
--data "examples/schemas/cm4ai-rocrates/imageloader/samplescopy.csv" \
--schema "ark:59852/schema-cm4ai-imageloader-samplescopy"
$ fairscape-cli schema validate \
--data "examples/schemas/cm4ai-rocrates/imageloader/uniquecopy.csv" \
--schema "ark:59852/schema-cm4ai-imageloader-uniquecopy"
# validate image embedding outputs
$ fairscape-cli schema validate \
--data "examples/schemas/cm4ai-rocrates/image_embedding/image_emd.tsv" \
--schema "ark:59852/schema-cm4ai-image-embedding-image-emd"
$ fairscape-cli schema validate \
--data "examples/schemas/cm4ai-rocrates/image_embedding/labels_prob.tsv" \
--schema "ark:59852/schema-cm4ai-image-embedding-labels-prob"
# validate apmsloader input
$ fairscape-cli schema validate \
--data "examples/schemas/cm4ai-rocrates/apmsloader/ppi_gene_node_attributes.tsv" \
--schema "ark:59852/schema-cm4ai-apmsloader-gene-node-attributes"
$ fairscape-cli schema validate \
--data "examples/schemas/cm4ai-rocrates/apmsloader/ppi_edgelist.tsv" \
--schema "ark:59852/schema-cm4ai-apmsloader-ppi-edgelist"
# validate apms embedding
$ fairscape-cli schema validate \
--data "examples/schemas/cm4ai-rocrates/apms_embedding/ppi_emd.tsv" \
--schema "ark:59852/schema-cm4ai-apms-embedding"
# validate coembedding
$ fairscape-cli schema validate \
--data "examples/schemas/cm4ai-rocrates/coembedding/coembedding_emd.tsv" \
--schema "ark:59852/schema-cm4ai-coembedding"
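The default-schema checks above all follow the same pattern, so they can be scripted. The sketch below is an illustration rather than part of fairscape-cli: it builds one `fairscape-cli schema validate` invocation per data file and schema identifier, using only the `--data` and `--schema` options shown above. The names `DEFAULT_CHECKS`, `validate_command`, and `run_all` are hypothetical. How the CLI signals a failed validation through its exit status is not stated here, so the return codes are only collected, not interpreted.

```python
import subprocess

BASE = "examples/schemas/cm4ai-rocrates"

# (data file, schema identifier) pairs copied from the commands above.
DEFAULT_CHECKS = [
    (f"{BASE}/imageloader/samplescopy.csv", "ark:59852/schema-cm4ai-imageloader-samplescopy"),
    (f"{BASE}/imageloader/uniquecopy.csv", "ark:59852/schema-cm4ai-imageloader-uniquecopy"),
    (f"{BASE}/image_embedding/image_emd.tsv", "ark:59852/schema-cm4ai-image-embedding-image-emd"),
    (f"{BASE}/image_embedding/labels_prob.tsv", "ark:59852/schema-cm4ai-image-embedding-labels-prob"),
    (f"{BASE}/apmsloader/ppi_gene_node_attributes.tsv", "ark:59852/schema-cm4ai-apmsloader-gene-node-attributes"),
    (f"{BASE}/apmsloader/ppi_edgelist.tsv", "ark:59852/schema-cm4ai-apmsloader-ppi-edgelist"),
    (f"{BASE}/apms_embedding/ppi_emd.tsv", "ark:59852/schema-cm4ai-apms-embedding"),
    (f"{BASE}/coembedding/coembedding_emd.tsv", "ark:59852/schema-cm4ai-coembedding"),
]


def validate_command(data_path, schema_id):
    """Build the argument list for one `schema validate` call (hypothetical helper)."""
    return [
        "fairscape-cli", "schema", "validate",
        "--data", data_path,
        "--schema", schema_id,
    ]


def run_all(checks, dry_run=True):
    """Print each command; when dry_run is False, also run it.

    Returns a dict mapping each data path to the process return code
    (empty in dry-run mode, since nothing is executed).
    """
    return_codes = {}
    for data_path, schema_id in checks:
        command = validate_command(data_path, schema_id)
        print(" ".join(command))
        if not dry_run:
            return_codes[data_path] = subprocess.run(command).returncode
    return return_codes


if __name__ == "__main__":
    # Dry run by default so the script is safe to execute without the CLI installed.
    run_all(DEFAULT_CHECKS, dry_run=True)
```

Passing `dry_run=False` runs each command from the repository root, where the relative `examples/` paths resolve.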
If you'd like to request a feature or report a bug, please create a GitHub Issue using one of the templates provided.
This project is licensed under the terms of the MIT license.