Data Validation and Packaging utility for sending evidence graphs to FAIRSCAPE

fairscape-cli

A utility for packaging objects and validating metadata for FAIRSCAPE.


Features

fairscape-cli provides a command-line interface (CLI) for building and validating research data packages on the client side:

  • RO-Crate - a lightweight approach to packaging research data with their metadata. The CLI allows users to:
    • Create Research Object Crates (RO-Crates)
    • Add (transfer) digital objects to the RO-Crate
    • Register metadata of the objects
    • Describe the schema of tabular dataset objects as metadata and validate datasets against that schema.

Requirements

Python 3.8+

Installation

$ pip install fairscape-cli

Minimal example

Basic commands

  • Show all commands, arguments, and options
$ fairscape-cli --help
  • Create an RO-Crate in a specified directory
$ fairscape-cli rocrate create \
  --name "test rocrate" \
  --description "Example RO Crate for Tests" \
  --organization-name "UVA" \
  --project-name "B2AI"  \
  --keywords "b2ai" \
  --keywords "cm4ai" \
  --keywords "U2OS" \
  "./test_rocrate"
  • Create an RO-Crate in the current working directory
$ fairscape-cli rocrate init \
  --name "test rocrate" \
  --description "Example RO Crate for Tests" \
  --organization-name "UVA" \
  --project-name "B2AI"  \
  --keywords "b2ai" \
  --keywords "cm4ai" \
  --keywords "U2OS"
  • Add a dataset to the RO-Crate
$ fairscape-cli rocrate add dataset \
  --name "AP-MS embeddings" \
  --author "Krogan lab (https://kroganlab.ucsf.edu/krogan-lab)" \
  --version "1.0" \
  --date-published "2021-04-23" \
  --description "Affinity purification mass spectrometry (APMS) embeddings for each protein in the study, generated by node2vec predict." \
  --keywords "b2ai" \
  --keywords "cm4ai" \
  --keywords "U2OS" \
  --data-format "CSV" \
  --source-filepath "./tests/data/APMS_embedding_MUSIC.csv" \
  --destination-filepath "./test_rocrate/APMS_embedding_MUSIC.csv" \
  "./test_rocrate"
  • Add software to the RO-Crate
$ fairscape-cli rocrate add software \
  --name "calibrate pairwise distance" \
  --author "Qin, Y." \
  --version "1.0" \
  --description "Script written in Python to calibrate pairwise distance." \
  --keywords "b2ai" \
  --keywords "cm4ai" \
  --keywords "U2OS" \
  --file-format "py" \
  --source-filepath "./tests/data/calibrate_pairwise_distance.py" \
  --destination-filepath "./test_rocrate/calibrate_pairwise_distance.py" \
  --date-modified "2021-04-23" \
  "./test_rocrate"
  • Register a computation to the RO-Crate
$ fairscape-cli rocrate register computation \
  --name "calibrate pairwise distance" \
  --run-by "Qin, Y." \
  --date-created "2021-05-23" \
  --description "Average the predicted proximities" \
  --keywords "b2ai" \
  --keywords "cm4ai" \
  --keywords "U2OS" \
  "./test_rocrate"
  • Create a schema
$ fairscape-cli schema create-tabular \
    --name 'APMS Embedding Schema' \
    --description 'Tabular format for APMS music embeddings from PPI networks from the music pipeline from the B2AI Cellmaps for AI project' \
    --separator ',' \
    --header False \
    ./schema_apms_music_embedding.json
  • Add a string property
$ fairscape-cli schema add-property string \
    --name 'Experiment Identifier' \
    --index 0 \
    --description 'Identifier for the APMS experiment responsible for generating the raw PPI used to create this embedding vector' \
    --pattern '^APMS_[0-9]*$' \
    ./schema_apms_music_embedding.json
  • Add another string property
$ fairscape-cli schema add-property string \
    --name 'Gene Symbol' \
    --index 1 \
    --description 'Gene Symbol for the APMS bait protein' \
    --pattern '^[A-Za-z0-9\-]*$' \
    --value-url 'http://edamontology.org/data_1026' \
    ./schema_apms_music_embedding.json
  • Add an array property
$ fairscape-cli schema add-property array \
    --name 'MUSIC APMS Embedding' \
    --index '2::' \
    --description 'Embedding Vector values for genes determined by running node2vec on APMS PPI networks. Vector has 1024 values for each bait protein' \
    --items-datatype 'number' \
    --unique-items False \
    --min-items 1024 \
    --max-items 1024 \
    ./schema_apms_music_embedding.json
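
To make the three properties above concrete, the following Python sketch checks a single row by hand against the same constraints: column 0 must match `^APMS_[0-9]*$`, column 1 must match `^[A-Za-z0-9\-]*$`, and the remaining columns must form exactly 1024 numbers. It only illustrates what the constraints mean and is not how fairscape-cli implements validation. The helper name `check_row` is hypothetical, and reading `--index '2::'` as "column 2 through the end of the row" is an assumption.

```python
import re

# Patterns copied from the add-property commands above.
EXPERIMENT_ID = re.compile(r"^APMS_[0-9]*$")
GENE_SYMBOL = re.compile(r"^[A-Za-z0-9\-]*$")
EMBEDDING_LENGTH = 1024  # --min-items and --max-items are both 1024


def check_row(row):
    """Return a list of problems found in one row (hypothetical helper)."""
    if len(row) < 2:
        return ["row has fewer than two columns"]
    problems = []
    if not EXPERIMENT_ID.match(row[0]):
        problems.append(f"column 0: {row[0]!r} does not match {EXPERIMENT_ID.pattern}")
    if not GENE_SYMBOL.match(row[1]):
        problems.append(f"column 1: {row[1]!r} does not match {GENE_SYMBOL.pattern}")
    # Assumption: index '2::' means every column from position 2 onward.
    embedding = row[2:]
    if len(embedding) != EMBEDDING_LENGTH:
        problems.append(f"embedding has {len(embedding)} items, expected {EMBEDDING_LENGTH}")
    # Report only the first non-numeric value to keep the output short.
    for position, value in enumerate(embedding, start=2):
        try:
            float(value)
        except ValueError:
            problems.append(f"column {position}: {value!r} is not a number")
            break
    return problems


# Made-up rows for illustration only; they are not taken from the example dataset.
good = ["APMS_1", "RRS1"] + ["0.5"] * 1024
bad = ["apms-1", "RRS1"] + ["0.5"] * 3
print(check_row(good))
print(check_row(bad))
```

The second row fails twice: its identifier does not start with `APMS_`, and its embedding is shorter than 1024 items.
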
  • Validate a dataset against the schema (validation succeeds)
$ fairscape-cli schema validate \
    --data ./examples/schemas/MUSIC_embedding/APMS_embedding_MUSIC.csv  \
    --schema ./examples/schemas/MUSIC_embedding/music_apms_embedding_schema.json
  • Validate a corrupted dataset against the schema (validation fails)
$ fairscape-cli schema validate \
    --data examples/schemas/MUSIC_embedding/APMS_embedding_corrupted.csv \
    --schema examples/schemas/MUSIC_embedding/music_apms_embedding_schema.json
  • Validate using default schemas
# validate imageloader files
$ fairscape-cli schema validate \
        --data "examples/schemas/cm4ai-rocrates/imageloader/samplescopy.csv" \
        --schema "ark:59852/schema-cm4ai-imageloader-samplescopy" 
    
$ fairscape-cli schema validate \
        --data "examples/schemas/cm4ai-rocrates/imageloader/uniquecopy.csv" \
        --schema "ark:59852/schema-cm4ai-imageloader-uniquecopy"
       
# validate image embedding outputs
$ fairscape-cli schema validate \
        --data "examples/schemas/cm4ai-rocrates/image_embedding/image_emd.tsv" \
        --schema "ark:59852/schema-cm4ai-image-embedding-image-emd"
     
$ fairscape-cli schema validate \
        --data "examples/schemas/cm4ai-rocrates/image_embedding/labels_prob.tsv" \
        --schema "ark:59852/schema-cm4ai-image-embedding-labels-prob"

# validate apms loader input
$ fairscape-cli schema validate \
        --data "examples/schemas/cm4ai-rocrates/apmsloader/ppi_gene_node_attributes.tsv" \
        --schema "ark:59852/schema-cm4ai-apmsloader-gene-node-attributes"

$ fairscape-cli schema validate \
        --data "examples/schemas/cm4ai-rocrates/apmsloader/ppi_edgelist.tsv" \
        --schema "ark:59852/schema-cm4ai-apmsloader-ppi-edgelist"

# validate apms embedding 
$ fairscape-cli schema validate \
        --data "examples/schemas/cm4ai-rocrates/apms_embedding/ppi_emd.tsv" \
        --schema "ark:59852/schema-cm4ai-apms-embedding"    

# validate coembedding 
$ fairscape-cli schema validate \
        --data "examples/schemas/cm4ai-rocrates/coembedding/coembedding_emd.tsv" \
        --schema "ark:59852/schema-cm4ai-coembedding"
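
When several files need checking, as in the commands above, the repeated invocations can be scripted. The following is a minimal sketch, assuming `fairscape-cli` is on the `PATH` and that a non-zero exit status signals a failed validation (this README does not state the exit-code behavior). The helper names `build_validate_command` and `run_all` are hypothetical, and only the `--data` and `--schema` options shown above are used.

```python
import shlex
import subprocess

# (data file, schema identifier) pairs taken from the commands above.
CHECKS = [
    ("examples/schemas/cm4ai-rocrates/imageloader/samplescopy.csv",
     "ark:59852/schema-cm4ai-imageloader-samplescopy"),
    ("examples/schemas/cm4ai-rocrates/apmsloader/ppi_edgelist.tsv",
     "ark:59852/schema-cm4ai-apmsloader-ppi-edgelist"),
    ("examples/schemas/cm4ai-rocrates/coembedding/coembedding_emd.tsv",
     "ark:59852/schema-cm4ai-coembedding"),
]


def build_validate_command(data_path, schema):
    """Build the argument list for one `fairscape-cli schema validate` call."""
    return ["fairscape-cli", "schema", "validate",
            "--data", data_path, "--schema", schema]


def run_all(checks):
    """Run each validation and return the data paths whose command failed.

    Assumption: fairscape-cli reports a failed validation through a
    non-zero exit status.
    """
    failed = []
    for data_path, schema in checks:
        result = subprocess.run(build_validate_command(data_path, schema))
        if result.returncode != 0:
            failed.append(data_path)
    return failed


# Dry run: print the commands that run_all(CHECKS) would execute.
for data_path, schema in CHECKS:
    print(shlex.join(build_validate_command(data_path, schema)))
```

Calling `run_all(CHECKS)` from the repository root would execute the three validations in order; the dry-run loop only prints them, so the sketch can be inspected without the CLI installed.
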

Contribution

If you'd like to request a feature or report a bug, please create a GitHub Issue using one of the templates provided.

License

This project is licensed under the terms of the MIT license.
