Utils

This repository contains scripts and code snippets that serve as utilities for various tasks. The goal is to provide reusable code to avoid rewriting and reinventing the wheel. All content in this repository is open and free to use under a Creative Commons license. Feel free to use and contribute!

Current Content

1. Parse Hipathia Pathways to CSV for Neo4J: `hipath2CSV` Folder

This script extracts pathway information using the Hipathia package for a specified species and pathway ID. It generates two output files:

- A CSV file containing node attributes.
- A CSV file containing interactions/relations between nodes.

The CSV files can be used for further implementations, such as loading into Neo4J.

Clarification about `Path_ID`

path_id:
- This is a number that should be a valid pathway identifier from the KEGG pathway database.
- Example: 04210 for the Apoptosis pathway.
Path_ID1 Path_ID2 ... Path_IDN:
- This represents a space-separated list of pathway IDs that you want to process when using the multiple pathways option.
- Example: "04210 04150 04010" to process the Apoptosis (04210), mTOR signaling (04150), and MAPK signaling (04010) pathways together.
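
As a rough illustration of the format described above, the sketch below checks that each entry in a space-separated list looks like a KEGG pathway number (exactly five digits). The function names `is_valid_path_id` and `check_path_ids` are hypothetical and not part of this repository, and the check covers format only: it does not tell you whether Hipathia actually provides the pathway for your species.

```shell
# Hypothetical helpers, not part of the repository.
# Succeeds when the argument is exactly five digits, e.g. 04210.
is_valid_path_id() {
  case "$1" in
    [0-9][0-9][0-9][0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Walks a space-separated list and reports every malformed entry on stderr.
# Returns non-zero if at least one entry is malformed.
check_path_ids() {
  status=0
  for id in $1; do  # unquoted on purpose: split the list on spaces
    if ! is_valid_path_id "$id"; then
      echo "Invalid pathway ID: $id" >&2
      status=1
    fi
  done
  return "$status"
}

check_path_ids "04210 04150 04010" && echo "all IDs look valid"
```

A check like this could run before calling either script, so that a typo such as a dropped leading zero is caught early.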

For more information, see the Hipathia paper.

1.1. Run for One Pathway

Usage

For a single pathway, you can use the get_path.R script. Run the following command:

Rscript get_path.R --species "hsa" --path_id "04210" --output_folder "pathways"

Options

-s, --species : Species code (e.g., 'hsa' for Homo sapiens) [default: "hsa"]
-p, --path_id : Pathway ID (e.g., '04210' for Apoptosis pathway) [default: "04210"]
-o, --output_folder : Output folder name where the files will be saved [default: "pathways"]
-q, --quiet : Suppress output messages

Example

Parsing the Apoptosis KEGG pathway for humans:

Rscript get_path.R --species "hsa" --path_id "04210" --output_folder "pathways"

Parsing the mTOR signaling KEGG pathway (04150) for mouse in quiet mode:

Rscript get_path.R --species "mmu" --path_id "04150" --output_folder "mouse_pathways" -q

1.2. Run Multiple Pathways in Parallel: `get_paths_parallel.sh`

This script runs get_path.R for a list of pathway IDs in parallel using GNU Parallel. It allows you to process multiple pathways simultaneously, improving efficiency.
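
The general idea can be sketched as follows. This is not the content of get_paths_parallel.sh, which may be organised differently; `build_commands` is a hypothetical helper that prints one get_path.R command line per pathway ID, relying on the documented default species ("hsa"). GNU Parallel runs each line it reads from standard input as a separate job, so the generated lines can be piped straight into it.

```shell
# Hypothetical helper, not taken from get_paths_parallel.sh.
# Usage: build_commands "ID1 ID2 ... IDN" output_folder
# Prints one get_path.R invocation per pathway ID.
build_commands() {
  out_dir=$2
  for id in $1; do  # unquoted on purpose: split the list on spaces
    printf 'Rscript get_path.R --path_id "%s" --output_folder "%s"\n' "$id" "$out_dir"
  done
}

# Show the commands that would be run.
build_commands "04210 04150 04010" "ThreePathways"

# To execute them concurrently (requires GNU Parallel):
#   build_commands "04210 04150 04010" "ThreePathways" | parallel -j 0
```

Printing the commands first makes it easy to review what will run before handing the list to GNU Parallel.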

Note: if you are using a conda environment, run the following first to install GNU Parallel:

conda activate <your_env>
conda install -c conda-forge parallel

For more information, see the GNU Parallel documentation.

Usage

To use this script, run the following command:

chmod +x get_paths_parallel.sh
./get_paths_parallel.sh "Path_ID1 Path_ID2 ... Path_IDN" "output_folder_name" [-q]

Options

-q : Suppress output messages

Example

Running get_path.R for multiple pathways:

./get_paths_parallel.sh "04210 04150 04010" "ThreePathways"

Running get_path.R for multiple pathways with quiet mode:

./get_paths_parallel.sh "04210 04150 04010" "ThreePathways" -q

Note: -j is hardcoded to 0 inside get_paths_parallel.sh. With this value GNU Parallel runs as many jobs at once as possible rather than limiting itself to one job per CPU core. Edit this value in the script if you need to cap the number of simultaneous jobs.
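
One possible way to avoid editing the script each time is to read the job count from an environment variable. The sketch below assumes a variable named `JOBS` and a helper named `resolve_jobs`; both are hypothetical and the script does not read them today. It falls back to 0, the current hardcoded value, when nothing is set.

```shell
# Hypothetical helper: the script does not currently support this.
# Prints the job count to pass to `parallel -j`, defaulting to 0.
# Fails if the value is not a non-negative integer.
resolve_jobs() {
  jobs_value=${1:-0}
  case "$jobs_value" in
    *[!0-9]*)
      echo "job count must be a non-negative integer" >&2
      return 1
      ;;
  esac
  printf '%s\n' "$jobs_value"
}

# JOBS is an assumed environment variable name.
resolve_jobs "${JOBS:-}"

# Inside the script, the value could then be used as:
#   parallel -j "$(resolve_jobs "${JOBS:-}")" ...
```

With this in place, something like `JOBS=4 ./get_paths_parallel.sh ...` would cap the run at four simultaneous jobs.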

Dependencies

This script requires the following R packages: hipathia, igraph, dplyr, optparse.
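
A quick way to confirm that these packages are available is sketched below. The function name `check_r_packages` is hypothetical; it uses only base R's `requireNamespace` and assumes `Rscript` is on your PATH.

```shell
# Hypothetical helper, not part of the repository.
# Exits non-zero and names the missing packages if any dependency is absent.
check_r_packages() {
  Rscript \
    -e 'pkgs <- c("hipathia", "igraph", "dplyr", "optparse")' \
    -e 'missing <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]' \
    -e 'if (length(missing) > 0) { message("Missing R packages: ", paste(missing, collapse = ", ")); quit(status = 1) }'
}

if command -v Rscript >/dev/null 2>&1; then
  check_r_packages && echo "all R dependencies found"
else
  echo "Rscript not found on PATH" >&2
fi
```

Note that hipathia is distributed through Bioconductor, so it is typically installed with BiocManager rather than `install.packages`.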

License

This repository is licensed under a Creative Commons license. You are free to use, share, and adapt the content for any purpose, provided you give appropriate feedback :).

Author

Kinza Rian

Getting Started

To get started, clone this repository and explore the scripts available. Additional utilities and documentation will be added over time.
