To run the data processing procedure, make sure the following are installed and can be called from a terminal opened in the MAIN folder:
- R version ≥ 3.5.0. The data processing procedure was written with R version 4.1.1 (2021-08-10) -- "Kick Things", which is available at https://cran.r-project.org/src/base/R-4/ for macOS, https://cran.r-project.org/bin/windows/base/old/4.1.1/ for Windows and https://cran.r-project.org/doc/manuals/r-patched/R-admin.html for Unix.
- PHP, available at https://www.php.net/manual/en/install.php
To visualize and interact with the output of the data processing procedure, make sure you have installed Cytoscape version 3.9.1, available at https://cytoscape.org/download.html.
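The prerequisites above can be checked with a short script before starting. This is a minimal sketch, assuming that `R`, `Rscript` and `php` are expected on the PATH and that `R --version` prints a first line such as `R version 4.1.1 (2021-08-10) -- "Kick Things"`. The function names are hypothetical and not part of this project.

```python
import re
import shutil
import subprocess

MIN_R_VERSION = (3, 5, 0)  # minimum R version stated in this README


def parse_r_version(version_output: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch) from the output of `R --version`."""
    match = re.search(r"R version (\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("could not find an R version string")
    return tuple(int(part) for part in match.groups())


def check_prerequisites() -> dict[str, bool]:
    """Report which tools are on PATH and whether R is recent enough."""
    status = {tool: shutil.which(tool) is not None for tool in ("R", "Rscript", "php")}
    if status["R"]:
        out = subprocess.run(["R", "--version"], capture_output=True, text=True).stdout
        status["R>=3.5.0"] = parse_r_version(out) >= MIN_R_VERSION
    return status


if __name__ == "__main__":
    print(check_prerequisites())
```

Cytoscape is a desktop application and is not checked here.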
Pick a Vitis vinifera or a Homo sapiens gene and find its OneGenE expansion list.
To check whether your Vitis vinifera (Vv) gene has already been expanded, go to VvOneGenE and, under gene name(s), type the Ordered Locus Name (VIT_XXsYYYYgZZZZZ) or the gene name of your Vv gene. You can also type multiple names (space separated) to get multiple expansion lists. For example:
- Type VIT_04s0008g06000 in the gene name(s) box; this name corresponds to the transcription factor VvERF045.
- You are then redirected to the output page, where you can check the expansion list of VIT_04s0008g06000 and press the download button.
- The expansion list is saved in your Downloads folder as a zip-compressed file; extract it to obtain 54651_Vv-VIT_04s0008g06000.exp.csv. The expansion list is already annotated with additional information about the candidate genes that could be useful to a biologist.
- Move the expansion list to the MAIN folder of this project so that it can be given as input to the data processing procedure.
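The extract-and-move steps above can be scripted. This is a minimal sketch, assuming the downloaded zip contains exactly one file ending in `.exp.csv`; the function name and the zip file name in the usage line are hypothetical, since the README does not state how the downloaded archive is named.

```python
import zipfile
from pathlib import Path


def extract_vv_expansion_list(zip_path: Path, main_dir: Path) -> Path:
    """Extract the single *.exp.csv expansion list from a VvOneGenE zip into MAIN."""
    main_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        members = [name for name in archive.namelist() if name.endswith(".exp.csv")]
        if len(members) != 1:
            raise ValueError(f"expected one .exp.csv file in {zip_path}, found {members}")
        # Drop any folder prefix inside the archive and write the file into MAIN.
        target = main_dir / Path(members[0]).name
        target.write_bytes(archive.read(members[0]))
    return target
```

For example, `extract_vv_expansion_list(Path.home() / "Downloads" / "54651_Vv.zip", Path("MAIN"))` (hypothetical zip name) would place 54651_Vv-VIT_04s0008g06000.exp.csv in MAIN.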
To check whether your human gene has already been expanded, go to HsOneGenE, choose Homo sapiens (Hs) in the Organism box and leave Tile size and Iterations blank; the significance level alpha is set to 0.05 by default. Under LGN name, type the gene symbol of your Hs gene. For example:
- Type MFSD2A in the LGN name box.
- You are then redirected to the output page, where the first result contains the expansion list of MFSD2A. Click on its pcim_id (193111) and the download starts automatically.
- The expansion list is saved in your Downloads folder as a zip-compressed folder (193111_Hs.zip). Unzip it to access its content: the .interactions file (193111_Hs.interactions) is the output of NES2RA, while the .expansion file (193111_Hs.expansion) is the actual expansion list. The expansion list is not yet in its working form and must be annotated first.
- Move the .interactions file to the MAIN folder of this project, open a new terminal panel in this folder and type the following command, replacing file.interactions with the name of your file (here, 193111_Hs.interactions):
% php anno-hsf5.php file.interactions
- Your expansion list is now available in MAIN in csv format (193111_Hs_p1@MFSD2A.csv) and can be given as input to the data processing procedure.
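The human steps above (unzip, move the .interactions file to MAIN, annotate it with anno-hsf5.php) can be sketched as follows. This assumes the zip contains exactly one `.interactions` file, possibly inside a subfolder, and that anno-hsf5.php sits in MAIN; all function names are hypothetical.

```python
import subprocess
import zipfile
from pathlib import Path


def prepare_hs_interactions(zip_path: Path, main_dir: Path) -> Path:
    """Extract the single *.interactions file from a HsOneGenE zip into MAIN."""
    main_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        members = [n for n in archive.namelist() if n.endswith(".interactions")]
        if len(members) != 1:
            raise ValueError(f"expected one .interactions file, found {members}")
        target = main_dir / Path(members[0]).name
        target.write_bytes(archive.read(members[0]))
    return target


def annotation_command(interactions_file: Path) -> list[str]:
    """Build the `php anno-hsf5.php <file>.interactions` call described above."""
    return ["php", "anno-hsf5.php", interactions_file.name]


def annotate(interactions_file: Path, main_dir: Path) -> None:
    # Run from MAIN so that php finds anno-hsf5.php and the .interactions file.
    subprocess.run(annotation_command(interactions_file), cwd=main_dir, check=True)
```

Keeping the command construction separate from `subprocess.run` makes it easy to print or check the exact call before running it.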
Right now, the file fantom_mat.csv is a placeholder for the actual FANTOM-full transcriptomic dataset, a gene@home version of the FANTOM5 transcriptomic dataset. This placeholder must be replaced with the real dataset for the data processing procedure to work. The FANTOM-full transcriptomic dataset can be downloaded from the Human OneGenE download page. Extract the downloaded file, rename it fantom_mat.csv and place it in the MAIN folder.
Right now, the file vespucci_mat.csv is a placeholder for the actual VESPUCCI transcriptomic dataset. This placeholder must be replaced with the real dataset for the data processing procedure to work. The VESPUCCI transcriptomic dataset can be downloaded from the Vitis OneGenE download page. Extract the downloaded file, rename it vespucci_mat.csv and place it in the MAIN folder.
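Replacing the placeholders can be sketched as below. The README does not specify the archive format of the downloads, so this sketch assumes either a `.zip` holding exactly one `.csv` file or a single gzip-compressed file (`.gz`); the function name is hypothetical.

```python
import gzip
import shutil
import zipfile
from pathlib import Path

# Placeholder file names expected in MAIN, keyed by organism type.
PLACEHOLDERS = {"Hs": "fantom_mat.csv", "Vv": "vespucci_mat.csv"}


def install_dataset(archive: Path, organism: str, main_dir: Path) -> Path:
    """Extract a downloaded dataset and overwrite the matching placeholder in MAIN."""
    main_dir.mkdir(parents=True, exist_ok=True)
    target = main_dir / PLACEHOLDERS[organism]
    if archive.suffix == ".zip":
        with zipfile.ZipFile(archive) as zf:
            csvs = [n for n in zf.namelist() if n.endswith(".csv")]
            if len(csvs) != 1:
                raise ValueError(f"expected one .csv in {archive}, found {csvs}")
            # Stream the member so large matrices are not loaded into memory at once.
            with zf.open(csvs[0]) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
    elif archive.suffix == ".gz":
        with gzip.open(archive, "rb") as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
    else:
        raise ValueError(f"unsupported archive type: {archive.suffix}")
    return target
```

If the actual downloads use another format, only the extraction branch needs to change; the renaming to fantom_mat.csv or vespucci_mat.csv stays the same.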
To run the data processing procedure, make sure you have a terminal panel open in the MAIN folder and type the following command:
% Rscript install_packages.R
In this way, all the necessary R packages are installed, if not already present.
Next, you can type the actual command that runs the data processing procedure:
% Rscript --vanilla data_processing_procedure.R explist.csv organism_type n
The arguments you can provide are:
- explist.csv, which corresponds to the annotated expansion list of the gene under investigation. For Vv gene VIT_04s0008g06000 it is 54651_Vv-VIT_04s0008g06000.exp.csv and for Hs gene MFSD2A, it is 193111_Hs_p1@MFSD2A.csv (as explained in input-preparation);
- organism_type, the organism to which the gene under investigation belongs (Vv or Hs, as in the examples below);
- n, which is either the number of genes to select from the top of the expansion list (make sure that n is not greater than the expansion list length), or a relative frequency threshold used to cut the expansion list, keeping only the candidate genes with relative frequency >= n (0 < n <= 1).
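The two meanings of n can be illustrated with a small selection function. This is a sketch of the rule as described above, not the R code of data_processing_procedure.R. It assumes the rows are already sorted by rank and carry the relative frequency under a key named "Frel" (the column name used in gene_nodes.csv; the expansion list column name is an assumption). Following the stated range 0 < n <= 1, n = 1 is treated as a threshold here.

```python
def cut_expansion_list(rows: list[dict], n: float) -> list[dict]:
    """Select candidate genes from a rank-ordered expansion list using n."""
    if 0 < n <= 1:
        # Relative frequency threshold: keep genes with Frel >= n.
        return [row for row in rows if float(row["Frel"]) >= n]
    if n > 1 and float(n).is_integer():
        count = int(n)
        if count > len(rows):
            raise ValueError(f"n={count} exceeds the expansion list length ({len(rows)})")
        # Top-n cut: keep the first n genes of the ranked list.
        return rows[:count]
    raise ValueError("n must be in (0, 1] or an integer greater than 1")
```

With a list whose Frel values are 0.99, 0.8 and 0.5, n = 0.7 keeps the first two genes, and so does n = 2.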
Here are some examples:
% Rscript --vanilla data_processing_procedure.R 54651_Vv-VIT_04s0008g06000.exp.csv Vv 0.7
% Rscript --vanilla data_processing_procedure.R 54651_Vv-VIT_04s0008g06000.exp.csv Vv 150
% Rscript --vanilla data_processing_procedure.R 193111_Hs_p1@MFSD2A.csv Hs 0.5
% Rscript --vanilla data_processing_procedure.R 193111_Hs_p1@MFSD2A.csv Hs 200
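To launch the examples above from a script, the command can be assembled with some basic argument checks first. This is a hypothetical wrapper; it assumes that Vv and Hs are the only accepted organism_type values (as in the examples) and applies the range rules for n described above.

```python
import subprocess
from pathlib import Path

VALID_ORGANISMS = {"Vv", "Hs"}  # assumed from the examples in this README


def build_command(explist: str, organism: str, n: str) -> list[str]:
    """Assemble the documented Rscript call after checking the arguments."""
    if organism not in VALID_ORGANISMS:
        raise ValueError(f"organism_type must be one of {sorted(VALID_ORGANISMS)}")
    value = float(n)  # raises ValueError if n is not numeric
    if value <= 0 or (value > 1 and not value.is_integer()):
        raise ValueError("n must be in (0, 1] or an integer greater than 1")
    return ["Rscript", "--vanilla", "data_processing_procedure.R", explist, organism, n]


def run(explist: str, organism: str, n: str, main_dir: Path = Path(".")) -> None:
    # Run from MAIN, where the R script and the expansion list are expected.
    subprocess.run(build_command(explist, organism, n), cwd=main_dir, check=True)
```

For instance, `run("193111_Hs_p1@MFSD2A.csv", "Hs", "0.5", Path("MAIN"))` corresponds to the third example.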
Note: the expansion lists of NFKB1 and TNF, used in the biological validation in the paper, are provided for testing.
The data processing procedure has two output files:
- a list of edges (gene_edges.csv), representing the interactions retrieved by pc_parallel() between the surviving input gene nodes. Each edge has a source and a target, together with the direction of their interaction: --- if undirected, --> if directed. The file also provides the Pearson correlation (cor) between the two genes, computed as the zero-order conditional independence test, along with its sign (cor_sign):
source | interaction | target | cor | cor_sign |
---|---|---|---|---|
T178190 | --- | T009518 | 0.455 | + |
T032201 | --> | T054717 | 0.699 | + |
- a list of nodes (gene_nodes.csv), representing the input gene nodes that survived the pc_parallel() application and for which an interaction was found in the output graph. Additional information, extracted from the human and grapevine annotation files, is added to support the biological interpretation of the results:
ID | association_with_transcript | entrezgene_id | hgnc_id | uniprot_id | description | rank | Frel | type |
---|---|---|---|---|---|---|---|---|
T178190 | p4@CDK6 | CDK6 | 1777 | Q00534 | cyclin dependent kinase 6 | 14 | 0.996 | gene with protein product |
These two files are saved in the Vv folder inside the MAIN folder if a grapevine expansion list was given as input to the data processing procedure, or in the Hs folder inside the MAIN folder if a human expansion list was given as input.
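Outside Cytoscape, the two output files can be inspected together by attaching node descriptions to each edge. This is a minimal sketch, assuming both files are comma-separated with header rows matching the tables above (source, interaction, target, ...; ID, description, ...); the function name is hypothetical.

```python
import csv
from pathlib import Path


def load_network(result_dir: Path) -> list[dict]:
    """Join gene_edges.csv with the descriptions in gene_nodes.csv, one dict per edge."""
    with open(result_dir / "gene_nodes.csv", newline="") as fh:
        nodes = {row["ID"]: row for row in csv.DictReader(fh)}
    edges = []
    with open(result_dir / "gene_edges.csv", newline="") as fh:
        for row in csv.DictReader(fh):
            # --> marks a directed edge, --- an undirected one.
            row["directed"] = row["interaction"] == "-->"
            row["source_description"] = nodes.get(row["source"], {}).get("description", "")
            row["target_description"] = nodes.get(row["target"], {}).get("description", "")
            edges.append(row)
    return edges
```

For example, `load_network(Path("MAIN") / "Hs")` would read the files produced for a human expansion list.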
To visualize the pc_parallel() output graph in Cytoscape, follow these steps:
- Open Cytoscape and allow the app to accept incoming network connections;
- Select the network icon from the main horizontal toolbar, which stands for Import Network from File System (or from File -> Import -> Network from file) and select the gene_edges.csv file from the Vv folder or Hs folder inside MAIN;
- Click the OK button in the Import Network from Table panel and wait for the network to load;
- Select the table icon from the main horizontal toolbar, which stands for Import Table from File (or from File -> Import -> Table from file) and select the gene_nodes.csv file from the Vv folder or Hs folder inside MAIN;
- Click the OK button in the Import Columns from Table panel and wait for the table to load;
- The Vv and Hs folders contain, respectively, a Vv_style.xml and a Hs_style.xml file that can be loaded into Cytoscape to customize the network appearance (File -> Import -> Styles from file...). Styles are managed in the Style panel (under Network in the main vertical toolbar), where you can select the loaded style to display the network in a more readable and enriched way.