jingxuyy/PDATC-NCPMKL: In this study, a new recommendation system (named PDATC-NCPMKL), which incorporated network consistency projection and multi-kernel learning, was designed to identify drug-ATC code associations.

PDATC-NCPMKL: Predicting drug’s Anatomical Therapeutic Chemical (ATC) codes based on network consistency projection and multiple kernel learning

This code is an implementation of our paper "PDATC-NCPMKL: Predicting drug’s Anatomical Therapeutic Chemical (ATC) codes based on network consistency projection and multiple kernel learning".

We proposed PDATC-NCPMKL, a model based on multiple kernel learning and the network consistency projection algorithm. By integrating multi-source drug information (drug target proteins, drug side effects, drug interactions, drug fingerprints, and drug-ATC code associations), we constructed several drug kernels; the ATC code kernels were built in the same way. A multiple kernel learning algorithm and a kernel integration scheme then fused these kernels into one unified drug kernel and one unified ATC code kernel. In addition, the drug-ATC code association adjacency matrix was reformulated by a variant of weighted K nearest known neighbors (WKNKN). The fused kernels and the reformulated matrix were fed into the network consistency projection to generate the association score matrix. The model was tested on ATC codes at the second, third, and fourth levels using ten-fold cross-validation. For detailed descriptions of the model and results, please refer to our article.
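The final scoring step can be illustrated with a short NumPy sketch of network consistency projection as it is usually formulated: each drug's similarity row is projected onto each ATC code's association column, each ATC code's similarity column is projected onto each drug's association row, and the two projections are summed and divided by the lengths of the two similarity vectors. The function ncp_scores and its argument names are hypothetical, and the repository's implementation may differ in details such as how empty rows or columns are handled.

```python
import numpy as np

def ncp_scores(drug_sim, atc_sim, assoc):
    """Network consistency projection (NCP) score matrix (sketch).

    drug_sim: (n, n) drug kernel; atc_sim: (m, m) ATC code kernel;
    assoc: (n, m) drug-ATC code matrix (0/1 or WKNKN-reformulated).
    """
    drug_sim = np.asarray(drug_sim, dtype=float)
    atc_sim = np.asarray(atc_sim, dtype=float)
    assoc = np.asarray(assoc, dtype=float)

    # Drug-space projection: S_d[i, :] . Y[:, j] / |Y[:, j]|
    col_norm = np.linalg.norm(assoc, axis=0)
    drug_proj = drug_sim @ assoc
    drug_proj = np.divide(drug_proj, col_norm[None, :],
                          out=np.zeros_like(drug_proj), where=col_norm[None, :] > 0)

    # ATC-space projection: Y[i, :] . S_a[:, j] / |Y[i, :]|
    row_norm = np.linalg.norm(assoc, axis=1)
    atc_proj = assoc @ atc_sim
    atc_proj = np.divide(atc_proj, row_norm[:, None],
                         out=np.zeros_like(atc_proj), where=row_norm[:, None] > 0)

    # Normalise by |S_d[i, :]| + |S_a[:, j]|; empty projections stay 0
    denom = (np.linalg.norm(drug_sim, axis=1)[:, None]
             + np.linalg.norm(atc_sim, axis=0)[None, :])
    return np.divide(drug_proj + atc_proj, denom,
                     out=np.zeros_like(drug_proj), where=denom > 0)
```

With non-negative kernels, each projection is bounded by the length of its similarity vector (Cauchy-Schwarz), so the scores fall in [0, 1]; higher scores indicate more likely drug-ATC code associations.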

Requirements

python = 3.8
cvxopt = 1.3.0
cvxpy = 1.2.0
ecos = 2.0.10
fastcache = 1.1.0
numpy+mkl = 1.22.4
osqp = 0.6.2
pandas = 1.3.5
scikit-learn = 1.0.2
scipy = 1.7.3
scs = 3.2.0
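To confirm that an environment matches the versions listed above, a small check such as the following can help. It is a sketch that assumes numpy+mkl reports itself as the distribution numpy, possibly with a local version suffix such as +mkl, which the check ignores; the helper name version_mismatches is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions from the Requirements list; numpy+mkl is checked as "numpy" (assumption).
REQUIRED = {
    "cvxopt": "1.3.0",
    "cvxpy": "1.2.0",
    "ecos": "2.0.10",
    "fastcache": "1.1.0",
    "numpy": "1.22.4",
    "osqp": "0.6.2",
    "pandas": "1.3.5",
    "scikit-learn": "1.0.2",
    "scipy": "1.7.3",
    "scs": "3.2.0",
}

def version_mismatches(required=REQUIRED):
    """Return {package: (required, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, wanted in required.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        # Ignore local version labels such as "+mkl" when comparing
        base = installed.split("+")[0] if installed else None
        if base != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches

if __name__ == "__main__":
    for name, (wanted, installed) in version_mismatches().items():
        print(f"{name}: required {wanted}, installed {installed}")
```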

Files:

File	Description
new_2930_second_ATC.csv	The adjacency matrix of drugs and ATC codes at the second level.
new_2930_third_ATC.csv	The adjacency matrix of drugs and ATC codes at the third level.
new_2930_fourth_ATC.csv	The adjacency matrix of drugs and ATC codes at the fourth level.
2930_fingerprint.csv	Drug representations based on fingerprints.
side_effects.csv	Drug representations based on side effects.
uniprot.csv	Drug representations based on target proteins.
interaction_kernel.csv	The drug kernel using the interaction information collected in STITCH.

Drug SMILES information, drug ATC code information, and drug target protein information were obtained from the DrugBank database (https://go.drugbank.com/), drug side effect information was obtained from the SIDER database (http://sideeffects.embl.de/), and drug interaction information was obtained from the STITCH website (http://stitch4.embl.de/).
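The binary representation files (fingerprints, side effects, target proteins) must be turned into drug kernels before they can be fused. The sketch below assumes Jaccard (Tanimoto) similarity for that step and, in place of the paper's multiple kernel learning algorithm, uses a simpler, well-known heuristic: each kernel is weighted by its centered kernel alignment with the ideal kernel Y @ Y.T built from the drug-ATC adjacency matrix Y, and the weights are normalised to sum to one. Neither choice is taken from the paper, and all function names are hypothetical.

```python
import numpy as np

def jaccard_kernel(X):
    """Pairwise Jaccard (Tanimoto) similarity between the binary rows of X."""
    X = (np.asarray(X) > 0).astype(float)
    inter = X @ X.T                                     # |a AND b|
    counts = X.sum(axis=1)
    union = counts[:, None] + counts[None, :] - inter   # |a OR b|
    K = np.divide(inter, union, out=np.zeros_like(inter), where=union > 0)
    np.fill_diagonal(K, 1.0)  # convention: every drug is fully similar to itself
    return K

def center(K):
    """Center a kernel matrix: H K H with H = I - 11^T / n."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def alignment(K1, K2):
    """Centered kernel alignment <K1c, K2c>_F / (|K1c|_F |K2c|_F)."""
    K1c, K2c = center(K1), center(K2)
    denom = np.linalg.norm(K1c) * np.linalg.norm(K2c)
    # Treat (near-)constant kernels as uninformative
    return float(np.sum(K1c * K2c) / denom) if denom > 1e-12 else 0.0

def fuse_kernels(kernels, assoc):
    """Convex combination of kernels weighted by alignment with Y @ Y.T."""
    kernels = [np.asarray(K, dtype=float) for K in kernels]
    Y = np.asarray(assoc, dtype=float)
    ideal = Y @ Y.T  # drugs sharing many ATC codes are "ideally" similar
    w = np.array([max(alignment(K, ideal), 0.0) for K in kernels])
    w = w / w.sum() if w.sum() > 0 else np.full(len(kernels), 1.0 / len(kernels))
    fused = sum(wi * K for wi, K in zip(w, kernels))
    return fused, w
```

Passing the transposed adjacency matrix (assoc.T) together with ATC code kernels would weight those kernels in the same way.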

Usage

How to use it?

1. Use the data set we provide

1.1 Cross-validation

If you use our dataset for cross-validation, all you need to do is enter the following command in the terminal:

 python main.py
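Running main.py evaluates the model with k-fold cross-validation. How the repository partitions the data is not described in this README; as one common option, the sketch below splits the known drug-ATC associations into k folds, hides each fold in turn by setting it to 0, and scores the hidden pairs together with all unknown pairs. A drug-wise split (holding out whole drugs) is another common design. The generator name association_folds and its seed argument are hypothetical.

```python
import numpy as np

def association_folds(assoc, k=10, seed=0):
    """Yield (train_matrix, test_mask) pairs for k-fold CV over known associations.

    Each fold hides one k-th of the known (value 1) pairs by zeroing them in the
    training matrix; test_mask marks those hidden pairs plus every unknown pair.
    """
    assoc = np.asarray(assoc)
    pos = np.argwhere(assoc == 1)          # (row, col) of every known association
    rng = np.random.default_rng(seed)
    rng.shuffle(pos)                       # shuffles the pairs (axis 0) in place
    for fold in np.array_split(pos, k):
        train = assoc.copy()
        train[fold[:, 0], fold[:, 1]] = 0  # hide this fold's associations
        test_mask = assoc == 0             # candidate negatives
        test_mask[fold[:, 0], fold[:, 1]] = True
        yield train, test_mask
```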

1.2 Modify model parameters

You just need to adjust the following code in the main.py file.

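if __name__ == "__main__":
    # Adjacency matrix for the ATC level to be predicted
    drug_atc_path = 'data/drug_ATC/new_2930_fourth_ATC.csv'
    # level must match the file above; omega is the WKNKN parameter
    op = Options(drug_atc_path=drug_atc_path, level=4, omega=0.9)
    # k-fold cross-validation (k=10 in our study)
    op.train(k=10)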

drug_atc_path is the path of the CSV file storing the drug-ATC code adjacency matrix.
The parameter level is the ATC code level; it can be 2, 3, or 4 and should match the file in drug_atc_path (e.g. level=4 for new_2930_fourth_ATC.csv).
The parameter omega is the WKNKN parameter used when reformulating the adjacency matrix. It can be any number between 0.0 and 1.0.
The parameter k is the number of folds in cross-validation. k was set to 10 in our study.
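The README describes omega only as the WKNKN parameter. For orientation, the sketch below implements the standard WKNKN update from the drug-target literature: for every drug, the association rows of its k most similar drugs that have at least one known association are averaged with weights eta^(r-1) * similarity (r is the neighbour's rank) and divided by the sum of those similarities; the same is done for ATC codes, the two estimates are averaged, and known associations are never lowered. Reading omega as the decay term eta is an assumption, and the repository uses its own variant of WKNKN described in the paper; the names wknkn, k, and eta are hypothetical.

```python
import numpy as np

def _one_side(Y, S, k, eta):
    """Row-wise WKNKN estimate from the k most similar *known* rows."""
    known = np.flatnonzero(Y.sum(axis=1) > 0)    # rows with at least one association
    Y_hat = np.zeros_like(Y, dtype=float)
    for i in range(Y.shape[0]):
        cand = known[known != i]                 # exclude the row itself
        if cand.size == 0:
            continue
        # k nearest known neighbours, most similar first
        order = cand[np.argsort(-S[i, cand], kind="stable")][:k]
        sims = S[i, order]
        z = sims.sum()
        if z <= 0:
            continue
        weights = eta ** np.arange(order.size) * sims   # eta^(r-1) * S(i, n_r)
        Y_hat[i] = weights @ Y[order] / z
    return Y_hat

def wknkn(Y, S_drug, S_atc, k=5, eta=0.9):
    """Reformulate a drug-ATC adjacency matrix with standard WKNKN (sketch)."""
    Y = np.asarray(Y, dtype=float)
    Y_d = _one_side(Y, np.asarray(S_drug, dtype=float), k, eta)     # drug side
    Y_a = _one_side(Y.T, np.asarray(S_atc, dtype=float), k, eta).T  # ATC side
    return np.maximum(Y, (Y_d + Y_a) / 2)  # keep every known association
```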

2. Use your own data set

2.1 Preprocessed data set

Prepare the following files, all in CSV format. Their formats are shown below:

1. The adjacency matrix of drug-ATC code associations

DrugBankID	code1	code2	code3	code4	...	codem
drugID1	0	1	1	0	...	0
drugID2	1	0	0	1	...	0
drugID3	1	1	0	0	...	1
...	...	...	...	...	...	...
drugIDn	0	0	1	1	...	0

2. Drug fingerprints matrix

DrugBankID	F1	F2	F3	F4	...
drugID1	0	1	1	0	...
drugID2	1	0	0	1	...
drugID3	1	1	0	0	...
...	...	...	...	...	...
drugIDn	0	0	1	1	...

3. Drug interaction kernel

DrugBankID	drugID1	drugID2	drugID3	drugID4	...	drugIDn
drugID1	1	0.3	0	0.75	...	0.33
drugID2	0.3	1	0.9	0.22	...	0.68
drugID3	0	0.9	1	0	...	0.47
drugID4	0.75	0.22	0	1	...	0.92
...	...	...	...	...	...	...
drugIDn	0.33	0.68	0.47	0.92	...	1

4. Drug side effects matrix

DrugBankID	side1	side2	side3	side4	...
drugID1	1	0	0	1	...
drugID2	1	1	0	0	...
drugID3	0	0	0	1	...
...	...	...	...	...	...
drugIDn	0	0	1	1	...

5. Drug target proteins matrix

DrugBankID	target1	target2	target3	target4	...
drugID1	0	1	0	1	...
drugID2	1	0	0	0	...
drugID3	0	0	1	1	...
...	...	...	...	...	...
drugIDn	0	0	1	0	...

6. ATC code shortest-path matrix. Because the model uses shortest paths in the ATC code tree, and different data sets contain different ATC codes, you need to prepare a shortest-path file for the ATC codes at each level you use. It is also in CSV format, as shown below:

	ATCcode1	ATCcode2	ATCcode3	ATCcode4	...	ATCcodem
ATCcode1	0	2	2	4	...	4
ATCcode2	2	0	2	8	...	6
ATCcode3	2	2	0	4	...	8
ATCcode4	4	8	4	0	...	2
...	...	...	...	...	...	...
ATCcodem	4	6	8	2	...	0

Put this file in the PDATC-NCPMKL/shortest_path/ folder and give it the same file name as the corresponding provided file (for example, the second-level ATC code file is named new_2ATC_shortest_path_length_matrix.csv).
In addition, to ensure that the SPro kernel matrix is calculated correctly, make sure the ATC codes appear in the same order in this file as in the drug-ATC code adjacency matrix.
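If you need to create this file for your own ATC codes, the sketch below builds a shortest-path matrix from the code strings alone, in a given code order. It assumes the tree consists of the ATC levels (A, A10, A10B, A10BA, ...) under a single virtual root above the 14 first-level groups, so two fourth-level codes are at most 8 edges apart; check the result against the provided files before relying on it. The function names are hypothetical.

```python
import numpy as np
import pandas as pd

# Length of an ATC code at each level (A -> 1, A10 -> 2, A10B -> 3, A10BA -> 4)
LEVEL_LENGTH = {1: 1, 2: 3, 3: 4, 4: 5}

def atc_tree_distance(a, b, level):
    """Edges between two same-level ATC codes, assuming a virtual root node."""
    shared = 0
    for lvl in range(1, level + 1):
        n = LEVEL_LENGTH[lvl]
        if a[:n] != b[:n]:
            break
        shared = lvl  # depth of the deepest common ancestor so far
    return 2 * (level - shared)

def shortest_path_matrix(codes, level):
    """Square shortest-path matrix with rows and columns in the given code order."""
    codes = list(codes)
    dist = np.array([[atc_tree_distance(a, b, level) for b in codes] for a in codes])
    return pd.DataFrame(dist, index=codes, columns=codes)
```

To keep the required order, take the codes from the header of your adjacency matrix, for example codes = pd.read_csv(drug_atc_path, nrows=0).columns[1:], and write the result with shortest_path_matrix(codes, level).to_csv(path), where path is the file name required above.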

2.2 Cross-validation

You just need to modify the following code in the main.py file to run it:

def file_path(self):
    # Paths to your own CSV files (formats 2-5 above)
    drug_fingerprint_path = 'your own drug fingerprint file path'
    drug_side_effects_path = 'your own drug side effect file path'
    drug_target_protein_path = 'your own drug target protein file path'
    drug_interaction_path = 'your own drug interaction file path'
    return drug_fingerprint_path, drug_side_effects_path, drug_target_protein_path, drug_interaction_path

if __name__ == "__main__":
    # Your drug-ATC code adjacency matrix (format 1 above); set level to its ATC level
    drug_atc_path = 'your drug-ATC code adjacency matrix file path'
    op = Options(drug_atc_path=drug_atc_path, level=4, omega=0.9)
    op.train(k=10)
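Before a long cross-validation run, it can be worth checking that your files line up. The README only requires consistent ordering for the ATC codes; the sketch below additionally assumes that every drug table should list the same DrugBank IDs in the same order as the adjacency matrix, and that the interaction kernel is symmetric with a unit diagonal (as in the example format). check_inputs is a hypothetical helper, not part of main.py.

```python
import numpy as np
import pandas as pd

def check_inputs(adjacency, features, interaction_kernel):
    """Return a list of problems found in user-supplied input tables.

    adjacency: drug x ATC DataFrame; features: dict name -> drug x feature DataFrame;
    interaction_kernel: drug x drug DataFrame. All indexed by DrugBankID.
    """
    problems = []
    drugs = list(adjacency.index)
    if not adjacency.isin([0, 1]).all().all():
        problems.append("adjacency matrix: entries other than 0 and 1")
    # Assumption: drug rows are matched by position, so IDs and order must agree
    for name, table in features.items():
        if list(table.index) != drugs:
            problems.append(f"{name}: drug rows differ from the adjacency matrix (IDs or order)")
    K = interaction_kernel
    if list(K.index) != drugs or list(K.columns) != drugs:
        problems.append("interaction kernel: rows/columns differ from the adjacency matrix")
    values = K.to_numpy(dtype=float)
    if values.shape[0] == values.shape[1]:
        if not np.allclose(values, values.T):
            problems.append("interaction kernel: not symmetric")
        if not np.allclose(np.diag(values), 1.0):
            problems.append("interaction kernel: diagonal is not all 1")
    return problems
```

Load each file with pd.read_csv(path, index_col=0) before passing it in. A misordered table can be fixed with table.reindex(adjacency.index); drugs missing from that table become NaN and need to be filled (for example with fillna(0)).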

The results predicted by the model

After the model runs, two files are generated: PDATC-NCPMKL_predict.csv, which stores the predicted scores, and PDATC-NCPMKL_actual.csv, which stores the actual values.
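The two output files can be summarised with scikit-learn, which is already a requirement. This sketch assumes both files hold plain numeric matrices of the same shape with no header row or index column (adjust header and index_col in read_csv if they do), and it computes one global AUC and AUPR over all drug-ATC pairs; the paper may average its metrics differently. Note that average_precision_score is a step-wise estimate of the area under the PR curve and can differ slightly from a trapezoidal AUPR. The function names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import average_precision_score, roc_auc_score

def auc_aupr(labels, scores):
    """Global ROC AUC and average precision over all entries."""
    y = np.asarray(labels, dtype=float).ravel()
    s = np.asarray(scores, dtype=float).ravel()
    if y.shape != s.shape:
        raise ValueError("predicted and actual matrices must have the same shape")
    return roc_auc_score(y, s), average_precision_score(y, s)

def evaluate_files(predict_csv="PDATC-NCPMKL_predict.csv",
                   actual_csv="PDATC-NCPMKL_actual.csv"):
    # Assumption: plain numeric CSVs without header row or index column
    scores = pd.read_csv(predict_csv, header=None).to_numpy(dtype=float)
    labels = pd.read_csv(actual_csv, header=None).to_numpy(dtype=float)
    return auc_aupr(labels, scores)
```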

Result

The PR curves and ROC curves obtained with our model on the dataset are shown below:

The PR curves
The ROC curves

