An up-to-date list of papers and code relating to progress in the burgeoning field of AI-Drug Discovery

aced125/AI_in_Drug_Discovery_Progress

AI in Drug Discovery Progress

This repository contains an up-to-date list (as of September 2019) of progress (papers, GitHub repositories, etc.) made in applying AI to drug discovery.

2019

General

| Lead Author | Group | Title | Citations | Code |
|---|---|---|---|---|
| Brown | BenevolentAI | GuacaMol: Benchmarking Models for de Novo Molecular Design | 13 | Yes |
| Mater | Coote | Deep Learning in Chemistry | 6 | - |
| Preuer | Unterthiner | Interpretable Deep Learning in Drug Discovery | 6 | - |
| Krenn | Aspuru-Guzik | SELFIES: a robust representation of semantically constrained graphs with an example application in chemistry | - | Yes |

Brown, Fiscato, Segler, Vaucher

A set of benchmarks used to assess the quality of generative models. Benchmarks divided into distribution-learning, goal-directed, and assessment of compound quality.
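To make the distribution-learning idea concrete, here is a minimal sketch of two quantities commonly reported for generative models: uniqueness and novelty. It assumes the molecules are already valid, canonical SMILES strings (checking validity would need a cheminformatics toolkit), and the function names are hypothetical rather than part of the GuacaMol code.

```python
def uniqueness(generated):
    """Fraction of generated molecules that are distinct."""
    if not generated:
        return 0.0
    return len(set(generated)) / len(generated)


def novelty(generated, training_set):
    """Fraction of distinct generated molecules that are absent from the training set."""
    distinct = set(generated)
    if not distinct:
        return 0.0
    return len(distinct - set(training_set)) / len(distinct)


# Toy data: assumed to be canonical SMILES strings.
training = ["CCO", "CCN", "c1ccccc1"]
samples = ["CCO", "CCC", "CCC", "CCCl"]

print(uniqueness(samples))         # 3 distinct molecules out of 4 samples
print(novelty(samples, training))  # 2 of the 3 distinct molecules are new
```

A model that simply memorises its training set would score high on uniqueness but zero on novelty, which is why the two are reported together.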

Mater, Coote

An up-to-date, accurate and comprehensive review of where deep learning currently is with respect to chemistry.

Preuer, Klambauer, Rippmann, Hochreiter, Unterthiner

Describes a method to a) identify the parts of a molecule that give rise to biological activity and b) construct pharmacophores from graph convolutional neural networks.

Krenn, Hase, Nigam, Friederich, Aspuru-Guzik

Alternative representation to SMILES, which seems to perform better at reconstruction accuracy in generative tests.

Generative

| Lead Author | Group | Title | Citations | Code |
|---|---|---|---|---|
| Zhavoronkov | Insilico | Deep learning enables the rapid identification of potent DDR1 kinase inhibitors | 1 | Yes |
| Jensen | Jensen | A graph-based genetic algorithm and generative model/Monte Carlo tree search for the exploration of chemical space. | - | Yes |

Zhavoronkov et al

This paper attracted considerable press coverage. Uses deep RL to optimize for properties in a GRU-encoded latent space.

Jensen

Genetic algorithm that performs well on GuacaMol benchmarks.

Predictive

| Lead Author | Group | Title | Citations | Code |
|---|---|---|---|---|
| Wang | Huang | SMILES-BERT: Large Scale Unsupervised Pre-Training for Molecular Property Prediction | - | - |
| Cortes-Ciriano | Bender | KekuleScope: prediction of cancer cell line sensitivity and compound potency using convolutional neural networks trained on compound images | 2 | Yes |
| Lee | Lee | Ligand biological activity predicted by cleaning positive and negative chemical correlations | 2 | Yes |
| Withnall | Chen | Attention and Edge Memory Convolution for Bioactivity Prediction | - | - |

Withnall, Lindelöf, Engkvist, Chen

One of the first few examples of neural attention being used in drug discovery.

Wang, Guo, Wang, Sun, Huang

Taking inspiration from the recent monumental progress in NLP, Wang applies Google's language-model ideas to massive amounts of chemical data.

Cortes-Ciriano, Bender

Using modern CNN architectures and transfer learning from ImageNet to predict activity from RDKit-rendered skeletal structures of the ligand.

Lee, Yang, Bassyouni, Butler, Hou, Jenkinson, Price

Lee's original RMT (random matrix theory) algorithm is extended to incorporate information from inactive compounds.

Reaction Prediction and Retrosynthesis

| Lead Author | Group | Title | Citations | Code |
|---|---|---|---|---|
| Schwaller | Lee | Molecular Transformer: A Model for Uncertainty-Calibrated Chemical Reaction Prediction | 6 | Yes |
| Lee | Lee | Molecular Transformer unifies reaction prediction and retrosynthesis across pharma chemical space | - | - |

Schwaller, Laino, Gaudin, Bolgar, Hunter, Bekas, Lee

Uses the latest cutting-edge research from the NLP community (transformer networks), viewing reaction prediction as a machine translation problem. Word1/Molecule1 + Word2/Molecule2 ----translates_to----> Word3/Molecule3
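Treating reactions as translation means the reactant and product SMILES strings first have to be split into "words". The sketch below shows one way this could look. The token pattern is a simplified assumption for illustration and is not necessarily the one used in the paper.

```python
import re

# Simplified SMILES token pattern (an assumption for illustration):
# bracket atoms, two-letter halogens, two-digit ring closures, then any single character.
TOKEN_PATTERN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize(smiles):
    """Split a SMILES string into a list of tokens."""
    return TOKEN_PATTERN.findall(smiles)


# Source "sentence": reactants separated by '.'; target "sentence": the product.
reactants = "CC(=O)O.OCC"   # acetic acid + ethanol
product = "CC(=O)OCC"       # ethyl acetate

source_tokens = tokenize(reactants)
target_tokens = tokenize(product)

print(" ".join(source_tokens))
print(" ".join(target_tokens))
```

A sequence-to-sequence model would then be trained to map the source token sequence to the target token sequence, exactly as in machine translation.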

Lee, Yang, Sresht, Bolgar, Hou, Klug-McLeod, Butler

Applies the above technology to retrosynthesis

2018

General

| Lead Author | Group | Title | Citations | Code |
|---|---|---|---|---|
| Wu | Pande | MoleculeNet: a benchmark for molecular machine learning | 208 | DeepChem |
| Chen | Blaschke | The rise of deep learning in drug discovery | 169 | - |
| O'Boyle | NextMove | DeepSMILES: An Adaptation of SMILES for Use in Machine-Learning of Chemical Structures | 5 | Yes |

Chen, Engkvist, Wang, Olivecrona, Blaschke

A review on the latest developments in the field.

Wu, Ramsundar, Feinberg, Gomes, Geniesse, Pappu, Leswing, Pande

The Pande group curated a set of datasets to assess the quality of a machine learning model on chemistry/drug discovery/molecular problems.

Generative

| Lead Author | Group | Title | Citations | Code |
|---|---|---|---|---|
| Segler | Benevolent | Generating Focused Molecule Libraries for Drug Discovery with Recurrent Neural Networks | 188 | Yes |
| Jin | Jaakkola | Junction Tree Variational Autoencoder for Molecular Graph Generation. | 93 | Yes |
| Popova | Tropsha | Deep reinforcement learning for de novo drug design | 82 | Yes |

Segler, Kogej, Tyrchan, Waller

A straightforward example of the application of RNNs (specifically LSTMs) to generation of molecules and exploration of chemical space, using SMILES as input featurization.

Jin, Barzilay, Jaakkola

Featurizes molecules based on their component fragments. Large improvement over other featurizations in many tasks.

Popova, Isayev, Tropsha

Illustration of the use of deep reinforcement learning to generate molecules with a bias towards certain properties (bioactivity, logP etc). Features the use of a Stack Neural Network for encoding, introduced by Facebook researchers recently.

Retrosynthesis and Reaction Prediction

| Lead Author | Group | Title | Citations | Code |
|---|---|---|---|---|
| Segler | Waller | Planning chemical syntheses with deep neural networks and symbolic AI | 216 | - |

Segler, Waller

Nature paper on automated retrosynthesis. Clever data augmentation methods to generate more negative data.

2017

General

| Lead Author | Group | Title | Citations | Code |
|---|---|---|---|---|
| Goh | Vishnu | Deep Learning for Computational Chemistry | 157 | - |
| Wallach | Atomwise | Most Ligand-Based Classification Benchmarks Reward Memorization Rather than Generalization | 20 | - |
| Axen | Keiser | A Simple Representation of Three-Dimensional Molecular Structure | 9 | Yes |

Goh, Hodas, Vishnu

A review of the current use cases for deep learning in computational chemistry.

Wallach, Heifets

Introduction of the Asymmetric Validation Embedding (AVE) bias to better assess the domain of applicability of a machine learning model.

Axen, Huang, Caceres, Gendelev, Roth, Keiser

First introduction of the concept of a 3D fingerprint. Performs moderately in benchmarks.

Predictive

| Lead Author | Group | Title | Citations | Code |
|---|---|---|---|---|
| Gilmer | Google | Neural Message Passing for Quantum Chemistry | 522 | - |
| Altae-Tran | Pande | Low Data Drug Discovery with One-Shot Learning | 166 | DeepChem |
| Faber | Lilienfeld/Google | Prediction Errors of Molecular Machine Learning Models Lower than Hybrid DFT Error | 143 | - |
| Goh | Vishnu | Chemception: A Deep Neural Network with Minimal Chemistry Knowledge Matches the Performance of Expert-developed QSAR/QSPR Models | 44 | - |
| Kearnes | Vertex/Pande | Modeling Industrial ADMET Data with Multitask Networks | 30 | - |

Gilmer, Schoenholz, Riley, Vinyals, Dahl

Message passing neural networks on molecular graphs are shown to outperform previous state-of-the-art on a quantum chemistry dataset (QM9)
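The message passing idea can be sketched in a few lines. This is a toy version under loud assumptions: the message and update steps below are fixed arithmetic stand-ins, whereas a real message passing network learns them, and the function names are hypothetical.

```python
def message_passing(node_states, edges, steps=2):
    """Run `steps` rounds of sum-aggregation message passing.

    node_states: dict mapping node name -> list of floats
    edges: list of undirected (u, v) pairs
    """
    neighbours = {v: [] for v in node_states}
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)

    states = {v: list(h) for v, h in node_states.items()}
    for _ in range(steps):
        new_states = {}
        for v, h in states.items():
            # Message: element-wise sum of the neighbours' current states.
            msg = [0.0] * len(h)
            for w in neighbours[v]:
                msg = [m + x for m, x in zip(msg, states[w])]
            # Update: fixed average, a toy stand-in for a learned update function.
            new_states[v] = [0.5 * a + 0.5 * b for a, b in zip(h, msg)]
        states = new_states
    return states


def readout(states):
    """Permutation-invariant readout: element-wise sum over all nodes."""
    dim = len(next(iter(states.values())))
    return [sum(h[i] for h in states.values()) for i in range(dim)]


# Water as a graph, with one-hot features [is_oxygen, is_hydrogen].
atoms = {"O": [1.0, 0.0], "H1": [0.0, 1.0], "H2": [0.0, 1.0]}
bonds = [("O", "H1"), ("O", "H2")]

print(readout(message_passing(atoms, bonds, steps=1)))
```

Because both the aggregation and the readout are sums, the result does not depend on the order in which atoms are listed, which is the property that makes this family of models suitable for molecular graphs.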

Altae-Tran, Ramsundar, Pappu, Pande

Goh, Siegel, Vishnu, Hodas, Baker

One of the first examples of (almost naively) applying convolutional neural networks to pictures of molecules. Surprisingly (or not) it performs as well as conventional models that require domain knowledge to create.

Kearnes, Goldman, Pande

A solid use case of multitask networks.

Faber, Hutchison, Huang, Gilmer, Schoenholz, Dahl, Vinyals, Kearnes, Riley, Lilienfeld

A thorough assessment of machine learning models, in particular deep methods, for predicting quantum mechanical properties of molecules. Suggests that, given enough training data that includes electron correlation, ML models could outperform hybrid DFT.

Generative

| Lead Author | Group | Title | Citations | Code |
|---|---|---|---|---|
| Sanchez-Lengeling | Aspuru-Guzik | Optimizing distributions over molecular space. An Objective-Reinforced Generative Adversarial Network for Inverse-design Chemistry (ORGANIC) | 46 | Yes |

Sanchez-Lengeling, Outeiral, Guimaraes, Aspuru-Guzik

Combines a GAN with reinforcement learning, a recently popular pairing, to steer generated examples towards a defined objective.

Retrosynthesis and Reaction Prediction

| Lead Author | Group | Title | Citations | Code |
|---|---|---|---|---|
| Segler | Waller | Neural-Symbolic Machine Learning for Retrosynthesis and Reaction Prediction | 94 | - |
| Liu | Pande | Retrosynthetic Reaction Prediction Using Neural Sequence-to-Sequence Models | 70 | - |

Segler, Waller

One of the first examples of the use of RNNs for reaction prediction and retrosynthesis. Makes use of the attention mechanism.

2016

General

Predictive

| Lead Author | Group | Title | Citations | Code |
|---|---|---|---|---|
| Kearnes | Google/Pande | Molecular Graph Convolutions: Moving Beyond Fingerprints | 327 | - |
| Lee | Brenner/Colwell | Predicting protein-ligand affinity with a random matrix framework | 20 | Yes |

Kearnes, McCloskey, Berndl, Pande, Riley

Further demonstration of the potential merits of graph convolutions for molecular machine learning.

Lee, Brenner, Colwell

Development of a simple algorithm based on PCA (principal component analysis) and RMT (random matrix theory) to classify the bioactivity of molecules and to gain interpretability of pharmacophores.

2015

Generative

| Lead Author | Group | Title | Citations | Code |
|---|---|---|---|---|
| Bombarelli | Aspuru-Guzik | Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules | 410 | Yes |

Bombarelli, Wei, Duvenaud, Hernandez-Lobato, Sanchez-Lengeling, Sheberla, Aguilera-Iparraguirre, Hirzel, Adams, Aspuru-Guzik

A SMILES variational autoencoder maps molecules to a latent space, which is continuous and differentiable, and can be optimized for certain properties (logP, QED, SAS, bioactivity etc)

Predictive

| Lead Author | Group | Title | Citations | Code |
|---|---|---|---|---|
| Duvenaud | Aspuru-Guzik/Adams | Convolutional Networks on Graphs for Learning Molecular Fingerprints | 749 | - |
| Ma | Sheridan | Deep Neural Nets as a Method for Quantitative Structure-Activity Relationships | 380 | - |
| Ramsundar | Google/Pande | Massively Multitask Networks for Drug Discovery | 222 | DeepChem |
| Wallach | Atomwise | AtomNet: A Deep Convolutional Neural Network for Bioactivity Prediction in Structure-based Drug Discovery | 165 | - |

Duvenaud, Maclaurin, Aguilera-Iparraguirre, Gomez-Bombarelli, Aspuru-Guzik, Adams

First example of elucidating the potential of graph convolutions on molecules.

Ma, Sheridan, Liaw, Dahl, Svetnik

Follow-up paper to the Merck Kaggle challenge, which was won by a researcher in Hinton's lab. One of the first papers to push deep learning into the limelight for drug discovery.

Ramsundar, Kearnes, Riley

Uses a shared representation of hundreds of thousands of molecules to predict activity at multiple targets simultaneously. Some analysis is done to elucidate the multitask effect.

Wallach, Dzamba, Heifets

First known example of CNNs being applied to structure-based drug discovery in the literature.

2014

Predictive

| Lead Author | Group | Title | Citations | Code |
|---|---|---|---|---|
| Dahl | Salakhutdinov | Multi-task Neural Networks for QSAR Predictions | 156 | - |

Dahl, Jaitly, Salakhutdinov

First description of multi-task networks for drug discovery in the literature. Provides a short account of their application in the Merck Kaggle challenge of 2012.

2013

General

| Lead Author | Group | Title | Citations | Code |
|---|---|---|---|---|
| Sheridan | Sheridan | Time-Split Cross-Validation as a Method for Estimating the Goodness of Prospective Prediction | 84 | - |

Sheridan

Sheridan argues that randomly splitting data into training and test sets yields overly optimistic performance estimates, whereas scaffold-based splitting is too pessimistic. Time-based splits are the most realistic because they mirror the data a model will face when deployed.
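A time-based split is simple to express in code. The sketch below is a minimal illustration, assuming each record carries a measurement date. The record layout, the `time_split` name, and the 80/20 ratio are all hypothetical choices, not taken from the paper.

```python
from datetime import date


def time_split(records, train_fraction=0.8):
    """Split records chronologically: oldest for training, newest for testing."""
    ordered = sorted(records, key=lambda r: r["date"])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


# Toy assay records (hypothetical data).
assays = [
    {"smiles": "CCO", "date": date(2012, 6, 1), "active": 0},
    {"smiles": "CCN", "date": date(2010, 1, 15), "active": 1},
    {"smiles": "CCC", "date": date(2013, 9, 30), "active": 0},
    {"smiles": "CCCl", "date": date(2011, 3, 2), "active": 1},
    {"smiles": "c1ccccc1", "date": date(2009, 11, 20), "active": 0},
]

train, test = time_split(assays)
print([r["smiles"] for r in train])  # the four oldest records
print([r["smiles"] for r in test])   # the most recent record
```

Every test compound was measured after every training compound, so the evaluation mimics predicting compounds that have not been made yet.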

2012

General

| Lead Author | Group | Title | Citations | Code |
|---|---|---|---|---|
| Bickerton | Exscientia | Quantifying the chemical beauty of drugs | 420 | Yes |

Bickerton, Paolini, Besnard, Muresan, Hopkins

Introduction of a metric (QED) to assess general drug-likeness, based on desirability functions fitted to the distributions of eight molecular properties, including the Lipinski descriptors, across a curated set of orally active pharmaceuticals.
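The drug-likeness score combines per-property desirability values into a single number via a geometric mean, optionally weighted. The sketch below shows only that combination step. The desirability values and the function name are hypothetical, and the fitted desirability functions from the paper are not reproduced here.

```python
import math


def combine_desirabilities(desirabilities, weights=None):
    """Weighted geometric mean of per-property desirability scores in (0, 1]."""
    if weights is None:
        weights = [1.0] * len(desirabilities)
    if len(weights) != len(desirabilities):
        raise ValueError("weights and desirabilities must have the same length")
    if any(d <= 0 for d in desirabilities):
        raise ValueError("desirabilities must be strictly positive")
    total = sum(w * math.log(d) for w, d in zip(weights, desirabilities))
    return math.exp(total / sum(weights))


# Hypothetical desirability scores for one molecule, one per property.
scores = [0.9, 0.8, 0.95, 0.2]

print(combine_desirabilities(scores))
```

Because the mean is geometric rather than arithmetic, a single poor property pulls the overall score down sharply, which matches the intuition that one serious liability can sink an otherwise attractive compound.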

Generative

| Lead Author | Group | Title | Citations | Code |
|---|---|---|---|---|
| Besnard | Exscientia | Automated design of ligands to polypharmacological profiles | 454 | - |

Besnard et al

Multi-objective optimization using Bayesian methods. Generative design using priors derived from medicinal chemistry.

Predictive

| Lead Author | Group | Title | Citations | Code |
|---|---|---|---|---|
| Montavon | Lilienfeld | Learning Invariant Representations of Molecules for Atomization Energy Prediction | 85 | Yes |
| Chen | Voigt | Comparison of Random Forest and Pipeline Pilot Naive Bayes in Prospective QSAR Predictions | 60 | Yes |

Montavon, Hansen, Fazil, Rupp, Biegler, Ziehe, Tkatchenko, Lilienfeld, Muller

First example of the use of the Coulomb matrix for inferring quantum mechanical properties of the molecule.
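For readers unfamiliar with the Coulomb matrix, the sketch below builds one from nuclear charges and 3D coordinates. Diagonal entries are 0.5 Z^2.4 and off-diagonal entries are Z_i Z_j divided by the interatomic distance. Coordinates are assumed to be in atomic units (bohr), and the function name is hypothetical.

```python
import math


def coulomb_matrix(charges, coords):
    """Build the Coulomb matrix for a molecule.

    charges: list of nuclear charges Z_i
    coords: list of (x, y, z) positions, assumed to be in bohr
    """
    n = len(charges)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # Diagonal term: 0.5 * Z^2.4.
                matrix[i][j] = 0.5 * charges[i] ** 2.4
            else:
                # Off-diagonal term: nuclear repulsion between atoms i and j.
                matrix[i][j] = charges[i] * charges[j] / math.dist(coords[i], coords[j])
    return matrix


# H2 with a bond length of 1.4 bohr.
h2 = coulomb_matrix([1, 1], [(0.0, 0.0, 0.0), (0.0, 0.0, 1.4)])
for row in h2:
    print(row)
```

The raw matrix depends on the order in which atoms are listed, which is why invariant variants (for example sorting rows by norm, or using the eigenvalue spectrum) matter when it is fed to a learning algorithm.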

Chen, Sheridan, Hornak, Voigt

These authors argue that, although Random Forest is computationally expensive, it often outperforms Naive Bayes significantly.

2010

General

| Lead Author | Group | Title | Citations | Code |
|---|---|---|---|---|
| Rogers | ? | Extended-connectivity fingerprints | 1638 | - |

Rogers, Hahn

Fingerprints are one of the best featurizations for chemoinformatics and machine-learning tasks for chemistry. A key paper in the field.
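The core idea behind circular fingerprints can be sketched in plain Python: each atom starts with an identifier, and identifiers are repeatedly re-hashed together with those of their neighbours so that they describe ever larger neighbourhoods. This is a heavily simplified illustration, not the published algorithm: it uses only element symbols as initial atom features, ignores bond orders and duplicate-substructure removal, and its function names and hashing scheme are assumptions.

```python
import hashlib


def stable_hash(obj):
    """Deterministic 32-bit hash of a Python object's repr."""
    digest = hashlib.sha256(repr(obj).encode()).hexdigest()
    return int(digest[:8], 16)


def circular_fingerprint(atoms, bonds, radius=2, n_bits=64):
    """Return the sorted list of set bit positions for a toy circular fingerprint.

    atoms: list of element symbols
    bonds: list of (i, j) atom index pairs
    """
    neighbours = [[] for _ in atoms]
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)

    # Iteration 0: identifiers depend only on the atom itself.
    identifiers = [stable_hash(symbol) for symbol in atoms]
    features = set(identifiers)
    for _ in range(radius):
        # Each round folds in the (sorted) identifiers of the direct neighbours.
        identifiers = [
            stable_hash((identifiers[i], tuple(sorted(identifiers[j] for j in neighbours[i]))))
            for i in range(len(atoms))
        ]
        features.update(identifiers)
    # Fold the identifiers into a fixed-length bit vector.
    return sorted({f % n_bits for f in features})


# Ethanol (heavy atoms only), written in two different atom orders.
fp_a = circular_fingerprint(["C", "C", "O"], [(0, 1), (1, 2)])
fp_b = circular_fingerprint(["O", "C", "C"], [(0, 1), (1, 2)])
print(fp_a == fp_b)
```

Sorting the neighbour identifiers before hashing is what makes the result independent of atom ordering, so the same molecule always yields the same set of bits.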

1998

General

| Lead Author | Group | Title | Citations | Code |
|---|---|---|---|---|
| Weininger | Weininger | SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules | 2719 | - |
