AI in Drug Discovery Progress

This repository contains an up-to-date list (as of September 2019) of progress (papers, GitHub repos, etc.) made in applying AI to drug discovery.

2019

General

Lead Author	Group	Title	Citations	Code
Brown	`BenevolentAI`	GuacaMol: Benchmarking Models for de Novo Molecular Design	13	Yes
Mater	`Coote`	Deep Learning in Chemistry	6	-
Preuer	`Unterthiner`	Interpretable Deep Learning in Drug Discovery	6	-
Krenn	`Aspuru-Guzik`	SELFIES: a robust representation of semantically constrained graphs with an example application in chemistry	-	Yes

Brown, Fiscato, Segler, Vaucher

GuacaMol: Benchmarking Models for de Novo Molecular Design

A set of benchmarks used to assess the quality of generative models. Benchmarks divided into distribution-learning, goal-directed, and assessment of compound quality.
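As a rough illustration of the distribution-learning category, the sketch below computes two commonly reported quantities, uniqueness and novelty, for a batch of generated molecules. It assumes the molecules are already valid, canonical SMILES strings (canonicalization needs a cheminformatics toolkit and is omitted here). The function names and toy data are hypothetical and are not the GuacaMol API.

```python
def uniqueness(generated):
    """Fraction of generated molecules that are distinct."""
    if not generated:
        return 0.0
    return len(set(generated)) / len(generated)


def novelty(generated, training_set):
    """Fraction of distinct generated molecules absent from the training set."""
    unique = set(generated)
    if not unique:
        return 0.0
    return len(unique - set(training_set)) / len(unique)


# Toy data (assumed to be canonical SMILES already)
train = ["CCO", "CCN", "c1ccccc1"]
samples = ["CCO", "CCC", "CCC", "CCCl"]

print(uniqueness(samples))      # 3 distinct out of 4 samples
print(novelty(samples, train))  # 2 of the 3 distinct samples are new
```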

Mater, Coote

Deep Learning in Chemistry

An up-to-date, accurate and comprehensive review of where deep learning currently is with respect to chemistry.

Preuer, Klambauer, Rippmann, Hochreiter, Unterthiner

Interpretable Deep Learning in Drug Discovery

Description of a method to a) identify the parts of a molecule that are responsible for biological activity and b) construct pharmacophores from graph convolutional neural networks.

Krenn, Hase, Nigam, Friederich, Aspuru-Guzik

SELFIES: a robust representation of semantically constrained graphs with an example application in chemistry

Alternative representation to SMILES, which seems to perform better at reconstruction accuracy in generative tests.

Generative

Lead Author	Group	Title	Citations	Code
Zhavoronkov	`Insilico`	Deep learning enables the rapid identification of potent DDR1 kinase inhibitors	1	Yes
Jensen	`Jensen`	A graph-based genetic algorithm and generative model/Monte Carlo tree search for the exploration of chemical space.	-	Yes

Zhavoronkov et al

Deep learning enables the rapid identification of potent DDR1 kinase inhibitors

This paper attracted much press coverage. Uses deep RL to optimize for properties in a GRU-encoded latent space.

Jensen

A graph-based genetic algorithm and generative model/Monte Carlo tree search for the exploration of chemical space.

Genetic algorithm that performs well on GuacaMol benchmarks.

Predictive

Lead Author	Group	Title	Citations	Code
Wang	`Huang`	SMILES-BERT: Large Scale Unsupervised Pre-Training for Molecular Property Prediction	-	-
Cortes-Ciriano	`Bender`	KekuleScope: prediction of cancer cell line sensitivity and compound potency using convolutional neural networks trained on compound images	2	Yes
Lee	`Lee`	Ligand biological activity predicted by cleaning positive and negative chemical correlations	2	Yes
Withnall	`Chen`	Attention and Edge Memory Convolution for Bioactivity Prediction	-	-

Withnall, Lindelöf, Engkvist, Chen

Attention and Edge Memory Convolution for Bioactivity Prediction

One of the first few examples of neural attention being used in drug discovery.

Wang, Guo, Wang, Sun, Huang

SMILES-BERT: Large Scale Unsupervised Pre-Training for Molecular Property Prediction

Taking inspiration from the recent monumental progress in NLP, Wang applies Google's language-model ideas to massive amounts of chemical data.

Cortes-Ciriano, Bender

KekuleScope: prediction of cancer cell line sensitivity and compound potency using convolutional neural networks trained on compound images

Using modern CNN architectures and transfer learning from ImageNet to predict activity from RDKit-rendered skeletal structures of the ligand.

Lee, Yang, Bassyouni, Butler, Hou, Jenkinson, Price

Ligand biological activity predicted by cleaning positive and negative chemical correlations

Lee's original RMT (random matrix theory) algorithm is extended to incorporate information from inactive compounds.

Reaction Prediction and Retrosynthesis

Lead Author	Group	Title	Citations	Code
Schwaller	`Lee`	Molecular Transformer: A Model for Uncertainty-Calibrated Chemical Reaction Prediction	6	Yes
Lee	`Lee`	Molecular Transformer unifies reaction prediction and retrosynthesis across pharma chemical space	-	-

Schwaller, Laino, Gaudin, Bolgar, Hunter, Bekas, Lee

Molecular Transformer: A Model for Uncertainty-Calibrated Chemical Reaction Prediction

Uses the latest cutting-edge research from the NLP community (transformer networks), viewing reaction prediction as a machine translation problem. Word1/Molecule1 + Word2/Molecule2 ----translates_to----> Word3/Molecule3
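To make the translation analogy concrete, here is a minimal sketch of turning a reaction SMILES into a source/target token pair, the kind of input a sequence-to-sequence model consumes. The token pattern is a simplified assumption (bracket atoms, Cl/Br, two-digit ring closures, otherwise single characters) and is not necessarily the tokenizer used in the paper. The helper names are hypothetical.

```python
import re

# Simplified token pattern (an assumption): bracket atoms, two-letter halogens,
# two-digit ring closures, then any other single character.
TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize(smiles):
    """Split a SMILES string into a list of tokens."""
    return TOKEN.findall(smiles)


def to_translation_pair(reaction):
    """Turn 'reactants>>product' into (source_tokens, target_tokens)."""
    reactants, product = reaction.split(">>")
    return tokenize(reactants), tokenize(product)


# Toy example: acetic acid + ethanol -> ethyl acetate
src, tgt = to_translation_pair("CC(=O)O.OCC>>CC(=O)OCC")
print(" ".join(src))  # source "sentence" (reactants)
print(" ".join(tgt))  # target "sentence" (product)
```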

Lee, Yang, Sresht, Bolgar, Hou, Klug-McLeod, Butler

Molecular Transformer unifies reaction prediction and retrosynthesis across pharma chemical space

Applies the Molecular Transformer to retrosynthesis.

2018

General

Lead Author	Group	Title	Citations	Code
Wu	`Pande`	MoleculeNet: a benchmark for molecular machine learning	208	DeepChem
Chen	`Blaschke`	The rise of deep learning in drug discovery	169	-
O'Boyle	`NextMove`	DeepSMILES: An Adaptation of SMILES for Use in Machine-Learning of Chemical Structures	5	Yes

Chen, Engkvist, Wang, Olivecrona, Blaschke

The rise of deep learning in drug discovery

A review on the latest developments in the field.

Wu, Ramsundar, Feinberg, Gomes, Geniesse, Pappu, Leswing, Pande

MoleculeNet: a benchmark for molecular machine learning

The Pande group curated a set of datasets to assess the quality of a machine learning model on chemistry/drug discovery/molecular problems.

Generative

Lead Author	Group	Title	Citations	Code
Segler	`Benevolent`	Generating Focused Molecule Libraries for Drug Discovery with Recurrent Neural Networks	188	Yes
Jin	`Jaakkola`	Junction Tree Variational Autoencoder for Molecular Graph Generation.	93	Yes
Popova	`Tropsha`	Deep reinforcement learning for de novo drug design	82	Yes

Segler, Kogej, Tyrchan, Waller

Generating Focused Molecule Libraries for Drug Discovery with Recurrent Neural Networks

A straightforward example of the application of RNNs (specifically LSTMs) to generation of molecules and exploration of chemical space, using SMILES as input featurization.

Jin, Barzilay, Jaakkola

Junction Tree Variational Autoencoder for Molecular Graph Generation

Featurizes molecules based on their component fragments. Large improvement over other featurizations in many tasks.

Popova, Isayev, Tropsha

Deep reinforcement learning for de novo drug design

Illustration of the use of deep reinforcement learning to generate molecules with a bias towards certain properties (bioactivity, logP etc). Features the use of a Stack Neural Network for encoding, introduced by Facebook researchers recently.

Retrosynthesis and Reaction Prediction

Lead Author	Group	Title	Citations	Code
Segler	`Waller`	Planning chemical syntheses with deep neural networks and symbolic AI	216	-

Segler, Waller

Planning chemical syntheses with deep neural networks and symbolic AI

Nature paper on automated retrosynthesis. Clever data augmentation methods to generate more negative data.

2017

General

Lead Author	Group	Title	Citations	Code
Goh	`Vishnu`	Deep Learning for Computational Chemistry	157	-
Wallach	`Atomwise`	Most Ligand-Based Classification Benchmarks Reward Memorization Rather than Generalization	20	-
Axen	`Keiser`	A Simple Representation of Three-Dimensional Molecular Structure	9	Yes

Goh, Hodas, Vishnu

Deep Learning for Computational Chemistry

A review of the current use cases for deep learning in computational chemistry.

Wallach, Heifets

Most Ligand-Based Classification Benchmarks Reward Memorization Rather than Generalization

Introduction of the Asymmetric Validation Embedding (AVE) bias to better assess the domain of applicability of a machine learning model.

Axen, Huang, Caceres, Gendelev, Roth, Keiser

A Simple Representation of Three-Dimensional Molecular Structure

First introduction of the concept of a 3D fingerprint. Performs moderately in benchmarks.

Predictive

Lead Author	Group	Title	Citations	Code
Gilmer	`Google`	Neural Message Passing for Quantum Chemistry	522	-
Altae-Tran	`Pande`	Low Data Drug Discovery with One-Shot Learning	166	DeepChem
Faber	`Lilienfeld/Google`	Prediction Errors of Molecular Machine Learning Models Lower than Hybrid DFT Error	143	-
Goh	`Vishnu`	Chemception: A Deep Neural Network with Minimal Chemistry Knowledge Matches the Performance of Expert-developed QSAR/QSPR Models	44	-
Kearnes	`Vertex/Pande`	Modeling Industrial ADMET Data with Multitask Networks	30	-

Gilmer, Schoenholz, Riley, Vinyals, Dahl

Neural Message Passing for Quantum Chemistry

Message passing neural networks on molecular graphs are shown to outperform previous state-of-the-art on a quantum chemistry dataset (QM9)
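The core idea can be sketched in a few lines: each atom updates its feature vector using messages from its bonded neighbors, and a readout pools the atom vectors into one molecule-level vector. The version below is a deliberately stripped-down assumption: it uses a fixed sum for both the message and update steps, whereas the networks in the paper use learned functions and edge features. All names are hypothetical.

```python
def message_passing_step(features, neighbors):
    """One round: each atom adds the sum of its neighbors' vectors to its own."""
    updated = {}
    for atom, own in features.items():
        message = [0.0] * len(own)
        for nb in neighbors[atom]:
            message = [m + f for m, f in zip(message, features[nb])]
        updated[atom] = [o + m for o, m in zip(own, message)]
    return updated


def readout(features):
    """Sum atom vectors into one molecule-level vector (order-invariant)."""
    dim = len(next(iter(features.values())))
    return [sum(vec[i] for vec in features.values()) for i in range(dim)]


# Heavy atoms of ethanol, C-C-O; features are one-hot over (C, O)
features = {0: [1.0, 0.0], 1: [1.0, 0.0], 2: [0.0, 1.0]}
neighbors = {0: [1], 1: [0, 2], 2: [1]}

h1 = message_passing_step(features, neighbors)
print(h1)           # per-atom vectors after one round
print(readout(h1))  # molecule-level vector
```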

Altae-Tran, Ramsundar, Pappu, Pande

Low Data Drug Discovery with One-Shot Learning

Goh, Siegel, Vishnu, Hodas, Baker

Chemception: A Deep Neural Network with Minimal Chemistry Knowledge Matches the Performance of Expert-developed QSAR/QSPR Models

One of the first examples of (almost naively) applying convolutional neural networks to pictures of molecules. Surprisingly (or not) it performs as well as conventional models that require domain knowledge to create.

Kearnes, Goldman, Pande

Modeling Industrial ADMET Data with Multitask Networks

A solid use case of multitask networks.

Faber, Hutchison, Huang, Gilmer, Schoenholz, Dahl, Vinyals, Kearnes, Riley, Lilienfeld

Prediction Errors of Molecular Machine Learning Models Lower than Hybrid DFT Error

A thorough assessment of the power of machine learning models, and deep methods in particular, for predicting quantum mechanical properties of molecules. Suggests that given enough data with electron correlation, ML models could outperform hybrid DFT.

Generative

Lead Author	Group	Title	Citations	Code
Sanchez-Lengeling	`Aspuru-Guzik`	Optimizing distributions over molecular space. An Objective-Reinforced Generative Adversarial Network for Inverse-design Chemistry (ORGANIC)	46	Yes

Sanchez-Lengeling, Outeiral, Guimaraes, Aspuru-Guzik

Optimizing distributions over molecular space. An Objective-Reinforced Generative Adversarial Network for Inverse-design Chemistry (ORGANIC)

Uses the recently popular combination of a GAN with reinforcement learning to direct generated examples towards a defined prior.

Retrosynthesis and Reaction Prediction

Lead Author	Group	Title	Citations	Code
Segler	`Waller`	Neural-Symbolic Machine Learning for Retrosynthesis and Reaction Prediction	94	-
Liu	`Pande`	Retrosynthetic Reaction Prediction Using Neural Sequence-to-Sequence Models	70	-

Segler, Waller

Neural-Symbolic Machine Learning for Retrosynthesis and Reaction Prediction

One of the first examples of the use of RNNs for reaction prediction and retrosynthesis. Makes use of the attention mechanism.

2016

General

Predictive

Lead Author	Group	Title	Citations	Code
Kearnes	`Google/Pande`	Molecular Graph Convolutions: Moving Beyond Fingerprints	327	-
Lee	`Brenner/Colwell`	Predicting protein-ligand affinity with a random matrix framework	20	Yes

Kearnes, McCloskey, Berndl, Pande, Riley

Molecular Graph Convolutions: Moving Beyond Fingerprints

Further demonstration of the possible merits of using graph convolutions for molecular machine learning.

Lee, Brenner, Colwell

Predicting protein-ligand affinity with a random matrix framework

Development of a simple algorithm based on PCA (principal component analysis) and RMT (random matrix theory) to classify bioactivity of molecules, and gain interpretability of pharmacophores.

2015

Generative

Lead Author	Group	Title	Citations	Code
Gomez-Bombarelli	`Aspuru-Guzik`	Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules	410	Yes

Gomez-Bombarelli, Wei, Duvenaud, Hernandez-Lobato, Sanchez-Lengeling, Sheberla, Aguilera-Iparraguirre, Hirzel, Adams, Aspuru-Guzik

Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules (https://pubs.acs.org/doi/full/10.1021/acscentsci.7b00572)

A SMILES variational autoencoder maps molecules to a latent space, which is continuous and differentiable, and can be optimized for certain properties (logP, QED, SAS, bioactivity etc)

Predictive

Lead Author	Group	Title	Citations	Code
Duvenaud	`Aspuru-Guzik/Adams`	Convolutional Networks on Graphs for Learning Molecular Fingerprints	749	-
Ma	`Sheridan`	Deep Neural Nets as a Method for Quantitative Structure-Activity Relationships	380	-
Ramsundar	`Google/Pande`	Massively Multitask Networks for Drug Discovery	222	DeepChem
Wallach	`Atomwise`	AtomNet: A Deep Convolutional Neural Network for Bioactivity Prediction in Structure-based Drug Discovery	165	-

Duvenaud, Maclaurin, Aguilera-Iparraguirre, Gomez-Bombarelli, Aspuru-Guzik, Adams

Convolutional Networks on Graphs for Learning Molecular Fingerprints

First example of elucidating the potential of graph convolutions on molecules.

Ma, Sheridan, Liaw, Dahl, Svetnik

Deep Neural Nets as a Method for Quantitative Structure-Activity Relationships

Follow-up paper to the Merck Kaggle challenge, which was won by a researcher in Hinton's lab. One of the first papers to push deep learning into the limelight for drug discovery.

Ramsundar, Kearnes, Riley

Massively Multitask Networks for Drug Discovery

Uses a shared representation of hundreds of thousands of molecules to predict activity at multiple targets simultaneously. Some analysis is done to elucidate the multitask effect.

Wallach, Dzamba, Heifets

AtomNet: A Deep Convolutional Neural Network for Bioactivity Prediction in Structure-based Drug Discovery

First known example of CNNs being applied to structure-based drug discovery in the literature.

2014

Predictive

Lead Author	Group	Title	Citations	Code
Dahl	`Salakhutdinov`	Multi-task Neural Networks for QSAR Predictions	156	-

Dahl, Jaitly, Salakhutdinov

Multi-task Neural Networks for QSAR Predictions

First description of multi-task networks for drug discovery in the literature. Provides a short account of their application in the Merck Kaggle challenge of 2012.

2013

General

Lead Author	Group	Title	Citations	Code
Sheridan	`Sheridan`	Time-Split Cross-Validation as a Method for Estimating the Goodness of Prospective Prediction	84	-

Sheridan

Time-Split Cross-Validation as a Method for Estimating the Goodness of Prospective Prediction

Sheridan argues that random splitting of train and test sets gives overly optimistic estimates, whereas scaffold-based splitting is too pessimistic. Time-based splits are the most realistic and correspond to the data a model will face when deployed.
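A time split is simple to implement: sort compounds by the date they were registered or assayed, train on the oldest, and test on the most recent. The sketch below contrasts it with a random split. The record layout (`id`, `date`) and the 25% test fraction are assumptions for illustration, not taken from the paper.

```python
import random


def time_split(records, test_fraction=0.25):
    """Train on the oldest records, test on the most recent ones."""
    ordered = sorted(records, key=lambda r: r["date"])
    cut = int(len(ordered) * (1 - test_fraction))
    return ordered[:cut], ordered[cut:]


def random_split(records, test_fraction=0.25, seed=0):
    """Baseline: shuffle, then split, ignoring time."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical assay records; ISO dates sort correctly as strings
records = [
    {"id": "cmpd-1", "date": "2011-03-01"},
    {"id": "cmpd-2", "date": "2012-07-15"},
    {"id": "cmpd-3", "date": "2010-01-20"},
    {"id": "cmpd-4", "date": "2013-02-02"},
    {"id": "cmpd-5", "date": "2012-11-30"},
    {"id": "cmpd-6", "date": "2011-09-09"},
    {"id": "cmpd-7", "date": "2013-06-18"},
    {"id": "cmpd-8", "date": "2010-05-05"},
]

train, test = time_split(records)
print([r["id"] for r in test])  # the two most recently dated compounds
```

With the time split, every test compound is newer than every training compound, which mimics prospective use. The random split gives no such guarantee.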

2012

General

Lead Author	Group	Title	Citations	Code
Bickerton	`Exscientia`	Quantifying the chemical beauty of drugs	420	Yes

Bickerton, Paolini, Besnard, Muresan, Hopkins

Quantifying the chemical beauty of drugs

Introduction of a metric (QED) to assess general drug-likeness, based on modelling the distributions of eight molecular properties (including those in Lipinski's rule of five) over a curated set of orally active pharmaceuticals.
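The metric combines per-property desirability scores (each between 0 and 1) into a single value using a weighted geometric mean. The sketch below shows only that combination step. The desirability values and weights are made-up placeholders, not the fitted functions or weights from the paper, and the function name is hypothetical.

```python
import math


def weighted_geometric_mean(desirabilities, weights):
    """Combine desirability scores d_i in (0, 1] with weights w_i:
    exp(sum(w_i * ln d_i) / sum(w_i))."""
    if len(desirabilities) != len(weights):
        raise ValueError("need one weight per desirability")
    if any(d <= 0 for d in desirabilities):
        raise ValueError("desirabilities must be positive")
    log_sum = sum(w * math.log(d) for d, w in zip(desirabilities, weights))
    return math.exp(log_sum / sum(weights))


# Placeholder scores for four properties of one hypothetical molecule
scores = [0.9, 0.8, 1.0, 0.7]
weights = [1.0, 1.0, 1.0, 1.0]
print(weighted_geometric_mean(scores, weights))
```

A geometric mean penalizes a single poor property more strongly than an arithmetic mean would, which suits a drug-likeness score where one bad property can sink a compound.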

Generative

Lead Author	Group	Title	Citations	Code
Besnard	`Exscientia`	Automated design of ligands to polypharmacological profiles	454	-

Besnard et al

Automated design of ligands to polypharmacological profiles

Multi-objective optimization using Bayesian methods. Generative design using priors derived from medicinal chemistry.

Predictive

Lead Author	Group	Title	Citations	Code
Montavon	`Lilienfeld`	Learning Invariant Representations of Molecules for Atomization Energy Prediction	85	Yes
Chen	`Voigt`	Comparison of Random Forest and Pipeline Pilot Naive Bayes in Prospective QSAR Predictions	60	Yes

Montavon, Hansen, Fazil, Rupp, Biegler, Ziehe, Tkatchenko, Lilienfeld, Muller

Learning Invariant Representations of Molecules for Atomization Energy Prediction

First example of the use of the Coulomb matrix for inferring quantum mechanical properties of the molecule.
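For reference, the Coulomb matrix is built from nuclear charges Z and 3D positions R: diagonal entries are 0.5 * Z_i^2.4 and off-diagonal entries are Z_i * Z_j / |R_i - R_j|. The sketch below is a plain-Python illustration assuming coordinates in atomic units (bohr). The function name and the toy H2 geometry are assumptions, and the paper's extra steps for handling atom ordering are not shown.

```python
import math


def coulomb_matrix(charges, coords):
    """Coulomb matrix from nuclear charges and 3D coordinates (atomic units)."""
    n = len(charges)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # Diagonal: a fit to free-atom energies, 0.5 * Z^2.4
                matrix[i][j] = 0.5 * charges[i] ** 2.4
            else:
                # Off-diagonal: Coulomb repulsion between nuclei i and j
                distance = math.dist(coords[i], coords[j])
                matrix[i][j] = charges[i] * charges[j] / distance
    return matrix


# Toy example: H2 with an assumed bond length of 1.4 bohr
m = coulomb_matrix([1, 1], [(0.0, 0.0, 0.0), (0.0, 0.0, 1.4)])
for row in m:
    print(row)
```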

Chen, Sheridan, Hornak, Voigt

Comparison of Random Forest and Pipeline Pilot Naïve Bayes in Prospective QSAR Predictions

These authors argue that, although Random Forest is computationally expensive, it often outperforms Naive Bayes significantly.

2010

General

Lead Author	Group	Title	Citations	Code
Rogers	`?`	Extended-connectivity fingerprints	1638	-

Rogers, Hahn

Extended-connectivity fingerprints

Fingerprints are one of the best featurizations for chemoinformatics and machine-learning tasks for chemistry. A key paper in the field.
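The idea behind circular fingerprints can be sketched as follows: start with an identifier per atom, repeatedly replace it with a hash of itself plus its neighbors' identifiers, and fold all identifiers seen into a fixed-length bit set. This is a heavily simplified assumption of the scheme: real ECFP uses richer atom invariants, bond orders, and duplicate-substructure removal. The function name, the CRC32 hash, and the 64-bit length are illustrative choices only.

```python
import zlib


def circular_fingerprint(atom_labels, neighbors, radius=2, n_bits=64):
    """Simplified ECFP-style fingerprint: hash growing atom neighborhoods."""

    def stable_hash(text):
        # CRC32 is deterministic across runs, unlike Python's built-in hash()
        return zlib.crc32(text.encode("utf-8"))

    identifiers = [stable_hash(label) for label in atom_labels]
    seen = set(identifiers)
    for _ in range(radius):
        updated = []
        for atom, ident in enumerate(identifiers):
            # Sorting makes the result independent of neighbor ordering
            env = sorted(identifiers[nb] for nb in neighbors[atom])
            updated.append(stable_hash(f"{ident}|{env}"))
        identifiers = updated
        seen.update(identifiers)
    # Fold identifiers into the indices of the "on" bits
    return sorted({ident % n_bits for ident in seen})


# Heavy atoms of ethanol written in two different atom orders
fp_a = circular_fingerprint(["C", "C", "O"], [[1], [0, 2], [1]])
fp_b = circular_fingerprint(["O", "C", "C"], [[1], [0, 2], [1]])
print(fp_a == fp_b)  # same molecule, same bits
```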

1988

General

Lead Author	Group	Title	Citations	Code
Weininger	`Weininger`	SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules	2719	-

aced125/AI_in_Drug_Discovery_Progress
