From 5fd97ee4564bcce8d8c46fdc08693fc5b6c03a66 Mon Sep 17 00:00:00 2001
From: George Batchkala <george.batchkala@gmail.com>
Date: Wed, 16 Sep 2020 17:58:12 +0100
Subject: [PATCH] add more links, fix typos

---
 README.md | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/README.md b/README.md
index 021b1d5..090a3cb 100644
--- a/README.md
+++ b/README.md
@@ -34,9 +34,9 @@ This work is mostly based of four papers:
 
 In this dissertation I aim to achieve three primary goals:
 
-1. Reproduce a subset of solubility-related prediction results from the MoleculeNet benchmarking paper;
-2. Improve upon the reproduced results; and
-3. Use uncertainty estimation methods with the best-performing models to get single prediction uncertainty estimates to evaluate and compare these methods.
+1. **Reproduce** a subset of solubility-related prediction results from the MoleculeNet benchmarking paper;
+2. **Improve** upon the reproduced results; and
+3. Use **uncertainty estimation** methods with the best-performing models to obtain per-prediction uncertainty estimates, then evaluate and compare these methods.
 
 ## Data
 
@@ -55,7 +55,7 @@ I use the following four models for the regression task of physicochemical prope
 
 ## Obtaining Confidence Intervals
 
-I obtaing per-prediction confidence intervals with:
+I obtained per-prediction confidence intervals with:
 
 - Gaussian Processes ([notes, chapter 7, section 7.2](https://github.com/ywteh/advml2020/blob/master/notes.pdf))
 - Bias-corrected Infinitesimal Jackknife estimate for Random Forests ([paper](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4286302/))
@@ -64,11 +64,11 @@ I obtaing per-prediction confidence intervals with:
 
 All the data preparation, experiments, and visualisations were done in Python.
 
-To convert molecules from their [SMILES](https://pubs.acs.org/doi/abs/10.1021/ci00057a005) string representations to either Molecular Descriptors or [Extended-Connectivity Fingerprints](https://pubs.acs.org/doi/10.1021/ci100050t), I used the open-source cheminformatics software, [RDKit](https://www.rdkit.org/).
+To convert molecules from their [SMILES](https://pubs.acs.org/doi/abs/10.1021/ci00057a005) string representations to either Molecular Descriptors or [Extended-Connectivity Fingerprints](https://pubs.acs.org/doi/10.1021/ci100050t), I used the open-source cheminformatics software, [RDKit](https://www.rdkit.org/) ([GitHub](https://github.com/rdkit/rdkit)).
 
-[Wu *et al.*](https://pubs.rsc.org/en/content/articlelanding/2018/SC/C7SC02664A#!divAbstract) suggest to use their Python library, [DeepChem](https://www.deepchem.io/), to reproduce the results. We decided not to use it, since the user API only gives high-level access to the user, while I wanted to have more control of the implementation. To have comparable results, I decided to use the tools which the DeepChem library is built on.
+[Wu *et al.*](https://pubs.rsc.org/en/content/articlelanding/2018/SC/C7SC02664A#!divAbstract) suggest using their Python library, [DeepChem](https://www.deepchem.io/) ([GitHub](https://github.com/deepchem/deepchem)), to reproduce the results. I decided not to use it, since its API only gives high-level access, while I wanted more control over the implementation. To keep the results comparable, I decided to use the tools on which the DeepChem library is built.
 
-For most of the machine learning pipeline, I used Scikit-Learn ([article](https://www.jmlr.org/papers/v12/pedregosa11a.html), [GitHub](https://github.com/scikit-learn/scikit-learn)) for preprocessing, splitting, modelling, prediction, and validation. To obtain the confidence intervals for Random Forests, I used the forestci ([article](https://joss.theoj.org/papers/10.21105/joss.00124), [GitHub](https://github.com/scikit-learn-contrib/forest-confidence-interval)) extension for Scikit-Learn. The implementation of a custom Tanimoto (Jaccard) kernel for Gaussian Process Regression and all the following GP experiments were performed with [GPflow (article](http://jmlr.org/papers/v18/16-537.html), [GitHub)](https://github.com/GPflow/GPflow).
+For most of the machine learning pipeline, I used [Scikit-Learn](https://www.jmlr.org/papers/v12/pedregosa11a.html) ([GitHub](https://github.com/scikit-learn/scikit-learn)) for preprocessing, splitting, modelling, prediction, and validation. To obtain the confidence intervals for Random Forests, I used the [forestci](https://joss.theoj.org/papers/10.21105/joss.00124) ([GitHub](https://github.com/scikit-learn-contrib/forest-confidence-interval)) extension for Scikit-Learn. I implemented a custom Tanimoto (Jaccard) kernel for Gaussian Process Regression and ran all subsequent GP experiments with [GPflow](http://jmlr.org/papers/v18/16-537.html) ([GitHub](https://github.com/GPflow/GPflow)).
 
 # Set-up