Stacking Gaussian Processes to Improve pKa Predictions in the SAMPL7 Challenge
Authors
- Robert M. Raddi
- Department of Chemistry, Temple University
- Vincent A. Voelz
- Department of Chemistry, Temple University
This repository contains predictions of relative free energies and macroscopic pKas for SAMPL6 and SAMPL7 small molecules, made using standard Gaussian process regression (GPR) as well as deep Gaussian process regression.
- GPR/: the code to perform standard and deep GPR, processing, analysis, etc.
- scripts_and_notebooks/: scripts that call on GPR/, e.g., script to get features, analysis notebooks
  - scripts_and_notebooks/compile_results.ipynb: master notebook for creating figures and tables
  - scripts_and_notebooks/database_info.ipynb: notebook for analyzing the database
  - scripts_and_notebooks/features.py: script for computing descriptors
  - scripts_and_notebooks/standardGPR.py: script for running sklearn.GaussianProcessRegressor
  - scripts_and_notebooks/runme_deepGP.py: script for running deep Gaussian process regression using deepGPy
- Structures/: input SMILES strings, input microtransitions
- Submissions/: SAMPL7 submission (only for the standard GP model)
- predictions/: directories separating results and prediction files for SAMPL6 and SAMPL7, each consisting of relative free energies and macroscopic pKa values for each small molecule
  - predictions/SAMPL6_deepGP/: free energies and macro-pKa predictions
  - predictions/SAMPL6_stdGP/: free energies and macro-pKa predictions
  - predictions/SAMPL7_deepGP/: free energies and macro-pKa predictions
  - predictions/SAMPL7_stdGP/: free energies and macro-pKa predictions
  - predictions/RandomForestSAMPL6/: free energies and macro-pKa predictions
  - predictions/RandomForestSAMPL6_without_filter/: free energies and macro-pKa predictions
  - predictions/RandomForestSAMPL7/: free energies and macro-pKa predictions
  - predictions/RandomForestSAMPL7_without_filter/: free energies and macro-pKa predictions
- pKaDatabase/: curated database from various sources
  - pKaDatabase/pKaDatabase.pkl: pickle file of database (loads with pandas). NOTE: feature calculations already performed and stored inside pd.read_pickle('pKaDatabase.pkl')
  - pKaDatabase/Sulfonamides.pkl: pickle file of only sulfonamides database (loads with pandas)
- tables/: LaTeX tables for free energies and macroscopic pKas
- figures/: macroscopic pKa comparison between std GP and deep GP
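
The listing above says the database pickle loads with pandas and that the standard model is run with scikit-learn's Gaussian process regressor. The following is only a minimal sketch of that workflow, not the code in scripts_and_notebooks/standardGPR.py. It builds a small toy table in place of pKaDatabase.pkl, so the column names (feature_1, feature_2, pKa), the synthetic values, the train/test split, and the kernel choice are all assumptions made for illustration.

```python
import os
import tempfile

import numpy as np
import pandas as pd
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Toy stand-in for the real database; column names and values are hypothetical.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "feature_1": rng.normal(size=30),
    "feature_2": rng.normal(size=30),
})
toy["pKa"] = (
    7.0 + 2.0 * toy["feature_1"] - 1.0 * toy["feature_2"]
    + rng.normal(scale=0.1, size=30)
)

# Round-trip through a pickle file, using the same pd.read_pickle call
# that the README gives for pKaDatabase.pkl.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_database.pkl")
    toy.to_pickle(path)
    db = pd.read_pickle(path)

X = db[["feature_1", "feature_2"]].to_numpy()
y = db["pKa"].to_numpy()

# Hold out the last five rows for prediction.
train, test = slice(0, 25), slice(25, 30)

# Assumed kernel: a smooth RBF term plus a white-noise term.
kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=0)
gpr.fit(X[train], y[train])

# Predictive mean and standard deviation for the held-out rows.
mean, std = gpr.predict(X[test], return_std=True)
for m, s, truth in zip(mean, std, y[test]):
    print(f"predicted pKa = {m:.2f} +/- {s:.2f} (toy value {truth:.2f})")
```

Note that the regressor class is imported from sklearn.gaussian_process. To try the same pattern on the real data, replace the toy table with pd.read_pickle('pKaDatabase.pkl') and select whichever feature and target columns that file actually contains.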