Online solubility prediction tool (streamlit) that runs the top-performing ML model (AqSolPred).

mcsorkun/AqSolPred-web

About AqSolPred

AqSolPred is a highly accurate solubility prediction model built as a consensus of three ML algorithms (Neural Nets, Random Forest, and XGBoost). It was developed using the quality-oriented data selection method described in [1] and trained on AqSolDB [2], the largest publicly available aqueous solubility dataset.
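As a rough illustration of what a consensus of three models can look like, the sketch below averages per-compound LogS predictions from three regressors. It assumes the consensus is a simple arithmetic mean, which this README does not state; the function name `consensus_logs` and all prediction values are hypothetical and are not taken from the AqSolPred code.

```python
from statistics import fmean
from typing import Sequence


def consensus_logs(model_predictions: Sequence[Sequence[float]]) -> list[float]:
    """Average per-compound LogS predictions across several models.

    ASSUMPTION: the consensus is an unweighted mean of the model outputs.
    """
    if not model_predictions:
        raise ValueError("need predictions from at least one model")
    n_compounds = len(model_predictions[0])
    if any(len(preds) != n_compounds for preds in model_predictions):
        raise ValueError("every model must predict the same number of compounds")
    # zip(*...) groups the predictions for each compound across all models.
    return [fmean(per_compound) for per_compound in zip(*model_predictions)]


# Hypothetical LogS predictions for two compounds from the three model types.
nn_pred = [-2.0, -4.5]   # neural network
rf_pred = [-2.5, -4.0]   # random forest
xgb_pred = [-3.0, -5.0]  # XGBoost

consensus = consensus_logs([nn_pred, rf_pred, xgb_pred])
print(consensus)
```

Averaging is only one way to combine models; a weighted mean or a median would fit the same interface.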

AqSolPred achieved top performance (a Mean Absolute Error of 0.348 LogS) on the Huuskonen benchmark dataset [3].
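For readers unfamiliar with the metric, Mean Absolute Error is the average of the absolute differences between measured and predicted LogS values. The sketch below shows the calculation on made-up numbers; the values are hypothetical and unrelated to the Huuskonen benchmark.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Return the mean of |y_true[i] - y_pred[i]| over all compounds."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one value")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical measured and predicted LogS values for three compounds.
measured = [-1.0, -3.0, -5.0]
predicted = [-1.5, -2.5, -5.5]

mae = mean_absolute_error(measured, predicted)
print(f"MAE = {mae:.3f} LogS")
```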

AqSolPred Web Version

The web version currently runs on Streamlit Share (its code is in the streamlit folder). You can visit it at the following URL: https://share.streamlit.io/mcsorkun/aqsolpred-web/main/streamlit/app.py

AqSolPred web version: 1.0s (a lite version of the v1.0 model described in the paper, with reduced Random Forests (n_estimators=200, max_depth=10) but the same performance).

If you use predictions from AqSolPred in your work, please cite these papers: [1, 2].

Special thanks: This web app was developed based on the tutorials and the template in DataProfessor's repository.

Note: The main folder was prepared for Heroku deployment; however, it exceeds the Heroku slug size limit, so it is not online on Heroku.

PS: Check out the dockerfile in the gcloud folder to see how I installed conda + RDKit on Google Cloud Platform.

Contact: Murat Cihan Sorkun

References

[1] Sorkun, M. C., Koelman, J.M.V.A. & Er, S. (2020). Pushing the limits of solubility prediction via quality-oriented data selection, Research Square, DOI: https://doi.org/10.21203/rs.3.rs-84771/v1.

[2] Sorkun, M. C., Khetan, A., & Er, S. (2019). AqSolDB, a curated reference set of aqueous solubility and 2D descriptors for a diverse set of compounds. Scientific Data, 6(1), 1-8.

[3] Huuskonen, J. (2000). Estimation of aqueous solubility for a diverse set of organic compounds based on molecular topology. Journal of Chemical Information and Computer Sciences, 40, 773–777.