ChemDataExtractor is a toolkit for extracting chemical information from the scientific literature. Its main features are:
- HTML, XML and PDF document readers
- Chemistry-aware natural language processing pipeline
- Chemical named entity recognition
- Rule-based parsing grammars for property and spectra extraction
- Table parser for extracting tabulated data
- Document processing to resolve data interdependencies
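As a rough illustration of the chemical named entity recognition feature, the sketch below passes a sentence to `Document` and lists the chemical mentions found in it. The helper name `extract_chemical_mentions` and the sample sentence are illustrative choices, not part of the toolkit. The function returns `None` when the package is not installed, and it assumes the data files the toolkit needs have already been downloaded (for example with `cde data download`).

```python
def extract_chemical_mentions(text):
    """Return the chemical entity mentions found in `text` as plain strings.

    Returns None if ChemDataExtractor is not installed. This helper is a
    hypothetical wrapper written for illustration only.
    """
    try:
        from chemdataextractor import Document
    except ImportError:
        return None

    # Build a document from a single passage of text.
    doc = Document(text)
    # `doc.cems` holds the recognised chemical entity mentions as spans.
    return [span.text for span in doc.cems]


if __name__ == "__main__":
    sentence = (
        "UV-vis spectrum of 5,10,15,20-Tetra(4-carboxyphenyl)porphyrin "
        "in Tetrahydrofuran (THF)."
    )
    mentions = extract_chemical_mentions(sentence)
    if mentions is None:
        print("chemdataextractor is not installed")
    else:
        for mention in mentions:
            print(mention)
```

Keeping the import inside the function lets the script fail gracefully on machines where the toolkit has not been installed yet.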
To install ChemDataExtractor, simply run:
pip install chemdataextractor
Or, if you are an Anaconda user, run:
conda install -c chemdataextractor chemdataextractor
Alternatively, try one of the other installation options.
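After installing, one quick way to confirm that the package is visible to your Python environment is to query its installed version. This is a minimal sketch using only the standard library (Python 3.8 or later). The helper name `installed_version` is illustrative, and it assumes the installer registered standard package metadata under the name `chemdataextractor`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("chemdataextractor")
    if found is None:
        print("chemdataextractor is not installed in this environment")
    else:
        print("chemdataextractor", found)
```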
Full documentation is available at http://chemdataextractor.org/docs
ChemDataExtractor is licensed under the MIT license, a permissive, business-friendly license for open source software.