Converts chemistry-containing RDF files exported from Scifinder or Reaxys. Infochem's ICsynth RDFs are now supported as well.
It fixes missing molecule blocks by removing the corresponding entries entirely, and corrects some potential small errors (removing certain empty lines, or uppercasing certain tags).
The fixed RDF file is saved and also converted to a tab-separated CSV file.
Structures in CSV are in SMILES format.
RDF files from other sources, e.g. MarvinSketch or ChemDraw, should work as well but have not been tested thoroughly enough.
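As a rough illustration of how the tab-separated output could be consumed, the sketch below reads such a file with Python's standard `csv` module. The column names `ReactionID` and `SMILES` and the sample rows are assumptions made up for this example; check the header of your own converted file for the actual names.

```python
import csv
import io

# Hypothetical stand-in for a converted file; real column names may differ.
sample = "ReactionID\tSMILES\n1\tCCO>>CC=O\n2\tc1ccccc1>>c1ccccc1Br\n"


def read_rows(handle):
    """Return the rows of a tab-separated file as a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


rows = read_rows(io.StringIO(sample))
for row in rows:
    print(row["ReactionID"], row["SMILES"])
```

For a file on disk, pass `open(path, newline="")` instead of the `io.StringIO` object.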
RDF files that contain a missing structure can throw errors in certain programs or even make them crash.
Examples are MarvinSketch and MarvinView, which can sometimes handle missing reaction structures and sometimes cannot.
In Knime, the Erlwood extension "Chemical Reaction File Reader" won't work at all.
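To check up front whether a file contains such records, a simplified scan like the one below may help. It is only a sketch and not the logic used by this package: it assumes that every record starts with a `$RFMT` line and that every molecule block starts with a `$MOL` line, and it flags records without any `$MOL` line. The function names and the sample text are made up for illustration.

```python
def split_records(rdf_text):
    """Split RDF text into records; a record starts at a line beginning with $RFMT."""
    records, current = [], None
    for line in rdf_text.splitlines():
        if line.startswith("$RFMT"):
            current = []
            records.append(current)
        if current is not None:  # skip file header lines before the first record
            current.append(line)
    return records


def has_molblock(record):
    """True if the record contains at least one molecule block delimiter."""
    return any(line.startswith("$MOL") for line in record)


# Made-up two-record sample: the second record has no structure.
sample = "\n".join([
    "$RDFILE 1",
    "$DATM 01/01/2024 00:00:00",
    "$RFMT $RIREG 1",
    "$RXN",
    "$MOL",
    "(molfile lines omitted)",
    "$RFMT $RIREG 2",
    "$RXN",
    "$DTYPE NOTE",
    "$DATUM no structure here",
])

records = split_records(sample)
missing = [rec[0] for rec in records if not has_molblock(rec)]
print(missing)
```

Real exports can be damaged in other ways (for example a reaction header that announces more molecules than are present), which this scan does not detect.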
Python >= V3.8.
Windows or Linux. MacOS not tested.
Type of installation shouldn't matter (Vanilla/Conda/Mamba/venv).
pip install rdf-fixer
If you downloaded/cloned the code:
pip install .
Yet another way, directly from the repository:
python -m pip install git+https://github.com/DocMinus/chem-rdf-fixer.git
This is only if you want to run the .ipynb file from your browser:
conda install -c anaconda jupyter
from rdfmodule import rdf_fixer
To fix a single RDF file, or a whole folder containing multiple RDF files:
rdf_fixer.fix("file or directory name")
An optional flag (True/False, default True) controls whether CSV files are created as well. To skip CSV creation, set the flag to False.
rdf_fixer.fix("file or directory name", flag=False)
convert_example.py "./filename.rdf"
for single file usage (with or without quotes)
convert_example.py /directory/
for RDF files in directory including all subdirectories
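What "including all subdirectories" amounts to can be sketched with `pathlib`. This is an illustration only, not the code used by `convert_example.py`; the helper name `find_rdf_files` and the case-insensitive suffix match are assumptions of this example.

```python
from pathlib import Path
import tempfile


def find_rdf_files(root):
    """Return all files with an .rdf suffix under root, subdirectories included."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() == ".rdf"
    )


# Build a small throwaway directory tree to demonstrate the search.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sub").mkdir()
    (Path(tmp) / "a.rdf").write_text("")
    (Path(tmp) / "sub" / "b.RDF").write_text("")
    (Path(tmp) / "notes.txt").write_text("")
    found = [p.name for p in find_rdf_files(tmp)]

print(found)
```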
The testfiles folder contains three RDF files for a quick test; the Scifinder one, for example, contains an erroneous (i.e. missing) structure.
Please note that copyright for the enclosed test data lies with the respective companies (see also License section).
The parsing is by no means perfect, though a best effort was made. Suggestions for changes are welcome; please submit an issue or create your own fork.
Converting the current function(s) into a class has been abandoned. There is no real point, since nothing needs to be persistent the way the code is applied here.
See the "VERSIONS.md" readme file.
Independent of the code and its license, the provided test files are not to be distributed further beyond one's own initial testing.
The copyright for the data for these files lies with the providers (Deepmatter/Infochem, ACS, Elsevier Life Sciences IP Limited) and not with the author or anyone reusing/changing this code.
For the code section: Copyright (c) 2021-2024 DocMinus, MIT License (see also LICENSE file).
If you add a shout-out to your code, I don't mind!